[[!template id=gitbranch branch=smcv/ready/depends-exact author="[[smcv]]"]]
I'm still trying to optimize ikiwiki for a site using
[[plugins/contrib/album]], and checking which pages depend on which pages
is still taking too long. Here's another go at fixing that, using [[Will]]'s
suggestion from [[todo/should_optimise_pagespecs]]:

> A hash, by itself, is not optimal because
> the dependency list holds two things: page names and page specs. The hash would
> work well for the page names, but you'll still need to iterate through the page specs.
> I was thinking of keeping a list and a hash. You use the list for pagespecs
> and the hash for individual page names. To make this work you need to adjust the
> API so it knows which you're adding. -- [[Will]]

If you have P pages and refresh after changing C of them, where an average
page has E dependencies on exact page names and D other dependencies, this
branch should drop the complexity of checking dependencies from
O(P * (D+E) * C) to O(C + PE + PD*C). Pages that use inline or map have
a large value for E (e.g. one per inlined page) and a small value for D (e.g.
one per inline).
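
To make the complexity claim concrete, here is a minimal sketch of the list-plus-hash idea. It is written in Python purely for illustration, not in ikiwiki's Perl, and it is not the branch's code: the class `DependencyIndex`, its method names, and the use of `fnmatch` globs as a stand-in for real pagespecs are all assumptions.

```python
from fnmatch import fnmatchcase


class DependencyIndex:
    """Hypothetical store: exact names in a set, pagespecs in a list, per page."""

    def __init__(self):
        self.exact = {}      # page -> set of exact page names it depends on
        self.pagespecs = {}  # page -> list of glob specs (stand-in for pagespecs)

    def add_exact(self, page, name):
        self.exact.setdefault(page, set()).add(name)

    def add_pagespec(self, page, spec):
        self.pagespecs.setdefault(page, []).append(spec)

    def pages_to_rebuild(self, changed):
        changed_list = list(changed)
        changed_set = set(changed_list)  # O(C) to build
        result = set()
        for page in set(self.exact) | set(self.pagespecs):
            # Exact names: one hash lookup per dependency, O(E) per page.
            if any(dep in changed_set for dep in self.exact.get(page, ())):
                result.add(page)
                continue
            # Pagespecs: every spec is tested against every changed page,
            # O(D * C) per page.
            if any(fnmatchcase(name, spec)
                   for spec in self.pagespecs.get(page, ())
                   for name in changed_list):
                result.add(page)
        return result


index = DependencyIndex()
index.add_exact("album", "album/photo1")
index.add_exact("album", "album/photo2")
index.add_pagespec("sidebar", "plugins/*")
index.add_pagespec("news", "blog/*")
print(sorted(index.pages_to_rebuild(["album/photo2", "plugins/inline"])))
# → ['album', 'sidebar']
```

Summed over P pages this gives the O(C + PE + PD*C) shape quoted above: the E term no longer multiplies by C because each exact name costs one lookup.
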
Benchmarking:
Test 1: a wiki with about 3500 pages and 3500 photos, and a change that
touches about 350 pages and 350 photos
Test 2: the docwiki (about 700 objects not excluded by docwiki.setup, mostly
pages), docwiki.setup modified to turn off verbose, and a change that touches
the 98 pages of plugins/*.mdwn
In both tests I rebuilt the wiki with the target ikiwiki version, then touched
the appropriate pages and refreshed.
Results of test 1: without this branch it took around 5:45 to rebuild and
around 5:45 again to refresh (so rebuilding 10% of the pages, then deciding
that most of the remaining 90% didn't need to change, took about as long as
rebuilding everything). With this branch it took 5:47 to rebuild and 1:16
to refresh.
Results of test 2: rebuilding took 14.11s without, 13.96s with; refreshing
three times took 7.29/7.40/7.37s without, 6.62/6.56/6.63s with.
(This benchmarking was actually done with my [[plugins/contrib/album]] branch,
since that's what the huge wiki needs; that branch doesn't alter core code
beyond the ready/depends-exact branch point, so the results should be
equally valid.)
--[[smcv]]
We discussed this on IRC; I had some worries that things may have been
switched to `add_depends_exact` that were not pure page names. My current
feeling is it's all safe, but who knows. It's easy to miss something.
Which makes me think this is not a good interface.
Why not, instead, make `add_depends` smart? If it's passed something
that is clearly a raw page name, it can add it to the exact depends hash.
Else, add it to the pagespec hash. You can tell if it's a pure page name
by matching on `$config{wiki_file_regexp}`.
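
A rough sketch of that "smart" routing, in Python for illustration only (ikiwiki itself is Perl). The pattern `WIKI_FILE_REGEXP` below is an assumed stand-in for whatever `$config{wiki_file_regexp}` is set to, and the two module-level dictionaries are hypothetical names rather than ikiwiki's data structures.

```python
import re

# Assumption: a stand-in for $config{wiki_file_regexp}. The real value is
# whatever the wiki configures; this pattern only accepts characters that
# look like a plain page name.
WIKI_FILE_REGEXP = re.compile(r"[-\w.:/+]+")

exact_depends = {}     # page -> set of raw page names (hypothetical)
pagespec_depends = {}  # page -> list of pagespecs (hypothetical)


def add_depends(page, dep):
    """Route a dependency by its shape, so callers need only one function."""
    if WIKI_FILE_REGEXP.fullmatch(dep):
        # Looks like a raw page name: cheap hash lookup at refresh time.
        exact_depends.setdefault(page, set()).add(dep)
    else:
        # Contains glob characters, spaces, operators, etc.: keep as a pagespec.
        pagespec_depends.setdefault(page, []).append(dep)


add_depends("index", "plugins/inline")
add_depends("index", "plugins/* and !*/Discussion")
```

With this shape callers cannot pick the wrong function by mistake, which is the interface concern raised above; the cost is one regexp match per added dependency.
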
Also I think there may be little optimisation value left in
7227c2debfeef94b35f7d81f42900aa01820caa3, since the "regular" dependency
lists will be much shorter.
Sounds like inline pagenames has an already extant bug WRT
pages moving, which this should not make worse. Would be good to verify.
Re coding, it would be nice if `refresh()` could avoid duplicating
the debug message, etc. in the two cases. --[[Joey]]
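
One way the duplication could be avoided, sketched in Python for illustration rather than in ikiwiki's Perl: a helper reports which changed page triggers a rebuild, and the caller logs in exactly one place. The function names, the glob matching via `fnmatch`, and the wording of the debug message are all assumptions.

```python
from fnmatch import fnmatchcase


def first_trigger(page, changed, changed_set, exact_depends, pagespec_depends):
    """Return a changed page that forces `page` to rebuild, or None."""
    for dep in exact_depends.get(page, ()):
        if dep in changed_set:
            return dep
    for spec in pagespec_depends.get(page, ()):
        for candidate in changed:
            if fnmatchcase(candidate, spec):
                return candidate
    return None


def refresh(pages, changed, exact_depends, pagespec_depends, debug=print):
    """Collect pages to rebuild; both dependency kinds share one debug call."""
    changed = list(changed)
    changed_set = set(changed)
    rebuild = []
    for page in pages:
        trigger = first_trigger(page, changed, changed_set,
                                exact_depends, pagespec_depends)
        if trigger is not None:
            # The only place the message is emitted, whichever case matched.
            debug(f"building {page}, which depends on {trigger}")
            rebuild.append(page)
    return rebuild
```
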