I think there is a problem in my "dependency graph". As an example,
here is the index
ikiwiki generated for my site
(note that the site changed since this index was generated).
Some HUGE dependencies appear, clearly non optimal, like
depends = A| B | A | C | A | D | A | E | A | F | A | G | ....
or
depends= A | B | C | D | A | B | C | D | A | B | C | D | ....
Couldn't isolate the cause, but some sources for this problem may be:
- related to the img module
- easily observable in my sire because one of my pages includes 80 resized images
Other special things in my templates and site:
- a sidebar with [[!include pages="notes/*" template=foo]] while notes.mdwn has
a [[!include pages="notes/*"]] and uses the sidebar; removed it, doesn't change
- a template (biblio.tmpl) calling the "img" plugin with a template parameter as the
image filename; removed it, doesn't change
- some strange games with tags whose page calls a "map" directive to show other tags
shile tags are also used in tagclouds (in the sidebar and in the main pages)
- ...
I observed these problems (same kind, I didn't check in details) on
- ikiwiki 2.00gpa1 + v5.8.4 + Debian 3.1
- ikiwiki 2.3 + v5.8.8 + Ubuntu 7.04
I can think about reducung the size of my wiki source and making it available online for analysis.
-- NicolasLimare
As long as these dependencies don't grow over time (ie, when a page is
edited and nothing changed that should add a dependency), I wouldn't
worry about them. There are many things that can cause non-optimal
dependencies to be recorded. For one thing, if you inline something, ikiwiki
creates a dependency like:
(PageSpec) or (file1 or file2 or file3 ...)
Where fileN are all the files that the PageSpec currently matches. (This
is ncessary to detect when a currently inlined file is deleted, and know
the inlining page needs an update.) Now consider what it does if you have
a single page with two inline statements, that inline the same set of
stuff twice:
((PageSpec) or (file1 or file2 or file3 ...) or (PageSpec) or (file1 or file2 or file3 ...)
Clearly non-optimal, indeed.
Ikiwiki doesn't bother to simplify complex PageSpecs
because it's difficult to do, and because all they use is some disk
space. Consider what ikiwiki uses these dependencies for.
All it wants to know is: does the PageSpec for this page it's considering
rebuilding match any of the pages that have changed? Determining this is
a simple operation -- the PageSpec is converted to perl code. The perl
code is run.
So the total impact of an ugly dependency like this is:
- Some extra data read/written to disk.
- Some extra space in memory.
- A bit more data for the PageSpec translation code to handle. But that
code is quite fast.
- Typically one extra function call when the generated perl code is run.
Ie, when the expression on the left-hand side fails, which typically
happens after one (inexpensive) function call, it has to check
the identical expression on the right hand side.
So this is at best a wishlist todo item, not a bug. A PageSpec simplifier
(or improved pagespec_merge()
function) could be written and improve
ikiwiki's memory and disk usage, but would it actually speed it up any?
We'd have to see the code to the simplifier to know.
--[[Joey]]
[[!template id=gitbranch branch=smcv/ready/optimize-depends author="[[smcv]]"]]
I've been looking at optimizing ikiwiki for a site using
[[plugins/contrib/album]] (which produces a lot of pages) and it seems
that checking which pages depend on which pages does take a significant
amount of time. The optimize-depends branch in my git repository
avoids using pagespec_merge()
for this (indeed it's no longer used
at all), and instead represents dependencies as a list of pagespecs
rather than a single pagespec. This does turn out to be faster, although
not as much as I'd like. --[[smcv]]
I just wanted to note that there is a whole long discussion of dependencies and pagespecs on the [[todo/tracking_bugs_with_dependencies]] page. -- [[Will]]
Yeah, I had a look at that (as the only other mention of pagespec_merge
).
I think I might have solved some of the problems mentioned there,
actually - pagespec_merge
no longer needs to exist in my branch (although
I haven't actually deleted it), because the "or" operation is now done in
the Perl code, rather than by merging pagespecs and translating. --[[smcv]]
[[!template id=gitbranch branch=smcv/ready/remove-pagespec-merge author="[[smcv]]"]]
I've now added a patch to the end of that branch that deletes
pagespec_merge
almost entirely (we do need to keep a copy around, in
ikiwiki-transition, but that copy doesn't have to be optimal or support
future features like [[tracking_bugs_with_dependencies]]). --[[smcv]]
Some questions on your optimize-depends branch. --[[Joey]]
In saveindex it still or'd together the depends list, but the {depends}
field seems only useful for backwards compatability (ie, ikiwiki-transition
uses it still), and otherwise just bloats the index.
If it's acceptable to declare that downgrading IkiWiki requires a complete
rebuild, I'm happy with that. I'd prefer to keep the (simple form of the)
transition done automatically during a load/save cycle, rather than
requiring ikiwiki-transition to be run; we should probably say in NEWS
that the performance increase won't fully apply until the next
rebuild. --[[smcv]]
It is acceptable not to support downgrades.
I don't think we need a NEWS file update since any sort of refresh,
not just a full rebuild, will cause the indexdb to be loaded and saved,
enabling the optimisation. --[[Joey]]
Is an array the right data structure? add_depends
has to loop through the
array to avoid dups, it would be better if a hash were used there. Since
inline (and other plugins) explicitly add all linked pages, each as a
separate item, the list can get rather long, and that single add_depends
loop has suddenly become O(N^2) to the number of pages, which is something
to avoid..
I was also thinking about this (I've been playing with some stuff based on the
remove-pagespec-merge
branch). A hash, by itself, is not optimal because
the dependency list holds two things: page names and page specs. The hash would
work well for the page names, but you'll still need to iterate through the page specs.
I was thinking of keeping a list and a hash. You use the list for pagespecs
and the hash for individual page names. To make this work you need to adjust the
API so it knows which you're adding. -- [[Will]]
I wasn't thinking about a lookup hash, just a dedup hash, FWIW.
--[[Joey]]
I was under the impression from previous code review that you preferred
to represent unordered sets as lists, rather than hashes with dummy
values. If I was wrong, great, I'll fix that and it'll probably go
a bit faster. --[[smcv]]
It depends, really. And it'd certianly make sense to benchmark such a
change. --[[Joey]]
Also, since a lot of places are calling add_depends in a loop, it probably
makes sense to just make it accept a list of dependencies to add. It'll be
marginally faster, probably, and should allow for better optimisation
when adding a lot of depends at once.
That'd be an API change; perhaps marginally faster, but I don't
see how it would allow better optimisation if we're de-duplicating
anyway? --[[smcv]]
Well, I was thinking that it might be sufficient to build a %seen
hash of dependencies inside add_depends
, if the places that call
it lots were changed to just call it once. Of course the only way to
tell is benchmarking. --[[Joey]]
In Render.pm, we now have a triply nested loop, which is a bit
scary for efficiency. It seems there should be a way to
rework this code so it can use the optimised pagespec_match_list
,
and/or hoist some of the inner loop calculations (like the pagename
)
out.
I don't think the complexity is any greater than it was: I've just
moved one level of "loop" out of the generated Perl, to be
in visible code. I'll see whether some of it can be hoisted, though.
--[[smcv]]
The call to pagename
is the only part I can see that's clearly
run more often than before. That function is pretty inexpensive, but..
--[[Joey]]
Very good catch on img/meta using the wrong dependency; verified in the wild!
(I've cherry-picked those bug fixes.)
[[!tag wishlist patch patch/core]]