
I've been profiling my IkiWiki to try to improve its speed (which matters even more with many pages), and I've written a patch to speed up match_glob. This matcher is a good candidate for optimization because it gets called so many times.

Here's my patch - please consider it! -- [[KathrynAndersen]]

It seems to me as though changing glob2re to return qr/$re/, and calling memoize(glob2re) next to the other memoize calls, would be a less verbose way to do this? --[[smcv]]

I think so, yeah. Anyway, do you have any benchmark results handy, Kathryn? --[[Joey]]

See below. Also, would it make more sense for glob2re to return qr/^$re$/i rather than qr/$re/? Everything that uses glob2re seems to use $foo =~ /^$re$/i rather than /$re/ so I think that would make sense. -- [[KathrynAndersen]]
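To make the suggestion above concrete, here is a minimal sketch of a glob2re that returns a compiled, anchored, case-insensitive regex and is memoized. The function body is a simplified stand-in written for illustration, not IkiWiki's actual code, and the page names are made up.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical stand-in for glob2re: turn a glob into a compiled regex
# that is already anchored and case-insensitive.
sub glob2re {
    my $re = quotemeta(shift);
    $re =~ s/\\\*/.*/g;   # glob "*" matches any run of characters
    $re =~ s/\\\?/./g;    # glob "?" matches exactly one character
    return qr/^$re$/i;
}

# Cache the compiled regex for each distinct glob string.
memoize('glob2re');

my $re = glob2re('blog/*');
print "match\n"    if     'Blog/first_post' =~ $re;
print "no match\n" unless 'todo/blog/x'    =~ $re;
```

With this shape, callers would write `$foo =~ $re` instead of `$foo =~ /^$re$/i`, so every caller would need to be updated together with glob2re.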

Git branch smcv/ka-glob-cache has Kathryn's patch. Git branch smcv/memoize-glob2re does as I suggested, which is less verbose than Kathryn's patch but also not as fast; I'm not sure why, tbh. --[[smcv]]

I think it's because my patch focuses on match_glob while the memoize patch focuses on glob2re. glob2re is called in filecheck, meta and po as well as in match_glob and match_user, so the memoized glob2re has a bigger set of globs to look up and could be just that little bit slower. -- [[KathrynAndersen]]
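For comparison, a cache kept inside the matcher itself might look roughly like the sketch below. This is an assumption about the general shape of such a cache, not the patch itself; `glob2re_str`, `match_glob_cached` and `%glob_cache` are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a glob into an unanchored regex string.
sub glob2re_str {
    my $re = quotemeta(shift);
    $re =~ s/\\\*/.*/g;   # glob "*" matches any run of characters
    $re =~ s/\\\?/./g;    # glob "?" matches exactly one character
    return $re;
}

# Cache private to the matcher: it only ever holds globs that the
# matcher itself has seen, and a hit costs one hash lookup.
my %glob_cache;

sub match_glob_cached {
    my ($page, $glob) = @_;
    # Compile the regex on first use, reuse it afterwards.
    my $re = $glob_cache{$glob} //= do {
        my $str = glob2re_str($glob);
        qr/^$str$/i;
    };
    return $page =~ $re ? 1 : 0;
}

print match_glob_cached('blog/post', 'blog/*'), "\n";
print match_glob_cached('todo/item', 'blog/*'), "\n";
```

A cache like this avoids the extra wrapper call that Memoize places in front of the memoized function, which may be part of the difference, though that is a guess rather than something measured here.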


Benchmarks done with Devel::Profile on the same testbed IkiWiki setup. I'm just showing the start of the profile output, since that's what's relevant.

Before:

After:

Note that the seconds per call for match_glob in the "after" case has gone down by about a third.

K.A.


A second set of benchmarks, done by rebuilding the docwiki at commit f942c2db05e4 like so:

perl -Iblib/lib -d:Profile ikiwiki.in -setup docwiki.setup --no-verbose

The docwiki appears to use fewer glob matches than Kathryn's wiki.

With master:

time elapsed (wall):   29.6970
time running program:  24.6930  (83.15%)
time profiling (est.): 5.0041  (16.85%)
number of calls:       1359180
number of exceptions:  13

%Time    Sec.     #calls   sec/call  F  name
13.62    3.3629     3406   0.000987     Text::Balanced::_match_tagged
10.84    2.6773    79442   0.000034     IkiWiki::PageSpec::match_glob
 3.08    0.7598    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
 3.07    0.7593    29830   0.000025     IkiWiki::bestlink
 2.99    0.7378    10231   0.000072     IkiWiki::PageSpec::match_link

With my smcv/memoize-glob2re branch:

time elapsed (wall):   30.4931
time running program:  25.1248  (82.39%)
time profiling (est.): 5.3683  (17.61%)
number of calls:       1439943
number of exceptions:  13

%Time    Sec.     #calls   sec/call  F  name
13.19    3.3146     3406   0.000973     Text::Balanced::_match_tagged
 8.41    2.1123    79442   0.000027     IkiWiki::PageSpec::match_glob
 3.97    0.9979    86905   0.000011     Memoize::_memoizer
 3.05    0.7654    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
 3.02    0.7576    29830   0.000025     IkiWiki::bestlink

and in a repeated run:

 8.40    2.0905    79442   0.000026     IkiWiki::PageSpec::match_glob

With Kathryn's patch as seen in my smcv/ka-glob-cache branch:

time elapsed (wall):   27.7567
time running program:  22.9941  (82.84%)
time profiling (est.): 4.7627  (17.16%)
number of calls:       1279946
number of exceptions:  13

%Time    Sec.     #calls   sec/call  F  name
14.29    3.2867     3406   0.000965     Text::Balanced::_match_tagged
 7.89    1.8136    79442   0.000023     IkiWiki::PageSpec::match_glob
 3.30    0.7577    59454   0.000013     <anon>:IkiWiki/Plugin/inline.pm:223
 3.24    0.7461    29830   0.000025     IkiWiki::bestlink
 3.19    0.7332      143   0.005127  ?  IkiWiki::pagespec_match_list

and in a repeated run:

 7.84    1.8253    79442   0.000023     IkiWiki::PageSpec::match_glob

--[[smcv]]
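To put the three docwiki profiles side by side, the small script below recomputes the per-call time and the change relative to master from the match_glob totals quoted above (first run of each build). The `pct_change` helper is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Percentage change from $old to $new (negative means faster).
sub pct_change {
    my ($old, $new) = @_;
    return ($new - $old) / $old * 100;
}

# Total seconds spent in IkiWiki::PageSpec::match_glob, copied from the
# profiles above; the call count was 79442 in all three builds.
my @runs = (
    [ 'master',               2.6773 ],
    [ 'smcv/memoize-glob2re', 2.1123 ],
    [ 'smcv/ka-glob-cache',   1.8136 ],
);
my $calls = 79442;

for my $run (@runs) {
    my ($branch, $sec) = @$run;
    printf "%-22s %6.2f us/call  %+6.1f%% vs master\n",
        $branch, $sec / $calls * 1e6, pct_change($runs[0][1], $sec);
}
```

On these figures the time spent in match_glob drops by roughly 21% with the memoize branch and roughly 32% with the glob cache branch.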