author Joey Hess <joey@kodama.kitenet.net> 2008-11-12 17:19:41 -0500
committer Joey Hess <joey@kodama.kitenet.net> 2008-11-12 17:30:54 -0500
commit 716560b7f15b6e15b246c39c11eb8181d91c8662
tree 5b7a30dd2f18e02b02f064d0a1ab59fe891b6a71 /IkiWiki.pm
parent 2c858c9c95e287ebe3740a94f983f6ae9d6fb080
check for invalid utf-8, and toss it back to avoid crashes
Since ikiwiki uses open :utf8, perl assumes that files contain valid utf-8. If the data turns out to be malformed, perl may later crash while processing strings read from them, with 'Malformed UTF-8 character (fatal)'.

As at least a quick fix, use utf8::valid as soon as data is read, and if it's not valid, call encode_utf8 on the string, thus clearing the utf-8 flag. This may cause follow-on encoding problems, but will avoid this crash, and the input file was broken anyway, so GIGO is a reasonable response. (I looked at calling decode_utf8 after, but it seemed to cause more trouble than it was worth. BTW, using open ':encoding(utf8)' avoids this problem, but the corrupted data later causes Storable to crash when writing the index.)

This is a quick fix, clearly imperfect:
- It might be better to explicitly call decode_utf8 when reading files, rather than using the IO layer.
- Data read other than by readfile() can still sneak in bad utf-8. While ikiwiki does very little file input not using it, stdin for the CGI would be one way.
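
The first caveat above (decoding explicitly instead of relying on the IO layer) could look roughly like the following. This is only a sketch under stated assumptions, not what the commit does: `read_utf8_or_bytes` and `decode_or_bytes` are hypothetical helper names that do not exist in ikiwiki, and falling back to the undecoded bytes is just one possible GIGO policy. It relies on `Encode::decode` with the `FB_CROAK` and `LEAVE_SRC` check flags.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of ikiwiki): decode a byte string as
# strict UTF-8, or hand the bytes back unchanged if they are malformed.
sub decode_or_bytes {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from modifying $bytes in place.
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $text ? $text : $bytes;
}

# Hypothetical helper (not part of ikiwiki): slurp a file as raw bytes,
# bypassing the :utf8 layer, then decode it explicitly.
sub read_utf8_or_bytes {
    my ($file) = @_;
    open(my $in, '<:raw', $file) or die "failed to read $file: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close($in) or die "failed to read $file: $!";
    return decode_or_bytes($bytes);
}
```

With this approach a well-formed file comes back as a character string, while a malformed one comes back as plain bytes without the utf-8 flag, which is the same end state the patch aims for. Whether it avoids the follow-on problems mentioned above (for example with Storable) is not something this sketch establishes.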
Diffstat (limited to 'IkiWiki.pm')
-rw-r--r-- IkiWiki.pm 4
1 files changed, 4 insertions, 0 deletions
diff --git a/IkiWiki.pm b/IkiWiki.pm
index 5e21e7090..735dc97b1 100644
--- a/IkiWiki.pm
+++ b/IkiWiki.pm
@@ -721,6 +721,10 @@ sub readfile ($;$$) { #{{{
 	binmode($in) if ($binary);
 	return \*$in if $wantfd;
 	my $ret=<$in>;
+	# check for invalid utf-8, and toss it back to avoid crashes
+	if (! utf8::valid($ret)) {
+		$ret=encode_utf8($ret);
+	}
 	close $in || error("failed to read $file: $!");
 	return $ret;
 } #}}}