author | Joey Hess <joey@kodama.kitenet.net> | 2008-05-29 20:47:57 -0400 |
---|---|---|
committer | Joey Hess <joey@kodama.kitenet.net> | 2008-05-29 20:47:57 -0400 |
commit | 3a23cdde7d5621bc8d723d2226945ee5243115f3 (patch) | |
tree | a40ba2f133270b823a3f6fbd0b5be52dafa55817 /doc/bugs | |
parent | 9d93029f010c5eaa73ecbce8eb887d9132b7311a (diff) | |
parent | 62e1c9238a1fe3f964f6cfe3a345647141b7d050 (diff) |
Merge branch 'master' of ssh://git.ikiwiki.info/srv/git/ikiwiki.info
Diffstat (limited to 'doc/bugs')
-rw-r--r-- | doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
new file mode 100644
index 000000000..7e9bf84e2
--- /dev/null
+++ b/doc/bugs/__38__uuml__59___in_markup_makes_ikiwiki_not_un-escape_HTML_at_all.mdwn
@@ -0,0 +1,35 @@
+I'm experimenting with using Ikiwiki as a feed aggregator.
+
+The Planet Ubuntu RSS 2.0 feed (<http://planet.ubuntu.com/rss20.xml>) as of today
+has someone whose name contains the character u-with-umlaut. In HTML 4.0, this is
+specified as the character entity uuml. Ikiwiki 2.47 running on Debian etch does
+not seem to understand that entity, and decides not to un-escape any markup in
+the feed. This makes the feed hard to read.
+
+The following is the test input:
+
+    <rss version="2.0">
+      <channel>
+        <title>testfeed</title>
+        <link>http://example.com/</link>
+        <language>en</language>
+        <description>example</description>
+        <item>
+          <title>&uuml;</title>
+          <guid>http://example.com</guid>
+          <link>http://example.com</link>
+          <description>foo</description>
+          <pubDate>Tue, 27 May 2008 22:42:42 +0000</pubDate>
+        </item>
+      </channel>
+    </rss>
+
+When I feed this to ikiwiki, it complains:
+"processed ok at 2008-05-29 09:44:14 (invalid UTF-8 stripped from feed) (feed entities escaped"
+
+Note also that the test input contains only pure ASCII, no UTF-8 at all.
+
+If I remove the ampersand in the title, ikiwiki has no problem. However, the entity is
+valid HTML, so it would be good for ikiwiki to understand it. At the minimum, stripping
+the offending entity but un-escaping the rest seems like a reasonable thing to do,
+unless that has security implications.
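
One plausible reading of the report above is that `&uuml;` is a valid HTML 4 entity but not one of the five entities XML predefines (`amp`, `lt`, `gt`, `quot`, `apos`), so a strict XML parser rejects a feed that uses it without declaring it. The Python sketch below reproduces that behaviour with the standard library and shows one possible workaround: rewriting HTML named entities as numeric character references before parsing. It is an illustration only and not ikiwiki's actual code or fix. The names `html_entities_to_numeric`, `parses`, and `FEED` are hypothetical, and the feed is a shortened form of the test input.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entities_to_numeric(text):
    """Rewrite HTML 4 named entities (e.g. &uuml;) as numeric references.

    XML-predefined entities and unknown names are left untouched.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Shortened form of the test input from the bug report (pure ASCII).
FEED = """<rss version="2.0">
  <channel>
    <title>testfeed</title>
    <item>
      <title>&uuml;</title>
      <description>foo</description>
    </item>
  </channel>
</rss>"""

fixed = html_entities_to_numeric(FEED)
title = ET.fromstring(fixed).find("./channel/item/title").text

print("original parses:", parses(FEED))
print("rewritten parses:", parses(fixed))
print("item title code point:", hex(ord(title)))
```

Rewriting to numeric references keeps the input pure ASCII and leaves the XML-predefined entities alone, so markup that was deliberately escaped stays escaped. Whether this is safe for a given aggregator still depends on how the parsed text is later emitted as HTML, which is the security concern the report raises.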