From 17a19994331fcd720698cd72e27293a15d9cb338 Mon Sep 17 00:00:00 2001 From: HenrikBrixAndersen Date: Tue, 29 Jul 2008 16:51:18 -0400 Subject: Obsolete templates/estseek.conf --- doc/bugs/Obsolete_templates__47__estseek.conf.mdwn | 1 + 1 file changed, 1 insertion(+) create mode 100644 doc/bugs/Obsolete_templates__47__estseek.conf.mdwn (limited to 'doc/bugs') diff --git a/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn b/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn new file mode 100644 index 000000000..beee5aa08 --- /dev/null +++ b/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn @@ -0,0 +1 @@ +The templates/estseek.conf file can safely be removed now that ikiwiki has switched to using xapian-omega. -- cgit v1.2.3 From 3b72c23673e858338ad4791d99bacdef7d028608 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 29 Jul 2008 16:55:44 -0400 Subject: rm --- doc/bugs/Obsolete_templates__47__estseek.conf.mdwn | 2 ++ 1 file changed, 2 insertions(+) (limited to 'doc/bugs') diff --git a/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn b/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn index beee5aa08..99330a115 100644 --- a/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn +++ b/doc/bugs/Obsolete_templates__47__estseek.conf.mdwn @@ -1 +1,3 @@ The templates/estseek.conf file can safely be removed now that ikiwiki has switched to using xapian-omega. 
+ +> Thanks for the reminder, [[done]] --[[Joey]] -- cgit v1.2.3 From 0a176059bb55acfc201c7ca4705da849831adb8e Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" Date: Wed, 30 Jul 2008 17:25:36 -0400 Subject: --- .../HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn (limited to 'doc/bugs') diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn new file mode 100644 index 000000000..8bf97910d --- /dev/null +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -0,0 +1,10 @@ +If a blog entry contains a HTML named entity, such as the `&mdash;` produced by [[plugins/rst]] for blockquote citations, it's pasted into the Atom feed as-is. However, Atom feeds don't have a DTD, so named entities beyond `&lt;`, `&gt;`, `&quot;`, `&amp;` and `&apos;` aren't well-formed XML. + +Possible solutions: + +* Put HTML in Atom feeds as type="html" (and use ESCAPE=HTML) instead + +* Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, + like in the re-escape-entities branch in my repository: http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e + +(Also, the HTML in RSS feeds would probably get better interoperability if it was escaped with ESCAPE=HTML rather than being in a CDATA section?) 
-- cgit v1.2.3 From fe482079cc2d7f0bd2bad6f21bc91e3ff82308be Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" Date: Wed, 30 Jul 2008 17:26:30 -0400 Subject: --- doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc/bugs') diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn index 8bf97910d..09ff0e335 100644 --- a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -5,6 +5,6 @@ Possible solutions: * Put HTML in Atom feeds as type="html" (and use ESCAPE=HTML) instead * Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, - like in the re-escape-entities branch in my repository: http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e + like in the re-escape-entities branch in my repository ([diff here](http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e)) (Also, the HTML in RSS feeds would probably get better interoperability if it was escaped with ESCAPE=HTML rather than being in a CDATA section?) 
-- cgit v1.2.3 From 17dd9d6212bce16115519506c39d8ca9fca53d0c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 31 Jul 2008 15:52:20 -0400 Subject: rename --- ...n_Atom__44___RSS_feeds_don__39__t_validate.mdwn | 36 ---------------------- ...ended_encoding_of_entities_for_some_fields.mdwn | 36 ++++++++++++++++++++++ 2 files changed, 36 insertions(+), 36 deletions(-) delete mode 100644 doc/bugs/HTML-escaped_titles_in_Atom__44___RSS_feeds_don__39__t_validate.mdwn create mode 100644 doc/bugs/rss_feeds_do_not_use_recommended_encoding_of_entities_for_some_fields.mdwn (limited to 'doc/bugs') diff --git a/doc/bugs/HTML-escaped_titles_in_Atom__44___RSS_feeds_don__39__t_validate.mdwn b/doc/bugs/HTML-escaped_titles_in_Atom__44___RSS_feeds_don__39__t_validate.mdwn deleted file mode 100644 index 48c168997..000000000 --- a/doc/bugs/HTML-escaped_titles_in_Atom__44___RSS_feeds_don__39__t_validate.mdwn +++ /dev/null @@ -1,36 +0,0 @@ -The Atom and RSS templates use `ESCAPE=HTML` in the title elements. However, HTML-escaped characters aren't valid according to . - -Removing `ESCAPE=HTML` works fine, but I haven't checked to see if there are any characters it won't work for. - -For Atom, at least, I believe adding `type="xhtml"` to the title element will work. I don't think there's an equivalent for RSS. - -> Removing the ESCAPE=HTML will not work, feed validator hates that just as -> much. It wants rss feeds to use a specific style of escaping that happens -> to work in some large percentage of all rss consumers. (Most of which are -> broken). -> -> There's also no actual spec about how this should work. -> -> This will be a total beast to fix. The current design is very clean in -> that all (well, nearly all) xml/html escaping is pushed back to the -> templates. 
This allows plugins to substitute fields in the templates -> without worrying about getting escaping right in the plugins -- and a -> plugin doesn't even know what kind of template is being filled out when -> it changes a field's value, so it can't do different types of escaping -> for different templates. -> -> The only reasonable approach seems to be extending HTML::Template with an -> ESCAPE=RSS and using that. Unfortunately its design does not allow doing -> so without hacking its code in several places. I've contacted its author -> to see if he'd accept such a patch. -> -> (A secondary bug is that using meta title currently results in unnecessry -> escaping of the title value before it reaches the template. This makes -> the escaping issues show up much more than they need to, since lots more -> characters are currently being double-escaped in the rss.) -> -> --[[Joey]] - -> Update: Ok, I've fixed this for titles, as a special case, but the -> underlying problem remains for other fields in rss feeds (such as -> author), so I'm leaving this bug report open. --[[Joey]] diff --git a/doc/bugs/rss_feeds_do_not_use_recommended_encoding_of_entities_for_some_fields.mdwn b/doc/bugs/rss_feeds_do_not_use_recommended_encoding_of_entities_for_some_fields.mdwn new file mode 100644 index 000000000..48c168997 --- /dev/null +++ b/doc/bugs/rss_feeds_do_not_use_recommended_encoding_of_entities_for_some_fields.mdwn @@ -0,0 +1,36 @@ +The Atom and RSS templates use `ESCAPE=HTML` in the title elements. However, HTML-escaped characters aren't valid according to . + +Removing `ESCAPE=HTML` works fine, but I haven't checked to see if there are any characters it won't work for. + +For Atom, at least, I believe adding `type="xhtml"` to the title element will work. I don't think there's an equivalent for RSS. + +> Removing the ESCAPE=HTML will not work, feed validator hates that just as +> much. 
It wants rss feeds to use a specific style of escaping that happens +> to work in some large percentage of all rss consumers. (Most of which are +> broken). +> +> There's also no actual spec about how this should work. +> +> This will be a total beast to fix. The current design is very clean in +> that all (well, nearly all) xml/html escaping is pushed back to the +> templates. This allows plugins to substitute fields in the templates +> without worrying about getting escaping right in the plugins -- and a +> plugin doesn't even know what kind of template is being filled out when +> it changes a field's value, so it can't do different types of escaping +> for different templates. +> +> The only reasonable approach seems to be extending HTML::Template with an +> ESCAPE=RSS and using that. Unfortunately its design does not allow doing +> so without hacking its code in several places. I've contacted its author +> to see if he'd accept such a patch. +> +> (A secondary bug is that using meta title currently results in unnecessry +> escaping of the title value before it reaches the template. This makes +> the escaping issues show up much more than they need to, since lots more +> characters are currently being double-escaped in the rss.) +> +> --[[Joey]] + +> Update: Ok, I've fixed this for titles, as a special case, but the +> underlying problem remains for other fields in rss feeds (such as +> author), so I'm leaving this bug report open. 
--[[Joey]] -- cgit v1.2.3 From 33cd89c68b31c443f4683fb9d45e68ddd9a6daa9 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 31 Jul 2008 16:09:26 -0400 Subject: questions --- doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'doc/bugs') diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn index 09ff0e335..d89fe0502 100644 --- a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -4,7 +4,11 @@ Possible solutions: * Put HTML in Atom feeds as type="html" (and use ESCAPE=HTML) instead +> Are there any particular downsides to doing that ..? --[[Joey]] + * Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, like in the re-escape-entities branch in my repository ([diff here](http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e)) (Also, the HTML in RSS feeds would probably get better interoperability if it was escaped with ESCAPE=HTML rather than being in a CDATA section?) + +> Can't see why? --[[Joey]] -- cgit v1.2.3 From 53001f901106ec1846eded1de336083537b7f160 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" Date: Thu, 31 Jul 2008 17:26:49 -0400 Subject: --- doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'doc/bugs') diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn index d89fe0502..7ba95fb4b 100644 --- a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -6,9 +6,15 @@ Possible solutions: > Are there any particular downsides to doing that ..? 
--[[Joey]] +>> It's the usual XHTML/HTML distinction. type="html" will always be interpreted as "tag soup", I believe - this may lead to it being rendered differently in some browsers. In general ikiwiki seems to claim to produce XHTML (at least, the default page.tmpl makes it claim to be XHTML Strict). On the other hand, this is a much simpler solution... see escape-feed-html branch in my repository, which I'm now using instead --[[smcv]] + * Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, like in the re-escape-entities branch in my repository ([diff here](http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e)) +>> I can see why you think this is excessively complex! --[[smcv]] + (Also, the HTML in RSS feeds would probably get better interoperability if it was escaped with ESCAPE=HTML rather than being in a CDATA section?) > Can't see why? --[[Joey]] + +>> For a start, `]]>` in content wouldn't break the feed :-) but I was really thinking of non-XML, non-SGML parsers (more tag soup) that don't understand CDATA (I've suffered from CDATA damage when feeding generated code through gtkdoc, for instance). --[[smcv]] -- cgit v1.2.3 From 973e49e31dac3fc3e6642acac126e4140429b205 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 31 Jul 2008 18:49:40 -0400 Subject: response --- .../HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'doc/bugs') diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn index 7ba95fb4b..6c5c79672 100644 --- a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -8,6 +8,10 @@ Possible solutions: >> It's the usual XHTML/HTML distinction. 
type="html" will always be interpreted as "tag soup", I believe - this may lead to it being rendered differently in some browsers. In general ikiwiki seems to claim to produce XHTML (at least, the default page.tmpl makes it claim to be XHTML Strict). On the other hand, this is a much simpler solution... see escape-feed-html branch in my repository, which I'm now using instead --[[smcv]] +>>> Of course, browsers [probably don't treat xhtml pages as xhtml anyway](http://hixie.ch/advocacy/xhtml). +>>> And the same content will be treated as html (probably as tag soup) if it's +>>> in a rss feed. + * Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, like in the re-escape-entities branch in my repository ([diff here](http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e)) @@ -18,3 +22,12 @@ Possible solutions: > Can't see why? --[[Joey]] >> For a start, `]]>` in content wouldn't break the feed :-) but I was really thinking of non-XML, non-SGML parsers (more tag soup) that don't understand CDATA (I've suffered from CDATA damage when feeding generated code through gtkdoc, for instance). --[[smcv]] + +>>> FWIW, the htmlscrubber escapes the `]]>`. (Wouldn't hurt to make that +>>> more robust tho.) +>>> +>>> ikiwiki has used CDATA from the beginning -- this is the first time +>>> I've heard about rss 2.0 parsers that didn't know about CDATA. +>>> +>>> (IIRC, I used CDATA because the result is more space-efficient and less +>>> craptacular to read manually.) 
-- cgit v1.2.3 From 71eb56bcac199a31e02301852b210bf99fedfd2c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 31 Jul 2008 18:52:30 -0400 Subject: merged --- debian/changelog | 4 ++++ doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn | 2 ++ 2 files changed, 6 insertions(+) (limited to 'doc/bugs') diff --git a/debian/changelog b/debian/changelog index 7fd135700..af94c99c5 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,6 +2,10 @@ ikiwiki (2.56) UNRELEASED; urgency=low * autoindex: New plugin that generates missing index pages. (Sponsored by The TOVA Company.) + * Escape HTML is rss and atom feeds instead of respectively using CDATA and + treating it as XHTML. This avoids problems with escaping the end of the + CDATA when the htmlscrubber is not used, and it avoids problems with atom + XHTML using named entity references that are not in the atom DTD. (Simon McVittie) -- Joey Hess Tue, 29 Jul 2008 15:53:26 -0400 diff --git a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn index 6c5c79672..d2f8ca3dc 100644 --- a/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn +++ b/doc/bugs/HTML_inlined_into_Atom_not_necessarily_well-formed.mdwn @@ -12,6 +12,8 @@ Possible solutions: >>> And the same content will be treated as html (probably as tag soup) if it's >>> in a rss feed. +>>> [[merged|done]] + * Keep HTML in Atom feeds as type="xhtml", but replace named entities with numeric ones, like in the re-escape-entities branch in my repository ([diff here](http://git.debian.org/?p=users/smcv/ikiwiki.git;a=commitdiff;h=c0eb041c65d0653bacf0d4acb7a602e9bda8888e)) -- cgit v1.2.3
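
For illustration only, the two approaches discussed in the Atom bug report above can be sketched in Python. This is not ikiwiki's Perl code or either of the branches mentioned; the function names (`is_well_formed`, `to_numeric_entities`, `wrap`) and the sample fragment are assumptions made up for the example. It shows that a named entity such as `&mdash;` is rejected by an XML parser that has no DTD, and that either replacing it with a numeric reference or escaping the whole fragment (the `type="html"` route that was merged) yields well-formed XML.

```python
import html
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML predefines; these are safe without a DTD.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}


def is_well_formed(xml_text):
    """Return True if a DTD-less XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def to_numeric_entities(fragment):
    """Replace HTML named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names alone
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, fragment)


def wrap(fragment, content_type):
    """Hypothetical stand-in for a feed template's <content> element."""
    return '<content type="%s">%s</content>' % (content_type, fragment)


# Sample blog entry HTML containing a named entity (made up for this sketch).
fragment = "<p>A quotation &mdash; Someone</p>"

# Option 1: keep the markup inline (type="xhtml") with numeric references.
as_xhtml = wrap(to_numeric_entities(fragment), "xhtml")

# Option 2: escape the whole fragment (type="html"), so the feed reader
# gets the original HTML back as text after XML parsing.
as_html = wrap(html.escape(fragment, quote=False), "html")
```

With the second option the XML layer never sees the named entity at all, which may be part of why it was described in the thread as the much simpler solution; the cost is that consumers will treat the content as tag soup HTML rather than XHTML.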