From 26bf69d17aff4c74dd6c368712091d8c1fc977a6 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" Date: Tue, 6 Apr 2010 00:49:38 +0000 Subject: as seen on IRC --- ...but_not_all_meta_fields_are_stored_escaped.mdwn | 32 ++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn (limited to 'doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn') diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn new file mode 100644 index 000000000..d79318dd8 --- /dev/null +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -0,0 +1,32 @@ +[[!template id=gitbranch branch=smcv/unescaped-meta author="[[Simon_McVittie|smcv]]"]] +(Warning: this branch has not been tested thoroughly.) + +While discussing the [[plugins/meta]] plugin on IRC, Joey pointed out that +it stores most meta fields unescaped, but 'title', 'guid' and 'description' +are special-cased and stored escaped (with numeric XML/HTML entities). This +is to avoid emitting markup in the of a HTML page, or in an RSS/Atom +feed, neither of which are subject to the [[plugins/htmlscrubber]]. + +However, having the meta fields "partially escaped" like this is somewhat +error-prone. Joey suggested that perhaps everything should be stored +unescaped, and the escaping should be done on output; this branch +implements that. + +Points of extra subtlety: + +* The title given to the [[plugins/search]] plugin was previously HTML; + now it's plain text, potentially containing markup characters. I suspect + that that's what Xapian wants anyway (which is why I didn't change it), + but I could be wrong... + +* Page descriptions in the HTML `<head>` were previously double-escaped: + the description was stored escaped with numeric entities, then that was + output with a second layer of escaping! In this branch, I just emit + the page description escaped once, as was presumably the intention. + +* It's safe to apply this change to a wiki and neglect to rebuild it + (assuming I implemented it correctly!), but until the wiki is rebuilt, + titles, descriptions and GUIDs for unchanged pages will appear + double-escaped on any page that inlines them in `quick=yes` mode, and + is rebuilt for some other reason. The failure mode is too much escaping + rather than too little, so it shouldn't be a security problem. -- cgit v1.2.3 From 0ed94696c0a449b425bc319297d39060ee24dcf2 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" <http://smcv.pseudorandom.co.uk/@web> Date: Tue, 6 Apr 2010 00:50:51 +0000 Subject: pages talking about escaping should really be escaped correctly --- doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn') diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn index d79318dd8..cbfcfd6b7 100644 --- a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -4,7 +4,7 @@ While discussing the [[plugins/meta]] plugin on IRC, Joey pointed out that it stores most meta fields unescaped, but 'title', 'guid' and 'description' are special-cased and stored escaped (with numeric XML/HTML entities). This -is to avoid emitting markup in the <title> of a HTML page, or in an RSS/Atom +is to avoid emitting markup in the `<title>` of a HTML page, or in an RSS/Atom feed, neither of which are subject to the [[plugins/htmlscrubber]]. However, having the meta fields "partially escaped" like this is somewhat -- cgit v1.2.3 From 315bcf866c490ae2041cef59f960e8021d52e840 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" <http://smcv.pseudorandom.co.uk/@web> Date: Tue, 6 Apr 2010 00:51:27 +0000 Subject: tag as patch --- doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn | 1 + 1 file changed, 1 insertion(+) (limited to 'doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn') diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn index cbfcfd6b7..6a934d4eb 100644 --- a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -1,4 +1,5 @@ [[!template id=gitbranch branch=smcv/unescaped-meta author="[[Simon_McVittie|smcv]]"]] +[[!tag patch]] (Warning: this branch has not been tested thoroughly.) While discussing the [[plugins/meta]] plugin on IRC, Joey pointed out that -- cgit v1.2.3 From 1f112d570ef235416c4001605e15980b2f628da4 Mon Sep 17 00:00:00 2001 From: "http://smcv.pseudorandom.co.uk/" <http://smcv.pseudorandom.co.uk/@web> Date: Tue, 6 Apr 2010 00:55:54 +0000 Subject: if applied, reverting this would be problematic --- doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn') diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn index 6a934d4eb..8e1ca42e0 100644 --- a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -31,3 +31,9 @@ Points of extra subtlety: double-escaped on any page that inlines them in `quick=yes` mode, and is rebuilt for some other reason. The failure mode is too much escaping rather than too little, so it shouldn't be a security problem. + +* Reverting this change, if applied, is more dangerous; until the wiki is + rebuilt, any titles, descriptions and GUIDs on unchanged pages that + contained markup could appear unescaped on any page that inlines them + in `quick=yes` mode, and is rebuilt for some other reason. The failure + mode here would be too little escaping, i.e. cross-site scripting. -- cgit v1.2.3 From d2c36a6f4b8b4bf30d59c430893a88352ac208fc Mon Sep 17 00:00:00 2001 From: Joey Hess <joey@gnu.kitenet.net> Date: Sat, 10 Apr 2010 15:29:31 -0400 Subject: close --- doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn') diff --git a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn index 8e1ca42e0..587771ba4 100644 --- a/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn +++ b/doc/bugs/some_but_not_all_meta_fields_are_stored_escaped.mdwn @@ -20,6 +20,9 @@ Points of extra subtlety: that that's what Xapian wants anyway (which is why I didn't change it), but I could be wrong... + > AFAICS, this if anything, fixes a bug, xapian definitely expects + > unescaped text here. --[[Joey]] + * Page descriptions in the HTML `<head>` were previously double-escaped: the description was stored escaped with numeric entities, then that was output with a second layer of escaping! In this branch, I just emit @@ -37,3 +40,5 @@ Points of extra subtlety: contained markup could appear unescaped on any page that inlines them in `quick=yes` mode, and is rebuilt for some other reason. The failure mode here would be too little escaping, i.e. cross-site scripting. + +[[!tag done]] -- cgit v1.2.3