    From 0c8cd35934043059fe028e1a8e734533bc08537b Mon Sep 17 00:00:00 2001
    From: John MacFarlane
    Date: Mon, 9 Sep 2019 08:20:13 -0700
    Subject: Move "Backslash escapes" and "Character references" to "Preliminaries."

    It was confusing having them in the "Inline" section, since they also
    affect some block contexts (e.g. reference link definitions).
    Closes #600.
    ---
     spec.txt | 6774 +++++++++++++++++++++++++++++++-------------------------------
     1 file changed, 3388 insertions(+), 3386 deletions(-)

    diff --git a/spec.txt b/spec.txt
    index 84b97af..4571a95 100644
    --- a/spec.txt
    +++ b/spec.txt
    @@ -478,4591 +478,4433 @@ bar
     For security reasons, the Unicode character `U+0000` must be replaced with the REPLACEMENT CHARACTER (`U+FFFD`). -# Blocks and inlines - -We can think of a document as a sequence of -[blocks](@)---structural elements like paragraphs, block -quotations, lists, headings, rules, and code blocks. Some blocks (like -block quotes and list items) contain other blocks; others (like -headings and paragraphs) contain [inline](@) content---text, -links, emphasized text, images, code spans, and so on. -## Precedence +## Backslash escapes -Indicators of block structure always take precedence over indicators -of inline structure. So, for example, the following is a list with -two items, not a list with one item containing a code span: +Any ASCII punctuation character may be backslash-escaped: ```````````````````````````````` example -- `one -- two` +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ . - +
    

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

```````````````````````````````` -This means that parsing can proceed in two steps: first, the block -structure of the document can be discerned; second, text lines inside -paragraphs, headings, and other block constructs can be parsed for inline -structure. The second step requires information about link reference -definitions that will be available only at the end of the first -step. Note that the first step requires processing lines in sequence, -but the second can be parallelized, since the inline parsing of -one block element does not affect the inline parsing of any other. - -## Container blocks and leaf blocks - -We can divide blocks into two types: -[container blocks](@), -which can contain other blocks, and [leaf blocks](@), -which cannot. - -# Leaf blocks +Backslashes before other characters are treated as literal +backslashes: -This section describes the different kinds of leaf block that make up a -Markdown document. +```````````````````````````````` example +\→\A\a\ \3\φ\« +. +

\→\A\a\ \3\φ\«

+```````````````````````````````` -## Thematic breaks -A line consisting of 0-3 spaces of indentation, followed by a sequence -of three or more matching `-`, `_`, or `*` characters, each followed -optionally by any number of spaces or tabs, forms a -[thematic break](@). +Escaped characters are treated as regular characters and do +not have their usual Markdown meanings: ```````````````````````````````` example -*** ---- -___ +\*not emphasized* +\
not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a heading +\[foo]: /url "not a reference" +\ö not a character entity . -
-
-
+

*not emphasized* +<br/> not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a heading +[foo]: /url "not a reference" +&ouml; not a character entity

```````````````````````````````` -Wrong characters: +If a backslash is itself escaped, the following character is not: ```````````````````````````````` example -+++ +\\*emphasis* . -

+++

+

\emphasis

```````````````````````````````` +A backslash at the end of the line is a [hard line break]: + ```````````````````````````````` example -=== +foo\ +bar . -

===

+

foo
+bar

```````````````````````````````` -Not enough characters: +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: ```````````````````````````````` example --- -** -__ +`` \[\` `` . -

-- -** -__

+

\[\`

```````````````````````````````` -One to three spaces indent are allowed: - ```````````````````````````````` example - *** - *** - *** + \[\] . -
-
-
+
\[\]
+
```````````````````````````````` -Four spaces is too many: - ```````````````````````````````` example - *** +~~~ +\[\] +~~~ . -
***
+
\[\]
 
```````````````````````````````` ```````````````````````````````` example -Foo - *** + . -

Foo -***

+

http://example.com?find=\*

```````````````````````````````` -More than three characters may be used: - ```````````````````````````````` example -_____________________________________ + . -
+
```````````````````````````````` -Spaces are allowed between the characters: +But they work in all other contexts, including URLs and link titles, +link references, and [info strings] in [fenced code blocks]: ```````````````````````````````` example - - - - +[foo](/bar\* "ti\*tle") . -
+

foo

```````````````````````````````` ```````````````````````````````` example - ** * ** * ** * ** +[foo] + +[foo]: /bar\* "ti\*tle" . -
+

foo

```````````````````````````````` ```````````````````````````````` example -- - - - +``` foo\+bar +foo +``` . -
+
foo
+
```````````````````````````````` -Spaces are allowed at the end: +## Entity and numeric character references + +Valid HTML entity references and numeric character references +can be used in place of the corresponding Unicode character, +with the following exceptions: + +- Entity and character references are not recognized in code + blocks and code spans. + +- Entity and character references cannot stand in place of + special characters that define structural elements in + CommonMark. For example, although `*` can be used + in place of a literal `*` character, `*` cannot replace + `*` in emphasis delimiters, bullet list markers, or thematic + breaks. + +Conforming CommonMark parsers need not store information about +whether a particular character was represented in the source +using a Unicode character or an entity reference. + +[Entity references](@) consist of `&` + any of the valid +HTML5 entity names + `;`. The +document +is used as an authoritative source for the valid entity +references and their corresponding code points. ```````````````````````````````` example -- - - - +  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸ . -
+

  & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸

```````````````````````````````` -However, no other characters may occur in the line: +[Decimal numeric character +references](@) +consist of `&#` + a string of 1--7 arabic digits + `;`. A +numeric character reference is parsed as the corresponding +Unicode character. Invalid Unicode code points will be replaced by +the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons, +the code point `U+0000` will also be replaced by `U+FFFD`. ```````````````````````````````` example -_ _ _ _ a +# Ӓ Ϡ � +. +

# Ӓ Ϡ �

+```````````````````````````````` -a------ ----a--- +[Hexadecimal numeric character +references](@) consist of `&#` + +either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. +They too are parsed as the corresponding Unicode character (this +time specified with a hexadecimal numeral instead of decimal). + +```````````````````````````````` example +" ആ ಫ . -

_ _ _ _ a

-

a------

-

---a---

+

" ആ ಫ

```````````````````````````````` -It is required that all of the [non-whitespace characters] be the same. -So, this is not a thematic break: +Here are some nonentities: ```````````````````````````````` example - *-* +  &x; &#; &#x; +� +&#abcdef0; +&ThisIsNotDefined; &hi?; . -

-

+

&nbsp &x; &#; &#x; +&#87654321; +&#abcdef0; +&ThisIsNotDefined; &hi?;

```````````````````````````````` -Thematic breaks do not need blank lines before or after: +Although HTML5 does accept some entity references +without a trailing semicolon (such as `©`), these are not +recognized here, because it makes the grammar too ambiguous: ```````````````````````````````` example -- foo -*** -- bar +© . -
    -
  • foo
  • -
-
-
    -
  • bar
  • -
+

&copy

```````````````````````````````` -Thematic breaks can interrupt a paragraph: +Strings that are not on the list of HTML5 named entities are not +recognized as entity references either: ```````````````````````````````` example -Foo -*** -bar +&MadeUpEntity; . -

Foo

-
-

bar

+

&MadeUpEntity;

```````````````````````````````` -If a line of dashes that meets the above conditions for being a -thematic break could also be interpreted as the underline of a [setext -heading], the interpretation as a -[setext heading] takes precedence. Thus, for example, -this is a setext heading, not a paragraph followed by a thematic break: +Entity and numeric character references are recognized in any +context besides code spans or code blocks, including +URLs, [link titles], and [fenced code block][] [info strings]: ```````````````````````````````` example -Foo ---- -bar + . -

Foo

-

bar

+
```````````````````````````````` -When both a thematic break and a list item are possible -interpretations of a line, the thematic break takes precedence: - ```````````````````````````````` example -* Foo -* * * -* Bar +[foo](/föö "föö") . -
    -
  • Foo
  • -
-
-
    -
  • Bar
  • -
+

foo

```````````````````````````````` -If you want a thematic break in a list item, use a different bullet: - ```````````````````````````````` example -- Foo -- * * * +[foo] + +[foo]: /föö "föö" . -
    -
  • Foo
  • -
  • -
    -
  • -
+

foo

```````````````````````````````` -## ATX headings +```````````````````````````````` example +``` föö +foo +``` +. +
foo
+
+```````````````````````````````` -An [ATX heading](@) -consists of a string of characters, parsed as inline content, between an -opening sequence of 1--6 unescaped `#` characters and an optional -closing sequence of any number of unescaped `#` characters. -The opening sequence of `#` characters must be followed by a -[space] or by the end of line. The optional closing sequence of `#`s must be -preceded by a [space] and may be followed by spaces only. The opening -`#` character may be indented 0-3 spaces. The raw contents of the -heading are stripped of leading and trailing spaces before being parsed -as inline content. The heading level is equal to the number of `#` -characters in the opening sequence. -Simple headings: +Entity and numeric character references are treated as literal +text in code spans and code blocks: ```````````````````````````````` example -# foo -## foo -### foo -#### foo -##### foo -###### foo +`föö` . -

foo

-

foo

-

foo

-

foo

-
foo
-
foo
+

f&ouml;&ouml;

```````````````````````````````` -More than six `#` characters is not a heading: - ```````````````````````````````` example -####### foo + föfö . -

####### foo

+
f&ouml;f&ouml;
+
```````````````````````````````` -At least one space is required between the `#` characters and the -heading's contents, unless the heading is empty. Note that many -implementations currently do not require the space. However, the -space was required by the -[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), -and it helps prevent things like the following from being parsed as -headings: +Entity and numeric character references cannot be used +in place of symbols indicating structure in CommonMark +documents. ```````````````````````````````` example -#5 bolt - -#hashtag +*foo* +*foo* . -

#5 bolt

-

#hashtag

+

*foo* +foo

```````````````````````````````` +```````````````````````````````` example +* foo -This is not a heading, because the first `#` is escaped: +* foo +. +

* foo

+
    +
  • foo
  • +
+```````````````````````````````` ```````````````````````````````` example -\## foo +foo bar . -

## foo

+

foo + +bar

```````````````````````````````` +```````````````````````````````` example + foo +. +

→foo

+```````````````````````````````` -Contents are parsed as inlines: ```````````````````````````````` example -# foo *bar* \*baz\* +[a](url "tit") . -

foo bar *baz*

+

[a](url "tit")

```````````````````````````````` -Leading and trailing [whitespace] is ignored in parsing inline content: + +# Blocks and inlines + +We can think of a document as a sequence of +[blocks](@)---structural elements like paragraphs, block +quotations, lists, headings, rules, and code blocks. Some blocks (like +block quotes and list items) contain other blocks; others (like +headings and paragraphs) contain [inline](@) content---text, +links, emphasized text, images, code spans, and so on. + +## Precedence + +Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span: ```````````````````````````````` example -# foo +- `one +- two` . -

foo

+
    +
  • `one
  • +
  • two`
  • +
```````````````````````````````` -One to three spaces indentation are allowed: +This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headings, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other. + +## Container blocks and leaf blocks + +We can divide blocks into two types: +[container blocks](@), +which can contain other blocks, and [leaf blocks](@), +which cannot. + +# Leaf blocks + +This section describes the different kinds of leaf block that make up a +Markdown document. + +## Thematic breaks + +A line consisting of 0-3 spaces of indentation, followed by a sequence +of three or more matching `-`, `_`, or `*` characters, each followed +optionally by any number of spaces or tabs, forms a +[thematic break](@). ```````````````````````````````` example - ### foo - ## foo - # foo +*** +--- +___ . -

foo

-

foo

-

foo

+
+
+
```````````````````````````````` -Four spaces are too much: +Wrong characters: ```````````````````````````````` example - # foo ++++ . -
# foo
-
+

+++

```````````````````````````````` ```````````````````````````````` example -foo - # bar +=== . -

foo -# bar

+

===

```````````````````````````````` -A closing sequence of `#` characters is optional: +Not enough characters: ```````````````````````````````` example -## foo ## - ### bar ### +-- +** +__ . -

foo

-

bar

+

-- +** +__

```````````````````````````````` -It need not be the same length as the opening sequence: +One to three spaces indent are allowed: ```````````````````````````````` example -# foo ################################## -##### foo ## + *** + *** + *** . -

foo

-
foo
+
+
+
```````````````````````````````` -Spaces are allowed after the closing sequence: +Four spaces is too many: ```````````````````````````````` example -### foo ### + *** . -

foo

+
***
+
```````````````````````````````` -A sequence of `#` characters with anything but [spaces] following it -is not a closing sequence, but counts as part of the contents of the -heading: - ```````````````````````````````` example -### foo ### b +Foo + *** . -

foo ### b

+

Foo +***

```````````````````````````````` -The closing sequence must be preceded by a space: +More than three characters may be used: ```````````````````````````````` example -# foo# +_____________________________________ . -

foo#

+
```````````````````````````````` -Backslash-escaped `#` characters do not count as part -of the closing sequence: +Spaces are allowed between the characters: ```````````````````````````````` example -### foo \### -## foo #\## -# foo \# + - - - . -

foo ###

-

foo ###

-

foo #

+
```````````````````````````````` -ATX headings need not be separated from surrounding content by blank -lines, and they can interrupt paragraphs: - ```````````````````````````````` example -**** -## foo -**** + ** * ** * ** * ** .
-

foo

-
```````````````````````````````` ```````````````````````````````` example -Foo bar -# baz -Bar foo +- - - - . -

Foo bar

-

baz

-

Bar foo

+
```````````````````````````````` -ATX headings can be empty: +Spaces are allowed at the end: ```````````````````````````````` example -## -# -### ### +- - - - . -

-

-

+
```````````````````````````````` -## Setext headings +However, no other characters may occur in the line: -A [setext heading](@) consists of one or more -lines of text, each containing at least one [non-whitespace -character], with no more than 3 spaces indentation, followed by -a [setext heading underline]. The lines of text must be such -that, were they not followed by the setext heading underline, -they would be interpreted as a paragraph: they cannot be -interpretable as a [code fence], [ATX heading][ATX headings], -[block quote][block quotes], [thematic break][thematic breaks], -[list item][list items], or [HTML block][HTML blocks]. +```````````````````````````````` example +_ _ _ _ a -A [setext heading underline](@) is a sequence of -`=` characters or a sequence of `-` characters, with no more than 3 -spaces indentation and any number of trailing spaces. If a line -containing a single `-` can be interpreted as an -empty [list items], it should be interpreted this way -and not as a [setext heading underline]. +a------ -The heading is a level 1 heading if `=` characters are used in -the [setext heading underline], and a level 2 heading if `-` -characters are used. The contents of the heading are the result -of parsing the preceding lines of text as CommonMark inline -content. +---a--- +. +

_ _ _ _ a

+

a------

+

---a---

+```````````````````````````````` -In general, a setext heading need not be preceded or followed by a -blank line. However, it cannot interrupt a paragraph, so when a -setext heading comes after a paragraph, a blank line is needed between -them. -Simple examples: +It is required that all of the [non-whitespace characters] be the same. +So, this is not a thematic break: ```````````````````````````````` example -Foo *bar* -========= - -Foo *bar* ---------- + *-* . -

Foo bar

-

Foo bar

+

-

```````````````````````````````` -The content of the header may span more than one line: +Thematic breaks do not need blank lines before or after: ```````````````````````````````` example -Foo *bar -baz* -==== +- foo +*** +- bar . -

Foo bar -baz

+
    +
  • foo
  • +
+
+
    +
  • bar
  • +
```````````````````````````````` -The contents are the result of parsing the headings's raw -content as inlines. The heading's raw content is formed by -concatenating the lines and removing initial and final -[whitespace]. + +Thematic breaks can interrupt a paragraph: ```````````````````````````````` example - Foo *bar -baz*→ -==== +Foo +*** +bar . -

Foo bar -baz

+

Foo

+
+

bar

```````````````````````````````` -The underlining can be any length: +If a line of dashes that meets the above conditions for being a +thematic break could also be interpreted as the underline of a [setext +heading], the interpretation as a +[setext heading] takes precedence. Thus, for example, +this is a setext heading, not a paragraph followed by a thematic break: ```````````````````````````````` example Foo -------------------------- - -Foo -= +--- +bar .

Foo

-

Foo

+

bar

```````````````````````````````` -The heading content can be indented up to three spaces, and need -not line up with the underlining: +When both a thematic break and a list item are possible +interpretations of a line, the thematic break takes precedence: ```````````````````````````````` example - Foo ---- - - Foo ------ - - Foo - === +* Foo +* * * +* Bar . -

Foo

-

Foo

-

Foo

+
    +
  • Foo
  • +
+
+
    +
  • Bar
  • +
```````````````````````````````` -Four spaces indent is too much: +If you want a thematic break in a list item, use a different bullet: ```````````````````````````````` example - Foo - --- - - Foo ---- +- Foo +- * * * . -
Foo
----
-
-Foo
-
+
    +
  • Foo
  • +

  • +
  • +
```````````````````````````````` -The setext heading underline can be indented up to three spaces, and -may have trailing spaces: +## ATX headings + +An [ATX heading](@) +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped `#` characters and an optional +closing sequence of any number of unescaped `#` characters. +The opening sequence of `#` characters must be followed by a +[space] or by the end of line. The optional closing sequence of `#`s must be +preceded by a [space] and may be followed by spaces only. The opening +`#` character may be indented 0-3 spaces. The raw contents of the +heading are stripped of leading and trailing spaces before being parsed +as inline content. The heading level is equal to the number of `#` +characters in the opening sequence. + +Simple headings: ```````````````````````````````` example -Foo - ---- +# foo +## foo +### foo +#### foo +##### foo +###### foo . -

Foo

+

foo

+

foo

+

foo

+

foo

+
foo
+
foo
```````````````````````````````` -Four spaces is too much: +More than six `#` characters is not a heading: ```````````````````````````````` example -Foo - --- +####### foo . -

Foo ----

+

####### foo

```````````````````````````````` -The setext heading underline cannot contain internal spaces: +At least one space is required between the `#` characters and the +heading's contents, unless the heading is empty. Note that many +implementations currently do not require the space. However, the +space was required by the +[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), +and it helps prevent things like the following from being parsed as +headings: ```````````````````````````````` example -Foo -= = +#5 bolt -Foo ---- - +#hashtag . -

Foo -= =

-

Foo

-
+

#5 bolt

+

#hashtag

```````````````````````````````` -Trailing spaces in the content line do not cause a line break: +This is not a heading, because the first `#` is escaped: ```````````````````````````````` example -Foo ------ +\## foo . -

Foo

+

## foo

```````````````````````````````` -Nor does a backslash at the end: +Contents are parsed as inlines: ```````````````````````````````` example -Foo\ ----- +# foo *bar* \*baz\* . -

Foo\

+

foo bar *baz*

```````````````````````````````` -Since indicators of block structure take precedence over -indicators of inline structure, the following are setext headings: +Leading and trailing [whitespace] is ignored in parsing inline content: ```````````````````````````````` example -`Foo ----- -` +# foo +. +

foo

+```````````````````````````````` - + +One to three spaces indentation are allowed: + +```````````````````````````````` example + ### foo + ## foo + # foo . -

`Foo

-

`

-

<a title="a lot

-

of dashes"/>

+

foo

+

foo

+

foo

```````````````````````````````` -The setext heading underline cannot be a [lazy continuation -line] in a list item or block quote: +Four spaces are too much: ```````````````````````````````` example -> Foo ---- + # foo . -
-

Foo

-
-
+
# foo
+
```````````````````````````````` ```````````````````````````````` example -> foo -bar -=== +foo + # bar . -

foo -bar -===

-
+# bar

```````````````````````````````` +A closing sequence of `#` characters is optional: + ```````````````````````````````` example -- Foo ---- +## foo ## + ### bar ### . -
    -
  • Foo
  • -
-
+

foo

+

bar

```````````````````````````````` -A blank line is needed between a paragraph and a following -setext heading, since otherwise the paragraph becomes part -of the heading's content: +It need not be the same length as the opening sequence: ```````````````````````````````` example -Foo -Bar ---- +# foo ################################## +##### foo ## . -

Foo -Bar

+

foo

+
foo
```````````````````````````````` -But in general a blank line is not required before or after -setext headings: +Spaces are allowed after the closing sequence: ```````````````````````````````` example ---- -Foo ---- -Bar ---- -Baz +### foo ### . -
-

Foo

-

Bar

-

Baz

+

foo

```````````````````````````````` -Setext headings cannot be empty: +A sequence of `#` characters with anything but [spaces] following it +is not a closing sequence, but counts as part of the contents of the +heading: ```````````````````````````````` example - -==== +### foo ### b . -

====

+

foo ### b

```````````````````````````````` -Setext heading text lines must not be interpretable as block -constructs other than paragraphs. So, the line of dashes -in these examples gets interpreted as a thematic break: +The closing sequence must be preceded by a space: ```````````````````````````````` example ---- ---- +# foo# . -
-
+

foo#

```````````````````````````````` +Backslash-escaped `#` characters do not count as part +of the closing sequence: + ```````````````````````````````` example -- foo ------ +### foo \### +## foo #\## +# foo \# . -
    -
  • foo
  • -
-
+

foo ###

+

foo ###

+

foo #

```````````````````````````````` +ATX headings need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs: + ```````````````````````````````` example - foo ---- +**** +## foo +**** . -
foo
-
+
+

foo


```````````````````````````````` ```````````````````````````````` example -> foo ------ +Foo bar +# baz +Bar foo . -
-

foo

-
-
+

Foo bar

+

baz

+

Bar foo

```````````````````````````````` -If you want a heading with `> foo` as its literal text, you can -use backslash escapes: +ATX headings can be empty: ```````````````````````````````` example -\> foo ------- +## +# +### ### . -

> foo

+

+

+

```````````````````````````````` -**Compatibility note:** Most existing Markdown implementations -do not allow the text of setext headings to span multiple lines. -But there is no consensus about how to interpret +## Setext headings -``` markdown -Foo -bar ---- -baz -``` +A [setext heading](@) consists of one or more +lines of text, each containing at least one [non-whitespace +character], with no more than 3 spaces indentation, followed by +a [setext heading underline]. The lines of text must be such +that, were they not followed by the setext heading underline, +they would be interpreted as a paragraph: they cannot be +interpretable as a [code fence], [ATX heading][ATX headings], +[block quote][block quotes], [thematic break][thematic breaks], +[list item][list items], or [HTML block][HTML blocks]. -One can find four different interpretations: +A [setext heading underline](@) is a sequence of +`=` characters or a sequence of `-` characters, with no more than 3 +spaces indentation and any number of trailing spaces. If a line +containing a single `-` can be interpreted as an +empty [list items], it should be interpreted this way +and not as a [setext heading underline]. -1. paragraph "Foo", heading "bar", paragraph "baz" -2. paragraph "Foo bar", thematic break, paragraph "baz" -3. paragraph "Foo bar --- baz" -4. heading "Foo bar", paragraph "baz" +The heading is a level 1 heading if `=` characters are used in +the [setext heading underline], and a level 2 heading if `-` +characters are used. The contents of the heading are the result +of parsing the preceding lines of text as CommonMark inline +content. -We find interpretation 4 most natural, and interpretation 4 -increases the expressive power of CommonMark, by allowing -multiline headings. Authors who want interpretation 1 can -put a blank line after the first paragraph: +In general, a setext heading need not be preceded or followed by a +blank line. 
However, it cannot interrupt a paragraph, so when a +setext heading comes after a paragraph, a blank line is needed between +them. + +Simple examples: ```````````````````````````````` example -Foo +Foo *bar* +========= -bar ---- -baz +Foo *bar* +--------- . -

Foo

-

bar

-

baz

+

Foo bar

+

Foo bar

```````````````````````````````` -Authors who want interpretation 2 can put blank lines around -the thematic break, +The content of the header may span more than one line: ```````````````````````````````` example -Foo -bar - ---- - -baz +Foo *bar +baz* +==== . -

Foo -bar

-
-

baz

+

Foo bar +baz

```````````````````````````````` - -or use a thematic break that cannot count as a [setext heading -underline], such as +The contents are the result of parsing the headings's raw +content as inlines. The heading's raw content is formed by +concatenating the lines and removing initial and final +[whitespace]. ```````````````````````````````` example -Foo -bar -* * * -baz + Foo *bar +baz*→ +==== . -

Foo -bar

-
-

baz

+

Foo bar +baz

```````````````````````````````` -Authors who want interpretation 3 can use backslash escapes: +The underlining can be any length: ```````````````````````````````` example Foo -bar -\--- -baz +------------------------- + +Foo += . -

Foo -bar ---- -baz

+

Foo

+

Foo

```````````````````````````````` -## Indented code blocks +The heading content can be indented up to three spaces, and need +not line up with the underlining: -An [indented code block](@) is composed of one or more -[indented chunks] separated by blank lines. -An [indented chunk](@) is a sequence of non-blank lines, -each indented four or more spaces. The contents of the code block are -the literal contents of the lines, including trailing -[line endings], minus four spaces of indentation. -An indented code block has no [info string]. +```````````````````````````````` example + Foo +--- -An indented code block cannot interrupt a paragraph, so there must be -a blank line between a paragraph and a following indented code block. -(A blank line is not needed, however, between a code block and a following -paragraph.) + Foo +----- -```````````````````````````````` example - a simple - indented code block + Foo + === . -
a simple
-  indented code block
-
+

Foo

+

Foo

+

Foo

```````````````````````````````` -If there is any ambiguity between an interpretation of indentation -as a code block and as indicating that material belongs to a [list -item][list items], the list item interpretation takes precedence: +Four spaces indent is too much: ```````````````````````````````` example - - foo + Foo + --- - bar + Foo +--- . -
    -
  • -

    foo

    -

    bar

    -
  • -
+
Foo
+---
+
+Foo
+
+
```````````````````````````````` -```````````````````````````````` example -1. foo +The setext heading underline can be indented up to three spaces, and +may have trailing spaces: - - bar +```````````````````````````````` example +Foo + ---- . -
    -
  1. -

    foo

    -
      -
    • bar
    • -
    -
  2. -
+

Foo

```````````````````````````````` - -The contents of a code block are literal text, and do not get parsed -as Markdown: +Four spaces is too much: ```````````````````````````````` example -
- *hi* - - - one +Foo + --- . -
<a/>
-*hi*
-
-- one
-
+

Foo +---

```````````````````````````````` -Here we have three chunks separated by blank lines: +The setext heading underline cannot contain internal spaces: ```````````````````````````````` example - chunk1 +Foo += = - chunk2 - - - - chunk3 +Foo +--- - . -
chunk1
-
-chunk2
+

Foo += =

+

Foo

+
+```````````````````````````````` +Trailing spaces in the content line do not cause a line break: -chunk3 -
+```````````````````````````````` example +Foo +----- +. +

Foo

```````````````````````````````` -Any initial spaces beyond four will be included in the content, even -in interior blank lines: +Nor does a backslash at the end: ```````````````````````````````` example - chunk1 - - chunk2 +Foo\ +---- . -
chunk1
-  
-  chunk2
-
+

Foo\

```````````````````````````````` -An indented code block cannot interrupt a paragraph. (This -allows hanging indents and the like.) +Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headings: ```````````````````````````````` example -Foo - bar +`Foo +---- +` +
. -

Foo -bar

+

`Foo

+

`

+

<a title="a lot

+

of dashes"/>

```````````````````````````````` -However, any non-blank line with fewer than four leading spaces ends -the code block immediately. So a paragraph may occur immediately -after indented code: +The setext heading underline cannot be a [lazy continuation +line] in a list item or block quote: ```````````````````````````````` example - foo -bar +> Foo +--- . -
foo
-
-

bar

+
+

Foo

+
+
```````````````````````````````` -And indented code can occur immediately before and after other kinds of -blocks: +```````````````````````````````` example +> foo +bar +=== +. +
+

foo +bar +===

+
+```````````````````````````````` + ```````````````````````````````` example -# Heading - foo -Heading ------- - foo ----- +- Foo +--- . -

Heading

-
foo
-
-

Heading

-
foo
-
+
    +
  • Foo
  • +

```````````````````````````````` -The first line can be indented more than four spaces: +A blank line is needed between a paragraph and a following +setext heading, since otherwise the paragraph becomes part +of the heading's content: ```````````````````````````````` example - foo - bar +Foo +Bar +--- . -
    foo
-bar
-
+

Foo +Bar

```````````````````````````````` -Blank lines preceding or following an indented code block -are not included in it: +But in general a blank line is not required before or after +setext headings: ```````````````````````````````` example - - - foo - - +--- +Foo +--- +Bar +--- +Baz . -
foo
-
+
+

Foo

+

Bar

+

Baz

```````````````````````````````` -Trailing spaces are included in the code block's content: +Setext headings cannot be empty: ```````````````````````````````` example - foo + +==== . -
foo  
-
+

====

```````````````````````````````` +Setext heading text lines must not be interpretable as block +constructs other than paragraphs. So, the line of dashes +in these examples gets interpreted as a thematic break: -## Fenced code blocks +```````````````````````````````` example +--- +--- +. +
+
+```````````````````````````````` -A [code fence](@) is a sequence -of at least three consecutive backtick characters (`` ` ``) or -tildes (`~`). (Tildes and backticks cannot be mixed.) -A [fenced code block](@) -begins with a code fence, indented no more than three spaces. - -The line with the opening code fence may optionally contain some text -following the code fence; this is trimmed of leading and trailing -whitespace and called the [info string](@). If the [info string] comes -after a backtick fence, it may not contain any backtick -characters. (The reason for this restriction is that otherwise -some inline code would be incorrectly interpreted as the -beginning of a fenced code block.) - -The content of the code block consists of all subsequent lines, until -a closing [code fence] of the same type as the code block -began with (backticks or tildes), and with at least as many backticks -or tildes as the opening code fence. If the leading code fence is -indented N spaces, then up to N spaces of indentation are removed from -each line of the content (if present). (If a content line is not -indented, it is preserved unchanged. If it is indented less than N -spaces, all of the indentation is removed.) - -The closing code fence may be indented up to three spaces, and may be -followed only by spaces, which are ignored. If the end of the -containing block (or document) is reached and no closing code fence -has been found, the code block contains all of the lines after the -opening code fence until the end of the containing block (or -document). (An alternative spec would require backtracking in the -event that a closing code fence is not found. But this makes parsing -much less efficient, and there seems to be no real down side to the -behavior described here.) - -A fenced code block may interrupt a paragraph, and does not require -a blank line either before or after. - -The content of a code fence is treated as literal text, not parsed -as inlines. 
The first word of the [info string] is typically used to -specify the language of the code sample, and rendered in the `class` -attribute of the `code` tag. However, this spec does not mandate any -particular treatment of the [info string]. - -Here is a simple example with backticks: ```````````````````````````````` example -``` -< - > -``` +- foo +----- . -
<
- >
-
+
    +
  • foo
  • +
+
```````````````````````````````` -With tildes: - ```````````````````````````````` example -~~~ -< - > -~~~ + foo +--- . -
<
- >
+
foo
 
+
```````````````````````````````` -Fewer than three backticks is not enough: ```````````````````````````````` example -`` -foo -`` +> foo +----- . -

foo

+
+

foo

+
+
```````````````````````````````` -The closing code fence must use the same character as the opening -fence: + +If you want a heading with `> foo` as its literal text, you can +use backslash escapes: ```````````````````````````````` example -``` -aaa -~~~ -``` +\> foo +------ . -
aaa
-~~~
-
+

> foo

```````````````````````````````` -```````````````````````````````` example -~~~ -aaa -``` -~~~ -. -
aaa
+**Compatibility note:**  Most existing Markdown implementations
+do not allow the text of setext headings to span multiple lines.
+But there is no consensus about how to interpret
+
+``` markdown
+Foo
+bar
+---
+baz
 ```
-
-```````````````````````````````` +One can find four different interpretations: -The closing code fence must be at least as long as the opening fence: +1. paragraph "Foo", heading "bar", paragraph "baz" +2. paragraph "Foo bar", thematic break, paragraph "baz" +3. paragraph "Foo bar --- baz" +4. heading "Foo bar", paragraph "baz" + +We find interpretation 4 most natural, and interpretation 4 +increases the expressive power of CommonMark, by allowing +multiline headings. Authors who want interpretation 1 can +put a blank line after the first paragraph: ```````````````````````````````` example -```` -aaa -``` -`````` +Foo + +bar +--- +baz . -
aaa
-```
-
+

Foo

+

bar

+

baz

```````````````````````````````` +Authors who want interpretation 2 can put blank lines around +the thematic break, + ```````````````````````````````` example -~~~~ -aaa -~~~ -~~~~ +Foo +bar + +--- + +baz . -
aaa
-~~~
-
+

Foo +bar

+
+

baz

```````````````````````````````` -Unclosed code blocks are closed by the end of the document -(or the enclosing [block quote][block quotes] or [list item][list items]): +or use a thematic break that cannot count as a [setext heading +underline], such as ```````````````````````````````` example -``` +Foo +bar +* * * +baz . -
+

Foo +bar

+
+

baz

```````````````````````````````` -```````````````````````````````` example -````` +Authors who want interpretation 3 can use backslash escapes: -``` -aaa +```````````````````````````````` example +Foo +bar +\--- +baz . -

-```
-aaa
-
+

Foo +bar +--- +baz

```````````````````````````````` -```````````````````````````````` example -> ``` -> aaa +## Indented code blocks -bbb +An [indented code block](@) is composed of one or more +[indented chunks] separated by blank lines. +An [indented chunk](@) is a sequence of non-blank lines, +each indented four or more spaces. The contents of the code block are +the literal contents of the lines, including trailing +[line endings], minus four spaces of indentation. +An indented code block has no [info string]. + +An indented code block cannot interrupt a paragraph, so there must be +a blank line between a paragraph and a following indented code block. +(A blank line is not needed, however, between a code block and a following +paragraph.) + +```````````````````````````````` example + a simple + indented code block . -
-
aaa
+
a simple
+  indented code block
 
-
-

bbb

```````````````````````````````` -A code block can have all empty lines as its content: +If there is any ambiguity between an interpretation of indentation +as a code block and as indicating that material belongs to a [list +item][list items], the list item interpretation takes precedence: ```````````````````````````````` example -``` + - foo - -``` + bar . -

-  
-
+
    +
  • +

    foo

    +

    bar

    +
  • +
```````````````````````````````` -A code block can be empty: - ```````````````````````````````` example -``` -``` +1. foo + + - bar . -
+
    +
  1. +

    foo

    +
      +
    • bar
    • +
    +
  2. +
```````````````````````````````` -Fences can be indented. If the opening fence is indented, -content lines will have equivalent opening indentation removed, -if present: -```````````````````````````````` example - ``` - aaa -aaa -``` +The contents of a code block are literal text, and do not get parsed +as Markdown: + +```````````````````````````````` example +
    <a/> +    *hi* + +    - one . -
aaa
-aaa
+
<a/>
+*hi*
+
+- one
 
```````````````````````````````` +Here we have three chunks separated by blank lines: + ```````````````````````````````` example - ``` -aaa - aaa -aaa - ``` + chunk1 + + chunk2 + + + + chunk3 . -
aaa
-aaa
-aaa
+
chunk1
+
+chunk2
+
+
+
+chunk3
 
```````````````````````````````` +Any initial spaces beyond four will be included in the content, even +in interior blank lines: + ```````````````````````````````` example - ``` - aaa - aaa - aaa - ``` + chunk1 + + chunk2 . -
aaa
- aaa
-aaa
+
chunk1
+  
+  chunk2
 
```````````````````````````````` -Four spaces indentation produces an indented code block: +An indented code block cannot interrupt a paragraph. (This +allows hanging indents and the like.) ```````````````````````````````` example - ``` - aaa - ``` +Foo + bar + . -
```
-aaa
-```
-
+

Foo +bar

```````````````````````````````` -Closing fences may be indented by 0-3 spaces, and their indentation -need not match that of the opening fence: +However, any non-blank line with fewer than four leading spaces ends +the code block immediately. So a paragraph may occur immediately +after indented code: ```````````````````````````````` example -``` -aaa - ``` + foo +bar . -
aaa
+
foo
 
+

bar

```````````````````````````````` +And indented code can occur immediately before and after other kinds of +blocks: + ```````````````````````````````` example - ``` -aaa - ``` +# Heading + foo +Heading +------ + foo +---- . -
aaa
+

Heading

+
foo
+
+

Heading

+
foo
 
+
```````````````````````````````` -This is not a closing fence, because it is indented 4 spaces: +The first line can be indented more than four spaces: ```````````````````````````````` example -``` -aaa - ``` + foo + bar . -
aaa
-    ```
+
    foo
+bar
 
```````````````````````````````` - -Code fences (opening and closing) cannot contain internal spaces: +Blank lines preceding or following an indented code block +are not included in it: ```````````````````````````````` example -``` ``` -aaa + + + foo + + . -

-aaa

+
foo
+
```````````````````````````````` +Trailing spaces are included in the code block's content: + ```````````````````````````````` example -~~~~~~ -aaa -~~~ ~~ + foo . -
aaa
-~~~ ~~
+
foo  
 
```````````````````````````````` -Fenced code blocks can interrupt paragraphs, and can be followed -directly by paragraphs, without a blank line between: + +## Fenced code blocks + +A [code fence](@) is a sequence +of at least three consecutive backtick characters (`` ` ``) or +tildes (`~`). (Tildes and backticks cannot be mixed.) +A [fenced code block](@) +begins with a code fence, indented no more than three spaces. + +The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +whitespace and called the [info string](@). If the [info string] comes +after a backtick fence, it may not contain any backtick +characters. (The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.) + +The content of the code block consists of all subsequent lines, until +a closing [code fence] of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +indented N spaces, then up to N spaces of indentation are removed from +each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented less than N +spaces, all of the indentation is removed.) + +The closing code fence may be indented up to three spaces, and may be +followed only by spaces, which are ignored. If the end of the +containing block (or document) is reached and no closing code fence +has been found, the code block contains all of the lines after the +opening code fence until the end of the containing block (or +document). (An alternative spec would require backtracking in the +event that a closing code fence is not found. But this makes parsing +much less efficient, and there seems to be no real down side to the +behavior described here.) 
+ +A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after. + +The content of a code fence is treated as literal text, not parsed +as inlines. The first word of the [info string] is typically used to +specify the language of the code sample, and rendered in the `class` +attribute of the `code` tag. However, this spec does not mandate any +particular treatment of the [info string]. + +Here is a simple example with backticks: ```````````````````````````````` example -foo ``` -bar +< + > ``` -baz . -

foo

-
bar
+
<
+ >
 
-

baz

```````````````````````````````` -Other blocks can also occur before and after fenced code blocks -without an intervening blank line: +With tildes: ```````````````````````````````` example -foo ---- ~~~ -bar +< + > ~~~ -# baz . -

foo

-
bar
+
<
+ >
 
-

baz

```````````````````````````````` +Fewer than three backticks is not enough: -An [info string] can be provided after the opening code fence. -Although this spec doesn't mandate any particular treatment of -the info string, the first word is typically used to specify -the language of the code block. In HTML output, the language is -normally indicated by adding a class to the `code` element consisting -of `language-` followed by the language name. +```````````````````````````````` example +`` +foo +`` +. +

foo

+```````````````````````````````` + +The closing code fence must use the same character as the opening +fence: ```````````````````````````````` example -```ruby -def foo(x) - return 3 -end +``` +aaa +~~~ ``` . -
def foo(x)
-  return 3
-end
+
aaa
+~~~
 
```````````````````````````````` ```````````````````````````````` example -~~~~ ruby startline=3 $%@#$ -def foo(x) - return 3 -end -~~~~~~~ +~~~ +aaa +``` +~~~ . -
def foo(x)
-  return 3
-end
+
aaa
+```
 
```````````````````````````````` -```````````````````````````````` example -````; -```` -. -
-```````````````````````````````` - - -[Info strings] for backtick code blocks cannot contain backticks: +The closing code fence must be at least as long as the opening fence: ```````````````````````````````` example -``` aa ``` -foo +```` +aaa +``` +`````` . -

aa -foo

+
aaa
+```
+
```````````````````````````````` -[Info strings] for tilde code blocks can contain backticks and tildes: - ```````````````````````````````` example -~~~ aa ``` ~~~ -foo +~~~~ +aaa ~~~ +~~~~ . -
foo
+
aaa
+~~~
 
```````````````````````````````` -Closing code fences cannot have [info strings]: +Unclosed code blocks are closed by the end of the document +(or the enclosing [block quote][block quotes] or [list item][list items]): ```````````````````````````````` example ``` -``` aaa -``` . -
``` aaa
-
+
```````````````````````````````` +```````````````````````````````` example +````` -## HTML blocks - -An [HTML block](@) is a group of lines that is treated -as raw HTML (and will not be escaped in HTML output). - -There are seven kinds of [HTML block], which can be defined by their -start and end conditions. The block begins with a line that meets a -[start condition](@) (after up to three spaces optional indentation). -It ends with the first subsequent line that meets a matching [end -condition](@), or the last line of the document, or the last line of -the [container block](#container-blocks) containing the current HTML -block, if no line is encountered that meets the [end condition]. If -the first line meets both the [start condition] and the [end -condition], the block will contain just that line. +``` +aaa +. +

+```
+aaa
+
+```````````````````````````````` -1. **Start condition:** line begins with the string ``, or the end of the line.\ -**End condition:** line contains an end tag -``, `
`, or `` (case-insensitive; it -need not match the start tag). -2. **Start condition:** line begins with the string ``. +```````````````````````````````` example +> ``` +> aaa -3. **Start condition:** line begins with the string ``. +bbb +. +
+
aaa
+
+
+

bbb

+```````````````````````````````` -4. **Start condition:** line begins with the string ``. -5. **Start condition:** line begins with the string -``. +A code block can have all empty lines as its content: -6. **Start condition:** line begins the string `<` or ``, or -the string `/>`.\ -**End condition:** line is followed by a [blank line]. +```````````````````````````````` example +``` -7. **Start condition:** line begins with a complete [open tag] -(with any [tag name] other than `script`, -`style`, or `pre`) or a complete [closing tag], -followed only by [whitespace] or the end of the line.\ -**End condition:** line is followed by a [blank line]. + +``` +. +

+  
+
+```````````````````````````````` -HTML blocks continue until they are closed by their appropriate -[end condition], or the last line of the document or other [container -block](#container-blocks). This means any HTML **within an HTML -block** that might otherwise be recognised as a start condition will -be ignored by the parser and passed through as-is, without changing -the parser's state. -For instance, `
` within a HTML block started by `` will not affect
-the parser state; as the HTML block was started in by start condition 6, it
-will end at any blank line. This can be surprising:
+A code block can be empty:
 
 ```````````````````````````````` example
-
-
-**Hello**,
-
-_world_.
-
-
+``` +``` . -
-
-**Hello**,
-

world. -

-
+
```````````````````````````````` -In this case, the HTML block is terminated by the newline — the `**Hello**` -text remains verbatim — and regular parsing resumes, with a paragraph, -emphasised `world` and inline and block HTML following. - -All types of [HTML blocks] except type 7 may interrupt -a paragraph. Blocks of type 7 may not interrupt a paragraph. -(This restriction is intended to prevent unwanted interpretation -of long tags inside a wrapped paragraph as starting HTML blocks.) -Some simple examples follow. Here are some basic HTML blocks -of type 6: +Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present: ```````````````````````````````` example - - - - -
- hi -
- -okay. + ``` + aaa +aaa +``` . - - - - -
- hi -
-

okay.

+
aaa
+aaa
+
```````````````````````````````` ```````````````````````````````` example - -*foo* +
aaa
+ aaa
+aaa
+
```````````````````````````````` -Here we have two HTML blocks with a Markdown paragraph between them: +Four spaces indentation produces an indented code block: ```````````````````````````````` example -
- -*Markdown* - -
+ ``` + aaa + ``` . -
-

Markdown

-
+
```
+aaa
+```
+
```````````````````````````````` -The tag on the first line can be partial, as long -as it is split where there would be whitespace: +Closing fences may be indented by 0-3 spaces, and their indentation +need not match that of the opening fence: ```````````````````````````````` example -
-
+``` +aaa + ``` . -
-
+
aaa
+
```````````````````````````````` ```````````````````````````````` example -
-
+ ``` +aaa + ``` . -
-
+
aaa
+
```````````````````````````````` -An open tag need not be closed: -```````````````````````````````` example -
-*foo* +This is not a closing fence, because it is indented 4 spaces: -*bar* +```````````````````````````````` example +``` +aaa + ``` . -
-*foo* -

bar

+
aaa
+    ```
+
```````````````````````````````` -A partial tag need not even be completed (garbage -in, garbage out): +Code fences (opening and closing) cannot contain internal spaces: ```````````````````````````````` example -
+aaa

```````````````````````````````` ```````````````````````````````` example -
aaa +~~~ ~~ +
```````````````````````````````` -The initial tag doesn't even need to be a valid -tag, as long as it starts like one: +Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between: ```````````````````````````````` example -
foo

+
bar
+
+

baz

```````````````````````````````` -In type 6 blocks, the initial tag need not be on a line by -itself: - -```````````````````````````````` example -
-. -
-```````````````````````````````` - +Other blocks can also occur before and after fenced code blocks +without an intervening blank line: ```````````````````````````````` example -
foo -
+--- +~~~ +bar +~~~ +# baz . -
-foo -
+

foo

+
bar
+
+

baz

```````````````````````````````` -Everything until the next blank line or end of document -gets included in the HTML block. So, in the following -example, what looks like a Markdown code block -is actually part of the HTML block, which continues until a blank -line or the end of the document is reached: +An [info string] can be provided after the opening code fence. +Although this spec doesn't mandate any particular treatment of +the info string, the first word is typically used to specify +the language of the code block. In HTML output, the language is +normally indicated by adding a class to the `code` element consisting +of `language-` followed by the language name. ```````````````````````````````` example -
-``` c -int x = 33; +```ruby +def foo(x) + return 3 +end ``` . -
-``` c -int x = 33; -``` +
def foo(x)
+  return 3
+end
+
```````````````````````````````` -To start an [HTML block] with a tag that is *not* in the -list of block-level tags in (6), you must put the tag by -itself on the first line (and it must be complete): - ```````````````````````````````` example - -*bar* - +~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ . - -*bar* - +
def foo(x)
+  return 3
+end
+
```````````````````````````````` -In type 7 blocks, the [tag name] can be anything: - ```````````````````````````````` example - -*bar* - +````; +```` . - -*bar* - +
```````````````````````````````` -```````````````````````````````` example - -*bar* - -. - -*bar* - -```````````````````````````````` - +[Info strings] for backtick code blocks cannot contain backticks: ```````````````````````````````` example - -*bar* +``` aa ``` +foo . - -*bar* +

aa +foo

```````````````````````````````` -These rules are designed to allow us to work with tags that -can function as either block-level or inline-level tags. -The `` tag is a nice example. We can surround content with -`` tags in three different ways. In this case, we get a raw -HTML block, because the `` tag is on a line by itself: +[Info strings] for tilde code blocks can contain backticks and tildes: ```````````````````````````````` example - -*foo* - +~~~ aa ``` ~~~ +foo +~~~ . - -*foo* - +
foo
+
```````````````````````````````` -In this case, we get a raw HTML block that just includes -the `` tag (because it ends with the following blank -line). So the contents get interpreted as CommonMark: +Closing code fences cannot have [info strings]: ```````````````````````````````` example - - -*foo* - - +``` +``` aaa +``` . - -

foo

-
+
``` aaa
+
```````````````````````````````` -Finally, in this case, the `` tags are interpreted -as [raw HTML] *inside* the CommonMark paragraph. (Because -the tag is not on a line by itself, we get inline HTML -rather than an [HTML block].) - -```````````````````````````````` example -*foo* -. -

foo

-```````````````````````````````` - -HTML tags designed to contain literal content -(`script`, `style`, `pre`), comments, processing instructions, -and declarations are treated somewhat differently. -Instead of ending at the first blank line, these blocks -end at the first line containing a corresponding end tag. -As a result, these blocks can contain blank lines: +## HTML blocks -A pre tag (type 1): +An [HTML block](@) is a group of lines that is treated +as raw HTML (and will not be escaped in HTML output). -```````````````````````````````` example -

-import Text.HTML.TagSoup
+There are seven kinds of [HTML block], which can be defined by their
+start and end conditions.  The block begins with a line that meets a
+[start condition](@) (after up to three spaces optional indentation).
+It ends with the first subsequent line that meets a matching [end
+condition](@), or the last line of the document, or the last line of
+the [container block](#container-blocks) containing the current HTML
+block, if no line is encountered that meets the [end condition].  If
+the first line meets both the [start condition] and the [end
+condition], the block will contain just that line.
 
-main :: IO ()
-main = print $ parseTags tags
-
-okay -. -

-import Text.HTML.TagSoup
+1.  **Start condition:**  line begins with the string `<script`,
+`<pre`, or `<style` (case-insensitive), followed by whitespace,
+the string `>`, or the end of the line.\
+**End condition:**  line contains an end tag
+`</script>`, `</pre>`, or `</style>` (case-insensitive; it
+need not match the start tag). -main :: IO () -main = print $ parseTags tags -
-

okay

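-````````````````````````````````
+2. **Start condition:** line begins with the string `<!--`.\
+**End condition:** line contains the string `-->`.
+3. **Start condition:** line begins with the string `<?`.\
+**End condition:** line contains the string `?>`.
-A script tag (type 1):
+4. **Start condition:** line begins with the string `<!`
+followed by an uppercase ASCII letter.\
+**End condition:** line contains the character `>`.
-```````````````````````````````` example - -okay -. -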

okay

-```````````````````````````````` +7. **Start condition:** line begins with a complete [open tag] +(with any [tag name] other than `script`, +`style`, or `pre`) or a complete [closing tag], +followed only by [whitespace] or the end of the line.\ +**End condition:** line is followed by a [blank line]. +HTML blocks continue until they are closed by their appropriate +[end condition], or the last line of the document or other [container +block](#container-blocks). This means any HTML **within an HTML +block** that might otherwise be recognised as a start condition will +be ignored by the parser and passed through as-is, without changing +the parser's state. -A style tag (type 1): +For instance, `<pre>` within an HTML block started by `<table>` will not affect
+the parser state; as the HTML block was started by start condition 6, it
+will end at any blank line. This can be surprising:
 
 ```````````````````````````````` example
-
-okay
+_world_.
+
+
. - -

okay

+
+
+**Hello**,
+

world. +

+
```````````````````````````````` +In this case, the HTML block is terminated by the newline — the `**Hello**` +text remains verbatim — and regular parsing resumes, with a paragraph, +emphasised `world` and inline and block HTML following. -If there is no matching end tag, the block will end at the -end of the document (or the enclosing [block quote][block quotes] -or [list item][list items]): +All types of [HTML blocks] except type 7 may interrupt +a paragraph. Blocks of type 7 may not interrupt a paragraph. +(This restriction is intended to prevent unwanted interpretation +of long tags inside a wrapped paragraph as starting HTML blocks.) + +Some simple examples follow. Here are some basic HTML blocks +of type 6: ```````````````````````````````` example - -*foo* +
+ +*Markdown* + +
. - -

foo

+
+

Markdown

+
```````````````````````````````` +The tag on the first line can be partial, as long +as it is split where there would be whitespace: + ```````````````````````````````` example -*bar* -*baz* +
+
. -*bar* -

baz

+
+
```````````````````````````````` -Note that anything on the last line after the -end tag will be included in the [HTML block]: - ```````````````````````````````` example -1. *bar* +
+
. -1. *bar* +
+
```````````````````````````````` -A comment (type 2): - +An open tag need not be closed: ```````````````````````````````` example - -okay +*bar* . - -

okay

+
+*foo* +

bar

```````````````````````````````` -A processing instruction (type 3): +A partial tag need not even be completed (garbage +in, garbage out): ```````````````````````````````` example -'; - -?> -okay +
'; - -?> -

okay

+
+
+
-okay +
-

okay

+
- - + . - -
<!-- foo -->
-
+ ```````````````````````````````` ```````````````````````````````` example -
- -
+
+foo +
. -
-
<div>
-
+
+foo +
```````````````````````````````` -An HTML block of types 1--6 can interrupt a paragraph, and need not be -preceded by a blank line. +Everything until the next blank line or end of document +gets included in the HTML block. So, in the following +example, what looks like a Markdown code block +is actually part of the HTML block, which continues until a blank +line or the end of the document is reached: ```````````````````````````````` example -Foo -
-bar -
+
+``` c +int x = 33; +``` . -

Foo

-
-bar -
+
+``` c +int x = 33; +``` ```````````````````````````````` -However, a following blank line is needed, except at the end of -a document, and except for blocks of types 1--5, [above][HTML -block]: +To start an [HTML block] with a tag that is *not* in the +list of block-level tags in (6), you must put the tag by +itself on the first line (and it must be complete): ```````````````````````````````` example -
-bar -
-*foo* + +*bar* + . -
-bar -
-*foo* + +*bar* + ```````````````````````````````` -HTML blocks of type 7 cannot interrupt a paragraph: +In type 7 blocks, the [tag name] can be anything: ```````````````````````````````` example -Foo - -baz + +*bar* + . -

Foo - -baz

+ +*bar* + ```````````````````````````````` -This rule differs from John Gruber's original Markdown syntax -specification, which says: +```````````````````````````````` example + +*bar* + +. + +*bar* + +```````````````````````````````` -> The only restrictions are that block-level HTML elements — -> e.g. `
`, ``, `
`, `

`, etc. — must be separated from -> surrounding content by blank lines, and the start and end tags of the -> block should not be indented with tabs or spaces. -In some ways Gruber's rule is more restrictive than the one given -here: +```````````````````````````````` example + +*bar* +. + +*bar* +```````````````````````````````` -- It requires that an HTML block be preceded by a blank line. -- It does not allow the start tag to be indented. -- It requires a matching end tag, which it also does not allow to - be indented. -Most Markdown implementations (including some of Gruber's own) do not -respect all of these restrictions. +These rules are designed to allow us to work with tags that +can function as either block-level or inline-level tags. +The `` tag is a nice example. We can surround content with +`` tags in three different ways. In this case, we get a raw +HTML block, because the `` tag is on a line by itself: -There is one respect, however, in which Gruber's rule is more liberal -than the one given here, since it allows blank lines to occur inside -an HTML block. There are two reasons for disallowing them here. -First, it removes the need to parse balanced tags, which is -expensive and can require backtracking from the end of the document -if no matching end tag is found. Second, it provides a very simple -and flexible way of including Markdown content inside HTML tags: -simply separate the Markdown from the HTML using blank lines: +```````````````````````````````` example + +*foo* + +. + +*foo* + +```````````````````````````````` -Compare: + +In this case, we get a raw HTML block that just includes +the `` tag (because it ends with the following blank +line). So the contents get interpreted as CommonMark: ```````````````````````````````` example -

+ -*Emphasized* text. +*foo* -
+ . -
-

Emphasized text.

-
+ +

foo

+
```````````````````````````````` +Finally, in this case, the `` tags are interpreted +as [raw HTML] *inside* the CommonMark paragraph. (Because +the tag is not on a line by itself, we get inline HTML +rather than an [HTML block].) + ```````````````````````````````` example -
-*Emphasized* text. -
+*foo* . -
-*Emphasized* text. -
+

foo

```````````````````````````````` -Some Markdown implementations have adopted a convention of -interpreting content inside tags as text if the open tag has -the attribute `markdown=1`. The rule given above seems a simpler and -more elegant way of achieving the same expressive power, which is also -much simpler to parse. +HTML tags designed to contain literal content +(`script`, `style`, `pre`), comments, processing instructions, +and declarations are treated somewhat differently. +Instead of ending at the first blank line, these blocks +end at the first line containing a corresponding end tag. +As a result, these blocks can contain blank lines: -The main potential drawback is that one can no longer paste HTML -blocks into Markdown documents with 100% reliability. However, -*in most cases* this will work fine, because the blank lines in -HTML are usually followed by HTML block tags. For example: +A pre tag (type 1): ```````````````````````````````` example -
+

+import Text.HTML.TagSoup
 
-
+main :: IO () +main = print $ parseTags tags + +okay +. +

+import Text.HTML.TagSoup
 
-
+main :: IO () +main = print $ parseTags tags + +

okay

+```````````````````````````````` - -
-Hi -
+A script tag (type 1): + +```````````````````````````````` example + +okay . - - - - -
-Hi -
+ +

okay

```````````````````````````````` -There are problems, however, if the inner tags are indented -*and* separated by spaces, as then they will be interpreted as -an indented code block: +A style tag (type 1): ```````````````````````````````` example - + +okay +. + +

okay

+```````````````````````````````` - -
+If there is no matching end tag, the block will end at the +end of the document (or the enclosing [block quote][block quotes] +or [list item][list items]): + +```````````````````````````````` example + +*foo* . -

Foo bar

+ +

foo

```````````````````````````````` -The title may extend over multiple lines: - ```````````````````````````````` example -[foo]: /url ' -title -line1 -line2 -' - -[foo] +*bar* +*baz* . -

foo

+*bar* +

baz

```````````````````````````````` -However, it may not contain a [blank line]: +Note that anything on the last line after the +end tag will be included in the [HTML block]: ```````````````````````````````` example -[foo]: /url 'title - -with blank line' - -[foo] +1. *bar* . -

[foo]: /url 'title

-

with blank line'

-

[foo]

+1. *bar* ```````````````````````````````` -The title may be omitted: +A comment (type 2): ```````````````````````````````` example -[foo]: -/url + +okay . -

foo

-```````````````````````````````` - + +

okay

+```````````````````````````````` -```````````````````````````````` example -[foo]: -[foo] -. -

[foo]:

-

[foo]

-```````````````````````````````` - However, an empty link destination may be specified using - angle brackets: +A processing instruction (type 3): ```````````````````````````````` example -[foo]: <> +foo

-```````````````````````````````` + echo '>'; -The title must be separated from the link destination by -whitespace: +?> +okay +. +(baz) + echo '>'; -[foo] -. -

[foo]: (baz)

-

[foo]

+?> +

okay

```````````````````````````````` -Both title and destination can contain backslash escapes -and literal backslashes: +A declaration (type 4): ```````````````````````````````` example -[foo]: /url\bar\*baz "foo\"bar\baz" - -[foo] + . -

foo

+ ```````````````````````````````` -A link can come before its corresponding definition: +CDATA (type 5): ```````````````````````````````` example -[foo] - -[foo]: url -. -

foo

-```````````````````````````````` + +okay +. +foo

+ return 0; + } +} +]]> +

okay

```````````````````````````````` -As noted in the section on [Links], matching of labels is -case-insensitive (see [matches]). +The opening tag can be indented 1-3 spaces, but not 4: ```````````````````````````````` example -[FOO]: /url + -[Foo] + . -

Foo

+ +
<!-- foo -->
+
```````````````````````````````` ```````````````````````````````` example -[ΑΓΩ]: /φου +
-[αγω] +
. -

αγω

+
+
<div>
+
```````````````````````````````` -Here is a link reference definition with no corresponding link. -It contributes nothing to the document. +An HTML block of types 1--6 can interrupt a paragraph, and need not be +preceded by a blank line. ```````````````````````````````` example -[foo]: /url +Foo +
+bar +
. +

Foo

+
+bar +
```````````````````````````````` -Here is another one: +However, a following blank line is needed, except at the end of +a document, and except for blocks of types 1--5, [above][HTML +block]: ```````````````````````````````` example -[ -foo -]: /url +
bar +
+*foo* . -

bar

+
+bar +
+*foo* ```````````````````````````````` -This is not a link reference definition, because there are -[non-whitespace characters] after the title: +HTML blocks of type 7 cannot interrupt a paragraph: ```````````````````````````````` example -[foo]: /url "title" ok +Foo + +baz . -

[foo]: /url "title" ok

-```````````````````````````````` +

Foo + +baz

+```````````````````````````````` -This is a link reference definition, but it has no title: +This rule differs from John Gruber's original Markdown syntax +specification, which says: + +> The only restrictions are that block-level HTML elements — +> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from +> surrounding content by blank lines, and the start and end tags of the +> block should not be indented with tabs or spaces. + +In some ways Gruber's rule is more restrictive than the one given +here: + +- It requires that an HTML block be preceded by a blank line. +- It does not allow the start tag to be indented. +- It requires a matching end tag, which it also does not allow to + be indented. + +Most Markdown implementations (including some of Gruber's own) do not +respect all of these restrictions. + +There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including Markdown content inside HTML tags: +simply separate the Markdown from the HTML using blank lines: + +Compare: ```````````````````````````````` example -[foo]: /url -"title" ok +

+ +*Emphasized* text. + +
. -

"title" ok

+
+

Emphasized text.

+
```````````````````````````````` -This is not a link reference definition, because it is indented -four spaces: - ```````````````````````````````` example - [foo]: /url "title" - -[foo] +
+*Emphasized* text. +
. -
[foo]: /url "title"
-
-

[foo]

+
+*Emphasized* text. +
```````````````````````````````` -This is not a link reference definition, because it occurs inside -a code block: +Some Markdown implementations have adopted a convention of +interpreting content inside tags as text if the open tag has +the attribute `markdown=1`. The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much simpler to parse. + +The main potential drawback is that one can no longer paste HTML +blocks into Markdown documents with 100% reliability. However, +*in most cases* this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example: ```````````````````````````````` example -``` -[foo]: /url -``` +
-[foo] + + + + + + +
+Hi +
. -
[foo]: /url
-
-

[foo]

+ + + + +
+Hi +
```````````````````````````````` -A [link reference definition] cannot interrupt a paragraph. +There are problems, however, if the inner tags are indented +*and* separated by spaces, as then they will be interpreted as +an indented code block: ```````````````````````````````` example -Foo -[bar]: /baz + -[bar] + + + + + + +
+ Hi +
. -

Foo -[bar]: /baz

-

[bar]

+ + +
<td>
+  Hi
+</td>
+
+ +
```````````````````````````````` -However, it can directly follow other block elements, such as headings -and thematic breaks, and it need not be followed by a blank line. +Fortunately, blank lines are usually not necessary and can be +deleted. The exception is inside `<pre>` tags, but as described
+[above][HTML blocks], raw HTML blocks starting with `<pre>`
+*can* contain blank lines.
+
+## Link reference definitions
+
+A [link reference definition](@)
+consists of a [link label], indented up to three spaces, followed
+by a colon (`:`), optional [whitespace] (including up to one
+[line ending]), a [link destination],
+optional [whitespace] (including up to one
+[line ending]), and an optional [link
+title], which if it is present must be separated
+from the [link destination] by [whitespace].
+No further [non-whitespace characters] may occur on the line.
+
+A [link reference definition]
+does not correspond to a structural element of a document.  Instead, it
+defines a label which can be used in [reference links]
+and reference-style [images] elsewhere in the document.  [Link
+reference definitions] can come either before or after the links that use
+them.
 
 ```````````````````````````````` example
-# [Foo]
-[foo]: /url
-> bar
+[foo]: /url "title"
+
+[foo]
 .
-<h1><a href="/url">Foo</a></h1>
-<blockquote>
-<p>bar</p>
-</blockquote>
+<p><a href="/url" title="title">foo</a></p>
```````````````````````````````` + ```````````````````````````````` example -[foo]: /url -bar -=== + [foo]: + /url + 'the title' + [foo] . -

bar

-

foo

+

foo

```````````````````````````````` + ```````````````````````````````` example -[foo]: /url -=== -[foo] +[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] . -

=== -foo

+

Foo*bar]

```````````````````````````````` -Several [link reference definitions] -can occur one after another, without intervening blank lines. - ```````````````````````````````` example -[foo]: /foo-url "foo" -[bar]: /bar-url - "bar" -[baz]: /baz-url +[Foo bar]: + +'title' -[foo], -[bar], -[baz] +[Foo bar] . -

foo, -bar, -baz

+

Foo bar

```````````````````````````````` -[Link reference definitions] can occur -inside block containers, like lists and block quotations. They -affect the entire document, not just the container in which they -are defined: +The title may extend over multiple lines: ```````````````````````````````` example -[foo] +[foo]: /url ' +title +line1 +line2 +' -> [foo]: /url +[foo] . -

foo

-
-
+

foo

```````````````````````````````` -Whether something is a [link reference definition] is -independent of whether the link reference it defines is -used in the document. Thus, for example, the following -document contains just a link reference definition, and -no visible content: +However, it may not contain a [blank line]: ```````````````````````````````` example -[foo]: /url -. -```````````````````````````````` +[foo]: /url 'title +with blank line' -## Paragraphs +[foo] +. +

[foo]: /url 'title

+

with blank line'

+

[foo]

+```````````````````````````````` -A sequence of non-blank lines that cannot be interpreted as other -kinds of blocks forms a [paragraph](@). -The contents of the paragraph are the result of parsing the -paragraph's raw content as inlines. The paragraph's raw content -is formed by concatenating the lines and removing initial and final -[whitespace]. -A simple example with two paragraphs: +The title may be omitted: ```````````````````````````````` example -aaa +[foo]: +/url -bbb +[foo] . -

aaa

-

bbb

+

foo

```````````````````````````````` -Paragraphs can contain multiple lines, but no blank lines: +The link destination may not be omitted: ```````````````````````````````` example -aaa -bbb +[foo]: -ccc -ddd +[foo] . -

aaa -bbb

-

ccc -ddd

+

[foo]:

+

[foo]

```````````````````````````````` - -Multiple blank lines between paragraph have no effect: + However, an empty link destination may be specified using + angle brackets: ```````````````````````````````` example -aaa - +[foo]: <> -bbb +[foo] . -

aaa

-

bbb

+

foo

```````````````````````````````` - -Leading spaces are skipped: +The title must be separated from the link destination by +whitespace: ```````````````````````````````` example - aaa - bbb +[foo]: (baz) + +[foo] . -

aaa -bbb

+

[foo]: (baz)

+

[foo]

```````````````````````````````` -Lines after the first may be indented any amount, since indented -code blocks cannot interrupt paragraphs. +Both title and destination can contain backslash escapes +and literal backslashes: ```````````````````````````````` example -aaa - bbb - ccc +[foo]: /url\bar\*baz "foo\"bar\baz" + +[foo] . -

aaa -bbb -ccc

+

foo

```````````````````````````````` -However, the first line may be indented at most three spaces, -or an indented code block will be triggered: +A link can come before its corresponding definition: ```````````````````````````````` example - aaa -bbb -. -

aaa -bbb

-```````````````````````````````` - +[foo] -```````````````````````````````` example - aaa -bbb +[foo]: url . -
aaa
-
-

bbb

+

foo

```````````````````````````````` -Final spaces are stripped before inline parsing, so a paragraph -that ends with two or more spaces will not end with a [hard line -break]: +If there are several matching definitions, the first one takes +precedence: ```````````````````````````````` example -aaa -bbb +[foo] + +[foo]: first +[foo]: second . -

aaa
-bbb

+

foo

```````````````````````````````` -## Blank lines - -[Blank lines] between block-level elements are ignored, -except for the role they play in determining whether a [list] -is [tight] or [loose]. - -Blank lines at the beginning and end of the document are also ignored. +As noted in the section on [Links], matching of labels is +case-insensitive (see [matches]). ```````````````````````````````` example - - -aaa - - -# aaa +[FOO]: /url - +[Foo] . -

aaa

-

aaa

+

Foo

```````````````````````````````` +```````````````````````````````` example +[ΑΓΩ]: /φου -# Container blocks - -A [container block](#container-blocks) is a block that has other -blocks as its contents. There are two basic kinds of container blocks: -[block quotes] and [list items]. -[Lists] are meta-containers for [list items]. - -We define the syntax for container blocks recursively. The general -form of the definition is: - -> If X is a sequence of blocks, then the result of -> transforming X in such-and-such a way is a container of type Y -> with these blocks as its content. - -So, we explain what counts as a block quote or list item by explaining -how these can be *generated* from their contents. This should suffice -to define the syntax, although it does not give a recipe for *parsing* -these constructions. (A recipe is provided below in the section entitled -[A parsing strategy](#appendix-a-parsing-strategy).) +[αγω] +. +

αγω

+```````````````````````````````` -## Block quotes -A [block quote marker](@) -consists of 0-3 spaces of initial indent, plus (a) the character `>` together -with a following space, or (b) a single character `>` not followed by a space. +Here is a link reference definition with no corresponding link. +It contributes nothing to the document. -The following rules define [block quotes]: +```````````````````````````````` example +[foo]: /url +. +```````````````````````````````` -1. **Basic case.** If a string of lines *Ls* constitute a sequence - of blocks *Bs*, then the result of prepending a [block quote - marker] to the beginning of each line in *Ls* - is a [block quote](#block-quotes) containing *Bs*. -2. **Laziness.** If a string of lines *Ls* constitute a [block - quote](#block-quotes) with contents *Bs*, then the result of deleting - the initial [block quote marker] from one or - more lines in which the next [non-whitespace character] after the [block - quote marker] is [paragraph continuation - text] is a block quote with *Bs* as its content. - [Paragraph continuation text](@) is text - that will be parsed as part of the content of a paragraph, but does - not occur at the beginning of the paragraph. +Here is another one: -3. **Consecutiveness.** A document cannot contain two [block - quotes] in a row unless there is a [blank line] between them. +```````````````````````````````` example +[ +foo +]: /url +bar +. +

bar

+```````````````````````````````` -Nothing else counts as a [block quote](#block-quotes). -Here is a simple example: +This is not a link reference definition, because there are +[non-whitespace characters] after the title: ```````````````````````````````` example -> # Foo -> bar -> baz +[foo]: /url "title" ok . -
-

Foo

-

bar -baz

-
+

[foo]: /url "title" ok

```````````````````````````````` -The spaces after the `>` characters can be omitted: +This is a link reference definition, but it has no title: ```````````````````````````````` example -># Foo ->bar -> baz +[foo]: /url +"title" ok . -
-

Foo

-

bar -baz

-
+

"title" ok

```````````````````````````````` -The `>` characters can be indented 1-3 spaces: +This is not a link reference definition, because it is indented +four spaces: ```````````````````````````````` example - > # Foo - > bar - > baz + [foo]: /url "title" + +[foo] . -
-

Foo

-

bar -baz

-
+
[foo]: /url "title"
+
+

[foo]

```````````````````````````````` -Four spaces gives us a code block: +This is not a link reference definition, because it occurs inside +a code block: ```````````````````````````````` example - > # Foo - > bar - > baz +``` +[foo]: /url +``` + +[foo] . -
> # Foo
-> bar
-> baz
+
[foo]: /url
 
+

[foo]

```````````````````````````````` -The Laziness clause allows us to omit the `>` before -[paragraph continuation text]: +A [link reference definition] cannot interrupt a paragraph. ```````````````````````````````` example -> # Foo -> bar -baz +Foo +[bar]: /baz + +[bar] . -
-

Foo

-

bar -baz

-
+

Foo +[bar]: /baz

+

[bar]

```````````````````````````````` -A block quote can contain some lazy and some non-lazy -continuation lines: +However, it can directly follow other block elements, such as headings +and thematic breaks, and it need not be followed by a blank line. ```````````````````````````````` example +# [Foo] +[foo]: /url > bar -baz -> foo . +

Foo

-

bar -baz -foo

+

bar

```````````````````````````````` - -Laziness only applies to lines that would have been continuations of -paragraphs had they been prepended with [block quote markers]. -For example, the `> ` cannot be omitted in the second line of - -``` markdown -> foo -> --- -``` - -without changing the meaning: - ```````````````````````````````` example -> foo ---- +[foo]: /url +bar +=== +[foo] . -
-

foo

-
-
+

bar

+

foo

```````````````````````````````` - -Similarly, if we omit the `> ` in the second line of - -``` markdown -> - foo -> - bar -``` - -then the block quote ends after the first line: - ```````````````````````````````` example -> - foo -- bar +[foo]: /url +=== +[foo] . -
-
    -
  • foo
  • -
-
-
    -
  • bar
  • -
+

=== +foo

```````````````````````````````` -For the same reason, we can't omit the `> ` in front of -subsequent lines of an indented or fenced code block: +Several [link reference definitions] +can occur one after another, without intervening blank lines. ```````````````````````````````` example -> foo - bar +[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url + +[foo], +[bar], +[baz] . -
-
foo
-
-
-
bar
-
+

foo, +bar, +baz

```````````````````````````````` +[Link reference definitions] can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined: + ```````````````````````````````` example -> ``` -foo -``` +[foo] + +> [foo]: /url . +

foo

-
-

foo

-
```````````````````````````````` -Note that in the following case, we have a [lazy -continuation line]: +Whether something is a [link reference definition] is +independent of whether the link reference it defines is +used in the document. Thus, for example, the following +document contains just a link reference definition, and +no visible content: ```````````````````````````````` example -> foo - - bar +[foo]: /url . -
-

foo -- bar

-
```````````````````````````````` -To see why, note that in - -```markdown -> foo -> - bar -``` +## Paragraphs -the `- bar` is indented too far to start a list, and can't -be an indented code block because indented code blocks cannot -interrupt paragraphs, so it is [paragraph continuation text]. +A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a [paragraph](@). +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. The paragraph's raw content +is formed by concatenating the lines and removing initial and final +[whitespace]. -A block quote can be empty: +A simple example with two paragraphs: ```````````````````````````````` example -> -. -
-
-```````````````````````````````` - +aaa -```````````````````````````````` example -> -> -> +bbb . -
-
+

aaa

+

bbb

```````````````````````````````` -A block quote can have initial or final blank lines: +Paragraphs can contain multiple lines, but no blank lines: ```````````````````````````````` example -> -> foo -> +aaa +bbb + +ccc +ddd . -
-

foo

-
+

aaa +bbb

+

ccc +ddd

```````````````````````````````` -A blank line always separates block quotes: +Multiple blank lines between paragraph have no effect: ```````````````````````````````` example -> foo +aaa -> bar + +bbb . -
-

foo

-
-
-

bar

-
+

aaa

+

bbb

```````````````````````````````` -(Most current Markdown implementations, including John Gruber's -original `Markdown.pl`, will parse this example as a single block quote -with two paragraphs. But it seems better to allow the author to decide -whether two block quotes or one are wanted.) - -Consecutiveness means that if we put these block quotes together, -we get a single block quote: +Leading spaces are skipped: ```````````````````````````````` example -> foo -> bar + aaa + bbb . -
-

foo -bar

-
+

aaa +bbb

```````````````````````````````` -To get a block quote with two paragraphs, use: +Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs. ```````````````````````````````` example -> foo -> -> bar +aaa + bbb + ccc . -
-

foo

-

bar

-
+

aaa +bbb +ccc

```````````````````````````````` -Block quotes can interrupt paragraphs: +However, the first line may be indented at most three spaces, +or an indented code block will be triggered: ```````````````````````````````` example -foo -> bar + aaa +bbb . -

foo

-
-

bar

-
+

aaa +bbb

```````````````````````````````` -In general, blank lines are not needed before or after block -quotes: - ```````````````````````````````` example -> aaa -*** -> bbb + aaa +bbb . -
-

aaa

-
-
-
+
aaa
+

bbb

-
```````````````````````````````` -However, because of laziness, a blank line is needed between -a block quote and a following paragraph: +Final spaces are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a [hard line +break]: ```````````````````````````````` example -> bar -baz +aaa +bbb . -
-

bar -baz

-
+

aaa
+bbb

```````````````````````````````` +## Blank lines + +[Blank lines] between block-level elements are ignored, +except for the role they play in determining whether a [list] +is [tight] or [loose]. + +Blank lines at the beginning and end of the document are also ignored. + ```````````````````````````````` example -> bar + -baz +aaa + + +# aaa + + . -
-

bar

-
-

baz

+

aaa

+

aaa

```````````````````````````````` + +# Container blocks + +A [container block](#container-blocks) is a block that has other +blocks as its contents. There are two basic kinds of container blocks: +[block quotes] and [list items]. +[Lists] are meta-containers for [list items]. + +We define the syntax for container blocks recursively. The general +form of the definition is: + +> If X is a sequence of blocks, then the result of +> transforming X in such-and-such a way is a container of type Y +> with these blocks as its content. + +So, we explain what counts as a block quote or list item by explaining +how these can be *generated* from their contents. This should suffice +to define the syntax, although it does not give a recipe for *parsing* +these constructions. (A recipe is provided below in the section entitled +[A parsing strategy](#appendix-a-parsing-strategy).) + +## Block quotes + +A [block quote marker](@) +consists of 0-3 spaces of initial indent, plus (a) the character `>` together +with a following space, or (b) a single character `>` not followed by a space. + +The following rules define [block quotes]: + +1. **Basic case.** If a string of lines *Ls* constitute a sequence + of blocks *Bs*, then the result of prepending a [block quote + marker] to the beginning of each line in *Ls* + is a [block quote](#block-quotes) containing *Bs*. + +2. **Laziness.** If a string of lines *Ls* constitute a [block + quote](#block-quotes) with contents *Bs*, then the result of deleting + the initial [block quote marker] from one or + more lines in which the next [non-whitespace character] after the [block + quote marker] is [paragraph continuation + text] is a block quote with *Bs* as its content. + [Paragraph continuation text](@) is text + that will be parsed as part of the content of a paragraph, but does + not occur at the beginning of the paragraph. + +3. 
**Consecutiveness.** A document cannot contain two [block + quotes] in a row unless there is a [blank line] between them. + +Nothing else counts as a [block quote](#block-quotes). + +Here is a simple example: + ```````````````````````````````` example +> # Foo > bar -> -baz +> baz .
-

bar

+

Foo

+

bar +baz

-

baz

```````````````````````````````` -It is a consequence of the Laziness rule that any number -of initial `>`s may be omitted on a continuation line of a -nested block quote: +The spaces after the `>` characters can be omitted: ```````````````````````````````` example -> > > foo -bar +># Foo +>bar +> baz .
-
-
-

foo -bar

-
-
+

Foo

+

bar +baz

```````````````````````````````` +The `>` characters can be indented 1-3 spaces: + ```````````````````````````````` example ->>> foo -> bar ->>baz + > # Foo + > bar + > baz .
-
-
-

foo -bar +

Foo

+

bar baz

-
-
```````````````````````````````` -When including an indented code block in a block quote, -remember that the [block quote marker] includes -both the `>` and a following space. So *five spaces* are needed after -the `>`: +Four spaces gives us a code block: ```````````````````````````````` example -> code - -> not code + > # Foo + > bar + > baz . -
-
code
+
> # Foo
+> bar
+> baz
 
-
-
-

not code

-
```````````````````````````````` +The Laziness clause allows us to omit the `>` before +[paragraph continuation text]: -## List items - -A [list marker](@) is a -[bullet list marker] or an [ordered list marker]. - -A [bullet list marker](@) -is a `-`, `+`, or `*` character. - -An [ordered list marker](@) -is a sequence of 1--9 arabic digits (`0-9`), followed by either a -`.` character or a `)` character. (The reason for the length -limit is that with 10 digits we start seeing integer overflows -in some browsers.) - -The following rules define [list items]: - -1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of - blocks *Bs* starting with a [non-whitespace character], and *M* is a - list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result - of prepending *M* and the following spaces to the first line of - *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a - list item with *Bs* as its contents. The type of the list item - (bullet or ordered) is determined by the type of its list marker. - If the list item is ordered, then it is also assigned a start - number, based on the ordered list marker. - - Exceptions: +```````````````````````````````` example +> # Foo +> bar +baz +. +
+

Foo

+

bar +baz

+
+```````````````````````````````` - 1. When the first list item in a [list] interrupts - a paragraph---that is, when it starts on a line that would - otherwise count as [paragraph continuation text]---then (a) - the lines *Ls* must not begin with a blank line, and (b) if - the list item is ordered, the start number must be 1. - 2. If any line is a [thematic break][thematic breaks] then - that line is not a list item. -For example, let *Ls* be the lines +A block quote can contain some lazy and some non-lazy +continuation lines: ```````````````````````````````` example -A paragraph -with two lines. - - indented code - -> A block quote. +> bar +baz +> foo . -

A paragraph -with two lines.

-
indented code
-
-

A block quote.

+

bar +baz +foo

```````````````````````````````` -And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says -that the following is an ordered list item with start number 1, -and the same contents as *Ls*: +Laziness only applies to lines that would have been continuations of +paragraphs had they been prepended with [block quote markers]. +For example, the `> ` cannot be omitted in the second line of -```````````````````````````````` example -1. A paragraph - with two lines. +``` markdown +> foo +> --- +``` - indented code +without changing the meaning: - > A block quote. +```````````````````````````````` example +> foo +--- . -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -

    A block quote.

    +

    foo

    -
  2. -
+
```````````````````````````````` -The most important thing to notice is that the position of -the text after the list marker determines how much indentation -is needed in subsequent blocks in the list item. If the list -marker takes up two spaces, and there are three spaces between -the list marker and the next [non-whitespace character], then blocks -must be indented five spaces in order to fall under the list -item. +Similarly, if we omit the `> ` in the second line of -Here are some examples showing how far content must be indented to be -put under the list item: +``` markdown +> - foo +> - bar +``` -```````````````````````````````` example -- one +then the block quote ends after the first line: - two +```````````````````````````````` example +> - foo +- bar . +
    -
  • one
  • +
  • foo
  • +
+
+
    +
  • bar
-

two

```````````````````````````````` -```````````````````````````````` example -- one - - two -. -
    -
  • -

    one

    -

    two

    -
  • -
-```````````````````````````````` - +For the same reason, we can't omit the `> ` in front of +subsequent lines of an indented or fenced code block: ```````````````````````````````` example - - one - - two +> foo + bar . -
    -
  • one
  • -
-
 two
+
+
foo
+
+
+
bar
 
```````````````````````````````` ```````````````````````````````` example - - one - - two +> ``` +foo +``` . -
    -
  • -

    one

    -

    two

    -
  • -
+
+
+
+

foo

+
```````````````````````````````` -It is tempting to think of this in terms of columns: the continuation -blocks must be indented at least to the column of the first -[non-whitespace character] after the list marker. However, that is not quite right. -The spaces after the list marker determine how much relative indentation -is needed. Which column this indentation reaches will depend on -how the list item is embedded in other constructions, as shown by -this example: +Note that in the following case, we have a [lazy +continuation line]: ```````````````````````````````` example - > > 1. one ->> ->> two +> foo + - bar .
-
-
    -
  1. -

    one

    -

    two

    -
  2. -
-
+

foo +- bar

```````````````````````````````` -Here `two` occurs in the same column as the list marker `1.`, -but is actually contained in the list item, because there is -sufficient indentation after the last containing blockquote marker. +To see why, note that in -The converse is also possible. In the following example, the word `two` -occurs far to the right of the initial text of the list item, `one`, but -it is not considered part of the list item, because it is not indented -far enough past the blockquote marker: +```markdown +> foo +> - bar +``` + +the `- bar` is indented too far to start a list, and can't +be an indented code block because indented code blocks cannot +interrupt paragraphs, so it is [paragraph continuation text]. + +A block quote can be empty: ```````````````````````````````` example ->>- one ->> - > > two +> .
-
-
    -
  • one
  • -
-

two

-
```````````````````````````````` -Note that at least one space is needed between the list marker and -any following content, so these are not list items: - ```````````````````````````````` example --one - -2.two +> +> +> . -

-one

-

2.two

+
+
```````````````````````````````` -A list item may contain blocks that are separated by more than -one blank line. +A block quote can have initial or final blank lines: ```````````````````````````````` example -- foo - - - bar +> +> foo +> . -
    -
  • +

    foo

    -

    bar

    -
  • -
+ ```````````````````````````````` -A list item may contain any kind of block: +A blank line always separates block quotes: ```````````````````````````````` example -1. foo - - ``` - bar - ``` - - baz +> foo - > bam +> bar . -
    -
  1. +

    foo

    -
    bar
    -
    -

    baz

    +
    -

    bam

    +

    bar

    -
  2. -
```````````````````````````````` -A list item that contains an indented code block will preserve -empty lines within the code block verbatim. +(Most current Markdown implementations, including John Gruber's +original `Markdown.pl`, will parse this example as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.) + +Consecutiveness means that if we put these block quotes together, +we get a single block quote: ```````````````````````````````` example -- Foo +> foo +> bar +. +
+

foo +bar

+
+```````````````````````````````` - bar +To get a block quote with two paragraphs, use: - baz +```````````````````````````````` example +> foo +> +> bar . -
    -
  • -

    Foo

    -
    bar
    -
    -
    -baz
    -
    -
  • -
+
+

foo

+

bar

+
```````````````````````````````` -Note that ordered list start numbers must be nine digits or less: + +Block quotes can interrupt paragraphs: ```````````````````````````````` example -123456789. ok +foo +> bar . -
    -
  1. ok
  2. -
+

foo

+
+

bar

+
```````````````````````````````` +In general, blank lines are not needed before or after block +quotes: + ```````````````````````````````` example -1234567890. not ok +> aaa +*** +> bbb . -

1234567890. not ok

+
+

aaa

+
+
+
+

bbb

+
```````````````````````````````` -A start number may begin with 0s: +However, because of laziness, a blank line is needed between +a block quote and a following paragraph: ```````````````````````````````` example -0. ok +> bar +baz . -
    -
  1. ok
  2. -
+
+

bar +baz

+
```````````````````````````````` ```````````````````````````````` example -003. ok +> bar + +baz . -
    -
  1. ok
  2. -
+
+

bar

+
+

baz

```````````````````````````````` -A start number may not be negative: - ```````````````````````````````` example --1. not ok +> bar +> +baz . -

-1. not ok

+
+

bar

+
+

baz

```````````````````````````````` +It is a consequence of the Laziness rule that any number +of initial `>`s may be omitted on a continuation line of a +nested block quote: -2. **Item starting with indented code.** If a sequence of lines *Ls* - constitute a sequence of blocks *Bs* starting with an indented code - block, and *M* is a list marker of width *W* followed by - one space, then the result of prepending *M* and the following - space to the first line of *Ls*, and indenting subsequent lines of - *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. - If a line is empty, then it need not be indented. The type of the - list item (bullet or ordered) is determined by the type of its list - marker. If the list item is ordered, then it is also assigned a - start number, based on the ordered list marker. +```````````````````````````````` example +> > > foo +bar +. +
+
+
+

foo +bar

+
+
+
+```````````````````````````````` -An indented code block will have to be indented four spaces beyond -the edge of the region where text will be included in the list item. -In the following case that is 6 spaces: ```````````````````````````````` example -- foo - - bar +>>> foo +> bar +>>baz . -
    -
  • -

    foo

    -
    bar
    -
    -
  • -
+
+
+
+

foo +bar +baz

+
+
+
```````````````````````````````` -And in this case it is 11 spaces: +When including an indented code block in a block quote, +remember that the [block quote marker] includes +both the `>` and a following space. So *five spaces* are needed after +the `>`: ```````````````````````````````` example - 10. foo +> code - bar +> not code . -
    -
  1. -

    foo

    -
    bar
    +
    +
    code
     
    -
  2. -
+ +
+

not code

+
```````````````````````````````` -If the *first* block in the list item is an indented code block, -then by rule #2, the contents must be indented *one* space after the -list marker: -```````````````````````````````` example - indented code +## List items -paragraph +A [list marker](@) is a +[bullet list marker] or an [ordered list marker]. - more code -. -
indented code
-
-

paragraph

-
more code
-
-```````````````````````````````` +A [bullet list marker](@) +is a `-`, `+`, or `*` character. + +An [ordered list marker](@) +is a sequence of 1--9 arabic digits (`0-9`), followed by either a +`.` character or a `)` character. (The reason for the length +limit is that with 10 digits we start seeing integer overflows +in some browsers.) + +The following rules define [list items]: + +1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of + blocks *Bs* starting with a [non-whitespace character], and *M* is a + list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result + of prepending *M* and the following spaces to the first line of + *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a + list item with *Bs* as its contents. The type of the list item + (bullet or ordered) is determined by the type of its list marker. + If the list item is ordered, then it is also assigned a start + number, based on the ordered list marker. + + Exceptions: + + 1. When the first list item in a [list] interrupts + a paragraph---that is, when it starts on a line that would + otherwise count as [paragraph continuation text]---then (a) + the lines *Ls* must not begin with a blank line, and (b) if + the list item is ordered, the start number must be 1. + 2. If any line is a [thematic break][thematic breaks] then + that line is not a list item. +For example, let *Ls* be the lines ```````````````````````````````` example -1. indented code +A paragraph +with two lines. - paragraph + indented code - more code +> A block quote. . -
    -
  1. +

    A paragraph +with two lines.

    indented code
     
    -

    paragraph

    -
    more code
    -
    -
  2. -
+
+

A block quote.

+
```````````````````````````````` -Note that an additional space indent is interpreted as space -inside the code block: +And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as *Ls*: ```````````````````````````````` example -1. indented code +1. A paragraph + with two lines. - paragraph + indented code - more code + > A block quote. .
  1. -
     indented code
    -
    -

    paragraph

    -
    more code
    +

    A paragraph +with two lines.

    +
    indented code
     
    +
    +

    A block quote.

    +
```````````````````````````````` -Note that rules #1 and #2 only apply to two cases: (a) cases -in which the lines to be included in a list item begin with a -[non-whitespace character], and (b) cases in which -they begin with an indented code -block. In a case like the following, where the first block begins with -a three-space indent, the rules do not allow us to form a list item by -indenting the whole thing and prepending a list marker: - -```````````````````````````````` example - foo - -bar -. -

foo

-

bar

-```````````````````````````````` +The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces, and there are three spaces between +the list marker and the next [non-whitespace character], then blocks +must be indented five spaces in order to fall under the list +item. +Here are some examples showing how far content must be indented to be +put under the list item: ```````````````````````````````` example -- foo +- one - bar + two .
    -
  • foo
  • +
  • one
-

bar

+

two

```````````````````````````````` -This is not a significant restriction, because when a block begins -with 1-3 spaces indent, the indentation can always be removed without -a change in interpretation, allowing rule #1 to be applied. So, in -the above case: - ```````````````````````````````` example -- foo +- one - bar + two .
  • -

    foo

    -

    bar

    +

    one

    +

    two

```````````````````````````````` -3. **Item starting with a blank line.** If a sequence of lines *Ls* - starting with a single [blank line] constitute a (possibly empty) - sequence of blocks *Bs*, not separated from each other by more than - one blank line, and *M* is a list marker of width *W*, - then the result of prepending *M* to the first line of *Ls*, and - indenting subsequent lines of *Ls* by *W + 1* spaces, is a list - item with *Bs* as its contents. - If a line is empty, then it need not be indented. The type of the - list item (bullet or ordered) is determined by the type of its list - marker. If the list item is ordered, then it is also assigned a - start number, based on the ordered list marker. - -Here are some list items that start with a blank line but are not empty: - ```````````````````````````````` example -- - foo -- - ``` - bar - ``` -- - baz + - one + + two .
    -
  • foo
  • -
  • -
    bar
    -
    -
  • -
  • -
    baz
    -
    -
  • +
  • one
+
 two
+
```````````````````````````````` -When the list item starts with a blank line, the number of spaces -following the list marker doesn't change the required indentation: ```````````````````````````````` example -- - foo + - one + + two .
    -
  • foo
  • +
  • +

    one

    +

    two

    +
```````````````````````````````` -A list item can begin with at most one blank line. -In the following example, `foo` is not part of the list -item: - -```````````````````````````````` example -- - - foo -. -
    -
  • -
-

foo

-```````````````````````````````` - - -Here is an empty bullet list item: - -```````````````````````````````` example -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-```````````````````````````````` - - -It does not matter whether there are spaces following the [list marker]: - -```````````````````````````````` example -- foo -- -- bar -. -
    -
  • foo
  • -
  • -
  • bar
  • -
-```````````````````````````````` - - -Here is an empty ordered list item: +It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first +[non-whitespace character] after the list marker. However, that is not quite right. +The spaces after the list marker determine how much relative indentation +is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as shown by +this example: ```````````````````````````````` example -1. foo -2. -3. bar + > > 1. one +>> +>> two . +
+
    -
  1. foo
  2. -
  3. -
  4. bar
  5. +
  6. +

    one

    +

    two

    +
+
+
```````````````````````````````` -A list may start or end with an empty list item: +Here `two` occurs in the same column as the list marker `1.`, +but is actually contained in the list item, because there is +sufficient indentation after the last containing blockquote marker. + +The converse is also possible. In the following example, the word `two` +occurs far to the right of the initial text of the list item, `one`, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker: ```````````````````````````````` example -* +>>- one +>> + > > two . +
+
    -
  • +
  • one
+

two

+
+
```````````````````````````````` -However, an empty list item cannot interrupt a paragraph: + +Note that at least one space is needed between the list marker and +any following content, so these are not list items: ```````````````````````````````` example -foo -* +-one -foo -1. +2.two . -

foo -*

-

foo -1.

+

-one

+

2.two

```````````````````````````````` -4. **Indentation.** If a sequence of lines *Ls* constitutes a list item - according to rule #1, #2, or #3, then the result of indenting each line - of *Ls* by 1-3 spaces (the same for each line) also constitutes a - list item with the same contents and attributes. If a line is - empty, then it need not be indented. - -Indented one space: +A list item may contain blocks that are separated by more than +one blank line. ```````````````````````````````` example - 1. A paragraph - with two lines. +- foo - indented code - > A block quote. + bar . -
    +
    • -

      A paragraph -with two lines.

      -
      indented code
      -
      -
      -

      A block quote.

      -
      +

      foo

      +

      bar

    • -
+ ```````````````````````````````` -Indented two spaces: +A list item may contain any kind of block: ```````````````````````````````` example - 1. A paragraph - with two lines. +1. foo - indented code + ``` + bar + ``` - > A block quote. + baz + + > bam .
  1. -

    A paragraph -with two lines.

    -
    indented code
    +

    foo

    +
    bar
     
    +

    baz

    -

    A block quote.

    +

    bam

```````````````````````````````` -Indented three spaces: +A list item that contains an indented code block will preserve +empty lines within the code block verbatim. ```````````````````````````````` example - 1. A paragraph - with two lines. +- Foo - indented code + bar - > A block quote. + + baz . -
    +
    • -

      A paragraph -with two lines.

      -
      indented code
      +

      Foo

      +
      bar
      +
      +
      +baz
       
      -
      -

      A block quote.

      -
    • -
+ ```````````````````````````````` - -Four spaces indent gives a code block: +Note that ordered list start numbers must be nine digits or less: ```````````````````````````````` example - 1. A paragraph - with two lines. - - indented code - - > A block quote. +123456789. ok . -
1.  A paragraph
-    with two lines.
-
-        indented code
-
-    > A block quote.
-
+
    +
  1. ok
  2. +
```````````````````````````````` +```````````````````````````````` example +1234567890. not ok +. +

1234567890. not ok

+```````````````````````````````` -5. **Laziness.** If a string of lines *Ls* constitute a [list - item](#list-items) with contents *Bs*, then the result of deleting - some or all of the indentation from one or more lines in which the - next [non-whitespace character] after the indentation is - [paragraph continuation text] is a - list item with the same contents and attributes. The unindented - lines are called - [lazy continuation line](@)s. -Here is an example with [lazy continuation lines]: +A start number may begin with 0s: ```````````````````````````````` example - 1. A paragraph -with two lines. - - indented code - - > A block quote. +0. ok . -
    -
  1. -

    A paragraph -with two lines.

    -
    indented code
    -
    -
    -

    A block quote.

    -
    -
  2. +
      +
    1. ok
    ```````````````````````````````` -Indentation can be partially deleted: - ```````````````````````````````` example - 1. A paragraph - with two lines. +003. ok . -
      -
    1. A paragraph -with two lines.
    2. +
        +
      1. ok
      ```````````````````````````````` -These examples show how laziness can work in nested structures: +A start number may not be negative: ```````````````````````````````` example -> 1. > Blockquote -continued here. +-1. not ok . -
      -
        -
      1. -
        -

        Blockquote -continued here.

        -
        -
      2. -
      -
      -```````````````````````````````` - - -```````````````````````````````` example -> 1. > Blockquote -> continued here. -. -
      -
        -
      1. -
        -

        Blockquote -continued here.

        -
        -
      2. -
      -
      +

      -1. not ok

      ```````````````````````````````` -6. **That's all.** Nothing that is not counted as a list item by rules - #1--5 counts as a [list item](#list-items). - -The rules for sublists follow from the general rules -[above][List items]. A sublist must be indented the same number -of spaces a paragraph would need to be in order to be included -in the list item. +2. **Item starting with indented code.** If a sequence of lines *Ls* + constitute a sequence of blocks *Bs* starting with an indented code + block, and *M* is a list marker of width *W* followed by + one space, then the result of prepending *M* and the following + space to the first line of *Ls*, and indenting subsequent lines of + *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. -So, in this case we need two spaces indent: +An indented code block will have to be indented four spaces beyond +the edge of the region where text will be included in the list item. +In the following case that is 6 spaces: ```````````````````````````````` example - foo - - bar - - baz - - boo + + bar .
        -
      • foo -
          -
        • bar -
            -
          • baz -
              -
            • boo
            • -
            -
          • -
          -
        • -
        +
      • +

        foo

        +
        bar
        +
      ```````````````````````````````` -One is not enough: +And in this case it is 11 spaces: ```````````````````````````````` example -- foo - - bar - - baz - - boo + 10. foo + + bar . -
        -
      • foo
      • -
      • bar
      • -
      • baz
      • -
      • boo
      • -
      +
        +
      1. +

        foo

        +
        bar
        +
        +
      2. +
      ```````````````````````````````` -Here we need four, because the list marker is wider: +If the *first* block in the list item is an indented code block, +then by rule #2, the contents must be indented *one* space after the +list marker: ```````````````````````````````` example -10) foo - - bar + indented code + +paragraph + + more code . -
        -
      1. foo -
          -
        • bar
        • -
        +
        indented code
        +
        +

        paragraph

        +
        more code
        +
        +```````````````````````````````` + + +```````````````````````````````` example +1. indented code + + paragraph + + more code +. +
          +
        1. +
          indented code
          +
          +

          paragraph

          +
          more code
          +
        ```````````````````````````````` -Three is not enough: +Note that an additional space indent is interpreted as space +inside the code block: ```````````````````````````````` example -10) foo - - bar +1. indented code + + paragraph + + more code . -
          -
        1. foo
        2. +
            +
          1. +
             indented code
            +
            +

            paragraph

            +
            more code
            +
            +
          -
            -
          • bar
          • -
          ```````````````````````````````` -A list may be the first block in a list item: +Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a +[non-whitespace character], and (b) cases in which +they begin with an indented code +block. In a case like the following, where the first block begins with +a three-space indent, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker: ```````````````````````````````` example -- - foo + foo + +bar +. +

          foo

          +

          bar

          +```````````````````````````````` + + +```````````````````````````````` example +- foo + + bar . -
            -
            • foo
            -
          • -
          +

          bar

          ```````````````````````````````` +This is not a significant restriction, because when a block begins +with 1-3 spaces indent, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case: + ```````````````````````````````` example -1. - 2. foo +- foo + + bar . -
            -
            • -
                -
              1. foo
              2. -
              +

              foo

              +

              bar

            -
          1. -
          ```````````````````````````````` -A list item can contain a heading: +3. **Item starting with a blank line.** If a sequence of lines *Ls* + starting with a single [blank line] constitute a (possibly empty) + sequence of blocks *Bs*, not separated from each other by more than + one blank line, and *M* is a list marker of width *W*, + then the result of prepending *M* to the first line of *Ls*, and + indenting subsequent lines of *Ls* by *W + 1* spaces, is a list + item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. + +Here are some list items that start with a blank line but are not empty: ```````````````````````````````` example -- # Foo -- Bar - --- - baz +- + foo +- + ``` + bar + ``` +- + baz .
            +
          • foo
          • -

            Foo

            +
            bar
            +
          • -

            Bar

            -baz
          • +
            baz
            +
            +
          ```````````````````````````````` +When the list item starts with a blank line, the number of spaces +following the list marker doesn't change the required indentation: -### Motivation +```````````````````````````````` example +- + foo +. +
            +
          • foo
          • +
          +```````````````````````````````` -John Gruber's Markdown spec says the following about list items: -1. "List markers typically start at the left margin, but may be indented - by up to three spaces. List markers must be followed by one or more - spaces or a tab." +A list item can begin with at most one blank line. +In the following example, `foo` is not part of the list +item: -2. "To make lists look nice, you can wrap items with hanging indents.... - But if you don't want to, you don't have to." +```````````````````````````````` example +- -3. "List items may consist of multiple paragraphs. Each subsequent - paragraph in a list item must be indented by either 4 spaces or one - tab." + foo +. +
            +
          • +
          +

          foo

          +```````````````````````````````` -4. "It looks nice if you indent every line of the subsequent paragraphs, - but here again, Markdown will allow you to be lazy." -5. "To put a blockquote within a list item, the blockquote's `>` - delimiters need to be indented." - -6. "To put a code block within a list item, the code block needs to be - indented twice — 8 spaces or two tabs." - -These rules specify that a paragraph under a list item must be indented -four spaces (presumably, from the left margin, rather than the start of -the list marker, but this is not said), and that code under a list item -must be indented eight spaces instead of the usual four. They also say -that a block quote must be indented, but not by how much; however, the -example given has four spaces indentation. Although nothing is said -about other kinds of block-level content, it is certainly reasonable to -infer that *all* block elements under a list item, including other -lists, must be indented four spaces. This principle has been called the -*four-space rule*. - -The four-space rule is clear and principled, and if the reference -implementation `Markdown.pl` had followed it, it probably would have -become the standard. However, `Markdown.pl` allowed paragraphs and -sublists to start with only two spaces indentation, at least on the -outer level. Worse, its behavior was inconsistent: a sublist of an -outer-level list needed two spaces indentation, but a sublist of this -sublist needed three spaces. It is not surprising, then, that different -implementations of Markdown have developed very different rules for -determining what comes under a list item. (Pandoc and python-Markdown, -for example, stuck with Gruber's syntax description and the four-space -rule, while discount, redcarpet, marked, PHP Markdown, and others -followed `Markdown.pl`'s behavior more closely.) 
- -Unfortunately, given the divergences between implementations, there -is no way to give a spec for list items that will be guaranteed not -to break any existing documents. However, the spec given here should -correctly handle lists formatted with either the four-space rule or -the more forgiving `Markdown.pl` behavior, provided they are laid out -in a way that is natural for a human to read. - -The strategy here is to let the width and indentation of the list marker -determine the indentation necessary for blocks to fall under the list -item, rather than having a fixed and arbitrary number. The writer can -think of the body of the list item as a unit which gets indented to the -right enough to fit the list marker (and any indentation on the list -marker). (The laziness rule, #5, then allows continuation lines to be -unindented if needed.) - -This rule is superior, we claim, to any rule requiring a fixed level of -indentation from the margin. The four-space rule is clear but -unnatural. It is quite unintuitive that +Here is an empty bullet list item: -``` markdown +```````````````````````````````` example - foo - - bar - - - baz -``` - -should be parsed as two lists with an intervening paragraph, - -``` html +- +- bar +.
          • foo
          • +
          • +
          • bar
          -

          bar

          -
            -
          • baz
          • -
          -``` +```````````````````````````````` -as the four-space rule demands, rather than a single list, -``` html -
            -
          • -

            foo

            -

            bar

            +It does not matter whether there are spaces following the [list marker]: + +```````````````````````````````` example +- foo +- +- bar +.
              -
            • baz
            • -
            -
          • +
          • foo
          • +
          • +
          • bar
          -``` - -The choice of four spaces is arbitrary. It can be learned, but it is -not likely to be guessed, and it trips up beginners regularly. - -Would it help to adopt a two-space rule? The problem is that such -a rule, together with the rule allowing 1--3 spaces indentation of the -initial list marker, allows text that is indented *less than* the -original list marker to be included in the list item. For example, -`Markdown.pl` parses - -``` markdown - - one - - two -``` +```````````````````````````````` -as a single list item, with `two` a continuation paragraph: -``` html -
            -
          • -

            one

            -

            two

            -
          • -
          -``` +Here is an empty ordered list item: -and similarly +```````````````````````````````` example +1. foo +2. +3. bar +. +
            +
          1. foo
          2. +
          3. +
          4. bar
          5. +
          +```````````````````````````````` -``` markdown -> - one -> -> two -``` -as +A list may start or end with an empty list item: -``` html -
          +```````````````````````````````` example +* +.
            -
          • -

            one

            -

            two

            -
          • +
          -
          -``` - -This is extremely unintuitive. - -Rather than requiring a fixed indent from the margin, we could require -a fixed indent (say, two spaces, or even one space) from the list marker (which -may itself be indented). This proposal would remove the last anomaly -discussed. Unlike the spec presented above, it would count the following -as a list item with a subparagraph, even though the paragraph `bar` -is not indented as far as the first paragraph `foo`: - -``` markdown - 10. foo - - bar -``` - -Arguably this text does read like a list item with `bar` as a subparagraph, -which may count in favor of the proposal. However, on this proposal indented -code would have to be indented six spaces after the list marker. And this -would break a lot of existing Markdown, which has the pattern: +```````````````````````````````` -``` markdown -1. foo +However, an empty list item cannot interrupt a paragraph: - indented code -``` +```````````````````````````````` example +foo +* -where the code is indented eight spaces. The spec above, by contrast, will -parse this text as expected, since the code block's indentation is measured -from the beginning of `foo`. +foo +1. +. +

          foo +*

          +

          foo +1.

          +```````````````````````````````` -The one case that needs special treatment is a list item that *starts* -with indented code. How much indentation is required in that case, since -we don't have a "first paragraph" to measure from? Rule #2 simply stipulates -that in such cases, we require one space indentation from the list marker -(and then the normal four spaces for the indented code). This will match the -four-space rule in cases where the list marker plus its initial indentation -takes four spaces (a common case), but diverge in other cases. -## Lists +4. **Indentation.** If a sequence of lines *Ls* constitutes a list item + according to rule #1, #2, or #3, then the result of indenting each line + of *Ls* by 1-3 spaces (the same for each line) also constitutes a + list item with the same contents and attributes. If a line is + empty, then it need not be indented. -A [list](@) is a sequence of one or more -list items [of the same type]. The list items -may be separated by any number of blank lines. +Indented one space: -Two list items are [of the same type](@) -if they begin with a [list marker] of the same type. -Two list markers are of the -same type if (a) they are bullet list markers using the same character -(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same -delimiter (either `.` or `)`). +```````````````````````````````` example + 1. A paragraph + with two lines. -A list is an [ordered list](@) -if its constituent list items begin with -[ordered list markers], and a -[bullet list](@) if its constituent list -items begin with [bullet list markers]. + indented code -The [start number](@) -of an [ordered list] is determined by the list number of -its initial list item. The numbers of subsequent list items are -disregarded. + > A block quote. +. +
            +
          1. +

            A paragraph +with two lines.

            +
            indented code
            +
            +
            +

            A block quote.

            +
            +
          2. +
          +```````````````````````````````` -A list is [loose](@) if any of its constituent -list items are separated by blank lines, or if any of its constituent -list items directly contain two block-level elements with a blank line -between them. Otherwise a list is [tight](@). -(The difference in HTML output is that paragraphs in a loose list are -wrapped in `

          ` tags, while paragraphs in a tight list are not.) -Changing the bullet or ordered list delimiter starts a new list: +Indented two spaces: ```````````````````````````````` example -- foo -- bar -+ baz -. -

            -
          • foo
          • -
          • bar
          • -
          -
            -
          • baz
          • -
          -```````````````````````````````` + 1. A paragraph + with two lines. + indented code -```````````````````````````````` example -1. foo -2. bar -3) baz + > A block quote. .
            -
          1. foo
          2. -
          3. bar
          4. -
          -
            -
          1. baz
          2. +
          3. +

            A paragraph +with two lines.

            +
            indented code
            +
            +
            +

            A block quote.

            +
            +
          ```````````````````````````````` -In CommonMark, a list can interrupt a paragraph. That is, -no blank line is needed to separate a paragraph from a following -list: +Indented three spaces: ```````````````````````````````` example -Foo -- bar -- baz -. -

          Foo

          -
            -
          • bar
          • -
          • baz
          • -
          -```````````````````````````````` - -`Markdown.pl` does not allow this, through fear of triggering a list -via a numeral in a hard-wrapped line: + 1. A paragraph + with two lines. -``` markdown -The number of windows in my house is -14. The number of doors is 6. -``` + indented code -Oddly, though, `Markdown.pl` *does* allow a blockquote to -interrupt a paragraph, even though the same considerations might -apply. + > A block quote. +. +
            +
          1. +

            A paragraph +with two lines.

            +
            indented code
            +
            +
            +

            A block quote.

            +
            +
          2. +
          +```````````````````````````````` -In CommonMark, we do allow lists to interrupt paragraphs, for -two reasons. First, it is natural and not uncommon for people -to start lists without blank lines: -``` markdown -I need to buy -- new shoes -- a coat -- a plane ticket -``` +Four spaces indent gives a code block: -Second, we are attracted to a +```````````````````````````````` example + 1. A paragraph + with two lines. -> [principle of uniformity](@): -> if a chunk of text has a certain -> meaning, it will continue to have the same meaning when put into a -> container block (such as a list item or blockquote). + indented code -(Indeed, the spec for [list items] and [block quotes] presupposes -this principle.) This principle implies that if + > A block quote. +. +
          1.  A paragraph
          +    with two lines.
           
          -``` markdown
          -  * I need to buy
          -    - new shoes
          -    - a coat
          -    - a plane ticket
          -```
          +        indented code
           
          -is a list item containing a paragraph followed by a nested sublist,
          -as all Markdown implementations agree it is (though the paragraph
          -may be rendered without `<p>` tags, since the list is "tight"), -then + > A block quote. +

          +```````````````````````````````` -``` markdown -I need to buy -- new shoes -- a coat -- a plane ticket -``` -by itself should be a paragraph followed by a nested sublist. -Since it is well established Markdown practice to allow lists to -interrupt paragraphs inside list items, the [principle of -uniformity] requires us to allow this outside list items as -well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) -takes a different approach, requiring blank lines before lists -even inside other list items.) +5. **Laziness.** If a string of lines *Ls* constitute a [list + item](#list-items) with contents *Bs*, then the result of deleting + some or all of the indentation from one or more lines in which the + next [non-whitespace character] after the indentation is + [paragraph continuation text] is a + list item with the same contents and attributes. The unindented + lines are called + [lazy continuation line](@)s. -In order to solve of unwanted lists in paragraphs with -hard-wrapped numerals, we allow only lists starting with `1` to -interrupt paragraphs. Thus, +Here is an example with [lazy continuation lines]: ```````````````````````````````` example -The number of windows in my house is -14. The number of doors is 6. -. -

          The number of windows in my house is -14. The number of doors is 6.

          -```````````````````````````````` + 1. A paragraph +with two lines. -We may still get an unintended result in cases like + indented code -```````````````````````````````` example -The number of windows in my house is -1. The number of doors is 6. + > A block quote. . -

          The number of windows in my house is

            -
          1. The number of doors is 6.
          2. +
          3. +

            A paragraph +with two lines.

            +
            indented code
            +
            +
            +

            A block quote.

            +
            +
          ```````````````````````````````` -but this rule should prevent most spurious list captures. -There can be any number of blank lines between items: +Indentation can be partially deleted: ```````````````````````````````` example -- foo + 1. A paragraph + with two lines. +. +
            +
          1. A paragraph +with two lines.
          2. +
          +```````````````````````````````` -- bar +These examples show how laziness can work in nested structures: -- baz +```````````````````````````````` example +> 1. > Blockquote +continued here. . -
            -
          • -

            foo

            -
          • +
            +
            1. -

              bar

              +
              +

              Blockquote +continued here.

              +
            2. +
            +
            +```````````````````````````````` + + +```````````````````````````````` example +> 1. > Blockquote +> continued here. +. +
            +
            1. -

              baz

              +
              +

              Blockquote +continued here.

              +
            2. -
          +
        + ```````````````````````````````` + + +6. **That's all.** Nothing that is not counted as a list item by rules + #1--5 counts as a [list item](#list-items). + +The rules for sublists follow from the general rules +[above][List items]. A sublist must be indented the same number +of spaces a paragraph would need to be in order to be included +in the list item. + +So, in this case we need two spaces indent: + ```````````````````````````````` example - foo - bar - baz - - - bim + - boo .
        • foo
          • bar
              -
            • -

              baz

              -

              bim

              +
            • baz +
                +
              • boo
              • +
          • @@ -5072,778 +4914,938 @@ There can be any number of blank lines between items: ```````````````````````````````` -To separate consecutive lists of the same type, or to separate a -list from an indented code block that would otherwise be parsed -as a subparagraph of the final list item, you can insert a blank HTML -comment: +One is not enough: ```````````````````````````````` example - foo -- bar - -<!-- --> - -- baz -- bim + - bar + - baz + - boo .
            • foo
            • bar
            • -
            - -
            • baz
            • -
            • bim
            • -
            -```````````````````````````````` - - -```````````````````````````````` example -- foo - - notcode - -- foo - -<!-- --> - - code -. -
              -
            • -

              foo

              -

              notcode

              -
            • -
            • -

              foo

              -
            • -
            - -
            code
            -
            -```````````````````````````````` - - -List items need not be indented to the same level. The following -list items will be treated as items at the same list level, -since none is indented enough to belong to the previous list -item: - -```````````````````````````````` example -- a - - b - - c - - d - - e - - f -- g -. -
              -
            • a
            • -
            • b
            • -
            • c
            • -
            • d
            • -
            • e
            • -
            • f
            • -
            • g
            • +
            • boo
            ```````````````````````````````` -```````````````````````````````` example -1. a - - 2. b - - 3. c -. -
              -
            1. -

              a

              -
            2. -
            3. -

              b

              -
            4. -
            5. -

              c

              -
            6. -
            -```````````````````````````````` - -Note, however, that list items may not be indented more than -three spaces. Here `- e` is treated as a paragraph continuation -line, because it is indented more than three spaces: +Here we need four, because the list marker is wider: ```````````````````````````````` example -- a - - b - - c - - d - - e +10) foo + - bar . +
              +
            1. foo
                -
              • a
              • -
              • b
              • -
              • c
              • -
              • d -- e
              • +
              • bar
              -```````````````````````````````` - -And here, `3. c` is treated as in indented code block, -because it is indented four spaces and preceded by a -blank line. - -```````````````````````````````` example -1. a - - 2. b - - 3. c -. -
                -
              1. -

                a

                -
              2. -
              3. -

                b

              -
              3. c
              -
              ```````````````````````````````` -This is a loose list, because there is a blank line between -two of the list items: +Three is not enough: ```````````````````````````````` example -- a -- b - -- c +10) foo + - bar . -
                -
              • -

                a

                -
              • -
              • -

                b

                -
              • -
              • -

                c

                -
              • +
                  +
                1. foo
                2. +
                +
                  +
                • bar
                ```````````````````````````````` -So is this, with a empty second item: +A list may be the first block in a list item: ```````````````````````````````` example -* a -* - -* c +- - foo .
                • -

                  a

                  -
                • -
                • -
                • -

                  c

                  +
                    +
                  • foo
                  • +
                ```````````````````````````````` -These are loose lists, even though there is no space between the items, -because one of the items directly contains two block-level elements -with a blank line between them: - ```````````````````````````````` example -- a -- b - - c -- d +1. - 2. foo . -
                  -
                • -

                  a

                  -
                • +
                  1. -

                    b

                    -

                    c

                    -
                  2. +
                    • -

                      d

                      +
                        +
                      1. foo
                      2. +
                    + +
                  ```````````````````````````````` -```````````````````````````````` example -- a -- b +A list item can contain a heading: - [ref]: /url -- d +```````````````````````````````` example +- # Foo +- Bar + --- + baz .
                  • -

                    a

                    -
                  • -
                  • -

                    b

                    +

                    Foo

                  • -

                    d

                    -
                  • +

                    Bar

                    +baz
                  ```````````````````````````````` -This is a tight list, because the blank lines are in a code block: +### Motivation -```````````````````````````````` example -- a -- ``` - b +John Gruber's Markdown spec says the following about list items: +1. "List markers typically start at the left margin, but may be indented + by up to three spaces. List markers must be followed by one or more + spaces or a tab." - ``` -- c -. +2. "To make lists look nice, you can wrap items with hanging indents.... + But if you don't want to, you don't have to." + +3. "List items may consist of multiple paragraphs. Each subsequent + paragraph in a list item must be indented by either 4 spaces or one + tab." + +4. "It looks nice if you indent every line of the subsequent paragraphs, + but here again, Markdown will allow you to be lazy." + +5. "To put a blockquote within a list item, the blockquote's `>` + delimiters need to be indented." + +6. "To put a code block within a list item, the code block needs to be + indented twice — 8 spaces or two tabs." + +These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that *all* block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +*four-space rule*. + +The four-space rule is clear and principled, and if the reference +implementation `Markdown.pl` had followed it, it probably would have +become the standard. 
However, `Markdown.pl` allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. It is not surprising, then, that different +implementations of Markdown have developed very different rules for +determining what comes under a list item. (Pandoc and python-Markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP Markdown, and others +followed `Markdown.pl`'s behavior more closely.) + +Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving `Markdown.pl` behavior, provided they are laid out +in a way that is natural for a human to read. + +The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #5, then allows continuation lines to be +unindented if needed.) + +This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that + +``` markdown +- foo + + bar + + - baz +``` + +should be parsed as two lists with an intervening paragraph, + +``` html
                    -
                  • a
                  • -
                  • -
                    b
                    +
                  • foo
                  • +
                  +

                  bar

                  +
                    +
                  • baz
                  • +
                  +``` +as the four-space rule demands, rather than a single list, -
+``` html +
    +
  • +

    foo

    +

    bar

    +
      +
    • baz
    • +
  • -
  • c
-```````````````````````````````` +``` +The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly. -This is a tight list, because the blank line is between two -paragraphs of a sublist. So the sublist is loose while -the outer list is tight: +Would it help to adopt a two-space rule? The problem is that such +a rule, together with the rule allowing 1--3 spaces indentation of the +initial list marker, allows text that is indented *less than* the +original list marker to be included in the list item. For example, +`Markdown.pl` parses -```````````````````````````````` example -- a - - b +``` markdown + - one - c -- d -. -
    -
  • a + two +``` + +as a single list item, with `two` a continuation paragraph: + +``` html
    • -

      b

      -

      c

      +

      one

      +

      two

    +``` + +and similarly + +``` markdown +> - one +> +> two +``` + +as + +``` html +
    +
      +
    • +

      one

      +

      two

    • -
    • d
    -```````````````````````````````` +
    +``` + +This is extremely unintuitive. + +Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph `bar` +is not indented as far as the first paragraph `foo`: + +``` markdown + 10. foo + + bar +``` + +Arguably this text does read like a list item with `bar` as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing Markdown, which has the pattern: + +``` markdown +1. foo + + indented code +``` + +where the code is indented eight spaces. The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of `foo`. +The one case that needs special treatment is a list item that *starts* +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases. -This is a tight list, because the blank line is inside the -block quote: +## Lists -```````````````````````````````` example -* a - > b - > -* c -. -
      -
    • a -
      -

      b

      -
      -
    • -
    • c
    • -
    -```````````````````````````````` +A [list](@) is a sequence of one or more +list items [of the same type]. The list items +may be separated by any number of blank lines. +Two list items are [of the same type](@) +if they begin with a [list marker] of the same type. +Two list markers are of the +same type if (a) they are bullet list markers using the same character +(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same +delimiter (either `.` or `)`). -This list is tight, because the consecutive block elements -are not separated by blank lines: +A list is an [ordered list](@) +if its constituent list items begin with +[ordered list markers], and a +[bullet list](@) if its constituent list +items begin with [bullet list markers]. -```````````````````````````````` example -- a - > b - ``` - c - ``` -- d -. -
      -
    • a -
      -

      b

      -
      -
      c
      -
      -
    • -
    • d
    • -
    -```````````````````````````````` +The [start number](@) +of an [ordered list] is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded. +A list is [loose](@) if any of its constituent +list items are separated by blank lines, or if any of its constituent +list items directly contain two block-level elements with a blank line +between them. Otherwise a list is [tight](@). +(The difference in HTML output is that paragraphs in a loose list are +wrapped in `<p>` tags, while paragraphs in a tight list are not.) -A single-paragraph list is tight: +Changing the bullet or ordered list delimiter starts a new list: ```````````````````````````````` example -- a +- foo +- bar ++ baz .

      -
    • a
    • +
    • foo
    • +
    • bar
    -```````````````````````````````` - - -```````````````````````````````` example -- a - - b -. -
      -
    • a
        -
      • b
      • -
      -
    • +
    • baz
    ```````````````````````````````` -This list is loose, because of the blank line between the -two block elements in the list item: - ```````````````````````````````` example -1. ``` - foo - ``` - - bar +1. foo +2. bar +3) baz .
      -
    1. -
      foo
      -
      -

      bar

      -
    2. +
    3. foo
    4. +
    5. bar
    6. +
    +
      +
    1. baz
    ```````````````````````````````` -Here the outer list is loose, the inner list tight: +In CommonMark, a list can interrupt a paragraph. That is, +no blank line is needed to separate a paragraph from a following +list: ```````````````````````````````` example -* foo - * bar - - baz +Foo +- bar +- baz . -
      -
    • -

      foo

      +

      Foo

      • bar
      • -
      -

      baz

      -
    • +
    • baz
    ```````````````````````````````` +`Markdown.pl` does not allow this, through fear of triggering a list +via a numeral in a hard-wrapped line: -```````````````````````````````` example -- a - - b - - c +``` markdown +The number of windows in my house is +14. The number of doors is 6. +``` -- d - - e - - f -. -
      -
    • -

      a

      -
        -
      • b
      • -
      • c
      • -
      -
    • -
    • -

      d

      -
        -
      • e
      • -
      • f
      • -
      -
    • -
    -```````````````````````````````` +Oddly, though, `Markdown.pl` *does* allow a blockquote to +interrupt a paragraph, even though the same considerations might +apply. +In CommonMark, we do allow lists to interrupt paragraphs, for +two reasons. First, it is natural and not uncommon for people +to start lists without blank lines: -# Inlines +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` -Inlines are parsed sequentially from the beginning of the character -stream to the end (left to right, in left-to-right languages). -Thus, for example, in +Second, we are attracted to a -```````````````````````````````` example -`hi`lo` -. -

    hilo`

    -```````````````````````````````` +> [principle of uniformity](@): +> if a chunk of text has a certain +> meaning, it will continue to have the same meaning when put into a +> container block (such as a list item or blockquote). -`hi` is parsed as code, leaving the backtick at the end as a literal -backtick. +(Indeed, the spec for [list items] and [block quotes] presupposes +this principle.) This principle implies that if +``` markdown + * I need to buy + - new shoes + - a coat + - a plane ticket +``` -## Backslash escapes +is a list item containing a paragraph followed by a nested sublist, +as all Markdown implementations agree it is (though the paragraph +may be rendered without `<p>` tags, since the list is "tight"), +then -Any ASCII punctuation character may be backslash-escaped: +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` -```````````````````````````````` example -\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ -. -

    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    -```````````````````````````````` +by itself should be a paragraph followed by a nested sublist. +Since it is well established Markdown practice to allow lists to +interrupt paragraphs inside list items, the [principle of +uniformity] requires us to allow this outside list items as +well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) +takes a different approach, requiring blank lines before lists +even inside other list items.) -Backslashes before other characters are treated as literal -backslashes: +In order to solve of unwanted lists in paragraphs with +hard-wrapped numerals, we allow only lists starting with `1` to +interrupt paragraphs. Thus, ```````````````````````````````` example -\→\A\a\ \3\φ\« +The number of windows in my house is +14. The number of doors is 6. . -

    \→\A\a\ \3\φ\«

    +

    The number of windows in my house is +14. The number of doors is 6.

    ```````````````````````````````` - -Escaped characters are treated as regular characters and do -not have their usual Markdown meanings: +We may still get an unintended result in cases like ```````````````````````````````` example -\*not emphasized* -\<br/> not a tag -\[not a link](/foo) -\`not code` -1\. not a list -\* not a list -\# not a heading -\[foo]: /url "not a reference" -\&ouml; not a character entity +The number of windows in my house is +1. The number of doors is 6. . -

    *not emphasized* -<br/> not a tag -[not a link](/foo) -`not code` -1. not a list -* not a list -# not a heading -[foo]: /url "not a reference" -&ouml; not a character entity

    +

    The number of windows in my house is

    +
      +
    1. The number of doors is 6.
    2. +
    ```````````````````````````````` +but this rule should prevent most spurious list captures. -If a backslash is itself escaped, the following character is not: +There can be any number of blank lines between items: ```````````````````````````````` example -\\*emphasis* -. -

    \emphasis

    -```````````````````````````````` +- foo +- bar -A backslash at the end of the line is a [hard line break]: -```````````````````````````````` example -foo\ -bar +- baz . -

    foo
    -bar

    +
      +
    • +

      foo

      +
    • +
    • +

      bar

      +
    • +
    • +

      baz

      +
    • +
    ```````````````````````````````` +```````````````````````````````` example +- foo + - bar + - baz -Backslash escapes do not work in code blocks, code spans, autolinks, or -raw HTML: -```````````````````````````````` example -`` \[\` `` + bim . -

    \[\`

    +
      +
    • foo +
        +
      • bar +
          +
        • +

          baz

          +

          bim

          +
        • +
        +
      • +
      +
    • +
    ```````````````````````````````` +To separate consecutive lists of the same type, or to separate a +list from an indented code block that would otherwise be parsed +as a subparagraph of the final list item, you can insert a blank HTML +comment: + ```````````````````````````````` example - \[\] -. -
    \[\]
    -
    -```````````````````````````````` +- foo +- bar +<!-- --> -```````````````````````````````` example -~~~ -\[\] -~~~ +- baz +- bim . -
    \[\]
    -
    +
      +
    • foo
    • +
    • bar
    • +
    + +
      +
    • baz
    • +
    • bim
    • +
    ```````````````````````````````` ```````````````````````````````` example - -. -

    http://example.com?find=\*

    -```````````````````````````````` +- foo + notcode -```````````````````````````````` example - +- foo + +<!-- --> + + code . - +
      +
    • +

      foo

      +

      notcode

      +
    • +
    • +

      foo

      +
    • +
    + +
    code
    +
    ```````````````````````````````` -But they work in all other contexts, including URLs and link titles, -link references, and [info strings] in [fenced code blocks]: +List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item: ```````````````````````````````` example -[foo](/bar\* "ti\*tle") +- a + - b + - c + - d + - e + - f +- g . -

    foo

    +
      +
    • a
    • +
    • b
    • +
    • c
    • +
    • d
    • +
    • e
    • +
    • f
    • +
    • g
    • +
    ```````````````````````````````` ```````````````````````````````` example -[foo] +1. a -[foo]: /bar\* "ti\*tle" + 2. b + + 3. c . -

    foo

    +
      +
    1. +

      a

      +
    2. +
    3. +

      b

      +
    4. +
    5. +

      c

      +
    6. +
    ```````````````````````````````` +Note, however, that list items may not be indented more than +three spaces. Here `- e` is treated as a paragraph continuation +line, because it is indented more than three spaces: ```````````````````````````````` example -``` foo\+bar -foo -``` +- a + - b + - c + - d + - e . -
    foo
    -
    +
      +
    • a
    • +
    • b
    • +
    • c
    • +
    • d +- e
    • +
    ```````````````````````````````` +And here, `3. c` is treated as in indented code block, +because it is indented four spaces and preceded by a +blank line. +```````````````````````````````` example +1. a -## Entity and numeric character references - -Valid HTML entity references and numeric character references -can be used in place of the corresponding Unicode character, -with the following exceptions: - -- Entity and character references are not recognized in code - blocks and code spans. + 2. b -- Entity and character references cannot stand in place of - special characters that define structural elements in - CommonMark. For example, although `&#42;` can be used - in place of a literal `*` character, `&#42;` cannot replace - `*` in emphasis delimiters, bullet list markers, or thematic - breaks. + 3. c +. +
      +
    1. +

      a

      +
    2. +
    3. +

      b

      +
    4. +
    +
    3. c
    +
    +```````````````````````````````` -Conforming CommonMark parsers need not store information about -whether a particular character was represented in the source -using a Unicode character or an entity reference. -[Entity references](@) consist of `&` + any of the valid -HTML5 entity names + `;`. The -document -is used as an authoritative source for the valid entity -references and their corresponding code points. +This is a loose list, because there is a blank line between +two of the list items: ```````````````````````````````` example -  & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸ +- a +- b + +- c . -

      & © Æ Ď -¾ ℋ ⅆ -∲ ≧̸

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +
    • +
    • +

      c

      +
    • +
    ```````````````````````````````` -[Decimal numeric character -references](@) -consist of `&#` + a string of 1--7 arabic digits + `;`. A -numeric character reference is parsed as the corresponding -Unicode character. Invalid Unicode code points will be replaced by -the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons, -the code point `U+0000` will also be replaced by `U+FFFD`. +So is this, with a empty second item: ```````````````````````````````` example -# Ӓ Ϡ � +* a +* + +* c . -

    # Ӓ Ϡ �

    +
      +
    • +

      a

      +
    • +
    • +
    • +

      c

      +
    • +
    ```````````````````````````````` -[Hexadecimal numeric character -references](@) consist of `&#` + -either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. -They too are parsed as the corresponding Unicode character (this -time specified with a hexadecimal numeral instead of decimal). +These are loose lists, even though there is no space between the items, +because one of the items directly contains two block-level elements +with a blank line between them: ```````````````````````````````` example -" ആ ಫ +- a +- b + + c +- d . -

    " ആ ಫ

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +

      c

      +
    • +
    • +

      d

      +
    • +
    ```````````````````````````````` -Here are some nonentities: - ```````````````````````````````` example -&nbsp &x; &#; &#x; -&#87654321; -&#abcdef0; -&ThisIsNotDefined; &hi?; +- a +- b + + [ref]: /url +- d . -

    &nbsp &x; &#; &#x; -&#87654321; -&#abcdef0; -&ThisIsNotDefined; &hi?;

    +
      +
    • +

      a

      +
    • +
    • +

      b

      +
    • +
    • +

      d

      +
    • +
    ```````````````````````````````` -Although HTML5 does accept some entity references -without a trailing semicolon (such as `©`), these are not -recognized here, because it makes the grammar too ambiguous: +This is a tight list, because the blank lines are in a code block: ```````````````````````````````` example -© +- a +- ``` + b + + + ``` +- c . -

    &copy

    +
      +
    • a
    • +
    • +
      b
      +
      +
      +
      +
    • +
    • c
    • +
    ```````````````````````````````` -Strings that are not on the list of HTML5 named entities are not -recognized as entity references either: +This is a tight list, because the blank line is between two +paragraphs of a sublist. So the sublist is loose while +the outer list is tight: ```````````````````````````````` example -&MadeUpEntity; +- a + - b + + c +- d . -

    &MadeUpEntity;

    +
      +
    • a +
        +
      • +

        b

        +

        c

        +
      • +
      +
    • +
    • d
    • +
    ```````````````````````````````` -Entity and numeric character references are recognized in any -context besides code spans or code blocks, including -URLs, [link titles], and [fenced code block][] [info strings]: +This is a tight list, because the blank line is inside the +block quote: ```````````````````````````````` example - +* a + > b + > +* c . - +
      +
    • a +
      +

      b

      +
      +
    • +
    • c
    • +
    ```````````````````````````````` +This list is tight, because the consecutive block elements +are not separated by blank lines: + ```````````````````````````````` example -[foo](/föö "föö") +- a + > b + ``` + c + ``` +- d . -

    foo

    +
      +
    • a +
      +

      b

      +
      +
      c
      +
      +
    • +
    • d
    • +
    ```````````````````````````````` -```````````````````````````````` example -[foo] +A single-paragraph list is tight: -[foo]: /föö "föö" +```````````````````````````````` example +- a . -

    foo

    +
      +
    • a
    • +
    ```````````````````````````````` ```````````````````````````````` example -``` föö -foo -``` +- a + - b . -
    foo
    -
    +
      +
    • a +
        +
      • b
      • +
      +
    • +
    ```````````````````````````````` -Entity and numeric character references are treated as literal -text in code spans and code blocks: +This list is loose, because of the blank line between the +two block elements in the list item: ```````````````````````````````` example -`föö` -. -

    f&ouml;&ouml;

    -```````````````````````````````` - +1. ``` + foo + ``` -```````````````````````````````` example - föfö + bar . -
    f&ouml;f&ouml;
    +
      +
    1. +
      foo
       
      +

      bar

      +
    2. +
    ```````````````````````````````` -Entity and numeric character references cannot be used -in place of symbols indicating structure in CommonMark -documents. +Here the outer list is loose, the inner list tight: ```````````````````````````````` example -*foo* -*foo* +* foo + * bar + + baz . -

    *foo* -foo

    +
      +
    • +

      foo

      +
        +
      • bar
      • +
      +

      baz

      +
    • +
    ```````````````````````````````` + ```````````````````````````````` example -* foo +- a + - b + - c -* foo +- d + - e + - f . -

    * foo

      -
    • foo
    • +
    • +

      a

      +
        +
      • b
      • +
      • c
      • +
      +
    • +
    • +

      d

      +
        +
      • e
      • +
      • f
      • +
      +
    ```````````````````````````````` -```````````````````````````````` example -foo bar -. -

    foo -bar

    -```````````````````````````````` +# Inlines + +Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in ```````````````````````````````` example - foo +`hi`lo` . -

    →foo

    +

    hilo`

    ```````````````````````````````` +`hi` is parsed as code, leaving the backtick at the end as a literal +backtick. -```````````````````````````````` example -[a](url "tit") -. -

    [a](url "tit")

    -```````````````````````````````` ## Code spans