diff options
Diffstat (limited to 'spec.txt')
-rw-r--r-- | spec.txt | 6044 |
1 files changed, 6044 insertions, 0 deletions
diff --git a/spec.txt b/spec.txt new file mode 100644 index 0000000..96721e6 --- /dev/null +++ b/spec.txt @@ -0,0 +1,6044 @@ +--- +title: Standard Markdown Spec +author: +- John MacFarlane +version: 1 +date: 2014-07-21 +... + +# Introduction + +## What is markdown? + +Markdown is a plain text format for writing structured documents, +based on conventions used for indicating formatting in email and +usenet posts. It was developed in 2004 by John Gruber, who wrote +the first markdown-to-HTML converter in perl, and it soon became +widely used in websites. By 2014 there were dozens of +implementations in many languages. Some of them extended basic +markdown syntax with conventions for footnotes, definition lists, +tables, and other constructs, and some allowed output not just in +HTML but in LaTeX and many other formats. + +## Why is a spec needed? + +John Gruber's [canonical description of markdown's +syntax](http://daringfireball.net/projects/markdown/syntax) +does not specify the syntax unambiguously. Here are some examples of +questions it does not answer: + +1. How much indentation is needed for a sublist? The spec says that + continuation paragraphs need to be indented four spaces, but is + not fully explicit about sublists. It is natural to think that + they, too, must be indented four spaces, but `Markdown.pl` does + not require that. This is hardly a "corner case," and divergences + between implementations on this issue often lead to surprises for + users in real documents. (See [this comment by John + Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) + +2. Is a blank line needed before a block quote or header? + Most implementations do not require the blank line. However, + this can lead to unexpected results in hard-wrapped text, and + also to ambiguities in parsing (note that some implementations + put the header inside the blockquote, while others do not). + (John Gruber has also spoken [in favor of requiring the blank + lines](http://article.gmane.org/gmane.text.markdown.general/2146).) + +3. What is the exact rule for determining when list items get + wrapped in `<p>` tags? Can a list be partially "loose" and partially + "tight"? What should we do with a list like this? + + ``` markdown + 1. one + + 2. two + 3. three + ``` + + Or this? + + ``` markdown + 1. one + + - a + + - b + 2. two + ``` + + (There are some relevant comments by John Gruber + [here](http://article.gmane.org/gmane.text.markdown.general/2554).) + +4. When list markers change from bullets to numbers, should we have + two lists or one? + + ``` markdown + 1. fee + 2. fie + - foe + - fum + ``` + +5. What are the precedence rules for the markers of inline structure? + For example, is the following a valid link, or does the code span + take precedence ? + + ``` markdown + [foo `](bar)` + ``` + +6. What are the precedence rules for markers of emphasis and strong + emphasis? For example, how should the following be parsed? + + ``` markdown + *foo *bar** baz* + ``` + +7. Can list items include headers? + + ``` markdown + - # Heading + ``` + +8. Can link references be defined inside block quotes or list items? + + ``` markdown + > Blockquote [foo]. + > + > [foo]: /url + ``` + +In the absence of a spec, early implementers consulted `Markdown.pl` +to resolve these ambiguities. But `Markdown.pl` was quite buggy, and +gave manifestly bad results in many cases, so it was not a +satisfactory replacement for a spec. + +Because there is no unambiguous spec, implementations have diverged +considerably. As a result, users are often surprised to find that +a document that renders one way on one system (say, a github wiki) +renders differently on another (say, converting to docbook using +pandoc). To make matters worse, because nothing in markdown counts +as a "syntax error," the divergence often isn't discovered right away. + +## About this document + +This document attempts to specify markdown syntax unambiguously. +It contains many examples with side-by-side markdown and +HTML. These are intended to double as conformance tests. An +accompanying script `runtests.pl` can be used to run the tests +against any markdown program: + + perl runtests.pl PROGRAM spec.html + +Since this document describes how markdown is to be parsed into +an abstract syntax tree, it would have made sense to use an abstract +representation of the syntax tree instead of HTML. But HTML is capable +of representing the structural distinctions we need to make, and the +choice of HTML for the tests makes it possible to run the tests against +an implementation without writing an abstract syntax tree renderer. + +This document is generated from a text file, `spec.txt`, written +in markdown with a small extension for the side-by-side tests. +The script `spec2md.pl` can be used to turn `spec.txt` into pandoc +markdown, which can then be converted into other formats. + +In the examples, the `→` character is used to represent tabs. + +# Preprocessing + +A [line](#line) <a id="line"/> +is a sequence of one or more characters followed by a line +ending (CR, LF, or CRLF, depending on the platform) or by the end of +file. + +This spec does not specify an encoding; it thinks of lines as composed +of characters rather than bytes. A conforming parser may be limited +to a certain encoding. + +Tabs in lines are expanded to spaces, with a tab stop of 4 characters: + +. +foo→baz→→bim +. +<p>foo baz bim</p> +. + +. +οὐ→χρῆν +. +<p>οὐ χρῆν</p> +. + +Line endings are replaced by newline characters (LF). + +A line containing only spaces (after tab expansion) followed by +a line ending is called a [blank line](#blank-line). <a +id="blank-line"/> + +# Blocks and inlines + +We can think of a document as a sequence of [blocks](#block)<a +id="block"/>---structural elements like paragraphs, block quotations, +lists, headers, rules, and code blocks. Blocks can contain other +blocks, or they can contain [inline](#inline)<a id="inline"/> content: +words, spaces, links, emphasized text, images, and inline code. + +## Precedence + +Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span: + +. +- `one +- two` +. +<ul> +<li>`one</li> +<li>two`</li> +</ul> +. + +This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headers, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other. + +## Container blocks and leaf blocks + +We can divide blocks into two types: +[container blocks](#container-block), <a id="container-block"/> +which can contain other blocks, and [leaf blocks](#leaf-block), +<a id="leaf-block"/> which cannot. + +# Leaf blocks + +This section describes the different kinds of leaf block that make up a +markdown document. + +## Horizontal rules + +A line consisting of 0-3 spaces of indentation, followed by a sequence +of three or more matching `-`, `_`, or `*` characters, each followed +optionally any number of spaces, forms a [horizontal +rule](#horizontal-rule). <a id="horizontal-rule"/> + +. +*** +--- +___ +. +<hr /> +<hr /> +<hr /> +. + +Wrong characters: + +. ++++ +. +<p>+++</p> +. + +. +=== +. +<p>===</p> +. + +Not enough characters: + +. +-- +** +__ +. +<p>-- +** +__</p> +. + +One to three spaces indent are allowed: + +. + *** + *** + *** +. +<hr /> +<hr /> +<hr /> +. + +Four spaces is too many: + +. + *** +. +<pre><code>*** +</code></pre> +. + +. +Foo + *** +. +<p>Foo +***</p> +. + +More than three characters may be used: + +. +_____________________________________ +. +<hr /> +. + +Spaces are allowed between the characters: + +. + - - - +. +<hr /> +. + +. + ** * ** * ** * ** +. +<hr /> +. + +. +- - - - +. +<hr /> +. + +Spaces are allowed at the end: + +. +- - - - +. +<hr /> +. + +However, no other characters may occur at the end or the +beginning: + +. +_ _ _ _ a + +a------ +. +<p>_ _ _ _ a</p> +<p>a------</p> +. + +It is required that all of the non-space characters be the same. +So, this is not a horizontal rule: + +. + *-* +. +<p><em>-</em></p> +. + +Horizontal rules do not need blank lines before or after: + +. +- foo +*** +- bar +. +<ul> +<li>foo</li> +</ul> +<hr /> +<ul> +<li>bar</li> +</ul> +. + +Horizontal rules can interrupt a paragraph: + +. +Foo +*** +bar +. +<p>Foo</p> +<hr /> +<p>bar</p> +. + +Note, however, that this is a setext header, not a paragraph followed +by a horizontal rule: + +. +Foo +--- +bar +. +<h2>Foo</h2> +<p>bar</p> +. + +When both a horizontal rule and a list item are possible +interpretations of a line, the horizontal rule is preferred: + +. +* Foo +* * * +* Bar +. +<ul> +<li>Foo</li> +</ul> +<hr /> +<ul> +<li>Bar</li> +</ul> +. + +If you want a horizontal rule in a list item, use a different bullet: + +. +- Foo +- * * * +. +<ul> +<li>Foo</li> +<li><hr /></li> +</ul> +. + +## ATX headers + +An [ATX header](#atx-header) <a id="atx-header"/> +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped `#` characters and an optional +closing sequence of any number of `#` characters. The opening sequence +of `#` characters cannot be followed directly by a nonspace character. +The closing `#` characters may be followed by spaces only. The opening +`#` character may be indented 0-3 spaces. The raw contents of the +header are stripped of leading and trailing spaces before being parsed +as inline content. The header level is equal to the number of `#` +characters in the opening sequence. + +Simple headers: + +. +# foo +## foo +### foo +#### foo +##### foo +###### foo +. +<h1>foo</h1> +<h2>foo</h2> +<h3>foo</h3> +<h4>foo</h4> +<h5>foo</h5> +<h6>foo</h6> +. + +More than six `#` characters is not a header: + +. +####### foo +. +<p>####### foo</p> +. + +A space is required between the `#` characters and the header's +contents. Note that many implementations currently do not require +the space. However, the space was required by the [original ATX +implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps +prevent things like the following from being parsed as headers: + +. +#5 bolt +. +<p>#5 bolt</p> +. + +This is not a header, because the first `#` is escaped: + +. +\## foo +. +<p>## foo</p> +. + +Contents are parsed as inlines: + +. +# foo *bar* \*baz\* +. +<h1>foo <em>bar</em> *baz*</h1> +. + +Leading and trailing blanks are ignored in parsing inline content: + +. +# foo +. +<h1>foo</h1> +. + +One to three spaces indentation are allowed: + +. + ### foo + ## foo + # foo +. +<h3>foo</h3> +<h2>foo</h2> +<h1>foo</h1> +. + +Four spaces are too much: + +. + # foo +. +<pre><code># foo +</code></pre> +. + +. +foo + # bar +. +<p>foo +# bar</p> +. + +A closing sequence of `#` characters is optional: + +. +## foo ## + ### bar ### +. +<h2>foo</h2> +<h3>bar</h3> +. + +It need not be the same length as the opening sequence: + +. +# foo ################################## +##### foo ## +. +<h1>foo</h1> +<h5>foo</h5> +. + +Spaces are allowed after the closing sequence: + +. +### foo ### +. +<h3>foo</h3> +. + +A sequence of `#` characters with a nonspace character following it +is not a closing sequence, but counts as part of the contents of the +header: + +. +### foo ### b +. +<h3>foo ### b</h3> +. + +Backslash-escaped `#` characters do not count as part +of the closing sequence: + +. +### foo \### +## foo \#\## +. +<h3>foo #</h3> +<h2>foo ##</h2> +. + +ATX headers need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs: + +. +**** +## foo +**** +. +<hr /> +<h2>foo</h2> +<hr /> +. + +. +Foo bar +# baz +Bar foo +. +<p>Foo bar</p> +<h1>baz</h1> +<p>Bar foo</p> +. + +ATX headers can be empty: + +. +## +# +### ### +. +<h2></h2> +<h1></h1> +<h3></h3> +. + +## Setext headers + +A [setext header](#setext-header) <a id="setext-header"/> +consists of a line of text, containing at least one nonspace character, +with no more than 3 spaces indentation, followed by a [setext header +underline](#setext-header-underline). A [setext header +underline](#setext-header-underline) <a id="setext-header-underline"/> +is a sequence of `=` characters or a sequence of `-` characters, with no +more than 3 spaces indentation and any number of leading or trailing +spaces. The header is a level 1 header if `=` characters are used, and +a level 2 header if `-` characters are used. The contents of the header +are the result of parsing the first line as markdown inline content. + +In general, a setext header need not be preceded or followed by a +blank line. However, it cannot interrupt a paragraph, so when a +setext header comes after a paragraph, a blank line is needed between +them. + +Simple examples: + +. +Foo *bar* +========= + +Foo *bar* +--------- +. +<h1>Foo <em>bar</em></h1> +<h2>Foo <em>bar</em></h2> +. + +The underlining can be any length: + +. +Foo +------------------------- + +Foo += +. +<h2>Foo</h2> +<h1>Foo</h1> +. + +The header content can be indented up to three spaces, and need +not line up with the underlining: + +. + Foo +--- + + Foo +----- + + Foo + === +. +<h2>Foo</h2> +<h2>Foo</h2> +<h1>Foo</h1> +. + +Four spaces indent is too much: + +. + Foo + --- + + Foo +--- +. +<pre><code>Foo +--- + +Foo +</code></pre> +<hr /> +. + +The setext header underline can be indented up to three spaces, and +may have trailing spaces: + +. +Foo + ---- +. +<h2>Foo</h2> +. + +Four spaces is too much: + +. +Foo + --- +. +<p>Foo +---</p> +. + +The setext header underline cannot contain internal spaces: + +. +Foo += = + +Foo +--- - +. +<p>Foo += =</p> +<p>Foo</p> +<hr /> +. + +Trailing spaces in the content line do not cause a line break: + +. +Foo +----- +. +<h2>Foo</h2> +. + +Nor does a backslash at the end: + +. +Foo\ +---- +. +<h2>Foo\</h2> +. + +Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headers: + +. +`Foo +---- +` + +<a title="a lot +--- +of dashes"/> +. +<h2>`Foo</h2> +<p>`</p> +<h2><a title="a lot</h2> +<p>of dashes"/></p> +. + +The setext header underline cannot be a lazy line: + +. +> Foo +--- +. +<blockquote> +<p>Foo</p> +</blockquote> +<hr /> +. + +A setext header cannot interrupt a paragraph: + +. +Foo +Bar +--- + +Foo +Bar +=== +. +<p>Foo +Bar</p> +<hr /> +<p>Foo +Bar +===</p> +. + +But in general a blank line is not required before or after: + +. +--- +Foo +--- +Bar +--- +Baz +. +<hr /> +<h2>Foo</h2> +<h2>Bar</h2> +<p>Baz</p> +. + +Setext headers cannot be empty: + +. + +==== +. +<p>====</p> +. + + +## Indented code blocks + +An [indented code block](#indented-code-block) +<a id="indented-code-block"/> is composed of one or more +[indented chunks](#indented-chunk) separated by blank lines. +An [indented chunk](#indented-chunk) <a id="indented-chunk"/> +is a sequence of non-blank lines, each indented four or more +spaces. An indented code block cannot interrupt a paragraph, so +if it occurs before or after a paragraph, there must be an +intervening blank line. The contents of the code block are +the literal contents of the lines, including trailing newlines, +minus four spaces of indentation. An indented code block has no +attributes. + +. + a simple + indented code block +. +<pre><code>a simple + indented code block +</code></pre> +. + +The contents are literal text, and do not get parsed as markdown: + +. + <a/> + *hi* + + - one +. +<pre><code><a/> +*hi* + +- one +</code></pre> +. + +Here we have three chunks separated by blank lines: + +. + chunk1 + + chunk2 + + + + chunk3 +. +<pre><code>chunk1 + +chunk2 + + + +chunk3 +</code></pre> +. + +Any initial spaces beyond four will be included in the content, even +in interior blank lines: + +. + chunk1 + + chunk2 +. +<pre><code>chunk1 + + chunk2 +</code></pre> +. + +An indented code code block cannot interrupt a paragraph. (This +allows hanging indents and the like.) + +. +Foo + bar + +. +<p>Foo +bar</p> +. + +However, any non-blank line with fewer than four leading spaces ends +the code block immediately. So a paragraph may occur immediately +after indented code: + +. + foo +bar +. +<pre><code>foo +</code></pre> +<p>bar</p> +. + +And indented code can occur immediately before and after other kinds of +blocks: + +. +# Header + foo +Header +------ + foo +---- +. +<h1>Header</h1> +<pre><code>foo +</code></pre> +<h2>Header</h2> +<pre><code>foo +</code></pre> +<hr /> +. + +The first line can be indented more than four spaces: + +. + foo + bar +. +<pre><code> foo +bar +</code></pre> +. + +Blank lines preceding or following an indented code block +are not included in it: + +. + + + foo + + +. +<pre><code>foo +</code></pre> +. + +Trailing spaces are included in the code block's content: + +. + foo +. +<pre><code>foo +</code></pre> +. + + +## Fenced code blocks + +A [code fence](#code-fence) <a id="code-fence"/> is a sequence +of at least three consecutive backtick characters (`` ` ``) or +tildes (`~`). (Tildes and backticks cannot be mixed.). +A [fenced code block](#fenced-code-block) <a id="fenced-code-block"/> +begins with a code fence, indented no more than three spaces. + +The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +spaces and called the [info string](#info-string). <a +id="info-string"/> The [info string] may not contain any backtick +characters. (The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.) + +The content of the code block consists of all subsequent lines, until +a closing [code fence](#code-fence) of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +indented N spaces, then up to N spaces of indentation are removed from +each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented less than N +spaces, all of the indentation is removed.) + +The closing code fence may be indented up to three spaces, and may be +followed only by spaces, which are ignored. If the end of the +document is reached and no closing code fence has been found, the code +block contains all of the lines after the opening code fence. +(An alternative spec would require backtracking in the event +that a closing code fence is not found. But this makes parsing much +less efficient, and there seems to be no real down side to the +behavior described here.) + +A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after. + +The content of a code fence is treated as literal text, not parsed +as inlines. The first word of the info string is typically used to +specify the language of the code sample, and rendered in the `class` +attribute of the `pre` tag. However, this spec does not mandate any +particular treatment of the info string. + +Here is a simple example with backticks: + +. +``` +< + > +``` +. +<pre><code>< + > +</code></pre> +. + +With tildes: + +. +~~~ +< + > +~~~ +. +<pre><code>< + > +</code></pre> +. + +The closing code fence must use the same character as the opening +fence: + +. +``` +aaa +~~~ +``` +. +<pre><code>aaa +~~~ +</code></pre> +. + +. +~~~ +aaa +``` +~~~ +. +<pre><code>aaa +``` +</code></pre> +. + +The closing code fence must be at least as long as the opening fence: + +. +```` +aaa +``` +`````` +. +<pre><code>aaa +``` +</code></pre> +. + +. +~~~~ +aaa +~~~ +~~~~ +. +<pre><code>aaa +~~~ +</code></pre> +. + +Unclosed code blocks are closed by the end of the document: + +. +``` +. +<pre><code></code></pre> +. + +. +````` + +``` +aaa +. +<pre><code> +``` +aaa +</code></pre> +. + +A code block can have all empty lines as its content: + +. +``` + + +``` +. +<pre><code> + +</code></pre> +. + +A code block can be empty: + +. +``` +``` +. +<pre><code></code></pre> +. + +Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present: + +. + ``` + aaa +aaa +``` +. +<pre><code>aaa +aaa +</code></pre> +. + +. + ``` +aaa + aaa +aaa + ``` +. +<pre><code>aaa +aaa +aaa +</code></pre> +. + +. + ``` + aaa + aaa + aaa + ``` +. +<pre><code>aaa + aaa +aaa +</code></pre> +. + +Four spaces indentation produces an indented code block: + +. + ``` + aaa + ``` +. +<pre><code>``` +aaa +``` +</code></pre> +. + +Code fences (opening and closing) cannot contain internal spaces: + +. +``` ``` +aaa +. +<p><code></code> +aaa</p> +. + +. +~~~~~~ +aaa +~~~ ~~ +. +<pre><code>aaa +~~~ ~~ +</code></pre> +. + +Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between: + +. +foo +``` +bar +``` +baz +. +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +. + +Other blocks can also occur before and after fenced code blocks +without an intervening blank line: + +. +foo +--- +~~~ +bar +~~~ +# baz +. +<h2>foo</h2> +<pre><code>bar +</code></pre> +<h1>baz</h1> +. + +An [info string](#info-string) can be provided after the opening code fence. +Opening and closing spaces will be stripped, and the first word +is used here to populate the `class` attribute of the enclosing +`pre` tag. + +. +```ruby +def foo(x) + return 3 +end +``` +. +<pre class="ruby"><code>def foo(x) + return 3 +end +</code></pre> +. + +. +~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ +. +<pre class="ruby"><code>def foo(x) + return 3 +end +</code></pre> +. + +. +````; +```` +. +<pre class=";"><code></code></pre> +. + +Info strings for backtick code blocks cannot contain backticks: + +. +``` aa ``` +foo +. +<p><code>aa</code> +foo</p> +. + +Closing code fences cannot have info strings: + +. +``` +``` aaa +``` +. +<pre><code>``` aaa +</code></pre> +. + + +## HTML blocks + +An [HTML block tag](#html-block-tag) <a id="html-block-tag"/> is +an [open tag](#open-tag) or [closing tag](#closing-tag) whose tag +name is one of the following (case-insensitive): +`article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `body`, +`li`, `br`, `map`, `button`, `object`, `canvas`, `ol`, `caption`, +`output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`, +`section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`, +`fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`, +`footer`, `tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, +`video`, `script`, `style`. + +An [HTML block](#html-block) <a id="html-block"/> begins with an +[HTML block tag](#html-block-tag), [HTML comment](#html-comment), +[processing instruction](#processing-instruction), +[declaration](#declaration), or [CDATA section](#cdata-section). +It ends when a [blank line](#blank-line) or the end of the +input is encountered. The initial line may be indented up to three +spaces, and subsequent lines may have any indentation. The contents +of the HTML block are interpreted as raw HTML, and will not be escaped +in HTML output. + +Some simple examples: + +. +<table> + <tr> + <td> + hi + </td> + </tr> +</table> + +okay. +. +<table> + <tr> + <td> + hi + </td> + </tr> +</table> +<p>okay.</p> +. + +. + <div> + *hello* + <foo><a> +. + <div> + *hello* + <foo><a> +. + +Here we have two code blocks with a markdown paragraph between them: + +. +<DIV CLASS="foo"> + +*Markdown* + +</DIV> +. +<DIV CLASS="foo"> +<p><em>Markdown</em></p> +</DIV> +. + +In the following example, what looks like a markdown code block +is actually part of the HTML block, which continues until a blank +line or the end of the document is reached: + +. +<div></div> +``` c +int x = 33; +``` +. +<div></div> +``` c +int x = 33; +``` +. + +A comment: + +. +<!-- Foo +bar + baz --> +. +<!-- Foo +bar + baz --> +. + +A processing instruction: + +. +<?php + echo 'foo' +?> +. +<?php + echo 'foo' +?> +. + +CDATA: + +. +<![CDATA[ +function matchwo(a,b) +{ +if (a < b && a < 0) then + { + return 1; + } +else + { + return 0; + } +} +]]> +. +<![CDATA[ +function matchwo(a,b) +{ +if (a < b && a < 0) then + { + return 1; + } +else + { + return 0; + } +} +]]> +. + +The opening tag can be indented 1-3 spaces, but not 4: + +. + <!-- foo --> + + <!-- foo --> +. + <!-- foo --> +<pre><code><!-- foo --> +</code></pre> +. + +An HTML block can interrupt a paragraph, and need not be preceded +by a blank line. + +. +Foo +<div> +bar +</div> +. +<p>Foo</p> +<div> +bar +</div> +. + +However, a following blank line is always needed, except at the end of +a document: + +. +<div> +bar +</div> +*foo* +. +<div> +bar +</div> +*foo* +. + +An incomplete HTML block tag may also start an HTML block: + +. +<div class +foo +. +<div class +foo +. + +This rule differs from John Gruber's original markdown syntax +specification, which says: + +> The only restrictions are that block-level HTML elements — +> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from +> surrounding content by blank lines, and the start and end tags of the +> block should not be indented with tabs or spaces. + +In some ways Gruber's rule is more restrictive than the one given +here: + +- It requires that an HTML block be preceded by a blank line. +- It does not allow the start tag to be indented. +- It requires a matching end tag, which it also does not allow to + be indented. + +Indeed, most markdown implementations, including some of Gruber's +own perl implementations, do not impose these restrictions. + +There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including markdown content inside HTML tags: +simply separate the markdown from the HTML using blank lines: + +. +<div> + +*Emphasized* text. + +</div> +. +<div> +<p><em>Emphasized</em> text.</p> +</div> +. + +Compare: + +. +<div> +*Emphasized* text. +</div> +. +<div> +*Emphasized* text. +</div> +. + +Some markdown implementations have adopted a convention of +interpreting content inside tags as text if the open tag has +the attribute `markdown=1`. The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much simpler to parse. + +The main potential drawback is that one can no longer paste HTML +blocks into markdown documents with 100% reliability. However, +*in most cases* this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example: + +. +<table> + +<tr> + +<td> +Hi +</td> + +</tr> + +</table> +. +<table> +<tr> +<td> +Hi +</td> +</tr> +</table> +. + +Moreover, blank lines are usually not necessary and can be +deleted. The exception is inside `<pre>` tags; here, one can +replace the blank lines with ` ` entities. + +So there is no important loss of expressive power with the new rule. + +## Link reference definitions + +A [link reference definition](#link-reference-definition) +<a id="link-reference-definition"/> consists of a [link +label](#link-label), indented up to three spaces, followed +by a colon (`:`), optional blank space (including up to one +newline), a [link destination](#link-destination), optional +blank space (including up to one newline), and an optional [link +title](#link-title), which if it is present must be separated +from the [link destination](#link-destination) by whitespace. +No further non-space characters may occur on the line. + +A [link reference-definition](#link-reference-definition) +does not correspond to a structural element of a document. Instead, it +defines a label which can be used in [reference links](#reference-link) +and reference-style [images](#image) elsewhere in the document. [Link +references] can be defined either before or after the links that use +them. + +. +[foo]: /url "title" + +[foo] +. +<p><a href="/url" title="title">foo</a></p> +. + +. + [foo]: + /url + 'the title' + +[foo] +. +<p><a href="/url" title="the title">foo</a></p> +. + +. +[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] +. +<p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p> +. + +. +[Foo bar]: +<my url> +'title' + +[Foo bar] +. +<p><a href="my url" title="title">Foo bar</a></p> +. + +The title may be omitted: + +. +[foo]: +/url + +[foo] +. +<p><a href="/url">foo</a></p> +. + +The link destination may not be omitted: + +. +[foo]: + +[foo] +. +<p>[foo]:</p> +<p>[foo]</p> +. + +A link can come before its corresponding definition: + +. +[foo] + +[foo]: url +. +<p><a href="url">foo</a></p> +. + +If there are several matching definitions, the first one takes +precedence: + +. +[foo] + +[foo]: first +[foo]: second +. +<p><a href="first">foo</a></p> +. + +As noted in the section on [Links], matching of labels is +case-insensitive (see [matches](#matches)). + +. +[FOO]: /url + +[Foo] +. +<p><a href="/url">Foo</a></p> +. + +. +[ΑΓΩ]: /φου + +[αγω] +. +<p><a href="/φου">αγω</a></p> +. + +Here is a link reference definition with no corresponding link. +It contributes nothing to the document. + +. +[foo]: /url +. +. + +This is not a link reference definition, because there are +non-space characters after the title: + +. +[foo]: /url "title" ok +. +<p>[foo]: /url "title" ok</p> +. + +This is not a link reference definition, because it is indented +four spaces: + +. + [foo]: /url "title" + +[foo] +. +<pre><code>[foo]: /url "title" +</code></pre> +<p>[foo]</p> +. + +This is not a link reference definition, because it occurs inside +a code block: + +. +``` +[foo]: /url +``` + +[foo] +. +<pre><code>[foo]: /url +</code></pre> +<p>[foo]</p> +. + +A [link reference definition](#link-reference-definition) cannot +interrupt a paragraph. + +. +Foo +[bar]: /baz + +[bar] +. +<p>Foo +[bar]: /baz</p> +<p>[bar]</p> +. + +However, it can directly follow other block elements, such as headers +and horizontal rules, and it need not be followed by a blank line. + +. +# [Foo] +[foo]: /url +> bar +. +<h1><a href="/url">Foo</a></h1> +<blockquote> +<p>bar</p> +</blockquote> +. + +Several [link references](#link-reference) can occur one after another, +without intervening blank lines. + +. +[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url + +[foo], +[bar], +[baz] +. +<p><a href="/foo-url" title="foo">foo</a>, +<a href="/bar-url" title="bar">bar</a>, +<a href="/baz-url">baz</a></p> +. + +[Link reference definitions](#link-reference-definition) can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined: + +. +[foo] + +> [foo]: /url +. +<p><a href="/url">foo</a></p> +<blockquote> +</blockquote> +. + + +## Paragraphs + +A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a [paragraph](#paragraph) <a id="paragraph"/>. +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. The paragraph's raw content +is formed by concatenating the lines and removing initial and final +spaces. + +A simple example with two paragraphs: + +. +aaa + +bbb +. +<p>aaa</p> +<p>bbb</p> +. + +Paragraphs can contain multiple lines, but no blank lines: + +. +aaa +bbb + +ccc +ddd +. +<p>aaa +bbb</p> +<p>ccc +ddd</p> +. + +Multiple blank lines between paragraph have no effect: + +. +aaa + + +bbb +. +<p>aaa</p> +<p>bbb</p> +. + +Leading spaces are skipped: + +. + aaa + bbb +. +<p>aaa +bbb</p> +. + +Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs. + +. +aaa + bbb + ccc +. +<p>aaa +bbb +ccc</p> +. + +However, the first line may be indented at most three spaces, +or an indented code block will be triggered: + +. + aaa +bbb +. +<p>aaa +bbb</p> +. + +. + aaa +bbb +. +<pre><code>aaa +</code></pre> +<p>bbb</p> +. + +Final spaces are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a hard line +break: + +. +aaa +bbb +. +<p>aaa<br /> +bbb</p> +. + +## Blank lines + +[Blank lines](#blank-line) between block-level elements are ignored, +except for the role they play in determining whether a [list](#list) +is [tight](#tight) or [loose](#loose). + +Blank lines at the beginning and end of the document are also ignored. + +. + + +aaa + + +# aaa + + +. +<p>aaa</p> +<h1>aaa</h1> +. + + +# Container blocks + +A [container block](#container-block) is a block that has other +blocks as its contents. There are two basic kinds of container blocks: +[block quotes](#block-quote) and [list items](#list-item). +[Lists](#list) are meta-containers for [list items](#list-item). + +We define the syntax for container blocks recursively. The general +form of the definition is: + +> If X is a sequence of blocks, then the result of +> transforming X in such-and-such a way is a container of type Y +> with these blocks as its content. + +So, we explain what counts as a block quote or list item by +explaining how these can be *generated* from their contents. +This should suffice to define the syntax, although it does not +give a recipe for *parsing* these constructions. (A recipe is +provided below in the section entitled [A parsing strategy].) + +## Block quotes + +A [block quote marker](#block-quote-marker) <a id="block-quote-marker"/> +consists of 0-3 spaces of initial indent, plus (a) the character `>` together +with a following space, or (b) a single character `>` not followed by a space. + +The following rules define [block quotes](#block-quote): +<a id="block-quote"/> + +1. **Basic case.** If a string of lines *Ls* constitute a sequence + of blocks *Bs*, then the result of appending a [block quote marker] + to the beginning of each line in *Ls* is a [block quote](#block-quote) + containing *Bs*. + +2. **Laziness.** If a string of lines *Ls* constitute a [block + quote](#block-quote) with contents *Bs*, then the result of deleting + the initial [block quote marker](#block-quote-marker) from one or + more lines in which the next non-space character after the [block + quote marker](#block-quote-marker) is [paragraph continuation + text](#paragraph-continuation-text) is a block quote with *Bs* as + its content. [Paragraph continuation + text](#paragraph-continuation-text) is text that will be parsed as + part of the content of a paragraph, but does not occur at the + beginning of the paragraph. + +3. **Consecutiveness.** A document cannot contain two [block + quotes](#block-quote) in a row unless there is a [blank + line](#blank-line) between them. + +Nothing else counts as a [block quote](#block-quote). + +Here is a simple example: + +. +> # Foo +> bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +. + +The spaces after the `>` characters can be omitted: + +. +># Foo +>bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +. + +The `>` characters can be indented 1-3 spaces: + +. + > # Foo + > bar + > baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +. + +Four spaces gives us a code block: + +. + > # Foo + > bar + > baz +. +<pre><code>> # Foo +> bar +> baz +</code></pre> +. + +The Laziness clause allows us to omit the `>` before a +paragraph continuation line: + +. +> # Foo +> bar +baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +. + +A block quote can contain some lazy and some non-lazy +continuation lines: + +. +> bar +baz +> foo +. +<blockquote> +<p>bar +baz +foo</p> +</blockquote> +. + +Laziness only applies to lines that are continuations of +paragraphs. Lines containing characters or indentation that indicate +block structure cannot be lazy. + +. +> foo +--- +. +<blockquote> +<p>foo</p> +</blockquote> +<hr /> +. + +. +> - foo +- bar +. +<blockquote> +<ul> +<li>foo</li> +</ul> +</blockquote> +<ul> +<li>bar</li> +</ul> +. + +. +> foo + bar +. +<blockquote> +<pre><code>foo +</code></pre> +</blockquote> +<pre><code>bar +</code></pre> +. + +. +> ``` +foo +``` +. +<blockquote> +<pre><code></code></pre> +</blockquote> +<p>foo</p> +<pre><code></code></pre> +. + +A block quote can be empty: + +. +> +. +<blockquote> +</blockquote> +. + +. +> +> +> +. +<blockquote> +</blockquote> +. + +A block quote can have initial or final blank lines: + +. +> +> foo +> +. +<blockquote> +<p>foo</p> +</blockquote> +. + +A blank line always separates block quotes: + +. +> foo + +> bar +. +<blockquote> +<p>foo</p> +</blockquote> +<blockquote> +<p>bar</p> +</blockquote> +. + +(Most current markdown implementations, including John Gruber's +original `Markdown.pl`, will parse this eample as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.) + +Consecutiveness means that if we put these block quotes together, +we get a single block quote: + +. +> foo +> bar +. +<blockquote> +<p>foo +bar</p> +</blockquote> +. + +To get a block quote with two paragraphs, use: + +. +> foo +> +> bar +. +<blockquote> +<p>foo</p> +<p>bar</p> +</blockquote> +. + +Block quotes can interrupt paragraphs: + +. +foo +> bar +. +<p>foo</p> +<blockquote> +<p>bar</p> +</blockquote> +. + +In general, blank lines are not needed before or after block +quotes: + +. +> aaa +*** +> bbb +. +<blockquote> +<p>aaa</p> +</blockquote> +<hr /> +<blockquote> +<p>bbb</p> +</blockquote> +. + +However, because of laziness, a blank line is needed between +a block quote and a following paragraph: + +. +> bar +baz +. +<blockquote> +<p>bar +baz</p> +</blockquote> +. + +. +> bar + +baz +. +<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +. + +. +> bar +> +baz +. +<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +. + +It is a consequence of the Laziness rule that any number +of initial `>`s may be omitted on a continuation line of a +nested block quote: + +. +> > > foo +bar +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar</p> +</blockquote> +</blockquote> +</blockquote> +. + +. +>>> foo +> bar +>>baz +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar +baz</p> +</blockquote> +</blockquote> +</blockquote> +. + +When including an indented code block in a block quote, +remember that the [block quote marker](#block-quote-marker) includes +both the `>` and a following space. So *five spaces* are needed after +the `>`: + +. +> code + +> not code +. +<blockquote> +<pre><code>code +</code></pre> +</blockquote> +<blockquote> +<p>not code</p> +</blockquote> +. + + +## List items + +A [list marker](#list-marker) <a id="list-marker"/> is a +[bullet list marker](#bullet-list-marker) or an [ordered list +marker](#ordered-list-marker). + +A [bullet list marker](#bullet-list-marker) <a id="bullet-list-marker"/> +is a `-`, `+`, or `*` character. + +An [ordered list marker](#ordered-list-marker) <a id="ordered-list-marker"/> +is a sequence of one of more digits (`0-9`), followed by either a +`.` character or a `)` character. + +The following rules define [list items](#list-item): + +1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of + blocks *Bs* starting with a non-space character and not separated + from each other by more than one blank line, and *M* is a list + marker *M* of width *W* followed by 0 < *N* < 5 spaces, then the result + of prepending *M* and the following spaces to the first line of + *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a + list item with *Bs* as its contents. The type of the list item + (bullet or ordered) is determined by the type of its list marker. + If the list item is ordered, then it is also assigned a start + number, based on the ordered list marker. + +For example, let *Ls* be the lines + +. +A paragraph +with two lines. + + indented code + +> A block quote. +. +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +. + +And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as *Ls*: + +. +1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li><p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote></li> +</ol> +. + +The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces, and there are three spaces between +the list marker and the next nonspace character, then blocks +must be indented five spaces in order to fall under the list +item. + +Here are some examples showing how far content must be indented to be +put under the list item: + +. +- one + + two +. +<ul> +<li>one</li> +</ul> +<p>two</p> +. + +. +- one + + two +. +<ul> +<li><p>one</p> +<p>two</p></li> +</ul> +. + +. + - one + + two +. +<ul> +<li>one</li> +</ul> +<pre><code> two +</code></pre> +. + +. + - one + + two +. +<ul> +<li><p>one</p> +<p>two</p></li> +</ul> +. + +It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first nonspace +character after the list marker. However, that is not quite right. +The spaces after the list marker determine how much relative indentation +is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as show by +this example: + +. + > > 1. one +>> +>> two +. +<blockquote> +<blockquote> +<ol> +<li><p>one</p> +<p>two</p></li> +</ol> +</blockquote> +</blockquote> +. + +Here `two` occurs in the same column as the list marker `1.`, +but is actually contained in the list item, because there is +sufficent indentation after the last containing blockquote marker. + +The converse is also possible. In the following example, the word `two` +occurs far to the right of the initial text of the list item, `one`, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker: + +. +>>- one +>> + > > two +. +<blockquote> +<blockquote> +<ul> +<li>one</li> +</ul> +<p>two</p> +</blockquote> +</blockquote> +. + +A list item may not contain blocks that are separated by more than +one blank line. Thus, two blank lines will end a list: + +. +- foo + + bar + +- foo + + + bar +. +<ul> +<li><p>foo</p> +<p>bar</p></li> +<li><p>foo</p></li> +</ul> +<p>bar</p> +. + +A list item may contain any kind of block: + +. +1. foo + + ``` + bar + ``` + + baz + + > bam +. +<ol> +<li><p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +<blockquote> +<p>bam</p> +</blockquote></li> +</ol> +. + +2. **Item starting with indented code.** If a sequence of lines *Ls* + constitute a sequence of blocks *Bs* starting with an indented code + block and not separated from each other by more than one blank line, + and *M* is a list marker *M* of width *W* followed by + one space, then the result of prepending *M* and the following + space to the first line of *Ls*, and indenting subsequent lines of + *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. + +An indented code block will have to be indented four spaces beyond +the edge of the region where text will be included in the list item. +In the following case that is 6 spaces: + +. +- foo + + bar +. +<ul> +<li><p>foo</p> +<pre><code>bar +</code></pre></li> +</ul> +. + +And in this case it is 11 spaces: + +. + 10. foo + + bar +. +<ol start="10"> +<li><p>foo</p> +<pre><code>bar +</code></pre></li> +</ol> +. + +If the *first* block in the list item is an indented code block, +then by rule #2, the contents must be indented *one* space after the +list marker: + +. + indented code + +paragraph + + more code +. +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +. + +. +1. indented code + + paragraph + + more code +. +<ol> +<li><pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre></li> +</ol> +. + +Note that an additional space indent is interpreted as space +inside the code block: + +. +1. indented code + + paragraph + + more code +. +<ol> +<li><pre><code> indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre></li> +</ol> +. + +Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a nonspace +character, and (b) cases in which they begin with an indented code +block. In a case like the following, where the first block begins with +a three-space indent, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker: + +. + foo + +bar +. +<p>foo</p> +<p>bar</p> +. + +. +- foo + + bar +. +<ul> +<li>foo</li> +</ul> +<p>bar</p> +. + +This is not a significant restriction, because when a block begins +with 1-3 spaces indent, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case: + +. +- foo + + bar +. +<ul> +<li><p>foo</p> +<p>bar</p></li> +</ul> +. + + +3. **Indentation.** If a sequence of lines *Ls* constitutes a list item + according to rule #1 or #2, then the result of indenting each line + of *L* by 1-3 spaces (the same for each line) also constitutes a + list item with the same contents and attributes. If a line is + empty, then it need not be indented. + +Indented one space: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li><p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote></li> +</ol> +. + +Indented two spaces: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li><p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote></li> +</ol> +. + +Indented three spaces: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li><p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote></li> +</ol> +. + +Four spaces indent gives a code block: + +. + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<pre><code>1. A paragraph + with two lines. + + indented code + + > A block quote. +</code></pre> +. + + +4. **Laziness.** If a string of lines *Ls* constitute a [list + item](#list-item) with contents *Bs*, then the result of deleting + some or all of the indentation from one or more lines in which the + next non-space character after the [list marker](#list--marker) is + [paragraph continuation text](#paragraph-continuation-text) is a + list item with the same contents and attributes. + +Here is an example with lazy continuation lines: + +. + 1. A paragraph +with two lines. + + indented code + + > A block quote. +. +<ol> +<li><p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote></li> +</ol> +. + +Indentation can be partially deleted: + +. + 1. A paragraph + with two lines. +. +<ol> +<li>A paragraph +with two lines.</li> +</ol> +. + +These examples show how laziness can work in nested structures: + +. +> 1. > Blockquote +continued here. +. +<blockquote> +<ol> +<li><blockquote> +<p>Blockquote +continued here.</p> +</blockquote></li> +</ol> +</blockquote> +. + +. +> 1. > Blockquote +> continued here. +. +<blockquote> +<ol> +<li><blockquote> +<p>Blockquote +continued here.</p> +</blockquote></li> +</ol> +</blockquote> +. + + +5. **That's all.** Nothing that is not counted as a list item by rules + #1--4 counts as a [list item](#block-quote). + +The rules for sublists follow from the general rules above. A sublist +must be indented the same number of spaces a paragraph would need to be +in order to be included in the list item. + +So, in this case we need two spaces indent: + +. +- foo + - bar + - baz +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul></li> +</ul></li> +</ul> +. + +One is not enough: + +. +- foo + - bar + - baz +. +<ul> +<li>foo</li> +<li>bar</li> +<li>baz</li> +</ul> +. + +Here we need four, because the list marker is wider: + +. +10) foo + - bar +. +<ol start="10"> +<li>foo +<ul> +<li>bar</li> +</ul></li> +</ol> +. + +Three is not enough: + +. +10) foo + - bar +. +<ol start="10"> +<li>foo</li> +</ol> +<ul> +<li>bar</li> +</ul> +. + +A list may be the first block in a list item: + +. +- - foo +. +<ul> +<li><ul> +<li>foo</li> +</ul></li> +</ul> +. + +. +1. - 2. foo +. +<ol> +<li><ul> +<li><ol start="2"> +<li>foo</li> +</ol></li> +</ul></li> +</ol> +. + +A list item may be empty: + +. +- foo +- +- bar +. +<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul> +. + +. +- +. +<ul> +<li></li> +</ul> +. + +### Motivation + +John Gruber's markdown spec says the following about list items: + +1. "List markers typically start at the left margin, but may be indented + by up to three spaces. List markers must be followed by one or more + spaces or a tab." + +2. "To make lists look nice, you can wrap items with hanging indents.... + But if you don't want to, you don't have to." + +3. "List items may consist of multiple paragraphs. Each subsequent + paragraph in a list item must be indented by either 4 spaces or one + tab." + +4. "It looks nice if you indent every line of the subsequent paragraphs, + but here again, Markdown will allow you to be lazy." + +5. "To put a blockquote within a list item, the blockquote's `>` + delimiters need to be indented." + +6. "To put a code block within a list item, the code block needs to be + indented twice — 8 spaces or two tabs." + +These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that *all* block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +*four-space rule*. + +The four-space rule is clear and principled, and if the reference +implementation `Markdown.pl` had followed it, it probably would have +become the standard. However, `Markdown.pl` allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. It is not surprising, then, that different +implementations of markdown have developed very different rules for +determining what comes under a list item. (Pandoc and python-markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP markdown, and others +followed `Markdown.pl`'s behavior more closely.) + +Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving `Markdown.pl` behavior, provided they are laid out +in a way that is natural for a human to read. + +The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #4, then allows continuation lines to be +unindented if needed.) + +This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that + +``` markdown +- foo + + bar + + - baz +``` + +should be parsed as two lists with an intervening paragraph, + +``` html +<ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +``` + +as the four-space rule demands, rather than a single list, + +``` html +<ul> +<li><p>foo<p> +<p>bar></p></li> +<li><p>baz</p><li> +</ul> +``` + +The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly. + +Would it help to adopt a two-space rule? The problem is that such +a rule, together with the rule allowing 1--3 spaces indentation of the +initial list marker, allows text that is indented *less than* the +original list marker to be included in the list item. For example, +`Markdown.pl` parses + +``` markdown + - one + + two +``` + +as a single list item, with `two` a continuation paragraph: + +``` html +<ul> +<li><p>one</p> +<p>two</p></li> +</ul> +``` + +and similarly + +``` markdown +> - one +> +> two +``` + +as + +``` html +<blockquote> +<ul> +<li><p>one</p> +<p>two</p></li> +</ul> +</blockquote> +``` + +This is extremely unintuitive. + +Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph `bar` +is not indented as far as the first paragraph `foo`: + +``` markdown + 10. foo + + bar +``` + +Arguably this text does read like a list item with `bar` as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing markdown, which has the pattern: + +``` markdown +1. foo + + indented code +``` + +where the code is indented eight spaces. The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of `foo`. + +The one case that needs special treatment is a list item that *starts* +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases. + +## Lists + +A [list](#list) <a id="list"/> is a sequence of one or more +list items [of the same type](#of-the-same-type). The list items +may be separated by single [blank lines](#blank-line), but two +blank lines end all containing lists. + +Two list items are [of the same type](#of-the-same-type) +<a id="of-the-same-type"/> if they begin with a [list +marker](#list-marker) of the same type. Two list markers are of the +same type if (a) they are bullet list markers using the same character +(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same +delimiter (either `.` or `)`). + +A list is an [ordered list](#ordered-list) <a id="ordered-list"/> +if its constituent list items begin with +[ordered list markers](#ordered-list-marker), and a [bullet +list](#bullet-list) <a id="bullet-list"/> if its constituent list +items begin with [bullet list markers](#bullet-list-marker). + +The [start number](#start-number) <a id="start-number"/> +of an [ordered list](#ordered-list) is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded. + +A list is [loose](#loose) if it any of its constituent list items are +separated by blank lines, or if any of its constituent list items +directly contain two block-level elements with a blank line between +them. Otherwise a list is [tight](#tight). (The difference in HTML output +is that paragraphs in a loose with are wrapped in `<p>` tags, while +paragraphs in a tight list are not.) + +Changing the bullet or ordered list delimiter starts a new list: + +. +- foo +- bar ++ baz +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +</ul> +. + +. +1. foo +2. bar +3) baz +. +<ol> +<li>foo</li> +<li>bar</li> +</ol> +<ol start="3"> +<li>baz</li> +</ol> +. + +There can be blank lines between items, but two blank lines end +a list: + +. +- foo + +- bar + + +- baz +. +<ul> +<li><p>foo</p></li> +<li><p>bar</p></li> +</ul> +<ul> +<li>baz</li> +</ul> +. + +As illustrated above in the section on [list items](#list-item), +two blank lines between blocks *within* a list item will also end a +list: + +. +- foo + + + bar +- baz +. +<ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +. + +Indeed, two blank lines will end *all* containing lists: + +. +- foo + - bar + - baz + + + bim +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul></li> +</ul></li> +</ul> +<pre><code> bim +</code></pre> +. + +Thus, two blank lines can be used to separate consecutive lists of +the same type, or to separate a list from an indented code block +that would otherwise be parsed as a subparagraph of the final list +item: + +. +- foo +- bar + + +- baz +- bim +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +<li>bim</li> +</ul> +. + +. +- foo + + notcode + +- foo + + + code +. +<ul> +<li><p>foo</p> +<p>notcode</p></li> +<li><p>foo</p></li> +</ul> +<pre><code>code +</code></pre> +. + +List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item: + +. +- a + - b + - c + - d + - e + - f +- g +. +<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d</li> +<li>e</li> +<li>f</li> +<li>g</li> +</ul> +. + +This is a loose list, because there is a blank line between +two of the list items: + +. +- a +- b + +- c +. +<ul> +<li><p>a</p></li> +<li><p>b</p></li> +<li><p>c</p></li> +</ul> +. + +So is this, with a empty second item: + +. +* a +* + +* c +. +<ul> +<li><p>a</p></li> +<li></li> +<li><p>c</p></li> +</ul> +. + +These are loose lists, even though there is no space between the items, +because one of the items directly contains two block-level elements +with a blank line between them: + +. +- a +- b + + c +- d +. +<ul> +<li><p>a</p></li> +<li><p>b</p> +<p>c</p></li> +<li><p>d</p></li> +</ul> +. + +. +- a +- b + + [ref]: /url +- d +. +<ul> +<li><p>a</p></li> +<li><p>b</p></li> +<li><p>d</p></li> +</ul> +. + +This is a tight list, because the blank lines are in a code block: + +. +- a +- ``` + b + + + ``` +- c +. +<ul> +<li>a</li> +<li><pre><code>b + + +</code></pre></li> +<li>c</li> +</ul> +. + +This is a tight list, because the blank line is between two +paragraphs of a sublist. So the inner list is loose while +the other list is tight: + +. +- a + - b + + c +- d +. +<ul> +<li>a +<ul> +<li><p>b</p> +<p>c</p></li> +</ul></li> +<li>d</li> +</ul> +. + +This is a tight list, because the blank line is inside the +block quote: + +. +* a + > b + > +* c +. +<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote></li> +<li>c</li> +</ul> +. + +This list is tight, because the consecutive block elements +are not separated by blank lines: + +. +- a + > b + ``` + c + ``` +- d +. +<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +<pre><code>c +</code></pre></li> +<li>d</li> +</ul> +. + +A single-paragraph list is tight: + +. +- a +. +<ul> +<li>a</li> +</ul> +. + +. +- a + - b +. +<ul> +<li>a +<ul> +<li>b</li> +</ul></li> +</ul> +. + +Here the outer list is loose, the inner list tight: + +. +* foo + * bar + + baz +. +<ul> +<li><p>foo</p> +<ul> +<li>bar</li> +</ul> +<p>baz</p></li> +</ul> +. + +. +- a + - b + - c + +- d + - e + - f +. +<ul> +<li><p>a</p> +<ul> +<li>b</li> +<li>c</li> +</ul></li> +<li><p>d</p> +<ul> +<li>e</li> +<li>f</li> +</ul></li> +</ul> +. + +# Inlines + +Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in + +. +`hi`lo` +. +<p><code>hi</code>lo`</p> +. + +`hi` is parsed as code, leaving the backtick at the end as a literal +backtick. + +## Backslash escapes + +Any ASCII punctuation character may be backslash-escaped: + +. +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ +. +<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> +. + +Backslashes before other characters are treated as literal +backslashes: + +. +\→\A\a\ \3\φ\« +. +<p>\ \A\a\ \3\φ\«</p> +. + +Escaped characters are treated as regular characters and do +not have their usual markdown meanings: + +. +\*not emphasized* +\<br/> not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a header +\[foo]: /url "not a reference" +. +<p>*not emphasized* +<br/> not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a header +[foo]: /url "not a reference"</p> +. + +If a backslash is itself escaped, the following character is not: + +. +\\*emphasis* +. +<p>\<em>emphasis</em></p> +. + +A backslash at the end of the line is a hard line break: + +. +foo\ +bar +. +<p>foo<br /> +bar</p> +. + +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: + +. +`` \[\` `` +. +<p><code>\[\`</code></p> +. + +. + \[\] +. +<pre><code>\[\] +</code></pre> +. + +. +~~~ +\[\] +~~~ +. +<pre><code>\[\] +</code></pre> +. + +. +<http://google.com?find=\*> +. +<p><a href="http://google.com?find=\*">http://google.com?find=\*</a></p> +. + +. +<a href="/bar\/)"> +. +<p><a href="/bar\/)"></p> +. + +But they work in all other contexts, including URLs and link titles, +link references, and info strings in [fenced code +blocks](#fenced-code-block): + +. +[foo](/bar\* "ti\*tle") +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +. + +. +[foo] + +[foo]: /bar\* "ti\*tle" +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +. + +. +``` foo\+bar +foo +``` +. +<pre class="foo+bar"><code>foo +</code></pre> +. + + +## Entities + +Entities are parsed as entities, not as literal text, in all contexts +except code spans and code blocks. Three kinds of entities are recognized. + +[Named entities](#name-entities) <a id="named-entities"/> consist of `&` ++ a string of 2-32 alphanumerics beginning with a letter + `;`. + +. + & © Æ Ď ¾ ℋ ⅆ ∲ +. +<p> & © Æ Ď ¾ ℋ ⅆ ∲</p> +. + +[Decimal entities](#decimal-entities) <a id="decimal-entities"/> +consist of `&` + a string of 1--8 arabic digits + `;`. + +. + # Ӓ Ϡ � +. +<p> # Ӓ Ϡ �</p> +. + +[Hexadecimal entities](#hexadecimal-entities) <a id="hexadecimal-entities"/> +consist of `&` + either `X` or `x` + a string of 1-8 hexadecimal digits ++ `;`. + +. + " ആ ಫ +. +<p> " ആ ಫ</p> +. + +Here are some nonentities: + +. +  &x; &#; &#x; � &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?; +. +<p>&nbsp &x; &#; &#x; &#123456789; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;</p> +. + +Although HTML5 does accept some entities without a trailing semicolon +(such as `©`), these are not recognized as entities here: + +. +© +. +<p>&copy</p> +. + +On the other hand, many strings that are not on the list of HTML5 +named entities are recognized as entities here: + +. +&MadeUpEntity; +. +<p>&MadeUpEntity;</p> +. + +Entities are recognized in any any context besides code spans or +code blocks, including raw HTML, URLs, [link titles](#link-title), and +[fenced code block](#fenced-code-block) info strings: + +. +<a href="öö.html"> +. +<p><a href="öö.html"></p> +. + +. +[foo](/föö "föö") +. +<p><a href="/föö" title="föö">foo</a></p> +. + +. +[foo] + +[foo]: /föö "föö" +. +<p><a href="/föö" title="föö">foo</a></p> +. + +. +``` föö +foo +``` +. +<pre class="föö"><code>foo +</code></pre> +. + +Entities are treated as literal text in code spans and code blocks: + +. +`föö` +. +<p><code>f&ouml;&ouml;</code></p> +. + +. + föfö +. +<pre><code>f&ouml;f&ouml; +</code></pre> +. + +## Code span + +A [backtick string](#backtick-string) <a id="backtick-string"/> +is a string of one or more backtick characters (`` ` ``) that is neither +preceded nor followed by a backtick. + +A code span begins with a backtick string and ends with a backtick +string of equal length. The contents of the code span are the +characters between the two backtick strings, with leading and trailing +spaces and newlines removed, and consecutive spaces and newlines +collapsed to single spaces. + +This is a simple code span: + +. +`foo` +. +<p><code>foo</code></p> +. + +Here two backticks are used, because the code contains a backtick. +This example also illustrates stripping of leading and trailing spaces: + +. +`` foo ` bar `` +. +<p><code>foo ` bar</code></p> +. + +This example shows the motivation for stripping leading and trailing +spaces: + +. +` `` ` +. +<p><code>``</code></p> +. + +Newlines are treated like spaces: + +. +`` +foo +`` +. +<p><code>foo</code></p> +. + +Interior spaces and newlines are collapsed into single spaces, just +as they would be by a browser: + +. +`foo bar + baz` +. +<p><code>foo bar baz</code></p> +. + +Q: Why not just leave the spaces, since browsers will collapse them +anyway? A: Because we might be targeting a non-HTML format, and we +shouldn't rely on HTML-specific rendering assumptions. + +(Existing implementations differ in their treatment of internal +spaces and newlines. Some, including `Markdown.pl` and +`showdown`, convert an internal newline into a `<br />` tag. +But this makes things difficult for those who like to hard-wrap +their paragraphs, since a line break in the midst of a code +span will cause an unintended line break in the output. Others +just leave internal spaces as they are, which is fine if only +HTML is being targeted.) + +. +`foo `` bar` +. +<p><code>foo `` bar</code></p> +. + +Note that backslash escapes do not work in code spans. All backslashes +are treated literally: + +. +`foo\`bar` +. +<p><code>foo\</code>bar`</p> +. + +Backslash escapes are never needed, because one can always choose a +string of *n* backtick characters as delimiters, where the code does +not contain any strings of exactly *n* backtick characters. + +Code span backticks have higher precedence than any other inline +constructs except HTML tags and autolinks. Thus, for example, this is +not parsed as emphasized text, since the second `*` is part of a code +span: + +. +*foo`*` +. +<p>*foo<code>*</code></p> +. + +And this is not parsed as a link: + +. +[not a `link](/foo`) +. +<p>[not a <code>link](/foo</code>)</p> +. + +But this is a link: + +. +<http://foo.bar.`baz>` +. +<p><a href="http://foo.bar.`baz">http://foo.bar.`baz</a>`</p> +. + +And this is an HTML tag: + +. +<a href="`">` +. +<p><a href="`">`</p> +. + +When a backtick string is not closed by a matching backtick string, +we just have literal backticks: + +. +```foo`` +. +<p>```foo``</p> +. + +. +`foo +. +<p>`foo</p> +. + +## Emphasis and strong emphasis + +John Gruber's original [markdown syntax +description](http://daringfireball.net/projects/markdown/syntax#em) says: + +> Markdown treats asterisks (`*`) and underscores (`_`) as indicators of +> emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML +> `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>` +> tag. + +This is enough for most users, but these rules leave much undecided, +especially when it comes to nested emphasis. The original +`Markdown.pl` test suite makes it clear that triple `***` and +`___` delimiters can be used for strong emphasis, and most +implementations have also allowed the following patterns: + +``` markdown +***strong emph*** +***strong** in emph* +***emph* in strong** +**in strong *emph*** +*in emph **strong*** +``` + +The following patterns are less widely supported, but the intent +is clear and they are useful (especially in contexts like bibliography +entries): + +``` markdown +*emph *with emph* in it* +**strong **with strong** in it** +``` + +Many implementations have also restricted intraword emphasis to +the `*` forms, to avoid unwanted emphasis in words containing +internal underscores. (It is best practice to put these in code +spans, but users often do not.) + +``` markdown +internal emphasis: foo*bar*baz +no emphasis: foo_bar_baz +``` + +The following rules capture all of these patterns, while allowing +for efficient parsing strategies that do not backtrack: + +1. A single `*` character [can open emphasis](#can-open-emphasis) + <a id="can-open-emphasis"/> iff + + (a) it is not part of a sequence of four or more unescaped `*`s, + (b) it is not followed by whitespace, and + (c) either it is not followed by a `*` character or it is + followed immediately by strong emphasis. + +2. A single `_` character [can open emphasis](#can-open-emphasis) iff + + (a) it is not part of a sequence of four or more unescaped `_`s, + (b) it is not followed by whitespace, + (c) is is not preceded by an ASCII alphanumeric character, and + (d) either it is not followed by a `_` character or it is + followed immediately by strong emphasis. + +3. A single `*` character [can close emphasis](#can-close-emphasis) + <a id="can-close-emphasis"/> iff + + (a) it is not part of a sequence of four or more unescaped `*`s, and + (b) it is not preceded by whitespace. + +4. A single `_` character [can close emphasis](#can-close-emphasis) iff + + (a) it is not part of a sequence of four or more unescaped `_`s, + (b) it is not preceded by whitespace, and + (c) it is not followed by an ASCII alphanumeric character. + +5. A double `**` [can open strong emphasis](#can-open-strong-emphasis) + <a id="can-open-strong-emphasis" /> iff + + (a) it is not part of a sequence of four or more unescaped `*`s, + (b) it is not followed by whitespace, and + (c) either it is not followed by a `*` character or it is + followed immediately by emphasis. + +6. A double `__` [can open strong emphasis](#can-open-strong-emphasis) + iff + + (a) it is not part of a sequence of four or more unescaped `_`s, + (b) it is not followed by whitespace, and + (c) it is not preceded by an ASCII alphanumeric character, and + (d) either it is not followed by a `_` character or it is + followed immediately by emphasis. + +7. A double `**` [can close strong emphasis](#can-close-strong-emphasis) + <a id="can-close-strong-emphasis" /> iff + + (a) it is not part of a sequence of four or more unescaped `*`s, and + (b) it is not preceded by whitespace. + +8. A double `__` [can close strong emphasis](#can-close-strong-emphasis) + iff + + (a) it is not part of a sequence of four or more unescaped `_`s, + (b) it is not preceded by whitespace, and + (c) it is not followed by an ASCII alphanumeric character. + +9. Emphasis begins with a delimiter that [can open + emphasis](#can-open-emphasis) and includes inlines parsed + sequentially until a delimiter that [can close + emphasis](#can-close-emphasis), and that uses the same + character (`_` or `*`) as the opening delimiter, is reached. + +10. Strong emphasis begins with a delimiter that [can open strong + emphasis](#can-open-strong-emphasis) and includes inlines parsed + sequentially until a delimiter that [can close strong + emphasis](#can-close-strong-emphasis), and that uses the + same character (`_` or `*`) as the opening delimiter, is reached. + +These rules can be illustrated through a series of examples. + +Simple emphasis: + +. +*foo bar* +. +<p><em>foo bar</em></p> +. + +. +_foo bar_ +. +<p><em>foo bar</em></p> +. + +Simple strong emphasis: + +. +**foo bar** +. +<p><strong>foo bar</strong></p> +. + +. +__foo bar__ +. +<p><strong>foo bar</strong></p> +. + +Emphasis can continue over line breaks: + +. +*foo +bar* +. +<p><em>foo +bar</em></p> +. + +. +_foo +bar_ +. +<p><em>foo +bar</em></p> +. + +. +**foo +bar** +. +<p><strong>foo +bar</strong></p> +. + +. +__foo +bar__ +. +<p><strong>foo +bar</strong></p> +. + +Emphasis can contain other inline constructs: + +. +*foo [bar](/url)* +. +<p><em>foo <a href="/url">bar</a></em></p> +. + +. +_foo [bar](/url)_ +. +<p><em>foo <a href="/url">bar</a></em></p> +. + +. +**foo [bar](/url)** +. +<p><strong>foo <a href="/url">bar</a></strong></p> +. + +. +__foo [bar](/url)__ +. +<p><strong>foo <a href="/url">bar</a></strong></p> +. + +Symbols contained in other inline constructs will not +close emphasis: + +. +*foo [bar*](/url) +. +<p>*foo <a href="/url">bar*</a></p> +. + +. +_foo [bar_](/url) +. +<p>_foo <a href="/url">bar_</a></p> +. + +. +**<a href="**"> +. +<p>**<a href="**"></p> +. + +. +__<a href="__"> +. +<p>__<a href="__"></p> +. + +. +*a `*`* +. +<p><em>a <code>*</code></em></p> +. + +. +_a `_`_ +. +<p><em>a <code>_</code></em></p> +. + +. +**a<http://foo.bar?q=**> +. +<p>**a<a href="http://foo.bar?q=**">http://foo.bar?q=**</a></p> +. + +. +__a<http://foo.bar?q=__> +. +<p>__a<a href="http://foo.bar?q=__">http://foo.bar?q=__</a></p> +. + +This is not emphasis, because the opening delimiter is +followed by white space: + +. +and * foo bar* +. +<p>and * foo bar*</p> +. + +. +_ foo bar_ +. +<p>_ foo bar_</p> +. + +. +and ** foo bar** +. +<p>and ** foo bar**</p> +. + +. +__ foo bar__ +. +<p>__ foo bar__</p> +. + +This is not emphasis, because the closing delimiter is +preceded by white space: + +. +and *foo bar * +. +<p>and *foo bar *</p> +. + +. +and _foo bar _ +. +<p>and _foo bar _</p> +. + +. +and **foo bar ** +. +<p>and **foo bar **</p> +. + +. +and __foo bar __ +. +<p>and __foo bar __</p> +. + +The rules imply that a sequence of four or more unescaped `*` or +`_` characters will always be parsed as a literal string: + +. +****hi**** +. +<p>****hi****</p> +. + +. +_____hi_____ +. +<p>_____hi_____</p> +. + +. +Sign here: _________ +. +<p>Sign here: _________</p> +. + +The rules also imply that there can be no empty emphasis or strong +emphasis: + +. +** is not an empty emphasis +. +<p>** is not an empty emphasis</p> +. + +. +**** is not an empty strong emphasis +. +<p>**** is not an empty strong emphasis</p> +. + +To include `*` or `_` in emphasized sections, use backslash escapes +or code spans: + +. +*here is a \** +. +<p><em>here is a *</em></p> +. + +. +__this is a double underscore (`__`)__ +. +<p><strong>this is a double underscore (<code>__</code>)</strong></p> +. + +`*` delimiters allow intra-word emphasis; `_` delimiters do not: + +. +foo*bar*baz +. +<p>foo<em>bar</em>baz</p> +. + +. +foo_bar_baz +. +<p>foo_bar_baz</p> +. + +. +foo__bar__baz +. +<p>foo__bar__baz</p> +. + +. +_foo_bar_baz_ +. +<p><em>foo_bar_baz</em></p> +. + +. +11*15*32 +. +<p>11<em>15</em>32</p> +. + +. +11_15_32 +. +<p>11_15_32</p> +. + +Internal underscores will be ignored in underscore-delimited +emphasis: + +. +_foo_bar_baz_ +. +<p><em>foo_bar_baz</em></p> +. + +. +__foo__bar__baz__ +. +<p><strong>foo__bar__baz</strong></p> +. + +The rules are sufficient for the following nesting patterns: + +. +***foo bar*** +. +<p><strong><em>foo bar</em></strong></p> +. + +. +___foo bar___ +. +<p><strong><em>foo bar</em></strong></p> +. + +. +***foo** bar* +. +<p><em><strong>foo</strong> bar</em></p> +. + +. +___foo__ bar_ +. +<p><em><strong>foo</strong> bar</em></p> +. + +. +***foo* bar** +. +<p><strong><em>foo</em> bar</strong></p> +. + +. +___foo_ bar__ +. +<p><strong><em>foo</em> bar</strong></p> +. + +. +*foo **bar*** +. +<p><em>foo <strong>bar</strong></em></p> +. + +. +_foo __bar___ +. +<p><em>foo <strong>bar</strong></em></p> +. + +. +**foo *bar*** +. +<p><strong>foo <em>bar</em></strong></p> +. + +. +__foo _bar___ +. +<p><strong>foo <em>bar</em></strong></p> +. + +. +*foo **bar*** +. +<p><em>foo <strong>bar</strong></em></p> +. + +. +_foo __bar___ +. +<p><em>foo <strong>bar</strong></em></p> +. + +. +*foo *bar* baz* +. +<p><em>foo <em>bar</em> baz</em></p> +. + +. +_foo _bar_ baz_ +. +<p><em>foo <em>bar</em> baz</em></p> +. + +. +**foo **bar** baz** +. +<p><strong>foo <strong>bar</strong> baz</strong></p> +. + +. +__foo __bar__ baz__ +. +<p><strong>foo <strong>bar</strong> baz</strong></p> +. + +. +*foo **bar** baz* +. +<p><em>foo <strong>bar</strong> baz</em></p> +. + +. +_foo __bar__ baz_ +. +<p><em>foo <strong>bar</strong> baz</em></p> +. + +. +**foo *bar* baz** +. +<p><strong>foo <em>bar</em> baz</strong></p> +. + +. +__foo _bar_ baz__ +. +<p><strong>foo <em>bar</em> baz</strong></p> +. + +Note that you cannot nest emphasis directly inside emphasis +using the same delimeter, or strong emphasis directly inside +strong emphasis: + +. +**foo** +. +<p><strong>foo</strong></p> +. + +. +****foo**** +. +<p>****foo****</p> +. + +For these nestings, you need to switch delimiters: + +. +*_foo_* +. +<p><em><em>foo</em></em></p> +. + +. +**__foo__** +. +<p><strong><strong>foo</strong></strong></p> +. + +Note that a `*` followed by a `*` can close emphasis, and +a `**` followed by a `*` can close strong emphasis (and +similarly for `_` and `__`): + +. +*foo** +. +<p><em>foo</em>*</p> +. + +. +*foo *bar** +. +<p><em>foo <em>bar</em></em></p> +. + +. +**foo*** +. +<p><strong>foo</strong>*</p> +. + +The following contains no strong emphasis, because the opening +delimiter is closed by the first `*` before `bar`: + +. +*foo**bar*** +. +<p><em>foo</em><em>bar</em>**</p> +. + +However, a string of four or more `****` can never close emphasis: + +. +*foo**** +. +<p>*foo****</p> +. + +Note that there are some asymmetries here: + +. +*foo** + +**foo* +. +<p><em>foo</em>*</p> +<p>**foo*</p> +. + +. +*foo *bar** + +**foo* bar* +. +<p><em>foo <em>bar</em></em></p> +<p>**foo* bar*</p> +. + +More cases with mismatched delimiters: + +. +**foo* bar* +. +<p>**foo* bar*</p> +. + +. +*bar*** +. +<p><em>bar</em>**</p> +. + +. +***foo* +. +<p>***foo*</p> +. + +. +**bar*** +. +<p><strong>bar</strong>*</p> +. + +. +***foo** +. +<p>***foo**</p> +. + +. +***foo *bar* +. +<p>***foo <em>bar</em></p> +. + +## Links + +A link contains a [link label](#link-label) (the visible text), +a [destination](#destination) (the URI that is the link destination), +and optionally a [link title](#link-title). There are two basic kinds +of links in markdown. In [inline links](#inline-links) the destination +and title are given immediately after the lable. In [reference +links](#reference-links) the destination and title are defined elsewhere +in the document. + +A [link label](#link-label) <a id="link-label"/> consists of + +- an opening `[`, followed by +- zero or more backtick code spans, autolinks, HTML tags, link labels, + backslash-escaped ASCII punctuation characters, or non-`]` characters, + followed by +- a closing `]`. + +These rules are motivated by the following intuitive ideas: + +- A link label is a container for inline elements. +- The square brackets bind more tightly than emphasis markers, + but less tightly than `<>` or `` ` ``. +- Link labels may contain material in matching square brackets. + +A [link destination](#link-destination) <a id="link-destination"/> +consists of either + +- a sequence of zero or more characters between an opening `<` and a + closing `>` that contains no line breaks or unescaped `<` or `>` + characters, or + +- a nonempty sequence of characters that does not include + ASCII space or control characters, and includes parentheses + only if (a) they are backslash-escaped or (b) they are part of + a balanced pair of unescaped parentheses that is not itself + inside a balanced pair of unescaped paretheses. + +A [link title](#link-title) <a id="link-title"/> consists of either + +- a sequence of zero or more characters between straight double-quote + characters (`"`), including a `"` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between straight single-quote + characters (`'`), including a `'` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between matching parentheses + (`(...)`), including a `)` character only if it is backslash-escaped. + +An [inline link](#inline-link) <a id="inline-link"/> +consists of a [link label](#link-label) followed immediately +by a left parenthesis `(`, optional whitespace, +an optional [link destination](#link-destination), +an optional [link title](#link-title) separated from the link +destination by whitespace, optional whitespace, and a right +parenthesis `)`. The link's text consists of the label (excluding +the enclosing square brackets) parsed as inlines. The link's +URI consists of the link destination, excluding enclosing `<...>` if +present, with backslash-escapes in effect as described above. The +link's title consists of the link title, excluding its enclosing +delimiters, with backslash-escapes in effect as described above. + +Here is a simple inline link: + +. +[link](/uri "title") +. +<p><a href="/uri" title="title">link</a></p> +. + +The title may be omitted: + +. +[link](/uri) +. +<p><a href="/uri">link</a></p> +. + +Both the title and the destination may be omitted: + +. +[link]() +. +<p><a href="">link</a></p> +. + +. +[link](<>) +. +<p><a href="">link</a></p> +. + + +If the destination contains spaces, it must be enclosed in pointy +braces: + +. +[link](/my uri) +. +<p>[link](/my uri)</p> +. + +. +[link](</my uri>) +. +<p><a href="/my uri">link</a></p> +. + +The destination cannot contain line breaks, even with pointy braces: + +. +[link](foo +bar) +. +<p>[link](foo +bar)</p> +. + +One level of balanced parentheses is allowed without escaping: + +. +[link]((foo)and(bar)) +. +<p><a href="(foo)and(bar)">link</a></p> +. + +However, if you have parentheses within parentheses, you need to escape +or use the `<...>` form: + +. +[link](foo(and(bar))) +. +<p>[link](foo(and(bar)))</p> +. + +. +[link](foo(and\(bar\))) +. +<p><a href="foo(and(bar))">link</a></p> +. + +. +[link](<foo(and(bar))>) +. +<p><a href="foo(and(bar))">link</a></p> +. + +Parentheses and other symbols can also be escaped, as usual +in markdown: + +. +[link](foo\)\:) +. +<p><a href="foo):">link</a></p> +. + +URL-escaping and entities should be left alone inside the destination: + +. +[link](foo%20bä) +. +<p><a href="foo%20bä">link</a></p> +. + +Note that, because titles can often be parsed as destinations, +if you try to omit the destination and keep the title, you'll +get unexpected results: + +. +[link]("title") +. +<p><a href=""title"">link</a></p> +. + +Titles may be in single quotes, double quotes, or parentheses: + +. +[link](/url "title") +[link](/url 'title') +[link](/url (title)) +. +<p><a href="/url" title="title">link</a> +<a href="/url" title="title">link</a> +<a href="/url" title="title">link</a></p> +. + +Backslash escapes and entities may be used in titles: + +. +[link](/url "title \""") +. +<p><a href="/url" title="title """>link</a></p> +. + +Nested balanced quotes are not allowed without escaping: + +. +[link](/url "title "and" title") +. +<p>[link](/url "title "and" title")</p> +. + +But it is easy to work around this by using a different quote type: + +. +[link](/url 'title "and" title') +. +<p><a href="/url" title="title "and" title">link</a></p> +. + +(Note: `Markdown.pl` did allow double quotes inside a double-quoted +title, and its test suite included a test demonstrating this. +But it is hard to see a good rationale for the extra complexity this +brings, since there are already many ways---backslash escaping, +entities, or using a different quote type for the enclosing title---to +write titles containing double quotes. `Markdown.pl`'s handling of +titles has a number of other strange features. For example, it allows +single-quoted titles in inline links, but not reference links. And, in +reference links but not inline links, it allows a title to begin with +`"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing +quotation mark, though 1.0.2b8 does not. It seems preferable to adopt +a simple, rational rule that works the same way in inline links and +link reference definitions.) + +Whitespace is allowed around the destination and title: + +. +[link]( /uri + "title" ) +. +<p><a href="/uri" title="title">link</a></p> +. + +But it is not allowed between the link label and the +following parenthesis: + +. +[link] (/uri) +. +<p>[link] (/uri)</p> +. + +Note that this is not a link, because the closing `]` occurs in +an HTML tag: + +. +[foo <bar attr="](baz)"> +. +<p>[foo <bar attr="](baz)"></p> +. + + +There are three kinds of [reference links](#reference-link): +<a id="reference-link"/> + +A [full reference link](#full-reference-link) <a id="full-reference-link"/> +consists of a [link label](#link-label), optional whitespace, and +another [link label](#link-label) that [matches](#matches) a +[reference link definition](#reference-link-definition) elsewhere in the +document. + +One label [matches](#matches) <a id="matches"/> +another just in case their normalized forms are equal. To normalize a +label, perform the *unicode case fold* and collapse consecutive internal +whitespace to a single space. If there are multiple matching reference +link definitions, the one that comes first in the document is used. (It +is desirable in such cases to emit a warning.) + +The contents of the first link label are parsed as inlines, which are +used as the link's text. The link's URI and title are provided by the +matching reference link definition. + +Here is a simple example: + +. +[foo][bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +The first label can contain inline content: + +. +[*foo\!*][bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo!</em></a></p> +. + +Matching is case-insensitive: + +. +[foo][BaR] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +Unicode case fold is used: + +. +[Толпой][Толпой] is a Russian word. + +[ТОЛПОЙ]: /url +. +<p><a href="/url">Толпой</a> is a Russian word.</p> +. + +Consecutive internal whitespace is treated as one space for +purposes of determining matching: + +. +[Foo + bar]: /url + +[Baz][Foo bar] +. +<p><a href="/url">Baz</a></p> +. + +There can be whitespace between the two labels: + +. +[foo] [bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +. +[foo] +[bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +When there are multiple matching reference link definitions, +the first is used: + +. +[foo]: /url1 + +[foo]: /url2 + +[bar][foo] +. +<p><a href="/url1">bar</a></p> +. + +Note that matching is performed on normalized strings, not parsed +inline content. So the following does not match, even though the +labels define equivalent inline content: + +. +[bar][foo\!] + +[foo!]: /url +. +<p>[bar][foo!]</p> +. + +A [collapsed reference link](#collapsed-reference-link) +<a id="collapsed-reference-link"/> consists of a [link +label](#link-label) that [matches](#matches) a [reference link +definition](#reference-link-definition) elsewhere in the +document, optional whitespace, and the string `[]`. The contents of the +first link label are parsed as inlines, which are used as the link's +text. The link's URI and title are provided by the matching reference +link definition. Thus, `[foo][]` is equivalent to `[foo][foo]`. + +. +[foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +. +[*foo* bar][] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +. + +The link labels are case-insensitive: + +. +[Foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +. + + +As with full reference links, whitespace is allowed +between the two sets of brackets: + +. +[foo] +[] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +A [shortcut reference link](#shortcut-reference-link) +<a id="shortcut-reference-link"/> consists of a [link +label](#link-label) that [matches](#matches) a [reference link +definition](#reference-link-definition) elsewhere in the +document and is not followed by `[]` or a link label. +The contents of the first link label are parsed as inlines, +which are used as the link's text. the link's URI and title +are provided by the matching reference link definition. +Thus, `[foo]` is equivalent to `[foo][]`. + +. +[foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +. + +. +[*foo* bar] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +. + +. +[[*foo* bar]] + +[*foo* bar]: /url "title" +. +<p>[<a href="/url" title="title"><em>foo</em> bar</a>]</p> +. + +The link labels are case-insensitive: + +. +[Foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +. + +If you just want bracketed text, you can backslash-escape the +opening bracket to avoid links: + +. +\[foo] + +[foo]: /url "title" +. +<p>[foo]</p> +. + +Note that this is a link, because link labels bind more tightly +than emphasis: + +. +[foo*]: /url + +*[foo*] +. +<p>*<a href="/url">foo*</a></p> +. + +However, this is not, because link labels bind tight less +tightly than code backticks: + +. +[foo`]: /url + +[foo`]` +. +<p>[foo<code>]</code></p> +. + +Link labels can contain matched square brackets: + +. +[[[foo]]] + +[[[foo]]]: /url +. +<p><a href="/url">[[foo]]</a></p> +. + +. +[[[foo]]] + +[[[foo]]]: /url1 +[foo]: /url2 +. +<p><a href="/url1">[[foo]]</a></p> +. + +For non-matching brackets, use backslash escapes: + +. +[\[foo] + +[\[foo]: /url +. +<p><a href="/url">[foo</a></p> +. + +Full references take precedence over shortcut references: + +. +[foo][bar] + +[foo]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a></p> +. + +In the following case `[bar][baz]` is parsed as a reference, +`[foo]` as normal text: + +. +[foo][bar][baz] + +[baz]: /url +. +<p>[foo]<a href="/url">bar</a></p> +. + +Here, though, `[foo][bar]` is parsed as a reference, since +`[bar]` is defined: + +. +[foo][bar][baz] + +[baz]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a><a href="/url1">baz</a></p> +. + +Here `[foo]` is not parsed as a shortcut reference, because it +is followed by a link label (even though `[bar]` is not defined): + +. +[foo][bar][baz] + +[baz]: /url1 +[foo]: /url2 +. +<p>[foo]<a href="/url1">bar</a></p> +. + + +## Images + +An (unescaped) exclamation mark (`!`) followed by a reference or +inline link will be parsed as an image. The link label will be +used as the image's alt text, and the link title, if any, will +be used as the image's title. + +. +![foo](/url "title") +. +<p><img src="/url" alt="foo" title="title" /></p> +. + +. +![foo *bar*] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo <em>bar</em>" title="train & tracks" /></p> +. + +. +![foo *bar*][] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo <em>bar</em>" title="train & tracks" /></p> +. + +. +![foo *bar*][foobar] + +[FOOBAR]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo <em>bar</em>" title="train & tracks" /></p> +. + +. +![foo](train.jpg) +. +<p><img src="train.jpg" alt="foo" /></p> +. + +. +My ![foo bar](/path/to/train.jpg "title" ) +. +<p>My <img src="/path/to/train.jpg" alt="foo bar" title="title" /></p> +. + +. +![foo](<url>) +. +<p><img src="url" alt="foo" /></p> +. + +. +![](/url) +. +<p><img src="/url" alt="" /></p> +. + +Reference-style: + +. +![foo] [bar] + +[bar]: /url +. +<p><img src="/url" alt="foo" /></p> +. + +. +![foo] [bar] + +[BAR]: /url +. +<p><img src="/url" alt="foo" /></p> +. + +Collapsed: + +. +![foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +. + +. +![*foo* bar][] + +[*foo* bar]: /url "title" +. +<p><img src="/url" alt="<em>foo</em> bar" title="title" /></p> +. + +The labels are case-insensitive: + +. +![Foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +. + +As with full reference links, whitespace is allowed +between the two sets of brackets: + +. +![foo] +[] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +. + +Shortcut: + +. +![foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +. + +. +![*foo* bar] + +[*foo* bar]: /url "title" +. +<p><img src="/url" alt="<em>foo</em> bar" title="title" /></p> +. + +. +![[foo]] + +[[foo]]: /url "title" +. +<p><img src="/url" alt="[foo]" title="title" /></p> +. + +The link labels are case-insensitive: + +. +![Foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +. + +If you just want bracketed text, you can backslash-escape the +opening `!` and `[`: + +. +\!\[foo] + +[foo]: /url "title" +. +<p>![foo]</p> +. + +If you want a link after a literal `!`, backslash-escape the +`!`: + +. +\![foo] + +[foo]: /url "title" +. +<p>!<a href="/url" title="title">foo</a></p> +. + +## Autolinks + +Autolinks are absolute URIs and email addresses inside `<` and `>`. +They are parsed as links, with the URL or email address as the link +label. + +A [URI autolink](#uri-autolink) <a id="uri-autolink"/> +consists of `<`, followed by an [absolute +URI](#absolute-uri) not containing `<`, followed by `>`. It is parsed +as a link to the URI, with the URI as the link's label. + +An [absolute URI](#absolute-uri), <a id="absolute-uri"/> +for these purposes, consists of a [scheme](#scheme) followed by a colon (`:`) +followed by zero or more characters other than ASCII whitespace and +control characters, `<`, and `>`. If the URI includes these characters, +you must use percent-encoding (e.g. `%20` for a space). + +The following [schemes](#scheme) <a id="scheme"/> +are recognized (case-insensitive): +`coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`, +`cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`, +`gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`, +`ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`, +`mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`, +`ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`, +`service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`,` +soap.beep`, `soap.beeps`, `tag`, `tel`, `telnet`, `tftp`, `thismessage`, +`tn3270`, `tip`, `tv`, `urn`, `vemmi`, `ws`, `wss`, `xcon`, +`xcon-userid`, `xmlrpc.beep`, `xmlrpc.beeps`, `xmpp`, `z39.50r`, +`z39.50s`, `adiumxtra`, `afp`, `afs`, `aim`, `apt`,` attachment`, `aw`, +`beshare`, `bitcoin`, `bolo`, `callto`, `chrome`,` chrome-extension`, +`com-eventbrite-attendee`, `content`, `cvs`,` dlna-playsingle`, +`dlna-playcontainer`, `dtn`, `dvb`, `ed2k`, `facetime`, `feed`, +`finger`, `fish`, `gg`, `git`, `gizmoproject`, `gtalk`, `hcp`, `icon`, +`ipn`, `irc`, `irc6`, `ircs`, `itms`, `jar`, `jms`, `keyparc`, `lastfm`, +`ldaps`, `magnet`, `maps`, `market`,` message`, `mms`, `ms-help`, +`msnim`, `mumble`, `mvn`, `notes`, `oid`, `palm`, `paparazzi`, +`platform`, `proxy`, `psyc`, `query`, `res`, `resource`, `rmi`, `rsync`, +`rtmp`, `secondlife`, `sftp`, `sgn`, `skype`, `smb`, `soldat`, +`spotify`, `ssh`, `steam`, `svn`, `teamspeak`, `things`, `udp`, +`unreal`, `ut2004`, `ventrilo`, `view-source`, `webcal`, `wtai`, +`wyciwyg`, `xfire`, `xri`, `ymsgr`. + +Here are some valid autolinks: + +. +<http://foo.bar.baz> +. +<p><a href="http://foo.bar.baz">http://foo.bar.baz</a></p> +. + +. +<http://foo.bar.baz?q=hello&id=22&boolean> +. +<p><a href="http://foo.bar.baz?q=hello&id=22&boolean">http://foo.bar.baz?q=hello&id=22&boolean</a></p> +. + +. +<irc://foo.bar:2233/baz> +. +<p><a href="irc://foo.bar:2233/baz">irc://foo.bar:2233/baz</a></p> +. + +Uppercase is also fine: + +. +<MAILTO:FOO@BAR.BAZ> +. +<p><a href="MAILTO:FOO@BAR.BAZ">MAILTO:FOO@BAR.BAZ</a></p> +. + +Spaces are not allowed in autolinks: + +. +<http://foo.bar/baz bim> +. +<p><http://foo.bar/baz bim></p> +. + +An [email autolink](#email-autolink) <a id="email-autolink"/> +consists of `<`, followed by an [email address](#email-address), +followed by `>`. The link's label is the email address, +and the URL is `mailto:` followed by the email address. + +An [email address](#email-address), <a id="email-address"/> +for these purposes, is anything that matches +the [non-normative regex from the HTML5 +spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#e-mail-state-%28type=email%29): + + /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? + (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ + +Examples of email autolinks: + +. +<foo@bar.baz.com> +. +<p><a href="mailto:foo@bar.baz.com">foo@bar.baz.com</a></p> +. + +. +<foo+special@Bar.baz-bar0.com> +. +<p><a href="mailto:foo+special@Bar.baz-bar0.com">foo+special@Bar.baz-bar0.com</a></p> +. + +These are not autolinks: + +. +<> +. +<p><></p> +. + +. +<heck://bing.bong> +. +<p><heck://bing.bong></p> +. + +. +< http://foo.bar > +. +<p>< http://foo.bar ></p> +. + +. +<foo.bar.baz> +. +<p><foo.bar.baz></p> +. + +. +<localhost:5001/foo> +. +<p><localhost:5001/foo></p> +. + +## Raw HTML + +Text between `<` and `>` that looks like an HTML tag is parsed as a +raw HTML tag and will be rendered in HTML without escaping. +Tag and attribute names are not limited to current HTML tags, +so custom tags (and even, say, DocBook tags) may be used. + +Here is the grammar for tags: + +A [tag name](#tag-name) <a id="tag-name"/> consists of an ASCII letter +followed by zero or more ASCII letters or digits. + +An [attribute](#attribute) <a id="attribute"/> consists of whitespace, +an **attribute name**, and an optional **attribute value +specification**. + +An [attribute name](#attribute-name) <a id="attribute-name"/> +consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII +letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML +specification restricted to ASCII. HTML5 is laxer.) + +An [attribute value specification](#attribute-value-specification) +<a id="attribute-value-specification"/> consists of optional whitespace, +a `=` character, optional whitespace, and an [attribute +value](#attribute-value). + +An [attribute value](#attribute-value) <a id="attribute-value"/> +consists of an [unquoted attribute value](#unquoted-attribute-value), +a [single-quoted attribute value](#single-quoted-attribute-value), +or a [double-quoted attribute value](#double-quoted-attribute-value). + +An [unquoted attribute value](#unquoted-attribute-value) +<a id="unquoted-attribute-value"/> is a nonempty string of characters not +including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``. + +A [single-quoted attribute value](#single-quoted-attribute-value) +<a id="single-quoted-attribute-value"/> consists of `'`, zero or more +characters not including `'`, and a final `'`. + +A [double-quoted attribute value](#double-quoted-attribute-value) +<a id="double-quoted-attribute-value"/> consists of `"`, zero or more +characters not including `"`, and a final `"`. + +An [open tag](#open-tag) <a id="open-tag"/> consists of a `<` character, +a [tag name](#tag-name), zero or more [attributes](#attribute), +optional whitespace, an optional `/` character, and a `>` character. + +A [closing tag](#closing-tag) <a id="closing-tag"/> consists of the +string `</`, a [tag name](#tag-name), optional whitespace, and the +character `>`. + +An [HTML comment](#html-comment) <a id="html-comment"/> consists of the +string `<!--`, a string of characters not including the string `--`, and +the string `-->`. + +A [processing instruction](#processing-instruction) +<a id="processing-instruction"/> consists of the string `<?`, a string +of characters not including the string `?>`, and the string +`?>`. + +A [declaration](#declaration) <a id="declaration"/> consists of the +string `<!`, a name consisting of one or more uppercase ASCII letters, +whitespace, a string of characters not including the character `>`, and +the character `>`. + +A [CDATA section](#cdata-section) <a id="cdata-section"/> consists of +the string `<![CDATA[`, a string of characters not including the string +`]]>`, and the string `]]>`. + +An [HTML tag](#html-tag) <a id="html-tag"/> consists of an [open +tag](#open-tag), a [closing tag](#closing-tag), an [HTML +comment](#html-comment), a [processing +instruction](#processing-instruction), an [element type +declaration](#element-type-declaration), or a [CDATA +section](#cdata-section). + +Here are some simple open tags: + +. +<a><bab><c2c> +. +<p><a><bab><c2c></p> +. + +Empty elements: + +. +<a/><b2/> +. +<p><a/><b2/></p> +. + +Whitespace is allowed: + +. +<a /><b2 +data="foo" > +. +<p><a /><b2 +data="foo" ></p> +. + +With attributes: + +. +<a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /> +. +<p><a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /></p> +. + +Illegal tag names, not parsed as HTML: + +. +<33> <__> +. +<p><33> <__></p> +. + +Illegal attribute names: + +. +<a h*#ref="hi"> +. +<p><a h*#ref="hi"></p> +. + +Illegal attribute values: + +. +<a href="hi'> <a href=hi'> +. +<p><a href="hi'> <a href=hi'></p> +. + +Illegal whitespace: + +. +< a>< +foo><bar/ > +. +<p>< a>< +foo><bar/ ></p> +. + +Missing whitespace: + +. +<a href='bar'title=title> +. +<p><a href='bar'title=title></p> +. + +Closing tags: + +. +</a> +</foo > +. +<p></a> +</foo ></p> +. + +Illegal attributes in closing tag: + +. +</a href="foo"> +. +<p></a href="foo"></p> +. + +Comments: + +. +foo <!-- this is a +comment - with hyphen --> +. +<p>foo <!-- this is a +comment - with hyphen --></p> +. + +. +foo <!-- not a comment -- two hyphens --> +. +<p>foo <!-- not a comment -- two hyphens --></p> +. + +Processing instructions: + +. +foo <?php echo $a; ?> +. +<p>foo <?php echo $a; ?></p> +. + +Declarations: + +. +foo <!ELEMENT br EMPTY> +. +<p>foo <!ELEMENT br EMPTY></p> +. + +CDATA sections: + +. +foo <![CDATA[>&<]]> +. +<p>foo <![CDATA[>&<]]></p> +. + +Entities are preserved in HTML attributes: + +. +<a href="ö"> +. +<p><a href="ö"></p> +. + +Backslash escapes do not work in HTML attributes: + +. +<a href="\*"> +. +<p><a href="\*"></p> +. + +. +<a href="\""> +. +<p><a href="""></p> +. + +## Hard line breaks + +A line break (not in a code span or HTML tag) that is preceded +by two or more spaces is parsed as a linebreak (rendered +in HTML as a `<br />` tag): + +. +foo +baz +. +<p>foo<br /> +baz</p> +. + +For a more visible alternative, a backslash before the newline may be +used instead of two spaces: + +. +foo\ +baz +. +<p>foo<br /> +baz</p> +. + +More than two spaces can be used: + +. +foo +baz +. +<p>foo<br /> +baz</p> +. + +Leading spaces at the beginning of the next line are ignored: + +. +foo + bar +. +<p>foo<br /> +bar</p> +. + +. +foo\ + bar +. +<p>foo<br /> +bar</p> +. + +Line breaks can occur inside emphasis, links, and other constructs +that allow inline content: + +. +*foo +bar* +. +<p><em>foo<br /> +bar</em></p> +. + +. +*foo\ +bar* +. +<p><em>foo<br /> +bar</em></p> +. + +Line breaks do not occur inside code spans + +. +`code +span` +. +<p><code>code span</code></p> +. + +. +`code\ +span` +. +<p><code>code\ span</code></p> +. + +or HTML tags: + +. +<a href="foo +bar"> +. +<p><a href="foo +bar"></p> +. + +. +<a href="foo\ +bar"> +. +<p><a href="foo\ +bar"></p> +. + +## Soft line breaks + +A regular line break (not in a code span or HTML tag) that is not +preceded by two or more spaces is parsed as a softbreak. (A +softbreak may be rendered in HTML either as a newline or as a space. +The result will be the same in browsers. In the examples here, a +newline will be used.) + +. +foo +baz +. +<p>foo +baz</p> +. + +Spaces at the end of the line and beginning of the next line are +removed: + +. +foo + baz +. +<p>foo +baz</p> +. + +A conforming parser may render a soft line break in HTML either as a +line break or as a space. + +A renderer may also provide an option to render soft line breaks +as hard line breaks. + +## Strings + +Any characters not given an interpretation by the above rules will +be parsed as string content. + +. +hello $.;'there +. +<p>hello $.;'there</p> +. + +. +Foo χρῆν +. +<p>Foo χρῆν</p> +. + +Internal spaces are preserved verbatim: + +. +Multiple spaces +. +<p>Multiple spaces</p> +. + +<!-- END TESTS --> + +# Appendix A: A parsing strategy {-} + +## Overview {-} + +Parsing has two phases: + +1. In the first phase, lines of input are consumed and the block +structure of the document---its division into paragraphs, block quotes, +list items, and so on---is constructed. Text is assigned to these +blocks but not parsed. Reference link definitions are parsed and a +map of links is constructed. + +2. In the second phase, the raw text contents of paragraphs and headers +are parsed into sequences of markdown inline elements (strings, +code spans, links, emphasis, and so on), using the map of link +references constructed in phase 1. + +## The document tree {-} + +At each point in processing, the document is represented as a tree of +**blocks**. The root of the tree is a `document` block. The `document` +may have any number of other blocks as **children**. These children +may, in turn, have other blocks a children. The last child of a block +is normally considered **open**, meaning that subsequent lines of input +can alter its contents. (Blocks that are not open are **closed**.) +Here, for example, is a possible document tree, with the open blocks +marked by arrows: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## How source lines alter the document tree {-} + +Each line that is processed has an effect on this tree. The line is +analyzed and, depending on its contents, the document may be altered +in one or more of the following ways: + +1. One or more open blocks may be closed. +2. One or more new blocks may be created as children of the + last open block. +3. Text may be added to the last (deepest) open block remaining + on the tree. + +Once a line has been incorporated into the tree in this way, +it can be discarded, so input can be read in a stream. + +We can see how this works by considering how the tree above is +generated by four lines of markdown: + +``` markdown +> Lorem ipsum dolor +sit amet. +> - Qui *quodsi iracundia* +> - aliquando id +``` + +At the outset, our document model is just + +``` tree +-> document +``` + +The first line of our text, + +``` markdown +> Lorem ipsum dolor +``` + +causes a `block_quote` block to be created as a child of our +open `document` block, and a `paragraph` block as a child of +the `block_quote`. Then the text is added to the last open +block, the `paragraph`: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor" +``` + +The next line, + +``` markdown +sit amet. +``` + +is a "lazy continuation" of the open `paragraph`, so it gets added +to the paragraph's text: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor\nsit amet." +``` + +The third line, + +``` markdown +> - Qui *quodsi iracundia* +``` + +causes the `paragraph` block to be closed, and a new `list` block +opened as a child of the `block_quote`. A `list_item` is also +added as a child of the `list`, and a `paragraph` as a chid of +the `list_item`. The text is then added to the `paragraph`: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + -> list_item + -> paragraph + "Qui *quodsi iracundia*" +``` + +The fourth line, + +``` markdown +> - aliquando id +``` + +causes the `list_item` (and its child the `paragraph`) to be closed, +and a new `list_item` opened up as child of the `list`. A `paragraph` +is added as a child of the new `list_item`, to contain the text. +We thus obtain the final tree: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## From block structure to the final document {-} + +Once all of the input has been parsed, all open blocks are closed. + +We then "walk the tree," visiting every node, and parse raw +string contents of paragraphs and headers as inlines. At this +point we have seen all the link reference definitions, so we can +resolve reference links as we go. + +``` tree +document + block_quote + paragraph + str "Lorem ipsum dolor" + softbreak + str "sit amet." + list (type=bullet tight=true bullet_char=-) + list_item + paragraph + str "Qui " + emph + str "quodsi iracundia" + list_item + paragraph + str "aliquando id" +``` + +Notice how the newline in the first paragraph has been parsed as +a `softbreak`, and the asterisks in the first list item have become +an `emph`. + +The document can be rendered as HTML, or in any other format, given +an appropriate renderer. + + |