- ---
- title: CommonMark Spec
- author:
- - John MacFarlane
- version: 0.12
- date: 2014-11-10
- ...
- # Introduction
- ## What is Markdown?
- Markdown is a plain text format for writing structured documents,
- based on conventions used for indicating formatting in email and
- Usenet posts. It was developed in 2004 by John Gruber, who wrote
- the first Markdown-to-HTML converter in Perl, and it soon became
- widely used in websites. By 2014 there were dozens of
- implementations in many languages. Some of them extended basic
- Markdown syntax with conventions for footnotes, definition lists,
- tables, and other constructs, and some allowed output not just in
- HTML but in LaTeX and many other formats.
- ## Why is a spec needed?
- John Gruber's [canonical description of Markdown's
- syntax](http://daringfireball.net/projects/markdown/syntax)
- does not specify the syntax unambiguously. Here are some examples of
- questions it does not answer:
- 1. How much indentation is needed for a sublist? The spec says that
- continuation paragraphs need to be indented four spaces, but is
- not fully explicit about sublists. It is natural to think that
- they, too, must be indented four spaces, but `Markdown.pl` does
- not require that. This is hardly a "corner case," and divergences
- between implementations on this issue often lead to surprises for
- users in real documents. (See [this comment by John
- Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).)
- 2. Is a blank line needed before a block quote or header?
- Most implementations do not require the blank line. However,
- this can lead to unexpected results in hard-wrapped text, and
- also to ambiguities in parsing (note that some implementations
- put the header inside the blockquote, while others do not).
- (John Gruber has also spoken [in favor of requiring the blank
- lines](http://article.gmane.org/gmane.text.markdown.general/2146).)
- 3. Is a blank line needed before an indented code block?
- (`Markdown.pl` requires it, but this is not mentioned in the
- documentation, and some implementations do not require it.)
- ``` markdown
- paragraph
- code?
- ```
- 4. What is the exact rule for determining when list items get
- wrapped in `<p>` tags? Can a list be partially "loose" and partially
- "tight"? What should we do with a list like this?
- ``` markdown
- 1. one
- 2. two
- 3. three
- ```
- Or this?
- ``` markdown
- 1. one
- - a
- - b
- 2. two
- ```
- (There are some relevant comments by John Gruber
- [here](http://article.gmane.org/gmane.text.markdown.general/2554).)
- 5. Can list markers be indented? Can ordered list markers be right-aligned?
- ``` markdown
- 8. item 1
- 9. item 2
- 10. item 2a
- ```
- 6. Is this one list with a horizontal rule in its second item,
- or two lists separated by a horizontal rule?
- ``` markdown
- * a
- * * * * *
- * b
- ```
- 7. When list markers change from numbers to bullets, do we have
- two lists or one? (The Markdown syntax description suggests two,
- but the Perl scripts and many other implementations produce one.)
- ``` markdown
- 1. fee
- 2. fie
- - foe
- - fum
- ```
- 8. What are the precedence rules for the markers of inline structure?
- For example, is the following a valid link, or does the code span
- take precedence?
- ``` markdown
- [a backtick (`)](/url) and [another backtick (`)](/url).
- ```
- 9. What are the precedence rules for markers of emphasis and strong
- emphasis? For example, how should the following be parsed?
- ``` markdown
- *foo *bar* baz*
- ```
- 10. What are the precedence rules between block-level and inline-level
- structure? For example, how should the following be parsed?
- ``` markdown
- - `a long code span can contain a hyphen like this
- - and it can screw things up`
- ```
- 11. Can list items include headers? (`Markdown.pl` does not allow this,
- but headers can occur in blockquotes.)
- ``` markdown
- - # Heading
- ```
- 12. Can link references be defined inside block quotes or list items?
- ``` markdown
- > Blockquote [foo].
- >
- > [foo]: /url
- ```
- 13. If there are multiple definitions for the same reference, which takes
- precedence?
- ``` markdown
- [foo]: /url1
- [foo]: /url2
- [foo][]
- ```
- In the absence of a spec, early implementers consulted `Markdown.pl`
- to resolve these ambiguities. But `Markdown.pl` was quite buggy, and
- gave manifestly bad results in many cases, so it was not a
- satisfactory replacement for a spec.
- Because there is no unambiguous spec, implementations have diverged
- considerably. As a result, users are often surprised to find that
- a document that renders one way on one system (say, a GitHub wiki)
- renders differently on another (say, converting to DocBook using
- pandoc). To make matters worse, because nothing in Markdown counts
- as a "syntax error," the divergence often isn't discovered right away.
- ## About this document
- This document attempts to specify Markdown syntax unambiguously.
- It contains many examples with side-by-side Markdown and
- HTML. These are intended to double as conformance tests. An
- accompanying script `runtests.pl` can be used to run the tests
- against any Markdown program:
- perl runtests.pl spec.txt PROGRAM
- Since this document describes how Markdown is to be parsed into
- an abstract syntax tree, it would have made sense to use an abstract
- representation of the syntax tree instead of HTML. But HTML is capable
- of representing the structural distinctions we need to make, and the
- choice of HTML for the tests makes it possible to run the tests against
- an implementation without writing an abstract syntax tree renderer.
- This document is generated from a text file, `spec.txt`, written
- in Markdown with a small extension for the side-by-side tests.
- The script `spec2md.pl` can be used to turn `spec.txt` into pandoc
- Markdown, which can then be converted into other formats.
- In the examples, the `→` character is used to represent tabs.
- # Preprocessing
- A [line](@line)
- is a sequence of zero or more [characters](#character) followed by a
- line ending (CR, LF, or CRLF) or by the end of file.
- A [character](@character) is a Unicode code point.
- This spec does not specify an encoding; it thinks of lines as composed
- of characters rather than bytes. A conforming parser may be limited
- to a certain encoding.
- Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
- .
- →foo→baz→→bim
- .
- <pre><code>foo baz bim
- </code></pre>
- .
- .
- a→a
- ὐ→a
- .
- <pre><code>a a
- ὐ a
- </code></pre>
- .
- Line endings are replaced by newline characters (LF).
- A line containing no characters, or a line containing only spaces (after
- tab expansion), is called a [blank line](@blank-line).
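
The preprocessing rules above can be sketched in a few lines of Python. This is only an illustration, not part of the conformance tests; the function names `split_lines`, `expand_tabs`, and `is_blank` are hypothetical, and columns are counted in code points, as the second tab example suggests.

```python
import re


def split_lines(text):
    """Split text into lines on CR, LF, or CRLF (line endings removed)."""
    # CRLF is listed first so that it is not split into two line endings.
    lines = re.split(r"\r\n|\r|\n", text)
    # A final line ending terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def expand_tabs(line, tab_stop=4):
    """Replace each tab with spaces up to the next multiple of tab_stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - column % tab_stop
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1  # one column per code point, not per byte
    return "".join(out)


def is_blank(line):
    """True if the line is empty or contains only spaces after tab expansion."""
    return expand_tabs(line).strip(" ") == ""
```

For instance, `expand_tabs("a\ta")` yields `a` followed by three spaces and another `a`, matching the example above.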
- # Blocks and inlines
- We can think of a document as a sequence of
- [blocks](@block)---structural
- elements like paragraphs, block quotations,
- lists, headers, rules, and code blocks. Blocks can contain other
- blocks, or they can contain [inline](@inline) content:
- words, spaces, links, emphasized text, images, and inline code.
- ## Precedence
- Indicators of block structure always take precedence over indicators
- of inline structure. So, for example, the following is a list with
- two items, not a list with one item containing a code span:
- .
- - `one
- - two`
- .
- <ul>
- <li>`one</li>
- <li>two`</li>
- </ul>
- .
- This means that parsing can proceed in two steps: first, the block
- structure of the document can be discerned; second, text lines inside
- paragraphs, headers, and other block constructs can be parsed for inline
- structure. The second step requires information about link reference
- definitions that will be available only at the end of the first
- step. Note that the first step requires processing lines in sequence,
- but the second can be parallelized, since the inline parsing of
- one block element does not affect the inline parsing of any other.
- ## Container blocks and leaf blocks
- We can divide blocks into two types:
- [container blocks](@container-block),
- which can contain other blocks, and [leaf blocks](@leaf-block),
- which cannot.
- # Leaf blocks
- This section describes the different kinds of leaf block that make up a
- Markdown document.
- ## Horizontal rules
- A line consisting of 0-3 spaces of indentation, followed by a sequence
- of three or more matching `-`, `_`, or `*` characters, each followed
- optionally by any number of spaces, forms a [horizontal
- rule](@horizontal-rule).
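
As a rough illustration, the definition can be written as a regular expression. The sketch below assumes the line has already been tab-expanded and stripped of its line ending; the name `is_horizontal_rule` is hypothetical. It tests a line in isolation only, so the interaction with setext headers and list items described later in this section is left to the caller.

```python
import re

# 0-3 spaces of indentation, then three or more of the same character
# (*, -, or _), each optionally followed by spaces, and nothing else.
HRULE_RE = re.compile(r" {0,3}(?:(?:\* *){3,}|(?:- *){3,}|(?:_ *){3,})")


def is_horizontal_rule(line):
    """True if the whole line matches the horizontal rule pattern."""
    return HRULE_RE.fullmatch(line) is not None
```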
- .
- ***
- ---
- ___
- .
- <hr />
- <hr />
- <hr />
- .
- Wrong characters:
- .
- +++
- .
- <p>+++</p>
- .
- .
- ===
- .
- <p>===</p>
- .
- Not enough characters:
- .
- --
- **
- __
- .
- <p>--
- **
- __</p>
- .
- An indent of one to three spaces is allowed:
- .
- ***
- ***
- ***
- .
- <hr />
- <hr />
- <hr />
- .
- Four spaces is too many:
- .
- ***
- .
- <pre><code>***
- </code></pre>
- .
- .
- Foo
- ***
- .
- <p>Foo
- ***</p>
- .
- More than three characters may be used:
- .
- _____________________________________
- .
- <hr />
- .
- Spaces are allowed between the characters:
- .
- - - -
- .
- <hr />
- .
- .
- ** * ** * ** * **
- .
- <hr />
- .
- .
- - - - -
- .
- <hr />
- .
- Spaces are allowed at the end:
- .
- - - - -
- .
- <hr />
- .
- However, no other characters may occur in the line:
- .
- _ _ _ _ a
- a------
- ---a---
- .
- <p>_ _ _ _ a</p>
- <p>a------</p>
- <p>---a---</p>
- .
- It is required that all of the non-space characters be the same.
- So, this is not a horizontal rule:
- .
- *-*
- .
- <p><em>-</em></p>
- .
- Horizontal rules do not need blank lines before or after:
- .
- - foo
- ***
- - bar
- .
- <ul>
- <li>foo</li>
- </ul>
- <hr />
- <ul>
- <li>bar</li>
- </ul>
- .
- Horizontal rules can interrupt a paragraph:
- .
- Foo
- ***
- bar
- .
- <p>Foo</p>
- <hr />
- <p>bar</p>
- .
- If a line of dashes that meets the above conditions for being a
- horizontal rule could also be interpreted as the underline of a [setext
- header](#setext-header), the interpretation as a
- [setext header](#setext-header) takes precedence. Thus, for example,
- this is a setext header, not a paragraph followed by a horizontal rule:
- .
- Foo
- ---
- bar
- .
- <h2>Foo</h2>
- <p>bar</p>
- .
- When both a horizontal rule and a list item are possible
- interpretations of a line, the horizontal rule is preferred:
- .
- * Foo
- * * *
- * Bar
- .
- <ul>
- <li>Foo</li>
- </ul>
- <hr />
- <ul>
- <li>Bar</li>
- </ul>
- .
- If you want a horizontal rule in a list item, use a different bullet:
- .
- - Foo
- - * * *
- .
- <ul>
- <li>Foo</li>
- <li><hr /></li>
- </ul>
- .
- ## ATX headers
- An [ATX header](@atx-header)
- consists of a string of characters, parsed as inline content, between an
- opening sequence of 1--6 unescaped `#` characters and an optional
- closing sequence of any number of `#` characters. The opening sequence
- of `#` characters cannot be followed directly by a nonspace character.
- The optional closing sequence of `#`s must be preceded by a space and may be
- followed by spaces only. The opening `#` character may be indented 0-3
- spaces. The raw contents of the header are stripped of leading and
- trailing spaces before being parsed as inline content. The header level
- is equal to the number of `#` characters in the opening sequence.
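
One possible way to recognize an ATX header line is sketched below, under the assumption that the line is tab-expanded and has no line ending. The function name `parse_atx_header` is hypothetical. The returned content is the raw text, still to be parsed as inline content, so backslash escapes are left in place.

```python
import re

# Opening sequence: 0-3 spaces, 1-6 '#', then a space or the end of the line.
ATX_OPEN_RE = re.compile(r" {0,3}(#{1,6})(?= |$)")
# Closing sequence: a space, any number of '#', then only spaces.
ATX_CLOSE_RE = re.compile(r" #+ *$")


def parse_atx_header(line):
    """Return (level, raw_content) for an ATX header line, else None."""
    m = ATX_OPEN_RE.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    rest = line[m.end():]
    # Drop the optional closing sequence. An escaped '#' breaks the run of
    # '#' characters after the space, so it is never treated as a closer.
    rest = ATX_CLOSE_RE.sub("", rest)
    return level, rest.strip(" ")
```

A line such as `#5 bolt` or one with seven `#` characters returns `None` and is left for the paragraph rules.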
- Simple headers:
- .
- # foo
- ## foo
- ### foo
- #### foo
- ##### foo
- ###### foo
- .
- <h1>foo</h1>
- <h2>foo</h2>
- <h3>foo</h3>
- <h4>foo</h4>
- <h5>foo</h5>
- <h6>foo</h6>
- .
- More than six `#` characters is not a header:
- .
- ####### foo
- .
- <p>####### foo</p>
- .
- A space is required between the `#` characters and the header's
- contents. Note that many implementations currently do not require
- the space. However, the space was required by the [original ATX
- implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps
- prevent things like the following from being parsed as headers:
- .
- #5 bolt
- .
- <p>#5 bolt</p>
- .
- This is not a header, because the first `#` is escaped:
- .
- \## foo
- .
- <p>## foo</p>
- .
- Contents are parsed as inlines:
- .
- # foo *bar* \*baz\*
- .
- <h1>foo <em>bar</em> *baz*</h1>
- .
- Leading and trailing blanks are ignored in parsing inline content:
- .
- # foo
- .
- <h1>foo</h1>
- .
- One to three spaces of indentation are allowed:
- .
- ### foo
- ## foo
- # foo
- .
- <h3>foo</h3>
- <h2>foo</h2>
- <h1>foo</h1>
- .
- Four spaces are too much:
- .
- # foo
- .
- <pre><code># foo
- </code></pre>
- .
- .
- foo
- # bar
- .
- <p>foo
- # bar</p>
- .
- A closing sequence of `#` characters is optional:
- .
- ## foo ##
- ### bar ###
- .
- <h2>foo</h2>
- <h3>bar</h3>
- .
- It need not be the same length as the opening sequence:
- .
- # foo ##################################
- ##### foo ##
- .
- <h1>foo</h1>
- <h5>foo</h5>
- .
- Spaces are allowed after the closing sequence:
- .
- ### foo ###
- .
- <h3>foo</h3>
- .
- A sequence of `#` characters with a nonspace character following it
- is not a closing sequence, but counts as part of the contents of the
- header:
- .
- ### foo ### b
- .
- <h3>foo ### b</h3>
- .
- The closing sequence must be preceded by a space:
- .
- # foo#
- .
- <h1>foo#</h1>
- .
- Backslash-escaped `#` characters do not count as part
- of the closing sequence:
- .
- ### foo \###
- ## foo #\##
- # foo \#
- .
- <h3>foo ###</h3>
- <h2>foo ###</h2>
- <h1>foo #</h1>
- .
- ATX headers need not be separated from surrounding content by blank
- lines, and they can interrupt paragraphs:
- .
- ****
- ## foo
- ****
- .
- <hr />
- <h2>foo</h2>
- <hr />
- .
- .
- Foo bar
- # baz
- Bar foo
- .
- <p>Foo bar</p>
- <h1>baz</h1>
- <p>Bar foo</p>
- .
- ATX headers can be empty:
- .
- ##
- #
- ### ###
- .
- <h2></h2>
- <h1></h1>
- <h3></h3>
- .
- ## Setext headers
- A [setext header](@setext-header)
- consists of a line of text, containing at least one nonspace character,
- with no more than 3 spaces indentation, followed by a [setext header
- underline](#setext-header-underline). The line of text must be
- one that, were it not followed by the setext header underline,
- would be interpreted as part of a paragraph: it cannot be a code
- block, header, blockquote, horizontal rule, or list. A [setext header
- underline](@setext-header-underline)
- is a sequence of `=` characters or a sequence of `-` characters, with no
- more than 3 spaces indentation and any number of trailing
- spaces. The header is a level 1 header if `=` characters are used, and
- a level 2 header if `-` characters are used. The contents of the header
- are the result of parsing the first line as Markdown inline content.
- In general, a setext header need not be preceded or followed by a
- blank line. However, it cannot interrupt a paragraph, so when a
- setext header comes after a paragraph, a blank line is needed between
- them.
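
The underline test can be sketched as follows. This is a minimal illustration with a hypothetical function name, `setext_underline_level`. It checks the underline line only; whether the preceding line is paragraph text, and whether the header would interrupt a paragraph, must be decided by the surrounding block parser.

```python
import re

# 0-3 spaces, a run of '=' or a run of '-', then optional trailing spaces.
SETEXT_UNDERLINE_RE = re.compile(r" {0,3}(=+|-+) *")


def setext_underline_level(line):
    """Return 1 for an '=' underline, 2 for a '-' underline, else None."""
    m = SETEXT_UNDERLINE_RE.fullmatch(line)
    if m is None:
        return None
    return 1 if m.group(1)[0] == "=" else 2
```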
- Simple examples:
- .
- Foo *bar*
- =========
- Foo *bar*
- ---------
- .
- <h1>Foo <em>bar</em></h1>
- <h2>Foo <em>bar</em></h2>
- .
- The underlining can be any length:
- .
- Foo
- -------------------------
- Foo
- =
- .
- <h2>Foo</h2>
- <h1>Foo</h1>
- .
- The header content can be indented up to three spaces, and need
- not line up with the underlining:
- .
- Foo
- ---
- Foo
- -----
- Foo
- ===
- .
- <h2>Foo</h2>
- <h2>Foo</h2>
- <h1>Foo</h1>
- .
- An indent of four spaces is too much:
- .
- Foo
- ---
- Foo
- ---
- .
- <pre><code>Foo
- ---
- Foo
- </code></pre>
- <hr />
- .
- The setext header underline can be indented up to three spaces, and
- may have trailing spaces:
- .
- Foo
- ----
- .
- <h2>Foo</h2>
- .
- Four spaces is too much:
- .
- Foo
- ---
- .
- <p>Foo
- ---</p>
- .
- The setext header underline cannot contain internal spaces:
- .
- Foo
- = =
- Foo
- --- -
- .
- <p>Foo
- = =</p>
- <p>Foo</p>
- <hr />
- .
- Trailing spaces in the content line do not cause a line break:
- .
- Foo
- -----
- .
- <h2>Foo</h2>
- .
- Nor does a backslash at the end:
- .
- Foo\
- ----
- .
- <h2>Foo\</h2>
- .
- Since indicators of block structure take precedence over
- indicators of inline structure, the following are setext headers:
- .
- `Foo
- ----
- `
- <a title="a lot
- ---
- of dashes"/>
- .
- <h2>`Foo</h2>
- <p>`</p>
- <h2><a title="a lot</h2>
- <p>of dashes"/></p>
- .
- The setext header underline cannot be a [lazy continuation
- line](#lazy-continuation-line) in a list item or block quote:
- .
- > Foo
- ---
- .
- <blockquote>
- <p>Foo</p>
- </blockquote>
- <hr />
- .
- .
- - Foo
- ---
- .
- <ul>
- <li>Foo</li>
- </ul>
- <hr />
- .
- A setext header cannot interrupt a paragraph:
- .
- Foo
- Bar
- ---
- Foo
- Bar
- ===
- .
- <p>Foo
- Bar</p>
- <hr />
- <p>Foo
- Bar
- ===</p>
- .
- But in general a blank line is not required before or after:
- .
- ---
- Foo
- ---
- Bar
- ---
- Baz
- .
- <hr />
- <h2>Foo</h2>
- <h2>Bar</h2>
- <p>Baz</p>
- .
- Setext headers cannot be empty:
- .
- ====
- .
- <p>====</p>
- .
- Setext header text lines must not be interpretable as block
- constructs other than paragraphs. So, the line of dashes
- in these examples gets interpreted as a horizontal rule:
- .
- ---
- ---
- .
- <hr />
- <hr />
- .
- .
- - foo
- -----
- .
- <ul>
- <li>foo</li>
- </ul>
- <hr />
- .
- .
- foo
- ---
- .
- <pre><code>foo
- </code></pre>
- <hr />
- .
- .
- > foo
- -----
- .
- <blockquote>
- <p>foo</p>
- </blockquote>
- <hr />
- .
- If you want a header with `> foo` as its literal text, you can
- use backslash escapes:
- .
- \> foo
- ------
- .
- <h2>> foo</h2>
- .
- ## Indented code blocks
- An [indented code block](@indented-code-block)
- is composed of one or more
- [indented chunks](#indented-chunk) separated by blank lines.
- An [indented chunk](@indented-chunk)
- is a sequence of non-blank lines, each indented four or more
- spaces. An indented code block cannot interrupt a paragraph, so
- if it occurs before or after a paragraph, there must be an
- intervening blank line. The contents of the code block are
- the literal contents of the lines, including trailing newlines,
- minus four spaces of indentation. An indented code block has no
- attributes.
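
The following sketch shows one way to collect an indented code block from a list of tab-expanded lines. The names `strip_indent` and `take_indented_code` are hypothetical, and the caller is assumed to have already checked that `lines[start]` is a non-blank line indented four or more spaces that does not interrupt a paragraph.

```python
def strip_indent(line, width=4):
    """Remove up to `width` leading spaces from a line."""
    n = 0
    while n < width and n < len(line) and line[n] == " ":
        n += 1
    return line[n:]


def take_indented_code(lines, start):
    """Return (content, next_index) for the block starting at lines[start]."""
    end = start
    i = start
    while i < len(lines):
        line = lines[i]
        if line.strip(" ") == "":
            i += 1   # blank line: separates chunks only if a chunk follows
        elif line.startswith("    "):
            i += 1
            end = i  # the block extends through this indented line
        else:
            break    # a less-indented non-blank line ends the block
    # Trailing blank lines fall outside lines[start:end] and are not included.
    content = "".join(strip_indent(l) + "\n" for l in lines[start:end])
    return content, end
```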
- .
- a simple
- indented code block
- .
- <pre><code>a simple
- indented code block
- </code></pre>
- .
- The contents are literal text, and do not get parsed as Markdown:
- .
- <a/>
- *hi*
- - one
- .
- <pre><code><a/>
- *hi*
- - one
- </code></pre>
- .
- Here we have three chunks separated by blank lines:
- .
- chunk1
- chunk2
-
-
-
- chunk3
- .
- <pre><code>chunk1
- chunk2
- chunk3
- </code></pre>
- .
- Any initial spaces beyond four will be included in the content, even
- in interior blank lines:
- .
- chunk1
-
- chunk2
- .
- <pre><code>chunk1
-
- chunk2
- </code></pre>
- .
- An indented code block cannot interrupt a paragraph. (This
- allows hanging indents and the like.)
- .
- Foo
- bar
- .
- <p>Foo
- bar</p>
- .
- However, any non-blank line with fewer than four leading spaces ends
- the code block immediately. So a paragraph may occur immediately
- after indented code:
- .
- foo
- bar
- .
- <pre><code>foo
- </code></pre>
- <p>bar</p>
- .
- And indented code can occur immediately before and after other kinds of
- blocks:
- .
- # Header
- foo
- Header
- ------
- foo
- ----
- .
- <h1>Header</h1>
- <pre><code>foo
- </code></pre>
- <h2>Header</h2>
- <pre><code>foo
- </code></pre>
- <hr />
- .
- The first line can be indented more than four spaces:
- .
- foo
- bar
- .
- <pre><code> foo
- bar
- </code></pre>
- .
- Blank lines preceding or following an indented code block
- are not included in it:
- .
-
- foo
-
- .
- <pre><code>foo
- </code></pre>
- .
- Trailing spaces are included in the code block's content:
- .
- foo
- .
- <pre><code>foo
- </code></pre>
- .
- ## Fenced code blocks
- A [code fence](@code-fence) is a sequence
- of at least three consecutive backtick characters (`` ` ``) or
- tildes (`~`). (Tildes and backticks cannot be mixed.)
- A [fenced code block](@fenced-code-block)
- begins with a code fence, indented no more than three spaces.
- The line with the opening code fence may optionally contain some text
- following the code fence; this is trimmed of leading and trailing
- spaces and called the [info string](@info-string).
- The info string may not contain any backtick
- characters. (The reason for this restriction is that otherwise
- some inline code would be incorrectly interpreted as the
- beginning of a fenced code block.)
- The content of the code block consists of all subsequent lines, until
- a closing [code fence](#code-fence) of the same type as the code block
- began with (backticks or tildes), and with at least as many backticks
- or tildes as the opening code fence. If the leading code fence is
- indented N spaces, then up to N spaces of indentation are removed from
- each line of the content (if present). (If a content line is not
- indented, it is preserved unchanged. If it is indented less than N
- spaces, all of the indentation is removed.)
- The closing code fence may be indented up to three spaces, and may be
- followed only by spaces, which are ignored. If the end of the
- containing block (or document) is reached and no closing code fence
- has been found, the code block contains all of the lines after the
- opening code fence until the end of the containing block (or
- document). (An alternative spec would require backtracking in the
- event that a closing code fence is not found. But this makes parsing
- much less efficient, and there seems to be no real downside to the
- behavior described here.)
- A fenced code block may interrupt a paragraph, and does not require
- a blank line either before or after.
- The content of a fenced code block is treated as literal text, not parsed
- as inlines. The first word of the info string is typically used to
- specify the language of the code sample, and rendered in the `class`
- attribute of the `code` tag. However, this spec does not mandate any
- particular treatment of the info string.
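
The fence rules can be sketched as a handful of helpers. The names below (`match_opening_fence`, `is_closing_fence`, `remove_fence_indent`, `fence_language`) are hypothetical, and the lines are assumed to be tab-expanded. Following the text above, the sketch rejects a backtick in the info string of any opening fence.

```python
import re

# 0-3 spaces, a run of 3+ backticks or 3+ tildes, then the rest of the line.
OPEN_FENCE_RE = re.compile(r"( {0,3})(`{3,}|~{3,})(.*)")


def match_opening_fence(line):
    """Return (indent, fence_char, fence_length, info_string) or None."""
    m = OPEN_FENCE_RE.fullmatch(line)
    if m is None:
        return None
    indent, fence, rest = m.groups()
    info = rest.strip(" ")
    if "`" in info:  # the info string may not contain backticks
        return None
    return len(indent), fence[0], len(fence), info


def is_closing_fence(line, fence_char, fence_length):
    """True if the line closes a block opened with the given fence."""
    stripped = line.rstrip(" ")   # trailing spaces are ignored
    body = stripped.lstrip(" ")
    indent = len(stripped) - len(body)
    return (indent <= 3
            and len(body) >= fence_length
            and set(body) == {fence_char})


def remove_fence_indent(line, indent):
    """Remove up to `indent` leading spaces from a content line."""
    n = 0
    while n < indent and n < len(line) and line[n] == " ":
        n += 1
    return line[n:]


def fence_language(info):
    """Return the first word of the info string, or None if it is empty."""
    words = info.split()
    return words[0] if words else None
```

A renderer that follows the convention described above would emit the result of `fence_language`, prefixed with `language-`, in the `class` attribute of the `code` tag.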
- Here is a simple example with backticks:
- .
- ```
- <
- >
- ```
- .
- <pre><code><
- >
- </code></pre>
- .
- With tildes:
- .
- ~~~
- <
- >
- ~~~
- .
- <pre><code><
- >
- </code></pre>
- .
- The closing code fence must use the same character as the opening
- fence:
- .
- ```
- aaa
- ~~~
- ```
- .
- <pre><code>aaa
- ~~~
- </code></pre>
- .
- .
- ~~~
- aaa
- ```
- ~~~
- .
- <pre><code>aaa
- ```
- </code></pre>
- .
- The closing code fence must be at least as long as the opening fence:
- .
- ````
- aaa
- ```
- ``````
- .
- <pre><code>aaa
- ```
- </code></pre>
- .
- .
- ~~~~
- aaa
- ~~~
- ~~~~
- .
- <pre><code>aaa
- ~~~
- </code></pre>
- .
- Unclosed code blocks are closed by the end of the document:
- .
- ```
- .
- <pre><code></code></pre>
- .
- .
- `````
- ```
- aaa
- .
- <pre><code>
- ```
- aaa
- </code></pre>
- .
- A code block can have all empty lines as its content:
- .
- ```
-
- ```
- .
- <pre><code>
-
- </code></pre>
- .
- A code block can be empty:
- .
- ```
- ```
- .
- <pre><code></code></pre>
- .
- Fences can be indented. If the opening fence is indented,
- content lines will have equivalent opening indentation removed,
- if present:
- .
- ```
- aaa
- aaa
- ```
- .
- <pre><code>aaa
- aaa
- </code></pre>
- .
- .
- ```
- aaa
- aaa
- aaa
- ```
- .
- <pre><code>aaa
- aaa
- aaa
- </code></pre>
- .
- .
- ```
- aaa
- aaa
- aaa
- ```
- .
- <pre><code>aaa
- aaa
- aaa
- </code></pre>
- .
- Four spaces of indentation produce an indented code block:
- .
- ```
- aaa
- ```
- .
- <pre><code>```
- aaa
- ```
- </code></pre>
- .
- Closing fences may be indented by 0-3 spaces, and their indentation
- need not match that of the opening fence:
- .
- ```
- aaa
- ```
- .
- <pre><code>aaa
- </code></pre>
- .
- .
- ```
- aaa
- ```
- .
- <pre><code>aaa
- </code></pre>
- .
- This is not a closing fence, because it is indented 4 spaces:
- .
- ```
- aaa
- ```
- .
- <pre><code>aaa
- ```
- </code></pre>
- .
- Code fences (opening and closing) cannot contain internal spaces:
- .
- ``` ```
- aaa
- .
- <p><code></code>
- aaa</p>
- .
- .
- ~~~~~~
- aaa
- ~~~ ~~
- .
- <pre><code>aaa
- ~~~ ~~
- </code></pre>
- .
- Fenced code blocks can interrupt paragraphs, and can be followed
- directly by paragraphs, without a blank line between:
- .
- foo
- ```
- bar
- ```
- baz
- .
- <p>foo</p>
- <pre><code>bar
- </code></pre>
- <p>baz</p>
- .
- Other blocks can also occur before and after fenced code blocks
- without an intervening blank line:
- .
- foo
- ---
- ~~~
- bar
- ~~~
- # baz
- .
- <h2>foo</h2>
- <pre><code>bar
- </code></pre>
- <h1>baz</h1>
- .
- An [info string](#info-string) can be provided after the opening code fence.
- Opening and closing spaces will be stripped, and the first word, prefixed
- with `language-`, is used as the value for the `class` attribute of the
- `code` element within the enclosing `pre` element.
- .
- ```ruby
- def foo(x)
- return 3
- end
- ```
- .
- <pre><code class="language-ruby">def foo(x)
- return 3
- end
- </code></pre>
- .
- .
- ~~~~ ruby startline=3 $%@#$
- def foo(x)
- return 3
- end
- ~~~~~~~
- .
- <pre><code class="language-ruby">def foo(x)
- return 3
- end
- </code></pre>
- .
- .
- ````;
- ````
- .
- <pre><code class="language-;"></code></pre>
- .
- Info strings for backtick code blocks cannot contain backticks:
- .
- ``` aa ```
- foo
- .
- <p><code>aa</code>
- foo</p>
- .
- Closing code fences cannot have info strings:
- .
- ```
- ``` aaa
- ```
- .
- <pre><code>``` aaa
- </code></pre>
- .
- ## HTML blocks
- An [HTML block tag](@html-block-tag) is
- an [open tag](#open-tag) or [closing tag](#closing-tag) whose tag
- name is one of the following (case-insensitive):
- `article`, `header`, `aside`, `hgroup`, `blockquote`, `hr`, `iframe`,
- `body`, `li`, `map`, `button`, `object`, `canvas`, `ol`, `caption`,
- `output`, `col`, `p`, `colgroup`, `pre`, `dd`, `progress`, `div`,
- `section`, `dl`, `table`, `td`, `dt`, `tbody`, `embed`, `textarea`,
- `fieldset`, `tfoot`, `figcaption`, `th`, `figure`, `thead`, `footer`,
- `tr`, `form`, `ul`, `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `video`,
- `script`, `style`.
- An [HTML block](@html-block) begins with an
- [HTML block tag](#html-block-tag), [HTML comment](#html-comment),
- [processing instruction](#processing-instruction),
- [declaration](#declaration), or [CDATA section](#cdata-section).
- It ends when a [blank line](#blank-line) or the end of the
- input is encountered. The initial line may be indented up to three
- spaces, and subsequent lines may have any indentation. The contents
- of the HTML block are interpreted as raw HTML, and will not be escaped
- in HTML output.
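
A simplified start-of-block test might look like the sketch below. It is an approximation: it inspects only the beginning of the line rather than validating a complete open or closing tag, and the name `starts_html_block` and the exact regular expression are assumptions made for illustration.

```python
import re

HTML_BLOCK_TAGS = {
    "article", "header", "aside", "hgroup", "blockquote", "hr", "iframe",
    "body", "li", "map", "button", "object", "canvas", "ol", "caption",
    "output", "col", "p", "colgroup", "pre", "dd", "progress", "div",
    "section", "dl", "table", "td", "dt", "tbody", "embed", "textarea",
    "fieldset", "tfoot", "figcaption", "th", "figure", "thead", "footer",
    "tr", "form", "ul", "h1", "h2", "h3", "h4", "h5", "h6", "video",
    "script", "style",
}

# 0-3 spaces, then '<' followed by a (possibly closing) tag name, a comment
# opener, a processing instruction, a declaration, or a CDATA opener.
HTML_START_RE = re.compile(
    r" {0,3}<(?:/?([A-Za-z][A-Za-z0-9]*)(?=[\s/>]|$)"
    r"|!--|\?|![A-Z]|!\[CDATA\[)"
)


def starts_html_block(line):
    """True if the line looks like the first line of an HTML block."""
    m = HTML_START_RE.match(line)
    if m is None:
        return False
    name = m.group(1)
    # Comments, processing instructions, declarations, and CDATA sections
    # have no tag name; tag names are compared case-insensitively.
    return name is None or name.lower() in HTML_BLOCK_TAGS
```

Once a block has started, the parser simply consumes lines until a blank line or the end of the input.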
- Some simple examples:
- .
- <table>
- <tr>
- <td>
- hi
- </td>
- </tr>
- </table>
- okay.
- .
- <table>
- <tr>
- <td>
- hi
- </td>
- </tr>
- </table>
- <p>okay.</p>
- .
- .
- <div>
- *hello*
- <foo><a>
- .
- <div>
- *hello*
- <foo><a>
- .
- Here we have two HTML blocks with a Markdown paragraph between them:
- .
- <DIV CLASS="foo">
- *Markdown*
- </DIV>
- .
- <DIV CLASS="foo">
- <p><em>Markdown</em></p>
- </DIV>
- .
- In the following example, what looks like a Markdown code block
- is actually part of the HTML block, which continues until a blank
- line or the end of the document is reached:
- .
- <div></div>
- ``` c
- int x = 33;
- ```
- .
- <div></div>
- ``` c
- int x = 33;
- ```
- .
- A comment:
- .
- <!-- Foo
- bar
- baz -->
- .
- <!-- Foo
- bar
- baz -->
- .
- A processing instruction:
- .
- <?php
- echo '>';
- ?>
- .
- <?php
- echo '>';
- ?>
- .
- CDATA:
- .
- <![CDATA[
- function matchwo(a,b)
- {
- if (a < b && a < 0) then
- {
- return 1;
- }
- else
- {
- return 0;
- }
- }
- ]]>
- .
- <![CDATA[
- function matchwo(a,b)
- {
- if (a < b && a < 0) then
- {
- return 1;
- }
- else
- {
- return 0;
- }
- }
- ]]>
- .
- The opening tag can be indented 1-3 spaces, but not 4:
- .
- <!-- foo -->
- <!-- foo -->
- .
- <!-- foo -->
- <pre><code><!-- foo -->
- </code></pre>
- .
- An HTML block can interrupt a paragraph, and need not be preceded
- by a blank line.
- .
- Foo
- <div>
- bar
- </div>
- .
- <p>Foo</p>
- <div>
- bar
- </div>
- .
- However, a following blank line is always needed, except at the end of
- a document:
- .
- <div>
- bar
- </div>
- *foo*
- .
- <div>
- bar
- </div>
- *foo*
- .
- An incomplete HTML block tag may also start an HTML block:
- .
- <div class
- foo
- .
- <div class
- foo
- .
- This rule differs from John Gruber's original Markdown syntax
- specification, which says:
- > The only restrictions are that block-level HTML elements —
- > e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from
- > surrounding content by blank lines, and the start and end tags of the
- > block should not be indented with tabs or spaces.
- In some ways Gruber's rule is more restrictive than the one given
- here:
- - It requires that an HTML block be preceded by a blank line.
- - It does not allow the start tag to be indented.
- - It requires a matching end tag, which it also does not allow to
- be indented.
- Indeed, most Markdown implementations, including some of Gruber's
- own Perl implementations, do not impose these restrictions.
- There is one respect, however, in which Gruber's rule is more liberal
- than the one given here, since it allows blank lines to occur inside
- an HTML block. There are two reasons for disallowing them here.
- First, it removes the need to parse balanced tags, which is
- expensive and can require backtracking from the end of the document
- if no matching end tag is found. Second, it provides a very simple
- and flexible way of including Markdown content inside HTML tags:
- simply separate the Markdown from the HTML using blank lines:
- .
- <div>
- *Emphasized* text.
- </div>
- .
- <div>
- <p><em>Emphasized</em> text.</p>
- </div>
- .
- Compare:
- .
- <div>
- *Emphasized* text.
- </div>
- .
- <div>
- *Emphasized* text.
- </div>
- .
- Some Markdown implementations have adopted a convention of
- interpreting content inside tags as text if the open tag has
- the attribute `markdown=1`. The rule given above seems a simpler and
- more elegant way of achieving the same expressive power, which is also
- much simpler to parse.
- The main potential drawback is that one can no longer paste HTML
- blocks into Markdown documents with 100% reliability. However,
- *in most cases* this will work fine, because the blank lines in
- HTML are usually followed by HTML block tags. For example:
- .
- <table>
- <tr>
- <td>
- Hi
- </td>
- </tr>
- </table>
- .
- <table>
- <tr>
- <td>
- Hi
- </td>
- </tr>
- </table>
- .
- Moreover, blank lines are usually not necessary and can be
- deleted. The exception is inside `<pre>` tags; here, one can
- replace the blank lines with `&#10;` entities.
- So there is no important loss of expressive power with the new rule.
- ## Link reference definitions
- A [link reference definition](@link-reference-definition)
- consists of a [link
- label](#link-label), indented up to three spaces, followed
- by a colon (`:`), optional blank space (including up to one
- newline), a [link destination](#link-destination), optional
- blank space (including up to one newline), and an optional [link
- title](#link-title), which if it is present must be separated
- from the [link destination](#link-destination) by whitespace.
- No further non-space characters may occur on the line.
- A [link reference definition](#link-reference-definition)
- does not correspond to a structural element of a document. Instead, it
- defines a label which can be used in [reference links](#reference-link)
- and reference-style [images](#image) elsewhere in the document. [Link
- reference definitions] can come either before or after the links that use
- them.
- .
- [foo]: /url "title"
- [foo]
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- .
- [foo]:
- /url
- 'the title'
- [foo]
- .
- <p><a href="/url" title="the title">foo</a></p>
- .
- .
- [Foo*bar\]]:my_(url) 'title (with parens)'
- [Foo*bar\]]
- .
- <p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p>
- .
- .
- [Foo bar]:
- <my url>
- 'title'
- [Foo bar]
- .
- <p><a href="my%20url" title="title">Foo bar</a></p>
- .
- The title may be omitted:
- .
- [foo]:
- /url
- [foo]
- .
- <p><a href="/url">foo</a></p>
- .
- The link destination may not be omitted:
- .
- [foo]:
- [foo]
- .
- <p>[foo]:</p>
- <p>[foo]</p>
- .
- A link can come before its corresponding definition:
- .
- [foo]
- [foo]: url
- .
- <p><a href="url">foo</a></p>
- .
- If there are several matching definitions, the first one takes
- precedence:
- .
- [foo]
- [foo]: first
- [foo]: second
- .
- <p><a href="first">foo</a></p>
- .
- As noted in the section on [Links], matching of labels is
- case-insensitive (see [matches](#matches)).
- .
- [FOO]: /url
- [Foo]
- .
- <p><a href="/url">Foo</a></p>
- .
- .
- [ΑΓΩ]: /φου
- [αγω]
- .
- <p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
- .
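- The two rules just illustrated (the first matching definition wins, and labels match case-insensitively) can be sketched as a small lookup table. This is only an illustration: it assumes that label normalization means Unicode case folding plus collapsing runs of whitespace, and the names `normalize_label` and `collect_definitions` are hypothetical, not part of this spec.

```python
import re


def normalize_label(label: str) -> str:
    """Case-fold a link label and collapse runs of whitespace (assumed rule)."""
    return re.sub(r"\s+", " ", label.strip()).casefold()


def collect_definitions(pairs):
    """Build a lookup table from (label, destination) pairs.

    The first definition for a normalized label wins; later ones are ignored.
    """
    table = {}
    for label, destination in pairs:
        # setdefault only stores the value if the key is not present yet.
        table.setdefault(normalize_label(label), destination)
    return table


defs = collect_definitions([("foo", "first"), ("foo", "second"), ("ΑΓΩ", "/φου")])
print(defs[normalize_label("Foo")])
print(defs[normalize_label("αγω")])
```

- Looking up `Foo` finds the destination of the first `foo` definition, and the lowercase Greek label finds the definition written in capitals.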
- Here is a link reference definition with no corresponding link.
- It contributes nothing to the document.
- .
- [foo]: /url
- .
- .
- This is not a link reference definition, because there are
- non-space characters after the title:
- .
- [foo]: /url "title" ok
- .
- <p>[foo]: /url "title" ok</p>
- .
- This is not a link reference definition, because it is indented
- four spaces:
- .
- [foo]: /url "title"
- [foo]
- .
- <pre><code>[foo]: /url "title"
- </code></pre>
- <p>[foo]</p>
- .
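- The single-line cases shown so far can be approximated with a regular expression. The sketch below is deliberately incomplete: it ignores definitions that span several lines, backslash escapes in destinations and titles, and the surrounding block structure. `LINK_DEF` and `parse_link_definition` are hypothetical names used only for illustration.

```python
import re

# Simplified pattern: one line only, no escapes in the destination or title.
LINK_DEF = re.compile(
    r"""
    ^[ ]{0,3}                                    # at most three spaces of indent
    \[(?P<label>(?:[^\[\]\\]|\\.)+)\]:           # label in brackets, then a colon
    [ ]*
    (?:<(?P<angle>[^<>]*)>|(?P<bare>[^\s<]\S*))  # destination, bare or in angle brackets
    (?:[ ]+(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|\((?P<par>[^()]*)\)))?  # optional title
    [ ]*$                                        # nothing else may follow
    """,
    re.VERBOSE,
)


def parse_link_definition(line: str):
    """Return (label, destination, title), or None if the line is not a definition."""
    m = LINK_DEF.match(line)
    if m is None:
        return None
    destination = m.group("angle") if m.group("angle") is not None else m.group("bare")
    title = next(
        (m.group(g) for g in ("dq", "sq", "par") if m.group(g) is not None), None
    )
    return m.group("label"), destination, title


print(parse_link_definition('[foo]: /url "title"'))
print(parse_link_definition('[foo]: /url "title" ok'))
print(parse_link_definition('    [foo]: /url "title"'))
```

- The second and third calls return `None`, mirroring the two preceding examples: trailing text after the title, and four spaces of indentation.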
- This is not a link reference definition, because it occurs inside
- a code block:
- .
- ```
- [foo]: /url
- ```
- [foo]
- .
- <pre><code>[foo]: /url
- </code></pre>
- <p>[foo]</p>
- .
- A [link reference definition](#link-reference-definition) cannot
- interrupt a paragraph.
- .
- Foo
- [bar]: /baz
- [bar]
- .
- <p>Foo
- [bar]: /baz</p>
- <p>[bar]</p>
- .
- However, it can directly follow other block elements, such as headers
- and horizontal rules, and it need not be followed by a blank line.
- .
- # [Foo]
- [foo]: /url
- > bar
- .
- <h1><a href="/url">Foo</a></h1>
- <blockquote>
- <p>bar</p>
- </blockquote>
- .
- Several [link reference definitions](#link-reference-definition) can occur one after another,
- without intervening blank lines.
- .
- [foo]: /foo-url "foo"
- [bar]: /bar-url
- "bar"
- [baz]: /baz-url
- [foo],
- [bar],
- [baz]
- .
- <p><a href="/foo-url" title="foo">foo</a>,
- <a href="/bar-url" title="bar">bar</a>,
- <a href="/baz-url">baz</a></p>
- .
- [Link reference definitions](#link-reference-definition) can occur
- inside block containers, like lists and block quotations. They
- affect the entire document, not just the container in which they
- are defined:
- .
- [foo]
- > [foo]: /url
- .
- <p><a href="/url">foo</a></p>
- <blockquote>
- </blockquote>
- .
- ## Paragraphs
- A sequence of non-blank lines that cannot be interpreted as other
- kinds of blocks forms a [paragraph](@paragraph).
- The contents of the paragraph are the result of parsing the
- paragraph's raw content as inlines. The paragraph's raw content
- is formed by concatenating the lines and removing initial and final
- spaces.
- A simple example with two paragraphs:
- .
- aaa
- bbb
- .
- <p>aaa</p>
- <p>bbb</p>
- .
- Paragraphs can contain multiple lines, but no blank lines:
- .
- aaa
- bbb
- ccc
- ddd
- .
- <p>aaa
- bbb</p>
- <p>ccc
- ddd</p>
- .
- Multiple blank lines between paragraphs have no effect:
- .
- aaa
- bbb
- .
- <p>aaa</p>
- <p>bbb</p>
- .
- Leading spaces are skipped:
- .
- aaa
- bbb
- .
- <p>aaa
- bbb</p>
- .
- Lines after the first may be indented any amount, since indented
- code blocks cannot interrupt paragraphs.
- .
- aaa
- bbb
- ccc
- .
- <p>aaa
- bbb
- ccc</p>
- .
- However, the first line may be indented at most three spaces,
- or an indented code block will be triggered:
- .
- aaa
- bbb
- .
- <p>aaa
- bbb</p>
- .
- .
- aaa
- bbb
- .
- <pre><code>aaa
- </code></pre>
- <p>bbb</p>
- .
- Final spaces are stripped before inline parsing, so a paragraph
- that ends with two or more spaces will not end with a [hard line
- break](#hard-line-break):
- .
- aaa
- bbb
- .
- <p>aaa<br />
- bbb</p>
- .
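- How a paragraph's raw content is assembled can be sketched in a few lines. The sketch assumes the paragraph's lines are already known and are passed without their line endings; `paragraph_raw_content` is a hypothetical helper, not something this spec defines.

```python
def paragraph_raw_content(lines):
    """Join paragraph lines, dropping leading spaces on each line
    and trailing spaces at the very end of the paragraph."""
    joined = "\n".join(line.lstrip(" ") for line in lines)
    # Only the end of the paragraph is stripped; trailing spaces on interior
    # lines are kept, so they can still produce hard line breaks.
    return joined.rstrip(" ")


print(repr(paragraph_raw_content(["  aaa", " bbb"])))
print(repr(paragraph_raw_content(["aaa     ", "bbb     "])))
```

- In the second call the spaces after `aaa` survive, while those after `bbb` are removed, which is why only the first line of the last example ends in a hard line break.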
- ## Blank lines
- [Blank lines](#blank-line) between block-level elements are ignored,
- except for the role they play in determining whether a [list](#list)
- is [tight](#tight) or [loose](#loose).
- Blank lines at the beginning and end of the document are also ignored.
- .
-
- aaa
-
- # aaa
-
- .
- <p>aaa</p>
- <h1>aaa</h1>
- .
- # Container blocks
- A [container block](#container-block) is a block that has other
- blocks as its contents. There are two basic kinds of container blocks:
- [block quotes](#block-quote) and [list items](#list-item).
- [Lists](#list) are meta-containers for [list items](#list-item).
- We define the syntax for container blocks recursively. The general
- form of the definition is:
- > If X is a sequence of blocks, then the result of
- > transforming X in such-and-such a way is a container of type Y
- > with these blocks as its content.
- So, we explain what counts as a block quote or list item by explaining
- how these can be *generated* from their contents. This should suffice
- to define the syntax, although it does not give a recipe for *parsing*
- these constructions. (A recipe is provided below in the section entitled
- [A parsing strategy](#appendix-a-a-parsing-strategy).)
- ## Block quotes
- A [block quote marker](@block-quote-marker)
- consists of 0-3 spaces of initial indent, plus (a) the character `>` together
- with a following space, or (b) a single character `>` not followed by a space.
- The following rules define [block quotes](@block-quote):
- 1. **Basic case.** If a string of lines *Ls* constitute a sequence
- of blocks *Bs*, then the result of prepending a [block quote
- marker](#block-quote-marker) to the beginning of each line in *Ls*
- is a [block quote](#block-quote) containing *Bs*.
- 2. **Laziness.** If a string of lines *Ls* constitute a [block
- quote](#block-quote) with contents *Bs*, then the result of deleting
- the initial [block quote marker](#block-quote-marker) from one or
- more lines in which the next non-space character after the [block
- quote marker](#block-quote-marker) is [paragraph continuation
- text](#paragraph-continuation-text) is a block quote with *Bs* as
- its content.
- [Paragraph continuation text](@paragraph-continuation-text) is text
- that will be parsed as part of the content of a paragraph, but does
- not occur at the beginning of the paragraph.
- 3. **Consecutiveness.** A document cannot contain two [block
- quotes](#block-quote) in a row unless there is a [blank
- line](#blank-line) between them.
- Nothing else counts as a [block quote](#block-quote).
- Here is a simple example:
- .
- > # Foo
- > bar
- > baz
- .
- <blockquote>
- <h1>Foo</h1>
- <p>bar
- baz</p>
- </blockquote>
- .
- The spaces after the `>` characters can be omitted:
- .
- ># Foo
- >bar
- > baz
- .
- <blockquote>
- <h1>Foo</h1>
- <p>bar
- baz</p>
- </blockquote>
- .
- The `>` characters can be indented 1-3 spaces:
- .
- > # Foo
- > bar
- > baz
- .
- <blockquote>
- <h1>Foo</h1>
- <p>bar
- baz</p>
- </blockquote>
- .
- Four spaces give us a code block:
- .
- > # Foo
- > bar
- > baz
- .
- <pre><code>> # Foo
- > bar
- > baz
- </code></pre>
- .
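- The marker rule behind the last few examples can be sketched as a regular expression: up to three spaces, a `>`, and at most one following space. `BLOCK_QUOTE_MARKER` and `strip_block_quote_marker` are hypothetical names, and the sketch handles only the marker itself, not laziness or nesting.

```python
import re

# 0-3 spaces of indent, then ">", then at most one following space.
BLOCK_QUOTE_MARKER = re.compile(r"^ {0,3}> ?")


def strip_block_quote_marker(line: str):
    """Return the line with one block quote marker removed,
    or None if the line does not start with a marker."""
    m = BLOCK_QUOTE_MARKER.match(line)
    return None if m is None else line[m.end():]


print(repr(strip_block_quote_marker("> # Foo")))
print(repr(strip_block_quote_marker(">bar")))
print(repr(strip_block_quote_marker("   > baz")))
print(repr(strip_block_quote_marker("    > # Foo")))
```

- Because the marker consumes only one space after `>`, a line such as `>` followed by five spaces and `code` keeps four spaces of indentation once the marker is removed.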
- The Laziness clause allows us to omit the `>` before a
- paragraph continuation line:
- .
- > # Foo
- > bar
- baz
- .
- <blockquote>
- <h1>Foo</h1>
- <p>bar
- baz</p>
- </blockquote>
- .
- A block quote can contain some lazy and some non-lazy
- continuation lines:
- .
- > bar
- baz
- > foo
- .
- <blockquote>
- <p>bar
- baz
- foo</p>
- </blockquote>
- .
- Laziness only applies to lines that are continuations of
- paragraphs. Lines containing characters or indentation that indicate
- block structure cannot be lazy.
- .
- > foo
- ---
- .
- <blockquote>
- <p>foo</p>
- </blockquote>
- <hr />
- .
- .
- > - foo
- - bar
- .
- <blockquote>
- <ul>
- <li>foo</li>
- </ul>
- </blockquote>
- <ul>
- <li>bar</li>
- </ul>
- .
- .
- > foo
- bar
- .
- <blockquote>
- <pre><code>foo
- </code></pre>
- </blockquote>
- <pre><code>bar
- </code></pre>
- .
- .
- > ```
- foo
- ```
- .
- <blockquote>
- <pre><code></code></pre>
- </blockquote>
- <p>foo</p>
- <pre><code></code></pre>
- .
- A block quote can be empty:
- .
- >
- .
- <blockquote>
- </blockquote>
- .
- .
- >
- >
- >
- .
- <blockquote>
- </blockquote>
- .
- A block quote can have initial or final blank lines:
- .
- >
- > foo
- >
- .
- <blockquote>
- <p>foo</p>
- </blockquote>
- .
- A blank line always separates block quotes:
- .
- > foo
- > bar
- .
- <blockquote>
- <p>foo</p>
- </blockquote>
- <blockquote>
- <p>bar</p>
- </blockquote>
- .
- (Most current Markdown implementations, including John Gruber's
- original `Markdown.pl`, will parse this example as a single block quote
- with two paragraphs. But it seems better to allow the author to decide
- whether two block quotes or one are wanted.)
- Consecutiveness means that if we put these block quotes together,
- we get a single block quote:
- .
- > foo
- > bar
- .
- <blockquote>
- <p>foo
- bar</p>
- </blockquote>
- .
- To get a block quote with two paragraphs, use:
- .
- > foo
- >
- > bar
- .
- <blockquote>
- <p>foo</p>
- <p>bar</p>
- </blockquote>
- .
- Block quotes can interrupt paragraphs:
- .
- foo
- > bar
- .
- <p>foo</p>
- <blockquote>
- <p>bar</p>
- </blockquote>
- .
- In general, blank lines are not needed before or after block
- quotes:
- .
- > aaa
- ***
- > bbb
- .
- <blockquote>
- <p>aaa</p>
- </blockquote>
- <hr />
- <blockquote>
- <p>bbb</p>
- </blockquote>
- .
- However, because of laziness, a blank line is needed between
- a block quote and a following paragraph:
- .
- > bar
- baz
- .
- <blockquote>
- <p>bar
- baz</p>
- </blockquote>
- .
- .
- > bar
- baz
- .
- <blockquote>
- <p>bar</p>
- </blockquote>
- <p>baz</p>
- .
- .
- > bar
- >
- baz
- .
- <blockquote>
- <p>bar</p>
- </blockquote>
- <p>baz</p>
- .
- It is a consequence of the Laziness rule that any number
- of initial `>`s may be omitted on a continuation line of a
- nested block quote:
- .
- > > > foo
- bar
- .
- <blockquote>
- <blockquote>
- <blockquote>
- <p>foo
- bar</p>
- </blockquote>
- </blockquote>
- </blockquote>
- .
- .
- >>> foo
- > bar
- >>baz
- .
- <blockquote>
- <blockquote>
- <blockquote>
- <p>foo
- bar
- baz</p>
- </blockquote>
- </blockquote>
- </blockquote>
- .
- When including an indented code block in a block quote,
- remember that the [block quote marker](#block-quote-marker) includes
- both the `>` and a following space. So *five spaces* are needed after
- the `>`:
- .
- > code
- > not code
- .
- <blockquote>
- <pre><code>code
- </code></pre>
- </blockquote>
- <blockquote>
- <p>not code</p>
- </blockquote>
- .
- ## List items
- A [list marker](@list-marker) is a
- [bullet list marker](#bullet-list-marker) or an [ordered list
- marker](#ordered-list-marker).
- A [bullet list marker](@bullet-list-marker)
- is a `-`, `+`, or `*` character.
- An [ordered list marker](@ordered-list-marker)
- is a sequence of one or more digits (`0-9`), followed by either a
- `.` character or a `)` character.
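- These marker definitions translate directly into two small patterns. The following sketch classifies a bare marker string; `parse_list_marker` and the dictionary layout it returns are illustrative choices, not part of the spec.

```python
import re

BULLET = re.compile(r"[-+*]")
ORDERED = re.compile(r"([0-9]+)([.)])")


def parse_list_marker(marker: str):
    """Classify a bare list marker such as "-" or "10)"; return None otherwise."""
    if BULLET.fullmatch(marker):
        return {"type": "bullet", "char": marker}
    m = ORDERED.fullmatch(marker)
    if m:
        return {
            "type": "ordered",
            "start": int(m.group(1)),   # number written in the marker
            "delimiter": m.group(2),    # "." or ")"
        }
    return None


print(parse_list_marker("-"))
print(parse_list_marker("10)"))
print(parse_list_marker("a."))
```

- The explicit `[0-9]` class is used instead of `\d` because `\d` in Python also matches non-ASCII digits, which the definition above does not mention.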
- The following rules define [list items](@list-item):
- 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
- blocks *Bs* starting with a non-space character and not separated
- from each other by more than one blank line, and *M* is a list
- marker of width *W* followed by 0 < *N* < 5 spaces, then the result
- of prepending *M* and the following spaces to the first line of
- *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
- list item with *Bs* as its contents. The type of the list item
- (bullet or ordered) is determined by the type of its list marker.
- If the list item is ordered, then it is also assigned a start
- number, based on the ordered list marker.
- For example, let *Ls* be the lines
- .
- A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote>
- .
- And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says
- that the following is an ordered list item with start number 1,
- and the same contents as *Ls*:
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <ol>
- <li><p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote></li>
- </ol>
- .
- The most important thing to notice is that the position of
- the text after the list marker determines how much indentation
- is needed in subsequent blocks in the list item. If the list
- marker takes up two spaces, and there are three spaces between
- the list marker and the next nonspace character, then blocks
- must be indented five spaces in order to fall under the list
- item.
- Here are some examples showing how far content must be indented to be
- put under the list item:
- .
- - one
- two
- .
- <ul>
- <li>one</li>
- </ul>
- <p>two</p>
- .
- .
- - one
- two
- .
- <ul>
- <li><p>one</p>
- <p>two</p></li>
- </ul>
- .
- .
- - one
- two
- .
- <ul>
- <li>one</li>
- </ul>
- <pre><code> two
- </code></pre>
- .
- .
- - one
- two
- .
- <ul>
- <li><p>one</p>
- <p>two</p></li>
- </ul>
- .
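- The indentation required in these examples can be computed from the first line of the list item. The sketch below assumes the item has content on its first line and measures from the left edge of the enclosing container; `content_indent` is a hypothetical helper. It adds the marker's own indent to *W + N*, and falls back to *W + 1* when five or more spaces follow the marker, which is the case where the item starts with indented code.

```python
import re

# Optional 0-3 spaces, a list marker, then one or more spaces before content.
LIST_ITEM_START = re.compile(r"^( {0,3})([-+*]|[0-9]+[.)])( +)(?=\S)")


def content_indent(first_line: str):
    """Number of spaces by which later blocks must be indented
    to belong to the list item that starts on first_line."""
    m = LIST_ITEM_START.match(first_line)
    if m is None:
        return None
    indent, width, spaces = (len(g) for g in m.groups())
    # Five or more spaces after the marker: the item starts with indented
    # code, so only one space counts toward the required indentation.
    n = spaces if spaces < 5 else 1
    return indent + width + n


print(content_indent("- one"))
print(content_indent(" -    one"))
print(content_indent("  10.  foo"))
print(content_indent("1.     indented code"))
```

- For ` -    one` (one space, the marker, four spaces) the result is six, so a following paragraph indented five spaces falls outside the item while one indented six spaces falls inside it.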
- It is tempting to think of this in terms of columns: the continuation
- blocks must be indented at least to the column of the first nonspace
- character after the list marker. However, that is not quite right.
- The spaces after the list marker determine how much relative indentation
- is needed. Which column this indentation reaches will depend on
- how the list item is embedded in other constructions, as shown by
- this example:
- .
- > > 1. one
- >>
- >> two
- .
- <blockquote>
- <blockquote>
- <ol>
- <li><p>one</p>
- <p>two</p></li>
- </ol>
- </blockquote>
- </blockquote>
- .
- Here `two` occurs in the same column as the list marker `1.`,
- but is actually contained in the list item, because there is
- sufficient indentation after the last containing blockquote marker.
- The converse is also possible. In the following example, the word `two`
- occurs far to the right of the initial text of the list item, `one`, but
- it is not considered part of the list item, because it is not indented
- far enough past the blockquote marker:
- .
- >>- one
- >>
- > > two
- .
- <blockquote>
- <blockquote>
- <ul>
- <li>one</li>
- </ul>
- <p>two</p>
- </blockquote>
- </blockquote>
- .
- A list item may not contain blocks that are separated by more than
- one blank line. Thus, two blank lines will end a list, unless the
- two blanks are contained in a [fenced code block](#fenced-code-block).
- .
- - foo
- bar
- - foo
- bar
- - ```
- foo
- bar
- ```
- .
- <ul>
- <li><p>foo</p>
- <p>bar</p></li>
- <li><p>foo</p></li>
- </ul>
- <p>bar</p>
- <ul>
- <li><pre><code>foo
- bar
- </code></pre></li>
- </ul>
- .
- A list item may contain any kind of block:
- .
- 1. foo
- ```
- bar
- ```
- baz
- > bam
- .
- <ol>
- <li><p>foo</p>
- <pre><code>bar
- </code></pre>
- <p>baz</p>
- <blockquote>
- <p>bam</p>
- </blockquote></li>
- </ol>
- .
- 2. **Item starting with indented code.** If a sequence of lines *Ls*
- constitute a sequence of blocks *Bs* starting with an indented code
- block and not separated from each other by more than one blank line,
- and *M* is a list marker of width *W* followed by
- one space, then the result of prepending *M* and the following
- space to the first line of *Ls*, and indenting subsequent lines of
- *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
- If a line is empty, then it need not be indented. The type of the
- list item (bullet or ordered) is determined by the type of its list
- marker. If the list item is ordered, then it is also assigned a
- start number, based on the ordered list marker.
- An indented code block will have to be indented four spaces beyond
- the edge of the region where text will be included in the list item.
- In the following case that is 6 spaces:
- .
- - foo
- bar
- .
- <ul>
- <li><p>foo</p>
- <pre><code>bar
- </code></pre></li>
- </ul>
- .
- And in this case it is 11 spaces:
- .
- 10. foo
- bar
- .
- <ol start="10">
- <li><p>foo</p>
- <pre><code>bar
- </code></pre></li>
- </ol>
- .
- If the *first* block in the list item is an indented code block,
- then by rule #2, the contents must be indented *one* space after the
- list marker:
- .
- indented code
- paragraph
- more code
- .
- <pre><code>indented code
- </code></pre>
- <p>paragraph</p>
- <pre><code>more code
- </code></pre>
- .
- .
- 1. indented code
- paragraph
- more code
- .
- <ol>
- <li><pre><code>indented code
- </code></pre>
- <p>paragraph</p>
- <pre><code>more code
- </code></pre></li>
- </ol>
- .
- Note that an additional space indent is interpreted as space
- inside the code block:
- .
- 1. indented code
- paragraph
- more code
- .
- <ol>
- <li><pre><code> indented code
- </code></pre>
- <p>paragraph</p>
- <pre><code>more code
- </code></pre></li>
- </ol>
- .
- Note that rules #1 and #2 only apply to two cases: (a) cases
- in which the lines to be included in a list item begin with a nonspace
- character, and (b) cases in which they begin with an indented code
- block. In a case like the following, where the first block begins with
- a three-space indent, the rules do not allow us to form a list item by
- indenting the whole thing and prepending a list marker:
- .
- foo
- bar
- .
- <p>foo</p>
- <p>bar</p>
- .
- .
- - foo
- bar
- .
- <ul>
- <li>foo</li>
- </ul>
- <p>bar</p>
- .
- This is not a significant restriction, because when a block begins
- with 1-3 spaces indent, the indentation can always be removed without
- a change in interpretation, allowing rule #1 to be applied. So, in
- the above case:
- .
- - foo
- bar
- .
- <ul>
- <li><p>foo</p>
- <p>bar</p></li>
- </ul>
- .
- 3. **Indentation.** If a sequence of lines *Ls* constitutes a list item
- according to rule #1 or #2, then the result of indenting each line
- of *Ls* by 1-3 spaces (the same for each line) also constitutes a
- list item with the same contents and attributes. If a line is
- empty, then it need not be indented.
- Indented one space:
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <ol>
- <li><p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote></li>
- </ol>
- .
- Indented two spaces:
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <ol>
- <li><p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote></li>
- </ol>
- .
- Indented three spaces:
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <ol>
- <li><p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote></li>
- </ol>
- .
- Four spaces indent gives a code block:
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <pre><code>1. A paragraph
- with two lines.
- indented code
- > A block quote.
- </code></pre>
- .
- 4. **Laziness.** If a string of lines *Ls* constitute a [list
- item](#list-item) with contents *Bs*, then the result of deleting
- some or all of the indentation from one or more lines in which the
- next non-space character after the indentation is
- [paragraph continuation text](#paragraph-continuation-text) is a
- list item with the same contents and attributes. The unindented
- lines are called
- [lazy continuation lines](@lazy-continuation-line).
- Here is an example with [lazy continuation
- lines](#lazy-continuation-line):
- .
- 1. A paragraph
- with two lines.
- indented code
- > A block quote.
- .
- <ol>
- <li><p>A paragraph
- with two lines.</p>
- <pre><code>indented code
- </code></pre>
- <blockquote>
- <p>A block quote.</p>
- </blockquote></li>
- </ol>
- .
- Indentation can be partially deleted:
- .
- 1. A paragraph
- with two lines.
- .
- <ol>
- <li>A paragraph
- with two lines.</li>
- </ol>
- .
- These examples show how laziness can work in nested structures:
- .
- > 1. > Blockquote
- continued here.
- .
- <blockquote>
- <ol>
- <li><blockquote>
- <p>Blockquote
- continued here.</p>
- </blockquote></li>
- </ol>
- </blockquote>
- .
- .
- > 1. > Blockquote
- > continued here.
- .
- <blockquote>
- <ol>
- <li><blockquote>
- <p>Blockquote
- continued here.</p>
- </blockquote></li>
- </ol>
- </blockquote>
- .
- 5. **That's all.** Nothing that is not counted as a list item by rules
- #1--4 counts as a [list item](#list-item).
- The rules for sublists follow from the general rules above. A sublist
- must be indented the same number of spaces a paragraph would need to be
- in order to be included in the list item.
- So, in this case we need two spaces indent:
- .
- - foo
- - bar
- - baz
- .
- <ul>
- <li>foo
- <ul>
- <li>bar
- <ul>
- <li>baz</li>
- </ul></li>
- </ul></li>
- </ul>
- .
- One is not enough:
- .
- - foo
- - bar
- - baz
- .
- <ul>
- <li>foo</li>
- <li>bar</li>
- <li>baz</li>
- </ul>
- .
- Here we need four, because the list marker is wider:
- .
- 10) foo
- - bar
- .
- <ol start="10">
- <li>foo
- <ul>
- <li>bar</li>
- </ul></li>
- </ol>
- .
- Three is not enough:
- .
- 10) foo
- - bar
- .
- <ol start="10">
- <li>foo</li>
- </ol>
- <ul>
- <li>bar</li>
- </ul>
- .
- A list may be the first block in a list item:
- .
- - - foo
- .
- <ul>
- <li><ul>
- <li>foo</li>
- </ul></li>
- </ul>
- .
- .
- 1. - 2. foo
- .
- <ol>
- <li><ul>
- <li><ol start="2">
- <li>foo</li>
- </ol></li>
- </ul></li>
- </ol>
- .
- A list item may be empty:
- .
- - foo
- -
- - bar
- .
- <ul>
- <li>foo</li>
- <li></li>
- <li>bar</li>
- </ul>
- .
- .
- -
- .
- <ul>
- <li></li>
- </ul>
- .
- A list item can contain a header:
- .
- - # Foo
- - Bar
- ---
- baz
- .
- <ul>
- <li><h1>Foo</h1></li>
- <li><h2>Bar</h2>
- <p>baz</p></li>
- </ul>
- .
- ### Motivation
- John Gruber's Markdown spec says the following about list items:
- 1. "List markers typically start at the left margin, but may be indented
- by up to three spaces. List markers must be followed by one or more
- spaces or a tab."
- 2. "To make lists look nice, you can wrap items with hanging indents....
- But if you don't want to, you don't have to."
- 3. "List items may consist of multiple paragraphs. Each subsequent
- paragraph in a list item must be indented by either 4 spaces or one
- tab."
- 4. "It looks nice if you indent every line of the subsequent paragraphs,
- but here again, Markdown will allow you to be lazy."
- 5. "To put a blockquote within a list item, the blockquote's `>`
- delimiters need to be indented."
- 6. "To put a code block within a list item, the code block needs to be
- indented twice — 8 spaces or two tabs."
- These rules specify that a paragraph under a list item must be indented
- four spaces (presumably, from the left margin, rather than the start of
- the list marker, but this is not said), and that code under a list item
- must be indented eight spaces instead of the usual four. They also say
- that a block quote must be indented, but not by how much; however, the
- example given has four spaces indentation. Although nothing is said
- about other kinds of block-level content, it is certainly reasonable to
- infer that *all* block elements under a list item, including other
- lists, must be indented four spaces. This principle has been called the
- *four-space rule*.
- The four-space rule is clear and principled, and if the reference
- implementation `Markdown.pl` had followed it, it probably would have
- become the standard. However, `Markdown.pl` allowed paragraphs and
- sublists to start with only two spaces indentation, at least on the
- outer level. Worse, its behavior was inconsistent: a sublist of an
- outer-level list needed two spaces indentation, but a sublist of this
- sublist needed three spaces. It is not surprising, then, that different
- implementations of Markdown have developed very different rules for
- determining what comes under a list item. (Pandoc and python-Markdown,
- for example, stuck with Gruber's syntax description and the four-space
- rule, while discount, redcarpet, marked, PHP Markdown, and others
- followed `Markdown.pl`'s behavior more closely.)
- Unfortunately, given the divergences between implementations, there
- is no way to give a spec for list items that will be guaranteed not
- to break any existing documents. However, the spec given here should
- correctly handle lists formatted with either the four-space rule or
- the more forgiving `Markdown.pl` behavior, provided they are laid out
- in a way that is natural for a human to read.
- The strategy here is to let the width and indentation of the list marker
- determine the indentation necessary for blocks to fall under the list
- item, rather than having a fixed and arbitrary number. The writer can
- think of the body of the list item as a unit which gets indented to the
- right enough to fit the list marker (and any indentation on the list
- marker). (The laziness rule, #4, then allows continuation lines to be
- unindented if needed.)
- This rule is superior, we claim, to any rule requiring a fixed level of
- indentation from the margin. The four-space rule is clear but
- unnatural. It is quite unintuitive that
- ``` markdown
- - foo
- bar
- - baz
- ```
- should be parsed as two lists with an intervening paragraph,
- ``` html
- <ul>
- <li>foo</li>
- </ul>
- <p>bar</p>
- <ul>
- <li>baz</li>
- </ul>
- ```
- as the four-space rule demands, rather than a single list,
- ``` html
- <ul>
- <li><p>foo</p>
- <p>bar</p>
- <ul>
- <li>baz</li>
- </ul></li>
- </ul>
- ```
- The choice of four spaces is arbitrary. It can be learned, but it is
- not likely to be guessed, and it trips up beginners regularly.
- Would it help to adopt a two-space rule? The problem is that such
- a rule, together with the rule allowing 1--3 spaces indentation of the
- initial list marker, allows text that is indented *less than* the
- original list marker to be included in the list item. For example,
- `Markdown.pl` parses
- ``` markdown
- - one
- two
- ```
- as a single list item, with `two` a continuation paragraph:
- ``` html
- <ul>
- <li><p>one</p>
- <p>two</p></li>
- </ul>
- ```
- and similarly
- ``` markdown
- > - one
- >
- > two
- ```
- as
- ``` html
- <blockquote>
- <ul>
- <li><p>one</p>
- <p>two</p></li>
- </ul>
- </blockquote>
- ```
- This is extremely unintuitive.
- Rather than requiring a fixed indent from the margin, we could require
- a fixed indent (say, two spaces, or even one space) from the list marker (which
- may itself be indented). This proposal would remove the last anomaly
- discussed. Unlike the spec presented above, it would count the following
- as a list item with a subparagraph, even though the paragraph `bar`
- is not indented as far as the first paragraph `foo`:
- ``` markdown
- 10. foo
- bar
- ```
- Arguably this text does read like a list item with `bar` as a subparagraph,
- which may count in favor of the proposal. However, on this proposal indented
- code would have to be indented six spaces after the list marker. And this
- would break a lot of existing Markdown, which has the pattern:
- ``` markdown
- 1. foo
- indented code
- ```
- where the code is indented eight spaces. The spec above, by contrast, will
- parse this text as expected, since the code block's indentation is measured
- from the beginning of `foo`.
- The one case that needs special treatment is a list item that *starts*
- with indented code. How much indentation is required in that case, since
- we don't have a "first paragraph" to measure from? Rule #2 simply stipulates
- that in such cases, we require one space indentation from the list marker
- (and then the normal four spaces for the indented code). This will match the
- four-space rule in cases where the list marker plus its initial indentation
- takes four spaces (a common case), but diverge in other cases.
- ## Lists
- A [list](@list) is a sequence of one or more
- list items [of the same type](#of-the-same-type). The list items
- may be separated by single [blank lines](#blank-line), but two
- blank lines end all containing lists.
- Two list items are [of the same type](@of-the-same-type)
- if they begin with a [list
- marker](#list-marker) of the same type. Two list markers are of the
- same type if (a) they are bullet list markers using the same character
- (`-`, `+`, or `*`) or (b) they are ordered list numbers with the same
- delimiter (either `.` or `)`).
- A list is an [ordered list](@ordered-list)
- if its constituent list items begin with
- [ordered list markers](#ordered-list-marker), and a [bullet
- list](@bullet-list) if its constituent list
- items begin with [bullet list markers](#bullet-list-marker).
- The [start number](@start-number)
- of an [ordered list](#ordered-list) is determined by the list number of
- its initial list item. The numbers of subsequent list items are
- disregarded.
- A list is [loose](@loose) if any of its constituent
- list items are separated by blank lines, or if any of its constituent
- list items directly contain two block-level elements with a blank line
- between them. Otherwise a list is [tight](@tight).
- (The difference in HTML output is that paragraphs in a loose list are
- wrapped in `<p>` tags, while paragraphs in a tight list are not.)
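- The loose/tight distinction can be sketched over a very simplified model in which each list item only records where blank lines occurred. The `Item` class and `is_loose` function are hypothetical, and the model assumes a parser has already decided which blank lines sit between items and which sit between blocks directly inside an item (blank lines inside a code block would set neither flag).

```python
from dataclasses import dataclass


@dataclass
class Item:
    blank_after: bool = False            # blank line between this item and the next
    blank_between_blocks: bool = False   # blank line between two blocks directly inside


def is_loose(items):
    """A list is loose if items are separated by a blank line, or if any item
    directly contains two blocks with a blank line between them."""
    # A blank line after the last item does not separate two items.
    separated = any(item.blank_after for item in items[:-1])
    internal = any(item.blank_between_blocks for item in items)
    return separated or internal


print(is_loose([Item(), Item(), Item()]))
print(is_loose([Item(blank_after=True), Item()]))
print(is_loose([Item(), Item(blank_between_blocks=True), Item()]))
```

- A renderer following this sketch would wrap paragraphs in `<p>` tags only when `is_loose` returns `True`.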
- Changing the bullet or ordered list delimiter starts a new list:
- .
- - foo
- - bar
- + baz
- .
- <ul>
- <li>foo</li>
- <li>bar</li>
- </ul>
- <ul>
- <li>baz</li>
- </ul>
- .
- .
- 1. foo
- 2. bar
- 3) baz
- .
- <ol>
- <li>foo</li>
- <li>bar</li>
- </ol>
- <ol start="3">
- <li>baz</li>
- </ol>
- .
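- The grouping shown in these two examples, together with the start-number rule, can be sketched as follows. The sketch works on a flat sequence of marker strings and ignores nesting and blank lines; `marker_kind` and `group_into_lists` are hypothetical names.

```python
import re

ORDERED = re.compile(r"([0-9]+)([.)])")


def marker_kind(marker: str):
    """Key that is equal exactly when two markers are of the same type."""
    m = ORDERED.fullmatch(marker)
    if m:
        return ("ordered", m.group(2))   # the number itself is irrelevant
    if marker in ("-", "+", "*"):
        return ("bullet", marker)
    raise ValueError(f"not a list marker: {marker!r}")


def group_into_lists(markers):
    """Split consecutive item markers into lists of the same type and
    record the start number of each ordered list."""
    lists = []
    for marker in markers:
        kind = marker_kind(marker)
        if lists and lists[-1]["kind"] == kind:
            lists[-1]["count"] += 1      # later numbers are disregarded
            continue
        m = ORDERED.fullmatch(marker)
        start = int(m.group(1)) if m else None
        lists.append({"kind": kind, "count": 1, "start": start})
    return lists


print(group_into_lists(["-", "-", "+"]))
print(group_into_lists(["1.", "2.", "3)"]))
```

- The second call yields two ordered lists, the latter with start number 3, matching the `<ol start="3">` in the example above.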
- In CommonMark, a list can interrupt a paragraph. That is,
- no blank line is needed to separate a paragraph from a following
- list:
- .
- Foo
- - bar
- - baz
- .
- <p>Foo</p>
- <ul>
- <li>bar</li>
- <li>baz</li>
- </ul>
- .
- `Markdown.pl` does not allow this, through fear of triggering a list
- via a numeral in a hard-wrapped line:
- .
- The number of windows in my house is
- 14. The number of doors is 6.
- .
- <p>The number of windows in my house is</p>
- <ol start="14">
- <li>The number of doors is 6.</li>
- </ol>
- .
- Oddly, `Markdown.pl` *does* allow a blockquote to interrupt a paragraph,
- even though the same considerations might apply. We think that the two
- cases should be treated the same. Here are two reasons for allowing
- lists to interrupt paragraphs:
- First, it is natural and not uncommon for people to start lists without
- blank lines:
- I need to buy
- - new shoes
- - a coat
- - a plane ticket
- Second, we are attracted to a
- > [principle of uniformity](@principle-of-uniformity):
- > if a span of text has a certain
- > meaning, it will continue to have the same meaning when put into a list
- > item.
- (Indeed, the spec for [list items](#list-item) presupposes this.)
- This principle implies that if
- * I need to buy
- - new shoes
- - a coat
- - a plane ticket
- is a list item containing a paragraph followed by a nested sublist,
- as all Markdown implementations agree it is (though the paragraph
- may be rendered without `<p>` tags, since the list is "tight"),
- then
- I need to buy
- - new shoes
- - a coat
- - a plane ticket
- by itself should be a paragraph followed by a nested sublist.
- Our adherence to the [principle of uniformity](#principle-of-uniformity)
- thus inclines us to think that there are two coherent packages:
- 1. Require blank lines before *all* lists and blockquotes,
- including lists that occur as sublists inside other list items.
- 2. Require blank lines in none of these places.
- [reStructuredText](http://docutils.sourceforge.net/rst.html) takes
- the first approach, for which there is much to be said. But the second
- seems more consistent with established practice with Markdown.
- There can be blank lines between items, but two blank lines end
- a list:
- .
- - foo
- - bar
- - baz
- .
- <ul>
- <li><p>foo</p></li>
- <li><p>bar</p></li>
- </ul>
- <ul>
- <li>baz</li>
- </ul>
- .
- As illustrated above in the section on [list items](#list-item),
- two blank lines between blocks *within* a list item will also end a
- list:
- .
- - foo
- bar
- - baz
- .
- <ul>
- <li>foo</li>
- </ul>
- <p>bar</p>
- <ul>
- <li>baz</li>
- </ul>
- .
- Indeed, two blank lines will end *all* containing lists:
- .
- - foo
- - bar
- - baz
- bim
- .
- <ul>
- <li>foo
- <ul>
- <li>bar
- <ul>
- <li>baz</li>
- </ul></li>
- </ul></li>
- </ul>
- <pre><code> bim
- </code></pre>
- .
- Thus, two blank lines can be used to separate consecutive lists of
- the same type, or to separate a list from an indented code block
- that would otherwise be parsed as a subparagraph of the final list
- item:
- .
- - foo
- - bar
- - baz
- - bim
- .
- <ul>
- <li>foo</li>
- <li>bar</li>
- </ul>
- <ul>
- <li>baz</li>
- <li>bim</li>
- </ul>
- .
- .
- - foo
- notcode
- - foo
- code
- .
- <ul>
- <li><p>foo</p>
- <p>notcode</p></li>
- <li><p>foo</p></li>
- </ul>
- <pre><code>code
- </code></pre>
- .
- List items need not be indented to the same level. The following
- list items will be treated as items at the same list level,
- since none is indented enough to belong to the previous list
- item:
- .
- - a
- - b
- - c
- - d
- - e
- - f
- - g
- .
- <ul>
- <li>a</li>
- <li>b</li>
- <li>c</li>
- <li>d</li>
- <li>e</li>
- <li>f</li>
- <li>g</li>
- </ul>
- .
- This is a loose list, because there is a blank line between
- two of the list items:
- .
- - a
- - b
- - c
- .
- <ul>
- <li><p>a</p></li>
- <li><p>b</p></li>
- <li><p>c</p></li>
- </ul>
- .
- So is this, with an empty second item:
- .
- * a
- *
- * c
- .
- <ul>
- <li><p>a</p></li>
- <li></li>
- <li><p>c</p></li>
- </ul>
- .
- These are loose lists, even though there is no blank line between the items,
- because one of the items directly contains two block-level elements
- with a blank line between them:
- .
- - a
- - b
- c
- - d
- .
- <ul>
- <li><p>a</p></li>
- <li><p>b</p>
- <p>c</p></li>
- <li><p>d</p></li>
- </ul>
- .
- .
- - a
- - b
- [ref]: /url
- - d
- .
- <ul>
- <li><p>a</p></li>
- <li><p>b</p></li>
- <li><p>d</p></li>
- </ul>
- .
- This is a tight list, because the blank lines are in a code block:
- .
- - a
- - ```
- b
- ```
- - c
- .
- <ul>
- <li>a</li>
- <li><pre><code>b
- </code></pre></li>
- <li>c</li>
- </ul>
- .
- This is a tight list, because the blank line is between two
- paragraphs of a sublist. So the sublist is loose while
- the outer list is tight:
- .
- - a
- - b
- c
- - d
- .
- <ul>
- <li>a
- <ul>
- <li><p>b</p>
- <p>c</p></li>
- </ul></li>
- <li>d</li>
- </ul>
- .
- This is a tight list, because the blank line is inside the
- block quote:
- .
- * a
- > b
- >
- * c
- .
- <ul>
- <li>a
- <blockquote>
- <p>b</p>
- </blockquote></li>
- <li>c</li>
- </ul>
- .
- This list is tight, because the consecutive block elements
- are not separated by blank lines:
- .
- - a
- > b
- ```
- c
- ```
- - d
- .
- <ul>
- <li>a
- <blockquote>
- <p>b</p>
- </blockquote>
- <pre><code>c
- </code></pre></li>
- <li>d</li>
- </ul>
- .
- A single-paragraph list is tight:
- .
- - a
- .
- <ul>
- <li>a</li>
- </ul>
- .
- .
- - a
- - b
- .
- <ul>
- <li>a
- <ul>
- <li>b</li>
- </ul></li>
- </ul>
- .
- Here the outer list is loose, the inner list tight:
- .
- * foo
- * bar
- baz
- .
- <ul>
- <li><p>foo</p>
- <ul>
- <li>bar</li>
- </ul>
- <p>baz</p></li>
- </ul>
- .
- .
- - a
- - b
- - c
- - d
- - e
- - f
- .
- <ul>
- <li><p>a</p>
- <ul>
- <li>b</li>
- <li>c</li>
- </ul></li>
- <li><p>d</p>
- <ul>
- <li>e</li>
- <li>f</li>
- </ul></li>
- </ul>
- .
- # Inlines
- Inlines are parsed sequentially from the beginning of the character
- stream to the end (left to right, in left-to-right languages).
- Thus, for example, in
- .
- `hi`lo`
- .
- <p><code>hi</code>lo`</p>
- .
- `hi` is parsed as code, leaving the backtick at the end as a literal
- backtick.
- ## Backslash escapes
- Any ASCII punctuation character may be backslash-escaped:
- .
- \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~
- .
- <p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~</p>
- .
- Backslashes before other characters are treated as literal
- backslashes:
- .
- \→\A\a\ \3\φ\«
- .
- <p>\ \A\a\ \3\φ\«</p>
- .
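As a rough illustration of the two rules above, the following Python sketch removes the backslash from escaped ASCII punctuation and leaves every other backslash in place. The name `unescape_backslashes` is hypothetical and not part of this spec. The sketch works on plain text only; a real parser would also have to mark each unescaped character as literal, so that it is not reinterpreted as Markdown syntax.

```python
import re
import string

# string.punctuation holds exactly the 32 ASCII punctuation characters.
_ESCAPE = re.compile("\\\\([" + re.escape(string.punctuation) + "])")


def unescape_backslashes(text):
    """Drop the backslash in front of escaped ASCII punctuation.

    Backslashes before any other character are kept as literal backslashes.
    """
    return _ESCAPE.sub(r"\1", text)
```

Because the scan runs left to right, an escaped backslash consumes the backslash that follows it, so in `\\*` the `*` is not treated as escaped.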
- Escaped characters are treated as regular characters and do
- not have their usual Markdown meanings:
- .
- \*not emphasized*
- \<br/> not a tag
- \[not a link](/foo)
- \`not code`
- 1\. not a list
- \* not a list
- \# not a header
- \[foo]: /url "not a reference"
- .
- <p>*not emphasized*
- &lt;br/&gt; not a tag
- [not a link](/foo)
- `not code`
- 1. not a list
- * not a list
- # not a header
- [foo]: /url &quot;not a reference&quot;</p>
- .
- If a backslash is itself escaped, the following character is not:
- .
- \\*emphasis*
- .
- <p>\<em>emphasis</em></p>
- .
- A backslash at the end of the line is a [hard line
- break](#hard-line-break):
- .
- foo\
- bar
- .
- <p>foo<br />
- bar</p>
- .
- Backslash escapes do not work in code blocks, code spans, autolinks, or
- raw HTML:
- .
- `` \[\` ``
- .
- <p><code>\[\`</code></p>
- .
- .
- \[\]
- .
- <pre><code>\[\]
- </code></pre>
- .
- .
- ~~~
- \[\]
- ~~~
- .
- <pre><code>\[\]
- </code></pre>
- .
- .
- <http://example.com?find=\*>
- .
- <p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
- .
- .
- <a href="/bar\/)">
- .
- <p><a href="/bar\/)"></p>
- .
- But they work in all other contexts, including URLs and link titles,
- link references, and info strings in [fenced code
- blocks](#fenced-code-block):
- .
- [foo](/bar\* "ti\*tle")
- .
- <p><a href="/bar*" title="ti*tle">foo</a></p>
- .
- .
- [foo]
- [foo]: /bar\* "ti\*tle"
- .
- <p><a href="/bar*" title="ti*tle">foo</a></p>
- .
- .
- ``` foo\+bar
- foo
- ```
- .
- <pre><code class="language-foo+bar">foo
- </code></pre>
- .
- ## Entities
- With the goal of making this standard as HTML-agnostic as possible, all
- valid HTML entities in any context are recognized as such and
- converted into Unicode characters before they are stored in the AST.
- This allows implementations that target HTML output to trivially escape
- the entities when generating HTML, and simplifies the job of
- implementations targeting other languages, as these will only need to
- handle the Unicode characters and need not be HTML-entity aware.
- [Named entities](@name-entities) consist of `&` +
- any of the valid HTML5 entity names + `;`. The
- [following document](http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json)
- is used as an authoritative source of the valid entity names and their
- corresponding codepoints.
- Conforming implementations that target HTML don't need to generate
- entities for all the valid named entities that exist, with the exception
- of `&quot;` (`"`), `&amp;` (`&`), `&lt;` (`<`) and `&gt;` (`>`), which
- always need to be written as entities for security reasons.
- .
- &nbsp; &amp; &copy; &AElig; &Dcaron; &frac34; &HilbertSpace; &DifferentialD; &ClockwiseContourIntegral;
- .
- <p>  &amp; © Æ Ď ¾ ℋ ⅆ ∲</p>
- .
- [Decimal entities](@decimal-entities)
- consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
- entities need to be recognized and transformed into their corresponding
- Unicode codepoints. Invalid Unicode codepoints will be written as the
- "unknown codepoint" character (`0xFFFD`).
- .
- &#35; &#1234; &#992; &#98765432;
- .
- <p># Ӓ Ϡ �</p>
- .
- [Hexadecimal entities](@hexadecimal-entities)
- consist of `&#` + either `X` or `x` + a string of 1--8 hexadecimal digits +
- `;`. They will also be parsed and turned into their corresponding
- Unicode codepoints in the AST.
- .
- &#X22; &#XD06; &#xcab;
- .
- <p>&quot; ആ ಫ</p>
- .
- Here are some nonentities:
- .
- &nbsp &x; &#; &#x; &ThisIsWayTooLongToBeAnEntityIsntIt; &hi?;
- .
- <p>&amp;nbsp &amp;x; &amp;#; &amp;#x; &amp;ThisIsWayTooLongToBeAnEntityIsntIt; &amp;hi?;</p>
- .
- Although HTML5 does accept some entities without a trailing semicolon
- (such as `&copy`), these are not recognized as entities here, because it
- makes the grammar too ambiguous:
- .
- &copy
- .
- <p>&amp;copy</p>
- .
- Strings that are not on the list of HTML5 named entities are not
- recognized as entities either:
- .
- &MadeUpEntity;
- .
- <p>&amp;MadeUpEntity;</p>
- .
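The recognition rules for named, decimal, and hexadecimal entities can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `decode_entities` is hypothetical, Python's `html.entities.html5` table is assumed to agree with the entity list referenced above, and treating zero, surrogates, and values above `0x10FFFF` as the invalid codepoints is our reading rather than something this spec spells out.

```python
import re
from html.entities import html5

# "&" + name + ";", "&#" + 1-8 digits + ";", or "&#x" + 1-8 hex digits + ";".
_ENTITY = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]{1,8})|#([0-9]{1,8})|([A-Za-z][A-Za-z0-9]*));"
)


def _from_code_point(cp):
    # Assumption: zero, surrogates, and out-of-range values count as invalid.
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return "\uFFFD"
    return chr(cp)


def _replace(match):
    hex_digits, dec_digits, name = match.groups()
    if hex_digits is not None:
        return _from_code_point(int(hex_digits, 16))
    if dec_digits is not None:
        return _from_code_point(int(dec_digits, 10))
    # Only names that are listed with a trailing semicolon are accepted;
    # unknown names are left untouched.
    return html5.get(name + ";", match.group(0))


def decode_entities(text):
    """Replace every recognized entity in text by its Unicode character(s)."""
    return _ENTITY.sub(_replace, text)
```

Requiring the trailing semicolon in the pattern is what keeps `&copy` and the other nonentities above unchanged.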
- Entities are recognized in any context besides code spans or
- code blocks, including raw HTML, URLs, [link titles](#link-title), and
- [fenced code block](#fenced-code-block) info strings:
- .
- <a href="&ouml;&ouml;.html">
- .
- <p><a href="&ouml;&ouml;.html"></p>
- .
- .
- [foo](/f&ouml;&ouml; "f&ouml;&ouml;")
- .
- <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
- .
- .
- [foo]
- [foo]: /f&ouml;&ouml; "f&ouml;&ouml;"
- .
- <p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
- .
- .
- ``` f&ouml;&ouml;
- foo
- ```
- .
- <pre><code class="language-föö">foo
- </code></pre>
- .
- Entities are treated as literal text in code spans and code blocks:
- .
- `f&ouml;&ouml;`
- .
- <p><code>f&amp;ouml;&amp;ouml;</code></p>
- .
- .
-     f&ouml;f&ouml;
- .
- <pre><code>f&amp;ouml;f&amp;ouml;
- </code></pre>
- .
- ## Code span
- A [backtick string](@backtick-string)
- is a string of one or more backtick characters (`` ` ``) that is neither
- preceded nor followed by a backtick.
- A [code span](@code-span) begins with a backtick string and ends with a backtick
- string of equal length. The contents of the code span are the
- characters between the two backtick strings, with leading and trailing
- spaces and newlines removed, and consecutive spaces and newlines
- collapsed to single spaces.
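The definition above can be sketched in Python as follows. The names `normalize_code` and `first_code_span` are hypothetical, and the sketch deliberately ignores the interaction with HTML tags and autolinks discussed later in this section; it only shows how backtick strings of equal length are paired and how the contents are normalized.

```python
import re

_BACKTICK_RUN = re.compile(r"`+")


def normalize_code(content):
    """Collapse runs of spaces and newlines, then strip both ends."""
    return re.sub(r"[ \n]+", " ", content).strip(" ")


def first_code_span(text):
    """Return the normalized contents of the first code span, or None.

    Each maximal run of backticks is a backtick string. An opener is
    closed by the next backtick string of exactly the same length; an
    opener with no such closer is skipped and stays literal text.
    """
    runs = [(m.start(), m.end()) for m in _BACKTICK_RUN.finditer(text)]
    for i, (open_start, open_end) in enumerate(runs):
        length = open_end - open_start
        for close_start, close_end in runs[i + 1:]:
            if close_end - close_start == length:
                return normalize_code(text[open_end:close_start])
    return None
```

Collapsing whitespace before stripping means that only plain spaces can remain at the ends, so a single `strip(" ")` is enough.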
- This is a simple code span:
- .
- `foo`
- .
- <p><code>foo</code></p>
- .
- Here two backticks are used, because the code contains a backtick.
- This example also illustrates stripping of leading and trailing spaces:
- .
- `` foo ` bar ``
- .
- <p><code>foo ` bar</code></p>
- .
- This example shows the motivation for stripping leading and trailing
- spaces:
- .
- ` `` `
- .
- <p><code>``</code></p>
- .
- Newlines are treated like spaces:
- .
- ``
- foo
- ``
- .
- <p><code>foo</code></p>
- .
- Interior spaces and newlines are collapsed into single spaces, just
- as they would be by a browser:
- .
- `foo bar
- baz`
- .
- <p><code>foo bar baz</code></p>
- .
- Q: Why not just leave the spaces, since browsers will collapse them
- anyway? A: Because we might be targeting a non-HTML format, and we
- shouldn't rely on HTML-specific rendering assumptions.
- (Existing implementations differ in their treatment of internal
- spaces and newlines. Some, including `Markdown.pl` and
- `showdown`, convert an internal newline into a `<br />` tag.
- But this makes things difficult for those who like to hard-wrap
- their paragraphs, since a line break in the midst of a code
- span will cause an unintended line break in the output. Others
- just leave internal spaces as they are, which is fine if only
- HTML is being targeted.)
- .
- `foo `` bar`
- .
- <p><code>foo `` bar</code></p>
- .
- Note that backslash escapes do not work in code spans. All backslashes
- are treated literally:
- .
- `foo\`bar`
- .
- <p><code>foo\</code>bar`</p>
- .
- Backslash escapes are never needed, because one can always choose a
- string of *n* backtick characters as delimiters, where the code does
- not contain any strings of exactly *n* backtick characters.
- Code span backticks have higher precedence than any other inline
- constructs except HTML tags and autolinks. Thus, for example, this is
- not parsed as emphasized text, since the second `*` is part of a code
- span:
- .
- *foo`*`
- .
- <p>*foo<code>*</code></p>
- .
- And this is not parsed as a link:
- .
- [not a `link](/foo`)
- .
- <p>[not a <code>link](/foo</code>)</p>
- .
- But this is a link:
- .
- <http://foo.bar.`baz>`
- .
- <p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p>
- .
- And this is an HTML tag:
- .
- <a href="`">`
- .
- <p><a href="`">`</p>
- .
- When a backtick string is not closed by a matching backtick string,
- we just have literal backticks:
- .
- ```foo``
- .
- <p>```foo``</p>
- .
- .
- `foo
- .
- <p>`foo</p>
- .
- ## Emphasis and strong emphasis
- John Gruber's original [Markdown syntax
- description](http://daringfireball.net/projects/markdown/syntax#em) says:
- > Markdown treats asterisks (`*`) and underscores (`_`) as indicators of
- > emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML
- > `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>`
- > tag.
- This is enough for most users, but these rules leave much undecided,
- especially when it comes to nested emphasis. The original
- `Markdown.pl` test suite makes it clear that triple `***` and
- `___` delimiters can be used for strong emphasis, and most
- implementations have also allowed the following patterns:
- ``` markdown
- ***strong emph***
- ***strong** in emph*
- ***emph* in strong**
- **in strong *emph***
- *in emph **strong***
- ```
- The following patterns are less widely supported, but the intent
- is clear and they are useful (especially in contexts like bibliography
- entries):
- ``` markdown
- *emph *with emph* in it*
- **strong **with strong** in it**
- ```
- Many implementations have also restricted intraword emphasis to
- the `*` forms, to avoid unwanted emphasis in words containing
- internal underscores. (It is best practice to put these in code
- spans, but users often do not.)
- ``` markdown
- internal emphasis: foo*bar*baz
- no emphasis: foo_bar_baz
- ```
- The following rules capture all of these patterns, while allowing
- for efficient parsing strategies that do not backtrack:
- 1. A single `*` character [can open emphasis](@can-open-emphasis)
- iff it is not followed by
- whitespace.
- 2. A single `_` character [can open emphasis](#can-open-emphasis) iff
- it is not followed by whitespace and it is not preceded by an
- ASCII alphanumeric character.
- 3. A single `*` character [can close emphasis](@can-close-emphasis)
- iff it is not preceded by whitespace.
- 4. A single `_` character [can close emphasis](#can-close-emphasis) iff
- it is not preceded by whitespace and it is not followed by an
- ASCII alphanumeric character.
- 5. A double `**` [can open strong emphasis](@can-open-strong-emphasis)
- iff it is not followed by
- whitespace.
- 6. A double `__` [can open strong emphasis](#can-open-strong-emphasis)
- iff it is not followed by whitespace and it is not preceded by an
- ASCII alphanumeric character.
- 7. A double `**` [can close strong emphasis](@can-close-strong-emphasis)
- iff it is not preceded by
- whitespace.
- 8. A double `__` [can close strong emphasis](#can-close-strong-emphasis)
- iff it is not preceded by whitespace and it is not followed by an
- ASCII alphanumeric character.
- 9. Emphasis begins with a delimiter that [can open
- emphasis](#can-open-emphasis) and ends with a delimiter that [can close
- emphasis](#can-close-emphasis), and that uses the same
- character (`_` or `*`) as the opening delimiter. There must
- be a nonempty sequence of inlines between the open delimiter
- and the closing delimiter; these form the contents of the emphasis
- inline.
- 10. Strong emphasis begins with a delimiter that [can open strong
- emphasis](#can-open-strong-emphasis) and ends with a delimiter that
- [can close strong emphasis](#can-close-strong-emphasis), and that
- uses the same character (`_` or `*`) as the opening delimiter.
- There must be a nonempty sequence of inlines between the open
- delimiter and the closing delimiter; these form the contents of
- the strong emphasis inline.
- 11. A literal `*` character cannot occur at the beginning or end of
- `*`-delimited emphasis or `**`-delimited strong emphasis, unless it
- is backslash-escaped.
- 12. A literal `_` character cannot occur at the beginning or end of
- `_`-delimited emphasis or `__`-delimited strong emphasis, unless it
- is backslash-escaped.
- Where rules 1--12 above are compatible with multiple parsings,
- the following principles resolve ambiguity:
- 13. The number of nestings should be minimized. Thus, for example,
- an interpretation `<strong>...</strong>` is always preferred to
- `<em><em>...</em></em>`.
- 14. An interpretation `<strong><em>...</em></strong>` is always
- preferred to `<em><strong>..</strong></em>`.
- 15. When two potential emphasis or strong emphasis spans overlap,
- so that the second begins before the first ends and ends after
- the first ends, the first is preferred. Thus, for example,
- `*foo _bar* baz_` is parsed as `<em>foo _bar</em> baz_` rather
- than `*foo <em>bar* baz</em>`. For the same reason,
- `**foo*bar**` is parsed as `<em><em>foo</em>bar</em>*`
- rather than `<strong>foo*bar</strong>`.
- 16. When there are two potential emphasis or strong emphasis spans
- with the same closing delimiter, the shorter one (the one that
- opens later) is preferred. Thus, for example,
- `**foo **bar baz**` is parsed as `**foo <strong>bar baz</strong>`
- rather than `<strong>foo **bar baz</strong>`.
- 17. Inline code spans, links, images, and HTML tags group more tightly
- than emphasis. So, when there is a choice between an interpretation
- that contains one of these elements and one that does not, the
- former always wins. Thus, for example, `*[foo*](bar)` is
- parsed as `*<a href="bar">foo*</a>` rather than as
- `<em>[foo</em>](bar)`.
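Rules 1--8 depend only on the characters immediately before and after a delimiter, which is what makes parsing without backtracking possible. The Python sketch below illustrates them; the function names are hypothetical, `str.isspace` is used as a stand-in for this spec's notion of whitespace, and treating the start and end of the text as whitespace is an assumption of the sketch.

```python
import string

ASCII_ALNUM = set(string.ascii_letters + string.digits)


def _char_at(text, index):
    # Assumption: positions outside the text behave like whitespace.
    return text[index] if 0 <= index < len(text) else "\n"


def can_open(text, start, length=1):
    """Rules 1, 2, 5, 6: may the delimiter at text[start] open?

    Use length=1 for `*` or `_`, and length=2 for `**` or `__`.
    """
    delimiter = text[start]
    before = _char_at(text, start - 1)
    after = _char_at(text, start + length)
    if after.isspace():
        return False
    return not (delimiter == "_" and before in ASCII_ALNUM)


def can_close(text, start, length=1):
    """Rules 3, 4, 7, 8: may the delimiter at text[start] close?"""
    delimiter = text[start]
    before = _char_at(text, start - 1)
    after = _char_at(text, start + length)
    if before.isspace():
        return False
    return not (delimiter == "_" and after in ASCII_ALNUM)
```

For instance, `can_open("foo_bar_", 3)` is false because the `_` is preceded by an ASCII letter, while the same call on `foo*bar*` is true.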
- These rules can be illustrated through a series of examples.
- Rule 1:
- .
- *foo bar*
- .
- <p><em>foo bar</em></p>
- .
- This is not emphasis, because the opening `*` is followed by
- whitespace:
- .
- a * foo bar*
- .
- <p>a * foo bar*</p>
- .
- Intraword emphasis with `*` is permitted:
- .
- foo*bar*
- .
- <p>foo<em>bar</em></p>
- .
- .
- 5*6*78
- .
- <p>5<em>6</em>78</p>
- .
- Rule 2:
- .
- _foo bar_
- .
- <p><em>foo bar</em></p>
- .
- This is not emphasis, because the opening `_` is followed by
- whitespace:
- .
- _ foo bar_
- .
- <p>_ foo bar_</p>
- .
- Emphasis with `_` is not allowed inside ASCII words:
- .
- foo_bar_
- .
- <p>foo_bar_</p>
- .
- .
- 5_6_78
- .
- <p>5_6_78</p>
- .
- But it is permitted inside non-ASCII words:
- .
- пристаням_стремятся_
- .
- <p>пристаням<em>стремятся</em></p>
- .
- Rule 3:
- This is not emphasis, because the closing `*` is preceded by
- whitespace:
- .
- *foo bar *
- .
- <p>*foo bar *</p>
- .
- Intraword emphasis with `*` is allowed:
- .
- *foo*bar
- .
- <p><em>foo</em>bar</p>
- .
- Rule 4:
- This is not emphasis, because the closing `_` is preceded by
- whitespace:
- .
- _foo bar _
- .
- <p>_foo bar _</p>
- .
- Intraword emphasis:
- .
- _foo_bar
- .
- <p>_foo_bar</p>
- .
- .
- _пристаням_стремятся
- .
- <p><em>пристаням</em>стремятся</p>
- .
- .
- _foo_bar_baz_
- .
- <p><em>foo_bar_baz</em></p>
- .
- Rule 5:
- .
- **foo bar**
- .
- <p><strong>foo bar</strong></p>
- .
- This is not strong emphasis, because the opening delimiter is
- followed by whitespace:
- .
- ** foo bar**
- .
- <p>** foo bar**</p>
- .
- Intraword strong emphasis with `**` is permitted:
- .
- foo**bar**
- .
- <p>foo<strong>bar</strong></p>
- .
- Rule 6:
- .
- __foo bar__
- .
- <p><strong>foo bar</strong></p>
- .
- This is not strong emphasis, because the opening delimiter is
- followed by whitespace:
- .
- __ foo bar__
- .
- <p>__ foo bar__</p>
- .
- Intraword strong emphasis examples:
- .
- foo__bar__
- .
- <p>foo__bar__</p>
- .
- .
- 5__6__78
- .
- <p>5__6__78</p>
- .
- .
- пристаням__стремятся__
- .
- <p>пристаням<strong>стремятся</strong></p>
- .
- .
- __foo, __bar__, baz__
- .
- <p><strong>foo, <strong>bar</strong>, baz</strong></p>
- .
- Rule 7:
- This is not strong emphasis, because the closing delimiter is preceded
- by whitespace:
- .
- **foo bar **
- .
- <p>**foo bar **</p>
- .
- (Nor can it be interpreted as an emphasized `*foo bar *`, because of
- Rule 11.)
- Intraword strong emphasis:
- .
- **foo**bar
- .
- <p><strong>foo</strong>bar</p>
- .
- Rule 8:
- This is not strong emphasis, because the closing delimiter is
- preceded by whitespace:
- .
- __foo bar __
- .
- <p>__foo bar __</p>
- .
- Intraword strong emphasis examples:
- .
- __foo__bar
- .
- <p>__foo__bar</p>
- .
- .
- __пристаням__стремятся
- .
- <p><strong>пристаням</strong>стремятся</p>
- .
- .
- __foo__bar__baz__
- .
- <p><strong>foo__bar__baz</strong></p>
- .
- Rule 9:
- Any nonempty sequence of inline elements can be the contents of an
- emphasized span.
- .
- *foo [bar](/url)*
- .
- <p><em>foo <a href="/url">bar</a></em></p>
- .
- .
- *foo
- bar*
- .
- <p><em>foo
- bar</em></p>
- .
- In particular, emphasis and strong emphasis can be nested
- inside emphasis:
- .
- _foo __bar__ baz_
- .
- <p><em>foo <strong>bar</strong> baz</em></p>
- .
- .
- _foo _bar_ baz_
- .
- <p><em>foo <em>bar</em> baz</em></p>
- .
- .
- __foo_ bar_
- .
- <p><em><em>foo</em> bar</em></p>
- .
- .
- *foo *bar**
- .
- <p><em>foo <em>bar</em></em></p>
- .
- .
- *foo **bar** baz*
- .
- <p><em>foo <strong>bar</strong> baz</em></p>
- .
- But note:
- .
- *foo**bar**baz*
- .
- <p><em>foo</em><em>bar</em><em>baz</em></p>
- .
- The difference is that in the preceding case,
- the internal delimiters [can close emphasis](#can-close-emphasis),
- while in the cases with spaces, they cannot.
- .
- ***foo** bar*
- .
- <p><em><strong>foo</strong> bar</em></p>
- .
- .
- *foo **bar***
- .
- <p><em>foo <strong>bar</strong></em></p>
- .
- Note, however, that in the following case we get no strong
- emphasis, because the opening delimiter is closed by the first
- `*` before `bar`:
- .
- *foo**bar***
- .
- <p><em>foo</em><em>bar</em>**</p>
- .
- Indefinite levels of nesting are possible:
- .
- *foo **bar *baz* bim** bop*
- .
- <p><em>foo <strong>bar <em>baz</em> bim</strong> bop</em></p>
- .
- .
- *foo [*bar*](/url)*
- .
- <p><em>foo <a href="/url"><em>bar</em></a></em></p>
- .
- There can be no empty emphasis or strong emphasis:
- .
- ** is not an empty emphasis
- .
- <p>** is not an empty emphasis</p>
- .
- .
- **** is not an empty strong emphasis
- .
- <p>**** is not an empty strong emphasis</p>
- .
- Rule 10:
- Any nonempty sequence of inline elements can be the contents of a
- strongly emphasized span.
- .
- **foo [bar](/url)**
- .
- <p><strong>foo <a href="/url">bar</a></strong></p>
- .
- .
- **foo
- bar**
- .
- <p><strong>foo
- bar</strong></p>
- .
- In particular, emphasis and strong emphasis can be nested
- inside strong emphasis:
- .
- __foo _bar_ baz__
- .
- <p><strong>foo <em>bar</em> baz</strong></p>
- .
- .
- __foo __bar__ baz__
- .
- <p><strong>foo <strong>bar</strong> baz</strong></p>
- .
- .
- ____foo__ bar__
- .
- <p><strong><strong>foo</strong> bar</strong></p>
- .
- .
- **foo **bar****
- .
- <p><strong>foo <strong>bar</strong></strong></p>
- .
- .
- **foo *bar* baz**
- .
- <p><strong>foo <em>bar</em> baz</strong></p>
- .
- But note:
- .
- **foo*bar*baz**
- .
- <p><em><em>foo</em>bar</em>baz**</p>
- .
- The difference is that in the preceding case,
- the internal delimiters [can close emphasis](#can-close-emphasis),
- while in the cases with spaces, they cannot.
- .
- ***foo* bar**
- .
- <p><strong><em>foo</em> bar</strong></p>
- .
- .
- **foo *bar***
- .
- <p><strong>foo <em>bar</em></strong></p>
- .
- Indefinite levels of nesting are possible:
- .
- **foo *bar **baz**
- bim* bop**
- .
- <p><strong>foo <em>bar <strong>baz</strong>
- bim</em> bop</strong></p>
- .
- .
- **foo [*bar*](/url)**
- .
- <p><strong>foo <a href="/url"><em>bar</em></a></strong></p>
- .
- There can be no empty emphasis or strong emphasis:
- .
- __ is not an empty emphasis
- .
- <p>__ is not an empty emphasis</p>
- .
- .
- ____ is not an empty strong emphasis
- .
- <p>____ is not an empty strong emphasis</p>
- .
- Rule 11:
- .
- foo ***
- .
- <p>foo ***</p>
- .
- .
- foo *\**
- .
- <p>foo <em>*</em></p>
- .
- .
- foo *_*
- .
- <p>foo <em>_</em></p>
- .
- .
- foo *****
- .
- <p>foo *****</p>
- .
- .
- foo **\***
- .
- <p>foo <strong>*</strong></p>
- .
- .
- foo **_**
- .
- <p>foo <strong>_</strong></p>
- .
- Note that when delimiters do not match evenly, Rule 11 determines
- that the excess literal `*` characters will appear outside of the
- emphasis, rather than inside it:
- .
- **foo*
- .
- <p>*<em>foo</em></p>
- .
- .
- *foo**
- .
- <p><em>foo</em>*</p>
- .
- .
- ***foo**
- .
- <p>*<strong>foo</strong></p>
- .
- .
- ****foo*
- .
- <p>***<em>foo</em></p>
- .
- .
- **foo***
- .
- <p><strong>foo</strong>*</p>
- .
- .
- *foo****
- .
- <p><em>foo</em>***</p>
- .
- Rule 12:
- .
- foo ___
- .
- <p>foo ___</p>
- .
- .
- foo _\__
- .
- <p>foo <em>_</em></p>
- .
- .
- foo _*_
- .
- <p>foo <em>*</em></p>
- .
- .
- foo _____
- .
- <p>foo _____</p>
- .
- .
- foo __\___
- .
- <p>foo <strong>_</strong></p>
- .
- .
- foo __*__
- .
- <p>foo <strong>*</strong></p>
- .
- .
- __foo_
- .
- <p>_<em>foo</em></p>
- .
- Note that when delimiters do not match evenly, Rule 12 determines
- that the excess literal `_` characters will appear outside of the
- emphasis, rather than inside it:
- .
- _foo__
- .
- <p><em>foo</em>_</p>
- .
- .
- ___foo__
- .
- <p>_<strong>foo</strong></p>
- .
- .
- ____foo_
- .
- <p>___<em>foo</em></p>
- .
- .
- __foo___
- .
- <p><strong>foo</strong>_</p>
- .
- .
- _foo____
- .
- <p><em>foo</em>___</p>
- .
- Rule 13 implies that if you want emphasis nested directly inside
- emphasis, you must use different delimiters:
- .
- **foo**
- .
- <p><strong>foo</strong></p>
- .
- .
- *_foo_*
- .
- <p><em><em>foo</em></em></p>
- .
- .
- __foo__
- .
- <p><strong>foo</strong></p>
- .
- .
- _*foo*_
- .
- <p><em><em>foo</em></em></p>
- .
- However, strong emphasis within strong emphasis is possible without
- switching delimiters:
- .
- ****foo****
- .
- <p><strong><strong>foo</strong></strong></p>
- .
- .
- ____foo____
- .
- <p><strong><strong>foo</strong></strong></p>
- .
- Rule 13 can be applied to arbitrarily long sequences of
- delimiters:
- .
- ******foo******
- .
- <p><strong><strong><strong>foo</strong></strong></strong></p>
- .
- Rule 14:
- .
- ***foo***
- .
- <p><strong><em>foo</em></strong></p>
- .
- .
- _____foo_____
- .
- <p><strong><strong><em>foo</em></strong></strong></p>
- .
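For the special case shown in the Rule 13 and Rule 14 examples, where the same number *n* of identical delimiter characters appears on both sides of the content, the two rules together fix the nesting: as many `<strong>` levels as possible, with at most one `<em>` innermost. The following Python sketch illustrates this; the name `wrap_symmetric_run` is hypothetical, and the sketch does not cover unbalanced or mixed delimiters.

```python
def wrap_symmetric_run(n, content):
    """Render content surrounded by n identical delimiters on each side.

    Rule 13 minimizes nesting, so pairs of delimiters become <strong>.
    Rule 14 puts <strong> outside <em>, so an odd leftover delimiter
    becomes the innermost <em>.
    """
    strongs, ems = divmod(n, 2)
    inner = "<em>" + content + "</em>" if ems else content
    return "<strong>" * strongs + inner + "</strong>" * strongs
```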
- Rule 15:
- .
- *foo _bar* baz_
- .
- <p><em>foo _bar</em> baz_</p>
- .
- .
- **foo*bar**
- .
- <p><em><em>foo</em>bar</em>*</p>
- .
- Rule 16:
- .
- **foo **bar baz**
- .
- <p>**foo <strong>bar baz</strong></p>
- .
- .
- *foo *bar baz*
- .
- <p>*foo <em>bar baz</em></p>
- .
- Rule 17:
- .
- *[bar*](/url)
- .
- <p>*<a href="/url">bar*</a></p>
- .
- .
- _foo [bar_](/url)
- .
- <p>_foo <a href="/url">bar_</a></p>
- .
- .
- *<img src="foo" title="*"/>
- .
- <p>*<img src="foo" title="*"/></p>
- .
- .
- **<a href="**">
- .
- <p>**<a href="**"></p>
- .
- .
- __<a href="__">
- .
- <p>__<a href="__"></p>
- .
- .
- *a `*`*
- .
- <p><em>a <code>*</code></em></p>
- .
- .
- _a `_`_
- .
- <p><em>a <code>_</code></em></p>
- .
- .
- **a<http://foo.bar?q=**>
- .
- <p>**a<a href="http://foo.bar?q=**">http://foo.bar?q=**</a></p>
- .
- .
- __a<http://foo.bar?q=__>
- .
- <p>__a<a href="http://foo.bar?q=__">http://foo.bar?q=__</a></p>
- .
- ## Links
- A link contains [link text](#link-text) (the visible text),
- a [destination](#link-destination) (the URI that is the link destination),
- and optionally a [link title](#link-title). There are two basic kinds
- of links in Markdown. In [inline links](#inline-link) the destination
- and title are given immediately after the link text. In [reference
- links](#reference-links) the destination and title are defined elsewhere
- in the document.
- A [link text](@link-text) consists of a sequence of zero or more
- inline elements enclosed by square brackets (`[` and `]`). The
- following rules apply:
- - Links may not contain other links, at any level of nesting.
- - Brackets are allowed in the [link text](#link-text) only if (a) they
- are backslash-escaped or (b) they appear as a matched pair of brackets,
- with an open bracket `[`, a sequence of zero or more inlines, and
- a close bracket `]`.
- - Backtick [code spans](#code-span), [autolinks](#autolink), and
- raw [HTML tags](#html-tag) bind more tightly
- than the brackets in link text. Thus, for example,
- `` [foo`]` `` could not be a link text, since the second `]`
- is part of a code span.
- - The brackets in link text bind more tightly than markers for
- [emphasis and strong emphasis](#emphasis-and-strong-emphasis).
- Thus, for example, `*[foo*](url)` is a link.
- A [link destination](@link-destination) consists of either
- - a sequence of zero or more characters between an opening `<` and a
- closing `>` that contains no line breaks or unescaped `<` or `>`
- characters, or
- - a nonempty sequence of characters that does not include
- ASCII space or control characters, and includes parentheses
- only if (a) they are backslash-escaped or (b) they are part of
- a balanced pair of unescaped parentheses that is not itself
- inside a balanced pair of unescaped parentheses.
- A [link title](@link-title) consists of either
- - a sequence of zero or more characters between straight double-quote
- characters (`"`), including a `"` character only if it is
- backslash-escaped, or
- - a sequence of zero or more characters between straight single-quote
- characters (`'`), including a `'` character only if it is
- backslash-escaped, or
- - a sequence of zero or more characters between matching parentheses
- (`(...)`), including a `)` character only if it is backslash-escaped.
- An [inline link](@inline-link)
- consists of a [link text](#link-text) followed immediately
- by a left parenthesis `(`, optional whitespace,
- an optional [link destination](#link-destination),
- an optional [link title](#link-title) separated from the link
- destination by whitespace, optional whitespace, and a right
- parenthesis `)`. The link's text consists of the inlines contained
- in the [link text](#link-text) (excluding the enclosing square brackets).
- The link's URI consists of the link destination, excluding enclosing
- `<...>` if present, with backslash-escapes in effect as described
- above. The link's title consists of the link title, excluding its
- enclosing delimiters, with backslash-escapes in effect as described
- above.
- Here is a simple inline link:
- .
- [link](/uri "title")
- .
- <p><a href="/uri" title="title">link</a></p>
- .
- The title may be omitted:
- .
- [link](/uri)
- .
- <p><a href="/uri">link</a></p>
- .
- Both the title and the destination may be omitted:
- .
- [link]()
- .
- <p><a href="">link</a></p>
- .
- .
- [link](<>)
- .
- <p><a href="">link</a></p>
- .
- If the destination contains spaces, it must be enclosed in pointy
- brackets (`<...>`):
- .
- [link](/my uri)
- .
- <p>[link](/my uri)</p>
- .
- .
- [link](</my uri>)
- .
- <p><a href="/my%20uri">link</a></p>
- .
- The destination cannot contain line breaks, even with pointy brackets:
- .
- [link](foo
- bar)
- .
- <p>[link](foo
- bar)</p>
- .
- One level of balanced parentheses is allowed without escaping:
- .
- [link]((foo)and(bar))
- .
- <p><a href="(foo)and(bar)">link</a></p>
- .
- However, if you have parentheses within parentheses, you need to escape
- or use the `<...>` form:
- .
- [link](foo(and(bar)))
- .
- <p>[link](foo(and(bar)))</p>
- .
- .
- [link](foo(and\(bar\)))
- .
- <p><a href="foo(and(bar))">link</a></p>
- .
- .
- [link](<foo(and(bar))>)
- .
- <p><a href="foo(and(bar))">link</a></p>
- .
- Parentheses and other symbols can also be escaped, as usual
- in Markdown:
- .
- [link](foo\)\:)
- .
- <p><a href="foo):">link</a></p>
- .
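The rules for a link destination that is not enclosed in `<...>` can be sketched in Python as follows. The function name `is_bare_link_destination` is hypothetical, and the sketch only checks a candidate string; it does not locate the destination inside a larger inline link.

```python
import string


def is_bare_link_destination(dest):
    """Check a destination that is not wrapped in <...>.

    It must be nonempty, contain no ASCII space or control characters,
    and use unescaped parentheses only in balanced pairs that are not
    nested inside another unescaped pair.
    """
    if not dest:
        return False
    depth = 0
    i = 0
    while i < len(dest):
        ch = dest[i]
        # A backslash followed by ASCII punctuation is an escape; skip both.
        if ch == "\\" and i + 1 < len(dest) and dest[i + 1] in string.punctuation:
            i += 2
            continue
        if ord(ch) <= 0x20 or ord(ch) == 0x7F:
            return False
        if ch == "(":
            depth += 1
            if depth > 1:
                return False
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0
```

Tracking a single nesting depth is enough here, because only one level of unescaped parentheses is allowed.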
- URL-escaping should be left alone inside the destination, as all
- URL-escaped characters are also valid URL characters. HTML entities in
- the destination will be parsed into their corresponding Unicode
- codepoints, as usual, and optionally URL-escaped when written as HTML.
- .
- [link](foo%20b&auml;)
- .
- <p><a href="foo%20b%C3%A4">link</a></p>
- .
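One possible way to perform the optional URL-escaping when writing HTML is sketched below in Python, using the standard `urllib.parse.quote` function. The name `escape_href` and the particular set of characters left unescaped are assumptions of this sketch, chosen so that existing `%` escapes are left alone and the `href` values shown in the examples of this document are reproduced; this spec does not mandate them.

```python
from urllib.parse import quote

# Assumption: URL delimiters and "%" are left untouched, so existing
# percent-escapes are not escaped a second time.
_SAFE = "/:?#[]@!$&'()*+,;=%"


def escape_href(destination):
    """Percent-encode a destination (entities already decoded) as UTF-8."""
    return quote(destination, safe=_SAFE)
```

Applied to `foo%20bä`, that is, the destination above after entity decoding, this leaves `%20` alone and encodes only the `ä`.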
- Note that, because titles can often be parsed as destinations,
- if you try to omit the destination and keep the title, you'll
- get unexpected results:
- .
- [link]("title")
- .
- <p><a href="%22title%22">link</a></p>
- .
- Titles may be in single quotes, double quotes, or parentheses:
- .
- [link](/url "title")
- [link](/url 'title')
- [link](/url (title))
- .
- <p><a href="/url" title="title">link</a>
- <a href="/url" title="title">link</a>
- <a href="/url" title="title">link</a></p>
- .
- Backslash escapes and entities may be used in titles:
- .
- [link](/url "title \"&quot;")
- .
- <p><a href="/url" title="title &quot;&quot;">link</a></p>
- .
- Nested balanced quotes are not allowed without escaping:
- .
- [link](/url "title "and" title")
- .
- <p>[link](/url &quot;title &quot;and&quot; title&quot;)</p>
- .
- But it is easy to work around this by using a different quote type:
- .
- [link](/url 'title "and" title')
- .
- <p><a href="/url" title="title &quot;and&quot; title">link</a></p>
- .
- (Note: `Markdown.pl` did allow double quotes inside a double-quoted
- title, and its test suite included a test demonstrating this.
- But it is hard to see a good rationale for the extra complexity this
- brings, since there are already many ways---backslash escaping,
- entities, or using a different quote type for the enclosing title---to
- write titles containing double quotes. `Markdown.pl`'s handling of
- titles has a number of other strange features. For example, it allows
- single-quoted titles in inline links, but not reference links. And, in
- reference links but not inline links, it allows a title to begin with
- `"` and end with `)`. `Markdown.pl` 1.0.1 even allows titles with no closing
- quotation mark, though 1.0.2b8 does not. It seems preferable to adopt
- a simple, rational rule that works the same way in inline links and
- link reference definitions.)
- Whitespace is allowed around the destination and title:
- .
- [link]( /uri
- "title" )
- .
- <p><a href="/uri" title="title">link</a></p>
- .
- But it is not allowed between the link text and the
- following parenthesis:
- .
- [link] (/uri)
- .
- <p>[link] (/uri)</p>
- .
- The link text may contain balanced brackets, but not unbalanced ones,
- unless they are escaped:
- .
- [link [foo [bar]]](/uri)
- .
- <p><a href="/uri">link [foo [bar]]</a></p>
- .
- .
- [link] bar](/uri)
- .
- <p>[link] bar](/uri)</p>
- .
- .
- [link [bar](/uri)
- .
- <p>[link <a href="/uri">bar</a></p>
- .
- .
- [link \[bar](/uri)
- .
- <p><a href="/uri">link [bar</a></p>
- .
- The link text may contain inline content:
- .
- [link *foo **bar** `#`*](/uri)
- .
- <p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p>
- .
- .
- [![moon](moon.jpg)](/uri)
- .
- <p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p>
- .
- However, links may not contain other links, at any level of nesting:
- .
- [foo [bar](/uri)](/uri)
- .
- <p>[foo <a href="/uri">bar</a>](/uri)</p>
- .
- .
- [foo *[bar [baz](/uri)](/uri)*](/uri)
- .
- <p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p>
- .
- These cases illustrate the precedence of link text grouping over
- emphasis grouping:
- .
- *[foo*](/uri)
- .
- <p>*<a href="/uri">foo*</a></p>
- .
- .
- [foo *bar](baz*)
- .
- <p><a href="baz*">foo *bar</a></p>
- .
- These cases illustrate the precedence of HTML tags, code spans,
- and autolinks over link grouping:
- .
- [foo <bar attr="](baz)">
- .
- <p>[foo <bar attr="](baz)"></p>
- .
- .
- [foo`](/uri)`
- .
- <p>[foo<code>](/uri)</code></p>
- .
- .
- [foo<http://example.com?search=](uri)>
- .
- <p>[foo<a href="http://example.com?search=%5D(uri)">http://example.com?search=](uri)</a></p>
- .
- There are three kinds of [reference links](@reference-link):
- [full](#full-reference-link), [collapsed](#collapsed-reference-link),
- and [shortcut](#shortcut-reference-link).
- A [full reference link](@full-reference-link)
- consists of a [link text](#link-text), optional whitespace, and
- a [link label](#link-label) that [matches](#matches) a
- [link reference definition](#link-reference-definition) elsewhere in the
- document.
- A [link label](@link-label) begins with a left bracket (`[`) and ends
- with the first right bracket (`]`) that is not backslash-escaped.
- Unescaped square bracket characters are not allowed in
- [link labels](#link-label). A link label can have at most 999
- characters inside the square brackets.
- One label [matches](@matches)
- another if and only if their normalized forms are equal. To normalize a
- label, perform the *Unicode case fold* and collapse consecutive internal
- whitespace to a single space. If there are multiple matching [link
- reference definitions](#link-reference-definition), the one that comes
- first in the document is used. (It is desirable in such cases to emit a
- warning.)
- The contents of the first link label are parsed as inlines, which are
- used as the link's text. The link's URI and title are provided by the
- matching [link reference definition](#link-reference-definition).
- Here is a simple example:
- .
- [foo][bar]
- [bar]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- The rules for the [link text](#link-text) are the same as with
- [inline links](#inline-link). Thus:
- The link text may contain balanced brackets, but not unbalanced ones,
- unless they are escaped:
- .
- [link [foo [bar]]][ref]
- [ref]: /uri
- .
- <p><a href="/uri">link [foo [bar]]</a></p>
- .
- .
- [link \[bar][ref]
- [ref]: /uri
- .
- <p><a href="/uri">link [bar</a></p>
- .
- The link text may contain inline content:
- .
- [link *foo **bar** `#`*][ref]
- [ref]: /uri
- .
- <p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p>
- .
- .
- [![moon](moon.jpg)][ref]
- [ref]: /uri
- .
- <p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p>
- .
- However, links may not contain other links, at any level of nesting:
- .
- [foo [bar](/uri)][ref]
- [ref]: /uri
- .
- <p>[foo <a href="/uri">bar</a>]<a href="/uri">ref</a></p>
- .
- .
- [foo *bar [baz][ref]*][ref]
- [ref]: /uri
- .
- <p>[foo <em>bar <a href="/uri">baz</a></em>]<a href="/uri">ref</a></p>
- .
- (In the examples above, we have two [shortcut reference
- links](#shortcut-reference-link) instead of one [full reference
- link](#full-reference-link).)
- The following cases illustrate the precedence of link text grouping over
- emphasis grouping:
- .
- *[foo*][ref]
- [ref]: /uri
- .
- <p>*<a href="/uri">foo*</a></p>
- .
- .
- [foo *bar][ref]
- [ref]: /uri
- .
- <p><a href="/uri">foo *bar</a></p>
- .
- These cases illustrate the precedence of HTML tags, code spans,
- and autolinks over link grouping:
- .
- [foo <bar attr="][ref]">
- [ref]: /uri
- .
- <p>[foo <bar attr="][ref]"></p>
- .
- .
- [foo`][ref]`
- [ref]: /uri
- .
- <p>[foo<code>][ref]</code></p>
- .
- .
- [foo<http://example.com?search=][ref]>
- [ref]: /uri
- .
- <p>[foo<a href="http://example.com?search=%5D%5Bref%5D">http://example.com?search=][ref]</a></p>
- .
- Matching is case-insensitive:
- .
- [foo][BaR]
- [bar]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- Unicode case fold is used:
- .
- [Толпой][Толпой] is a Russian word.
- [ТОЛПОЙ]: /url
- .
- <p><a href="/url">Толпой</a> is a Russian word.</p>
- .
- Consecutive internal whitespace is treated as one space for
- purposes of determining matching:
- .
- [Foo
- bar]: /url
- [Baz][Foo bar]
- .
- <p><a href="/url">Baz</a></p>
- .
- There can be whitespace between the [link text](#link-text) and the
- [link label](#link-label):
- .
- [foo] [bar]
- [bar]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- .
- [foo]
- [bar]
- [bar]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- When there are multiple matching [link reference
- definitions](#link-reference-definition), the first is used:
- .
- [foo]: /url1
- [foo]: /url2
- [bar][foo]
- .
- <p><a href="/url1">bar</a></p>
- .
- Note that matching is performed on normalized strings, not parsed
- inline content. So the following does not match, even though the
- labels define equivalent inline content:
- .
- [bar][foo\!]
- [foo!]: /url
- .
- <p>[bar][foo!]</p>
- .
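The matching rules illustrated above (case folding, whitespace collapsing, first definition wins, comparison of raw strings rather than parsed inlines) can be sketched in Python as follows. The names `normalize_label` and `build_reference_map` are hypothetical, and Python's `str.casefold` is assumed to be an adequate stand-in for the Unicode case fold. Leading and trailing whitespace is collapsed rather than stripped here, since the rule only mentions internal whitespace.

```python
import re

def normalize_label(label: str) -> str:
    """Collapse runs of whitespace to one space, then apply case folding."""
    return re.sub(r"\s+", " ", label).casefold()

def build_reference_map(definitions):
    """Map normalized labels to destinations; the first definition wins."""
    refmap = {}
    for label, destination in definitions:
        # setdefault keeps an existing entry, so later duplicates are ignored.
        refmap.setdefault(normalize_label(label), destination)
    return refmap
```

Because normalization works on the raw label string, a label written with a backslash escape does not match the same label written without it.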
- [Link labels](#link-label) cannot contain brackets, unless they are
- backslash-escaped:
- .
- [foo][ref[]
- [ref[]: /uri
- .
- <p>[foo][ref[]</p>
- <p>[ref[]: /uri</p>
- .
- .
- [foo][ref[bar]]
- [ref[bar]]: /uri
- .
- <p>[foo][ref[bar]]</p>
- <p>[ref[bar]]: /uri</p>
- .
- .
- [[[foo]]]
- [[[foo]]]: /url
- .
- <p>[[[foo]]]</p>
- <p>[[[foo]]]: /url</p>
- .
- .
- [foo][ref\[]
- [ref\[]: /uri
- .
- <p><a href="/uri">foo</a></p>
- .
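A link label scanner consistent with the definition and examples above might look like the following Python sketch. The function name `scan_link_label` is hypothetical, and a backslash before any character is treated as an escape, which is a simplification.

```python
def scan_link_label(text: str, start: int = 0):
    """Return the label contents starting at text[start], or None if there is no label."""
    if start >= len(text) or text[start] != "[":
        return None
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # an escaped character never opens or closes the label
            continue
        if ch == "[":
            return None  # unescaped brackets are not allowed inside a label
        if ch == "]":
            inner = text[start + 1:i]
            # At most 999 characters are allowed inside the brackets.
            return inner if len(inner) <= 999 else None
        i += 1
    return None  # no closing bracket found
```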
- A [collapsed reference link](@collapsed-reference-link)
- consists of a [link
- label](#link-label) that [matches](#matches) a [link reference
- definition](#link-reference-definition) elsewhere in the
- document, optional whitespace, and the string `[]`. The contents of the
- first link label are parsed as inlines, which are used as the link's
- text. The link's URI and title are provided by the matching link
- reference definition. Thus, `[foo][]` is equivalent to `[foo][foo]`.
- .
- [foo][]
- [foo]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- .
- [*foo* bar][]
- [*foo* bar]: /url "title"
- .
- <p><a href="/url" title="title"><em>foo</em> bar</a></p>
- .
- The link labels are case-insensitive:
- .
- [Foo][]
- [foo]: /url "title"
- .
- <p><a href="/url" title="title">Foo</a></p>
- .
- As with full reference links, whitespace is allowed
- between the two sets of brackets:
- .
- [foo]
- []
- [foo]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- A [shortcut reference link](@shortcut-reference-link)
- consists of a [link
- label](#link-label) that [matches](#matches) a [link reference
- definition](#link-reference-definition) elsewhere in the
- document and is not followed by `[]` or a link label.
- The contents of the first link label are parsed as inlines,
- which are used as the link's text. The link's URI and title
- are provided by the matching link reference definition.
- Thus, `[foo]` is equivalent to `[foo][]`.
- .
- [foo]
- [foo]: /url "title"
- .
- <p><a href="/url" title="title">foo</a></p>
- .
- .
- [*foo* bar]
- [*foo* bar]: /url "title"
- .
- <p><a href="/url" title="title"><em>foo</em> bar</a></p>
- .
- .
- [[*foo* bar]]
- [*foo* bar]: /url "title"
- .
- <p>[<a href="/url" title="title"><em>foo</em> bar</a>]</p>
- .
- The link labels are case-insensitive:
- .
- [Foo]
- [foo]: /url "title"
- .
- <p><a href="/url" title="title">Foo</a></p>
- .
- If you just want bracketed text, you can backslash-escape the
- opening bracket to avoid links:
- .
- \[foo]
- [foo]: /url "title"
- .
- <p>[foo]</p>
- .
- Note that this is a link, because a link label ends with the first
- following closing bracket:
- .
- [foo*]: /url
- *[foo*]
- .
- <p>*<a href="/url">foo*</a></p>
- .
- This, however, is not a link, because code spans take precedence over
- link grouping:
- .
- [foo`]: /url
- [foo`]`
- .
- <p>[foo<code>]</code></p>
- .
- Full references take precedence over shortcut references:
- .
- [foo][bar]
- [foo]: /url1
- [bar]: /url2
- .
- <p><a href="/url2">foo</a></p>
- .
- In the following case `[bar][baz]` is parsed as a reference,
- `[foo]` as normal text:
- .
- [foo][bar][baz]
- [baz]: /url
- .
- <p>[foo]<a href="/url">bar</a></p>
- .
- Here, though, `[foo][bar]` is parsed as a reference, since
- `[bar]` is defined:
- .
- [foo][bar][baz]
- [baz]: /url1
- [bar]: /url2
- .
- <p><a href="/url2">foo</a><a href="/url1">baz</a></p>
- .
- Here `[foo]` is not parsed as a shortcut reference, because it
- is followed by a link label (even though `[bar]` is not defined):
- .
- [foo][bar][baz]
- [baz]: /url1
- [foo]: /url2
- .
- <p>[foo]<a href="/url1">bar</a></p>
- .
- ## Images
- Syntax for images is like the syntax for links, with one
- difference. Instead of [link text](#link-text), we have an [image
- description](@image-description). The rules for this are the
- same as for [link text](#link-text), except that (a) an
- image description starts with `![` rather than `[`, and
- (b) an image description may contain links.
- An image description has inline elements
- as its contents. When an image is rendered to HTML,
- this is typically used as the image's `alt` attribute.
- .
- ![foo](/url "title")
- .
- <p><img src="/url" alt="foo" title="title" /></p>
- .
- .
- ![foo *bar*]
- [foo *bar*]: train.jpg "train & tracks"
- .
- <p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p>
- .
- .
- ![foo ![bar](/url)](/url2)
- .
- <p><img src="/url2" alt="foo bar" /></p>
- .
- .
- ![foo [bar](/url)](/url2)
- .
- <p><img src="/url2" alt="foo bar" /></p>
- .
- Though this spec is concerned with parsing, not rendering, it is
- recommended that in rendering to HTML, only the plain string content
- of the [image description](#image-description) be used. Note that in
- the above example, the alt attribute's value is `foo bar`, not `foo
- [bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string
- content is rendered, without formatting.
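The recommendation above amounts to flattening the inline tree of the image description into its string content. A minimal Python sketch follows, assuming a hypothetical representation in which each inline node is a `(kind, payload)` pair: `"str"` nodes carry text, and every other kind (such as `"emph"` or `"link"`) carries a list of child nodes.

```python
def plain_text(inlines) -> str:
    """Concatenate the string content of an inline tree, dropping all formatting."""
    parts = []
    for kind, payload in inlines:
        if kind == "str":
            parts.append(payload)
        else:
            # Containers such as emphasis or links contribute only their text.
            parts.append(plain_text(payload))
    return "".join(parts)
```

With this representation, a description parsed from `foo [bar](/url)` yields the `alt` text `foo bar`.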
- .
- ![foo *bar*][]
- [foo *bar*]: train.jpg "train & tracks"
- .
- <p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p>
- .
- .
- ![foo *bar*][foobar]
- [FOOBAR]: train.jpg "train & tracks"
- .
- <p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p>
- .
- .
- ![foo](train.jpg)
- .
- <p><img src="train.jpg" alt="foo" /></p>
- .
- .
- My ![foo bar](/path/to/train.jpg "title" )
- .
- <p>My <img src="/path/to/train.jpg" alt="foo bar" title="title" /></p>
- .
- .
- ![foo](<url>)
- .
- <p><img src="url" alt="foo" /></p>
- .
- .
- ![](/url)
- .
- <p><img src="/url" alt="" /></p>
- .
- Reference-style:
- .
- ![foo] [bar]
- [bar]: /url
- .
- <p><img src="/url" alt="foo" /></p>
- .
- .
- ![foo] [bar]
- [BAR]: /url
- .
- <p><img src="/url" alt="foo" /></p>
- .
- Collapsed:
- .
- ![foo][]
- [foo]: /url "title"
- .
- <p><img src="/url" alt="foo" title="title" /></p>
- .
- .
- ![*foo* bar][]
- [*foo* bar]: /url "title"
- .
- <p><img src="/url" alt="foo bar" title="title" /></p>
- .
- The labels are case-insensitive:
- .
- ![Foo][]
- [foo]: /url "title"
- .
- <p><img src="/url" alt="Foo" title="title" /></p>
- .
- As with full reference links, whitespace is allowed
- between the two sets of brackets:
- .
- ![foo]
- []
- [foo]: /url "title"
- .
- <p><img src="/url" alt="foo" title="title" /></p>
- .
- Shortcut:
- .
- ![foo]
- [foo]: /url "title"
- .
- <p><img src="/url" alt="foo" title="title" /></p>
- .
- .
- ![*foo* bar]
- [*foo* bar]: /url "title"
- .
- <p><img src="/url" alt="foo bar" title="title" /></p>
- .
- Note that link labels cannot contain unescaped brackets:
- .
- ![[foo]]
- [[foo]]: /url "title"
- .
- <p>![[foo]]</p>
- <p>[[foo]]: /url "title"</p>
- .
- The link labels are case-insensitive:
- .
- ![Foo]
- [foo]: /url "title"
- .
- <p><img src="/url" alt="Foo" title="title" /></p>
- .
- If you just want bracketed text, you can backslash-escape the
- opening `!` and `[`:
- .
- \!\[foo]
- [foo]: /url "title"
- .
- <p>![foo]</p>
- .
- If you want a link after a literal `!`, backslash-escape the
- `!`:
- .
- \![foo]
- [foo]: /url "title"
- .
- <p>!<a href="/url" title="title">foo</a></p>
- .
- ## Autolinks
- [Autolinks](@autolink) are absolute URIs and email addresses inside `<` and `>`.
- They are parsed as links, with the URL or email address as the link
- label.
- A [URI autolink](@uri-autolink)
- consists of `<`, followed by an [absolute
- URI](#absolute-uri) not containing `<`, followed by `>`. It is parsed
- as a link to the URI, with the URI as the link's label.
- An [absolute URI](@absolute-uri),
- for these purposes, consists of a [scheme](#scheme) followed by a colon (`:`)
- followed by zero or more characters other than ASCII whitespace and
- control characters, `<`, and `>`. If the URI includes these characters,
- you must use percent-encoding (e.g. `%20` for a space).
- The following [schemes](@scheme)
- are recognized (case-insensitive):
- `coap`, `doi`, `javascript`, `aaa`, `aaas`, `about`, `acap`, `cap`,
- `cid`, `crid`, `data`, `dav`, `dict`, `dns`, `file`, `ftp`, `geo`, `go`,
- `gopher`, `h323`, `http`, `https`, `iax`, `icap`, `im`, `imap`, `info`,
- `ipp`, `iris`, `iris.beep`, `iris.xpc`, `iris.xpcs`, `iris.lwz`, `ldap`,
- `mailto`, `mid`, `msrp`, `msrps`, `mtqp`, `mupdate`, `news`, `nfs`,
- `ni`, `nih`, `nntp`, `opaquelocktoken`, `pop`, `pres`, `rtsp`,
- `service`, `session`, `shttp`, `sieve`, `sip`, `sips`, `sms`, `snmp`,
- `soap.beep`, `soap.beeps`, `tag`, `tel`, `telnet`, `tftp`, `thismessage`,
- `tn3270`, `tip`, `tv`, `urn`, `vemmi`, `ws`, `wss`, `xcon`,
- `xcon-userid`, `xmlrpc.beep`, `xmlrpc.beeps`, `xmpp`, `z39.50r`,
- `z39.50s`, `adiumxtra`, `afp`, `afs`, `aim`, `apt`, `attachment`, `aw`,
- `beshare`, `bitcoin`, `bolo`, `callto`, `chrome`, `chrome-extension`,
- `com-eventbrite-attendee`, `content`, `cvs`, `dlna-playsingle`,
- `dlna-playcontainer`, `dtn`, `dvb`, `ed2k`, `facetime`, `feed`,
- `finger`, `fish`, `gg`, `git`, `gizmoproject`, `gtalk`, `hcp`, `icon`,
- `ipn`, `irc`, `irc6`, `ircs`, `itms`, `jar`, `jms`, `keyparc`, `lastfm`,
- `ldaps`, `magnet`, `maps`, `market`, `message`, `mms`, `ms-help`,
- `msnim`, `mumble`, `mvn`, `notes`, `oid`, `palm`, `paparazzi`,
- `platform`, `proxy`, `psyc`, `query`, `res`, `resource`, `rmi`, `rsync`,
- `rtmp`, `secondlife`, `sftp`, `sgn`, `skype`, `smb`, `soldat`,
- `spotify`, `ssh`, `steam`, `svn`, `teamspeak`, `things`, `udp`,
- `unreal`, `ut2004`, `ventrilo`, `view-source`, `webcal`, `wtai`,
- `wyciwyg`, `xfire`, `xri`, `ymsgr`.
- Here are some valid autolinks:
- .
- <http://foo.bar.baz>
- .
- <p><a href="http://foo.bar.baz">http://foo.bar.baz</a></p>
- .
- .
- <http://foo.bar.baz?q=hello&id=22&boolean>
- .
- <p><a href="http://foo.bar.baz?q=hello&id=22&boolean">http://foo.bar.baz?q=hello&id=22&boolean</a></p>
- .
- .
- <irc://foo.bar:2233/baz>
- .
- <p><a href="irc://foo.bar:2233/baz">irc://foo.bar:2233/baz</a></p>
- .
- Uppercase is also fine:
- .
- <MAILTO:FOO@BAR.BAZ>
- .
- <p><a href="MAILTO:FOO@BAR.BAZ">MAILTO:FOO@BAR.BAZ</a></p>
- .
- Spaces are not allowed in autolinks:
- .
- <http://foo.bar/baz bim>
- .
- <p><http://foo.bar/baz bim></p>
- .
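The URI autolink rule can be approximated with a regular expression plus a scheme lookup, as in the Python sketch below. The function name `uri_autolink` is hypothetical, and `KNOWN_SCHEMES` deliberately holds only a small subset of the recognized schemes listed above; a real implementation would include the full list.

```python
import re

# Illustrative subset of the recognized schemes (an assumption for brevity).
KNOWN_SCHEMES = {"http", "https", "ftp", "irc", "mailto"}

# "<", a candidate scheme, ":", then characters other than ASCII whitespace,
# control characters, "<" and ">", and finally ">".
_URI_AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9.+-]*):([^\x00-\x20\x7f<>]*)>")

def uri_autolink(text: str):
    """Return the URI if text is a URI autolink, else None."""
    match = _URI_AUTOLINK.fullmatch(text)
    if match is None:
        return None
    if match.group(1).lower() not in KNOWN_SCHEMES:
        return None  # scheme matching is case-insensitive
    return text[1:-1]
```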
- An [email autolink](@email-autolink)
- consists of `<`, followed by an [email address](#email-address),
- followed by `>`. The link's label is the email address,
- and the URL is `mailto:` followed by the email address.
- An [email address](@email-address),
- for these purposes, is anything that matches
- the [non-normative regex from the HTML5
- spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#e-mail-state-%28type=email%29):
- /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
- (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
- Examples of email autolinks:
- .
- <foo@bar.example.com>
- .
- <p><a href="mailto:foo@bar.example.com">foo@bar.example.com</a></p>
- .
- .
- <foo+special@Bar.baz-bar0.com>
- .
- <p><a href="mailto:foo+special@Bar.baz-bar0.com">foo+special@Bar.baz-bar0.com</a></p>
- .
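The email autolink rule can be sketched in Python by compiling the regex quoted above and wrapping it in angle brackets. The function name `email_autolink` and its return shape, a `(url, label)` pair, are assumptions made for illustration.

```python
import re

# The non-normative HTML5 email pattern quoted above, split over several lines.
_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

def email_autolink(text: str):
    """Return (url, label) if text is an email autolink, else None."""
    if len(text) < 2 or text[0] != "<" or text[-1] != ">":
        return None
    address = text[1:-1]
    if _EMAIL.fullmatch(address) is None:
        return None
    return ("mailto:" + address, address)
```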
- These are not autolinks:
- .
- <>
- .
- <p><></p>
- .
- .
- <heck://bing.bong>
- .
- <p><heck://bing.bong></p>
- .
- .
- < http://foo.bar >
- .
- <p>< http://foo.bar ></p>
- .
- .
- <foo.bar.baz>
- .
- <p><foo.bar.baz></p>
- .
- .
- <localhost:5001/foo>
- .
- <p><localhost:5001/foo></p>
- .
- .
- http://example.com
- .
- <p>http://example.com</p>
- .
- .
- foo@bar.example.com
- .
- <p>foo@bar.example.com</p>
- .
- ## Raw HTML
- Text between `<` and `>` that looks like an HTML tag is parsed as a
- raw HTML tag and will be rendered in HTML without escaping.
- Tag and attribute names are not limited to current HTML tags,
- so custom tags (and even, say, DocBook tags) may be used.
- Here is the grammar for tags:
- A [tag name](@tag-name) consists of an ASCII letter
- followed by zero or more ASCII letters or digits.
- An [attribute](@attribute) consists of whitespace,
- an [attribute name](#attribute-name), and an optional
- [attribute value specification](#attribute-value-specification).
- An [attribute name](@attribute-name)
- consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII
- letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML
- specification restricted to ASCII. HTML5 is laxer.)
- An [attribute value specification](@attribute-value-specification)
- consists of optional whitespace,
- a `=` character, optional whitespace, and an [attribute
- value](#attribute-value).
- An [attribute value](@attribute-value)
- consists of an [unquoted attribute value](#unquoted-attribute-value),
- a [single-quoted attribute value](#single-quoted-attribute-value),
- or a [double-quoted attribute value](#double-quoted-attribute-value).
- An [unquoted attribute value](@unquoted-attribute-value)
- is a nonempty string of characters not
- including spaces, `"`, `'`, `=`, `<`, `>`, or `` ` ``.
- A [single-quoted attribute value](@single-quoted-attribute-value)
- consists of `'`, zero or more
- characters not including `'`, and a final `'`.
- A [double-quoted attribute value](@double-quoted-attribute-value)
- consists of `"`, zero or more
- characters not including `"`, and a final `"`.
- An [open tag](@open-tag) consists of a `<` character,
- a [tag name](#tag-name), zero or more [attributes](#attribute),
- optional whitespace, an optional `/` character, and a `>` character.
- A [closing tag](@closing-tag) consists of the
- string `</`, a [tag name](#tag-name), optional whitespace, and the
- character `>`.
- An [HTML comment](@html-comment) consists of the
- string `<!--`, a string of characters not including the string `--`, and
- the string `-->`.
- A [processing instruction](@processing-instruction)
- consists of the string `<?`, a string
- of characters not including the string `?>`, and the string
- `?>`.
- A [declaration](@declaration) consists of the
- string `<!`, a name consisting of one or more uppercase ASCII letters,
- whitespace, a string of characters not including the character `>`, and
- the character `>`.
- A [CDATA section](@cdata-section) consists of
- the string `<![CDATA[`, a string of characters not including the string
- `]]>`, and the string `]]>`.
- An [HTML tag](@html-tag) consists of an [open
- tag](#open-tag), a [closing tag](#closing-tag), an [HTML
- comment](#html-comment), a [processing
- instruction](#processing-instruction), a
- [declaration](#declaration), or a [CDATA
- section](#cdata-section).
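The grammar above translates fairly directly into regular expressions. The Python sketch below is one possible transcription, not a reference implementation: the names (`TAG_NAME`, `is_html_tag`, and so on) are hypothetical, and "whitespace" is assumed to mean anything matched by `\s`, which is also excluded from unquoted attribute values.

```python
import re

TAG_NAME = r"[A-Za-z][A-Za-z0-9]*"
ATTR_NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"
UNQUOTED = r"""[^\s"'=<>`]+"""
SINGLE_QUOTED = r"'[^']*'"
DOUBLE_QUOTED = r'"[^"]*"'
ATTR_VALUE = f"(?:{UNQUOTED}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})"
ATTR_VALUE_SPEC = rf"(?:\s*=\s*{ATTR_VALUE})"
ATTRIBUTE = rf"(?:\s+{ATTR_NAME}{ATTR_VALUE_SPEC}?)"

OPEN_TAG = rf"<{TAG_NAME}{ATTRIBUTE}*\s*/?>"
CLOSING_TAG = rf"</{TAG_NAME}\s*>"
COMMENT = r"<!--(?:(?!--).)*-->"          # body may not contain "--"
PROCESSING = r"<\?(?:(?!\?>).)*\?>"       # body may not contain "?>"
DECLARATION = r"<![A-Z]+\s+[^>]*>"
CDATA = r"<!\[CDATA\[(?:(?!\]\]>).)*\]\]>"  # body may not contain "]]>"

_HTML_TAG = re.compile(
    "|".join([OPEN_TAG, CLOSING_TAG, COMMENT, PROCESSING, DECLARATION, CDATA]),
    re.DOTALL,  # tags may span line breaks
)

def is_html_tag(text: str) -> bool:
    """Return True if the whole of text is a single HTML tag per the grammar."""
    return _HTML_TAG.fullmatch(text) is not None
```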
- Here are some simple open tags:
- .
- <a><bab><c2c>
- .
- <p><a><bab><c2c></p>
- .
- Empty elements:
- .
- <a/><b2/>
- .
- <p><a/><b2/></p>
- .
- Whitespace is allowed:
- .
- <a /><b2
- data="foo" >
- .
- <p><a /><b2
- data="foo" ></p>
- .
- With attributes:
- .
- <a foo="bar" bam = 'baz <em>"</em>'
- _boolean zoop:33=zoop:33 />
- .
- <p><a foo="bar" bam = 'baz <em>"</em>'
- _boolean zoop:33=zoop:33 /></p>
- .
- Illegal tag names, not parsed as HTML:
- .
- <33> <__>
- .
- <p><33> <__></p>
- .
- Illegal attribute names:
- .
- <a h*#ref="hi">
- .
- <p><a h*#ref="hi"></p>
- .
- Illegal attribute values:
- .
- <a href="hi'> <a href=hi'>
- .
- <p><a href="hi'> <a href=hi'></p>
- .
- Illegal whitespace:
- .
- < a><
- foo><bar/ >
- .
- <p>< a><
- foo><bar/ ></p>
- .
- Missing whitespace:
- .
- <a href='bar'title=title>
- .
- <p><a href='bar'title=title></p>
- .
- Closing tags:
- .
- </a>
- </foo >
- .
- <p></a>
- </foo ></p>
- .
- Illegal attributes in closing tag:
- .
- </a href="foo">
- .
- <p></a href="foo"></p>
- .
- Comments:
- .
- foo <!-- this is a
- comment - with hyphen -->
- .
- <p>foo <!-- this is a
- comment - with hyphen --></p>
- .
- .
- foo <!-- not a comment -- two hyphens -->
- .
- <p>foo <!-- not a comment -- two hyphens --></p>
- .
- Processing instructions:
- .
- foo <?php echo $a; ?>
- .
- <p>foo <?php echo $a; ?></p>
- .
- Declarations:
- .
- foo <!ELEMENT br EMPTY>
- .
- <p>foo <!ELEMENT br EMPTY></p>
- .
- CDATA sections:
- .
- foo <![CDATA[>&<]]>
- .
- <p>foo <![CDATA[>&<]]></p>
- .
- Entities are preserved in HTML attributes:
- .
- <a href="&ouml;">
- .
- <p><a href="&ouml;"></p>
- .
- Backslash escapes do not work in HTML attributes:
- .
- <a href="\*">
- .
- <p><a href="\*"></p>
- .
- .
- <a href="\"">
- .
- <p><a href="\""></p>
- .
- ## Hard line breaks
- A line break (not in a code span or HTML tag) that is preceded
- by two or more spaces and does not occur at the end of a block
- is parsed as a [hard line break](@hard-line-break) (rendered
- in HTML as a `<br />` tag):
- .
- foo
- baz
- .
- <p>foo<br />
- baz</p>
- .
- For a more visible alternative, a backslash before the newline may be
- used instead of two spaces:
- .
- foo\
- baz
- .
- <p>foo<br />
- baz</p>
- .
- More than two spaces can be used:
- .
- foo
- baz
- .
- <p>foo<br />
- baz</p>
- .
- Leading spaces at the beginning of the next line are ignored:
- .
- foo
- bar
- .
- <p>foo<br />
- bar</p>
- .
- .
- foo\
- bar
- .
- <p>foo<br />
- bar</p>
- .
- Line breaks can occur inside emphasis, links, and other constructs
- that allow inline content:
- .
- *foo
- bar*
- .
- <p><em>foo<br />
- bar</em></p>
- .
- .
- *foo\
- bar*
- .
- <p><em>foo<br />
- bar</em></p>
- .
- Line breaks do not occur inside code spans
- .
- `code
- span`
- .
- <p><code>code span</code></p>
- .
- .
- `code\
- span`
- .
- <p><code>code\ span</code></p>
- .
- or HTML tags:
- .
- <a href="foo
- bar">
- .
- <p><a href="foo
- bar"></p>
- .
- .
- <a href="foo\
- bar">
- .
- <p><a href="foo\
- bar"></p>
- .
- Hard line breaks are for separating inline content within a block.
- Neither syntax for hard line breaks works at the end of a paragraph or
- other block element:
- .
- foo\
- .
- <p>foo\</p>
- .
- .
- foo
- .
- <p>foo</p>
- .
- .
- ### foo\
- .
- <h3>foo\</h3>
- .
- .
- ### foo
- .
- <h3>foo</h3>
- .
- ## Soft line breaks
- A regular line break (not in a code span or HTML tag) that is not
- preceded by two or more spaces is parsed as a softbreak. (A
- softbreak may be rendered in HTML either as a newline or as a space.
- The result will be the same in browsers. In the examples here, a
- newline will be used.)
- .
- foo
- baz
- .
- <p>foo
- baz</p>
- .
- Spaces at the end of the line and beginning of the next line are
- removed:
- .
- foo
- baz
- .
- <p>foo
- baz</p>
- .
- A conforming renderer may render a soft line break in HTML either as a
- line break or as a space.
- A renderer may also provide an option to render soft line breaks
- as hard line breaks.
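The distinction between hard and soft line breaks can be summarized in a short Python sketch. It is a simplification under stated assumptions: the function name `classify_breaks` is hypothetical, the input is the raw text of a single paragraph, and code spans, HTML tags, and escaped backslashes are ignored.

```python
def classify_breaks(paragraph: str):
    """Return (content, break_kind) pairs; break_kind is 'hard', 'soft', or 'end'."""
    lines = paragraph.split("\n")
    result = []
    for index, line in enumerate(lines):
        if index > 0:
            line = line.lstrip(" ")  # leading spaces on a continuation line are ignored
        if index == len(lines) - 1:
            # Neither hard-break syntax works at the end of a block.
            result.append((line.rstrip(" "), "end"))
        elif line.endswith("\\"):
            result.append((line[:-1], "hard"))
        elif line.endswith("  "):
            result.append((line.rstrip(" "), "hard"))
        else:
            result.append((line.rstrip(" "), "soft"))
    return result
```

A renderer would then emit `<br />` plus a newline for each `hard` entry, and either a newline or a space for each `soft` entry.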
- ## Strings
- Any characters not given an interpretation by the above rules will
- be parsed as string content.
- .
- hello $.;'there
- .
- <p>hello $.;'there</p>
- .
- .
- Foo χρῆν
- .
- <p>Foo χρῆν</p>
- .
- Internal spaces are preserved verbatim:
- .
- Multiple spaces
- .
- <p>Multiple spaces</p>
- .
- <!-- END TESTS -->
- # Appendix A: A parsing strategy {-}
- ## Overview {-}
- Parsing has two phases:
- 1. In the first phase, lines of input are consumed and the block
- structure of the document---its division into paragraphs, block quotes,
- list items, and so on---is constructed. Text is assigned to these
- blocks but not parsed. Link reference definitions are parsed and a
- map of links is constructed.
- 2. In the second phase, the raw text contents of paragraphs and headers
- are parsed into sequences of Markdown inline elements (strings,
- code spans, links, emphasis, and so on), using the map of link
- references constructed in phase 1.
- ## The document tree {-}
- At each point in processing, the document is represented as a tree of
- **blocks**. The root of the tree is a `document` block. The `document`
- may have any number of other blocks as **children**. These children
- may, in turn, have other blocks as children. The last child of a block
- is normally considered **open**, meaning that subsequent lines of input
- can alter its contents. (Blocks that are not open are **closed**.)
- Here, for example, is a possible document tree, with the open blocks
- marked by arrows:
- ``` tree
- -> document
- -> block_quote
- paragraph
- "Lorem ipsum dolor\nsit amet."
- -> list (type=bullet tight=true bullet_char=-)
- list_item
- paragraph
- "Qui *quodsi iracundia*"
- -> list_item
- -> paragraph
- "aliquando id"
- ```
- ## How source lines alter the document tree {-}
- Each line that is processed has an effect on this tree. The line is
- analyzed and, depending on its contents, the document may be altered
- in one or more of the following ways:
- 1. One or more open blocks may be closed.
- 2. One or more new blocks may be created as children of the
- last open block.
- 3. Text may be added to the last (deepest) open block remaining
- on the tree.
- Once a line has been incorporated into the tree in this way,
- it can be discarded, so input can be read in a stream.
- We can see how this works by considering how the tree above is
- generated by four lines of Markdown:
- ``` markdown
- > Lorem ipsum dolor
- sit amet.
- > - Qui *quodsi iracundia*
- > - aliquando id
- ```
- At the outset, our document model is just
- ``` tree
- -> document
- ```
- The first line of our text,
- ``` markdown
- > Lorem ipsum dolor
- ```
- causes a `block_quote` block to be created as a child of our
- open `document` block, and a `paragraph` block as a child of
- the `block_quote`. Then the text is added to the last open
- block, the `paragraph`:
- ``` tree
- -> document
- -> block_quote
- -> paragraph
- "Lorem ipsum dolor"
- ```
- The next line,
- ``` markdown
- sit amet.
- ```
- is a "lazy continuation" of the open `paragraph`, so it gets added
- to the paragraph's text:
- ``` tree
- -> document
- -> block_quote
- -> paragraph
- "Lorem ipsum dolor\nsit amet."
- ```
- The third line,
- ``` markdown
- > - Qui *quodsi iracundia*
- ```
- causes the `paragraph` block to be closed, and a new `list` block
- opened as a child of the `block_quote`. A `list_item` is also
- added as a child of the `list`, and a `paragraph` as a child of
- the `list_item`. The text is then added to the new `paragraph`:
- ``` tree
- -> document
- -> block_quote
- paragraph
- "Lorem ipsum dolor\nsit amet."
- -> list (type=bullet tight=true bullet_char=-)
- -> list_item
- -> paragraph
- "Qui *quodsi iracundia*"
- ```
- The fourth line,
- ``` markdown
- > - aliquando id
- ```
- causes the `list_item` (and its child the `paragraph`) to be closed,
- and a new `list_item` opened up as child of the `list`. A `paragraph`
- is added as a child of the new `list_item`, to contain the text.
- We thus obtain the final tree:
- ``` tree
- -> document
- -> block_quote
- paragraph
- "Lorem ipsum dolor\nsit amet."
- -> list (type=bullet tight=true bullet_char=-)
- list_item
- paragraph
- "Qui *quodsi iracundia*"
- -> list_item
- -> paragraph
- "aliquando id"
- ```
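The three kinds of change described above (closing blocks, creating child blocks, adding text) can be modeled with a small data structure. The Python sketch below replays the four example lines by hand rather than parsing them; the class `Block` and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    kind: str
    children: list["Block"] = field(default_factory=list)
    text: list[str] = field(default_factory=list)
    is_open: bool = True

    def last_open(self) -> "Block":
        """Follow open last children down to the deepest open block."""
        block = self
        while block.children and block.children[-1].is_open:
            block = block.children[-1]
        return block

    def add_child(self, kind: str) -> "Block":
        """Create a new open block as the last child of this block."""
        child = Block(kind)
        self.children.append(child)
        return child

    def close(self) -> None:
        """Close this block and everything beneath it."""
        for child in self.children:
            child.close()
        self.is_open = False

doc = Block("document")

# Line 1: "> Lorem ipsum dolor" creates a block_quote and a paragraph.
quote = doc.add_child("block_quote")
para = quote.add_child("paragraph")
para.text.append("Lorem ipsum dolor")

# Line 2: "sit amet." is a lazy continuation of the open paragraph.
doc.last_open().text.append("sit amet.")

# Line 3: "> - Qui *quodsi iracundia*" closes the paragraph and opens a list.
para.close()
items = quote.add_child("list")
first_item = items.add_child("list_item")
first_item.add_child("paragraph").text.append("Qui *quodsi iracundia*")

# Line 4: "> - aliquando id" closes the first item and opens a second one.
first_item.close()
second_item = items.add_child("list_item")
second_item.add_child("paragraph").text.append("aliquando id")
```

After the fourth line, `doc.last_open()` is the paragraph holding `aliquando id`, matching the arrows in the final tree.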
- ## From block structure to the final document {-}
- Once all of the input has been parsed, all open blocks are closed.
- We then "walk the tree," visiting every node, and parse raw
- string contents of paragraphs and headers as inlines. At this
- point we have seen all the link reference definitions, so we can
- resolve reference links as we go.
- ``` tree
- document
- block_quote
- paragraph
- str "Lorem ipsum dolor"
- softbreak
- str "sit amet."
- list (type=bullet tight=true bullet_char=-)
- list_item
- paragraph
- str "Qui "
- emph
- str "quodsi iracundia"
- list_item
- paragraph
- str "aliquando id"
- ```
- Notice how the newline in the first paragraph has been parsed as
- a `softbreak`, and the asterisks in the first list item have become
- an `emph`.
- The document can be rendered as HTML, or in any other format, given
- an appropriate renderer.
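The second phase can be pictured as a recursive walk over the block tree. The Python sketch below is deliberately minimal and rests on several assumptions: blocks are plain dictionaries with hypothetical keys `type`, `raw`, `children`, and `inlines`, and the inline parser `parse_inlines` only distinguishes strings from soft breaks. Emphasis, code spans, and reference link resolution are left out.

```python
def parse_inlines(raw: str):
    """Split raw text into 'str' and 'softbreak' nodes (no other inline syntax)."""
    nodes = []
    for index, line in enumerate(raw.split("\n")):
        if index > 0:
            nodes.append(("softbreak", None))
        nodes.append(("str", line))
    return nodes

def walk(block: dict) -> None:
    """Visit every block, replacing raw text of paragraphs and headers with inlines."""
    if block["type"] in ("paragraph", "header"):
        block["inlines"] = parse_inlines(block.pop("raw"))
    for child in block.get("children", []):
        walk(child)

document = {
    "type": "document",
    "children": [
        {
            "type": "block_quote",
            "children": [
                {"type": "paragraph", "raw": "Lorem ipsum dolor\nsit amet."},
            ],
        },
    ],
}
walk(document)
```

A fuller implementation would pass the map of link references built in phase 1 to the inline parser so that reference links can be resolved during the same walk.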
|