*FIXME: introduce extensions and extension sets (dialects)*

*FIXME: Pick CommonMark as reference dialect*

## Syntax of Markdown dialect CommonMark

Markdown consists of blocks of content, optionally preceded by a set of YAML-formatted metadata blocks. Visually, this can be described using a syntax diagram, where the possible order of elements is laid out like trains on rails, as seen in @fig-def-Markdown.

![Syntax of `Markdown`.](def_Markdown.svg){#fig-def-Markdown}

Here is an example:

```markdown
---
author: Jonas Smedegaard
---

# Greeting

Hello, world!
```

This example involves the syntax for the block types `Header` and `Paragraph` (and for `MetaBlock`, which will not be covered here). `Header` consists of space-delimited words introduced by `#` characters and followed by a line break, and `Paragraph` consists of lines of space-delimited words followed by two or more line breaks. See @fig-def-blocks for the syntax diagrams of those block types.

::: {#fig-def-blocks}
![](def_Block.svg)

![](def_Header.svg)

![](def_Paragraph.svg)

Syntax of `Block`, `Header` and `Paragraph`.
:::

Reading order matters: the syntax diagrams should be read left-to-right and top-to-bottom, including at points of choice. For example, the block type `Header` must be tried before `Paragraph`. Otherwise, if `Paragraph` syntax were tried first, it would match both blocks, because a `Paragraph` begins with any words, including the `#` characters that define the `Header` block type.

In other words, these syntax diagrams do *not* represent the more common EBNF grammars, but instead a parsing expression grammar (PEG) [@Ford2004], in which choice is ordered and the first matching alternative wins. PEG was chosen because context-free grammars are unlikely to be able to cover Markdown [@MacFarlane2014]. The PEG grammar covering all syntax diagrams shown here is included as [Appendix @sec-def-peg].

Words are sequences of printable characters, including punctuation. They can be styled, annotated with a hyperlink, or annotated with CSS structure and styling.
Each of these can contain the others, or plain words. See @fig-def-words for their syntax diagrams.

::: {#fig-def-words}
![](def_StyledWords.svg)

![](def_LinkedWords.svg)

![](def_AnnotatedWords.svg)

![](def_PlainWords.svg)

Syntax of `StyledWords`, `LinkedWords`, `AnnotatedWords` and `PlainWords`.
:::

Other content blocks and inline types exist but are omitted from this description, which is limited to the components affected by extending the Markdown language with additional types of annotation. Syntax diagrams for some additional Markdown components are included as [Appendix @sec-def-dia].

## Syntax of extension Semantic Markdown

Semantic Markdown mainly extends the syntax for `AnnotatedWords`, and introduces a new syntax similar to `LinkDefinition`. The syntax diagrams in this subsection have the extended syntax marked with a dotted frame.

*FIXME: add dot-frame for all drawings used here*

`AnnotatedWords` can in principle contain any word, but in practice expects CSS id or class definitions, i.e. alphanumeric-only words prefixed by either a hash (id) or a dot (class). New syntaxes for URI and CURIE words are added with higher priority, designed not to clash with these, as shown in @fig-def-extensions.

*FIXME: mention and draw extended LinkedWordsX as well.*

::: {#fig-def-extensions}
![](def_AnnotatedWordsX.svg)

Syntax of `AnnotatedWords` and `LinkedWords`, extended with `SemWords`.
:::

The new `SemWords` are RDF components (RDF is described further in @sec-rdf): either an angle-bracketed `Uri` or a `Curie`. Each component has an optional prefix denoting whether it is an RDF subject, predicate or object (again, these RDF terms are described further in @sec-rdf). See @fig-def-additions for their syntax diagrams.

*FIXME: mention and draw `Curie` and `NAME`*

::: {#fig-def-additions}
![](def_SemWords.svg)

![](def_SEMPREFIX.svg)

Syntax of `SemWords`, `Curie`, `SEMPREFIX` and `NAME`.
:::

## Expectations of processors

*FIXME: write this!*

* what to include
* what not to include
* if PDF..., then what?

This section constitutes the "requirements".

### Readability

In the source format of Markdown with annotations...

* the syntax for annotation **should** be relatively human comprehensible (in the same spirit as Markdown -- e.g. \*\*strong emphasis\*\*)
* annotation syntax **must not** conflict with core Markdown (i.e. must not cause ambiguities)
* annotation syntax **should not** conflict with other Markdown extensions

For syntactically correct and structurally supported annotations...

* visual output **must** be identical to that of the source without the annotation
* metadata of output **must** contain the annotation

For syntactically incorrect or structurally unsupported annotations...

* the annotation **must not** disappear from visual output
* visual output **should** include the annotation in source form

### XMP, RDFa and RDF {#sec-rdf}

*FIXME: drop unneeded details, and more clearly begin with HTML and PDF already using RDF*

RDF is an abstract data model for knowledge graphs, usable for domain-specific annotations: terminology for a domain is established by referencing a shared ontology, and statements are composed as sets of subject-predicate-object triples. Multiple RDF languages exist, each covering all or a subset of the RDF model, including Turtle, which is optimized for human readability, and languages for embedding triples into other data structures, notably RDFa for HTML and XMP for PDF. Each RDF language has different constraints; e.g. XMP, used for storing RDF in media files, can express one RDF graph in each XMP object [@Adobe2012, p. 9].

*FIXME: describe terms URI, CURIE, subject, predicate and object*
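To make the RDF terms above more concrete, the following Python sketch expands either an angle-bracketed URI or a CURIE into a full URI, combines three expanded terms into one subject-predicate-object triple, and serializes it as N-Triples (a line-based subset of Turtle). It is a minimal illustration under stated assumptions, not the processing model of Semantic Markdown: the prefix map, the `example.org` URIs and the function names `expand`, `triple` and `to_ntriples` are hypothetical, and only URI objects are handled (no literals, no safe CURIEs, no default prefix).

```python
"""Minimal sketch: expand URIs and CURIEs, then build RDF triples.

Hypothetical throughout: the prefix map, the example.org URIs and
the function names are illustrative, not part of Semantic Markdown.
"""

# Hypothetical prefix map from CURIE prefix to namespace URI.
PREFIXES: dict[str, str] = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(term: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Return the full URI for an angle-bracketed URI or a CURIE."""
    if term.startswith("<") and term.endswith(">"):
        # Angle-bracketed URI: strip the brackets, keep the URI as-is.
        return term[1:-1]
    # CURIE: split at the first colon into prefix and reference.
    prefix, colon, reference = term.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"not a URI or known CURIE: {term!r}")
    return prefixes[prefix] + reference


def triple(subject: str, predicate: str, obj: str) -> tuple[str, str, str]:
    """Build one subject-predicate-object triple of expanded URIs."""
    return (expand(subject), expand(predicate), expand(obj))


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical example: a document whose creator is a person resource.
    doc = triple(
        "<http://example.org/doc>",
        "dcterms:creator",
        "<http://example.org/people/jonas>",
    )
    print(to_ntriples([doc]))
```

Run as a script, it prints the single line `<http://example.org/doc> <http://purl.org/dc/terms/creator> <http://example.org/people/jonas> .`, showing that the CURIE `dcterms:creator` and the bracketed URIs end up as the same kind of full URI. Splitting at the first colon only mirrors the `prefix:reference` shape of a CURIE; a fuller implementation would follow the W3C CURIE syntax rules.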