This chapter provides an analysis
of the data format Markdown
and the Markdown-based publishing system Quarto.

This project mainly involves navigating and altering data structures.
The main data structures are the document formats Markdown, HTML and PDF,
and the abstract data language RDF,
serialised as RDFa (embedded in HTML) and XMP (embedded in PDF).

## Markdown

Markdown is a text markup language
with an emphasis on being easy for humans to read
[@Gruber2004].

Unlike word processors such as Microsoft Word and LibreOffice Writer,
Markdown authoring stores content and markup together
in a human-readable text file.

::: {#fig-formality}

```
informal        /---------formatted text----------\        formal
<------v-------------v-------------v-----------------------v---->
 plain text     informal markup   formal markup    binary format
                (Markdown)        (HTML, XML, etc.)
```

Markdown is informal, ASCII-based markup
[@Leonard2016, p. 4]

:::

HTML is itself a plaintext format,
but is less human-readable.
Similarly, LaTeX is also a plaintext format,
but its markup arguably distracts from reading
[@Mailund2019chap2, p. 9].
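
As a rough illustration of this difference,
the following Python sketch writes the same sentence in Markdown, HTML and LaTeX
and counts the characters that are markup rather than content.
The sample sentence, the `SAMPLES` mapping and the `markup_overhead` helper
are assumptions made for this illustration,
and a character count is only a crude proxy for readability.

```python
# Illustrative only: the sentence and the helper below are hypothetical,
# and counting characters is a crude proxy for readability.
PLAIN = "Markdown is easy to read."

# The same sentence with one emphasised word, in three plaintext formats.
SAMPLES = {
    "Markdown": "Markdown is *easy* to read.",
    "HTML": "<p>Markdown is <em>easy</em> to read.</p>",
    "LaTeX": r"Markdown is \emph{easy} to read.",
}


def markup_overhead(source: str, plain: str = PLAIN) -> int:
    """Return how many characters of source are markup rather than content."""
    return len(source) - len(plain)


OVERHEAD = {name: markup_overhead(source) for name, source in SAMPLES.items()}

for name, count in OVERHEAD.items():
    print(f"{name}: {count} markup characters")
```

For this one sentence the count is 2 characters for Markdown,
7 for LaTeX and 16 for HTML.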

### Alternatives

Other human-readable document source formats exist.

*TODO: briefly cover reStructuredText, Org-mode and AsciiDoc.*

### Integration

Markdown is in widespread use.

Major source forges use Markdown by default for `README` files
[@Github2025; @GitLab2025; @Codeberg2024].
Some major programming languages
natively support Markdown in embedded docstrings
in core tools
[@Microsoft2023; @Oracle2025; @RustTeam2024];
others offer optional support, for example through plugins
[@Heesch2025; @Sphinx2025; @JSDoc2023].

## Quarto

This project is a collection of interrelated POSIX scripts and Pandoc extensions
for enabling semantic annotations in Markdown-based authoring workflows.

* filter extension to capture annotations
  * identify semantic metadata in stylistic metadata part of Pandoc YAML header
  * identify semantic metadata in content part of Pandoc document structure
  * append semantic metadata to Pandoc YAML document header
  * strip identified metadata from stylistic metadata and content
* output format extension to generate PDF
  * read semantic metadata from Pandoc YAML document header
  * structure semantic metadata as RDF triples
  * append RDF triples serialised as part of XMP metadata in PDF
* output format extension to generate web page
  * read semantic metadata from Pandoc YAML document header
  * structure semantic metadata as RDF triples
  * append RDF triples serialised as RDFa
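
The web page steps listed above
(read metadata, structure it as RDF triples, serialise as RDFa)
can be sketched in Python as follows.
This is a minimal sketch for illustration, not the project's implementation:
the function names, the sample metadata, the `https://example.org/thesis` subject
and the choice of Dublin Core terms as vocabulary are all assumptions.

```python
from html import escape

# Assumption: Dublin Core terms serve as the vocabulary in this sketch;
# the text above does not name a vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def metadata_to_triples(subject, metadata):
    """Structure a flat metadata mapping as (subject, predicate, object) triples."""
    return [(subject, DCTERMS + key, value) for key, value in metadata.items()]


def triples_to_rdfa(triples):
    """Serialise triples with literal objects as RDFa <meta> elements."""
    template = '<meta about="{}" property="{}" content="{}">'
    return "\n".join(
        template.format(escape(s), escape(p), escape(o)) for s, p, o in triples
    )


# Hypothetical semantic metadata, as it might be read from a document header.
metadata = {"title": "Markdown & Quarto", "creator": "Jane Doe"}
triples = metadata_to_triples("https://example.org/thesis", metadata)
print(triples_to_rdfa(triples))
```

Keeping the triples as an intermediate structure means
the same list could feed both the RDFa and the XMP serialisation,
which matches the shared "structure semantic metadata as RDF triples" step
in both output format extensions.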

### Interfaces

* Pandoc document object model (DOM)
* Resource Description Framework (RDF)
  * XMP
  * RDFa
* Markdown
  * Semantic Markdown
* CommonMark
  * Semantic CommonMark