author Jonas Smedegaard <dr@jones.dk> 2025-05-19 15:00:20 +0200
committer Jonas Smedegaard <dr@jones.dk> 2025-05-19 15:00:20 +0200
commit a75d2ccb355e3fb489586ccaf82d780ddaeab0a7 (patch)
tree e8bddd54403a397fcd04a0abcdb510ac158806de /_background.qmd
parent 651fcfe553442965efbaec17891c4d205e0607a5 (diff)
rename chapter background -> pandoc; expand title for section on usage
Diffstat (limited to '_background.qmd')
-rw-r--r-- _background.qmd 214
1 files changed, 0 insertions, 214 deletions
diff --git a/_background.qmd b/_background.qmd
deleted file mode 100644
index 0116286..0000000
--- a/_background.qmd
+++ /dev/null
@@ -1,214 +0,0 @@
-*FIXME: This chapter is unfinished --
-currently contains 3-4 large chunks (separated by horizontal lines)
-that need to be merged or maybe some parts dropped altogether...*
-
-This chapter will provide an analysis
-of the data format Markdown
-and the Markdown-based publishing system Quarto.
-
-This project mainly involves navigating in and altering data structures.
-Main data structures are the document formats Markdown, HTML and PDF,
-and the abstract data language RDF,
-serialised as RDFa (embedded in HTML) and XMP (embedded in PDF).
-
-## Markdown
-
-### Structural and layout annotation, and metadata
-
-Original Markdown provides unobtrusive markup
-for content and hypermedia structure,
-to ease the authoring of style-agnostic hypermedia content.
-Later dialects extend the language
-to cover more content and hypermedia structure,
-style annotation
-and text-wide metadata.
-
-The separation of visual concerns from content and structure
-is harnessed by the document converter Pandoc
-and the Pandoc-based document authoring framework Quarto:
-Pandoc with Quarto plugins and templates
-allows annotating a string as a hyperlink or a citation,
-declaring authorship, ownership and release date,
-and rendering as a scholarly paper
-conforming to a prescribed style guide and document format.
-
-### Semantic annotation is missing
-
-None of the existing Markdown dialects,
-however,
-covers annotation of content semantics.
-You cannot -- using existing Markdown dialects --
-annotate a string as contextually related to some content domain,
-in a way that Markdown processors will treat it as such:
-When rendering an output document
-the annotation is omitted from the text
-and optionally accessible as part of document metadata.
-
-Example annotations might include
-some numbers in meter and others in nautical miles,
-or one citation being supportive and another a rebuttal,
-or one quote using "she" as a personal pronoun
-and another using it derogatorily.
-
-Such meta information tied not to the document as a whole
-but to specific strings in the text
-cannot be written as such --
-i.e. structurally part of the writing
-but communicatively meta to the prose content of the text.
-
----
-
-Markdown is "probably the most popular markup language today"
-[@Rapp2023, p. 42].
-It was originally defined by @Gruber2004
-as a superset of HTML,
-improving readability and ease of writing
-by adding email-style markup
-for common content structure like headers, emphasis, lists and hyperlinks.
-
-A core principle of Markdown is readability:
-
-> A Markdown-formatted document should be publishable as-is,
-> as plain text,
-> without looking like it’s been marked up
-> with tags or formatting instructions.
-> [@Gruber2004, section "Philosophy"].
-
-Many dialects of Markdown have evolved,
-some tightening the language for parsing efficiency and disambiguation,
-some extending to cover additional structures
-and some including support for a YAML or TOML metadata header section.
-
-Markdown as originally designed is a source format to produce HTML.
-If using only Markdown-defined markup, avoiding HTML tags,
-the text is however reliably translatable also to other formats.
-Pandoc is a tool that can convert texts in Markdown dialects
-into many document formats including HTML and (via LaTeX) PDF,
-applying visual style and positioning through templates.
-Such document workflows,
-including minimal structural markup as part of the creative writing,
-applying visual layout as an automated templating process
-tied to a target document format,
-have been further streamlined for academic texts
-in the Quarto document publishing system.
-
-----
-
-Markdown is a text markup language
-with an emphasis on being easy for humans to read
-[@Gruber2004].
-
-Compared to word processors like Microsoft Word and LibreOffice Writer,
-Markdown authoring stores both content and markup together
-in a human-readable text file.
-
-::: {#fig-formality}
-
-```
-informal /---------formatted text----------\ formal
-<------v-------------v-------------v-----------------------v---->
- plain text informal markup formal markup binary format
- (Markdown) (HTML, XML, etc.)
-```
-
-Markdown is informal, ASCII-based markup
-[@Leonard2016, p. 4]
-
-:::
-
-HTML is itself a plaintext format,
-but is less human-readable.
-Similarly the format LaTeX is also plaintext,
-but its markup arguably distracts from the reading process
-[@Mailund2019chap2, p. 9].
-
-### Alternatives
-
-Other human-readable document source formats exist.
-
-*TODO: briefly cover reStructuredText, Org-mode and AsciiDoc.*
-
-### Integration
-
-Markdown is in widespread use.
-
-Major source forges use Markdown by default for `README` files
-[@Github2025; @GitLab2025; @Codeberg2024].
-Some major programming languages
-natively support Markdown in embedded docstrings
-in core tools
-[@Microsoft2023; @Oracle2025; @RustTeam2024];
-others offer optional support e.g. through plugins
-[@Heesch2025; @Sphinx2025; @JSDoc2023].
-
-## Pandoc and Quarto
-
-The Markdown processor Pandoc can transform Markdown not only to HTML
-but also to other output formats like PDF.
-Pandoc offers an API for adapting its content processing
-as well as a templating structure for customizing layout,
-which is streamlined in the document authoring framework Quarto:
-Pandoc with a set of plugins and templates
-enables rendering of scholarly papers
-conforming to prescribed style guides and document formats.
-
----
-
-Pandoc is a document converter built around the Markdown markup language,
-able to parse from and serialise to many Markdown dialects
-as well as equivalent subsets of other text markup languages
-including HTML, LaTeX (and by extension PDF),
-Office Open XML (as used by recent releases of Microsoft Word)
-and OpenDocument (as used by OpenOffice and LibreOffice).
-Pandoc is extensible,
-supporting custom code loaded at runtime
-either for custom parsing or serialising,
-or for manipulating the intermediate internal content structure
-called Abstract Syntax Tree (AST).
-
-Pandoc supports redefining input and output formats
-and manipulating the internal document structure
-as part of the automated parts of the framework.
-
-Pandoc is extendable.
-Source and output format can be changed or completely redefined
-and the internal document structure manipulated,
-in the automated parts of the framework.
-
-The project is a collection of interrelated POSIX scripts and Pandoc extensions
-for enabling semantic annotations in Markdown-based authoring workflows.
-
-* filter extension to capture annotations
- * identify semantic metadata in stylistic metadata part of Pandoc YAML header
- * identify semantic metadata in content part of Pandoc document structure
- * append semantic metadata to Pandoc YAML document header
- * strip identified metadata from stylistic metadata and content
-* output format extension to generate PDF
- * read semantic metadata from Pandoc YAML document header
- * structure semantic metadata as RDF triples
- * append RDF triples serialized as part of XMP metadata in PDF
-* output format extension to generate web page
- * read semantic metadata from Pandoc YAML document header
- * structure semantic metadata as RDF triples
- * append RDF triples serialized as RDFa
-
-Markdown provides intuitive and unobtrusive markup syntax
-for structure like headers, emphasis, lists and hyperlinks.
-Pandoc extends Markdown with syntax
-for citation annotation
-and an optional YAML metadata header.
-Quarto extends Markdown further with syntax
-for some styling and some convenience macros,
-and applies templates for a uniform visual styling
-across target document formats.
-
-### Interfaces
-
-* Pandoc document object model (DOM)
-* Resource Description Framework (RDF)
- * XMP
- * RDFa
-* Markdown
- * Semantic Markdown
-* CommonMark
- * Semantic CommonMark
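
Note on the removed chapter: the "filter extension to capture annotations" steps and the RDFa output step listed in it could be sketched roughly as below. This is only an illustration under stated assumptions, not the project's implementation. It walks a fragment shaped like Pandoc's JSON AST (`Para`, `Str`, `Space`, `Span` nodes), collects annotated spans as triples, strips the annotation from the content, and renders one triple as an RDFa span. The attribute name `property`, the subject `#document`, the predicate `ex:depthInMetres` and the helper names are all hypothetical; the chapter does not define a concrete annotation syntax.

```python
import html

# Hypothetical attribute name marking a semantic annotation on a Span;
# the chapter itself does not define a concrete syntax.
ANNOTATION_KEY = "property"


def plain_text(inlines):
    """Flatten Pandoc inline nodes to plain text (only Str and Space are handled)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def capture(node, subject, triples):
    """Collect annotated Spans as (subject, predicate, object) and strip the annotation."""
    if isinstance(node, list):
        for child in node:
            capture(child, subject, triples)
    elif isinstance(node, dict):
        if node.get("t") == "Span":
            (ident, classes, pairs), inlines = node["c"]
            kept = []
            for key, value in pairs:
                if key == ANNOTATION_KEY:
                    triples.append((subject, value, plain_text(inlines)))
                else:
                    kept.append([key, value])
            # Replace the attribute list so the annotation no longer appears in content.
            node["c"][0] = [ident, classes, kept]
        for child in node.values():
            capture(child, subject, triples)


def to_rdfa(triple):
    """Serialise one triple as an RDFa-annotated HTML span."""
    subject, predicate, obj = triple
    return '<span about="{}" property="{}">{}</span>'.format(
        html.escape(subject, quote=True),
        html.escape(predicate, quote=True),
        html.escape(obj),
    )


# A hand-written fragment in the shape of Pandoc's JSON AST (illustrative data).
blocks = [
    {"t": "Para", "c": [
        {"t": "Str", "c": "Depth:"},
        {"t": "Space"},
        {"t": "Span", "c": [
            ["", [], [["property", "ex:depthInMetres"]]],
            [{"t": "Str", "c": "120"}],
        ]},
    ]},
]

triples = []
capture(blocks, "#document", triples)
print(triples)
print(to_rdfa(triples[0]))
```

In a real workflow the JSON would come from `pandoc -t json` and be passed back to Pandoc after filtering; whether the project does this in Python, Lua or another language is not stated in the chapter.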