| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 15:00:20 +0200 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 15:00:20 +0200 |
| commit | a75d2ccb355e3fb489586ccaf82d780ddaeab0a7 (patch) | |
| tree | e8bddd54403a397fcd04a0abcdb510ac158806de /_background.qmd | |
| parent | 651fcfe553442965efbaec17891c4d205e0607a5 (diff) | |
rename chapter background -> pandoc; expand title for section on usage
Diffstat (limited to '_background.qmd')
| -rw-r--r-- | _background.qmd | 214 |
1 files changed, 0 insertions, 214 deletions
diff --git a/_background.qmd b/_background.qmd
deleted file mode 100644
index 0116286..0000000
--- a/_background.qmd
+++ /dev/null
@@ -1,214 +0,0 @@
-*FIXME: This chapter is unfinished --
-currently contains 3-4 large chunks (separated by horizontal lines)
-that need to be merged or maybe some parts dropped altogether...*
-
-This chapter will provide an analysis
-of the data format Markdown
-and the Markdown-based publishing system Quarto.
-
-This project mainly involves navigating in and altering data structures.
-Main data structures are the document formats Markdown, HTML and PDF,
-and the abstract data language RDF,
-serialised as RDFa (embedded in HTML) and XMP (embedded in PDF).
-
-## Markdown
-
-### Structural and layout annotation, and metadata
-
-Original Markdown provides unobtrusive markup
-for content and hypermedia structure,
-to ease the authoring of style-agnostic hypermedia content.
-Later dialects extend the language
-to cover more content and hypermedia structure,
-style annotation
-and text-wide metadata.
-
-The separation of visual concerns from content and structure
-is harnessed by the document converter Pandoc
-and the Pandoc-based document authoring framework Quarto:
-Pandoc with Quarto plugins and templates
-allows annotating a string as a hyperlink or a citation,
-declaring authorship, ownership and release date,
-and rendering as a scholarly paper
-conforming to a prescribed style guide and document format.
-
-### Semantic annotation is missing
-
-None of the existing Markdown dialects,
-however,
-covers annotation of content semantics.
-You cannot -- using existing Markdown dialects --
-annotate a string as contextually related to some content domain,
-in a way that Markdown processors will treat it as such:
-When rendering an output document
-the annotation is omitted from the text
-and optionally accessible as part of document metadata.
-
-Example annotations might include
-some numbers in metres and others in nautical miles,
-or one citation being supportive and another a rebuttal,
-or one quote using "she" as personal pronoun
-and another using it derogatorily.
-
-Such meta information tied not to the document as a whole
-but to specific strings in the text
-cannot be written as such --
-i.e. structurally part of the writing
-but communicatively meta to the prose content of the text.
-
----
-Markdown is "probably the most popular markup language today"
-[@Rapp2023, p. 42].
-It was originally defined by @Gruber2004
-as a superset of HTML,
-improving readability and ease of writing
-by adding email-style markup
-for common content structure like headers, emphasis, lists and hyperlinks.
-
-A core principle of Markdown is readability:
-
-> A Markdown-formatted document should be publishable as-is,
-> as plain text,
-> without looking like it’s been marked up
-> with tags or formatting instructions.
-> [@Gruber2004, section "Philosophy"].
-
-Many dialects of Markdown have evolved,
-some tightening the language for parsing efficiency and disambiguation,
-some extending to cover additional structures
-and some including support for a YAML or TOML metadata header section.
-
-Markdown as originally designed is a source format to produce HTML.
-If using only Markdown-defined markup, avoiding HTML tags,
-the text is however reliably translatable also to other formats.
-Pandoc is a tool that can convert texts in Markdown dialects
-into many document formats including HTML and (via LaTeX) PDF,
-applying visual style and positioning through templates.
-Such document workflows,
-including minimal structural markup as part of the creative writing,
-applying visual layout as an automated templating process
-tied to a target document format,
-have been further streamlined for academic texts
-in the Quarto document publishing system.
-
-----
-Markdown is a text markup language
-with an emphasis on being easy for humans to read
-[@Gruber2004].
-
-Compared to word processors like Microsoft Word and LibreOffice Writer,
-Markdown authoring stores both content and markup together
-in a human-readable text file.
-
-::: {#fig-formality}
-
-```
-informal   /---------formatted text----------\             formal
-<------v-------------v-------------v-----------------------v---->
-  plain text informal markup formal markup          binary format
-                (Markdown) (HTML, XML, etc.)
-```
-
-Markdown is informal, ASCII-based markup
-[@Leonard2016, p. 4]
-
-:::
-
-HTML is itself a plaintext format,
-but is less human-readable.
-Similarly the format LaTeX is also plaintext,
-but its markup arguably distracts from the reading process
-[@Mailund2019chap2, p. 9].
-
-### Alternatives
-
-Other human-readable document source formats exist.
-
-*TODO: briefly cover reStructuredText, Org-mode and AsciiDoc.*
-
-### Integration
-
-Markdown is in widespread use.
-
-Major source forges use Markdown by default for `README` files
-[@Github2025; @GitLab2025; @Codeberg2024].
-Some major programming languages
-natively support Markdown in embedded docstrings
-in core tools
-[@Microsoft2023; @Oracle2025; @RustTeam2024];
-others offer optional support e.g. through plugins
-[@Heesch2025; @Sphinx2025; @JSDoc2023].
-
-## Pandoc and Quarto
-
-The Markdown processor Pandoc can transform Markdown not only to HTML
-but also to other output formats like PDF.
-Pandoc offers an API for adapting its content processing
-as well as a templating structure for customizing layout,
-which is streamlined in the document authoring framework Quarto:
-Pandoc with a set of plugins and templates
-enables rendering of scholarly papers
-conforming to prescribed style guides and document formats.
-
----
-Pandoc is a document converter built around the Markdown markup language,
-able to parse from and serialise to many Markdown dialects
-as well as equivalent subsets of other text markup languages
-including HTML, LaTeX (and by extension PDF),
-Office Open XML (as used by recent releases of Microsoft Word)
-and OpenDocument (as used by OpenOffice and LibreOffice).
-Pandoc is extensible,
-supporting custom code loaded at runtime
-either for custom parsing or serialising,
-or for manipulating the intermediate internal content structure
-called Abstract Syntax Tree (AST).
-
-Pandoc supports redefining input and output formats
-and manipulating the internal document structure
-as part of the automated parts of the framework.
-
-Pandoc is extendable.
-Source and output format can be changed or completely redefined
-and the internal document structure manipulated,
-in the automated parts of the framework.
-
-Collection of interrelated POSIX scripts and Pandoc extensions
-for enabling semantic annotations in Markdown-based authoring workflows.
-
-* filter extension to capture annotations
-  * identify semantic metadata in stylistic metadata part of Pandoc YAML header
-  * identify semantic metadata in content part of Pandoc document structure
-  * append semantic metadata to Pandoc YAML document header
-  * strip identified metadata from stylistic metadata and content
-* output format extension to generate PDF
-  * read semantic metadata from Pandoc YAML document header
-  * structure semantic metadata as RDF triples
-  * append RDF triples serialised as part of XMP metadata in PDF
-* output format extension to generate web page
-  * read semantic metadata from Pandoc YAML document header
-  * structure semantic metadata as RDF triples
-  * append RDF triples serialised as RDFa
-
-Markdown provides intuitive and unobtrusive markup syntax
-for structure like headers, emphasis, lists and hyperlinks.
-Pandoc extends Markdown with syntax
-for citation annotation
-and an optional YAML metadata header.
-Quarto extends Markdown further with syntax
-for some styling and some convenience macros,
-and applies templates for a uniform visual styling
-across target document formats.
-
-### Interfaces
-
-* Pandoc document object model (DOM)
-* Resource Description Framework (RDF)
-  * XMP
-  * RDFa
-* Markdown
-  * Semantic Markdown
-* CommonMark
-  * Semantic CommonMark
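
The "filter extension to capture annotations" outlined in the removed chapter (identify semantic metadata in the content, append it to the document header, strip it from the content) could look roughly like the following. This is a minimal sketch, not the project's implementation. It assumes annotations arrive as Pandoc `Span` attributes, and the attribute names `property` and `unit` and the metadata key `semantic` are hypothetical, chosen only for illustration. The node shapes follow Pandoc's JSON representation of its AST.

```python
# Hypothetical attribute names that mark a span as semantically annotated.
SEMANTIC_KEYS = {"property", "unit"}


def plain_text(inlines):
    """Concatenate the text of Str, Space and nested Span nodes.

    Other inline types are ignored in this sketch.
    """
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
        elif node["t"] == "Span":
            parts.append(plain_text(node["c"][1]))
    return "".join(parts)


def capture(inlines, found):
    """Return inlines with semantic attributes stripped; record them in found."""
    out = []
    for node in inlines:
        if node["t"] != "Span":
            out.append(node)
            continue
        (ident, classes, pairs), content = node["c"]
        content = capture(content, found)
        semantic = {k: v for k, v in pairs if k in SEMANTIC_KEYS}
        rest = [[k, v] for k, v in pairs if k not in SEMANTIC_KEYS]
        if semantic:
            found.append({"text": plain_text(content), **semantic})
        if ident or classes or rest or not semantic:
            # Keep the span when it still carries non-semantic attributes.
            out.append({"t": "Span", "c": [[ident, classes, rest], content]})
        else:
            # Nothing but semantic attributes: unwrap the span entirely.
            out.extend(content)
    return out


def apply_filter(doc):
    """Move span-level annotations into document metadata (key is hypothetical)."""
    found = []
    for block in doc["blocks"]:
        if block["t"] in ("Para", "Plain"):
            block["c"] = capture(block["c"], found)
    if found:
        doc["meta"]["semantic"] = {
            "t": "MetaList",
            "c": [
                {
                    "t": "MetaMap",
                    "c": {k: {"t": "MetaString", "c": v} for k, v in item.items()},
                }
                for item in found
            ],
        }
    return doc
```

To try this as a JSON filter, one would read the document with `json.load(sys.stdin)`, pass it through `apply_filter` and write it back with `json.dump(..., sys.stdout)`, then run the script via `pandoc --filter`. Only top-level paragraphs are visited here; a real filter would also need to walk nested blocks such as lists and block quotes.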
