| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 20:31:16 +0200 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 20:37:46 +0200 |
| commit | 60d9e99102abcc21a9356a5a44c5de705ad67284 | |
| tree | da0b17992d50fec1676d1dfb508e85c55a0bcbe1 | |
| parent | a75d2ccb355e3fb489586ccaf82d780ddaeab0a7 | |
add initial citation; reduce section on pandoc; tighten references
| -rw-r--r-- | _intro.qmd | 6 |
| -rw-r--r-- | _pandoc.qmd | 146 |
| -rw-r--r-- | ref.bib | 79 |
3 files changed, 55 insertions, 176 deletions
@@ -1,3 +1,9 @@ +> A Markdown-formatted document should be publishable as-is, +> as plain text, +> without looking like it’s been marked up +> with tags or formatting instructions. +> [@Gruber2004syntax, section "Philosophy"] + The markup language Markdown was introduced in 2004 with the specific aim of helping authors focus on content, separate from layout concerns diff --git a/_pandoc.qmd b/_pandoc.qmd index 0116286..6c78b30 100644 --- a/_pandoc.qmd +++ b/_pandoc.qmd @@ -2,86 +2,14 @@ currently contains 3-4 large chunks (separated by horisontal lines) that need to be merged or maybe some parts dropped altogether...* -This chapter will provide and analysis -of the data format Markdown -and the Markdown-based publishing system Quarto. - -This project mainly involves navigating in and altering data structures. -Main data structures are the document formats Markdown, HTML and PDF, -and the abstract data language RDF, -serialised as RDFa (embedded in HTML) and PDF (embedded in PDF). - -## Markdown - -### Structural and layout annotation, and metadata - -Original Markdown provides unobtrusive markup -for content and hypermedia structure, -to ease the authoring of style-agnostic hypermedia content. -Later dialects extends the language -to cover more content and hypermedia structure, -style annotation -and text-wide metadata. - -The separation of visual concerns from content and structure -is harnessed by the document converter Pandoc -and the Pandoc-based document authoring framework Quarto: -Pandoc with Quarto plugins and templates -allows annotating a string as a hyperlink or a citation, -declaring authorship, ownership and release date, -and rendering as a scholarly paper -conforming to a prescribed style guide and document format. - -### Semantic annotation is missing - -None of the existing Markdown dialects, -however, -covers annotation of content semantics. 
-You cannot -- using existing Markdown dialects -- -annotate a string as contextually related to some content domain, -in a way that Markdown processors will treat it as such: -When rendering an output document -the annotation is omitted from the text -and optionally accessible as part of document metadata. - -Example annotations might include -some numbers in meter and others in nautical miles, -or one citation being supportive and another a rebuttal, -or one quote using "she" as personal pronoun -and another using it derogatory. - -Such meta information tied not to the document as a whole -but to specific strings in the text -cannot be written as such -- -i.e. structurally part of the writing -but communicatively meta to the prose content of the text. - ---- - -Markdown is "probably the most popular markup language today" -[@Rapp2023, p. 42]. -It was originally defined by @Gruber2004 -as a superset of HTML, -improving readability and ease of writing -by adding email-style markup -for common content structure like headers, emphasis, lists and hyperlinks. - -A core principle of Markdown is readability: - -> A Markdown-formatted document should be publishable as-is, -> as plain text, -> without looking like it’s been marked up -> with tags or formatting instructions.. -> [@Gruber2004, section "Philosophy"]. +This chapter will provide an analysis +of the Markdown processor Pandoc. Many dialects of Markdown have evolved, some tightening the language for parsing efficiency and disambiguation, some extending to cover additional structures and some including support for a YAML or TOML metadata header section. -Markdown as originally designed is a source format to produce HTML. -If using only Markdown-defined markup, avoiding HTML tags, -the text is however reliably translatable also to other formats. 
Pandoc is a tool that can convert texts in Markdown dialects into many document formats including HTML and (via LaTeX) PDF, applying visual style and positioning throught templates. @@ -94,55 +22,6 @@ in the Quarto document publishing system. ---- -Markdown is a text markup language -with an emphasis on being easy for humans to read -[@Gruber2004]. - -Compared to word processors like Microsoft Word and LibreOffice Writer, -Markdown authoring stores both content and markup together -in a human-readable tekst file. - -::: {#fig-formality} - -``` -informal /---------formatted text----------\ formal -<------v-------------v-------------v-----------------------v----> - plain text informal markup formal markup binary format - (Markdown) (HTML, XML, etc.) -``` - -Markdown is informal, ASCII-based markup -[@Leonard2016, p. 4] - -::: - -HTML is itself a plaintext format, -but is less human-readable. -Similarly the format LaTeX is also plaintext, -but its markdown arguably distracts the reading process -[@Mailund2019chap2, p. 9]. - -### Alternatives - -Other human-readable document source formats exists. - -*TODO: briefly cover reStructuredText, Org-mode and AsciiDoc.* - -### Integration - -Markdown is in widespread use. - -Major source forges use Markdown by default for `README` files -[@Github2025; @GitLab2025; @Codeberg2024]. -Some major programming languages -natively support Markdown in embedded docstrings -in core tools -[@Microsoft2023; @Oracle2025; @RustTeam2024]; -others offer optional support e.g. through plugins -[@Heesch2025; @Sphinx2025; @JSDoc2023]. - -## Pandoc and Quarto - The Markdown processor Pandoc can transform Markdown not only to HTML but also to other output formats like PDF. Pandoc offers an API for adapting its content processing @@ -191,24 +70,3 @@ for enabling semantic annotations in Markdown-based authoring workflows. 
* read semantic metadata from Pandoc YAML document header * structure semantic metadata as RDF triples * append RDF triples serialized as RDFa - -Markdown provides intuitive and unobtrusive markup syntax -for structure like headers, emphasis, lists and hyperlinks. -Pandoc extends Markdown with syntax -for citation annotation -and an optional YAML metadata header. -Quarto extends Markdown further with syntax -for some styling and some convenience macros, -and applies templates for a uniform visual styling -across target document formats. - -### Interfaces - -* Pandoc document object model (DOM) -* Resource Description Framework (RDF) - * XMP - * RDFa -* Markdown - * Semantic Markdown -* CommonMark - * Semantic CommonMark @@ -58,12 +58,23 @@ institution = {Internet Engineering Task Force}, } -@Electronic{Gruber2004, - author = {John Gruber}, - date = {2004-12-17}, - title = {Markdown}, - url = {https://daringfireball.net/projects/markdown/}, - urldate = {2025-02-18}, +@Online{Gruber2004, + author = {John Gruber}, + date = {2004-12-17}, + title = {Markdown}, + url = {https://daringfireball.net/projects/markdown/}, + organization = {{Daring Fireball}}, + urldate = {2025-02-18}, +} + +@Online{Gruber2004syntax, + author = {John Gruber}, + date = {2004}, + title = {Markdown}, + url = {https://daringfireball.net/projects/markdown/syntax}, + organization = {{Daring Fireball}}, + subtitle = {Syntax}, + urldate = {2025-05-19}, } @Book{Leonard2016, @@ -74,7 +85,7 @@ institution = {Internet Engineering Task Force}, } -@Electronic{Heesch2025, +@Online{Heesch2025, author = {Dimitri van Heesch}, date = {2025-01-09}, title = {Doxygen}, @@ -83,15 +94,16 @@ urldate = {2025-02-18}, } -@Electronic{Github2025, +@Online{Github2025, date = {2025-02-18}, editor = {{Github, Inc.}}, + title = {About {READMEs}}, url = {https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes}, organization = {GitHub, Inc.}, urldate = 
{2025-02-18}, } -@Electronic{GitLab2025, +@Online{GitLab2025, author = {{GitLab Inc.}}, date = {2025-02-17}, title = {GitLab Flavored Markdown (GLFM)}, @@ -100,7 +112,7 @@ urldate = {2025-02-18}, } -@Electronic{Codeberg2024, +@Online{Codeberg2024, author = {{Codeberg Docs Contributors}}, date = {2024-11-29}, title = {Your First Repository}, @@ -109,7 +121,7 @@ urldate = {2025-02-18}, } -@Electronic{Oracle2025, +@Online{Oracle2025, author = {{Oracle}}, date = {2025-01-25}, title = {JavaDoc Guide}, @@ -118,7 +130,7 @@ urldate = {2025-02-18}, } -@Electronic{RustTeam2024, +@Online{RustTeam2024, author = {{the Rust Team}}, date = {2024-04-04}, title = {The rustdoc book}, @@ -127,7 +139,7 @@ urldate = {2025-02-18}, } -@Electronic{Sphinx2025, +@Online{Sphinx2025, author = {{the Sphinx developers}}, date = {2025-01-29}, title = {Sphinx documentation}, @@ -136,7 +148,7 @@ urldate = {2025-02-18}, } -@Electronic{JSDoc2023, +@Online{JSDoc2023, author = {{the contributors to JSDoc}}, date = {2023-10-31}, title = {Use JSDoc}, @@ -145,9 +157,9 @@ urldate = {2025-02-18}, } -@Electronic{Microsoft2023, +@Online{Microsoft2023, date = {2023-07-12}, - editor = {Microsoft}, + editor = {{Microsoft}}, title = {docfx}, url = {https://dotnet.github.io/docfx/docs/basic-concepts.html}, subtitle = {Basic Concepts}, @@ -182,25 +194,28 @@ journaltitle = {Lecture Notes in Computer Science}, } -@Misc{Herman2015, - date = {2015-03-17}, - editor = {Ivan Herman and Ben Adida and Manu Sporny and Mark Birbeck}, - title = {RDFa 1.1 Primer - Third Edition}, - language = {English}, - subtitle = {Rich Structured Data Markup for Web Documents}, - url = {https://www.w3.org/TR/rdfa-primer/}, - urldate = {2025}, +@TechReport{Herman2015, + date = {2015-03-17}, + institution = {{W3C}}, + title = {RDFa 1.1 Primer - Third Edition}, + language = {English}, + subtitle = {Rich Structured Data Markup for Web Documents}, + url = {https://www.w3.org/TR/rdfa-primer/}, + urldate = {2025}, + version = {3}, + editor = {Ivan 
Herman and Ben Adida and Manu Sporny and Mark Birbeck}, } -@Article{Francart2020, - author = {Thomas Francart}, - date = {2020-02-20}, - title = {Semantic Markdown Specifications}, - editor = {sparna}, - url = {https://blog.sparna.fr/2020/02/20/semantic-markdown/}, +@Online{Francart2020, + author = {Thomas Francart}, + date = {2020-02-20}, + title = {Semantic Markdown Specifications}, + url = {https://blog.sparna.fr/2020/02/20/semantic-markdown/}, + organization = {{Sparna}}, + urldate = {2025-05-19}, } -@Misc{Smedegaard2022, +@Online{Smedegaard2022, author = {Jonas Smedegaard and Thomas Francart}, date = {2022-04-09}, editor = {Jonas Smedegaard}, @@ -208,7 +223,7 @@ url = {https://source.jones.dk/semantic-markdown/about/}, } -@Misc{Daquino2023, +@Article{Daquino2023, author = {Daquino, Marilena and Massari, Arcangelo and Peroni, Silvio and Shotton, David}, date = {2023}, title = {The OpenCitations Data Model}, |
