| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 13:11:22 +0200 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-19 13:17:38 +0200 |
| commit | 035842f3ea95cd4f68c5122ece1a5e8252c177a2 (patch) | |
| tree | 6f227a38438b4d5ee47aed3da2e79e20c84a862b | |
| parent | b5efec91e993aa39bbde7e5ca7bf94943a7d4905 (diff) | |
tighten intro
| -rw-r--r-- | _intro.qmd | 85 |
1 files changed, 34 insertions, 51 deletions
@@ -2,56 +2,52 @@
 The markup language Markdown was introduced in 2004
 with the specific aim of helping authors focus on content,
 separate from layout concerns [@Gruber2004].
-Markdown has since been widely adopted,
-with one common use is as manually authored source format
-for generating reading-optimized documents in HTML, PDF and other formats.
-HTML and PDF have evolved in the same time period
-to support semantic text annotations,
-but Markdown can only express structure and hypermedia markup,
-not semantic markup.
-
-Text annotation is the process of applying contextual information to text.
-Annotating text differs from annotating the document as a whole --
+It has since then been widely adopted as authoring source format
+for generating reading-optimized documents in HTML and PDF
+that in the same time period have evolved
+to support semantic text annotations.
+But Markdown can only express structure and hypermedia annotations,
+not semantic annotations.
+Annotating text differs from annotating the document as a whole
 in that the information is tied to specific text strings.
-A document annotation essentially says
-"this document contains hyperlinks to these URLs"
-or "this document contains strong language"
-whereas a text annotation says
-"this particular string is emphasized"
-or "this particular string refers to this specific URL".
+A document annotation can say
+e.g. "this document contains strong language somewhere"
+or "this document contains a link to this URL somwehere",
+whereas a text annotation can say
+e.g. "this text string is strongly emphasized"
+or "this text string links to this URL".

-Semantic text annotation (also called semantic markup)
+Semantic text annotation
 is the process of applying information about meaning
 to specific text strings.

-Semantic document annotation is supported by some Markdown dialects
-by prepending a metadata section to the content markup,
+Some Markdown dialects supports prepending a metadata section
+to the whole content markup,
+i.e. semantic *document* annotation,
 but tying semantics to specific strings is currently unsupported.
-The author cannot annotate "this currency amount is in 1980 dollars"
+The author cannot express "this currency amount is in 1980 dollars"
 or "this uses the derogatory meaning of the term".

-## Problem formulation
-
 This project aims to enable authors
 to include semantic annotations as part of their writing,
 similarly unobtrusive as the widely adopted structural and hypermedia markup,
 by extending a common Markdown processor to handle semantic markup.

-This aim has been framed with the following problem statement:
-**How can Pandoc
-be extended to support context annotations?**
+## Problem formulation

-To aid in achieving that goal,
-the problem statement has been divided into the following subquestions:
+So,
+**How can a Markdown processor
+be extended to support semantic text annotations?**

 * What are the core qualities of Markdown,
-and how could a Markdown dialect express context annotations
+and how can a Markdown dialect express semantic text annotations
 while maintaining those qualities?
-* How do Quarto convert Markdown source to HTML or PDF output,
+* How do a Markdown processor convert Markdown to HTML or PDF,
 and how can this workflow be extended
-to handle context annotations?
-* Which approach to altering Quarto is more likely long-term sustainable?
-* How could reliability of a Pandoc plugin for Quarto be evaluated?
+to handle semantic text annotations?
+* Which approach to altering a Markdown processor
+is more likely long-term sustainable?
+* How can the reliability of an altered Markdown processor be evaluated?

 ## Levels of implementation

@@ -63,8 +59,10 @@ Consequently,
 the simplest way to correctly process markdown with semantic text annotations
 is to omit the annotations altogether,
 rendering as if they had not been applied at all.
+
 More advanced processing may optionally embed semantic text annotations
 as semantic document annotations, i.e. embed as document-wide metadata.
+
 Even more advanced processing might enhance the content rendering,
 similar to how some PDF rendering provide interactive navigation
 for embedded hypermedia markup like links and anchors.
@@ -76,29 +74,14 @@ due to the limited time of the project.

 A notable challenge is aligning with existing practice and systems.
 Markdown is known for its unobtrusive plaintext editing format,
-and an extension to its vocabulary will need to fit that principle.
+and an extension to its syntax will need to fit that principle.
 Also,
-FIXME
-
-## Motivation
-
-*FIXME: rewrite to be project motivations (not author motivations)*
-
-The author of this project prefers
-a console-based WYSIWYM authoring environment,
-and for political reasons an environment
-consisting solely of freely licensed code.
-More specifically,
-the authoring system chosen of choice is Quarto,
-which allows either a WYSIWYM environment (RStudio among others)
-or a more technical console-based environment (any modern text editor),
-as it uses the plaintext markup format Markdown as source format,
-which is automatically processed by the tool Pandoc
-into any of a range of output formats
-including LaTeX with further postprocessing into PDF.
+*FIXME: drop or rewrite...*

 ## Implementation idea and brief plan

+*FIXME: drop or rewrite below section*
+
 Implement plugins for the Pandoc document converter
 to enable authoring of ontological annotations in the text content,
 inspired by the conceptual idea in @Francart2020,
