author Jonas Smedegaard <dr@jones.dk> 2025-05-19 13:11:22 +0200
committer Jonas Smedegaard <dr@jones.dk> 2025-05-19 13:17:38 +0200
commit 035842f3ea95cd4f68c5122ece1a5e8252c177a2
tree 6f227a38438b4d5ee47aed3da2e79e20c84a862b
parent b5efec91e993aa39bbde7e5ca7bf94943a7d4905
tighten intro
-rw-r--r-- _intro.qmd 85
1 files changed, 34 insertions, 51 deletions
diff --git a/_intro.qmd b/_intro.qmd
index cb1ebc9..d5c1d38 100644
--- a/_intro.qmd
+++ b/_intro.qmd
@@ -2,56 +2,52 @@ The markup language Markdown was introduced in 2004
with the specific aim of helping authors focus on content,
separate from layout concerns
[@Gruber2004].
-Markdown has since been widely adopted,
-with one common use is as manually authored source format
-for generating reading-optimized documents in HTML, PDF and other formats.
-HTML and PDF have evolved in the same time period
-to support semantic text annotations,
-but Markdown can only express structure and hypermedia markup,
-not semantic markup.
-
-Text annotation is the process of applying contextual information to text.
-Annotating text differs from annotating the document as a whole --
+It has since been widely adopted as an authoring source format
+for generating reading-optimized documents in HTML and PDF,
+formats that have in the same time period evolved
+to support semantic text annotations.
+But Markdown can only express structure and hypermedia annotations,
+not semantic annotations.
+Annotating text differs from annotating the document as a whole
in that the information is tied to specific text strings.
-A document annotation essentially says
-"this document contains hyperlinks to these URLs"
-or "this document contains strong language"
-whereas a text annotation says
-"this particular string is emphasized"
-or "this particular string refers to this specific URL".
+A document annotation can say
+e.g. "this document contains strong language somewhere"
+or "this document contains a link to this URL somewhere",
+whereas a text annotation can say
+e.g. "this text string is strongly emphasized"
+or "this text string links to this URL".
-Semantic text annotation (also called semantic markup)
+Semantic text annotation
is the process of applying information about meaning
to specific text strings.
-Semantic document annotation is supported by some Markdown dialects
-by prepending a metadata section to the content markup,
+Some Markdown dialects support prepending a metadata section
+to the whole content markup,
+i.e. semantic *document* annotation,
but tying semantics to specific strings is currently unsupported.
-The author cannot annotate "this currency amount is in 1980 dollars"
+The author cannot express "this currency amount is in 1980 dollars"
or "this uses the derogatory meaning of the term".
-## Problem formulation
-
This project aims to enable authors to include semantic annotations
as part of their writing,
similarly unobtrusive as the widely adopted structural and hypermedia markup,
by extending a common Markdown processor
to handle semantic markup.
-This aim has been framed with the following problem statement:
-**How can Pandoc
-be extended to support context annotations?**
+## Problem formulation
-To aid in achieving that goal,
-the problem statement has been divided into the following subquestions:
+The problem statement is thus:
+**How can a Markdown processor
+be extended to support semantic text annotations?**
* What are the core qualities of Markdown,
- and how could a Markdown dialect express context annotations
+ and how can a Markdown dialect express semantic text annotations
while maintaining those qualities?
-* How do Quarto convert Markdown source to HTML or PDF output,
+* How does a Markdown processor convert Markdown to HTML or PDF,
and how can this workflow be extended
- to handle context annotations?
-* Which approach to altering Quarto is more likely long-term sustainable?
-* How could reliability of a Pandoc plugin for Quarto be evaluated?
+ to handle semantic text annotations?
+* Which approach to altering a Markdown processor
+ is more likely long-term sustainable?
+* How can the reliability of an altered Markdown processor be evaluated?
## Levels of implementation
@@ -63,8 +59,10 @@ Consequently, the simplest way
to correctly process markdown with semantic text annotations
is to omit the annotations altogether,
rendering as if they had not been applied at all.
+
More advanced processing may optionally embed semantic text annotations
as semantic document annotations, i.e. embed as document-wide metadata.
+
Even more advanced processing might enhance the content rendering,
similar to how some PDF rendering provide interactive navigation
for embedded hypermedia markup like links and anchors.
@@ -76,29 +74,14 @@ due to the limited time of the project.
A notable challenge is aligning with existing practice and systems.
Markdown is known for its unobtrusive plaintext editing format,
-and an extension to its vocabulary will need to fit that principle.
+and an extension to its syntax will need to fit that principle.
Also,
-FIXME
-
-## Motivation
-
-*FIXME: rewrite to be project motivations (not author motivations)*
-
-The author of this project prefers
-a console-based WYSIWYM authoring environment,
-and for political reasons an environment
-consisting solely of freely licensed code.
-More specifically,
-the authoring system chosen of choice is Quarto,
-which allows either a WYSIWYM environment (RStudio among others)
-or a more technical console-based environment (any modern text editor),
-as it uses the plaintext markup format Markdown as source format,
-which is automatically processed by the tool Pandoc
-into any of a range of output formats
-including LaTeX with further postprocessing into PDF.
+*FIXME: drop or rewrite...*
## Implementation idea and brief plan
+*FIXME: drop or rewrite below section*
+
Implement plugins for the Pandoc document converter
to enable authoring of ontological annotations in the text content,
inspired by the conceptual idea in @Francart2020,