author Jonas Smedegaard <dr@jones.dk> 2025-03-26 07:16:11 +0100
committer Jonas Smedegaard <dr@jones.dk> 2025-03-26 07:16:21 +0100
commit b4545e4fa7a2d241f1d9ebbfe8a7de403e379c14
tree f38319e9cb3c5377f0af0e9dbbf5e6a378dac60e
parent c151b74eb41084f1872fafb93f282c82793900c1
reduce intro; move/add chunks to background and design; update references
-rw-r--r-- _background.qmd 67
-rw-r--r-- _design.qmd 19
-rw-r--r-- _intro.qmd 95
-rw-r--r-- ref.bib 42
4 files changed, 162 insertions, 61 deletions
diff --git a/_background.qmd b/_background.qmd
index 883a528..d7b4b91 100644
--- a/_background.qmd
+++ b/_background.qmd
@@ -9,6 +9,42 @@ serialised as RDFa (embedded in HTML) and PDF (embedded in PDF).
## Markdown
+Markdown is "probably the most popular markup language today"
+[@Rapp2023, p. 42].
+It was originally defined by @Gruber2004
+as a superset of HTML,
+improving readability and ease of writing
+by adding email-style markup
+for common content structure like headers, emphasis, lists and hyperlinks.
+
+A core principle of Markdown is readability:
+
+> A Markdown-formatted document should be publishable as-is,
+> as plain text,
+> without looking like it’s been marked up
+> with tags or formatting instructions.
+> [@Gruber2004, section "Philosophy"].
+
+Many dialects of Markdown have evolved,
+some tightening the language for parsing efficiency and disambiguation,
+some extending to cover additional structures
+and some including support for a YAML or TOML metadata header section.
+
+Markdown as originally designed is a source format to produce HTML.
+If using only Markdown-defined markup, avoiding HTML tags,
+the text is however reliably translatable also to other formats.
+Pandoc is a tool that can convert texts in Markdown dialects
+into many document formats including HTML and (via LaTeX) PDF,
+applying visual style and positioning through templates.
+Such document workflows,
+including minimal structural markup as part of the creative writing,
+applying visual layout as an automated templating process
+tied to a target document format,
+have been further streamlined for academic texts
+in the Quarto document publishing system.
+
+----
+
Markdown is a text markup language
with an emphasis on being easy for humans to read
[@Gruber2004].
@@ -58,6 +94,27 @@ others offer optional support e.g. through plugins
## Quarto
+Pandoc is a document converter built around the Markdown markup language,
+able to parse from and serialise to many Markdown dialects
+as well as equivalent subsets of other text markup languages
+including HTML, LaTeX (and by extension PDF),
+Office Open XML (as used by recent releases of Microsoft Word)
+and OpenDocument (as used by OpenOffice and LibreOffice).
+Pandoc is extensible,
+supporting custom code loaded at runtime
+either for custom parsing or serialising,
+or for manipulating the intermediate internal content structure
+called Abstract Syntax Tree (AST).
+
+Pandoc supports redefining input and output formats
+and manipulating the internal document structure
+as part of the automated parts of the framework.
+
+Pandoc is extendable.
+Source and output format can be changed or completely redefined
+and the internal document structure manipulated,
+in the automated parts of the framework.
+
Collection of interrelated POSIX scripts and Pandoc extensions
for enabling semantic annotations in Markdown-based authoring workflows.
@@ -75,6 +132,16 @@ for enabling semantic annotations in Markdown-based authoring workflows.
* structure semantic metadata as RDF triples
* append RDF triples serialized as RDFa
+Markdown provides intuitive and unobtrusive markup syntax
+for structure like headers, emphasis, lists and hyperlinks.
+Pandoc extends Markdown with syntax
+for citation annotation
+and an optional YAML metadata header.
+Quarto extends Markdown further with syntax
+for some styling and some convenience macros,
+and applies templates for a uniform visual styling
+across target document formats.
+
### Interfaces
* Pandoc document object model (DOM)
diff --git a/_design.qmd b/_design.qmd
index ee03381..2ff9a7a 100644
--- a/_design.qmd
+++ b/_design.qmd
@@ -2,4 +2,23 @@
## XMP, RDFa and RDF
+RDF is an abstract data model for knowledge graphs,
+usable for domain-specific annotations:
+Terminology for a domain is established by referencing a shared ontology,
+and terms are composed as sets of subject-predicate-object triples.
+RDF includes one language, Turtle, that strives to be human readable,
+and languages for embedding triples into other data structures,
+notably XMP for PDF files and RDFa for HTML.
+
+RDF is an abstract data model for knowledge graphs.
+Multiple RDF languages exist,
+each covering all or subsets of the RDF model,
+including Turtle, optimized for human readability,
+RDFa for HTML embedding
+and XMP for PDF embedding.
+Each RDF language has different constraints,
+e.g. the XMP language for storing RDF in media files
+can express one RDF graph in each XMP object
+[@Adobe2012, p. 9].
+
*TODO*
diff --git a/_intro.qmd b/_intro.qmd
index 5fc5072..582d9db 100644
--- a/_intro.qmd
+++ b/_intro.qmd
@@ -1,62 +1,35 @@
-The process of authoring a conventional text-based content,
-i.e. texts consisting mainly of a contiguous set of paragraphs,
-can in some sense be described
-as a task of materialising thoughts into a linear form expressed in words.
-The story may consist of moves in time (e.g. a flashback)
-or may not move in time at all (e.g. a dictionary entry),
-but the telling of the story - the text itself - is linear.
+Markdown is a markup language
+that encourages treating structure as an integral part of content
+while postponing styling till later.
+This separation of visual concerns from content and structure
+is harnessed by the document converter Pandoc
+and the Pandoc-based document authoring framework Quarto,
+and is suitable for scholarly writing
+where styling may be dictated by a publisher.
-*TODO: reference some supportive Writing Process Research*
+When writing with Quarto,
+you can add intuitive and unobtrusive structural markup
+for headers, emphasis, lists and hyperlinks.
+You can annotate a string as a hyperlink with a title,
+or as a citation with a source reference.
+You can add authors, supervisors and publication date
+as a YAML structure at the top of the text.
+And you can produce a web page or a PDF document from your text,
+sensibly styled and laid out according to academic conventions.
+You cannot, however, annotate a string according to content domain.
-During such reductive transformation of complex thoughts into linear text,
-the authoring may be aided by the ability
-to annotate some of the contexts omitted.
-A train of thought may contain multiple interconnected trajectories,
-where some are left out when reshaping into the written storyline,
-but the contexts they represent may still be helpful
-during the ongoing writing process,
-even if not intended as part of the storyline specifically.
-
-A concrete example common in academic writing is that of citation.
-The source of an included theory or argument or counterpoint
-is not part of the storyline,
-but a reference to it needs to be maintained
-for accurately compiling a reference list later appended to the text.
-For that specific type of author annotation,
-a range of helper tools exist,
-integrated to various degree with various authoring environments.
-Support for author annotations more generically is less common, however.
-
-*TODO: reference later chapter covering known existing tools*
-
-The choice of authoring environment limit choices of functionality.
-Some authors prefer
-a what-you-see-is-what-you-get (WYSIWYG) authoring environment
-where the words when written
-are visually presented as they would appear in the final document,
-e.g. the word processor Microsoft Word or LibreOffice Writer.
-Other authors favor
-a what-you-see-is-what-you-mean (WYSIWYM) authoring environment
-where the words when written
-are visually presented to emphasize their structural function in the text.
-e.g. the word processor RStudio or the document processor LyX.
-Yet others appreciate
-an environment with technical oversight of both structure and layout
-where prose is intermixed with structural and positioning control commands,
-e.g. directly editing of code for the LaTeX typesetting system.
-Each class of authoring system enables a different set of options
-for author annotations.
-
-Whereas the document formats
-for the commonly used tools Microsoft Word and LibreOffice
-are binary,
-the document formats used with RStudio, LyX and LaTeX
-are plaintext,
-which means the data format avoid the use of control characters
-allowing for editing with general-purpose text editors.
-A fundamental benefit of plaintext formatted texts
-is freedom of choice regarding authoring tools
-[@White2022, p. 3].
+Quarto supports hypertext and citation annotations,
+but not arbitrary domain-specific annotations.
+You can spell out in prose
+that one set of numbers is in meter and another in nautical miles,
+or that one citation is supportive and another a rebuttal,
+or that Jane refers to "it" derogatorily
+whereas Joe uses "it" as preferred personal pronoun.
+You cannot structurally annotate such details
+omitted from contents of the output document
+yet available for visual styling, indexing
+and other automated processing,
+and as intuitive and unobtrusive writing aid.
## Problem formulation
@@ -71,11 +44,11 @@ To aid in achieving that goal,
the problem statement has been divided into the following subquestions:
* What are the core qualities of Markdown,
- and how can the Markdown syntax be extended
+ and how could a Markdown flavor express domain-specific annotations
while maintaining those qualities?
-* How do Pandoc and Quarto process Markdown input to HTML or PDF output,
- and how can this workflow be derived
- to cover a new flavor of Markdown?
+* How does Quarto convert Markdown source to HTML or PDF output,
+ and how can this workflow be extended
+ to cover Markdown with domain-specific annotations?
* Which approach to altering the workflow of Pandoc and Quarto
is more likely efficient and long-term sustainable?
diff --git a/ref.bib b/ref.bib
index 83d7a52..a090a0d 100644
--- a/ref.bib
+++ b/ref.bib
@@ -233,6 +233,48 @@
publisher = {International Organization for Standardization},
}
+@InBook{Rapp2023,
+ author = {Christian Rapp and Till Heilmann and Otto Kruse},
+ title = {Beyond MS Word: Alternatives and Developments},
+ doi = {10.1007/978-3-031-36033-6_3},
+ pages = {33--47},
+}
+
+@Book{Kruse2023,
+ date = {2023},
+ title = {Digital Writing Technologies in Higher Education},
+ doi = {10.1007/978-3-031-36033-6},
+ editor = {Otto Kruse and Chris M. Anson and Elena Cotos and Antonette Shibani and Christian Rapp and Kalliopi Benetos and Ann Devitt},
+ isbn = {978-3-03-136033-6},
+ publisher = {Springer International Publishing},
+ booksubtitle = {Theory, Research, and Practice},
+ booktitle = {Digital Writing Technologies in Higher Education},
+ file = {:Kruse2023 - Digital Writing Technologies in Higher Education.pdf:PDF},
+}
+
+@Article{Ovadia2014,
+ author = {Steven Ovadia},
+ date = {2014-04},
+ journaltitle = {Behavioral & Social Sciences Librarian},
+ title = {Markdown for Librarians and Academics},
+ doi = {10.1080/01639269.2014.904696},
+ issn = {1544-4546},
+ number = {2},
+ pages = {120--124},
+ volume = {33},
+ file = {:Ovadia2014 - Markdown for Librarians and Academics.pdf:PDF},
+ publisher = {Informa UK Limited},
+}
+
+@TechReport{Adobe2012,
+ date = {2012-04},
+ title = {Extensible Metadata Platform (XMP) Specification},
+ subtitle = {Part 1, Data Model, Serialization, and Core Properties},
+ comment = {https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart1.pdf},
+ editor = {Adobe Systems Incorporated},
+ file = {:XMPSpecificationPart1-2.pdf:PDF},
+}
+
@Comment{jabref-meta: databaseType:biblatex;}
@Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}