| author | Jonas Smedegaard <dr@jones.dk> | 2025-03-26 07:16:11 +0100 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-03-26 07:16:21 +0100 |
| commit | b4545e4fa7a2d241f1d9ebbfe8a7de403e379c14 (patch) | |
| tree | f38319e9cb3c5377f0af0e9dbbf5e6a378dac60e | |
| parent | c151b74eb41084f1872fafb93f282c82793900c1 (diff) | |
reduce intro; move/add chunks to background and design; update references
| -rw-r--r-- | _background.qmd | 67 | ||||
| -rw-r--r-- | _design.qmd | 19 | ||||
| -rw-r--r-- | _intro.qmd | 95 | ||||
| -rw-r--r-- | ref.bib | 42 |
4 files changed, 162 insertions, 61 deletions
diff --git a/_background.qmd b/_background.qmd
index 883a528..d7b4b91 100644
--- a/_background.qmd
+++ b/_background.qmd
@@ -9,6 +9,42 @@ serialised as RDFa (embedded in HTML) and PDF (embedded in PDF).
 
 ## Markdown
 
+Markdown is "probably the most popular markup language today"
+[@Rapp2023, p. 42].
+It was originally defined by @Gruber2004
+as a superset of HTML,
+improving readability and ease of writing
+by adding email-style markup
+for common content structure like headers, emphasis, lists and hyperlinks.
+
+A core principle of Markdown is readability:
+
+> A Markdown-formatted document should be publishable as-is,
+> as plain text,
+> without looking like it’s been marked up
+> with tags or formatting instructions..
+> [@Gruber2004, section "Philosophy"].
+
+Many dialects of Markdown have evolved,
+some tightening the language for parsing efficiency and disambiguation,
+some extending to cover additional structures
+and some including support for a YAML or TOML metadata header section.
+
+Markdown as originally designed is a source format to produce HTML.
+If using only Markdown-defined markup, avoiding HTML tags,
+the text is however reliably translatable also to other formats.
+Pandoc is a tool that can convert texts in Markdown dialects
+into many document formats including HTML and (via LaTeX) PDF,
+applying visual style and positioning throught templates.
+Such document workflows,
+including minimal structural markup as part of the creative writing,
+applying visual layout as an automated templating process
+tied to a target document format,
+has been further streamlined for academic texts
+in the Quarto document publishing system.
+
+----
+
 Markdown is a text markup language
 with an emphasis on being easy for humans to read
 [@Gruber2004].
@@ -58,6 +94,27 @@ others offer optional support e.g. through plugins
 
 ## Quarto
 
+Pandoc is a document converter built around the markdown markup language,
+able to parse from and serialise to many Markdown dialects
+as well as equivalent subsets of other text markup languages
+including HTML, LaTeX (and by extension PDF),
+Office Open XML (as used by recent releases of Microsoft Word)
+and OpenDocument (as used by OpenOffice and LibreOffice).
+Pandoc is extensible,
+supporting custom code loaded at runtime
+either for custom parsing or serialising,
+or for manipulating the intermediate internal content structure
+called Abstract Syntax Tree (AST).
+
+Pandoc supports redefining input and output formats
+and manipulating the internal document structure
+as part of the automated parts of the framework.
+
+Pandoc is extendable.
+Source and output format can be changed or completely redefined
+and the internal document structure manipulated,
+in the automated parts of the framework.
+
 Collection of interrelated POSIX scripts and Pandoc extensions
 for enabling semantic annotations in Markdown-based authoring workflows.
 
@@ -75,6 +132,16 @@ for enabling semantic annotations in Markdown-based authoring workflows.
 * structure semantic metadata as RDF triples
 * append RDF triples serialized as RDFa
 
+Markdown provides intuitive and unobtrusive markup syntax
+for structure like headers, emphasis, lists and hyperlinks.
+Pandoc extends Markdown with syntax
+for citation annotation
+and an optional YAML metadata header.
+Quarto extends Markdown further with syntax
+for some styling and some convenience macros,
+and applies templates for a uniform visual styling
+across target document formats.
+
 ### Interfaces
 
 * Pandoc document object model (DOM)
diff --git a/_design.qmd b/_design.qmd
index ee03381..2ff9a7a 100644
--- a/_design.qmd
+++ b/_design.qmd
@@ -2,4 +2,23 @@
 
 ## XMP, RDFa and RDF
 
+RDF is an abstract data model for knowledge graphs,
+usable for domain-specific annotations:
+Terminology for a domain is established by referencing a shared ontology,
+and terms are composed as sets of subject-predicate-object triples.
+RDF includes one language, Turtle, strives to be human readable,
+and languages for embedding triples into other data structures,
+notably XMP for PDF files and RDFa for HTML.
+
+RDF is an abstract data model for knowledge graphs.
+Multiple RDF languages exist,
+each covering all or subsets of the RDF model,
+including human readability optimized Turtle,
+RDFa for HTML embedding
+and XMP for PDF embedding.
+Each RDF language have different constraints,
+e.g. the XMP language for storing RDF in media files
+can express express one RDF graph in each XMP object
+[@Adobe2012, p. 9].
+
 *TODO*
@@ -1,62 +1,35 @@
-The process of authoring a conventional text-based content,
-i.e. texts consisting mainly of a contiguous set of paragraphs,
-can in some sense be described
-as a task of materialising thoughts into a linear form expressed in words.
-The story may consist of moves in time (e.g. a flashback)
-or may not move in time at all (e.g. a dictionary entry),
-but the telling of the story - the text itself - is linear.
+Markdown is a markup language
+that encourages treating structure as integral part of content
+while postponing styling till later.
+This separation of visual concerns from content and structure
+is harnessed by the document converter Pandoc
+and the Pandoc-based document authoring framework Quarto,
+and is suitable for scholarly writing
+where styling may be dictated by a publisher.
 
-*TODO: reference some supportive Writing Process Research*
+When writing with Quarto,
+you can add intuitive and unobtrusive structural markup
+for headers, emphasis, lists and hyperlinks.
+You can annotate a string as a hyperlink with a title,
+or as a citation with a source reference.
+You can add authors, supervisors and publication date
+as a YAML structure at the top of the text.
+And you can produce a web page or a PDF document from your text,
+sensibly styled and laid out according to academic conventions.
+You cannot, however, annotate a string according to content domain.
 
-During such reductive transformation of complex thoughts into linear text,
-the authoring may be aided by the ability
-to annotate some of the contexts omitted.
-A train of thought may contain multiple interconnected trajectories,
-where some are left out when reshaping into the written storyline,
-but the contexts they represent may still be helpful
-during the ongoing writing process,
-even if not intended as part of the storyline specifically.
-
-A concrete example common in academic writing is that of citation.
-The source of an included theory or argument or counterpoint
-is not part of the storyline,
-but a reference to it needs to be maintained
-for accurately compiling a reference list later appended to the text.
-For that specific type of author annotation,
-a range of helper tools exist,
-integrated to various degree with various authoring environments.
-Support for author annotations more generically is less common, however.
-
-*TODO: reference later chapter covering known existing tools*
-
-The choice of authoring environment limit choices of functionality.
-Some authors prefer
-a what-you-see-is-what-you-get (WYSIWYG) authoring environment
-where the words when written
-are visually presented as they would appear in the final document,
-e.g. the word processor Microsoft Word or LibreOffice Writer.
-Other authors favor
-a what-you-see-is-what-you-mean (WYSIWYM) authoring environment
-where the words when written
-are visually presented to emphasize their structural function in the text.
-e.g. the word processor RStudio or the document processor LyX.
-Yet others appreciate
-an environment with technical oversight of both structure and layout
-where prose is intermixed with structural and positioning control commands,
-e.g. directly editing of code for the LaTeX typesetting system.
-Each class of authoring system enables a different set of options
-for author annotations.
-
-Whereas the document formats
-for the commonly used tools Microsoft Word and LibreOffice
-are binary,
-the document formats used with RStudio, LyX and LaTeX
-are plaintext,
-which means the data format avoid the use of control characters
-allowing for editing with general-purpose text editors.
-A fundamental benefit of plaintext formatted texts
-is freedom of choice regarding authoring tools
-[@White2022, p. 3].
+Quarto supports hypertext and citation annotations,
+but not arbitrary domain-specific annotations.
+You can spell out in prose
+that one set of numbers is in meter and another in nautical miles,
+or that one citation is supportive and another a rebuttal,
+or that Jane refers to "it" derogatory
+whereas Joe uses "it" as preferred personal pronoun.
+You cannot structurally annotate such details
+omitted from contents of the output document
+yet available for visual styling, indexing
+and other automated processing,
+and as intuitive and unobtrusive writing aid.
 
 ## Problem formulation
 
@@ -71,11 +44,11 @@ To aid in achieving that goal,
 the problem statement has been divided into the following subquestions:
 
 * What are the core qualities of Markdown,
-  and how can the Markdown syntax be extended
+  and how could a Markdown flavor express domain-specific annotations
   while maintaining those qualities?
-* How do Pandoc and Quarto process Markdown input to HTML or PDF output,
-  and how can this workflow be derived
-  to cover a new flavor of Markdown?
+* How do Quarto convert Markdown source to HTML or PDF output,
+  and how can this workflow be extended
+  to cover Markdown with domain-specific annotations?
 * Which approach to altering the workflow of Pandoc and Quarto
   is more likely efficient and long-term sustainable?
 
@@ -233,6 +233,48 @@
   publisher = {International Organization for Standardization},
 }
 
+@InBook{Rapp2023,
+  author = {Christian Rapp and Till Heilmann and Otto Kruse},
+  title = {Beyond MS Word: Alternatives and Developments},
+  doi = {10.1007/978-3-031-36033-6_3},
+  pages = {33--47},
+}
+
+@Book{Kruse2023,
+  date = {2023},
+  title = {Digital Writing Technologies in Higher Education},
+  doi = {10.1007/978-3-031-36033-6},
+  editor = {Otto Kruse and Chris M. Anson and Elena Cotos and Antonette Shibani and Christian Rapp and Kalliopi Benetos and Ann Devitt},
+  isbn = {978-3-03-136033-6},
+  publisher = {Springer International Publishing},
+  booksubtitle = {Theory, Research, and Practice},
+  booktitle = {Digital Writing Technologies in Higher Education},
+  file = {:Kruse2023 - Digital Writing Technologies in Higher Education.pdf:PDF},
+}
+
+@Article{Ovadia2014,
+  author = {Steven Ovadia},
+  date = {2014-04},
+  journaltitle = {Behavioral & Social Sciences Librarian},
+  title = {Markdown for Librarians and Academics},
+  doi = {10.1080/01639269.2014.904696},
+  issn = {1544-4546},
+  number = {2},
+  pages = {120--124},
+  volume = {33},
+  file = {:Ovadia2014 - Markdown for Librarians and Academics.pdf:PDF},
+  publisher = {Informa UK Limited},
+}
+
+@TechReport{Adobe2012,
+  date = {2012-04},
+  title = {Extensible Metadata Platform (XMP) Specification},
+  subtitle = {Part 1, Data Model, Serialization, and Core Properties},
+  comment = {https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart1.pdf},
+  editor = {Adobe Systems Incorporated},
+  file = {:XMPSpecificationPart1-2.pdf:PDF},
+}
+
 @Comment{jabref-meta: databaseType:biblatex;}
 
 @Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}
