| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-23 13:52:19 +0200 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-23 13:52:19 +0200 |
| commit | 53e7a9ffc5b9670af009d7c9f3ac576c03d9974d (patch) | |
| tree | e2c214ff649ebe5707f0562c729e4c047c7529a5 | |
| parent | 37b7ec5c7072cc01c0d5585a28407d9747181d6c (diff) | |
improve intro and perspectives
| -rw-r--r-- | _conclusion.qmd | 17 |
| -rw-r--r-- | _intro.qmd | 143 |
| -rw-r--r-- | ref.bib | 20 |
3 files changed, 116 insertions, 64 deletions
diff --git a/_conclusion.qmd b/_conclusion.qmd
index 86cf163..a308337 100644
--- a/_conclusion.qmd
+++ b/_conclusion.qmd
@@ -68,12 +68,10 @@ and also enables further explorations into more complex workflows.
 ### Integration with Hypothesis
-*TODO: filter extension to extend Pandoc/Quarto citations to cover [CiTO]*
+*FIXME: Introduce previous semester project
+and elaborate on potential benefits of semanticized annotations.*
-[CiTO]: <http://purl.org/spar/cito/2018-02-12>
-  "CiTO, the Citation Typing Ontology"
-
-### Generalizing Quarto metadata
+### Generalizing Quarto metadata {#sec-quarto}
 Quarto, a document authoring framework using Pandoc to render academic papers,
@@ -96,3 +94,12 @@ Such change, and consequential refinements of default Pandoc templates encouraging more normalized structures e.g. about authors and publishers, might reduce the amount of custom restructuring needed downstream e.g. in Quarto.
+
+### Nuanced citations in scholarly papers
+
+Pandoc and Quarto (see @sec-quarto) support annotating scholarly citations.
+Recent work on annotating contextualisations of citations,
+as presented in @Daquino2023,
+however requires further hinting than is currently easily achieved
+with Pandoc and Quarto tooling,
+which can likely leverage this work as well as the planned next phases.
@@ -40,7 +40,18 @@ called "Semantic Markdown"
 *FIXME: Elaborate on above draft spec and its relevancy.*
-*TODO: Maybe move above paragraph down to implementation plan.*
+*FIXME: Rewrite below to properly introduce Pandoc.*
+
+Pandoc reads a text document,
+parses its structural components into an Abstract Syntax Tree (AST),
+and serialises and writes back into a text document.
+The Pandoc AST deliberately prioritises structural information
+and is relaxed about visual information,
+to preserve literal content
+while reducing format-specific stylistic details,
+relevant especially when processing between different formats.
+Most common is to read plaintext Markdown files
+and write LaTeX code further compiled into a PDF file.
 This project aims to enable authors to include semantic annotations as part of their writing,
@@ -49,8 +60,6 @@ by extending the Markdown processor Pandoc to handle semantic text annotations, inspired by the syntax extension Semantic Markdown.
-*FIXME: Rewrite and expand above to properly introduce Pandoc.*
-
 ## Problem formulation
 So,
@@ -66,74 +75,90 @@ So, is more likely long-term sustainable?
 * How can the reliability of a Pandoc extension be evaluated?
-## Levels of implementation
+## Project constraints
-*FIXME: maybe move this subsection to later subsection on expectations for processors*
+Driven by an interest in sustained development of this research project
+well beyond the delivery covered in this paper,
+the project has been voluntarily constrained
+by the following early design decisions:
-The primary aim is to support authoring;
-enhancing renderings of the authored content is secondary.
+* The current delivery is intentionally scoped
+  as a minimum viable product,
+  with additional features planned as separate future works.
+* Programming language and integration design
+  is largely dictated by existing actively used systems,
+  rather than convenience of personal familiarity or efficiency.
+* The project solely involves freely licensed tools and resources,
+  and is itself licensed under collaboratively incentivising free licences.
-Annotations are metadata, not directly part of the content.
-Consequently, the simplest way
-to correctly process markdown with semantic text annotations
-is to omit the annotations altogether,
-rendering as if they had not been applied at all,
-as illustrated in @fig-phase1.
+These constraints are briefly explained below but not defended.
+The author of this project believes
+that they may help stimulate real-world practical use
+and raise the potential for long-term sustainable active development,
+but since the constraints are political in nature,
+no attempt is made to evaluate them
+or compare against other available options.
-{#fig-phase1}
-More advanced processing may optionally embed semantic text annotations
-as semantic document annotations, i.e. embed as document-wide metadata.
+### Scope limited to authoring
-Even more advanced processing might enhance the content rendering,
-similar to how some PDF rendering provide interactive navigation
-for embedded hypermedia markup like links and anchors.
+The scope of this project is to enable authoring,
+with a further aim of later extending to more complex processing.
-This project aims at the simplest processing level as described above,
-due to the limited time of the project.
-(further processing is discussed in @sec-rdf).
+The primary aim is to enable authors to annotate semantics
+as an integral part of their creative writing process.
+Future works may expand on this,
+enhancing renderings of the authored content,
+as discussed in @sec-rdf.
-## Maintaining usability and interoperability
+The idea is to introduce semantic text annotation to Markdown authoring
+without disruptions in the Markdown-based workflows.
+Annotations in Markdown get filtered out,
+so that further processing need not know about the new markup,
+as illustrated in @fig-phase1.
-A notable challenge is aligning with existing practice and systems.
-Markdown is known for its unobtrusive plaintext editing format,
-and an extension to its syntax will need to fit that principle.
-Also,
-*FIXME: drop or rewrite...*
+{#fig-phase1}
-## Implementation plan
+### Evolve accepted formats and tools
-*FIXME: drop or rewrite below section*
+Markdown is a widely adopted authoring format,
+and Pandoc a widely adopted Markdown processor.
+It is easier to invent a new format in a new tool
+than to convince widely adopted ones to evolve,
+but the latter is, if successful, likely far more reliable.
-*FIXME: drop or rewrite below section*
+This project aims at introducing new syntax
+while staying close to existing Markdown,
+unlike e.g. SAM that deviates notably from Markdown
+[@SAM2018].
-Pandoc reads a text document,
-parses its structural components into an Abstract Syntax Tree (AST),
-and serialises and writes back into a text document.
-The Pandoc AST deliberately prioritises structural information
-and is relaxed about visual information,
-to preserve literal content
-while reducing format-specific stylistic details,
-relevant especially when processing between different formats.
-Most common is to read plaintext Markdown files
-and write LaTeX code further compiled into a PDF file.
+Many Markdown processors exist,
+but few are in widespread use.
+Pandoc is widely adopted and in recent times integrated
+with R Markdown and the Quarto document publishing system
+to streamline the production of academic papers.
+Pandoc is a versatile tool that supports extending through
+both import plugins and filters for its Abstract Syntax Tree (AST).
+The easiest way to implement a derivation of Markdown
+is likely by writing a Pandoc import filter,
+but that would limit its usefulness with larger frameworks like Quarto:
+Although they do support alternative source languages,
+important features like citation handling are far better streamlined
+when using the main and best supported source format.
+This project therefore implements its deviation of Markdown
+by reading the deviant source as if it were Markdown,
+and then applying a filter to adjust the AST after the deliberate misparsing.
+
+### Collaborative licensing
-Pandoc allows supplying custom reader and writer functions
-as well as plugging into and manipulating the AST,
-which this project will exploit:
-This project will write an extension
-to adjust the AST
-when abusing the default Markdown reader
-to read Markdown with added markup for ontological annotations.
+To encourage collaboration and stimulate a circular gift economy
+as introduced by @Mikkelsen2000,
+this project solely uses freely licensed tools and resources,
+and the project itself is copyleft licensed:
+Code parts are licensed
+under the GNU General Public License version 3 or newer,
+and non-code parts are licensed
+under the Creative Commons Attribution-ShareAlike 4.0 licence.
-First milestone is reached
-when the filter can simply suppress the added markup.
-A further milestone is to embed the expressed annotations
-in supported output formats.
-Another further milestone is to make use of the added markup,
-e.g. to annotate purpose of scholarly citations
-as presented in @Daquino2023.
+## Implementation plan
-* Outline work
-* define what semantic text annotation is
-* analyse existing tools
-* expected output
-* implementation as filter...
+*FIXME: summarize next chapters*
@@ -166,6 +166,18 @@
   urldate = {2025-02-18},
 }
+@Thesis{Mikkelsen2000,
+  author = {Nicolai Bendix Mikkelsen},
+  date = {2000-11},
+  institution = {Roskilde Universitetscenter},
+  title = {Gaveøkonomi [Gift Economy]},
+  type = {mathesis},
+  language = {danish},
+  subtitle = {et perspektiv på udveksling over nettet [a perspective on exchange over the internet]},
+  url = {http://gift-economy.jones.dk/speciale/},
+  urldate = {2024-04-16},
+}
+
 @Article{White2022,
   author = {Jason White},
   date = {2022-12},
@@ -314,6 +326,14 @@
   urldate = {2025-05-17},
 }
+@Online{SAM2018,
+  author = {Mark Baker},
+  date = {2018-09-24},
+  title = {Semantic Authoring Markdown (SAM)},
+  url = {https://mbakeranalecta.github.io/sam/language.html},
+  urldate = {2025-05-23},
+}
+
 @Comment{jabref-meta: databaseType:biblatex;}
 @Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}
