---
title: |
  Processing Semantic Text Annotations in Markdown
subtitle: Semantic Markdown Implemented as a Pandoc Filter
date: 2025-05-27
toc-depth: 2
format:
  stylish-report-pdf:
    pdfversion: "2.0"
    pdfstandard: [A-4f, UA-2]
    # generate Well-Tagged PDF using tag tagging
    # requires documentmetadatasupport 1.0n (2025-03-25)
    # tagging: on
    # else generate Well-Tagged PDF using latest testphase components
    # # requires documentmetadatasupport 1.0k (2024-12-21)
    pdftestphase: latest
    # fail on error generating Well-Tagged PDF
    # pdftestphasestrict: true
    # debugging of Well-Tagged PDF
    # pdfdebug: ["para"]
metadata-files:
  - _actors.yml
keywords:
  - Markdown
  - CommonMark
  - Lua
  - pandoc
  - semantic authoring
  - semantic annotation
link-citations: true
bibliography: ref.bib
csl: apa
resource-path:
  - /usr/share/citation-style-language/styles
filters:
  - en2em
  - nobreaks
  - include-code-files
include-in-header:
  # add alt text to embedded image, to aid non-visual rendering
  - text: |
      \renewcommand{\doclicenseImage}[1][]{%
        \setkeys{doclicense}{#1}
        \href{\doclicenseURL}{%
          \includegraphics[
            alt=\doclicenseText,%
            width=\doclicense@imagewidth%
          ]{\doclicenseImageFileName}%
        }
      }
  # fix British spelling of "licence", and append link to published source
  - text: |
      \makeatletter
      \@namedef{doclicense@lang@word@license}{
        licence.\\
        Source is available at
        \href{https://source.jones.dk/sem-md}{https://source.jones.dk/sem-md}
        and decentrally with
        \href{https://app.radicle.xyz/nodes/seed.radicle.garden/rad:z3ckfAxH326pgPFKe7rDYJ4mxBoWo}{%
        rad:z3ckfAxH326pgPFKe7rDYJ4mxBoWo}%
      }
      \makeatother
#keep-tex: true
---

# Abstract

This report explores the extension of the Markdown markup language to support semantic text annotations, focusing on implementation via a Pandoc filter. Markdown, originally designed for simplicity and readability, lacks native support for tying semantic information to specific text strings.
The project implements the Semantic Markdown draft specification by misparsing semantic annotations as regular Markdown and subsequently cleaning them up through a Lua-based Pandoc filter. This approach preserves Markdown's core qualities while enabling enriched semantic authoring integrated with existing workflows. The report analyses Markdown syntax, the Pandoc Abstract Syntax Tree, and the filter API to achieve this integration. Future work aims to enhance annotation extraction and rendering, facilitating improved workflows for collaborative and automated document processing.

::: {lang=da}
Denne rapport undersøger udvidelsen af markup-sproget Markdown til at understøtte semantiske tekstannotationer med fokus på implementering via et Pandoc-filter. Markdown, oprindeligt designet til enkelhed og læsbarhed, mangler indbygget støtte til at knytte semantisk information til specifikke tekststrenge. Projektet implementerer udkastet til Semantic Markdown-specifikationen ved først at fejltolke semantiske annotationer som almindelig Markdown og derefter rense dem via et Lua-baseret Pandoc-filter. Denne tilgang bevarer Markdowns kerneegenskaber samtidig med, at den muliggør beriget semantisk forfatterskab integreret i eksisterende arbejdsgange. Rapporten analyserer Markdown-syntaks, Pandocs abstrakte syntaks-træ og filter-API'en for at opnå denne integration. Fremtidige arbejder sigter mod at forbedre udtrækning og gengivelse af annotationer, hvilket understøtter bedre arbejdsgange for samarbejde og automatiseret dokumentbehandling.
:::

# Introduction

{{< include _intro.qmd >}}

# Markdown syntax for annotations

{{< include _markdown.qmd >}}

# How Pandoc processes Markdown {#sec-pandoc}

{{< include _pandoc.qmd >}}

# Pandoc filter `sem-md` {#sec-filter}

{{< include _filter.qmd >}}

# Evaluation of `sem-md`

{{< include _evaluation.qmd >}}

# Discussion and Conclusion

{{< include _conclusion.qmd >}}

# Bibliography {.appendix}

\begingroup
\raggedright

::: {#refs}
:::

\endgroup

\appendix

# Appendix: Pandoc filter `sem-md` {.appendix}

```{.lua include="sem-md/sem-md.lua" code-line-numbers="true"}
```

# Appendix: Markdown syntax as PEG {.appendix #sec-def-peg}

```{.peg include="syntax/def.peg" code-line-numbers="true"}
```

# Appendix: Markdown syntax as syntax diagrams {.appendix #sec-def-dia}

{{< include _syntax.qmd >}}