
---
title: |
  Processing Semantic Text Annotations in Markdown
subtitle: Semantic Markdown Implemented as a Pandoc Filter
date: 2025-05-27
toc-depth: 2
format:
  stylish-report-pdf:
    pdfversion: "2.0"
    pdfstandard: [A-4f, UA-2]
# generate Well-Tagged PDF using tag tagging
# requires documentmetadatasupport 1.0n (2025-03-25)
tagging: on
# else generate Well-Tagged PDF using latest testphase components
# https://latex3.github.io/tagging-project/documentation/prototype-usage-instructions
# requires documentmetadatasupport 1.0k (2024-12-21)
pdftestphase: latest
# fail on error generating Well-Tagged PDF
pdftestphasestrict: true
# debugging of Well-Tagged PDF
pdfdebug: ["para"]
metadata-files:
  - _actors.yml
keywords:
  - Markdown
  - CommonMark
  - Lua
  - pandoc
  - semantic authoring
  - semantic annotation
link-citations: true
bibliography: ref.bib
csl: apa
resource-path:
  - /usr/share/citation-style-language/styles
filters:
  - en2em
  - nobreaks
  - include-code-files
include-in-header:
  # add alt text to embedded image, to aid non-visual rendering
  # (\makeatletter is needed because of the @ in \doclicense@imagewidth)
  - text: |
      \makeatletter
      \renewcommand{\doclicenseImage}[1][]{%
        \setkeys{doclicense}{#1}
        \href{\doclicenseURL}{%
          \includegraphics[
            alt=\doclicenseText,%
            width=\doclicense@imagewidth%
          ]{\doclicenseImageFileName}%
        }
      }
      \makeatother
  # fix British spelling of "licence", and append link to published source
  - text: |
      \makeatletter
      \@namedef{doclicense@lang@word@license}{
        licence.\ Source is available at
        \href{https://source.jones.dk/sem-md}{https://source.jones.dk/sem-md}
        and decentrally with
        \href{https://app.radicle.xyz/nodes/seed.radicle.garden/rad:z3ckfAxH326pgPFKe7rDYJ4mxBoWo}{%
          rad:z3ckfAxH326pgPFKe7rDYJ4mxBoWo}%
      }
      \makeatother
#keep-tex: true
---

# Abstract

This report explores the extension of the Markdown markup language to support semantic text annotations, focusing on implementation via a Pandoc filter. Markdown, originally designed for simplicity and readability, lacks native support for tying semantic information to specific text strings. The project implements the Semantic Markdown draft specification by misparsing semantic annotations as regular Markdown and subsequently cleaning them up through a Lua-based Pandoc filter. This approach preserves Markdown's core qualities while enabling enriched semantic authoring integrated with existing workflows. The report analyses Markdown syntax, the Pandoc Abstract Syntax Tree, and the filter API to achieve this integration. Future work aims to enhance annotation extraction and rendering, facilitating improved workflows for collaborative and automated document processing.

::: {lang=da}

Denne rapport undersøger udvidelsen af markup-sproget Markdown til at understøtte semantiske tekstannotationer med fokus på implementering via et Pandoc-filter. Markdown, oprindeligt designet til enkelhed og læsbarhed, mangler indbygget støtte til at knytte semantisk information til specifikke tekststrenge. Projektet implementerer udkastet til Semantic Markdown-specifikationen ved først at fejltolke semantiske annotationer som almindelig Markdown og derefter rense dem via et Lua-baseret Pandoc-filter. Denne tilgang bevarer Markdowns kerneegenskaber samtidig med, at den muliggør beriget semantisk forfatterskab integreret i eksisterende arbejdsgange. Rapporten analyserer Markdown-syntaks, Pandocs abstrakte syntaks-træ og filter-API'en for at opnå denne integration. Fremtidige arbejder sigter mod at forbedre udtrækning og gengivelse af annotationer, hvilket understøtter bedre arbejdsgange for samarbejde og automatiseret dokumentbehandling.

:::

# Introduction

{{< include _intro.qmd >}}

# Markdown syntax for annotations

{{< include _markdown.qmd >}}

# How Pandoc processes Markdown {#sec-pandoc}

{{< include _pandoc.qmd >}}

# Pandoc filter sem-md {#sec-filter}

{{< include _filter.qmd >}}
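The abstract describes the core technique as two steps. First, an ordinary Markdown parser misparses an annotation as some regular Markdown construct. Second, a filter finds that construct in the Pandoc AST and rewrites it. The Python sketch below illustrates only that general idea. It is not the sem-md filter, which is written in Lua. Purely for illustration, it assumes that an annotation is written as a Markdown link whose target is a compact IRI, for example `[Alice](foaf:name)`. The actual Semantic Markdown syntax may differ. The sketch works on Pandoc's JSON serialization of the AST. The names `clean_up`, `filter_json` and `CURIE`, and the choice of a `property` attribute, are hypothetical.

```python
import json
import re

# HYPOTHETICAL annotation marker, for illustration only (not the Semantic
# Markdown syntax): a link whose target is a compact IRI such as
# "foaf:name". Ordinary targets like "https://example.org" do not match.
CURIE = re.compile(r"^[A-Za-z][\w-]*:[A-Za-z][\w-]*$")


def clean_up(node):
    """Walk a Pandoc JSON AST and turn misparsed annotation links into Spans.

    Returns a new tree; the input is not modified.
    """
    if isinstance(node, list):
        return [clean_up(item) for item in node]
    if not isinstance(node, dict):
        return node  # strings and numbers are left untouched
    if node.get("t") == "Link":
        # Pandoc JSON layout: Link = [attr, inlines, [url, title]]
        attr, content, (target, _title) = node["c"]
        if CURIE.match(target):
            ident, classes, pairs = attr
            # Keep the link text as Span content and move the misparsed
            # "target" into a key-value attribute (hypothetical name).
            return {
                "t": "Span",
                "c": [[ident, classes, pairs + [["property", target]]],
                      clean_up(content)],
            }
    # Any other element: rebuild it with cleaned-up children.
    return {key: clean_up(value) for key, value in node.items()}


def filter_json(text):
    """Apply clean_up to a whole document given as Pandoc JSON text."""
    return json.dumps(clean_up(json.loads(text)))
```

`pandoc -f markdown -t json` can produce the JSON input, and `pandoc -f json` can render the rewritten result. Using the sketch with `pandoc --filter` would also need a small entry point that reads the AST from standard input and writes it to standard output. Rewriting to a Span keeps the annotated text visible in every output format, while the annotation itself survives as an attribute for later processing.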

# Evaluation of sem-md

{{< include _evaluation.qmd >}}

# Discussion and Conclusion

{{< include _conclusion.qmd >}}

# Bibliography {.appendix}

\begingroup
\raggedright

::: {#refs}
:::

\endgroup

\appendix

# Appendix: Pandoc filter sem-md {.appendix}

# Appendix: Markdown syntax as PEG {.appendix #sec-def-peg}

# Appendix: Markdown syntax as syntax diagrams {.appendix #sec-def-dia}

{{< include _syntax.qmd >}}