| Field | Value | Date |
|---|---|---|
| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-23 19:10:23 +0200 |
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-23 19:10:23 +0200 |
| commit | 1a5c98ed1478a271e277cb67533b4f37d7c2650c | |
| tree | 7f2469d0e53ca4ca48e40defe70bb3238395d1c3 | |
| parent | b34efac5a971cfb8a2fb4f4842f9a16b9d036f51 | |
git commit -m misc updates

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | Makefile | 2 |
| -rw-r--r-- | _conclusion.qmd | 26 |
| -rw-r--r-- | _extensions/ruc-play/semantic-markdown/semantic-markdown.lua | 2 |
| -rw-r--r-- | _filter.qmd | 49 |
| -rw-r--r-- | _pandoc.qmd | 15 |
| -rw-r--r-- | ref.bib | 8 |
| -rw-r--r-- | report.qmd | 2 |

7 files changed, 97 insertions, 7 deletions
@@ -4,7 +4,7 @@ PDF_DOCUMENTS = _site/report.pdf
 include _make/*.mk
-DOCUMENT_APPENDIX_REGEX = Pandoc plugin semantic-markdown
+DOCUMENT_APPENDIX_REGEX = Pandoc filter semantic-markdown
 FILTER = _extensions/ruc-play/semantic-markdown/semantic-markdown.lua
diff --git a/_conclusion.qmd b/_conclusion.qmd
index a308337..5e7c9ec 100644
--- a/_conclusion.qmd
+++ b/_conclusion.qmd
@@ -2,16 +2,40 @@
 ## Perspectives
+The central design choice
+of this project being implemented as a filter
+would be interesting to investigate.
+
 This work is part of a series of works to integrate semantic text annotations with creative textual interactions. Concretely building upon this work are projects extending the same codebase to support extracting or converting annotations.
-Beyond these concrete projects,
+
+Also,
+beyond this concrete project and its planned extensions,
 the ability to author semantic text annotations and process them allows for improvements in related workflows, including interactive collaborative authoring and streamlining of automated document layout.
+### Implementation as import extension
+
+This project has been implementet as a cleanup filter
+for a misparsing of semantic Markdown as regular Markdown
+(see @sec-misparsing).
+This approach was chosen based in an assumption
+on fitting better with existing uses of Pandoc,
+notably integrated with the Quarto framework.
+
+An interesting task would be to challenge that assumption,
+by implementing as a Pandoc import extension
+and comparing usability and efficiency.
+It might also be interesting to try compare code reliability:
+Although the code size might be larger,
+an import extension would potentially contain far less quirks
+due to it fundamentally iterating through well-balanced enclosures
+rather than fixing breakage as the misparser cleanup filter does.
+
 ### This work as basis of others
 Authors interested in authoring with semantic annotations
diff --git a/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua b/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
index 6c3dc7e..abdb078 100644
--- a/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
+++ b/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
@@ -1,4 +1,4 @@
---- semantic-markdown - Pandoc plugin to process semantic hints
+--- semantic-markdown - Pandoc filter to process semantic hints
 ---
 --- SPDX-FileCopyrightText: 2025 Jonas Smedegaard <dr@jones.dk>
 --- SPDX-License-Identifier: GPL-3.0-or-later
diff --git a/_filter.qmd b/_filter.qmd
index b7434d4..60359ad 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -1,9 +1,56 @@
 *TODO: chapter overview*
+## Misparsing and then cleaning up {#sec-misparsing}
+
+Pandoc offers two ways to implement a syntax extension to Markdown.
+Either an import extension or a filter for its Abtract Syntax Tree (AST).
+This project went with the latter approach,
+which may seem like an odd choice at first.
+
+*TODO: About the approach of parsing as Markdown and adjust the fallout,
+instead of writing an import extension.*
+
+## Choice of Lua
+
+This project is implemented in the scripting language Lua.
+
+Pandoc filters can be written in any general-purpose language.
+Pandoc provides a JSON serialisation and parsing of its AST,
+some filters are written in Haskell using same libraries as Pandoc itself,
+and others are implemented in Python or Perl.
+Pandoc also offers a JSON serialisation/desrialisation interface for its AST
+allowing for even wider creativity in filter implementations.
+
+In recent years, Pandoc has embedded a Lua interpreter
+and offers the alternative of processing Lua filters
+without the need for full serialisation and parsing or for making system calls.
+For one simple test, a Lua implementation had an overhead of 2%
+compared to 35% for a Haskell-based implementation via the JSON interface
+[@MacFarlane2025, section "Introduction"].
+
+While efficiency might be nice,
+user convenience is important in this project.
+General experience using the Pandoc-based Quarto framework indicates,
+that non-Lua filters often require additional quirks
+like placement in a specific directory, adding a symbolic link
+or passing its full path,
+whereas Lua filters usually works provided only a relative path.
+That issue might be specific to the Quarto framework,
+but even that aside, Lua-based filters are quite common
+and the documentation for writing them more detailed
+than the legacy JSON-based interface.
+
+Programming in Lua has been a new experience
+but with a fairly low learning curve,
+after discovering (after a day of debugging) the oddity,
+compared to dynamically typed languages in general,
+that zero is a true value in a boolean context.
+
 ## Components
 *TODO: First parse Namespace blocks, then AnnotationWords*
 ## Tracking enclosure states
-*TODO: Details of parsing AnnotationWords*
+*TODO: Details of parsing AnnotationWords
+through correlating Pandoc AST with 4 enclosure states*
diff --git a/_pandoc.qmd b/_pandoc.qmd
index f81ea80..6f3653c 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -1,5 +1,3 @@
-*FIXME: Focus on the oddity of correlating Pandoc AST with 4 enclosure states*
-
 *FIXME: This chapter is unfinished -- currently contains 3-4 large chunks (separated by horisontal lines) that need to be merged or maybe some parts dropped altogether...*
@@ -24,6 +22,19 @@ in the Quarto document publishing system.
 ----
+reads a text document,
+parses its structural components into an Abstract Syntax Tree (AST),
+and serialises and writes back into a text document.
+The Pandoc AST deliberately prioritises structural information
+and is relaxed about visual information,
+to preserve literal content
+while reducing format-specific stylistic details,
+relevant especially when processing between different formats.
+Most common is to read plaintext Markdown files
+and write LaTeX code further compiled into a PDF file.
+
+----
+
 The Markdown processor Pandoc can transform Markdown not only to HTML but also to other output formats like PDF. Pandoc offers an API for adapting its content processing
@@ -334,6 +334,14 @@
 urldate = {2025-05-23},
 }
+@Online{MacFarlane2025,
+ author = {John {MacFarlane}},
+ date = {2025-05-17},
+ title = {Pandoc Lua Filters},
+ url = {https://pandoc.org/lua-filters.html},
+ urldate = {2025-05-23},
+}
+
 @Comment{jabref-meta: databaseType:biblatex;}
 @Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}
@@ -112,7 +112,7 @@ are editorial notes not intended for inclusion in the final delivery.*
 \appendix
-# Pandoc plugin `semantic-markdown` {.appendix}
+# Pandoc filter `semantic-markdown` {.appendix}
 ```{.lua include="_extensions/ruc-play/semantic-markdown/semantic-markdown.lua" code-line-numbers="true"}
 ```
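
Note on the JSON interface mentioned in the `_filter.qmd` changes: the added text contrasts Lua filters with filters that exchange Pandoc's AST as JSON. For illustration only, and not part of this commit, the following minimal Python sketch shows the shape such a JSON filter might take: it parses a serialised document, upper-cases every `Str` element and serialises the result again. The names `walk`, `shout`, `run_filter` and `SAMPLE` are hypothetical, and the sample document is a hand-written assumption about a very small AST.

```python
import json

# Hand-written sample of Pandoc's JSON AST for the paragraph "zero is"
# (an assumption for illustration; real documents carry far more structure).
SAMPLE = json.dumps({
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "zero"},
            {"t": "Space"},
            {"t": "Str", "c": "is"},
        ]},
    ],
})


def walk(node, action):
    """Recursively apply `action` to every tagged element in the AST."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:  # tagged elements carry their type in the "t" field
            node = action(node)
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers and other leaves pass through unchanged


def shout(element):
    """Hypothetical action: upper-case the text of Str elements."""
    if element.get("t") == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def run_filter(text):
    """Parse a serialised document, transform its blocks, serialise it again."""
    document = json.loads(text)
    document["blocks"] = walk(document["blocks"], shout)
    return json.dumps(document)


print(run_filter(SAMPLE))
```

Wired to standard input and output, for example with `sys.stdout.write(run_filter(sys.stdin.read()))`, a script along these lines could be passed to Pandoc with `--filter`. The full parse and serialise round trip in `run_filter` is the step that, according to the text above, Lua filters do not need.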
