author Jonas Smedegaard <dr@jones.dk> 2025-05-23 19:10:23 +0200
committer Jonas Smedegaard <dr@jones.dk> 2025-05-23 19:10:23 +0200
commit 1a5c98ed1478a271e277cb67533b4f37d7c2650c
tree 7f2469d0e53ca4ca48e40defe70bb3238395d1c3
parent b34efac5a971cfb8a2fb4f4842f9a16b9d036f51
git commit -m misc updates
-rw-r--r--  Makefile  2
-rw-r--r--  _conclusion.qmd  26
-rw-r--r--  _extensions/ruc-play/semantic-markdown/semantic-markdown.lua  2
-rw-r--r--  _filter.qmd  49
-rw-r--r--  _pandoc.qmd  15
-rw-r--r--  ref.bib  8
-rw-r--r--  report.qmd  2
7 files changed, 97 insertions, 7 deletions
diff --git a/Makefile b/Makefile
index 04528a3..0ced723 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ PDF_DOCUMENTS = _site/report.pdf
include _make/*.mk
-DOCUMENT_APPENDIX_REGEX = Pandoc plugin semantic-markdown
+DOCUMENT_APPENDIX_REGEX = Pandoc filter semantic-markdown
FILTER = _extensions/ruc-play/semantic-markdown/semantic-markdown.lua
diff --git a/_conclusion.qmd b/_conclusion.qmd
index a308337..5e7c9ec 100644
--- a/_conclusion.qmd
+++ b/_conclusion.qmd
@@ -2,16 +2,40 @@
## Perspectives
+The central design choice of this project,
+namely its implementation as a filter,
+would be interesting to investigate.
+
This work is part of a series of works
to integrate semantic text annotations with creative textual interactions.
Concretely building upon this work are projects extending the same codebase
to support extracting or converting annotations.
-Beyond these concrete projects,
+
+Also,
+beyond this concrete project and its planned extensions,
the ability to author semantic text annotations and process them
allows for improvements in related workflows,
including interactive collaborative authoring
and streamlining of automated document layout.
+### Implementation as import extension
+
+This project has been implemented as a cleanup filter
+for a misparsing of semantic Markdown as regular Markdown
+(see @sec-misparsing).
+This approach was chosen on the assumption
+that it fits better with existing uses of Pandoc,
+notably integrated with the Quarto framework.
+
+An interesting task would be to challenge that assumption,
+by implementing the project as a Pandoc import extension
+and comparing usability and efficiency.
+It might also be interesting to compare code reliability:
+Although the code size might be larger,
+an import extension would potentially contain far fewer quirks
+because it fundamentally iterates through well-balanced enclosures
+rather than fixing breakage as the misparser cleanup filter does.
+
### This work as basis of others
Authors interested in authoring with semantic annotations
diff --git a/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua b/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
index 6c3dc7e..abdb078 100644
--- a/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
+++ b/_extensions/ruc-play/semantic-markdown/semantic-markdown.lua
@@ -1,4 +1,4 @@
---- semantic-markdown - Pandoc plugin to process semantic hints
+--- semantic-markdown - Pandoc filter to process semantic hints
---
--- SPDX-FileCopyrightText: 2025 Jonas Smedegaard <dr@jones.dk>
--- SPDX-License-Identifier: GPL-3.0-or-later
diff --git a/_filter.qmd b/_filter.qmd
index b7434d4..60359ad 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -1,9 +1,56 @@
*TODO: chapter overview*
+## Misparsing and then cleaning up {#sec-misparsing}
+
+Pandoc offers two ways to implement a syntax extension to Markdown:
+either an import extension or a filter for its Abstract Syntax Tree (AST).
+This project went with the latter approach,
+which may seem like an odd choice at first.
+
+*TODO: About the approach of parsing as Markdown and adjusting the fallout,
+instead of writing an import extension.*
+
+## Choice of Lua
+
+This project is implemented in the scripting language Lua.
+
+Pandoc filters can be written in any general-purpose language.
+Pandoc provides a JSON serialisation of its AST for this purpose:
+some filters are written in Haskell using the same libraries as Pandoc itself,
+and others are implemented in Python or Perl.
+Any language able to parse and emit that JSON can be used,
+allowing for even wider creativity in filter implementations.
+
+In recent years, Pandoc has embedded a Lua interpreter
+and offers the alternative of processing Lua filters
+without the need for full serialisation and parsing or for making system calls.
+In one simple test, a Lua implementation had an overhead of 2%
+compared to 35% for a Haskell-based implementation via the JSON interface
+[@MacFarlane2025, section "Introduction"].
+
+While efficiency might be nice,
+user convenience is important in this project.
+General experience using the Pandoc-based Quarto framework indicates
+that non-Lua filters often require additional quirks
+like placement in a specific directory, adding a symbolic link
+or passing the full path,
+whereas Lua filters usually work when given only a relative path.
+That issue might be specific to the Quarto framework,
+but even that aside, Lua-based filters are quite common
+and the documentation for writing them is more detailed
+than that for the legacy JSON-based interface.
+
+Programming in Lua has been a new experience
+but with a fairly gentle learning curve,
+after discovering (after a day of debugging) that,
+unlike in most dynamically typed languages,
+zero is a true value in a boolean context.
+
## Components
*TODO: First parse Namespace blocks, then AnnotationWords*
## Tracking enclosure states
-*TODO: Details of parsing AnnotationWords*
+*TODO: Details of parsing AnnotationWords
+through correlating Pandoc AST with 4 enclosure states*
diff --git a/_pandoc.qmd b/_pandoc.qmd
index f81ea80..6f3653c 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -1,5 +1,3 @@
-*FIXME: Focus on the oddity of correlating Pandoc AST with 4 enclosure states*
-
*FIXME: This chapter is unfinished --
currently contains 3-4 large chunks (separated by horisontal lines)
that need to be merged or maybe some parts dropped altogether...*
@@ -24,6 +22,19 @@ in the Quarto document publishing system.
----
+reads a text document,
+parses its structural components into an Abstract Syntax Tree (AST),
+and serialises and writes back into a text document.
+The Pandoc AST deliberately prioritises structural information
+and is relaxed about visual information,
+to preserve literal content
+while reducing format-specific stylistic details,
+relevant especially when processing between different formats.
+The most common use is to read plaintext Markdown files
+and write LaTeX code that is further compiled into a PDF file.
+
+----
+
The Markdown processor Pandoc can transform Markdown not only to HTML
but also to other output formats like PDF.
Pandoc offers an API for adapting its content processing
diff --git a/ref.bib b/ref.bib
index f5b29a9..53037b3 100644
--- a/ref.bib
+++ b/ref.bib
@@ -334,6 +334,14 @@
urldate = {2025-05-23},
}
+@Online{MacFarlane2025,
+ author = {John {MacFarlane}},
+ date = {2025-05-17},
+ title = {Pandoc Lua Filters},
+ url = {https://pandoc.org/lua-filters.html},
+ urldate = {2025-05-23},
+}
+
@Comment{jabref-meta: databaseType:biblatex;}
@Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}
diff --git a/report.qmd b/report.qmd
index 19eb3b9..cdd026e 100644
--- a/report.qmd
+++ b/report.qmd
@@ -112,7 +112,7 @@ are editorial notes not intended for inclusion in the final delivery.*
\appendix
-# Pandoc plugin `semantic-markdown` {.appendix}
+# Pandoc filter `semantic-markdown` {.appendix}
```{.lua include="_extensions/ruc-play/semantic-markdown/semantic-markdown.lua" code-line-numbers="true"}
```
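
Note on the `_filter.qmd` hunk: the remark that zero is a true value in a boolean context can be illustrated with a minimal sketch in plain Lua. It is not part of this commit, and the helper names `truthy` and `has_items` are hypothetical, introduced only for illustration. The sketch assumes a standard Lua interpreter and does not depend on Pandoc.

```lua
-- Minimal sketch: in Lua only `nil` and `false` are falsy.
-- `truthy` and `has_items` are hypothetical helpers for illustration.
local function truthy(value)
  if value then
    return true
  end
  return false
end

print(truthy(0))      -- true: zero is truthy in Lua
print(truthy(""))     -- true: so is the empty string
print(truthy(nil))    -- false
print(truthy(false))  -- false

-- Consequence: a numeric count must be compared explicitly,
-- since `if count then` would also succeed for a count of zero.
local function has_items(count)
  return count > 0
end

print(has_items(0))   -- false
print(has_items(3))   -- true
```

A filter that tracks counters or indices would probably want explicit comparisons like the one in `has_items` rather than relying on the bare value in a condition.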