TODO: chapter overview

Misparsing and then cleaning up {#sec-misparsing}

Pandoc offers two ways to implement a syntax extension to Markdown: an import extension, or a filter that transforms its Abstract Syntax Tree (AST). This project takes the latter approach, which may seem an odd choice at first.

TODO: About the approach of parsing as Markdown and adjust the fallout, instead of writing an import extension.

Choice of Lua

This project is implemented in the scripting language Lua.

Pandoc filters can be written in any general-purpose language. Pandoc serialises its AST to JSON and parses the filter's JSON output back, so any program that reads and writes JSON can act as a filter. Some filters are written in Haskell using the same libraries as Pandoc itself, and others are implemented in Python or Perl.
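To illustrate the JSON interface, here is a minimal sketch of a JSON filter in Python that uses only the standard library. The file name `upper.py` and the upper-casing transformation are hypothetical and unrelated to this project. The sketch only shows the round trip: Pandoc writes the AST as JSON to the filter's standard input, the filter rewrites it, and writes the result back for Pandoc to parse. In this encoding, AST elements appear as objects with a type tag `t` and, where present, contents `c`.

```python
#!/usr/bin/env python3
"""Sketch of a Pandoc JSON filter using only the Python standard library.

Pandoc serialises the document AST to JSON on the filter's stdin and parses
the (possibly modified) JSON that the filter writes to stdout.
"""
import json
import sys


def transform(node):
    """Recursively walk a JSON-decoded Pandoc AST, upper-casing Str elements."""
    if isinstance(node, list):
        return [transform(item) for item in node]
    if isinstance(node, dict):
        # Elements are encoded as {"t": <type>, "c": <contents>};
        # a Str element's contents is the word itself.
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: transform(value) for key, value in node.items()}
    return node  # plain strings, numbers, booleans and null stay as they are


def main(infile, outfile):
    doc = json.load(infile)
    # Only the body is transformed; "meta" and "pandoc-api-version" pass through.
    doc["blocks"] = transform(doc["blocks"])
    json.dump(doc, outfile)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pandoc passes the target output format (e.g. "html") as the first
    # argument; requiring it stops a bare `python upper.py` from waiting on stdin.
    main(sys.stdin, sys.stdout)
```

Assuming the file is executable, it could be run as `pandoc --filter ./upper.py input.md -o output.html`. For comparison, the same transformation as a Lua filter is a single function, `function Str(el) return pandoc.Str(el.text:upper()) end`, passed with `--lua-filter` and run inside Pandoc without any JSON round trip.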

In recent years, Pandoc has embedded a Lua interpreter, so Lua filters run inside the Pandoc process without full serialisation and re-parsing of the AST and without system calls. In one simple test, a Lua filter added an overhead of 2%, compared with 35% for a Haskell-based filter using the JSON interface [@MacFarlane2025, section "Introduction"].

Efficiency is welcome, but user convenience is the more important concern in this project. Experience with the Pandoc-based Quarto framework suggests that non-Lua filters often require extra workarounds, such as placing the filter in a specific directory, adding a symbolic link or passing its full path, whereas Lua filters usually work when given only a relative path. That issue might be specific to Quarto, but even setting it aside, Lua filters are quite common, and the documentation for writing them is more detailed than that for the legacy JSON-based interface.

Components

TODO: First parse Namespace blocks, then AnnotationWords

Tracking enclosure states

TODO: Details of parsing AnnotationWords through correlating Pandoc AST with 4 enclosure states