Diffstat (limited to '_pandoc.qmd')
-rw-r--r-- _pandoc.qmd 110
1 file changed, 35 insertions, 75 deletions
diff --git a/_pandoc.qmd b/_pandoc.qmd
index 95ba8e3..5909818 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -1,69 +1,12 @@
-*FIXME: This chapter is unfinished --
-currently contains 3-4 large chunks (separated by horizontal lines)
-that need to be merged or maybe some parts dropped altogether...*
-This chapter will provide an analysis
-of the Markdown processor Pandoc.
-
-Many dialects of Markdown have evolved,
-some tightening the language for parsing efficiency and disambiguation,
-some extending to cover additional structures,
-and some including support for a metadata header section.
-
-Pandoc is a tool that can convert texts in Markdown dialects
-into many document formats including HTML and (via LaTeX) PDF,
-applying visual style and positioning through templates.
-Such document workflows,
-including minimal structural markup as part of the creative writing,
-applying visual layout as an automated templating process
-tied to a target document format,
-have been further streamlined for academic texts
-in the Quarto document publishing system.
-
-----
-
-reads a text document,
-parses its structural components into an AST,
-and serialises and writes back into a text document.
-The Pandoc AST deliberately prioritises structural information
-and is relaxed about visual information,
-to preserve literal content
-while reducing format-specific stylistic details,
-relevant especially when processing between different formats.
-The most common use
-is to read plaintext Markdown files and write LaTeX code,
-which is then compiled into a PDF file.
-
-----
-
-The Markdown processor Pandoc can transform Markdown not only to HTML
-but also to other output formats like PDF.
-Pandoc offers an API for adapting its content processing
-as well as a templating structure for customizing layout,
-which is streamlined in the document authoring framework Quarto:
-Pandoc with a set of plugins and templates
-enables rendering of scholarly papers
-conforming to prescribed style guides and document formats.
-
----
-
-Pandoc is a document converter built around the Markdown markup language,
+Pandoc is a versatile and extensible document converter,
able to parse from and serialise to many Markdown dialects
as well as equivalent subsets of other text markup languages
-including HTML, LaTeX (and by extension PDF),
-Office Open XML (as used by recent releases of Microsoft Word),
-and OpenDocument (as used by OpenOffice and LibreOffice).
-
-Pandoc supports redefining input and output formats
-and manipulating the internal document structure
-as part of the automated parts of the framework.
+including HTML and (via LaTeX or other intermediaries) PDF.
+Pandoc offers several APIs for adapting its content processing
+as well as a templating structure for styling the output.
-Pandoc is extensible.
-Source and output format can be changed or completely redefined
-and the internal document structure manipulated,
-in the automated parts of the framework.
-
-Collection of interrelated POSIX scripts and Pandoc extensions
-for enabling semantic annotations in Markdown-based authoring workflows.
+This chapter provides an analysis of Pandoc,
+with a focus on ways to extend it to support a new Markdown extension.
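As a rough illustration of the filtering API mentioned above, the following is a minimal sketch of a filter using Pandoc's JSON protocol: with `pandoc --filter ./filter.py`, Pandoc passes the document AST as JSON on standard input and reads the modified AST back from standard output. The sketch uses only the Python standard library; the helper names (`walk`, `filter_json`, `emphasis_to_strong`) are hypothetical, and the Emph-to-Strong rewrite is only a placeholder transformation.

```python
import json
import sys


def walk(node, action):
    """Rebuild `node` bottom-up, applying `action` to every AST element.

    An element is a dict with a "t" (type) key. `action` returns a
    replacement element, or None to keep the element unchanged.
    The input is not mutated; a new tree is returned.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        if "t" in rebuilt:
            replacement = action(rebuilt)
            if replacement is not None:
                return replacement
        return rebuilt
    return node  # strings and numbers are leaf values


def emphasis_to_strong(elem):
    """Placeholder transformation: turn every Emph inline into Strong."""
    if elem["t"] == "Emph":
        return {"t": "Strong", "c": elem["c"]}
    return None


def filter_json(text, action):
    """Apply `action` to the blocks of a Pandoc JSON document string."""
    doc = json.loads(text)
    doc["blocks"] = walk(doc["blocks"], action)
    return json.dumps(doc)


def main():
    # Pandoc writes the AST to our stdin and reads the result from stdout.
    sys.stdout.write(filter_json(sys.stdin.read(), emphasis_to_strong))
```

Calling `main()` at the end of the script (for example under `if __name__ == "__main__":`) turns it into an executable filter. A real filter would usually also walk `doc["meta"]`; libraries such as pandocfilters or panflute provide tested versions of this traversal.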
## An AST in the spirit of Markdown
@@ -146,17 +89,34 @@ The author may then need to violate the separation of concerns
often done by adding compositional information
to the bibliographic metadata.
-Pandoc generally maintains a separation of concerns
+## The AST filter API is effectively more versatile {#sec-pandoc-filter-versatile}
+
+Pandoc strives to maintain a separation of concerns
between content, structure and layout,
-in a workflow of isolated functions and APIs
-for import, filtering and export,
-but cannot guarantee this separation.
-While Pandoc offers an import extension API
-for adding support for new source formats,
-the formats implemented in Pandoc itself are maintained in sync
-with other parts of the workflow,
-ensuring a tighter degree of coordination than the API can offer.
-Such potential for the Pandoc import API
-being inferior to builtin source format parsers
-has influenced the choice of implementation for this project,
+offering separate APIs for import, filtering and export,
+but since some functionality crosses those boundaries,
+this separation is not guaranteed,
+and complex features are more likely to work correctly
+with the most commonly used Reader and Writer components.
+
+This problem is exacerbated when using wrapper tools,
+since their extensions are likely tailored to those same
+most commonly used Reader and Writer components.
+The Quarto framework, specifically,
+allows using any Reader, just as Pandoc itself does,
+but Quarto-specific macro expansions are only available
+when using its default Reader --
+a custom derivative of the Pandoc default Markdown Reader.
+
+Although the Reader API is tailored specifically
+for implementing custom source format parsers,
+its use is effectively discouraged
+whenever integration with complex features matters,
+whether within Pandoc itself or in wrapper tools.
+
+Since a core principle of Markdown is to treat unknown markup as content,
+a viable alternative for an extended Markdown format
+is to use a simpler Markdown Reader and then "finish up" the parsing
+in a filter.
+This is the approach chosen for this project,
as covered next in @sec-filter.
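The "finish up" approach can be sketched as a second pass over Pandoc's JSON AST. The annotation syntax `{{term}}` and the `annotation` class below are hypothetical stand-ins, not this project's actual markup, and the sketch assumes the Markdown reader has left each such token inside a single `Str` element (in practice, adjacent `Str` runs may need merging first). Matching tokens are replaced by a `Span` whose attributes use the `[identifier, classes, key-value pairs]` triple of Pandoc's JSON serialisation.

```python
import re

# Hypothetical annotation syntax, for illustration only: "{{term}}",
# assumed to survive the plain Markdown reader as literal Str text.
ANNOTATION = re.compile(r"\{\{([^{}]+)\}\}")


def split_str(text):
    """Split one Str's text into Str and Span inlines around annotations."""
    inlines = []
    pos = 0
    for match in ANNOTATION.finditer(text):
        if match.start() > pos:
            inlines.append({"t": "Str", "c": text[pos:match.start()]})
        label = match.group(1)
        # Span content is [Attr, [Inline]]; Attr is [id, classes, key-values].
        inlines.append({
            "t": "Span",
            "c": [["", ["annotation"], [["label", label]]],
                  [{"t": "Str", "c": label}]],
        })
        pos = match.end()
    if pos < len(text):
        inlines.append({"t": "Str", "c": text[pos:]})
    return inlines


def finish_parsing(node):
    """Rebuild the tree, expanding annotated Str elements in every list."""
    if isinstance(node, list):
        result = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Str":
                # One Str may become several inlines, so splice them in.
                result.extend(split_str(item["c"]))
            else:
                result.append(finish_parsing(item))
        return result
    if isinstance(node, dict):
        return {key: finish_parsing(value) for key, value in node.items()}
    return node  # strings and numbers are leaf values
```

Wired up as a JSON filter (for example `json.dump(finish_parsing(json.load(sys.stdin)), sys.stdout)`), the pass hands every downstream Writer an ordinary `Span`, so no format-specific support for the new markup is needed on the output side.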