Diffstat (limited to '_pandoc.qmd')
| -rw-r--r-- | _pandoc.qmd | 110 |
1 files changed, 35 insertions, 75 deletions
diff --git a/_pandoc.qmd b/_pandoc.qmd
index 95ba8e3..5909818 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -1,69 +1,12 @@
-*FIXME: This chapter is unfinished --
-currently contains 3-4 large chunks (separated by horisontal lines)
-that need to be merged or maybe some parts dropped altogether...*
-This chapter will provide an analysis
-of the Markdown processor Pandoc.
-
-Many dialects of Markdown have evolved,
-some tightening the language for parsing efficiency and disambiguation,
-some extending to cover additional structures,
-and some including support for a metadata header section.
-
-Pandoc is a tool that can convert texts in Markdown dialects
-into many document formats including HTML and (via LaTeX) PDF,
-applying visual style and positioning throught templates.
-Such document workflows,
-including minimal structural markup as part of the creative writing,
-applying visual layout as an automated templating process
-tied to a target document format,
-have been further streamlined for academic texts
-in the Quarto document publishing system.
-
------
-
-reads a text document,
-parses its structural components into an AST,
-and serialises and writes back into a text document.
-The Pandoc AST deliberately prioritises structural information
-and is relaxed about visual information,
-to preserve literal content
-while reducing format-specific stylistic details,
-relevant especially when processing between different formats.
-The most common use
-is to read plaintext Markdown files and write LaTeX code,
-which is then compiled into a PDF file.
-
------
-
-The Markdown processor Pandoc can transform Markdown not only to HTML
-but also to other output formats like PDF.
-Pandoc offers an API for adapting its content processing
-as well as a templating structure for customizing layout,
-which is streamlined in the document authoring framework Quarto:
-Pandoc with a set of plugins and templates
-enables rendering of scholarly papers
-conforming to prescribed style guides and document formats.
-
----- 
-
-Pandoc is a document converter built around the Markdown markup language,
+Pandoc is a versatile and extensible document converter,
 able to parse from and serialise to many Markdown dialects
 as well as equivalent subsets of other text markup languages
-including HTML, LaTeX (and by extension PDF),
-Office Open XML (as used by recent releases of Microsoft Word),
-and OpenDocument (as used by OpenOffice and LibreOffice).
-
-Pandoc supports redefining input and output formats
-and manipulating the internal document structure
-as part of the automated parts of the framework.
+including HTML and (via LaTeX or other intermediaries) PDF.
+Pandoc offers several APIs for adapting its content processing
+as well as a templating structure for styling of output.
 
-Pandoc is extensible.
-Source and output format can be changed or completely redefined
-and the internal document structure manipulated,
-in the automated parts of the framework.
-
-Collection of interrelated POSIX scripts and Pandoc extensions
-for enabling semantic annotations in Markdown-based authoring workflows.
+This chapter provides an analysis of Pandoc,
+with a focus on ways to extend it to support a new Markdown extension.
 
 ## An AST in the spirit of Markdown
 
@@ -146,17 +89,34 @@ The author may then need to violate the separation of concerns
 often done by adding compositional information
 to the bibliographic metadata.
 
-Pandoc generally maintain a separation of concern
+## The AST filter API is effectively more versatile {#sec-pandoc-filter-versatile}
+
+Pandoc strives to maintain a separation of concern
 between content, structure and layout,
-in a workflow of isolated functions and APIs
-for import, filtering and export,
-but cannot guarantee this separation.
-While Pandoc offers an import extension API
-for adding support for new source formats,
-the formats implemented in Pandoc itself are maintained in sync
-with other parts of the workflow,
-ensuring a tighter degree of coordination than the API can offer.
-Such potential for the Pandoc import API
-being inferior to builtin source format parsers
-has influenced the choice of implementation for this project,
+offering separate APIs for import, filtering and export,
+but since some functionality cross those boundaries,
+this separation is not guaranteed,
+and complex features are more likely to work correctly
+with the most commonly used Reader and Writer components.
+
+This is exacerbated when using wrapper tools
+since their extensions are likely tailored to those same
+most commonly used Reader and Writer components.
+The Quarto framework, specifically,
+allows using any Reader same as Pandoc itself,
+but only when using default Reader --
+a custom derivative of the Pandoc default Markdown Reader --
+do you benefit from Quarto-specific macro expansions.
+
+Despite the Reader API being specifically tailored
+for implementing a custom source format parser,
+when integration with complex functions is relevant,
+either within Pandoc itself or part of wrapper tools,
+that API effectively is discouraged.
+
+Since a core principle of Markdown is treat unknown markup as content,
+a viable alternative for an extended Markdown format
+is to use a simpler Markdown Reader and then "finish up" the parsing
+in a filter.
+This is the approach chosen for this project,
 as covered next in @sec-filter.
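
Note on the last added paragraph: the idea of letting a stock Markdown Reader pass unknown markup through as content and then "finishing up" the parsing in a filter can be illustrated with a small JSON filter. The sketch below is not the filter used by this project. The `%%word%%` syntax, the `annotation` class and the function names are hypothetical, chosen only for illustration. It assumes the Reader delivers the marked-up word as a single `Str` inline and relies on Pandoc's JSON AST serialisation, where every element is an object with a `t` tag and optional `c` contents.

```python
"""Sketch of a Pandoc JSON filter that completes parsing after the Reader.

Assumption (not from the commit): a hypothetical extension syntax %%word%%
that the Markdown Reader leaves untouched inside a Str inline.
"""
import json
import re
import sys

# Hypothetical markup: a whole Str token wrapped in double percent signs.
MARK = re.compile(r"^%%(.+)%%$")


def finish_up(node):
    """Recursively walk a Pandoc JSON AST and rewrite matching Str nodes."""
    if isinstance(node, list):
        return [finish_up(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            match = MARK.match(node["c"])
            if match:
                # Attr is serialised as [identifier, classes, key-value pairs].
                attr = ["", ["annotation"], []]
                inner = [{"t": "Str", "c": match.group(1)}]
                return {"t": "Span", "c": [attr, inner]}
            return node
        return {key: finish_up(value) for key, value in node.items()}
    # Strings, numbers and other leaves are returned unchanged.
    return node


def main():
    """Read a JSON AST from stdin and write the transformed AST to stdout."""
    document = json.load(sys.stdin)
    json.dump(finish_up(document), sys.stdout)

# When saved as an executable script, call main() here so that it can be
# used with pandoc's --filter option.
```

With a trailing `main()` call added and the script saved under a hypothetical name such as `finish_up.py`, it could be run as `pandoc --from=markdown --to=html --filter ./finish_up.py input.md`. A token with adjacent punctuation, such as `%%word%%.`, would not match this simple pattern, which is one reason a real filter needs more careful tokenisation.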
