| -rw-r--r-- | _filter.qmd | 10 |
| -rw-r--r-- | _pandoc.qmd | 70 |
2 files changed, 74 insertions, 6 deletions
diff --git a/_filter.qmd b/_filter.qmd
index af38d88..2e6b785 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -74,5 +74,13 @@ is needed for processing KeyWords.
 
 ## Keeping track of enclosure states
 
+The main part of the filter is the callback function `Statements()`,
+called by Pandoc for each block of the AST.
+The function iterates through each content element of the block,
+keeping track of whether it is free from annotation-related enclosures,
+or is in an enclosure for the content part or the annotation part.
+
+
+
 *TODO: Details of cleaning up KeyWords
-through correlating Pandoc AST with 4 enclosure states*
+through correlating Pandoc AST with 5 enclosure states*
diff --git a/_pandoc.qmd b/_pandoc.qmd
index 5909818..e6595de 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -8,10 +8,11 @@ as well as a templating structure for styling of output.
 This chapter provides an analysis of Pandoc,
 with a focus on ways to extend it to support a new Markdown extension.
 
-## An AST in the spirit of Markdown
+## Separation of concerns
 
 The data processing workflow of Pandoc
-is shaped in the spirit of Markdown.
+is shaped in the spirit of Markdown
+(see @sec-spirit).
 
 The Pandoc AST evolved from the needs of tracking Markdown.
 It mainly keeps track of structure of content,
@@ -38,7 +39,7 @@ The styling of the target document is isolated as one or more templates.
 Parsing of source content and rendering of target document
 are both isolated as sets of Reader and Writer routines.
 All three --
-template, Reader and Writer --
+template, import and export --
 can be superseded at runtime,
 either by passing command-line options
 or by declaring options in document-wide metadata in the source file
@@ -48,7 +49,7 @@ or by declaring options in document-wide metadata in the source file
 "**custom** import -> AST, { AST, **custom** template } -> **custom** export"
 customizable workflow components*
 
-The AST itself is also exposed as a so-called filter API,
+The AST itself is also exposed through a filter API,
 allowing for intervening in the data processing workflow
 after source import and before target export,
 by directly mangling the AST.
@@ -57,6 +58,62 @@ by directly mangling the AST.
 "import -> AST -> **filter**; { **filter**, template } -> export"
 workflow components with a filter applied*
 
+## The AST segments content into smallest type {#sec-pandoc-ast}
+
+The Pandoc AST contains objects,
+each representing an element of the content --
+some object types containing lists of other element objects,
+and some themselves holding a string of content.
+
+A Reader function breaks content into element objects,
+e.g. the "Hello, world!" example @fig-hello
+(ignoring the metadata part)
+gets broken down into these four elements:
+
+* `Pandoc` block, containing two block objects
+  * `Header` block, containing one inline object
+    * `Str` inline, representing the string "Greeting"
+  * `Para` block, containing three inline objects
+    * `Str` inline, representing the string "Hello,"
+    * `Space` inline, representing the string " "
+    * `Str` inline, representing the string "world!"
+
+This structure has similar traits
+as the Document Object Model (DOM) for web pages
+(which is not surprising,
+given that Markdown is a superset of HTML).
+Breaking inline strings at each space
+is aligned with HTML behaviour for spaces,
+where a line break requires explicit markup
+and any amount of simple space is treated as one soft-break,
+which means that it behaves as either a single word delimiting space
+or as a line break if needed by the rendering medium.
+Grouping content into sets of inline elements within block elements
+is aligned with HTML distinction between blocks and inlines,
+where content is either part of a relatively loose string
+flowing within the constraints of a block
+(inline elements)
+or itself a block (block elements).
+
+*FIXME: below should cover
+how import and filter APIs ideally and possibly interact with the AST*
+
+The Pandoc filter API,
+which exposes the full AST,
+is efficient and intuitive for example to traverse all blocks
+(omitting inlines) and capitalize each word within each Headline,
+or to only traverse inlines and replace "http://" with "https://"
+in the link target of all link elements.
+
+Since the full AST is available,
+the filter API can also be used to reorganise content,
+e.g. to redefine plain text as belonging to a link,
+or vice versa,
+but if content is "put into the wrong boxes"
+(for example if parsed wrongly as is done in this project),
+then the organisation of the AST
+can become more of an obstacle than an optimization.
+
 ## Complex features {#sec-pandoc-complex}
 
 Some functionalities of Pandoc hook into multiple components
@@ -89,7 +146,7 @@ The author may then need to violate the separation of concerns
 often done by adding compositional information
 to the bibliographic metadata.
 
-## The AST filter API is effectively more versatile {#sec-pandoc-filter-versatile}
+## The filter API is either efficient or versatile {#sec-pandoc-filter-versatile}
 
 Pandoc strives to maintain a separation of concern
 between content, structure and layout,
@@ -120,3 +177,6 @@ is to use a simpler Markdown Reader
 and then "finish up" the parsing in a filter.
 This is the approach chosen for this project,
 as covered next in @sec-filter.
+
+*FIXME: also mention (reiterate) when wrapping up above
+that the filter API is more cumbersome*
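
Note on the text added to _pandoc.qmd: the element breakdown and the "http://" to "https://" rewrite it describes can be sketched against Pandoc's JSON serialisation of the AST. This is only an illustration in Python using the standard library. It is not the filter from _filter.qmd (which this diff does not show). The document literal, the appended link, the API version value, and the helper names `walk`, `secure_links` and `element_types` are all assumptions made for the example.

```python
import json

# Hand-written JSON form of the AST for a "Greeting" header followed by
# "Hello, world!". The trailing Space and Link are hypothetical additions
# so that the link rewrite has something to act on.
DOC = {
    "pandoc-api-version": [1, 23, 1],  # illustrative value only
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["greeting", [], []],
                              [{"t": "Str", "c": "Greeting"}]]},
        {"t": "Para", "c": [
            {"t": "Str", "c": "Hello,"},
            {"t": "Space"},
            {"t": "Str", "c": "world!"},
            {"t": "Space"},
            {"t": "Link", "c": [["", [], []],
                                [{"t": "Str", "c": "site"}],
                                ["http://example.org/", ""]]},
        ]},
    ],
}


def walk(node, action):
    """Call action(element) on every tagged element, depth first."""
    if isinstance(node, dict):
        if "t" in node:
            action(node)
        for value in node.values():
            walk(value, action)
    elif isinstance(node, list):
        for item in node:
            walk(item, action)


def secure_links(element):
    """Rewrite a leading http:// to https:// in the target of a Link."""
    if element["t"] == "Link":
        target = element["c"][2]  # [url, title]
        if target[0].startswith("http://"):
            target[0] = "https://" + target[0][len("http://"):]


def element_types(doc):
    """List the type tag of every block and inline, in document order."""
    types = []
    walk(doc["blocks"], lambda el: types.append(el["t"]))
    return types


if __name__ == "__main__":
    walk(DOC, secure_links)
    print(element_types(DOC))
    print(json.dumps(DOC["blocks"][1]["c"][4]))
```

Used as an actual JSON filter, the same functions would read the document with `json.load(sys.stdin)` and write it back with `json.dump(doc, sys.stdout)`, since Pandoc passes the AST to such filters on standard input and reads the result from standard output.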
