From 54ccba54a75cf05b800efac1fa63e9a450a71afa Mon Sep 17 00:00:00 2001
From: Jonas Smedegaard
Date: Mon, 26 May 2025 15:06:51 +0200
Subject: misc content updates

---
 _filter.qmd   | 45 +++++++++++++++++++++++++--------------------
 _intro.qmd    | 11 ++++++++++-
 _markdown.qmd |  5 +++--
 ref.bib       |  9 +++++++++
 4 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/_filter.qmd b/_filter.qmd
index 051ef18..af38d88 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -7,32 +7,26 @@ either an import extension or a filter for its AST.
 This project chose the latter approach,
 which may initially seem unusual.
 
-*TODO: About the approach of parsing as Markdown and adjust the fallout,
-instead of writing an import extension.*
-
 As described in @sec-improve,
 of priority for this project is to improve existing tools
 rather than implement parallel competing ones,
 despite the latter potentially being easier to do or leading to a simpler product.
 
 Pandoc offers an API specifically for custom-implementing a source format
-(see @sec-pandoc-apis).
-In @sec-pandoc-complex
+described in @sec-pandoc-apis,
+but as pointed out in @sec-pandoc-filter-versatile,
+that interface limits uses of the implementation
+when combined with other extensions,
+notably those provided with the Quarto framework.
 
-*FIXME: tie pieces together, and continue from there
-with the consequence of it was actually tackled.
+Markdown leniently tolerates broken markup (see @sec-spirit) --
+what the parser cannot recognize as markup is simply treated as content.
+This project abuses that feature of Markdown
+to deliberately misparse Semantic Markdown as CommonMark at first,
+and then parse the misparsed content again using the filter API,
+adjusting to the extended syntax.
 
-
+*TODO: More details...*
 
 ## The choice of Lua
 
@@ -65,9 +59,20 @@ than the legacy JSON-based interface.
 
 ## Parsing tasks
 
-*TODO: First parse Namespace blocks, then AnnotationWords*
+The filter traverses the AST several times.
+First it processes PrefixDefinition blocks,
+and then sifts through all inline content
+cleaning up misparsed KeyWords.
+
+For this Minimum Viable Product (see @sec-phase1),
+dropping unneeded block-level elements before processing inline ones
+is slightly simpler and slightly more efficient.
+More importantly, however,
+for planned future work (see @sec-rdf)
+information gathered from PrefixDefinition blocks
+is needed for processing KeyWords.
 
 ## Keeping track of enclosure states
 
-*TODO: Details of parsing AnnotationWords
+*TODO: Details of cleaning up KeyWords
 through correlating Pandoc AST with 4 enclosure states*
diff --git a/_intro.qmd b/_intro.qmd
index 3730b43..2ed07de 100644
--- a/_intro.qmd
+++ b/_intro.qmd
@@ -86,7 +86,7 @@ by the following early design decisions:
 * The project solely involves freely licensed tools and resources,
   and is itself licensed under Free licences that encourage collaboration.
 
-### Scope limited to use for authoring
+### Scope limited to use for authoring {#sec-phase1}
 
 The scope of this project is to enable annotating while authoring.
 Rendering that makes use of annotations is outside this scope,
@@ -104,6 +104,15 @@ as illustrated in @fig-phase1.
 
 ### Added syntax in the spirit of Markdown {#sec-spirit}
 
+Markdown is a lenient markup language.
+Partly this is derived from Markdown being a superset of HTML,
+which is (or was, when Markdown was introduced) based on SGML
+that by design permits parsers to omit markup they cannot handle
+[@Connolly1994, section "SGML as a Layered Communications Medium"].
+Partly it is due to a deliberate design choice,
+as reflected in the quote introducing this paper:
+The format should be easy to both write and read in source form.
+
 *FIXME: rewrite as more fluent prose*
 
 In the source format of Markdown with annotations...
diff --git a/_markdown.qmd b/_markdown.qmd
index 10572e1..32934ee 100644
--- a/_markdown.qmd
+++ b/_markdown.qmd
@@ -1,5 +1,5 @@
-This chapter will introduce a grammar
-represented as visual diagrams,
+This chapter will first introduce a grammar
+translated into visual diagrams,
 and then provide two analyses of the markup language Markdown
 by use of that grammar.
 First an analysis of a widely used subset of the language
@@ -141,6 +141,7 @@ including at points with choices.
 
 ## Syntax of dialect CommonMark {#sec-commonmark}
 
+*TODO: is something missing here? Section starts oddly?*
 More specifically,
 the example @fig-hello
 contains two different types of blocks
diff --git a/ref.bib b/ref.bib
index 8567e06..66e22ff 100644
--- a/ref.bib
+++ b/ref.bib
@@ -379,6 +379,15 @@
   urldate = {2025-05-24},
 }
 
+@Online{Connolly1994,
+  author = {Daniel W. Connolly},
+  date = {1994-02-15},
+  title = {Toward a Formalism for Communication On the Web},
+  url = {https://www.w3.org/MarkUp/html-spec/html-essay.html},
+  organization = {{W3C}},
+  urldate = {2025-05-26},
+}
+
 @Comment{jabref-meta: databaseType:biblatex;}
 
 @Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}
-- 
cgit v1.2.3
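
The traversal order described in the new "Parsing tasks" text of `_filter.qmd` can be illustrated with a small sketch. It is written in Python over a toy AST and is neither the project's Lua filter nor Pandoc's API. The node names, the `{prefix:name}` keyword shape, and the functions `collect_prefixes` and `clean_inlines` are all assumptions made for illustration only.

```python
import re

# Hypothetical toy AST: each block is a (type, payload) tuple.
# Neither the node names nor the keyword pattern below come from Pandoc
# or from the project; they are assumptions for illustration only.
KEYWORD = re.compile(r"^\{(\w+):(\w+)\}$")  # assumed shape, e.g. "{ex:topic}"


def collect_prefixes(blocks):
    """Pass 1: gather prefix definitions and drop their blocks."""
    prefixes, kept = {}, []
    for kind, payload in blocks:
        if kind == "PrefixDefinition":
            name, iri = payload
            prefixes[name] = iri
        else:
            kept.append((kind, payload))
    return prefixes, kept


def clean_inlines(blocks, prefixes):
    """Pass 2: remove misparsed keyword tokens from inline content,
    resolving each against the prefixes gathered in pass 1."""
    cleaned, annotations = [], []
    for kind, words in blocks:
        text = []
        for word in words:
            match = KEYWORD.match(word)
            if match and match.group(1) in prefixes:
                annotations.append(prefixes[match.group(1)] + match.group(2))
            else:
                text.append(word)
        cleaned.append((kind, text))
    return cleaned, annotations


doc = [
    ("PrefixDefinition", ("ex", "https://example.org/")),
    ("Para", ["Hello", "{ex:topic}", "world"]),
]
prefixes, body = collect_prefixes(doc)
body, annotations = clean_inlines(body, prefixes)
```

Running the block-level pass first means the inline pass never visits the dropped definition blocks, and the prefix table is complete before any keyword is resolved, which mirrors the ordering argument made in the patch text.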