author Jonas Smedegaard <dr@jones.dk> 2025-05-26 15:06:51 +0200
committer Jonas Smedegaard <dr@jones.dk> 2025-05-26 15:06:51 +0200
commit 54ccba54a75cf05b800efac1fa63e9a450a71afa
tree 7d3d35159deabbd760251e247db1f5cd16cd8d3e
parent 0ebd5db4526e1112a2d2752b23ca1af8fca83af9
misc content updates
-rw-r--r-- _filter.qmd 45
-rw-r--r-- _intro.qmd 11
-rw-r--r-- _markdown.qmd 5
-rw-r--r-- ref.bib 9
4 files changed, 47 insertions, 23 deletions
diff --git a/_filter.qmd b/_filter.qmd
index 051ef18..af38d88 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -7,32 +7,26 @@ either an import extension or a filter for its AST.
This project chose the latter approach,
which may initially seem unusual.
-*TODO: About the approach of parsing as Markdown and adjust the fallout,
-instead of writing an import extension.*
-
As described in @sec-improve,
a priority for this project is to improve existing tools
rather than implement parallel competing ones,
despite the latter potentially being easier to do
or leading to a simpler product.
Pandoc offers an API specifically for custom-implementing a source format
-(see @sec-pandoc-apis).
-In @sec-pandoc-complex
+described in @sec-pandoc-apis,
+but as pointed out in @sec-pandoc-filter-versatile,
+that interface limits uses of the implementation
+when combined with other extensions,
+notably those provided with the Quarto framework.
-*FIXME: tie pieces together, and continue from there
-with the consequence of it was actually tackled.
+Markdown leniently tolerates broken markup (see @sec-spirit) --
+what the parser cannot recognize as markup is simply treated as content.
+This project abuses that feature of Markdown
+to deliberately misparse Semantic Markdown as CommonMark at first,
+and then parse the misparsed content again using the filter API,
+adjusting to the extended syntax.
-<!--
-The easiest way to implement a derivation of Markdown
-is likely by writing a Pandoc import filter,
-but that would limit its usefulness with larger frameworks like Quarto.
-Although they do support alternative source languages,
-important features such as citation handling are far better streamlined
-when using the main and best-supported source format.
-This project therefore implements its deviation of Markdown
-by reading the deviant source as if it were Markdown,
-and then applying a filter to adjust the AST after the deliberate misparsing.
--->
+*TODO: More details...*
## The choice of Lua
@@ -65,9 +59,20 @@ than the legacy JSON-based interface.
## Parsing tasks
-*TODO: First parse Namespace blocks, then AnnotationWords*
+The filter traverses the AST several times.
+First it processes PrefixDefinition blocks,
+and then sifts through all inline content
+cleaning up misparsed KeyWords.
+
+For this Minimum Viable Product (see @sec-phase1),
+dropping unneeded block-level elements before processing inline ones
+is slightly simpler and slightly more efficient.
+More importantly, however,
+for planned future work (see @sec-rdf)
+information gathered from PrefixDefinition
+is needed for processing KeyWords.
## Keeping track of enclosure states
-*TODO: Details of parsing AnnotationWords
+*TODO: Details of cleaning up KeyWords
through correlating Pandoc AST with 4 enclosure states*
diff --git a/_intro.qmd b/_intro.qmd
index 3730b43..2ed07de 100644
--- a/_intro.qmd
+++ b/_intro.qmd
@@ -86,7 +86,7 @@ by the following early design decisions:
* The project solely involves freely licensed tools and resources,
and is itself licensed under Free licences that encourage collaboration.
-### Scope limited to use for authoring
+### Scope limited to use for authoring {#sec-phase1}
The scope of this project is to enable annotating while authoring.
Rendering that makes use of annotations is outside this scope,
@@ -104,6 +104,15 @@ as illustrated in @fig-phase1.
### Added syntax in the spirit of Markdown {#sec-spirit}
+Markdown is a lenient markup language.
+Partly this is derived from Markdown being a superset of HTML,
+which is (or was, when Markdown was introduced) based on SGML
+that by design permits parsers to omit markup they cannot handle
+[@Connolly1994, section "SGML as a Layered Communications Medium"].
+Partly it is due to a deliberate design choice,
+as reflected in the quote introducing this paper:
+The format should be easy to both write and read in source form.
+
*FIXME: rewrite as more fluent prose*
In the source format of Markdown with annotations...
diff --git a/_markdown.qmd b/_markdown.qmd
index 10572e1..32934ee 100644
--- a/_markdown.qmd
+++ b/_markdown.qmd
@@ -1,5 +1,5 @@
-This chapter will introduce a grammar
-represented as visual diagrams,
+This chapter will first introduce a grammar
+translated into visual diagrams,
and then provide two analyses
of the markup language Markdown by use of that grammar.
First an analysis of a widely used subset of the language
@@ -141,6 +141,7 @@ including at points with choices.
## Syntax of dialect CommonMark {#sec-commonmark}
+*TODO: is something missing here? Section starts oddly?*
More specifically,
the example @fig-hello contains two different types of blocks
diff --git a/ref.bib b/ref.bib
index 8567e06..66e22ff 100644
--- a/ref.bib
+++ b/ref.bib
@@ -379,6 +379,15 @@
urldate = {2025-05-24},
}
+@Online{Connolly1994,
+ author = {Daniel W. Connolly},
+ date = {1994-02-15},
+ title = {Toward a Formalism for Communication On the Web},
+ url = {https://www.w3.org/MarkUp/html-spec/html-essay.html},
+ organization = {{W3C}},
+ urldate = {2025-05-26},
+}
+
@Comment{jabref-meta: databaseType:biblatex;}
@Comment{jabref-meta: fileDirectory-jonas-bastian:/home/jonas/Projects/RUC/LIB/md;}