author Jonas Smedegaard <dr@jones.dk> 2025-05-24 12:48:58 +0200
committer Jonas Smedegaard <dr@jones.dk> 2025-05-24 12:48:58 +0200
commit 642c7c79b20028631cda30ef7562b22455a21231
tree a3a2f56d5bba8d208bb3b044d612f1a2f05671f3
parent 77739f570ef39d9047232b6bb23cb59082480fa8
tighten British English grammar
-rw-r--r-- _filter.qmd 16
-rw-r--r-- _intro.qmd 54
-rw-r--r-- _markdown.qmd 29
-rw-r--r-- _pandoc.qmd 15
-rw-r--r-- report.qmd 4
5 files changed, 60 insertions, 58 deletions
diff --git a/_filter.qmd b/_filter.qmd
index 10fc93c..b746141 100644
--- a/_filter.qmd
+++ b/_filter.qmd
@@ -2,10 +2,10 @@
## Misparsing and then cleaning up {#sec-misparsing}
-Pandoc offers two ways to implement a syntax extension to Markdown.
-Either an import extension or a filter for its AST.
-This project went with the latter approach,
-which may seem like an odd choice at first.
+Pandoc offers two ways to implement a syntax extension to Markdown:
+either an import extension or a filter for its AST.
+This project chose the latter approach,
+which may initially seem unusual.
*TODO: About the approach of parsing as Markdown and adjust the fallout,
instead of writing an import extension.*
@@ -28,13 +28,13 @@ For one simple test, a Lua implementation had an overhead of 2%
compared to 35% for a Haskell-based implementation via the JSON interface
[@MacFarlane2025, section "Introduction"].
-While efficiency might be nice,
-user convenience is important in this project.
-General experience using the Pandoc-based Quarto framework indicates,
+While efficiency is desirable,
+user convenience is prioritised in this project.
+General experience using the Pandoc-based Quarto framework indicates
that non-Lua filters often require additional quirks
like placement in a specific directory, adding a symbolic link
or passing its full path,
-whereas Lua filters usually works provided only a relative path.
+whereas Lua filters usually work provided only a relative path.
That issue might be specific to the Quarto framework,
but even that aside, Lua-based filters are quite common
and the documentation for writing them more detailed
diff --git a/_intro.qmd b/_intro.qmd
index c39a6cd..ad0d5e4 100644
--- a/_intro.qmd
+++ b/_intro.qmd
@@ -8,9 +8,9 @@ The markup language Markdown was introduced in 2004
with the specific aim of helping authors focus on content,
separate from layout concerns
[@Gruber2004].
-It has since then been widely adopted as authoring source format
-for generating reading-optimized documents in HTML and PDF
-that in the same time period have evolved
+It has since been widely adopted as an authoring source format
+for generating reading-optimised documents in HTML and PDF,
+which over the same time period have evolved
to support semantic text annotations.
But Markdown can only express structure and hypermedia annotations,
not semantic annotations.
@@ -27,16 +27,16 @@ Semantic text annotation
is the process of applying information about meaning
to specific text strings.
Some Markdown dialects support prepending a metadata section
-to the whole content markup,
-i.e. semantic *document* annotation,
+to the whole content markup
+(i.e., semantic *document* annotation),
but tying semantics to specific strings is currently unsupported.
The author cannot express "this currency amount is in 1980 dollars"
or "this uses the derogatory meaning of the term".
-Few years ago a shoutout was made on a blog
-for extending Markdown to cover semantic text annotations
+A few years ago, a call was made on a blog
+to extend Markdown to cover semantic text annotations
[@Francart2020].
-This lead to a draft specification called "Semantic Markdown"
+This led to a draft specification called "Semantic Markdown"
[@Smedegaard2022];
no actual implementation was made at the time, however.
@@ -50,20 +50,20 @@ by way of a Pandoc filter to handle semantic text annotations.
## Problem formulation
So,
-**How can Pandoc be extended to support semantic text annotations?**
+**how can Pandoc be extended to support semantic text annotations?**
* What are the core qualities of Markdown,
and how can a Markdown dialect express semantic text annotations
while maintaining those qualities?
* *TODO: Analysis of spec, phrased as a question*
-* How do Pandoc parse Markdown into a generalised content structure,
+* How does Pandoc parse Markdown into a generalised content structure,
and how can that process be extended
to handle semantic text annotations?
* Which approach to extending Pandoc
- is more likely long-term sustainable?
+ is more likely to be sustainable in the long term?
* How can the reliability of a Pandoc extension be evaluated?
-*TODO: Maybe reframe as investigation of the spec,
+*TODO: Maybe reframe as an investigation of the spec,
implementing it to identify its strengths and weaknesses*
*FIXME: align above questions with actual findings*
@@ -71,7 +71,7 @@ implementing it to identify its strengths and weaknesses*
## Project constraints
Driven by an interest in sustained development of this research project
-well beyond the delivery covered in this paper,
+beyond the scope of this paper,
the project has been voluntarily constrained
by the following early design decisions:
@@ -79,24 +79,24 @@ by the following early design decisions:
as a minimum viable product,
with additional features planned as separate future works.
* Programming language and integration design
- is largely dictated by existing actively used systems,
+ are largely dictated by existing actively used systems,
rather than convenience of personal familiarity or efficiency.
* The project solely involves freely licensed tools and resources,
- and is itself licensed under collaboratively incentivising free licences.
+ and is itself licensed under Free licences that encourage collaboration.
### Scope limited to authoring
The scope of this project is to enable authoring,
-with a further aim of later extending to more complex processing.
+with a further aim of extending to more complex processing in the future.
The primary aim is to enable authors to annotate semantics
-as integral part of their creating writing process.
+as integral part of their creative writing process.
Future works may expand on this,
enhancing renderings of the authored content,
as discussed in @sec-rdf.
The idea is to introduce semantic text annotation to Markdown authoring
-without disruptions in the Markdown-based workflows.
+without disruptions to the Markdown-based workflows.
A filter is added to the document generation workflow
that strips semantic text annotations from the Markdown content,
so that further processing need not know about the new markup,
@@ -104,13 +104,13 @@ as illustrated in @fig-phase1.
![A filter to strip annotations from content.](workflow/phase1.svg){#fig-phase1}
-### Evolve accepted formats and tools
+### Evolving accepted formats and tools
Markdown is a widely adopted authoring format,
and Pandoc a widely adopted Markdown processor.
It is easier to invent a new format in a new tool
-than convincing widely adopted ones to evolve,
-but the latter is, if succesful, likely far more reliable.
+than to convince widely adopted ones to evolve;
+however, the latter, if successful, is likely to be far more reliable.
This project aims at introducing new syntax
while staying close to existing Markdown,
@@ -119,24 +119,24 @@ unlike e.g. SAM that deviates notably from Markdown
Many Markdown processors exist,
but few are in widespread use.
-Pandoc is widely adopted and in recent times integrated
+Pandoc is widely adopted and has in recent times been integrated
with R Markdown and the Quarto document publishing system
to streamline the production of academic papers.
The easiest way to implement a derivation of Markdown
is likely by writing a Pandoc import filter,
-but that would limit its usefulness with larger frameworks like Quarto:
+but that would limit its usefulness with larger frameworks like Quarto.
Although they do support alternative source languages,
-important features like citation handling is far better streamlined
-when using the main and best supported source format.
+important features such as citation handling are far better streamlined
+when using the main and best-supported source format.
This project therefore implements its deviation of Markdown
-by reading the deviant source as if it was Markdown,
+by reading the deviant source as if it were Markdown,
and then applying a filter to adjust the AST after the deliberate misparsing.
### Collaborative licensing
To encourage collaboration and stimulate a circular gift economy
as introduced by @Mikkelsen2000,
-this project solely uses freely licensed tools and resources,
+this project exclusively uses freely licensed tools and resources,
and the project itself is copyleft licensed:
Code parts are licensed
under the GNU Public Licence version 3 or newer,
diff --git a/_markdown.qmd b/_markdown.qmd
index ac75d96..928adac 100644
--- a/_markdown.qmd
+++ b/_markdown.qmd
@@ -5,11 +5,11 @@ that is relevant for annotation,
and then an analysis of a not yet used subset
proposed specifically for the intent of marking up annotations.
-Ahead of these analyses,
-it helps to define terminology of "dialect" and "extension":
-Markdown is not a single strictly defined language,
-but a range of slightly varying languages
-all derived from the same slightly ambiguous original specification.
+Before these analyses,
+it is helpful to define terminology of "dialect" and "extension":
+Markdown is not a single, strictly defined language
+but rather a range of slightly varying languages,
+all derived from the same somewhat ambiguous original specification.
One way to approach this variability,
used among other places in the documentation of the Markdown processor Pandoc
and reused here,
@@ -95,14 +95,13 @@ to be able to cover Markdown
The PEG grammar covering all syntax diagrams shown here
is included as [Appendix @sec-def-peg].
-Words are sets of printable characters
-(including punctuation and other printable characters).
+Words are sets of printable characters, including punctuation.
They can be styled,
have a hyperlink annotated
and have CSS structure and styling annotated.
Each set can contain each other,
or a set of plain words.
-Se @fig-def-words for their syntax diagrams.
+See @fig-def-words for their syntax diagrams.
::: {#fig-def-words}
@@ -186,28 +185,30 @@ Syntax of `SemWords`, `Curie`, `SEMPREFIX` and `NAME`.
*FIXME: write this!*
+<!--
* hvad skal med
* hvad skal ikke med
* hvis PDF..., hvad så?
Dette afsnit udgør "requirements"
+-->
### Readability
In the source format of Markdown with annotations...
-* the syntax for annotation **should** be relatively human comprehensible
- (in the same spirit as Markdown -- e.g. \*\*strong emphasis\*\*)
+* the syntax for annotation **should** be relatively comprehensible to humans
+ (in the same spirit as Markdown -- for example, \*\*strong emphasis\*\*)
* annotation syntax **must not** conflict with core Markdown
- (i.e. must not cause disambiguations)
-* annotation syntaxt **should not** conflict with other Markdown extensions
+ (i.e. must not cause ambiguities)
+* annotation syntax **should not** conflict with other Markdown extensions
For syntactically correct and structurally supported annotations...
-* visual output **must** be identical to source without the annotation
+* visual output **must** be identical to the source without the annotation
* metadata of output **must** contain the annotation
For syntactically incorrect or structurally unsupported annotations...
* the annotation **must not** disappear from visual output
-* visual output **should** include the annotation in source form
+* visual output **should** include the annotation in its source form
diff --git a/_pandoc.qmd b/_pandoc.qmd
index d70968d..f408595 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -7,7 +7,7 @@ of the Markdown processor Pandoc.
Many dialects of Markdown have evolved,
some tightening the language for parsing efficiency and disambiguation,
-some extending to cover additional structures
+some extending to cover additional structures,
and some including support for a metadata header section.
Pandoc is a tool that can convert texts in Markdown dialects
@@ -17,7 +17,7 @@ Such document workflows,
including minimal structural markup as part of the creative writing,
applying visual layout as an automated templating process
tied to a target document format,
-has been further streamlined for academic texts
+have been further streamlined for academic texts
in the Quarto document publishing system.
----
@@ -30,8 +30,9 @@ and is relaxed about visual information,
to preserve literal content
while reducing format-specific stylistic details,
relevant especially when processing between different formats.
-Most common is to read plaintext Markdown files
-and write LaTeX code further compiled into a PDF file.
+The most common use
+is to read plaintext Markdown files and write LaTeX code,
+which is then compiled into a PDF file.
----
@@ -46,18 +47,18 @@ conforming to prescribed style guides and document formats.
---
-Pandoc is a document converter built around the markdown markup language,
+Pandoc is a document converter built around the Markdown markup language,
able to parse from and serialise to many Markdown dialects
as well as equivalent subsets of other text markup languages
including HTML, LaTeX (and by extension PDF),
-Office Open XML (as used by recent releases of Microsoft Word)
+Office Open XML (as used by recent releases of Microsoft Word),
and OpenDocument (as used by OpenOffice and LibreOffice).
Pandoc supports redefining input and output formats
and manipulating the internal document structure
as part of the automated parts of the framework.
-Pandoc is extendable.
+Pandoc is extensible.
Source and output format can be changed or completely redefined
and the internal document structure manipulated,
in the automated parts of the framework.
diff --git a/report.qmd b/report.qmd
index 89329ae..474b419 100644
--- a/report.qmd
+++ b/report.qmd
@@ -77,8 +77,8 @@ include-in-header:
# Abstract
-*FIXME and TODO notes in cursive text (like this)
-are editorial notes not intended for inclusion in the final delivery.*
+*FIXME and TODO notes in cursive text (such as this)
+are editorial comments not intended for inclusion in the final version.*
# Introduction