| author | Jonas Smedegaard <dr@jones.dk> | 2025-05-22 13:04:05 +0200 |
|---|---|---|
| committer | Jonas Smedegaard <dr@jones.dk> | 2025-05-22 13:04:05 +0200 |
| commit | e7bf6fc56d9e33239dfaa0a18ed5ebb220f9320b (patch) | |
| tree | a8c4b19664197dd8a5172937e3a14f5acd1f3734 | |
| parent | 3b42a8b705d48e360fa792d4ed868ba4a64db091 (diff) | |
postpone describing RDF to subsection Perspectives
| -rw-r--r-- | _conclusion.qmd | 80 | ||||
| -rw-r--r-- | _intro.qmd | 6 | ||||
| -rw-r--r-- | _markdown.qmd | 32 | ||||
| -rw-r--r-- | _pandoc.qmd | 14 |
4 files changed, 85 insertions, 47 deletions
diff --git a/_conclusion.qmd b/_conclusion.qmd
index 919b7f6..052e07b 100644
--- a/_conclusion.qmd
+++ b/_conclusion.qmd
@@ -2,11 +2,85 @@
 ## Perspectives
-Ideas for further explorations:
+The existence of a filter to simply silence semantic text annotations
+is arguably helpful in breaking the chicken-and-egg problem
+between authors finding it relevant to annotate their texts
+and renderers to take annotations into account in their renderings.
+What immediately follows, then,
+is to address the other half of that problem:
+Implement renderers that makes use of Markdown with embedded annotations.
+
+Beyond both authoring and rendering of annotations,
+several already emerging use cases may be aided by this work.
+
+### Rendering of annotations {#sec-rdf}
+
+*FIXME: rewrite and reduce
+to describe concrete future works
+of rendering RDFa in HTML and XMP in PDF.*
+
+* output format extension to generate PDF
+ * read semantic metadata from Pandoc YAML document header
+ * structure semantic metadata as RDF triples
+ * append RDF triples serialized as part of XMP metadata in PDF
+* output format extension to generate web page
+ * read semantic metadata from Pandoc YAML document header
+ * structure semantic metadata as RDF triples
+ * append RDF triples serialized as RDFa
+
+Some document containers support metadata
+expressed in some serialization of the abstract language RDF,
+e.g. as XMP metadata in PDF output
+[@PDFAssociation2020 chapter 14.3]
+and as RDFa in html output
+[@Herman2015].
+
+RDF is an abstract data model for knowledge graphs,
+usable for domain-specific annotations:
+Terminology for a domain is established by referencing a shared ontology,
+and terms are composed as sets of subject-predicate-object triples.
+RDF includes one language, Turtle, strives to be human readable,
+and languages for embedding triples into other data structures,
+notably XMP for PDF files and RDFa for HTML.
+
+RDF is an abstract data model for knowledge graphs.
+Multiple RDF languages exist,
+each covering all or subsets of the RDF model,
+including human readability optimized Turtle,
+RDFa for HTML embedding
+and XMP for PDF embedding.
+Each RDF language have different constraints,
+e.g. the XMP language for storing RDF in media files
+can express express one RDF graph in each XMP object
+[@Adobe2012, p. 9].
+
+### Integration with Hypothesis
-* integration with Hypothesis
 * filter extension to extend Pandoc/Quarto citations to cover [CiTO]
-* filter extension to include more details of Quarto author metadata on XMP
 [CiTO]: <http://purl.org/spar/cito/2018-02-12> "CiTO, the Citation Typing Ontology"
+
+### Generalizing Quarto metadata
+
+Quarto,
+a document authoring framework using Pandoc to render academic papers,
+includes a sometimes quite elaborate restructuring and layout
+of author and publisher metadata.
+Currently this processing is done inconsistently across target formats,
+and even for formats like HTML and PDF that supports RDF-based metadata
+(as described in @sec-rdf),
+the information is only laid out visually,
+with the elaborately prepared structure not preserved.
+A Pandoc filter could be written,
+or this filter extended,
+to embed structured data as RDF
+for target formats supporting it.
+
+Also,
+Pandoc could extend its AST to block- and inline-specific metadata
+(in addition to the existing document-wide metadata).
+Such change, and consequential refinements of default Pandoc templates
+encouraging more normalized structures e.g. about authors and publishers,
+might reduce the amount of custom restructuring
+needed downstream e.g. in Quarto.
@@ -123,11 +123,7 @@ to read Markdown with added markup for ontological annotations.
 First milestone is reached when the filter can simply suppress the added markup.
 A further milestone is to embed the expressed annotations
-in supported output formats,
-e.g. as XMP metadata in PDF output
-[@PDFAssociation2020 chapter 14.3]
-and as RDFa in html output
-[@Herman2015].
+in supported output formats.
 Another further milestone is to make use of the added markup, e.g. to annotate purpose of scholarly citations as presented in @Daquino2023.
diff --git a/_markdown.qmd b/_markdown.qmd
index 64cff7b..4ce8bb3 100644
--- a/_markdown.qmd
+++ b/_markdown.qmd
@@ -28,6 +28,13 @@ chosen because it covers semantic text annotation and, as far as we are aware, is the only description for a Markdown extension with this coverage.
+Additionally,
+the embedded language for the annotations themselves
+used in this specification
+will likely ease future work of enhanced renderings of Markdown,
+since it is abstractly equivalent
+to metadata embedding formats of both PDF and HTML
+(as discussed in more detail at @sec-rdf).
 ## Syntax of Markdown dialect Commonmark
@@ -205,28 +212,3 @@ For syntactically incorrect or structurally unsupported annotations...
 * the annotation **must not** disappear from visual output
 * visual output **should** include the annotation in source form
-
-### XMP, RDFa and RDF {#sec-rdf}
-
-*FIXME: drop unneeded details, and more clearly begin with HTML and PDF already using RDF*
-
-RDF is an abstract data model for knowledge graphs,
-usable for domain-specific annotations:
-Terminology for a domain is established by referencing a shared ontology,
-and terms are composed as sets of subject-predicate-object triples.
-RDF includes one language, Turtle, strives to be human readable,
-and languages for embedding triples into other data structures,
-notably XMP for PDF files and RDFa for HTML.
-
-RDF is an abstract data model for knowledge graphs.
-Multiple RDF languages exist,
-each covering all or subsets of the RDF model,
-including human readability optimized Turtle,
-RDFa for HTML embedding
-and XMP for PDF embedding.
-Each RDF language have different constraints,
-e.g. the XMP language for storing RDF in media files
-can express express one RDF graph in each XMP object
-[@Adobe2012, p. 9].
-
-*FIXME: describe terms URI, CURIE, subject, predicate and object*
diff --git a/_pandoc.qmd b/_pandoc.qmd
index 082d18a..f81ea80 100644
--- a/_pandoc.qmd
+++ b/_pandoc.qmd
@@ -58,17 +58,3 @@ in the automated parts of the framework.
 Collection of interrelated POSIX scripts and Pandoc extensions for enabling semantic annotations in Markdown-based authoring workflows.
-
-* filter extension to capture annotations
- * identify semantic metadata in stylistic metadata part of Pandoc YAML header
- * identify semantic metadata in content part of Pandoc document structure
- * append semantic metadata to Pandoc YAML document header
- * strip identified metadata from stylistic metadata and content
-* output format extension to generate PDF
- * read semantic metadata from Pandoc YAML document header
- * structure semantic metadata as RDF triples
- * append RDF triples serialized as part of XMP metadata in PDF
-* output format extension to generate web page
- * read semantic metadata from Pandoc YAML document header
- * structure semantic metadata as RDF triples
- * append RDF triples serialized as RDFa
