--- semantic-markdown - Pandoc plugin to process semantic hints
---
--- SPDX-FileCopyrightText: 2025 Jonas Smedegaard
--- SPDX-License-Identifier: GPL-3.0-or-later
---
--- ## Examples
---
--- Ideally, this text:
---
--- ```Markdown+RDF
--- Simple ontological annotation:
--- [This]{foaf:depiction} is not a pipe.
---
--- Nested, mixed-use and custom-namespaced annotations:
--- [[Ceci]{foaf:depiction} n'est pas une pipe.]{lang=fr bibo:Quote}
---
--- {bibo}: http://purl.org/ontology/bibo/
--- ```
---
--- ...should be transformed by this filter into this text:
---
--- ```Markdown
--- ---
--- turtle: |
---   @prefix bibo: <http://purl.org/ontology/bibo/> .
---
---   _:001 a foaf:depiction .
---   _:002 a foaf:depiction .
---   _:003 a bibo:Quote .
--- ---
---
--- Simple ontological annotation:
--- This is not a pipe.
---
--- Nested, mixed-use and custom-namespaced annotations:
--- [Ceci n'est pas une pipe.]{lang=fr}
--- ```
---
--- When the target document format is HTML,
--- this filter should ideally produce RDFa 1.1 Lite or Core data.
--- (Lite is *not* a subset of Core, as it deviates slightly.)
---
--- * v0.0.1
---   * initial release
---
--- @version 0.0.1
--- @see
--- @see
--- @see
--- @see

-- TODO: maybe use topdown traversal
--  * order of declaring annotations might matter (but should not)
--  * might enable simpler functions and/or faster processing
-- @see

-- ensure stable character classes independent of system locale
-- @see
os.setlocale 'C'

-- TODO: cover non-ASCII Unicode characters
-- @see

--- curie_long - CURIE with prefix and reference, as Lua pattern
--- @see
local _name_start_char = "A-Z_a-z"
local _name_char = _name_start_char.."-0-9"
local _ref = "[".._name_start_char.."][".._name_char.."]*"
local _prefix = "[".._name_start_char.."_-][".._name_char.."]*"
local curie_long = _prefix..":".._ref

--- curie_no_ref - CURIE with only prefix, as Lua pattern
local curie_no_ref = _prefix..":"

--- curie_local - CURIE with only reference, as Lua pattern
local curie_local = ":".._ref

--- curie_default - CURIE without prefix or reference, as single char
local curie_default = ":"

-- TODO: curie_re - CURIE as `LPeg.re` regex object
-- TODO: test and replace above curie* patterns
-- @see
--local curie_re = re.compile("(".._prefix..")?:(".._ref..")?")

-- FIXME: define RDF context same as RDFa
-- TODO: maybe support overriding context with a JSON-LD URI
-- @see

--- Namespaces - process RDF namespace IRI declarations
---
--- Takes as input a single Para block element.
--- If the paragraph matches the pattern for a namespace IRI declaration,
--- the declared namespace is extracted.
--- Returns an empty list (which deletes the paragraph) in case of a match,
--- or nothing (to signal preservation of original content).
---
--- Example:
---
--- ```Markdown
--- # Annotated paragraph using a custom namespace
---
--- My favorite animal is the [Liger]{ov:preferredAnimal}.
---
--- {ov}: http://open.vocab.org/terms/
--- ```
---
--- @param para Para block element, possibly a namespace declaration
--- @returns empty list if the paragraph is a namespace declaration,
---   otherwise nil
--- @see
--- @see
local function Namespaces(para)
  -- paragraph with only a braced prefix, colon, space and one word,
  -- e.g. "{ov}: http://open.vocab.org/terms/"
  -- (the pattern must be parenthesized: `text:match "^{" .. rest`
  -- would call match("^{") before concatenating)
  if #para.content == 3
    and para.content[1].t == "Str"
    and para.content[2].t == "Space"
    and para.content[1].text:match("^{" .. _prefix .. "}:$")
  then
    -- default namespace, misparsed as a citation
    if para.content[3].t == "Cite"
      and #para.content[3].content == 1
      -- TODO: maybe check case-insensitively
      and para.content[3].content[1].text == "@default"
    then
      -- FIXME: add CURIE to metadata
      return {}
    end

    -- namespace
    if para.content[3].t == "Str"
      -- TODO: maybe check case-insensitively
      -- TODO: relax to match URI syntax without hardcoded protocols
      and para.content[3].text:match "^https?:"
    then
      -- FIXME: add CURIE and URI to metadata
      return {}
    end
  end
end

--- Statements - process inline RDF statements
---
--- Locate and extract ontological annotations
--- within a [Block] element of a Pandoc Abstract Syntax Tree (AST).
---
--- Markup for ontological annotations is an extension to Markdown
--- using syntax similar to that of hypermedia annotations,
--- but listing RDFa [CURIEs] in a braced enclosure.
---
--- ```ASCII-art
--- Simple ontological annotation:
---   "A [map]{foaf:depiction} is not the territory"
---      |   ||\~~~~~~~~~~~~/|
---      a   bc    CURIEa    d
---
--- Nested and mixed-use annotations:
---   ["[Ceci]{foaf:depiction} n'est pas une pipe"]{lang=fr dc:Text}
---   | |    ||\~~~~~~~~~~~~/|                    ||        \~~~~~/|
---   a a1   |c1   CURIEa    d1                   bc        CURIEb d
---          b1
---
--- Chained hypermedia and ontological annotations:
---   "A [map](https://osm.org/){foaf:depiction} is not the territory"
---      |   ||                ||\~~~~~~~~~~~~/|
---      a   be                fc    CURIEa    d
---
--- Legend:
--- a-b: bracketed enclosure around content
--- c-d: braced enclosure around ontological or other annotation
--- e-f: parenthesized enclosure around hypermedia annotation
--- ```
---
--- Ontological annotations are parsed and reorganised
--- using the following algorithm:
---
---  1. locate pairs of bracketed text and braced text
---     either adjacent or separated by parenthesized text,
---     where braced text contains one or more [CURIEs]
---  2. for each pair,
---      1. add CURIEs in braced text to metadata
---      2. add positions of brackets to metadata
---      3. delete CURIEs
---      4. delete braced enclosure if now structurally empty
---      5. delete brackets if now unannotated
---
--- The implementation is inspired by Pandoc [issue#6038].
---
--- @param block Block element whose Inlines may carry semantic annotations
--- @returns the same block stripped of semantic annotations,
---   or nil (to signal preservation of original content)
--- @see [Block]:
--- @see [CURIEs]:
--- @see [issue#6038]:
-- TODO: maybe instead as step #5 add/reuse hypermedia anchor
local function Statements(block)
  -- only blocks with Inlines as content are parsed;
  -- e.g. CodeBlock and HorizontalRule have no content list
  -- and would make ipairs() raise an error
  if not (block.t == "Para" or block.t == "Plain" or block.t == "Header") then
    return nil
  end

  -- flags for enclosing stages
  -- TODO: support nested bracket enclosure
  local bracketed, braced

  -- amount of detected statements in this block
  local statement_count = 0

  local stack = {}
  for _, el in ipairs(block.content) do
    local pos = 0
    local stack_next = ""
    -- end position and captured text of the latest string.find()
    local x, s

    -- non-string element
    if el.t ~= 'Str' then
      -- TODO: support mixed-use braced enclosure
      if not braced then
        table.insert(stack, el)
      end
      goto continue
    end

    -- unenclosed
    -- TODO: support backslash except immediately before bracket
    if not (bracketed or braced) then
      x, s = select(2, string.find(el.text, "^([^%[\\]*)"))
      local a = x and x + 1 or 1
      if el.text:sub(a, a) == "[" then
        -- entering bracketed enclosure
        bracketed = true
        pos = a + 1
        stack_next = stack_next..s
      -- staying unenclosed
      else
        table.insert(stack, el)
        goto continue
      end
    end

    -- in bracketed enclosure
    -- TODO: support backslash except immediately before bracket/brace
    -- TODO: support nested bracket enclosure
    if bracketed and not braced then
      x, s = select(2, string.find(el.text, "^([^%[%]}\\]*)", pos))
      local b = x and x + 1 or pos
      if el.text:sub(b, b) == "]" then
        local c = b + 1
        -- entering braced enclosure
        if el.text:sub(c, c) == "{" then
          braced = true
          pos = c + 1
          stack_next = stack_next..s
        -- leaving non-annotation enclosure
        else
          bracketed = false
          braced = false
          -- TODO: clear only back to entering this bracketed enclosure
          stack = {}
          -- TODO: parse remains of Str
          goto continue
        end
      -- staying enclosed
      else
        stack_next = stack_next..(s or "")
      end
    end

    -- in braced enclosure, leaving it
    -- TODO: support mixed-use enclosure
    if braced then
      -- try each CURIE form, directly followed by the closing brace
      local d1 = select(2, string.find(el.text, "^"..curie_long.."}", pos))
      local d2 = select(2, string.find(el.text, "^"..curie_no_ref.."}", pos))
      local d3 = select(2, string.find(el.text, "^"..curie_local.."}", pos))
      local d4 = select(2, string.find(el.text, "^"..curie_default.."}", pos))
      local d = d1 or d2 or d3 or d4
      if d then
        statement_count = statement_count + 1
        table.insert(stack, pandoc.Str(stack_next))
        stack_next = ""
        bracketed = false
        braced = false
        pos = d + 1
        -- TODO: parse remains of Str
      end
    end

    -- end of element, push collected string to stack
    if string.len(stack_next) > 0 and pos >= el.text:len() then
      table.insert(stack, pandoc.Str(stack_next))
      stack_next = ""
    end

    -- done parsing current Inline element
    ::continue::
  end

  if statement_count > 0 then
    -- keep the block type (a Para stays a Para), replace only its Inlines
    block.content = stack
    return block
  end
end

-- First resolve namespace declarations, then statements.
--
-- Although this filter is *not* a full RDF parser,
-- order matters for the parts it does handle:
-- e.g. namespace resolution works as in other RDF formats
-- whose processing order is documented in detail.
--
-- @see
local meta = {}
return {
  -- move aside MetaBlocks to speed up processing content
  --
  -- @see
  { Meta = function(m) meta = m; return {} end },

  {Para = Namespaces},
  {Block = Statements},

  -- FIXME: add custom declared namespaces in Meta
  -- TODO: maybe add only actively used namespaces
  -- (do same as for unused link definitions)
  { Meta = function(_) return meta; end },
  --{ Meta = function(_) return NamespacesToMeta(meta); end },
}
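--[==[
A minimal sketch of step 1 of the Statements algorithm (locating pairs
of bracketed and braced text, optionally separated by parenthesized
text), working on a plain Markdown string rather than on Pandoc Inlines.
It is a hypothetical helper, not used by the filter: the name
`locate_annotations` and the fields of its result tables are
assumptions, nested brackets are not handled, and CURIEs are recognised
with the same ASCII-only pattern as curie_long.

```lua
-- Hypothetical helper (not part of the filter): locate annotated spans
-- "[content]{curies}" or "[content](href){curies}" in a plain string.
-- Assumes flat (non-nested) brackets and ASCII-only names.

-- CURIE pattern built the same way as curie_long in the filter
local name_start_char = "A-Z_a-z"
local name_char = name_start_char .. "-0-9"
local ref = "[" .. name_start_char .. "][" .. name_char .. "]*"
local prefix = "[" .. name_start_char .. "_-][" .. name_char .. "]*"
local curie_token = "^" .. prefix .. ":" .. ref .. "$"

local function locate_annotations(text)
  local found = {}
  local pos = 1
  while true do
    -- next flat bracketed enclosure "[...]"
    local a, b, content = text:find("%[([^%[%]]*)%]", pos)
    if not a then break end
    local rest = b + 1

    -- optional parenthesized hypermedia annotation "(...)"
    local href
    local p1, p2, url = text:find("^%(([^()]*)%)", rest)
    if p1 then
      href = url
      rest = p2 + 1
    end

    -- braced enclosure "{...}" must follow immediately
    local c, d, braced = text:find("^{([^{}]*)}", rest)
    if c then
      -- split braced text on whitespace into CURIEs and other attributes
      local curies, others = {}, {}
      for token in braced:gmatch("%S+") do
        if token:match(curie_token) then
          curies[#curies + 1] = token
        else
          others[#others + 1] = token
        end
      end
      -- only pairs with at least one CURIE are ontological annotations
      if #curies > 0 then
        found[#found + 1] = {
          first = a, last = d, content = content,
          href = href, curies = curies, others = others,
        }
      end
      pos = d + 1
    else
      pos = b + 1
    end
  end
  return found
end
```

For "A [map](https://osm.org/){foaf:depiction}" this reports the bracket
positions, the hypermedia target and the CURIE, i.e. the information
steps 2.1 and 2.2 would add to metadata. Tokens that are not CURIEs,
such as `lang=fr`, are kept apart, so step 2.4 could tell whether the
braced enclosure becomes structurally empty once the CURIEs are deleted.
]==]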