author Jonas Smedegaard <dr@jones.dk> 2025-03-16 18:27:19 +0100
committer Jonas Smedegaard <dr@jones.dk> 2025-03-16 18:27:19 +0100
commit 7e436e30ab1feb4be2ad5faada1c448939e1de08
tree 36861ec7ab94e969fd82d0be063113be81414161
parent dbb50bc03105bd7a208e46a9b0ddbe11811b8f20
drop draft text by Sabin
-rw-r--r-- _chapter1.qmd 84
1 files changed, 0 insertions, 84 deletions
diff --git a/_chapter1.qmd b/_chapter1.qmd
deleted file mode 100644
index 8c6a941..0000000
--- a/_chapter1.qmd
+++ /dev/null
@@ -1,84 +0,0 @@
-Chapter 1: Introduction
-
-Overview of the Project
-The main goal of the project is to extend the Quarto document publishing system to handle semantic annotations in Markdown files.
-
-
-Background and Motivation
-Need to digitize story fragments:
-
-Chapter 1: Introduction
-This chapter introduces the key tools and technologies used in this project. It gives a brief overview of Quarto, Markdown, Lua, and other relevant technologies that play a vital role in achieving the project’s goal.
-
-Markdown
-
-Markdown is a lightweight markup language used to format plain text. Created by John Gruber in 2004, Markdown emphasizes readability and ease of use, allowing writers to focus on content rather than complex formatting syntax [@Gruber2004]. Its simple syntax lets writers format headings, lists, and links while keeping the source easy to read. Quarto uses Markdown as its input format to generate richly formatted documents, making it an essential tool in academic and technical publishing. For the purposes of this project, Markdown serves as the foundation for embedding semantic annotations, which will be processed and transformed into different output formats.
-
-While HTML and LaTeX are also plaintext formats, they are less human-readable due to their syntax and complex markup rules. For instance, HTML's use of tags such as <strong> for bold text or <em> for emphasis can distract from the reading process, whereas Markdown uses simpler syntax like ** for bold and * for emphasis [@Mailund2019chap2, p. 9]. This simplicity makes Markdown an attractive choice for writers who need to produce structured documents without the overhead of learning complex markup languages.
-
-Advantages of Markdown
-Markdown offers simplicity, portability, readability and extensibility.
-
-Quarto
-Quarto is an open-source scientific and technical publishing system that builds on the strengths of Pandoc to create dynamic, multi-format documents from plaintext files written in Markdown [@Allaire2022]. It integrates with various formats like HTML, PDF, and slides and uses Markdown as its input format. One of its key features is its extensibility, enabling users to enhance the publishing process by writing plugins and scripts. This project aims to extend Quarto’s capabilities by introducing a custom extension that handles semantic annotations embedded in Markdown files.
-Markdown + YAML metadata → Quarto → Pandoc → Output Formats
-
-Quarto acts as a pipeline that processes Markdown files, enhanced with YAML metadata, through Pandoc to generate a variety of output formats [@Allaire2022].
-
-How Quarto works
-Quarto operates as a wrapper around Pandoc, leveraging its powerful document conversion capabilities. The typical workflow has three stages: input, where the user writes content in Markdown together with YAML metadata for document configuration; processing, where Quarto applies extensions and templates to enhance the content; and conversion, where the processed Markdown is converted into the desired output format, such as HTML or PDF.
-
-The pipeline is highly customizable, allowing users to tailor each step to their specific needs. For example, users can write Lua filters to manipulate the document’s structure.
-
-Connection to our project
-Quarto is the platform we're extending to support semantic annotations. Because Pandoc represents each document as an abstract syntax tree (AST), metadata can be injected and processed during the conversion process. For example, we can write a Lua filter to detect semantic annotations in Markdown and transform them into structured metadata for HTML or PDF outputs.
-
-Lua
-Lua is a lightweight, embeddable scripting language designed for performance, simplicity, and extensibility [@Ierusalimschy1996]. It is widely used in applications ranging from game development to embedded systems, and it has become the language of choice for extending tools like Pandoc and Quarto due to its small footprint, ease of use, and ease of integration. For this project, Lua will be used to write a plugin that processes semantic annotations in Markdown files and generates appropriate metadata in HTML and PDF outputs. Lua’s versatility makes it ideal for such tasks, allowing for fine-grained control over document processing while maintaining high performance.
-
-Simplicity → Embeddability → Performance → Extensibility
-
-Lua is the preferred language for this project because:
-
-It is natively supported by Pandoc and Quarto.
-It is lightweight and fast, making it ideal for document processing.
-Its simplicity allows for rapid development of custom filters and plugins.
-
-Lua's design philosophy emphasizes simplicity and embeddability, making it a powerful tool for extending software like Quarto and Pandoc [@Ierusalimschy1996].
-
-Our project will likely involve writing Lua filters to detect and transform semantic annotations in Markdown files.
-A simple example is a Lua filter that transforms all level-2 headings into level-3 headings:
-function Header(elem)      -- called by Pandoc for every heading element
-  if elem.level == 2 then  -- only level-2 headings are changed
-    elem.level = 3         -- demote the heading by one level
-  end
-  return elem              -- return the (possibly modified) heading
-end
-
-
-
-Semantic Annotations
-Semantic annotations are metadata that add meaning to text by tagging elements with additional information about their purpose, context, or relationships, allowing for richer data representation and understanding [@FrancartSemanticMarkdown2020]. Unlike syntactic formatting (e.g., bold or italics), semantic annotations describe the purpose or role of a piece of text. In the context of Quarto and Markdown, semantic annotations will enable the document to contain extra information (e.g., definitions, references, and classifications) that can be used in various output formats. For example, annotations in Markdown will be transformed into RDFa (for HTML) or XMP metadata (for PDF) to enhance the document’s accessibility and interactivity.
-
-Content + Semantic Metadata → Enhanced Understanding
-
-Semantic annotations bridge the gap between human-readable content and machine-interpretable metadata, enabling richer document analysis and processing [@FrancartSemanticMarkdown2020].
-
-Purpose
-In academic writing, semantic annotations help organize knowledge, improve information retrieval, and enhance storytelling. They allow authors to tag concepts, definitions, and relationships, making it easier to restructure and process content later.
-
-
-Output Formats (HTML and PDF)
-Once the semantic annotations are embedded into the Markdown file, they need to be correctly transformed and rendered in the output formats. Quarto supports generating HTML and PDF outputs. For HTML, semantic annotations will be represented using RDFa, which is a form of structured metadata embedded in HTML content. For PDF, annotations will be encoded as XMP metadata, which allows for the integration of additional information into the PDF document's properties. These transformations will ensure that the enriched content is preserved in both formats.
-
-The transformation pipeline involves converting Markdown files, enhanced with semantic annotations, into multiple output formats using Pandoc and Quarto [@Allaire2022].
-
-Collaboration Tools
-We are using Radicle and Git as tools for distributed version control and collaborative software development. Radicle and Git enable distributed workflows, ensuring that team members can collaborate effectively while maintaining a complete history of changes.
-
-Git: Git is a distributed version control system for tracking changes in source code and facilitating collaborative software development. Its features include a distributed workflow (every developer has a full copy of the repository, which enables offline work and decentralized collaboration), branching and merging, history tracking, and integration.
-
-Radicle: Radicle is a decentralized code collaboration platform built on Git. It provides a peer-to-peer alternative to centralized platforms like GitHub and GitLab, emphasizing privacy, security, and user control. The key features that led us to choose Radicle are decentralization, code collaboration, and privacy.
-
-Conclusion of Chapter 1
-This chapter provided an overview of the key tools and technologies used in this project, including Quarto, Markdown, Lua, and the concepts of semantic annotations. Understanding these tools and their roles in the project is crucial for the successful implementation of the extension that processes semantic annotations in Markdown files. The following chapters will delve deeper into the technical aspects of the implementation and the challenges encountered along the way.