author gsabin <sabinghimire071@gmail.com> 2025-02-24 22:15:22 +0100
committer gsabin <sabinghimire071@gmail.com> 2025-02-24 22:15:22 +0100
commit 24d5f6d30dcadcb9b644fb60c76a8244b08ebf65
tree 4f1d2d326c3314704dd5fe8851a06d47e91b8a41 /_chapter1.qmd
parent f6d749082c76b26e72dafd9b02d8a95260a03019
Updated _chapter1.qmd with new content
Diffstat (limited to '_chapter1.qmd')
-rw-r--r-- _chapter1.qmd 62
1 files changed, 48 insertions, 14 deletions
diff --git a/_chapter1.qmd b/_chapter1.qmd
index d46eb3c..8c6a941 100644
--- a/_chapter1.qmd
+++ b/_chapter1.qmd
@@ -1,29 +1,42 @@
-Chapter 1
+Chapter 1: Introduction
+Overview of the Project
+The main goal of the project is to extend the Quarto document publishing system to handle semantic annotations in Markdown files.
+
+
+Background and Motivation
+The need to digitize story fragments:
+
In this chapter we explore the key tools and technologies used in this project. This includes a brief overview of Quarto, Markdown, Lua, and other relevant technologies that play a vital role in achieving the project’s goal.
Markdown
-Markdown is a lightweight markup language used to format plain text. It’s designed to be easy to read and write, with a simple syntax that allows writers to format headings, lists, and links. Quarto uses Markdown as its input format to generate richly formatted documents, making it an essential tool in academic and technical publishing. For the purposes of this project, Markdown serves as the foundation for embedding semantic annotations, which will be processed and transformed into different output formats.
-Why it’s is useful in academic writing
-Markdown allows writers to focus on content without worrying about formatting, and is ideal for taking notes and structuring thoughts in academic writing.
+Markdown is a lightweight markup language used to format plain text. Created by John Gruber in 2004, it emphasizes readability and ease of use, allowing writers to focus on content rather than complex formatting syntax [@Gruber2004]. Its simple syntax lets writers format headings, lists, and links with a few plain characters. Quarto uses Markdown as its input format to generate richly formatted documents, which makes Markdown an essential tool in academic and technical publishing. For the purposes of this project, Markdown serves as the foundation for embedding semantic annotations, which will be processed and transformed into different output formats.
-Usage
+While HTML and LaTeX are also plaintext formats, they are less human-readable because of their more verbose syntax and complex markup rules. For instance, HTML tags such as `<strong>` for bold text or `<em>` for emphasis can distract from the reading process, whereas Markdown uses simpler syntax such as `**` for bold and `*` for emphasis [@Mailund2019chap2, p. 9]. This simplicity makes Markdown an attractive choice for writers who need to produce structured documents without the overhead of learning complex markup languages.
+Advantages of Markdown
+Markdown offers simplicity, portability, readability and extensibility.
Quarto
-Quarto is an open-source publishing system that allows users to create dynamic and reproducible documents. It integrates with various formats like HTML, PDF, and slides and uses Markdown as its input format. One of its key features is its extensibility, enabling users to enhance the publishing process by writing plugins and scripts. This project aims to extend Quarto’s capabilities by introducing a custom extension that handles semantic annotations embedded in Markdown files.
-
-Connection to our project
-Quarto is the platform we’’re extending to support semantic annotations. Its DOM-based architecture allows to inject and process metadata during the conversion process. For example, we can write a Lua plugin to detect semantic annotations in Markdown and transform them into structured metadata for HTML or PDF outputs.
-
+Quarto is an open-source scientific and technical publishing system that builds on the strengths of Pandoc to create dynamic, multi-format documents from plaintext files written in Markdown [@Allaire2022]. It can render a single source to formats such as HTML, PDF, and slides. One of its key features is extensibility: users can enhance the publishing process by writing extensions such as Lua filters. This project aims to extend Quarto’s capabilities by introducing a custom extension that handles semantic annotations embedded in Markdown files.
+Markdown + YAML metadata → Quarto → Pandoc → Output Formats
+Quarto acts as a pipeline that processes Markdown files, enhanced with YAML metadata, through Pandoc to generate a variety of output formats [@Allaire2022].
+How Quarto works
+Quarto operates as a wrapper around Pandoc and relies on Pandoc's document conversion capabilities. A typical workflow has three stages. The first is input: user-written content in Markdown, including YAML metadata that configures the document. The second is processing: Quarto processes the Markdown files and applies extensions and templates to enhance the content. The third is output: the processed Markdown is converted into the desired format, such as HTML or PDF.
+The pipeline is highly customizable, so users can tailor each step to their specific needs. For example, they can write Lua filters to manipulate the document’s structure.
+Connection to our project
+Quarto is the platform we're extending to support semantic annotations. Because Pandoc represents each document as an abstract syntax tree (AST) during conversion, filters can inject and process metadata along the way. For example, we can write a Lua filter to detect semantic annotations in Markdown and transform them into structured metadata for HTML or PDF outputs.
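Below is a minimal sketch of such a filter. It collects annotated concepts into the document metadata. The sketch makes several assumptions that are not part of Quarto or of this project's final design. Annotations are written as Pandoc bracketed spans such as `[Artificial Intelligence]{.concept}`. The `.concept` class and the `concepts` metadata key are both hypothetical. In Pandoc's default traversal, inline functions such as `Span` run before `Meta`, so the list is complete by the time `Meta` is called.

```lua
-- Sketch of a Pandoc Lua filter that collects annotated concepts into the
-- document metadata. Assumes the hypothetical annotation syntax
-- [Artificial Intelligence]{.concept}, which Pandoc parses as a Span.

local concepts = {}  -- concept names, in order of first appearance
local seen = {}      -- set of names already collected

-- Inside Pandoc, pandoc.utils.stringify turns a Span's content into plain
-- text. The fallback (reading a `text` field) only exists so the functions
-- can be tried on mock tables in plain Lua.
local stringify = (pandoc and pandoc.utils and pandoc.utils.stringify)
  or function(el) return el.text end

function Span(el)
  for _, class in ipairs(el.classes) do
    if class == "concept" then
      local name = stringify(el)
      if not seen[name] then
        seen[name] = true
        table.insert(concepts, name)
      end
      break
    end
  end
  -- Returning nothing leaves the span unchanged in the document.
end

function Meta(meta)
  -- "concepts" is a hypothetical metadata key chosen for this sketch.
  if #concepts > 0 then
    meta.concepts = concepts
  end
  return meta
end
```

Keeping detection (`Span`) separate from storage (`Meta`) means later stages can consume the collected list without walking the document again. One example would be an XMP writer for PDF output.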
Lua
-Lua is a lightweight, high-performance scripting language widely used in embedded systems, game development, and software extensions. Quarto leverages Lua for writing custom extensions due to its simplicity, efficiency, and small footprint. For this project, Lua will be used to write a plugin that processes semantic annotations in Markdown files and generates appropriate metadata in HTML and PDF outputs. Lua’s versatility makes it ideal for such tasks, allowing for fine-grained control over document processing while maintaining high performance.
+Lua is a lightweight, embeddable scripting language designed for performance, simplicity, and extensibility [@Ierusalimschy1996]. It is widely used in applications ranging from game development to embedded systems, and it has become the language of choice for extending tools such as Pandoc and Quarto because of its small footprint and its ease of use and integration. For this project, Lua will be used to write a filter that processes semantic annotations in Markdown files and generates appropriate metadata in HTML and PDF outputs. Lua’s versatility makes it well suited to such tasks, allowing fine-grained control over document processing while maintaining high performance.
+
+Simplicity → Embeddability → Performance → Extensibility
Lua is the preferred language for this project because:
@@ -31,9 +44,25 @@ It is natively supported by Pandoc and Quarto.
It is lightweight and fast, making it ideal for document processing.
Its simplicity allows for rapid development of custom filters and plugins.
+Lua's design philosophy emphasizes simplicity and embeddability, making it a powerful tool for extending software like Quarto and Pandoc [@Ierusalimschy1996].
+
+Our project will likely involve writing Lua filters to detect and transform semantic annotations in Markdown files. A simple example is the following Lua filter, which turns every level-2 heading into a level-3 heading:
+
+```lua
+-- Pandoc calls Header() once for every heading in the document.
+function Header(elem)
+  if elem.level == 2 then
+    elem.level = 3
+  end
+  -- Returning the element replaces the original heading with it.
+  return elem
+end
+```
+
+
Semantic Annotations
-Semantic annotations refer to the practice of embedding additional meaning or metadata within a document, allowing for richer data representation and understanding. In the context of Quarto and Markdown, semantic annotations will enable the document to contain extra information (e.g., definitions, references, and classifications) that can be used in various output formats. For example, annotations in Markdown will be transformed into RDFa (for HTML) or XMP metadata (for PDF) to enhance the document’s accessibility and interactivity.Unlike syntactic formatting (e.g., bold or italics), semantic annotations describe the purpose or role of a piece of text. For example, @concept{Artificial Intelligence} indicates that "Artificial Intelligence" is a key concept in the document.
+Semantic annotations are metadata that add meaning to text by tagging elements with additional information about their purpose, context, or relationships [@FrancartSemanticMarkdown2020]. Embedding such metadata within a document allows for richer data representation and understanding. In the context of Quarto and Markdown, semantic annotations will let the document carry extra information (e.g., definitions, references, and classifications) that can be used in various output formats. For example, annotations in Markdown will be transformed into RDFa (for HTML) or XMP metadata (for PDF) to enhance the document’s accessibility and interactivity. Unlike syntactic formatting (e.g., bold or italics), semantic annotations describe the purpose or role of a piece of text.
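On the HTML side, a filter could attach an RDFa `property` attribute to each annotated span. The sketch below assumes hypothetical `.concept` and `.definition` classes. Its mapping to schema.org-style property names is purely illustrative and is not a vocabulary the project has chosen. Whether the `schema:` prefix resolves also depends on how the vocabulary is declared in the HTML output.

```lua
-- Sketch: map annotation classes to RDFa property names.
-- Both the classes and the property names are hypothetical choices.
local PROPERTY_FOR_CLASS = {
  concept = "schema:about",
  definition = "schema:description",
}

function Span(el)
  for _, class in ipairs(el.classes) do
    local property = PROPERTY_FOR_CLASS[class]
    if property then
      -- Set the RDFa attribute on the span; the first matching class wins.
      el.attributes["property"] = property
      return el
    end
  end
  return nil  -- spans without an annotation class are left unchanged
end
```

With this filter, `[Artificial Intelligence]{.concept}` would be expected to appear in HTML as a span carrying `property="schema:about"` next to its class. The exact rendered markup should be checked against the Pandoc version in use.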
+
+Content + Semantic Metadata → Enhanced Understanding
+
+Semantic annotations bridge the gap between human-readable content and machine-interpretable metadata, enabling richer document analysis and processing [@FrancartSemanticMarkdown2020].
Purpose
In academic writing, semantic annotations help organize knowledge, improve information retrieval, and enhance storytelling. They allow authors to tag concepts, definitions, and relationships, making it easier to restructure and process content later.
@@ -42,9 +71,14 @@ In academic writing, semantic annotations help organize knowledge, improve infor
Output Formats (HTML and PDF)
Once the semantic annotations are embedded into the Markdown file, they need to be correctly transformed and rendered in the output formats. Quarto supports generating HTML and PDF outputs. For HTML, semantic annotations will be represented using RDFa, which is a form of structured metadata embedded in HTML content. For PDF, annotations will be encoded as XMP metadata, which allows for the integration of additional information into the PDF document's properties. These transformations will ensure that the enriched content is preserved in both formats.
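For PDF, the collected concepts could be serialized as an XMP packet. The sketch below builds a minimal packet that lists concepts as Dublin Core `dc:subject` entries. The helper name `build_xmp` is hypothetical, and the `<?xpacket?>` wrapper is omitted. How the packet is then embedded in the PDF, for example through a LaTeX package, is left open here.

```lua
-- Sketch: serialize a list of concept names as a minimal XMP packet.

-- Escape characters that are special in XML text content.
local function xml_escape(s)
  return (s:gsub("&", "&amp;"):gsub("<", "&lt;"):gsub(">", "&gt;"))
end

-- build_xmp is a hypothetical helper; concepts is a list of strings.
function build_xmp(concepts)
  local items = {}
  for _, name in ipairs(concepts) do
    items[#items + 1] = "          <rdf:li>" .. xml_escape(name) .. "</rdf:li>"
  end
  return table.concat({
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
    '  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
    '    <rdf:Description rdf:about=""',
    '        xmlns:dc="http://purl.org/dc/elements/1.1/">',
    '      <dc:subject>',
    '        <rdf:Bag>',
    table.concat(items, "\n"),
    '        </rdf:Bag>',
    '      </dc:subject>',
    '    </rdf:Description>',
    '  </rdf:RDF>',
    '</x:xmpmeta>',
  }, "\n")
end
```

`dc:subject` is an unordered bag in XMP, so `rdf:Bag` is used here rather than `rdf:Seq`. Escaping `&` first prevents the entities produced for `<` and `>` from being escaped a second time.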
-Other Technologies Involved
-Pandoc, LaTeX,RDFa and XMP Metadata:
+The transformation pipeline involves converting Markdown files, enhanced with semantic annotations, into multiple output formats using Pandoc and Quarto [@Allaire2022].
+
+Collaboration Tools
+We are using Radicle and Git as tools for distributed version control and collaborative software development. Radicle and Git enable distributed workflows, so team members can collaborate effectively while maintaining a complete history of changes.
+
+Git: Git is a distributed version control system for tracking changes in source code and facilitating collaborative software development. Its features include a distributed workflow (every developer has a full copy of the repository, which enables offline work and decentralized collaboration), branching and merging, history tracking, and integration with other tools.
+Radicle: Radicle is a decentralized code collaboration platform built on Git. It provides a peer-to-peer alternative to centralized platforms such as GitHub and GitLab, emphasizing privacy, security, and user control. The key features that led us to use Radicle are decentralization, code collaboration, and privacy.
Conclusion of Chapter 1
This chapter provided an overview of the key tools and technologies used in this project, including Quarto, Markdown, Lua, and the concepts of semantic annotations. Understanding these tools and their roles in the project is crucial for the successful implementation of the extension that processes semantic annotations in Markdown files. The following chapters will delve deeper into the technical aspects of the implementation and the challenges encountered along the way.