A Language Engineering Blueprint

The MEKA Framework for Graft-Splice Meaning Construction

Part I: The Atomic and Molecular Architecture of Language

Chapter 1: Foundational Language Units: The Quantum Level of Meaning

This chapter establishes the irreducible “atoms” of language. These units are treated not as static symbols but as entities whose properties, such as pronunciation, are realized through observation within a specific context. It details the computational processes required to map these foundational units from their written form to their potential spoken realizations, forming the bedrock of the entire meaning construction hierarchy.

1.1 Graphemes and Phonemes: The Duality of Written and Spoken Form

1.1.1 Defining the Grapheme-to-Phoneme (G2P) Problem

The initial, non-negotiable step in any computational system that bridges text and speech is grapheme-to-phoneme (G2P) conversion. This is the task of mapping a sequence of written symbols, or graphemes, to their corresponding sequence of sound representations, or phonemes.1 This process is foundational for high-level applications such as text-to-speech (TTS) synthesis and automatic speech recognition (ASR), where its accuracy directly dictates the quality and intelligibility of the final output.1

The G2P task is far from a simple one-to-one transliteration. Languages exhibit complex mapping relationships, including many-to-one patterns (e.g., ‘sh’ in ship and ‘ch’ in machine both map to /ʃ/) and one-to-many patterns (e.g., the grapheme ‘a’ is pronounced differently in ‘cat’, ‘father’, and ‘able’). This inherent complexity is compounded by the contextual dependency of pronunciation, making G2P a significant challenge in language engineering.1

1.1.2 The Evolution of G2P Models: From Static Rules to Dynamic Inference

The methodologies for tackling the G2P problem have evolved significantly, reflecting broader trends in computational linguistics from handcrafted rules to data-driven, context-aware inference.

  • Rule-Based and Dictionary-Based Systems: The earliest G2P systems were engineered for specific languages, primarily English, and relied on domain knowledge to formulate explicit conversion rules.1 These systems were often paired with dictionaries to handle exceptions. While capable of high accuracy within their specific domain, they were costly to develop, brittle when encountering out-of-vocabulary words, and could not be easily ported to other languages.1
  • Statistical Models (HMMs, CRFs): A major advancement came with the application of statistical models like Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs).4 These models learn the probabilistic relationships between grapheme and phoneme sequences from data, offering greater flexibility and robustness than hard-coded rules. In low-resource scenarios where training data is scarce, HMM-based systems remain a viable approach, particularly when augmented with transfer learning from related, higher-resource languages.5
  • Neural Network Architectures (LSTMs, Transformers): The deep learning revolution introduced models that could learn G2P mappings end-to-end, eliminating the need for explicit grapheme-phoneme alignment.1 Architectures like LSTMs and, more recently, Transformers have demonstrated superior performance by capturing finer, more effective features and generalizing well to unseen words.1 The attention mechanism in Transformer models is particularly adept at leveraging broader context to resolve pronunciation ambiguities.2

1.1.3 The Homograph Challenge: G2P as a Contextual Disambiguation Task

The primary challenge for modern G2P systems is the resolution of homographs—words that are spelled identically but have different pronunciations and meanings depending on their context. For example, the word “wound” is pronounced /waʊnd/ in “His string was wound very tight” but as /wuːnd/ in “Let me see the wound on your leg”.4 This problem fundamentally transforms G2P from a phonetic transliteration task into a semantic disambiguation task. The correct pronunciation cannot be determined from the word’s spelling alone; it requires an understanding of the word’s meaning within the sentence.

This necessity for semantic feedback reveals a critical architectural principle for any advanced language system: the properties of foundational units are not fixed but are resolved dynamically based on higher-level context. The processing pipeline cannot be strictly bottom-up. Information from semantic and discourse analysis must be available to inform the most basic level of grapheme-to-phoneme conversion.

The state-of-the-art solution to the homograph problem leverages Large Language Models (LLMs) with a technique called In-Context Knowledge Retrieval (ICKR).4 The ICKR process works as follows (a minimal sketch follows the list):

  1. The system identifies a potentially homographic word in an input sentence.
  2. An LLM (e.g., GPT-4) is prompted with the target word and its full sentence context.
  3. The LLM analyzes the context to determine the word’s intended meaning and part-of-speech.
  4. This semantic analysis is used to query a specialized dictionary that contains entries for each distinct meaning of the homograph, along with its corresponding phoneme sequence.
  5. The correct phoneme sequence is retrieved and used for synthesis or recognition.
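
The sketch below illustrates this loop in miniature. Here query_llm is a hypothetical stand-in for any LLM client, and the two-sense dictionary (with ARPAbet-style phoneme strings) is a toy fragment, not the dictionary used in the cited work.

```python
# ICKR-style homograph resolution in miniature. query_llm is a hypothetical
# stand-in for any LLM client; the sense dictionary is a toy fragment.
SENSE_DICT = {
    ("wound", "injury"):    "W UW1 N D",   # /wuːnd/
    ("wound", "past-wind"): "W AW1 N D",   # /waʊnd/
}

def resolve_homograph(word: str, sentence: str, query_llm) -> str:
    senses = [s for (w, s) in SENSE_DICT if w == word]
    prompt = (f"In the sentence '{sentence}', which sense of '{word}' is "
              f"intended? Answer with exactly one label from {senses}.")
    sense = query_llm(prompt).strip()       # contextual disambiguation (steps 2-3)
    return SENSE_DICT[(word, sense)]        # dictionary retrieval (steps 4-5)

# e.g. resolve_homograph("wound", "Let me see the wound on your leg", query_llm)
```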

This method has achieved the highest accuracy on G2P benchmarks, reducing phoneme error rates significantly and demonstrating that robust G2P requires a deep, contextual understanding of language.4

1.1.4 Error Analysis in G2P Systems

Despite these advances, G2P systems are not infallible. Character-based language models, an alternative approach that reduces vocabulary size, can introduce their own error patterns. A manual evaluation of one such G2P conversion of English text found a word error rate of 5.8%, showing that errors persist.8 Furthermore, post-processing techniques like generative error correction (GER), which use LLMs to fix ASR output, can struggle with rare words and may over-correct common words by neglecting crucial phonetic cues in the original audio.9 This reinforces the importance of a highly accurate G2P front-end that is sensitive to both semantic context and phonological plausibility.

The following table provides a comparative analysis of G2P model architectures, outlining their core mechanisms and engineering trade-offs.

| Model Type | Core Mechanism | Handling of Ambiguity | Data Requirements | Portability | Key Strengths | Key Weaknesses |
| --- | --- | --- | --- | --- | --- | --- |
| Rule-Based | Hand-crafted linguistic rules and exception dictionaries. | Limited; relies on explicit rules for known cases. | Low (for rules); High (for dictionary). | Low | High precision on in-domain text; interpretable. | Brittle; costly to develop; not generalizable. |
| HMM | Probabilistic sequence modeling of grapheme-phoneme alignments. | Statistical; models local contextual probabilities. | Medium | Medium | Good baseline for low-resource languages. | Struggles with long-range dependencies and complex ambiguity. |
| Neural (LSTM) | Recurrent neural network learns sequence-to-sequence mapping. | Captures sequential context to resolve local ambiguity. | High | High | Generalizes well to unseen words; no explicit alignment needed. | Can have difficulty with very long-range context. |
| Transformer | Attention mechanism weighs the influence of all input graphemes. | Excellent; attention can model complex, non-local dependencies. | High | High | State of the art for context-aware G2P without external knowledge. | Requires large datasets for optimal performance. |
| LLM + ICKR | LLM performs semantic analysis to retrieve pronunciation from a knowledge base. | Explicitly resolves semantic ambiguity (homographs) via context. | Very High | High | Highest accuracy; directly solves the homograph problem. | Computationally expensive; dependent on external dictionary. |

1.2 Logograms and Diacritics: Compressing and Modifying Information

1.2.1 Logograms as Semantic Primitives

Logograms such as ‘&’, ‘@’, or Chinese characters (漢) function as highly compressed lexical units, representing an entire word or concept in a single grapheme. From the perspective of information theory, logograms are an efficient form of source coding, minimizing the length of the signal required to transmit high-frequency concepts.10 Within the MEKA framework, logograms are treated as pre-compiled atoms whose G2P mapping is a direct, context-independent lookup in the LogOS Codex.

1.2.2 Diacritics as Spliced Phonetic Modifiers

Diacritics (e.g., the acute accent in ‘é’ or the tilde in ‘ñ’) are marks that are spliced into or onto a base grapheme to modify its phonetic value. They serve as explicit, low-cost signals that resolve potential phonetic ambiguity directly at the source. This reduces the computational load on the G2P model, as it obviates the need for complex contextual inference.
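
At the text-processing level, this splice corresponds to Unicode combination and composition. A minimal sketch using Python's standard unicodedata module:

```python
import unicodedata

# Splice a combining acute accent (U+0301) onto the base grapheme 'e',
# then compose the pair into the single precomposed code point U+00E9.
spliced = unicodedata.normalize("NFC", "e" + "\u0301")

print(spliced)                      # é
print(unicodedata.name(spliced))    # LATIN SMALL LETTER E WITH ACUTE
```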

The existence of both complex, implicit rules (for homographs in English) and simple, explicit markers (like diacritics in other languages) highlights a fundamental design trade-off in writing systems. This trade-off balances the complexity of the written signal against the complexity of the decoding process. Languages like English minimize signal complexity (fewer unique symbols) at the cost of higher decoding complexity, necessitating powerful computational models. Languages that employ diacritics increase signal complexity slightly to drastically reduce the cognitive and computational load of decoding. The architectural implication is clear: the choice of foundational units for a given language dictates the required sophistication of its G2P module.

Chapter 2: Composite Language Units: The Molecular Bonds of Meaning

This chapter details the construction of linguistic “molecules” from the foundational “atoms” defined previously. It focuses on morphology, the study of word structure, presenting the computational methods for decomposing words into their smallest meaning-carrying units (morphemes) and for understanding the rules that govern their combination (morphotactics).

2.1 Morphology: The Science of Word Formation

2.1.1 The Morpheme as the Smallest Unit of Meaning

Morphological analysis is the computational process of deconstructing words into their constituent morphemes: the root (or stem), prefixes, and suffixes.11 This analysis is a foundational stage of NLP, providing crucial insights into word structure that benefit numerous downstream applications, most notably machine translation for morphologically rich languages where a single root can generate hundreds of inflected forms.12

2.1.2 Inflectional vs. Derivational Morphology

A critical distinction in morphology lies between inflectional and derivational processes, which have different impacts on a word’s identity.15

  • Inflectional Morphology adds grammatical information to a word, such as tense, number, or case. It creates different forms of the same word (e.g., walk, walks, walked, walking are all forms of the verb walk).
  • Derivational Morphology creates new words, often changing the word’s meaning or part-of-speech. For example, adding the suffix -ness to the adjective happy creates the new noun happiness.

This distinction is vital for the architecture of the LogOS Codex. Inflectional variants can be stored as grammatical features of a single lexical entry, whereas derivational changes necessitate the creation of a new, distinct entry linked to the original root.

2.1.3 Computational Techniques for Morphological Analysis

Several computational techniques exist to perform morphological analysis, varying in complexity and linguistic sophistication.

  • Stemming and Lemmatization: These are foundational techniques for reducing words to a base form. Stemming employs crude heuristics, such as stripping common suffixes, which may not result in a valid word (e.g., ‘running’ -> ‘runn’). Lemmatization, in contrast, uses lexical knowledge and contextual analysis to return the word’s correct dictionary form, or lemma (e.g., ‘running’ -> ‘run’).12 A minimal sketch of both follows this list.
  • Finite-State Transducers (FSTs): For decades, FSTs have been the dominant technology for robust morphological analysis.12 An FST is a type of finite automaton that maps between two levels of representation: the surface form of a word (the string of letters) and its lexical form (a representation of its morphemes). For example, an FST could map the input
    running to the output run+V+PROG, indicating the root, its part-of-speech, and its progressive aspect.18 FSTs are computationally efficient and excel at modeling the regular, rule-based nature of morphology.
  • Neural Models: More recently, deep learning models have been applied to morphological tasks, learning complex patterns directly from large datasets without the need for explicit rule engineering.12 While powerful, FSTs remain a strong and often more interpretable baseline, particularly in data-scarce scenarios.
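
A sketch contrasting the two approaches, using NLTK's Porter stemmer (less crude than bare suffix stripping, but still heuristic) and its WordNet lemmatizer; it assumes the WordNet data has been fetched via nltk.download("wordnet").

```python
# Stemming vs. lemmatization with NLTK. Requires the WordNet data:
#   import nltk; nltk.download("wordnet")
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "studies"]:
    print(word,
          stemmer.stem(word),                    # heuristic suffix stripping
          lemmatizer.lemmatize(word, pos="v"))   # dictionary-backed lemma
# running run run
# studies studi study
```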

2.2 The Root (Etymon) and Semantic Gravity

The core morpheme of a word, its etymon, serves as the anchor for its meaning.20 The MEKA framework’s principle of “semantic gravity” posits that a word’s meaning is tethered to its etymological origin. The LogOS Codex must therefore function as a comprehensive etymological database, mapping every derived and inflected word back to its foundational root.

This architectural emphasis on decomposition is not merely a theoretical convenience; it is grounded in cognitive reality. Psycholinguistic experiments combining masked priming with magnetoencephalography (MEG) have provided strong evidence that the human brain performs morphological decomposition at a very early, pre-lexical stage of word recognition.22 When presented with a word like builder, the brain activates representations for both build and -er before accessing the full word form. This cognitive precedent validates the MEKA framework’s morpheme-centric approach to construction.

Furthermore, anchoring related words to a single etymon provides a significant computational advantage. In morphologically rich languages, the sheer number of unique word forms leads to severe data sparsity in statistical models, as many forms appear too infrequently to gather reliable statistics.13 By computationally mirroring the brain’s decomposition process and linking words like construct, destruction, and structure to the shared etymon struct in the LogOS Codex, the system can pool statistical strength among them. Information learned about one word can be transferred to its morphological relatives, improving model generalization and mitigating data sparsity. The etymon thus becomes a computational anchor for transfer learning, not just a historical footnote.

2.3 Morphotactics: The Grammar of Morpheme Combination

The process of combining morphemes is not arbitrary; it is governed by a strict set of rules known as morphotactics.23 These rules specify the legal ordering of morphemes within a word. For instance, English morphotactics dictates that inflectional suffixes (like plural -s or past-tense -ed) must appear after any derivational suffixes (like -ness or -able). This prevents the formation of ungrammatical words like *kind-s-ness.

Computationally, these ordering constraints can be modeled with high precision. While the regular languages computable by FSTs are sufficient, research has shown that morphotactic dependencies belong to a much more restricted and computationally simpler class of languages: the tier-based strictly local (TSL) languages.24 A TSL grammar posits that the grammaticality of a morpheme’s appearance depends only on a small, local window of other specific morphemes that exist on a projected “tier.” This subregular characterization is powerful because it is not only computationally efficient but also learnable from positive examples alone, offering an explanation for how such complex constraints can be acquired.

This formal model of morphotactics has a profound implication for the MEKA “splicing” protocol. Splicing, the act of replacing one morpheme with another (e.g., con-struct -> de-struct), cannot be a simple, unconstrained string operation. It must be a linguistically principled generative process. Before any splice operation is finalized, the resulting sequence of morphemes must be validated against the TSL grammar for the target language. This crucial step ensures that the system only generates grammatically valid words, transforming splicing from a brute-force recombination into a constrained and predictable act of word formation.
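
The following toy validator illustrates the TSL idea: project the morphemes onto a designated suffix tier, then reject any adjacent tier pair that matches a forbidden factor. The tier contents and the single constraint are illustrative; a real grammar would encode the full morphotactics of the language.

```python
# Toy TSL validator: project morphemes onto a suffix tier, then forbid any
# inflectional suffix immediately followed by a derivational one on the tier.
DERIVATIONAL = {"-ness", "-able"}
INFLECTIONAL = {"-s", "-ed"}
SUFFIX_TIER = DERIVATIONAL | INFLECTIONAL

def tsl_valid(morphemes: list[str]) -> bool:
    tier = [m for m in morphemes if m in SUFFIX_TIER]
    return not any(a in INFLECTIONAL and b in DERIVATIONAL
                   for a, b in zip(tier, tier[1:]))

print(tsl_valid(["kind", "-ness", "-s"]))   # True:  kindnesses
print(tsl_valid(["kind", "-s", "-ness"]))   # False: *kind-s-ness
```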

Part II: Lexical and Structural Systems

Chapter 3: Lexical Units: The Cellular Organisms of Language

This chapter transitions from the components of words to self-contained lexical units. It defines the protocols for managing a dynamic lexicon, encompassing stable, registry-defined “Terms” and the continuous influx of “Neologisms.” The chapter operationalizes the core concept of “semantic gravity” as a measurable force, detailing computational methods to model and monitor the stability and evolution of meaning over time.

3.1 The Lexicon: Words and Terms

3.1.1 Words as Standalone Units

Words, constructed from the morphemes detailed in the previous chapter, are the smallest standalone units of meaning that function as the primary input for syntactic structures.11

3.1.2 Terms and Terminology Management

Within the general lexicon, a “Term” is a specialized word or phrase that has been assigned a fixed, unambiguous definition within a specific domain or context. The consistent use of such terms is critical for clarity in technical, legal, and corporate communication. This requires a systematic approach known as Terminology Management.25

The technological backbone for this process is a Terminology Management System (TMS), also known as a Termbase System (TBS).26 A TMS is a centralized database (a termbase) designed to systematically collect, validate, and maintain an organization’s standardized terminology.25 The user’s proposed LogOS Codex is the architectural realization of such a system. To be effective, the LogOS Codex must implement the core features of a modern TMS, including support for multilingual terms, version history, term approval workflows, and robust integration capabilities (e.g., APIs) to ensure that all other components of the MEKA framework can access and adhere to the standardized lexicon.26

3.2 Neologisms and the P-047 Empirical Loop

While the LogOS Codex enforces stability for established terms, language is a dynamic system that constantly evolves through the creation of neologisms—newly coined words that emerge in response to technological innovation and cultural change.28 These new words are formed through productive linguistic processes such as blending (finfluencer from financial + influencer), affixation (deplatform), and compounding (rage-farming).30

The user’s “P-047 Empirical Loop” for validating neologisms can be operationalized as a concrete computational pipeline that mirrors the sociolinguistic processes of innovation and adoption. This pipeline is not merely a technical filter; it is a simulation of the selective pressures that determine whether a new word survives and enters the lexicon. The stages of this validation loop are grounded in state-of-the-art computational techniques for neologism identification.31

The computational pipeline proceeds as follows (the semantic-validation stage is sketched in code after the list):

  1. Candidate Extraction: This initial stage simulates linguistic innovation. Text from large, dynamic corpora (e.g., the web) is preprocessed. Potential neologisms are identified by flagging unknown words against a master dictionary or by using statistical measures like Pointwise Mutual Information (PMI) to detect novel, meaningful multi-word expressions.32
  2. Linguistic Filtering: The candidate pool is refined by applying a series of filters that test for linguistic plausibility. This stage weeds out noise such as spelling errors, proper nouns, and sequences that violate the language’s known morphological and phonological patterns.31
  3. Semantic Validation: This stage acts as a proxy for a speech community’s judgment of a word’s meaningfulness. Pre-trained Large Language Models (LLMs) like BERT or GPT are used to generate contextual embeddings for the candidate neologism. The semantic coherence of the candidate is assessed by measuring the cosine similarity between its embedding and the embeddings of its surrounding context or related reference terms.32
  4. Graph-Based Contextual Verification: A word’s meaning is also defined by its relationships to other words. This stage verifies that the candidate neologism can be plausibly integrated into the existing semantic network of the language. This is achieved by leveraging structured lexical resources like WordNet or Wikipedia to confirm that the new term can form meaningful relationships (e.g., hypernymy, meronymy) with existing concepts.32
  5. Adoption and Registry: Once a neologism passes all validation stages, it is assigned a registry ID and entered into the LogOS Codex. Its subsequent adoption and spread through the language can be monitored over time. This process is well-described by the Diffusion of Innovations theory, which models how an innovation is adopted through a social system via five stages: knowledge, persuasion, decision, implementation, and confirmation.36
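
A minimal sketch of the semantic-validation stage (stage 3): accept a candidate only if its embedding lies close enough to its context. Here embed is a hypothetical stand-in for any contextual encoder (e.g., a BERT-style model), and the threshold is illustrative rather than empirically tuned.

```python
# Semantic validation (stage 3) in miniature. embed is a hypothetical
# contextual encoder; the similarity threshold is illustrative.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantically_coherent(candidate: str, context: str, embed,
                          threshold: float = 0.35) -> bool:
    return cosine(embed(candidate), embed(context)) >= threshold
```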

The following table outlines the operational protocol for the P-047 Empirical Loop.

| Stage | Objective | Core Technique | Key Parameters/Metrics | Relevant Research |
| --- | --- | --- | --- | --- |
| Candidate Extraction | Identify novel word forms from a dynamic corpus. | Dictionary matching for unknown tokens; PMI for multi-word expressions. | Frequency thresholds; PMI score. | 32 |
| Linguistic Filtering | Remove noise (e.g., typos, proper nouns). | Rule-based filtering (word length, character patterns, case sensitivity). | Edit distance to known words; proper-noun heuristics. | 31 |
| Semantic Validation | Assess whether the candidate is semantically coherent in context. | LLM-based contextual embeddings (BERT, GPT). | Cosine similarity between candidate and context vectors. | 32 |
| Contextual Verification | Verify the candidate's fit within the broader lexical network. | Graph-based analysis using WordNet or Wikipedia. | Path existence; semantic-relation plausibility. | 32 |
| Adoption & Registry | Formally accept the neologism and track its diffusion. | Assignment of registry ID; longitudinal frequency monitoring. | Adoption rate; diffusion curve (innovators, majority, etc.). | 36 |

3.3 Semantic Gravity: Modeling Meaning and its Drift

3.3.1 Defining Semantic Change (Drift)

The user’s concept of “semantic drift” refers to the well-documented linguistic phenomenon of semantic change, where a word’s meaning evolves over time.38 This evolution can take many forms, including:

  • Broadening (Generalization): A word’s meaning becomes more general. For example, dogge originally referred to a specific powerful breed of dog but now refers to the entire species.39
  • Narrowing (Specialization): A word’s meaning becomes more specific. For example, meat (Old English mete) once referred to any solid food but is now restricted to animal flesh.38
  • Pejoration: A word acquires a more negative connotation. For example, awful originally meant “full of awe” or “inspiring wonder” but now means “extremely bad”.38
  • Amelioration: A word acquires a more positive connotation. For example, nice evolved from meaning “foolish” or “ignorant” in Old French to “kind” or “thoughtful” in modern usage.38

3.3.2 Computational Models of Semantic Change

The “semantic gravity” that holds a word’s meaning to its etymon can be computationally modeled and measured using diachronic word embeddings.40 This technique provides a quantitative method for tracking semantic drift over time. The methodology involves the following steps (a minimal alignment sketch follows the list):

  1. Partitioning a large historical text corpus by time period (e.g., by decade).
  2. Training a separate word embedding model (such as SGNS/word2vec or SVD-based models) on each time-stamped sub-corpus.42
  3. Aligning the resulting vector spaces from each time period using an orthogonal transformation like Procrustes alignment. This creates a shared coordinate system where the positions of word vectors can be compared across time.
  4. Measuring the semantic drift of a word by calculating the geometric distance its vector travels through this aligned space over the decades.42
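
A minimal sketch of steps 3 and 4, using SciPy's orthogonal Procrustes solver. X_old and X_new are assumed to be vocabulary-aligned (vocab × dim) embedding matrices from two time periods, row-indexed by a shared word index.

```python
# Align two period-specific embedding spaces with orthogonal Procrustes,
# then measure a word's drift as cosine distance across the aligned spaces.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def semantic_drift(X_old: np.ndarray, X_new: np.ndarray, row: int) -> float:
    R, _ = orthogonal_procrustes(X_old, X_new)   # rotation: old -> new space
    u, v = X_old[row] @ R, X_new[row]
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```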

This analysis has revealed two statistical “laws” that govern the rate of semantic change, providing a predictive basis for the concept of semantic gravity:

  1. The Law of Conformity: The rate of semantic change is inversely proportional to a word’s frequency. Common words are more semantically stable, while rare words tend to change meaning more rapidly.42
  2. The Law of Innovation: The rate of semantic change is directly proportional to a word’s polysemy (the number of distinct senses it has). Words with many meanings change faster.42

These laws allow the MEKA framework to move from a reactive to a proactive stance on semantic drift. By calculating a SemanticStabilityIndex = w1 * log(frequency) - w2 * polysemy_score for every word in the LogOS Codex, where w1 and w2 are tunable weights, the system can predict which words are most at risk of future semantic change. Words with a low stability index can be flagged for closer monitoring by the diachronic embedding system, allowing for the early detection of meaning shifts that could compromise system coherence.
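
A direct transcription of this index; the default weights and the positivity guard on frequency are illustrative choices.

```python
import math

def semantic_stability_index(frequency: int, polysemy_score: float,
                             w1: float = 1.0, w2: float = 1.0) -> float:
    # Law of Conformity: frequent words drift less (stabilizing term).
    # Law of Innovation: polysemous words drift more (destabilizing term).
    return w1 * math.log(max(frequency, 1)) - w2 * polysemy_score
```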

Part III: Structural and Discourse Systems

Chapter 4: Structural Units: The Grammatical Organs of Meaning

This chapter details the “construction schematics” that govern how lexical units are arranged into meaningful, hierarchical structures. It covers the principles of syntactic parsing—the computational analysis of grammatical relationships—and addresses the fundamental challenge of ambiguity that is inherent in this process.

4.1 Syntax: The Blueprint for Meaning Construction

4.1.1 The Role of Parsing

Syntactic parsing is the process of analyzing a sentence to determine its underlying grammatical structure, revealing the relationships between words and phrases.11 The output of a parser is typically a tree structure that represents these relationships, which then serves as the primary input for semantic interpretation.44

4.1.2 Competing Paradigms: Constituency vs. Dependency

Two dominant paradigms exist for representing syntactic structure:

  • Constituency Parsing: This approach, rooted in phrase structure grammars, breaks a sentence down into nested constituents (e.g., Noun Phrases, Verb Phrases). The classic algorithm for this paradigm is the Cocke–Kasami–Younger (CKY) algorithm.43
  • Dependency Parsing: This approach models the grammatical structure as a set of directed, labeled links (dependencies) between words. Each link connects a “head” (e.g., a verb) to a “dependent” (e.g., its subject or object). This representation is often more effective for languages with relatively free word order.43

4.1.3 Advanced Grammar Formalisms

Beyond these basic paradigms, a range of more expressive and linguistically sophisticated grammar formalisms have been developed, each embodying a different architectural philosophy with distinct trade-offs for computational implementation.46 The choice of formalism is a critical architectural commitment for the MEKA system, as it determines how linguistic phenomena are modeled and how the syntax-semantics interface is structured.

  • Head-driven Phrase Structure Grammar (HPSG): An integrated framework that uses typed feature structures to tightly couple syntactic and semantic information.
  • Lexical-Functional Grammar (LFG): A modular architecture that uses parallel representations: a c-structure for surface constituency and an f-structure for abstract grammatical functions. Its focus on abstract functions makes it particularly strong for cross-linguistic analysis.
  • Tree-Adjoining Grammar (TAG): A tree-rewriting system that uses substitution and adjunction operations on elementary trees, providing an elegant mechanism for handling long-distance dependencies.
  • Combinatory Categorial Grammar (CCG): A highly flexible, semantically-driven framework that uses a small set of combinatory rules to compose complex syntactic and semantic representations in parallel.

4.1.4 Construction Grammar (CxG)

A particularly relevant approach for the MEKA framework is Construction Grammar (CxG). This cognitive theory posits that all linguistic knowledge consists of “constructions,” which are learned pairings of form and meaning at all levels of abstraction—from individual morphemes (-ness) to partially filled idioms (drive X crazy) to fully abstract syntactic patterns (Subject-Verb-Object).48 CxG dissolves the strict dichotomy between the lexicon and grammar, viewing them as a continuum of constructions. This aligns seamlessly with the MEKA blueprint’s hierarchical, graft-and-splice model of meaning assembly.49

The following table provides a comparative overview of these key grammar formalisms, highlighting their architectural differences and suitability for computational applications.

| Formalism | Core Architectural Principle | Representation of Structure | Handling of Long-Distance Dependencies | Syntax-Semantics Interface | Suitability for Computational Applications |
| --- | --- | --- | --- | --- | --- |
| Constituency | Phrase structure rules. | Hierarchical tree of constituents (NPs, VPs). | Handled by recursive rules; can be cumbersome. | Sequential; parse tree is input to semantics. | Foundation for many parsers; treebanks widely available. |
| Dependency | Directed, labeled relations between words. | Graph/tree of head-dependent links. | Direct arcs between distant words. | Often less formal; can be challenging to map to logic. | Efficient; strong for free word-order languages. |
| HPSG | Integrated, constraint-based. | Single, rich typed feature structure. | Encoded via feature-passing mechanisms. | Tightly coupled; syntax and semantics are unified. | Powerful for deep linguistic modeling; complex engineering. |
| LFG | Modular, parallel projections. | C-structure (tree) for constituency; f-structure (AVM) for function. | Functional uncertainty over f-structure paths. | Modular interface between distinct levels. | Excellent for cross-linguistic work and complex predicates. |
| LTAG | Tree-rewriting system. | Elementary trees combined via substitution/adjunction. | Adjunction operation naturally captures dependencies. | Modular; derivation tree maps to semantics. | Strong formal properties; good for parsing and generation. |
| CCG | Integrated, combinatory logic. | Derivation via application of combinatory rules. | Composition and type-raising rules. | Tightly coupled; semantic composition parallels syntax. | Highly flexible; strong for non-canonical word order. |

4.2 The Challenge of Syntactic Ambiguity

A fundamental challenge in parsing is that a single sentence can often be assigned multiple valid grammatical structures, leading to different interpretations. This phenomenon, known as syntactic ambiguity, is not a rare exception but a pervasive feature of human language.43 The classic humorous example, “I shot an elephant in my pajamas,” illustrates a prepositional phrase attachment ambiguity: does “in my pajamas” modify “I” (the shooter) or “the elephant” (the target)?52

The pervasiveness of such ambiguity suggests that human language processing is not a deterministic, rule-based procedure. Instead, it is a probabilistic, context-sensitive process that weighs competing interpretations. Computational models have evolved to mirror this cognitive reality.

  • Probabilistic Parsing: To resolve ambiguity, parsers must select the most probable analysis. Probabilistic Context-Free Grammars (PCFGs) achieve this by augmenting a standard CFG with probabilities for each production rule, typically learned from a large corpus of hand-parsed sentences (a treebank). The probability of a complete parse tree is the product of the probabilities of all the rules used in its derivation. The parser’s task is to find the tree with the highest overall probability.43 (A toy PCFG is sketched after this list.)
  • Deep Learning Parsers: Modern state-of-the-art parsers have replaced simple PCFG probabilities with sophisticated scoring functions provided by deep learning models.43 Models based on Transformers can consider rich, sentence-wide contextual information to evaluate the plausibility of different syntactic structures, significantly outperforming earlier statistical methods in ambiguity resolution.54 These systems effectively learn the complex interplay of lexical, syntactic, and semantic constraints that guide human sentence interpretation.55
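
A toy PCFG over the pajamas sentence, parsed with NLTK's Viterbi parser, which returns the single most probable tree. The rule probabilities are illustrative; a production system would estimate them from a treebank.

```python
import nltk

# Toy PCFG for the PP-attachment ambiguity; probabilities are illustrative.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP                [1.0]
    PP  -> P NP                 [1.0]
    NP  -> Det N [0.4] | Det N PP [0.2] | 'I' [0.4]
    VP  -> V NP [0.6] | VP PP [0.4]
    Det -> 'an' [0.5] | 'my' [0.5]
    N   -> 'elephant' [0.5] | 'pajamas' [0.5]
    V   -> 'shot'               [1.0]
    P   -> 'in'                 [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("I shot an elephant in my pajamas".split()):
    print(tree)   # the single highest-probability analysis
```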

4.3 The Syntax-Semantics Interface

The syntax-semantics interface is the critical juncture where grammatical structure is translated into a formal representation of meaning.56 In formal semantics, this is typically conceptualized as a mapping from a syntactic parse tree to a logical form, such as an expression in predicate logic or the lambda calculus.56 For the MEKA system to be truly useful, its “construction schematics” (syntax trees) must be systematically convertible into a machine-understandable meaning representation. A robust method for achieving this involves deriving neo-Davidsonian logical forms directly from dependency trees using the lambda calculus, providing a principled way to bridge the gap between syntactic structure and formal meaning.46
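
A minimal sketch of such a composition, with Python closures standing in for lambda terms; the predicate vocabulary and the application order are illustrative simplifications of the cited approach.

```python
# Neo-Davidsonian composition in miniature: closures stand in for lambda
# terms; applying the verb to its arguments yields an event-based formula.
def transitive_verb(pred: str):
    return lambda subj: lambda obj: (
        f"∃e. {pred}(e) ∧ agent(e, {subj}) ∧ theme(e, {obj})")

build = transitive_verb("build")
print(build("we")("bridges"))
# ∃e. build(e) ∧ agent(e, we) ∧ theme(e, bridges)
```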

Chapter 5: Discourse Units: The Integrated Systems of Communication

This final analytical chapter elevates the blueprint to the highest level of linguistic organization, examining how individual sentences combine to form coherent, large-scale structures such as paragraphs, documents, and interactive dialogues. It details the formalisms used to model the relationships between these discourse units, providing the foundation for understanding and generating coherent, purposeful communication.

5.1 Discourse Structure: From Sentences to Coherent Text

Discourse analysis is the study of language beyond the sentence boundary.59 It seeks to understand how sequences of sentences are woven together to form a coherent whole, guided by principles of cohesion (the grammatical and lexical links within a text) and coherence (the underlying logical and cognitive connections that make a text meaningful).61

  • Rhetorical Structure Theory (RST): RST describes the functional organization of a text by identifying the rhetorical relations that hold between its parts.62 Text is first segmented into Elementary Discourse Units (EDUs), which are typically clauses. These EDUs are then linked recursively into a hierarchical tree structure. Most relations are asymmetric, connecting a central nucleus, which expresses the main point, to a supporting satellite, which modifies or elaborates on the nucleus. For example, in “The system is efficient. It uses a hash map,” the second sentence stands in an Evidence relation to the first, serving as a satellite that supports the nuclear claim.64 (A minimal tree encoding is sketched after this list.)
  • Segmented Discourse Representation Theory (SDRT): SDRT is a formal dynamic semantic theory that extends these ideas to model the interface between semantics and pragmatics.66 It represents discourse as a structured collection of segments linked by rhetorical relations. Crucially, SDRT includes a “glue logic,” which is a non-monotonic inference system used to compute the most plausible discourse structure for a given text.66 This process of inferring coherence relations allows the system to derive pragmatic implicatures and resolve ambiguities as a natural byproduct of understanding the discourse structure.
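
A minimal encoding of an RST-style tree, using the document's hash-map example. The data structure is a deliberate simplification: real RST trees also allow multinuclear relations and deeper nesting.

```python
# An RST-style tree in miniature: EDUs as leaves, an asymmetric relation
# linking a nucleus to a satellite.
from dataclasses import dataclass
from typing import Union

@dataclass
class EDU:
    text: str

@dataclass
class Relation:
    name: str                          # e.g., "Evidence", "Elaboration"
    nucleus: Union[EDU, "Relation"]
    satellite: Union[EDU, "Relation"]

tree = Relation("Evidence",
                nucleus=EDU("The system is efficient."),
                satellite=EDU("It uses a hash map."))
```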

This inferential nature of discourse processing reveals a critical architectural requirement. Building a discourse structure, particularly when relations are not explicitly marked, is an act of commonsense reasoning about the speaker’s intentions and world knowledge. To infer a Cause-Effect relation between “The street is wet” and “It rained,” a system must possess knowledge about the physical world. Therefore, the MEKA framework’s discourse module cannot be a self-contained linguistic component; it must interface with a general-purpose knowledge base and an inference engine to successfully operate at this level.

5.2 The Challenge of Implicit Relations

While many discourse relations are signaled by explicit connectives like “because,” “but,” or “so,” a significant portion are left implicit, requiring the reader or listener to infer the connection.69 Recognizing these implicit relations is a key challenge in discourse parsing. Computational models tackle this task by using machine learning classifiers trained on a variety of features, including:

  • Word pair statistics between the two text spans.
  • Features derived from constituent and dependency parse trees of the spans.
  • The surrounding discourse context.69

The use of large pre-trained language models has significantly advanced the state of the art in implicit relation recognition.70

5.3 Interactive Discourse: Dialogue and Argumentation

Discourse is not limited to monologue; it is often interactive, taking the form of dialogue and argumentation.

  • Computational Models of Dialogue: These models are concerned with both representing the state of a conversation (dialogue modeling) and deciding what a system should do or say next as an active participant (dialogue management).71 Early models relied on rigid, finite-state scripts, which were brittle. More advanced approaches view dialogue as a goal-directed activity. Plan-based models treat speech acts as actions in a plan to achieve a communicative goal, while probabilistic models (e.g., POMDPs) use reinforcement learning to optimize a system’s policy for responding to users.71
  • Computational Argumentation: This field develops formal models to represent and reason about argumentation.72 Frameworks like Abstract Argumentation Frameworks (AAFs) model arguments as nodes in a graph, with directed edges representing “attack” relations.72 Analyzing argumentation in natural language dialogue requires understanding not just the logical content of propositions but also the linguistic cues and discourse structures that signal argumentative moves.74 (A toy AAF admissibility check is sketched after this list.)
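
A toy admissibility check over a Dung-style abstract argumentation framework; the attack graph and the naive set-based semantics are illustrative.

```python
# A toy abstract argumentation framework (AAF): check whether a set of
# arguments is conflict-free and defends all of its members (admissible).
attacks = {("a", "b"), ("b", "c")}   # a attacks b, b attacks c

def conflict_free(S: set) -> bool:
    return not any((x, y) in attacks for x in S for y in S)

def defends(S: set, arg: str) -> bool:
    attackers = {x for (x, y) in attacks if y == arg}
    return all(any((d, x) in attacks for d in S) for x in attackers)

def admissible(S: set) -> bool:
    return conflict_free(S) and all(defends(S, a) for a in S)

print(admissible({"a", "c"}))   # True: 'a' defends 'c' against 'b'
```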

The evolution of these models from simple state-trackers to goal-oriented planners underscores a fundamental principle: dialogue is a form of strategic, goal-directed action, not merely an exchange of information. An effective conversational agent built on the MEKA framework cannot simply respond to user input; it must function as a rational agent. It needs to maintain a model of the user’s beliefs and goals, and formulate its own utterances as actions within a plan designed to achieve a desired outcome, such as resolving a technical issue or persuading a user. This elevates the system from a language processor to a true collaborative partner.

Part IV: Synthesis and Visual Blueprint

Chapter 6: The MEKA Protocol: Grafting and Splicing Mechanics

This operational chapter synthesizes the preceding analysis into a concrete, multi-level algorithmic protocol. It provides step-by-step procedures for both additive construction (grafting) and recombinative construction (splicing), illustrating how meaning is built and modified within the framework, from morpheme to discourse.

6.1 Grafting Protocol (Additive Construction)

Grafting is the hierarchical, bottom-up process of building complex linguistic structures from simpler components.

  • Step 1: Root Selection (LogOS Codex Lookup): The process begins with the selection of a core meaning unit from the LogOS Codex, the etymological registry.
  • Example: Select the root struct (from Latin struere, “to build”).
  • Step 2: Morphological Grafting (Affixation): Affixes are grafted onto the root to create a word.
  • Select affixes: Prefix con- and suffix -ion.
  • Validate the proposed sequence con-struct-ion against the language’s TSL morphotactic grammar. The sequence is valid.
  • Generate the word: construction.
  • Step 3: Lexical Grafting (Compounding): Words are grafted together to form compounds.
  • Select a partner word: site.
  • Form the compound lexical unit: construction site.
  • Step 4: Syntactic Grafting (Phrase/Clause Building): The lexical unit is embedded into a syntactic frame using the system’s chosen grammar formalism (e.g., LFG).
  • Graft into a Noun Phrase: [NP the [N construction site]].
  • Graft the NP into a clause structure: [VP visited [NP the construction site]].
  • Step 5: Discourse Grafting (Relation Attachment): The newly formed clause is grafted into a larger discourse structure via a rhetorical relation.
  • Example: the preceding discourse segment is connected to the new clause via an Elaboration relation: Elaboration(preceding_segment, new_clause).
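
A minimal end-to-end sketch of Steps 1 and 2. The codex fragment is illustrative, and the always-accepting validator stands in for the TSL check sketched in Chapter 2.

```python
# Grafting steps 1-2 in miniature: codex lookup, affixation, morphotactic
# gate, word emission. CODEX and the validator are illustrative stand-ins.
CODEX = {"struct": {"origin": "Latin struere", "gloss": "to build"}}

def tsl_valid(morphemes: list[str]) -> bool:
    return True   # stand-in; see the TSL sketch in Chapter 2

def graft(prefix: str, root: str, suffix: str) -> str:
    if root not in CODEX:
        raise KeyError(f"unknown etymon: {root}")
    morphemes = [prefix, root, suffix]
    if not tsl_valid(morphemes):
        raise ValueError(f"morphotactically invalid: {morphemes}")
    return "".join(m.strip("-") for m in morphemes)

print(graft("con-", "struct", "-ion"))   # construction
```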

6.2 Splicing Protocol (Recombinative Construction)

Splicing is the process of modifying existing linguistic structures by replacing one component with another at the same level of the hierarchy.

  • Step 1: Identify Target and Splice Point: A target unit and a valid splice point (e.g., a morpheme or phrase boundary) are identified.
  • Example Target: The word construction. Splice points: con- | -struct- | -ion.
  • Step 2: Morphological Splicing (Affix Replacement): An affix is replaced with another from the same class (e.g., prefix for prefix).
  • Replace prefix con- with de-.
  • Validate the resulting sequence de-struct-ion against the TSL morphotactic grammar. The sequence is valid.
  • Resulting word: destruction.
  • Step 3: Syntactic Splicing (Modifier/Phrase Replacement): A syntactic constituent is re-attached to a different node in the parse tree to alter meaning or resolve ambiguity.
  • Example Target: The ambiguous sentence I saw the man with the telescope.
  • The parser initially produces two probable parse trees, one where the Prepositional Phrase with the telescope attaches to man, and one where it attaches to saw.
  • Splicing involves selecting one parse, detaching the PP from its initial attachment point, and re-grafting it to the alternative node. This operation transforms one syntactic interpretation into another, demonstrating how splicing can be used to manipulate structure to achieve a specific meaning.
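
A minimal sketch of morphological splicing (Step 2): swap a prefix for another from the same class, after which the result would be re-validated against the TSL grammar. The affix inventory is illustrative.

```python
# Morphological splicing (Step 2) in miniature.
PREFIXES = {"con-", "de-", "re-", "in-"}

def splice_prefix(morphemes: list[str], new_prefix: str) -> list[str]:
    if morphemes[0] not in PREFIXES or new_prefix not in PREFIXES:
        raise ValueError("splice must replace a prefix with another prefix")
    return [new_prefix] + morphemes[1:]   # then re-validate with the TSL grammar

print(splice_prefix(["con-", "struct", "-ion"], "de-"))
# ['de-', 'struct', '-ion'] -> destruction
```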

Chapter 7: The Unified Hierarchy and The Graft-Splice Tree Diagram

7.1 Language as Living Physics: A Computational Perspective

The principles outlined in this blueprint—semantic gravity, neological validation, and discourse inference—demonstrate that the metaphorical notion of “language as a living system” can be grounded in computationally tractable and empirically verifiable processes. Semantic gravity is not an abstract force but a measurable effect of word frequency and polysemy on the rate of semantic drift, trackable via diachronic word embeddings. The P-047 validation loop is not just a filter but a simulation of the evolutionary pressures of selection and adoption that govern lexical innovation. The MEKA framework, with its integrated protocols for construction, validation, and monitoring, thus provides a blueprint for modeling language as a complex adaptive system.

7.2 The Graft-Splice Tree Diagram

The following diagram provides a unified, hierarchical visualization of the language units and construction processes detailed in this report. It serves as the master schematic for the MEKA Language Engineering Blueprint, mapping the complete architecture from foundational atoms to integrated discourse systems.

Code snippet

LEVEL 5: DISCOURSE UNITS (Chapter 5)
    │
    ├── Corpus (e.g., legal_corpus)
    │    │ Graft: Pattern Extraction via Diachronic Embeddings
    │
    ├── Document (e.g., report.txt)
    │    │ Graft: Section Aggregation
    │
    ├── Section (e.g., §1. Introduction)
    │    │ Graft: Paragraph Coalescence via RST/SDRT Relations (e.g., Elaboration, Contrast)
    │
    └── Paragraph
          │ Splice: Reorder sentences to change rhetorical flow

LEVEL 4: STRUCTURAL UNITS (Chapter 4)
    │
    ├── Sentence: “We build bridges.”
    │    │ Graft: Clause Combination
    │
    ├── Clause: “We build bridges”
    │    │ Splice: Transform active (“We build”) to passive (“Bridges are built”)
    │
    ├── Phrase: “build bridges”
    │    │ Graft: Word Combination (Head + Dependent)
    │
    └── Syntax Tree (The Construction Schematic)

LEVEL 3: LEXICAL UNITS (Chapter 3)
    │
    ├── Word: “construct”
    │    │ Splice: con- -> de- => “destruct”
    │
    ├── Compound: “bookstore”
    │    │ Graft: Word + Word
    │
    ├── Term: “semantic gravity”
    │    │ Note: Meaning locked by LogOS Codex.
    │
    └── Neologism: “prumeta” [Validation: P-047 Loop]
          │ Graft: pru- (Root) + meta- (Affix)

LEVEL 2: COMPOSITE UNITS (Chapter 2)
    │
    ├── Morpheme: “-ology”, “un-”
    │    │ Graft: Prefix + Root + Suffix
    │
    ├── Root (Etymon): “struct” (Latin: struere)
    │    │ Note: Anchor for Semantic Gravity
    │
    ├── Affix: “de-”, “-ness”
    │    │ Splice: Swap affixes to alter meaning
    │
    └── Syllable: /con/, /struct/

LEVEL 1: FOUNDATIONAL UNITS (Chapter 1)
    │
    ├── Grapheme: ‘c’, ‘o’, ‘n’…
    │    │ Map: Grapheme -> Phoneme Set via G2P Model (Context-Aware)
    │
    ├── Phoneme: /k/, /ə/, /n/…
    │
    ├── Diacritic: ‘é’
    │    │ Splice: ‘e’ + ‘´’ -> ‘é’ (modifies phoneme)
    │
    └── Logogram: ‘&’
          │ Map: ‘&’ -> “and” -> /ænd/

Works cited

  1. A Survey of Grapheme-to-Phoneme Conversion Methods – MDPI, accessed August 12, 2025, https://www.mdpi.com/2076-3417/14/24/11790
  2. Grapheme-to-Phoneme Conversion (G2P) – Deepgram, accessed August 12, 2025, https://deepgram.com/ai-glossary/grapheme-to-phoneme-conversion-g2p
  3. Grapheme-to-Phoneme Conversion with Convolutional Neural Networks – MDPI, accessed August 12, 2025, https://www.mdpi.com/2076-3417/9/6/1143
  4. arxiv.org, accessed August 12, 2025, https://arxiv.org/html/2411.07563v1
  5. Low-resource grapheme-to-phoneme mapping … – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/2023.sigmorphon-1.29.pdf
  6. Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer, accessed August 12, 2025, https://aclanthology.org/2023.sigmorphon-1.29/
  7. [2411.07563] Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models – arXiv, accessed August 12, 2025, https://arxiv.org/abs/2411.07563
  8. Graphemes vs. phonemes: battling it out in … – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/2024.conll-babylm.5.pdf
  9. LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context – arXiv, accessed August 12, 2025, https://arxiv.org/html/2505.17410v1
  10. Information Theory as a Bridge Between Language … – Frontiers, accessed August 12, 2025, https://www.frontiersin.org/journals/communication/articles/10.3389/fcomm.2022.657725/full
  11. The Five Stages of Natural Language Processing – Gradient Ascent, accessed August 12, 2025, https://gradient-ascent.com/natural-language-processing-stages/
  12. What is Morphological Analysis in Natural Language Processing (NLP)? – GeeksforGeeks, accessed August 12, 2025, https://www.geeksforgeeks.org/nlp/morphological-analysis-in-nlp/
  13. (PDF) Improving Statistical MT through Morphological Analysis, accessed August 12, 2025, https://www.researchgate.net/publication/220816737_Improving_Statistical_MT_through_Morphological_Analysis
  14. Developing a Hybrid Morphological Analyzer for Low-Resource Languages – MDPI, accessed August 12, 2025, https://www.mdpi.com/2076-3417/15/10/5682
  15. www.numberanalytics.com, accessed August 12, 2025, https://www.numberanalytics.com/blog/ultimate-guide-to-morphology#:~:text=and%20derivational%20morphology%3F-,Inflectional%20morphology%20is%20concerned%20with%20the%20study%20of%20how%20words,from%20other%20words%20or%20roots.
  16. Module 3: Morphology Inflectional and Derivation Morphology | PDF …, accessed August 12, 2025, https://www.scribd.com/document/699224076/nlp
  17. Morphological parsing – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Morphological_parsing
  18. Mastering Natural Language Processing — Part 7 Finite State …, accessed August 12, 2025, https://medium.com/@conniezhou678/mastering-natural-language-processing-part-7-finite-state-transducers-in-nlp-what-they-are-and-ef499f8ad5a4
  19. Finite State Transducer (FSTs) in NLP – GeeksforGeeks, accessed August 12, 2025, https://www.geeksforgeeks.org/nlp/finite-state-transducer-fsts-in-nlp/
  20. (PDF) The Relationship Between English Etymology and Semantic Change – ResearchGate, accessed August 12, 2025, https://www.researchgate.net/publication/385150161_The_Relationship_Between_English_Etymology_and_Semantic_Change
  21. Etymology | EBSCO Research Starters, accessed August 12, 2025, https://www.ebsco.com/research-starters/language-and-linguistics/etymology
  22. Evidence for Early Morphological Decomposition: Combining Masked Priming with Magnetoencephalography – PMC – PubMed Central, accessed August 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC4465380/
  23. Morphotactics – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Morphotactics
  24. Morphotactics as Tier-Based Strictly Local … – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/W16-2019.pdf
  25. Terminology management: What you need to know – POEditor Blog, accessed August 12, 2025, https://poeditor.com/blog/terminology-management/
  26. Terminology Management Systems » Explorer – Nimdzi Insights, accessed August 12, 2025, https://www.nimdzi.com/language-technology-directory/terminology-management-system/
  27. What is a Terminology Management System? – Interpreters & Translators, Inc., accessed August 12, 2025, https://ititranslates.com/what-is-a-terminology-management-system/
  28. (PDF) The Peculiarities Of Neologizms In Language And Speech, accessed August 12, 2025, https://www.researchgate.net/publication/387651835_The_Peculiarities_Of_Neologizms_In_Language_And_Speech
  29. Neologism – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Neologism
  30. the evolution of social media language: a sociolinguistic analysis of recent neologisms, accessed August 12, 2025, https://www.researchgate.net/publication/384927898_THE_EVOLUTION_OF_SOCIAL_MEDIA_LANGUAGE_A_SOCIOLINGUISTIC_ANALYSIS_OF_RECENT_NEOLOGISMS
  31. NeoN: A Tool for Automated Detection, Linguistic and LLM-Driven …, accessed August 12, 2025, https://www.iccs-meeting.org/archive/iccs2025/papers/159060297.pdf
  32. A Hybrid Framework for Neologism Validation using LLMs and Lexical Knowledge Graphs – CEUR-WS.org, accessed August 12, 2025, https://ceur-ws.org/Vol-3972/paper2.pdf
  33. Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler | Request PDF – ResearchGate, accessed August 12, 2025, https://www.researchgate.net/publication/325751214_Mining_the_Web_for_New_Words_Semi-Automatic_Neologism_Identification_with_the_NeoCrawler
  34. Benchmarking Automatic Tools for Neologisms Extraction: Issues and Challenges – CEUR-WS.org, accessed August 12, 2025, https://ceur-ws.org/Vol-3972/paper7.pdf
  35. WordNet, accessed August 12, 2025, https://wordnet.princeton.edu/
  36. Diffusion of Innovations Theory: Definition and Examples, accessed August 12, 2025, https://www.investopedia.com/terms/d/diffusion-of-innovations-theory.asp
  37. Diffusion of innovations – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Diffusion_of_innovations
  38. Semantic change – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Semantic_change
  39. Semantic change – Elizabeth Traugott, accessed August 12, 2025, https://traugott.people.stanford.edu/sites/g/files/sbiybj28616/files/media/file/traugott2017a.pdf
  40. Computational modeling of semantic change – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/2024.eacl-tutorials.1/
  41. Distributional word embeddings in modeling diachronic semantic change – UiO, accessed August 12, 2025, https://www.mn.uio.no/ifi/forskning/aktuelt/arrangementer/disputaser/2020/phd_thesis_kutuzov.pdf
  42. Diachronic Word Embeddings Reveal Statistical Laws of Semantic …, accessed August 12, 2025, https://aryamccarthy.github.io/hamilton2016diachronic/
  43. Syntactic parsing (computational linguistics) – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Syntactic_parsing_(computational_linguistics)
  44. Computational Linguistics – Stanford Encyclopedia of Philosophy, accessed August 12, 2025, https://plato.stanford.edu/entries/computational-linguistics/
  45. The Essential Role of Syntactic and Semantic Parsing in NLP | by Qasim Al-Ma’arif | Medium, accessed August 12, 2025, https://medium.com/@datailm/the-essential-role-of-syntactic-and-semantic-parsing-in-nlp-47b92118d9de
  46. Transforming Dependency Structures to Logical Forms for Semantic Parsing, accessed August 12, 2025, https://aclanthology.org/Q16-1010/
  47. From Quirky Case to Representing Space – CiteSeerX, accessed August 12, 2025, https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=27adcd57c13d1ca1086f468211f702923438444a
  48. Construction grammar – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Construction_grammar
  49. Evolutionary & Hybrid AI – Vrije Universiteit Brussel, accessed August 12, 2025, https://ehai.ai.vub.ac.be/computational-construction-grammar.html
  50. Computational Construction Grammar – Cambridge University Press, accessed August 12, 2025, https://www.cambridge.org/core/elements/computational-construction-grammar/24055547A93605B72BDD4C296C8BAC54
  51. Syntactic ambiguity – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Syntactic_ambiguity
  52. Definition and Examples of Syntactic Ambiguity – ThoughtCo, accessed August 12, 2025, https://www.thoughtco.com/syntactic-ambiguity-grammar-1692179
  53. Unit 5 NLP | PDF | Parsing | Ambiguity – Scribd, accessed August 12, 2025, https://www.scribd.com/document/746471452/Unit-5-NLP
  54. Ambiguity in NLP and how to address them – GeeksforGeeks, accessed August 12, 2025, https://www.geeksforgeeks.org/nlp/ambiguity-in-nlp-and-how-to-address-them/
  55. (PDF) Probabilistic Constraints and Syntactic Ambiguity Resolution – ResearchGate, accessed August 12, 2025, https://www.researchgate.net/publication/230876277_Probabilistic_Constraints_and_Syntactic_Ambiguity_Resolution
  56. Syntax–semantics interface – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Syntax%E2%80%93semantics_interface
  57. Syntax-semantics interface – (Intro to Linguistics) – Vocab, Definition, Explanations | Fiveable, accessed August 12, 2025, https://library.fiveable.me/key-terms/introduction-linguistics/syntax-semantics-interface
  58. Transforming Dependency Structures to Logical Forms for Semantic Parsing, accessed August 12, 2025, https://transacl.org/index.php/tacl/article/view/807
  59. Discourse analysis – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Discourse_analysis
  60. Mastering Discourse Analysis in Computational Linguistics – Number Analytics, accessed August 12, 2025, https://www.numberanalytics.com/blog/mastering-discourse-analysis-computational-linguistics
  61. Discourse Analysis – Computational Linguistics – Socratica, accessed August 12, 2025, https://learn.socratica.com/en/topic/linguistics/computational-linguistics/discourse-analysis
  62. www.researchgate.net, accessed August 12, 2025, https://www.researchgate.net/publication/228339546_Rhetorical_Structure_Theory_in_Natural_Language_Generation#:~:text=Rhetorical%20Structure%20Theory%20is%20a,between%20parts%20of%20the%20text.
  63. Rhetorical structure theory – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Rhetorical_structure_theory
  64. eRST – Enhanced Rhetorical Structure Theory – corpling@GU, accessed August 12, 2025, https://gucorpling.org/erst/
  65. Rhetorical Structure Theory: Description and Construction of Text Structures. – DTIC, accessed August 12, 2025, https://apps.dtic.mil/sti/tr/pdf/ADA173859.pdf
  66. Segmented Discourse Representation Theory (SDRT), accessed August 12, 2025, https://homepages.inf.ed.ac.uk/alex/sdrt.html
  67. Segmented Discourse Representation Theory (SDRT) – (Intro to Semantics and Pragmatics) – Vocab, Definition, Explanations | Fiveable, accessed August 12, 2025, https://library.fiveable.me/key-terms/introduction-semantics-pragmatics/segmented-discourse-representation-theory-sdrt
  68. (PDF) Segmented Discourse Representation Theory: Dynamic Semantics With Discourse Structure – ResearchGate, accessed August 12, 2025, https://www.researchgate.net/publication/226373693_Segmented_Discourse_Representation_Theory_Dynamic_Semantics_With_Discourse_Structure
  69. Recognizing Implicit Discourse Relations in the … – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/D09-1036.pdf
  70. The performance of implicit discourse relation recognition on PDTB – ResearchGate, accessed August 12, 2025, https://www.researchgate.net/figure/The-performance-of-implicit-discourse-relation-recognition-on-PDTB_tbl2_357980809
  71. Computational Approaches to Dialogue – USC Institute for Creative …, accessed August 12, 2025, https://people.ict.usc.edu/traum/public_html/Papers/traum-computational.pdf
  72. Computational Argumentation: A Deep Dive – Number Analytics, accessed August 12, 2025, https://www.numberanalytics.com/blog/computational-argumentation-deep-dive
  73. Background | COMMA – Computational Models of Argument, accessed August 12, 2025, https://comma.csc.liv.ac.uk/background
  74. A Computational Model for Arguments Understanding – ACL Anthology, accessed August 12, 2025, https://aclanthology.org/C90-3024.pdf