A Unified Architectural Framework Across the P→P→L→C→C Continuum
(Physics → Physiology → Linguistics → Cognition → Communication)
I. Introduction: The Mandate for an Empirical Linguistic Engine (ELE)
The development of the Empirical Linguistic Engine (ELE) requires a decisive move away from purely abstract linguistic formalism toward a unified, recursive, and empirically validated computational architecture.
This architecture must:
- Integrate physical and biological constraints
- Respect cognitive and social theories
- Define explicit interfaces, measurable units, and operational principles
- Honor the causal hierarchy:
Physics → Physiology → Linguistics → Cognition → Communication
(P→P→L→C→C)
The Necessity of Grounding and Unification
Historically, the study of language has been fractured:
- Generative linguistics (Chomsky, 1950s) posits an innate Language Acquisition Device (LAD), a specialized, modular system devoted exclusively to language. [1]
- Cognitive linguistics (1970s onward) rejects strict modularity, arguing that language and cognition share mechanisms, and that language is fundamentally embodied and situated. [3]
From this perspective, traditional analyses focused solely on phonetics, syntax, and semantics are insufficient, especially when explaining communication deficits in everyday social interaction. [5]
The ELE adopts an integrative stance:
- It leverages the computational precision of structured representations.
- It mandates adherence to physical and physiological grounding.
- It directly addresses the critique that text-only-trained large language models (LLMs) are ungrounded and lack “real understanding”. [6]
ELE does this by explicitly integrating empirical constraints from the physical world into its architecture.
The Computational Imperative: Recursion and Multi-Scale Architectures
Human language is:
- Hierarchical
- Productive
- Capable of generating infinite expressions from finite means
The key mechanism enabling this is recursion—the embedding of structures within themselves. [8]
This functional requirement induces architectural constraints for any computational linguistic engine.
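Recursion's "infinite use of finite means" can be made concrete with a minimal sketch: a single self-embedding rewrite rule generates an unbounded set of distinct grammatical sentences from a tiny lexicon. The frame and names below are illustrative, not part of any ELE specification.

```python
def embed(clause: str, depth: int) -> str:
    """Recursively embed a clause inside a reporting frame
    ("X said that ..."), one level per unit of depth."""
    if depth == 0:
        return clause
    return f"Mia said that {embed(clause, depth - 1)}"

# Each depth yields a new, longer, still-grammatical sentence:
sentences = [embed("it rained", d) for d in range(3)]
```

Because `embed` calls itself, no finite list of templates could replace it: every additional depth produces a sentence not generated before.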
Modern neural network research shows:
- Complex language systems spontaneously develop hierarchical temporal structures that align with linguistic levels, without explicit pre-programming. [9]
- Multi-Timescale Recurrent Neural Networks (MTRNNs) self-organize into distinct timescales:
- Short timescales (~0.17 words): phonological processes
- Medium timescales (1–10 words): morphological and syntactic structures
- Long timescales (≈360,000+ words): lexical and semantic relationships [9]
This validates a functional separation of linguistic strata and supports the adoption of Recursive Linguistic Models (RLMs). [10]
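The timescale separation described above can be sketched with two leaky-integrator units driven by the same input but integrating with different time constants, the core mechanism of MTRNN-style layers. The time constants and step counts here are illustrative assumptions, not values from the cited work.

```python
import random

random.seed(0)
TAU_FAST, TAU_SLOW = 2.0, 50.0  # illustrative time constants (assumed)

def leaky_step(u: float, x: float, tau: float) -> float:
    """Euler step of the leaky integrator du/dt = (-u + x) / tau."""
    return u + (-u + x) / tau

u_fast = u_slow = 0.0
fast_delta = slow_delta = 0.0
for _ in range(200):
    x = random.gauss(0.0, 1.0)  # shared input stream
    nf = leaky_step(u_fast, x, TAU_FAST)
    ns = leaky_step(u_slow, x, TAU_SLOW)
    fast_delta += abs(nf - u_fast)  # fast unit tracks moment-to-moment change
    slow_delta += abs(ns - u_slow)  # slow unit drifts with long-run context
    u_fast, u_slow = nf, ns
```

The fast unit accumulates far more step-to-step change than the slow unit, mirroring the fast (phonological) versus slow (lexical-semantic) division of labor the MTRNN results report.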
RLMs:
- Depart from flat, purely sequential processing
- Allow recursive querying of:
- the model itself, or
- an external environment (e.g., a REPL)
This is crucial for:
- Managing vast context lengths
- Modeling human-like discourse
- Handling situated pragmatics at scale
II. Stage 1: Physics — The Aerodynamic Foundation
The linguistic chain begins in physics, not in abstract grammar.
At the foundational level:
- Air pressure and airflow are manipulated.
- These set material limitations and temporal boundaries for language.
Acoustic and Aerodynamic Principles
Speech production (phonation) depends on:
- Vital Capacity (VC) — maximum exhaled air volume after maximal inhalation
- VC and derived metrics (e.g., Phonation Quotient, Estimated Mean Airflow Rate) inform:
- vocal stamina
- maximum utterance length [12]
Key control variable:
- Lung (subglottal) pressure → drives self-oscillation models of the vocal folds. [14]
Outputs at this stage:
- Phonation Trigger
- Acoustic waveform
Examples:
- VC (mL) defines maximum temporal bounds of continuous utterance
- Typical norms:
- Males ≤ 39 years: ~3530 mL
- Females ≤ 39 years: ~3080 mL [12]
These volumes constrain Maximum Phonation Time, which in turn:
- Sets natural boundaries for breath groups and prosodic phrases.
- Tethers abstract linguistic units like “phrase” to physical respiratory capacity. [15]
A realistic ELE must therefore:
- Initialize speech generation with real-time or modeled aerodynamic data.
Historical-Conceptual Grounding: Pneuma
The linkage between air and cognition is ancient:
- Greek Pneuma (πνεῦμα) = “breath”, “wind”, “spirit”. [16]
- In classical thought, pneuma was seen as:
- inhaled air transformed into a vital spirit
- traveling through organs to the brain
- distributed via nerves as “animal spirit” [18]
Anaximenes (6th century BC) equated:
- Soul (psyche) = air (aer), breath (pneuma), and world-encompassing medium. [16]
Though biologically outdated, this framework anticipates ELE’s causal demand:
External air → physiological flow → cognitive control
III. Stage 2: Physiology — Embodied Processor and Motor Command
Here, bulk aerodynamic input becomes:
- Precisely controlled
- Temporally sequenced
- Biomechanically embodied motor output
Mechanisms of Phonation and Muscle Control
Controlled airflow is converted into oscillation via the larynx, regulated by:
- Fine-grained laryngeal muscle control
- Low-dimensional self-oscillation models capturing:
- body-cover differentiation of the vocal folds
- primary vibration modes (shear and compressional) [14]
Neural programming at this stage:
- Converts normalized activation levels of key muscles:
- Cricothyroid (CT)
- Thyroarytenoid (TA)
- Lateral Cricoarytenoid (LCA)
- Posterior Cricoarytenoid (PCA)
- Into physical quantities:
- vocal fold strain
- adduction
- glottal convergence
- stiffness [14]
Critical output:
- Pneuma (Motor Command) — the patterned neural command controlling glottal aerodynamics.
This is the modern physiological analog of the ancient pneuma.
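The activation-to-posture interface can be sketched as a function from normalized muscle activations to physical quantities. The cited models [14] use empirically fitted rules; the linear coefficients below are invented placeholders that show only the shape of the interface, not values from the literature.

```python
def laryngeal_posture(a_ct: float, a_ta: float,
                      a_lca: float, a_pca: float) -> dict:
    """Map normalized activations (0..1) of CT, TA, LCA, PCA to toy
    posture quantities. All coefficients are illustrative placeholders."""
    for a in (a_ct, a_ta, a_lca, a_pca):
        assert 0.0 <= a <= 1.0, "activations are normalized"
    return {
        "strain": 0.3 * a_ct - 0.1 * a_ta,     # CT lengthens, TA shortens
        "adduction": a_lca - a_pca,            # LCA closes, PCA opens the glottis
        "stiffness": 0.5 * a_ta + 0.2 * a_ct,  # toy body-cover stiffening
    }

posture = laryngeal_posture(a_ct=0.6, a_ta=0.4, a_lca=0.8, a_pca=0.1)
```

The design point is the interface itself: the "Pneuma (Motor Command)" is exactly such a patterned vector of activations, consumed downstream as physical parameters.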
Neurophysiological Constraints and Embodiment
Physiology is the causal bridge between:
- Abstract linguistic form
- Concrete motor action
Sequential motor control:
- Enforces precise timing
- Shapes articulation
- Suggests non-arbitrary foundations for:
- phonotactic rules
- linear order in syntax [19]
Evidence for embodied cognition:
- Language may rely on a “language prewired brain” built upon the Mirror Neuron System (MNS). [3]
- Language processes display interaction-dominant dynamics:
- heavily dependent on situational variables at fine timescales
- word recognition and lexical mapping depend on whether objects in the situation are visible and afford action [20]
Thus:
Proper analysis of language must be conducted at the level of organisms + environment. [20]
Architecturally, ELE must:
- Not be strictly sequential (i.e., not a rigid feed-forward pipeline)
- Be interaction-dominant and recursive at the sensory input layer
- Dynamically link:
- physiological and initial linguistic layers
- with an evolving external environment model from later cognitive stages
IV. Stage 3: Linguistics — Formal Hierarchical Structure (L-Units)
This stage:
- Formalizes structured organization of the linguistic signal
- Transforms physiological outputs into symbolic representation
Language’s power lies in its hierarchical structure, enabling infinite productivity. [8]
Classical Stratification and Hierarchy
Standard hierarchy:
- Phonology
- Studies sound systems
- Basic unit: Phoneme — smallest sound unit that can distinguish meaning (/p/ vs /b/) [8]
- Morphology
- Studies internal structure of words
- Basic unit: Morpheme — smallest meaningful unit (e.g., plural “-s”) [8]
- Syntax
- Rules for combining words into larger constituents
- Basic objects: phrases and sentences
- Key property: Recursion — embedding phrases within phrases [8]
Generative models (e.g., Government and Binding Theory):
- Propose “movement” and traces (silent copies) in sentence structure
- Psycholinguistic evidence supports traces as part of mental representation. [22]
Defining the Atomic Units of Meaning: The Sememe
Semantics sometimes requires a finer granularity than morphemes.
The Sememe (or seme):
- An indivisible, atomic unit of meaning [21]
- Historically: a “definite idea-content expressed in some linguistic form” [23]
Example:
- “Triangle” = sememe “three-sided straight-lined figure” [23]
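A toy data-structure view makes the sememe concrete: if each lexical entry is a set of atomic semantic features, then meaning relations reduce to set operations. The feature inventory below is an illustrative assumption following the "triangle" example.

```python
# Lexical entries as frozensets of atomic semantic features (sememes).
LEXICON = {
    "triangle": frozenset({"figure", "straight-lined", "three-sided"}),
    "square":   frozenset({"figure", "straight-lined", "four-sided"}),
    "circle":   frozenset({"figure", "curved"}),
}

def shared_sememes(w1: str, w2: str) -> frozenset:
    """Semantic overlap as the intersection of atomic features."""
    return LEXICON[w1] & LEXICON[w2]

common = shared_sememes("triangle", "square")
# "triangle" and "square" share the figure and straight-lined features.
```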
Computational Hierarchy and Embodiment
Empirical evidence:
- ERP/MEG studies show morphologically complex words elicit larger responses than monomorphemic words — supporting active morphological decomposition during access. [24]
Neural architectures mirror this:
- Transformer models (e.g., BERT) exhibit a systematic distribution of information across layers:
- Early layers (1–3): phonological/morphological segmentation
- Middle layers: syntactic structure
- Late layers: lexical and semantic relationships [9]
This:
- Empirically validates functional separation of linguistic levels.
- Confirms that Sememes (atomic meaning units) must be:
Grounded through sensorimotor simulations (Stage 2),
linking semantics with embodied cognition and action. [15]
Thus, there is a biologically plausible feedback loop between:
- Linguistics (Stage 3)
- Physiology (Stage 2)
V. Stage 4: Cognition — Mental Representation and Architectures
Cognition:
- Acquires, stores, and applies linguistic knowledge
- Interfaces formal language structure with the external world
Language–Cognition Interaction
Language and cognition:
- Interact closely rather than being strictly identical or fully separate. [3]
Cognition:
- Builds internal models of the world
Language:
- Acts as a storehouse of cultural wisdom
- Serves as a teacher, adapting cultural knowledge to concrete life situations. [3]
Psycholinguistic Mechanisms: Storage, Retrieval, Generalization
Language processing hinges on:
- Sensory, short-term, and long-term memory coordination. [26]
- Chunking: grouping elements into manageable units
Short-term memory:
- Holds roughly 5–9 chunks [26]
- Directly shapes parsing efficiency and lexical retrieval
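Chunking can be sketched directly: a digit stream exceeds the 5-9 slot span as raw items, but fits easily once regrouped into larger units. The chunk size of 3 is an illustrative choice.

```python
SPAN = 7  # nominal short-term capacity in chunks [26]

def chunk(items, size=3):
    """Group a flat sequence into fixed-size chunks."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]

digits = list("149217761066")       # 12 raw items: over the span
chunks = chunk(digits)              # 4 chunks: well within the span
fits_raw = len(digits) <= SPAN      # False
fits_chunked = len(chunks) <= SPAN  # True
```

The same sequence becomes memorable not because capacity grows, but because each slot now holds a richer unit, which is why chunking shapes parsing efficiency.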
Metarules:
- Capture generalizations about language structure
- Do not list structures themselves
- Instead define classes of permissible structures [27]
In frameworks like Generalized Phrase Structure Grammar (GPSG):
- Metarules, Feature Co-occurrence Restrictions, and the Head Feature Convention:
- compactly encode structural generalizations [27]
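The metarule idea can be sketched computationally: rather than listing passive rules, a metarule derives them from the active base rules that match its pattern. The rule encoding and the simplified passive metarule below are illustrations of the GPSG mechanism, not its formal notation.

```python
# Base (active) phrase-structure rules as (LHS, RHS) pairs.
base_rules = [
    ("VP", ["V", "NP"]),
    ("VP", ["V", "NP", "PP"]),
    ("VP", ["V", "S"]),  # no NP object: the metarule will not apply
]

def passive_metarule(rule):
    """Simplified passive metarule:
    VP -> V NP X   =>   VP[pas] -> V X (PP[by])"""
    lhs, rhs = rule
    if lhs == "VP" and len(rhs) >= 2 and rhs[1] == "NP":
        return ("VP[pas]", [rhs[0]] + rhs[2:] + ["(PP[by])"])
    return None  # rule outside the metarule's input class

derived = [r for r in map(passive_metarule, base_rules) if r is not None]
# Two passive rules are derived; the V-S rule is untouched.
```

This is the compactness claim in miniature: one metarule stands in for a whole class of rules that never need to be listed.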
Cognition also governs:
- Conceptual metaphor
- Prototype categorization [29]
- Figurative language processing (metaphors, idioms) [28]
- Mnemonics and associative networks for long-term recall [30]
Computational Architectures and Limits
To model long-term dependencies and context:
- ELE must use Recursive Linguistic Models (RLMs) [10]
- RLMs:
- handle vast context via recursive subqueries
- mimic attention and memory mechanisms across long discourse
Caution:
- fMRI-based semantic mapping (brain regions vs. conceptual fields) suffers from:
- signal loss in temporal regions
- performance-related variability [32]
Thus, ELE’s cognitive validation demands:
- Controlled performance variables
- Multi-modal methods (EEG/MEG, behavioral data, etc.)
VI. Stage 5: Communication — Situated Pragmatics and Intentionality
This stage deals with actual use of language in:
- Social contexts
- Environmental situations
Here, meaning is defined not only by structure but by:
- Intention
- Context
- Social norms
Situated Context and Social Layers
Pragmatics:
- Studies how utterances communicate beyond literal meaning and grammar [34][35]
- Focuses on:
- implications
- inferences
- attitudes
- situational context
Communication operates across social layers [28]:
- Social Norms layer:
- unwritten rules
- respect, politeness, body language
- when/how to speak
- Friendships layer:
- humor, empathy, in-jokes
- slang and non-literal uses
Defining the Unit of Interaction: The Pragmeme
Analogous to the Sememe (the atomic unit of meaning), the Pragmeme:
- Proposed to clarify “indirect speech act” phenomena [36]
- Defined as:
- a unit realized through pragmatic acts
- tightly bound to situation and context [36]
A Pragmeme may:
- Arise through Pragmaticalization
- Carry its own illocutionary force [37]
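The situational binding of the pragmeme can be illustrated with a toy resolver: the same sentence realizes different pragmatic acts depending on context. The context fields and the resolution rule are invented for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    hearer_can_act: bool
    setting: str  # e.g. "dinner", "clinic" (illustrative labels)

def pragmeme(utterance: str, ctx: Context) -> str:
    """Resolve an utterance to a pragmatic act; same form, different
    illocutionary force depending on the situation."""
    if utterance == "Can you pass the salt?":
        if ctx.setting == "dinner" and ctx.hearer_can_act:
            return "REQUEST"   # indirect speech act
        return "QUESTION"      # literal question about ability
    return "OTHER"

act_at_dinner = pragmeme("Can you pass the salt?", Context(True, "dinner"))
act_in_clinic = pragmeme("Can you pass the salt?", Context(False, "clinic"))
```

The point is architectural: the unit is not the sentence but the sentence-plus-situation pair, which is why pragmatic competence cannot be computed from form alone.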
Measuring it requires:
- Rich pragmatic assessment protocols
- Inclusion of:
- linguistic
- extra-linguistic
- paralinguistic signals [5]
Cross-cultural work needs coding schemes that distinguish:
- Core Code Categories (replicable, e.g., syntactic downgrading)
- Situated Code Categories (culture-specific, context-bound) [38]
Intentionality and Theory of Mind
Successful communication depends on:
- Inferring others’ beliefs, intentions, and perspectives
- This is Theory of Mind (ToM) [39]
For ELE:
- ToM-like inference is required for full pragmatic competence
- Recent work shows advanced LLMs can pass false-belief tasks, suggesting:
- sufficient structural complexity to model basic ToM reasoning [39]
The Pragmeme:
- Is the highest-order grounding unit:
- Validates and adapts cognitive constructs (Stage 4)
- Constrains them via real-world environment and social dynamics [20][36]
This demands an interaction-dominant architecture:
- Continuous adaptation
- Dynamic sensitivity to context and norms
VII. The Empirical Linguistic Engine Architecture and Conclusions
The ELE unifies five domains:
- Physics
- Physiology
- Linguistics
- Cognition
- Communication
In this framework:
- Physical constraints start the chain
- Physiology embodies it
- Linguistics structures it
- Cognition indexes and recursively manipulates it
- Communication situates and validates it
The Integrated P→P→L→C→C Model
The architecture must:
- Enforce the causal hierarchy: Physics → Physiology → Linguistics → Cognition → Communication
- Allow dynamic, recursive interaction between layers
- Especially connect:
- early embodiment
- later situational models
Table 1. The ELE Architectural Hierarchy
| Stage in Continuum | Primary Scientific Domain | Operational Function / Output | Key Discrete Unit (Computational / Linguistic) | Example Empirical Measurement |
|---|---|---|---|---|
| Physics | Fluid Dynamics / Acoustics | Airflow generation and resonance initiation | Phonation Trigger / Acoustic Waveform | Vital Capacity (mL); Lung Pressure (cm H₂O) [12] |
| Physiology | Neurobiology / Motor Control | Laryngeal & articulatory muscle activation and vibration | Pneuma (Physiological / Motor Command) [14] | Cricothyroid (CT) muscle activation level [14] |
| Linguistics | Formal / Generative Theory | Structuring form and basic meaning | Morpheme (Minimal Meaningful Unit) [8] | ERP/MEG morphological decomposition effects [24] |
| Cognition | Psycholinguistics / Memory | Conceptual indexing, retrieval, and recursive structuring | Sememe (Atomic Semantic Feature) [21] | Chunking capacity (≈5–9 items); lexical retrieval time [26] |
| Communication | Pragmatics / Sociolinguistics | Situated contextual interpretation and intentionality | Pragmeme (Situationally Bound Speech Act) [36] | Pragmatic protocol score (topic coherence, appropriateness, etc.) [5] |
Table 2. Cognitive and Computational Correlates of Linguistic Layers
| Linguistic Level | Cognitive / Computational Correlate | Timescale / Locus (LLM / Human) | Evidence / Function |
|---|---|---|---|
| Phonology / Letter | Phonological processes; sound-to-letter mapping | Short timescale (~0.17 words); early layers [9] | Fast neurons encode basic acoustic sequence data [9] |
| Morphology / Word | Morphological segmentation; basic syntactic structure | Medium timescale (1–10 words); middle layers [9] | Processing minimal meaningful units and phrase structure [8] |
| Syntax / Phrase | Phrase-structure parsing and recursion (embedding) | Generative mechanism; RLM context management [8][10] | Enables infinite productivity via hierarchical organization [8] |
| Semantics / Lexical | Lexical and semantic relationships; conceptualization | Long timescale (≈360,000+ words); late layers [9] | Storage of cultural knowledge; embodied meaning indexing [3][9] |
| Pragmatics / Discourse | Situational adaptation; Theory of Mind inference; discourse-level control | External environment; interaction-dominant dynamics [20] | Defines meaning via context, intent, social norms, and ToM-based inference [28][39] |
Conclusions and Architectural Recommendations
Successful implementation of the ELE requires three architectural directives:
- Mandatory Embodiment and Sensorimotor Grounding
- Semantic units (Sememes) must be indexed through simulated physical actions or “imagined manipulation.” [15]
- Grounding is rooted in the physiological stage (mirror neuron system, embodiment) [3]
- This directly addresses critiques of ungrounded, text-only models. [6]
- Adoption of Recursive, Multi-Scale Processing
- Use architectures like RLMs to exploit functional hierarchies (fast vs. slow neurons). [9][10]
- Handle vast contexts by recursive querying of environment and memory.
- Essential for:
- Cognitive generalization (Metarules, chunking)
- Pragmatic adaptation (Pragmemes, discourse integration)
- Dynamic Interaction-Dominant Architecture
- The engine must not be a rigid pipeline.
- Empirical evidence shows situational context influences low-level processing in real time. [20]
- Therefore:
- Physiological and early linguistic layers must remain dynamically linked to the pragmatic situational model.
- Meaning is continuously constrained by environment and social norms (Pragmeme layer). [36]
For empirical validation, ELE must:
- Go beyond text-only corpora
- Incorporate multimodal data:
- Aerodynamic measures (VC)
- Muscle activation signals
- High-temporal-resolution neuroimaging (EEG/MEG)
- Actively correct for:
- fMRI signal loss in temporal regions
- Performance variability across groups and tasks [32]
By anchoring all formal linguistic units to measurable physical and physiological constraints, the Empirical Linguistic Engine provides a robust, human-plausible framework for next-generation computational language models.
It is:
- Physically grounded
- Biologically embodied
- Linguistically structured
- Cognitively recursive
- Communicatively situated
— a full P→P→L→C→C continuum rendered as a single coherent architecture.