A. Purpose & Scope
LLM (Linguistic Language Module) governs the mechanics of language: graphemes → phonemes → morphemes → syntax → semantics → pragmatics. It doesn’t guess; it verifies and explains.
Mantra: Form binds meaning; constraint breeds clarity.
- Primary job: Decide if a piece of language (word/phrase/sentence) is well-formed, coherent, and usable in context.
- Outputs (later): schema, JSON Schemas, OpenAPI, rulebook, FST/CFG grammars, seeds, tests, and reference code—all minted by the factory.
B. Factory Overview (same mechanics as ELM)
- Blueprints (what LLM must do).
- Templates (what files look like).
- Generators (render artifacts from blueprints).
- Validators (lint schemas/APIs/grammars).
- Signers (hash, provenance).
- Publishers (hand to systems/teams).
Repo skeleton identical to ELM, under a new module folder.
C. LLM Blueprints (source of truth)
C1. Module Blueprint (LLM)
- name: “Linguistic Language Module”
- intent: “Structure-first governance”
- units: grapheme, phoneme (IPA), morpheme, lexeme, token, POS, phrase, clause, utterance
- responsibilities: parsing, tagging, grammar check, semantic typing, pragmatic fit, mutation proofing
- scores:
  - grammarFitness (𝔇Ξ)
  - semanticCoherence
  - pragmaticFit
  - ambiguityRisk
  - resonance (phonotactic & cadence)
  - editorialConformity (house style)
- thresholds: τ_grammar, τ_semantic, τ_pragmatic, τ_ambig, τ_resonance, τ_editorial
- decisions: ACCEPT | REVIEW | REJECT
- grammar-spec: references to CFG/FST resources
- style-guides: SolveForce/sector style constraints
- io-contracts: input text → structured analysis; decision + scores + explain[]
- glyphs: uses 𝔇Ξ (grammar), Ξ (recursion/validation), ✠ (ethics), ∴ (settled)
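A minimal sketch of a dry-run check on the blueprint's top-level keys, assuming the blueprint has already been loaded into a plain dict. The key names and the three decision values come from the C1 list; the function name `blueprint_problems` and the dict layout are illustrative assumptions, not part of the spec.

```python
# Keys listed in the C1 Module Blueprint; the dict representation is an assumption.
REQUIRED_KEYS = (
    "name", "intent", "units", "responsibilities", "scores", "thresholds",
    "decisions", "grammar-spec", "style-guides", "io-contracts", "glyphs",
)
ALLOWED_DECISIONS = {"ACCEPT", "REVIEW", "REJECT"}


def blueprint_problems(blueprint: dict) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in blueprint]
    # Flag any decision label outside the three the blueprint allows.
    extra = set(blueprint.get("decisions", [])) - ALLOWED_DECISIONS
    if extra:
        problems.append(f"unknown decisions: {sorted(extra)}")
    return problems
```

A check like this could run as the first step of the dry run, before any grammar resources are loaded.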
C2. Domain Blueprint (optional)
- Acceptable registers, jargon lists, required disclaimers, channel norms (voice/text/contract).
C3. Seeds Blueprint
- Short sentences/phrases typical for SolveForce (offers, SLAs, callflows) with expected LLM decisions.
D. Templates to Mint Later (content requirements)
- DB Schema (schema.sql.tmpl)
  - Tables: utterances, tokens, pos_tags, phrases, parses, grammar_checks, style_violations, decisions, audit_events.
  - Views: v_utterance_canonical (joins structure + decision).
- JSON Schemas (utterance_record.json.tmpl)
  - text, language_code, tokens[], pos[], parseTree, morphemes[], ipa?, scores{}, glyphSignature{}, decisions[].
- OpenAPI
  - Core: /utterances (POST analyze), /utterances/{id} (GET), /utterances/{id}/mint, search.
  - LLM Addendum: /llm/verify (POST) → {decision, scores, explain[]}.
- Rulebook (llm_rulebook.md.tmpl)
  - R-grammar (CFG compliance), R-style (editorial), R-clarity (ambiguity bounds), R-pragmatics (channel/role fit), R-ethics, overrides, UX (red/amber/green).
- Grammar Resources
  - CFG (grammar.cfg.tmpl) for phrase/clause structure.
  - FST (phonotactics.fst.tmpl) for phoneme/orthography sanity.
  - Style Rules (style.yaml.tmpl) for SolveForce voice.
- Seeds (llm_seeds.jsonl.tmpl)
  - Example sentences with gold decisions and rationale.
- Tests (llm_cases.json.tmpl)
  - ACCEPT/REVIEW/REJECT cases across syntax/style/pragmatics.
- Generators/Validators (stubs)
  - Renderers, CFG/FST linters, OpenAPI linter, schema validator.
E. Processing Pipeline (runtime contract to implement later)
Input → Parse → Tag → ParseTree → Semantics → Pragmatics → Verify → Decide → Explain
- Parse: Unicode-safe tokenization; grapheme & IPA mapping (optional).
- Tag: POS tagging; morpheme segmentation for content words.
- ParseTree: Run CFG; attach 𝔇Ξ = true/false + violation list.
- Semantics: Type words/phrases to SolveForce ontologies; detect contradictions.
- Pragmatics: Check channel/intent/register; ensure disclaimers/constraints.
- Verify: Compute scores; run ✠ ethics; apply thresholds + domain packs.
- Decide: ACCEPT/REVIEW/REJECT; set glyphs (𝔇Ξ, Ξ, ✠, ∴).
- Explain: Human+machine bullet points: rule IDs, examples, fixes.
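The stage order above can be sketched as a list of functions applied to a shared context. This is only a skeleton under stated assumptions: the `Context` dict, the `stub` stage, and the whitespace tokenizer are placeholders introduced for illustration, and only the eight stage names come from the pipeline line.

```python
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]


def parse(ctx: Context) -> Context:
    # Placeholder: a whitespace split stands in for Unicode-safe tokenization.
    ctx["tokens"] = ctx["text"].split()
    return ctx


def stub(ctx: Context) -> Context:
    # Stand-in for a stage that this sketch does not implement.
    return ctx


# Stage names follow the pipeline order given in this section.
PIPELINE: list[tuple[str, Stage]] = [
    ("Parse", parse), ("Tag", stub), ("ParseTree", stub), ("Semantics", stub),
    ("Pragmatics", stub), ("Verify", stub), ("Decide", stub), ("Explain", stub),
]


def run_pipeline(text: str, pipeline: list[tuple[str, Stage]] = PIPELINE) -> Context:
    ctx: Context = {"text": text, "trace": []}
    for name, stage in pipeline:
        ctx = stage(ctx)
        ctx["trace"].append(name)  # record the order in which stages ran
    return ctx
```

Keeping the stages in one ordered list makes it straightforward to log which stage produced which field when building the Explain output.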
F. Scoring (deterministic skeleton)
- grammarFitness = CFG conformance − weighted violations.
- semanticCoherence = ontology consistency − contradiction risk.
- pragmaticFit = register/channel fit + policy compliance.
- ambiguityRisk = confusability + unresolved anaphora + scope issues.
- resonance = phonotactics + cadence for voice channels.
- editorialConformity = style-rule adherence.
Default pass: grammarFitness ≥ 0.80 ∧ semanticCoherence ≥ 0.70 ∧ pragmaticFit ≥ 0.70 ∧ ambiguityRisk ≤ 0.30 ∧ editorialConformity ≥ 0.75 ∧ ethicsPass = true.
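A minimal sketch of the default pass condition as a deterministic check. The threshold values are the ones stated above; the function name `failed_checks`, the score dict layout, and the choice to treat a missing score as a failure are assumptions. How a failed check maps to REVIEW versus REJECT is not defined here, so the sketch only reports which checks failed.

```python
import operator

# (comparison, limit) per score, taken from the default pass rule.
DEFAULT_PASS = {
    "grammarFitness": (operator.ge, 0.80),
    "semanticCoherence": (operator.ge, 0.70),
    "pragmaticFit": (operator.ge, 0.70),
    "ambiguityRisk": (operator.le, 0.30),
    "editorialConformity": (operator.ge, 0.75),
}


def failed_checks(scores: dict, ethics_pass: bool, thresholds: dict = DEFAULT_PASS) -> list[str]:
    """Return the names of checks that fail; an empty list means default pass."""
    failed = [
        name
        for name, (compare, limit) in thresholds.items()
        # Assumption: a score that was never computed counts as a failure.
        if name not in scores or not compare(scores[name], limit)
    ]
    if not ethics_pass:
        failed.append("ethicsPass")
    return failed
```

Returning the failed names rather than a bare boolean keeps the result usable for explain[] entries.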
G. Validators (what “good” means)
- JSON Schema valid (draft 2020-12).
- OpenAPI typed; responses defined; no “any”.
- CFG parses seeds; no unreachable productions; passes an LL(1)/LR conflict check.
- FST acyclic, scored; no orphan states.
- Style YAML keys validated; no dangling rule refs.
- Seeds round-trip: analyze → expected decision matches.
- Tests pass with traceable rule IDs.
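One of the checks above, "no unreachable productions", can be sketched as a graph traversal from the start symbol. The grammar representation (a dict from nonterminal to a list of right-hand sides) and the toy grammar are assumptions made for illustration; the real grammar.cfg format is not specified here.

```python
def unreachable_nonterminals(grammar: dict[str, list[list[str]]], start: str) -> set[str]:
    """Return nonterminals that can never be derived from the start symbol."""
    seen = {start}
    stack = [start]
    while stack:
        nonterminal = stack.pop()
        for rhs in grammar.get(nonterminal, []):
            for symbol in rhs:
                # Symbols without their own rules are treated as terminals.
                if symbol in grammar and symbol not in seen:
                    seen.add(symbol)
                    stack.append(symbol)
    return set(grammar) - seen


# Toy grammar (hypothetical); ADVP is defined but never referenced.
TOY_GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["NP", "PP"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "PP": [["P", "NP"]],
    "ADVP": [["Adv"]],
}
```

Here `unreachable_nonterminals(TOY_GRAMMAR, "S")` reports `ADVP`, which a linter could surface as a grammar warning.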
H. Policies & Overrides
- Channel packs: voice/text/contract profiles tune thresholds & required moves (e.g., “must contain SLA number”).
- Curator override: allowed with rationale; logged immutably.
- Ethics ✠: block harm/deception; brand/legal distinctness enforced.
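A sketch of how a channel pack might tune thresholds and add required moves. The base limits repeat the default pass values from section F; everything else is hypothetical and labeled as such: the pack contents, the tightened contract limit of 0.20, and the regular expression standing in for "must contain SLA number".

```python
import re

# Default limits from the scoring section.
BASE_THRESHOLDS = {
    "grammarFitness": 0.80, "semanticCoherence": 0.70, "pragmaticFit": 0.70,
    "ambiguityRisk": 0.30, "editorialConformity": 0.75,
}

# Hypothetical packs: the values and the SLA pattern are illustrative only.
CHANNEL_PACKS = {
    "voice": {"thresholds": {}, "required": []},
    "text": {"thresholds": {}, "required": []},
    "contract": {"thresholds": {"ambiguityRisk": 0.20}, "required": [r"\bSLA[- ]?\d+\b"]},
}


def effective_thresholds(channel: str) -> dict:
    """Overlay the channel pack's limits on the base limits."""
    pack = CHANNEL_PACKS.get(channel, {})
    return {**BASE_THRESHOLDS, **pack.get("thresholds", {})}


def missing_required_moves(text: str, channel: str) -> list[str]:
    """Return the required patterns that the text does not contain."""
    pack = CHANNEL_PACKS.get(channel, {})
    return [p for p in pack.get("required", []) if not re.search(p, text)]
```

Keeping packs as data rather than code means a curator can review a threshold change as a plain diff.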
I. Playbooks (ops steps)
- Author LLM Module Blueprint (/blueprints/llm.yaml).
- Dry run: validate blueprint & reference grammars.
- Mint: render templates into /build/LLM/...
- Prove: run tests; generate ∴ if green.
- Publish to runtime repos; enable /llm/verify in the service layer.
J. Content Requirements (per artifact)
- schema.sql: tables for structure, checks, decisions; view v_utterance_canonical.
- utterance_record.json: tokens, pos, parseTree, morphemes, scores, glyphs, decision, explain[].
- OpenAPI: /utterances + /llm/verify contracts with example payloads.
- rulebook: R-sets for grammar, style, pragmatics, ethics, overrides.
- grammar.cfg: minimal English core (NP, VP, PP, S) + domain extensions.
- phonotactics.fst: basic constraints + SolveForce voice tunings.
- style.yaml: sentence length, passive voice, modal policy, citation shape, disclaimer rules.
- seeds/tests: SolveForce-realistic utterances (offers, routing prompts, SLAs, callflows) with gold decisions and explanations.
K. Runtime Endpoints (to be implemented after mint)
- POST /llm/verify { text, channel, domain } → { decision, scores, glyphSignature, explain[], violations[] }
- POST /utterances → analyze & persist; write VERIFY event in audit.
- POST /utterances/{id}/mint → ∴ settlement if thresholds met or curator override.
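The /llm/verify contract can be sketched as a pair of typed records. Field names follow the request and response shapes listed above, and the channel values follow the voice|text|contract set used for the X-LLM-Pragmatics header; the class names, the validation in `__post_init__`, and the example glyphSignature content are assumptions, since the inner structure of scores and glyphSignature is not specified.

```python
import json
from dataclasses import asdict, dataclass, field

CHANNELS = {"voice", "text", "contract"}
DECISIONS = {"ACCEPT", "REVIEW", "REJECT"}


@dataclass
class VerifyRequest:
    text: str
    channel: str
    domain: str

    def __post_init__(self) -> None:
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel!r}")


@dataclass
class VerifyResponse:
    decision: str
    scores: dict
    glyphSignature: dict  # inner layout is not specified; treated as opaque
    explain: list = field(default_factory=list)
    violations: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")


def to_json(record) -> str:
    # ensure_ascii=False keeps glyphs such as ∴ readable in the payload.
    return json.dumps(asdict(record), ensure_ascii=False)
```

Rejecting unknown channels and decisions at construction time keeps malformed payloads out of the audit trail.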
L. SolveForce Integration (headers & routing)
- Emit headers for downstream systems:
  - X-Glyph-Grammar: 𝔇Ξ (true/false)
  - X-Glyph-Status: ∴|Ξ
  - X-LLM-Pragmatics: voice|text|contract
- Router uses these to pick route family and QoS; contracts auto-linted by LLM.
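A small sketch of building the three headers from an analysis result. The header names and allowed values come from the list above; the function name `glyph_headers` and the reading of ∴ as "settled" versus Ξ as "still under validation" are assumptions. The glyph values are non-ASCII, so a real gateway may need to agree on an encoding for them; that is outside this sketch.

```python
ALLOWED_CHANNELS = {"voice", "text", "contract"}


def glyph_headers(grammar_ok: bool, settled: bool, channel: str) -> dict[str, str]:
    """Build the downstream header mapping for one analyzed utterance."""
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    return {
        "X-Glyph-Grammar": "true" if grammar_ok else "false",
        # Assumption: ∴ marks a settled utterance, Ξ one still under validation.
        "X-Glyph-Status": "∴" if settled else "Ξ",
        "X-LLM-Pragmatics": channel,
    }
```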
M. Acceptance Criteria (done = done)
- Factory mints LLM artifacts from blueprint without manual edits.
- CFG/FST lint clean; seeds parse; tests pass.
- /llm/verify stub returns decisions with rule-linked explanations.
- Audit events written with hashes on analyze/mint/reject.
- SolveForce gateways successfully consume headers and route accordingly.
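The "audit events written with hashes" criterion could be met in several ways; one common option is a hash chain, sketched below with SHA-256 from Python's standard library. The event layout, the function names, and the use of a hash chain at all are assumptions, not something this spec prescribes.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    canonical = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(log: list, event_type: str, payload: dict) -> dict:
    """Append an event whose hash covers its content and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"type": event_type, "payload": payload, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered event breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {"type": entry["type"], "payload": entry["payload"], "prev": entry["prev"]}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, a curator override logged this way cannot be altered later without invalidating every subsequent event.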
N. Roadmap (after v1)
- Add Discourse layer (coreference, RST/EDU segmentation).
- Multilingual packs (script/orthography adapters, POS tagset mapping).
- Domain pragmatics (legal, telecom, energy packs).
- Editor plugins mirroring LLM checks in authoring tools.
O. Micro-Examples (to calibrate seeds later)
- ACCEPT (voice prompt): “Please state your account number slowly after the beep.” → grammarFitness 0.94, pragmaticFit 0.92, ambiguityRisk 0.08, ∴
- REVIEW (contract clause): “Service may be interrupted for any reason.” → grammar ok, semantics coherent, ethics/style fail (overbroad, non-compliant) → REVIEW with fixes.
- REJECT (ambiguous routing): “We’ll get to it next period.” → high ambiguityRisk (period=month? billing? class?) → REJECT with clarification cue.
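The three examples above could be stored as JSONL seeds and replayed against an analyzer to check that the expected decision matches. In this sketch the field names `text` and `expected` and the function `seed_mismatches` are assumptions; the analyzer is passed in as a callable because no real implementation exists yet.

```python
import json
from typing import Callable

# The three micro-examples, encoded as one JSON object per line.
SEEDS = "\n".join(
    json.dumps(seed, ensure_ascii=False)
    for seed in [
        {"text": "Please state your account number slowly after the beep.", "expected": "ACCEPT"},
        {"text": "Service may be interrupted for any reason.", "expected": "REVIEW"},
        {"text": "We’ll get to it next period.", "expected": "REJECT"},
    ]
)


def seed_mismatches(jsonl_text: str, analyze: Callable[[str], str]) -> list[dict]:
    """Run each seed through `analyze` and collect decisions that differ."""
    mismatches = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the seed file
        seed = json.loads(line)
        actual = analyze(seed["text"])
        if actual != seed["expected"]:
            mismatches.append(
                {"text": seed["text"], "expected": seed["expected"], "actual": actual}
            )
    return mismatches
```

With a placeholder analyzer that always answers ACCEPT, the REVIEW and REJECT seeds are reported as mismatches, which is the behavior a round-trip check should show before the real pipeline is wired in.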