A. Purpose & Scope
MLM (Morphological Language Module) governs how words are formed and transformed.
It encodes legal morpheme inventories, allomorphy rules, mutation operators (vowel/consonant interchanges, affix grafts), and productivity constraints—so every neologism has a valid build sheet.
Mantra: Shape carries sense; morphology is meaning in uniform.
- Primary job: Decide whether a proposed form is morphologically valid, productive (worth minting), and distinct from neighbors.
- Downstream: Feeds ELM (origins) and LLM (syntax/pragmatics) with well-formed word candidates.
B. Factory Overview (same machine, new blueprint)
- Blueprints declare morpheme systems, operators, and constraints.
- Templates define the file shapes (schema, JSON Schema, OpenAPI, rulebook, FSTs, seeds, tests).
- Generators render artifacts from blueprints.
- Validators lint structure and test productivity.
- Signers hash & prove provenance.
- Publishers ship to the ledger + services.
Repo skeleton mirrors the prior modules.
C. MLM Blueprints (source of truth)
C1. Module Blueprint (MLM)
- name: “Morphological Language Module”
- intent: “Governed word-formation & mutation”
- units: morpheme, allomorph, stem, template, mutation
- morpheme_inventory: roots, prefixes, suffixes, infixes/circumfixes (if used), clitics (optional)
- allomorphy_rules: e.g., -nomics allomorphs, vowel harmony, consonant assimilation
- templates: combinatory schemas like ROOT + NOMICS, ROOT + O + MICS, LAN + O + MICS
- mutation_operators:
  - VOWEL_SWAP(site, I→A | O→A | …)
  - CONSONANT_ALTER(K↔C, S↔Z under voicing)
  - AFFIX_GRAFT(+omics, +logic, +metrics, +genics, +dynamics)
  - BOUNDARY_SHIFT(resegment if legal)
- constraints: morpheme boundary preservation; illegal clusters; script rules
- productivity_model: which affixes are live vs. fossilized; domain conditioning
- scores: morphIntegrity, productivity, separability, ambiguityRisk, resonance
- thresholds: τ_integrity, τ_productivity, τ_separability, τ_ambig
- decisions: ACCEPT | REVIEW | REJECT
- io-contracts: term-in → decision/scores/explain-out
- glyphs: 𝔇Ξ (grammar support flag for form), Ξ (validation trail), ∴ (mint), ✠ (ethics passthrough)
- domain_packs: telecom, energy, AI (each pack tunes productivity and confusability tolerances)
C2. Affix Pack Blueprints (optional)
- Lists of sanctioned affixes with semantics, selectional restrictions (what stems they attach to), and phonological adjustments.
C3. Seeds Blueprint
- Exemplars like LINOMICS, LANOMICS, LINAMICS, LANAMICS, plus a handful of negative controls (illegal builds) for testing.
D. Templates to Mint Later (content requirements)
- DB Schema (templates/db/schema.sql.tmpl)
  - Tables: morphemes, allomorphs, templates, term_builds, build_checks, decisions, audit_events
  - Link to terms (ledger) via term_id.
  - View: v_term_buildsheet (what parts + what rules produced the word).
- JSON Schema (templates/schemas/term_build.json.tmpl)
  - Fields: morphemes[], allomorphs[], template_id, mutations[], constraints_applied[], scores{}, decision, explain[].
- OpenAPI
  - Core addendum: /mlm/verify (POST) → {decision, scores, explain[], buildsheet}
  - Optional: /mlm/suggest (POST) to propose legal variants from a stem.
- Rulebook (templates/rules/mlm_rulebook.md.tmpl)
  - R0 Morpheme Evidence, R1 Boundary Integrity, R2 Legal Templates, R3 Allomorphy Compliance, R4 Operator Legality, R5 Productivity, R6 Separability, R7 Ambiguity, R8 Ethics passthrough, R9 Overrides.
- FST/Rewrite Rules (templates/rules/mlm_fst.txt.tmpl)
  - Finite-state morphotactics + rewrite rules for allomorphy & sandhi (e.g., I→A before N/M, O→A in unstressed pre-*-mics, S↔Z voicing).
- Productivity Tables (templates/data/productivity.yaml.tmpl)
  - Prior probabilities of affix success by domain; cooldowns for overused patterns.
- Seeds (templates/data/mlm_seeds.jsonl.tmpl)
  - Positive and negative constructions with expected decisions.
- Tests (templates/tests/mlm_cases.json.tmpl)
  - ACCEPT/REVIEW/REJECT with explicit rule IDs and minimal scores.
- Generator/Validator Stubs
  - Renderers; FST compiler checks; affix pack lints; selectional restriction checks.
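Before the JSON Schema template is minted, the term_build field list can be sanity-checked with a few lines of standard-library Python. This is a minimal sketch, not the eventual schema: the field names come from the list above, while the Python types, the check_term_build helper, and the sample values are assumptions made for illustration.

```python
from typing import Any

# Field names come from the term_build field list; the Python types are an
# assumption about what the eventual JSON Schema will require.
REQUIRED_FIELDS: dict[str, type] = {
    "morphemes": list,
    "allomorphs": list,
    "template_id": str,
    "mutations": list,
    "constraints_applied": list,
    "scores": dict,
    "decision": str,
    "explain": list,
}
DECISIONS = {"ACCEPT", "REVIEW", "REJECT"}


def check_term_build(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    decision = record.get("decision")
    if isinstance(decision, str) and decision not in DECISIONS:
        problems.append(f"unknown decision: {decision}")
    return problems


# Illustrative record only; the IDs and values are hypothetical.
sample = {
    "morphemes": ["LAN", "O", "MICS"],
    "allomorphs": [],
    "template_id": "ROOT+O+MICS",
    "mutations": ["VOWEL_SWAP(I→A)"],
    "constraints_applied": ["R1"],
    "scores": {"morphIntegrity": 0.93},
    "decision": "ACCEPT",
    "explain": ["vowel swap from LINOMICS parent"],
}
print(check_term_build(sample))  # []
```

Returning a list of problems instead of raising on the first one lets a validator report every defect in a seed file in one pass.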
E. Processing Pipeline (runtime contract to implement later)
Input → Segment → Select Allomorphs → Apply Template → Mutate (if legal) → Enforce Constraints → Score → Decide → Explain
- Segment: Identify candidate morphemes, propose boundaries.
- Allomorphs: Choose surface forms conditioned by phonology/register.
- Template: Fit into a legal pattern (e.g., ROOT + NOMICS).
- Mutate: Apply only whitelisted operators at licensed sites (your vowel/consonant interchanges live here).
- Constraints: Block boundary violations, illegal clusters, unlicensed scripts.
- Score:
  - morphIntegrity: boundary preservation + template legality
  - productivity: affix/stem compatibility + domain priors
  - separability: embedding distance from siblings
  - ambiguityRisk: confusability with neighbors
  - resonance: phonotactic flow for intended channel
- Decide: ACCEPT / REVIEW / REJECT (with ✠ check inherited from ethics layer).
- Explain: Emit buildsheet (what attached where, which rule fired) + human bullets.
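The stage chain can be pictured as a list of functions that each take and return a shared trace object. The sketch below is a toy under stated assumptions: it implements only Segment, Template, and Decide against a two-root inventory and the ROOT + O + MICS template, and it leaves out allomorph selection, mutation, constraints, and scoring. The Trace class, the stage function names, and the inventory are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy inventory; a real one would be rendered from the blueprint.
ROOTS = {"LIN", "LAN"}


@dataclass
class Trace:
    term: str
    morphemes: list[str] = field(default_factory=list)
    template_id: Optional[str] = None
    explain: list[str] = field(default_factory=list)
    decision: str = "REJECT"


def segment(trace: Trace) -> Trace:
    """Segment: propose ROOT + O + MICS boundaries when a known root is a prefix."""
    for root in sorted(ROOTS):
        if trace.term.startswith(root) and trace.term[len(root):] == "OMICS":
            trace.morphemes = [root, "O", "MICS"]
            trace.explain.append("segmented as " + " + ".join(trace.morphemes))
            return trace
    trace.explain.append("no known root with a legal remainder")
    return trace


def apply_template(trace: Trace) -> Trace:
    """Template: record the pattern the segmentation fits, if any."""
    if trace.morphemes:
        trace.template_id = "ROOT + O + MICS"
        trace.explain.append("matched template " + trace.template_id)
    return trace


def decide(trace: Trace) -> Trace:
    """Decide: in this toy, a matched template is the only acceptance test."""
    trace.decision = "ACCEPT" if trace.template_id else "REJECT"
    return trace


STAGES = [segment, apply_template, decide]


def run(term: str) -> Trace:
    trace = Trace(term=term.upper())
    for stage in STAGES:
        trace = stage(trace)
    return trace


result = run("lanomics")
print(result.decision, result.morphemes)  # ACCEPT ['LAN', 'O', 'MICS']
```

Keeping the stages in a plain list makes it easy to insert the missing steps later without changing the runner, and the explain list accumulates the buildsheet bullets as each stage fires.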
F. Scoring (deterministic skeleton)
- morphIntegrity = morphotactics OK − penalties for boundary/cluster violations.
- productivity = P(affix|stem, domain) × historical success × novelty dampening.
- separability = 1 − cosine similarity between new_form and its nearest neighbors in the domain.
- ambiguityRisk = confusion index (edit distance + phonetic similarity + domain frequency).
- resonance = phonotactic well-formedness × channel weight.
Default pass (tunable): morphIntegrity ≥ 0.85 ∧ productivity ≥ 0.65 ∧ separability ≥ 0.60 ∧ ambiguityRisk ≤ 0.30 ∧ ethicsPass = true.
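The default pass condition and the separability formula translate directly into code. The following is a minimal sketch, assuming embeddings are already available as plain float lists and that "nearest neighbors" means taking the single highest cosine similarity; both are assumptions, as are the function names. How a failed check maps to REVIEW versus REJECT is not defined above, so the sketch only reports which criteria failed.

```python
import math

# Default thresholds copied from the "Default pass" line; all are tunable.
THRESHOLDS = {"morphIntegrity": 0.85, "productivity": 0.65, "separability": 0.60}
MAX_AMBIGUITY_RISK = 0.30


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def separability(new_vec: list[float], neighbor_vecs: list[list[float]]) -> float:
    """1 minus the highest cosine similarity to any neighbor (assumed reading)."""
    if not neighbor_vecs:
        return 1.0
    return 1.0 - max(cosine(new_vec, v) for v in neighbor_vecs)


def failed_criteria(scores: dict[str, float], ethics_pass: bool) -> list[str]:
    """Names of default-pass criteria that the scores do not satisfy."""
    failed = [name for name, tau in THRESHOLDS.items() if scores[name] < tau]
    if scores["ambiguityRisk"] > MAX_AMBIGUITY_RISK:
        failed.append("ambiguityRisk")
    if not ethics_pass:
        failed.append("ethicsPass")
    return failed


# Scores taken from the LANOMICS micro-example in section O.
lanomics = {"morphIntegrity": 0.93, "productivity": 0.78,
            "separability": 0.71, "ambiguityRisk": 0.18}
print(failed_criteria(lanomics, ethics_pass=True))  # []
```

An empty list corresponds to the default pass; a non-empty list gives the Explain step the exact thresholds that blocked the term.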
G. Validators (what “good” means)
- JSON Schema present & valid (draft 2020-12).
- OpenAPI fully typed; examples provided.
- FST compiles; no orphan states; deterministic where required.
- Affix packs: every affix has selectional restrictions + examples.
- Seeds round-trip: proposed build → expected decision.
- Tests pass and cite rule IDs fired/blocked.
- Drift check: high-frequency affixes throttled by productivity rules (prevents buzzword spirals).
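One way to picture the drift check is a dampening factor applied to an affix's prior when that affix dominates recent mints. The sketch below is only an illustration: the throttled_prior function, the max_share cutoff, and the damping factor are hypothetical, and the real throttle rules would come from the productivity tables.

```python
from collections import Counter


def throttled_prior(prior: float, affix: str, recent_affixes: list[str],
                    max_share: float = 0.25, damping: float = 0.5) -> float:
    """Dampen an affix prior when the affix exceeds max_share of recent mints.

    max_share and damping are placeholder values, not taken from the spec.
    """
    if not recent_affixes:
        return prior
    share = Counter(recent_affixes)[affix] / len(recent_affixes)
    return prior * damping if share > max_share else prior


recent = ["omics"] * 6 + ["logic"] * 2 + ["metrics"] * 2
print(throttled_prior(0.8, "omics", recent))  # 0.4
print(throttled_prior(0.7, "logic", recent))  # 0.7
```

A validator could assert that a seed batch flooded with one affix produces lower productivity scores than a balanced batch, which is the behavior the drift check is meant to guarantee.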
H. Policies & Overrides
- Legal templates only; custom templates require curator sign-off.
- Operator safelist: only declared mutation operators may run.
- Domain packs can tighten/loosen productivity & ambiguity thresholds (e.g., telecom stricter than blog posts).
- Override allowed with rationale; logged immutably in audit.
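The operator safelist amounts to a membership test against the operators the blueprint declares. A minimal sketch, assuming operators arrive as strings such as VOWEL_SWAP(I→A) and that the name before the parenthesis identifies the operator; the unlicensed helper is hypothetical.

```python
# Operator names as declared under mutation_operators in the module blueprint.
DECLARED_OPERATORS = {"VOWEL_SWAP", "CONSONANT_ALTER", "AFFIX_GRAFT", "BOUNDARY_SHIFT"}


def unlicensed(requested: list[str]) -> list[str]:
    """Return the requested operators whose names the blueprint does not declare."""
    return [op for op in requested if op.split("(", 1)[0] not in DECLARED_OPERATORS]


print(unlicensed(["VOWEL_SWAP(I→A)", "REVERSE"]))  # ['REVERSE']
```

Any non-empty result would block the mutation step unless a curator override with a logged rationale applies.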
I. Playbooks (ops steps)
- Author MLM Blueprint (/blueprints/mlm.yaml) with morpheme inventory, operators, templates, constraints, thresholds.
- Dry run: validate blueprint, compile FST, lint affix packs.
- Mint: render DB schema, JSON schemas, APIs, rulebook, rules, seeds, tests into /build/MLM/....
- Prove: run tests; check seeds; ensure productivity throttles behave.
- Publish: ship artifacts to ledger + service repos; expose /mlm/verify.
J. Content Requirements (per artifact, when minted)
- schema.sql: morphemes, allomorphs, templates, term_builds, build_checks, decisions; v_term_buildsheet.
- term_build.json: morphemes, allomorphs, template, mutations, constraints_applied, scores, decision, explain, glyphs.
- OpenAPI: /mlm/verify, optionally /mlm/suggest.
- rulebook: R0–R9 with examples and curator guidance.
- mlm_fst.txt: morphotactics + rewrite rules, each with RULE_ID, CONTEXT, SCORE/PENALTY.
- productivity.yaml: priors, cooldowns, domain weights.
- seeds/tests: LINOMICS-family exemplars plus counterexamples.
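To show what a rewrite rule carrying RULE_ID, CONTEXT, and SCORE/PENALTY might look like, the sketch below uses Python regular expressions as a stand-in for a compiled FST. The rule IDs, penalty values, and the RewriteRule class are hypothetical, and the "unstressed" condition on O→A is ignored because stress is not modeled here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteRule:
    rule_id: str      # hypothetical ID format
    pattern: str      # regex standing in for the rule's CONTEXT
    replacement: str
    penalty: float    # placeholder SCORE/PENALTY value


RULES = [
    RewriteRule("RW-I2A", r"I(?=[NM])", "A", 0.0),   # I→A before N/M
    RewriteRule("RW-O2A", r"O(?=MICS$)", "A", 0.1),  # O→A before final -MICS
]


def apply_rule(form: str, rule: RewriteRule) -> tuple[str, bool]:
    """Apply one rule to a form and report whether it fired."""
    new_form, count = re.subn(rule.pattern, rule.replacement, form)
    return new_form, count > 0


print(apply_rule("LINOMICS", RULES[0]))  # ('LANOMICS', True)
print(apply_rule("LINOMICS", RULES[1]))  # ('LINAMICS', True)
```

Reporting whether each rule fired gives tests the rule IDs they need to cite, and the lookahead keeps the context out of the rewritten span so only the target vowel changes.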
K. Runtime Endpoints (to implement after mint)
- POST /mlm/verify { term, domain?, channel? } → { decision, scores, glyphSignature, buildsheet, explain[] }
- POST /mlm/suggest { stem, domain?, operators? } → { candidates: [form...], traces: [buildsheet...] }
L. SolveForce Integration (practical hooks)
- When MLM ACCEPTS, attach headers for downstream:
  - X-MLM-Template: <template-id>
  - X-MLM-Operators: [VOWEL_SWAP(I→A), ...]
  - X-Glyph-Status: Ξ|∴ (∴ once minted by ledger)
- Gateways and editor plugins surface buildsheets as “how this word was made” receipts.
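Assembling the three headers is a small formatting job. The sketch below assumes the operator list is serialized as a bracketed, comma-separated string as in the example values; the mlm_headers function and its minted flag are hypothetical. Many HTTP stacks expect ASCII header values, so the glyphs and arrows may need an encoding step that is not specified here.

```python
def mlm_headers(template_id: str, operators: list[str], minted: bool = False) -> dict[str, str]:
    """Build the downstream headers for an ACCEPTED term."""
    return {
        "X-MLM-Template": template_id,
        "X-MLM-Operators": "[" + ", ".join(operators) + "]",
        # Ξ marks a validated term; ∴ applies once the ledger has minted it.
        "X-Glyph-Status": "∴" if minted else "Ξ",
    }


headers = mlm_headers("ROOT+NOMICS", ["VOWEL_SWAP(I→A)"])
print(headers["X-MLM-Operators"])  # [VOWEL_SWAP(I→A)]
```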
M. Acceptance Criteria (done = done)
- Factory mints MLM artifacts from blueprint with zero manual edits.
- FST compiles; affix packs lint; seeds round-trip; tests green.
- /mlm/verify stub returns decisions with buildsheets & rule-linked explanations.
- Audit events written on verify/mint with hashes and operator lists.
- Productivity throttles prevent high-collision spam.
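The audit criterion implies that each event carries a hash of what was verified or minted, plus the operators involved. A minimal sketch, assuming SHA-256 over a canonical JSON serialization; the hash algorithm, the audit_event function, and the record layout are assumptions, since the text above says only "hashes".

```python
import hashlib
import json


def audit_event(action: str, term: str, operators: list[str], payload: dict) -> dict:
    """Build an audit record whose hash covers a canonical JSON form of the payload."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"action": action, "term": term, "operators": operators, "payload_sha256": digest}


event = audit_event("verify", "LANOMICS", ["VOWEL_SWAP(I→A)"],
                    {"decision": "ACCEPT", "scores": {"morphIntegrity": 0.93}})
print(len(event["payload_sha256"]))  # 64
```

Canonicalizing before hashing means two services that serialize the same decision with different key orders still produce the same digest, which keeps the audit trail comparable across systems.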
N. Roadmap (after v1)
- Cross-lingual packs (Semitic templatic morphology; Slavic derivational richness; Romance suffix productivity).
- Orthography adapters (multiscript builds with diacritic policies).
- Learned operator priors (telemetry tunes productivity over time).
- Editor-side morphology coach (inline suggestions + legal build previews).
O. Micro-Examples (to calibrate the seeds later)
- ACCEPT — LANOMICS
  - Template: ROOT(lan) + NOMICS
  - Operator: VOWEL_SWAP(I→A) from LINOMICS parent
  - Scores: morphIntegrity .93; productivity .78; separability .71; ambiguityRisk .18 → ACCEPT.
- REVIEW — LINAMICS
  - Template: ROOT(lin) + AMICS (nonstandard)
  - Allomorphy mismatch; productivity borderline → REVIEW with recommendation: use -nomics or justify -amics.
- REJECT — LANONICS
  - Illegal resegmentation causing ambiguous -onics collision in electronics domain; high confusion index → REJECT.