Morphological Language Module (MLM)


A. Purpose & Scope

MLM (Morphological Language Module) governs how words are formed and transformed.
It encodes legal morpheme inventories, allomorphy rules, mutation operators (vowel/consonant interchanges, affix grafts), and productivity constraints, so that every neologism has a valid buildsheet.

Mantra: Shape carries sense; morphology is meaning in uniform.

  • Primary job: Decide whether a proposed form is morphologically valid, productive (worth minting), and distinct from neighbors.
  • Downstream: Feeds ELM (origins) and LLM (syntax/pragmatics) with well-formed word candidates.

B. Factory Overview (same machine, new blueprint)

  1. Blueprints declare morpheme systems, operators, and constraints.
  2. Templates define the file shapes (DB schema, JSON Schema, OpenAPI, rulebook, FSTs, productivity tables, seeds, tests).
  3. Generators render artifacts from blueprints.
  4. Validators lint structure and test productivity.
  5. Signers hash & prove provenance.
  6. Publishers ship to the ledger + services.

Repo skeleton mirrors the prior modules.


C. MLM Blueprints (source of truth)

C1. Module Blueprint (MLM)

  • name: “Morphological Language Module”
  • intent: “Governed word-formation & mutation”
  • units: morpheme, allomorph, stem, template, mutation
  • morpheme_inventory: roots, prefixes, suffixes, infixes/circumfixes (if used), clitics (optional)
  • allomorphy_rules: e.g., -nomics allomorphs, vowel harmony, consonant assimilation
  • templates: combinatory schemas like ROOT + NOMICS, ROOT + O + MICS, LAN + O + MICS
  • mutation_operators:
    • VOWEL_SWAP(site, I→A | O→A | …)
    • CONSONANT_ALTER(K↔C, S↔Z under voicing)
    • AFFIX_GRAFT(+omics, +logic, +metrics, +genics, +dynamics)
    • BOUNDARY_SHIFT(resegment if legal)
  • constraints: morpheme boundary preservation; illegal clusters; script rules
  • productivity_model: which affixes are live vs. fossilized; domain conditioning
  • scores: morphIntegrity, productivity, separability, ambiguityRisk, resonance
  • thresholds: τ_integrity, τ_productivity, τ_separability, τ_ambig
  • decisions: ACCEPT | REVIEW | REJECT
  • io-contracts: term-in → decision/scores/explain-out
  • glyphs: 𝔇Ξ(grammar support flag for form), Ξ(validation trail), ∴(mint), ✠(ethics passthrough)
  • domain_packs: telecom, energy, AI; each pack tunes productivity and confusability tolerances
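
As a rough illustration, the blueprint fields listed above could be held in a plain mapping and checked for completeness before anything is rendered. The key names follow the list; the helper missing_blueprint_keys is hypothetical, and the threshold values are the tunable defaults from section F.

```python
# Hypothetical sketch: key names mirror the C1 list; values are illustrative only.
REQUIRED_KEYS = {
    "name", "intent", "units", "morpheme_inventory", "allomorphy_rules",
    "templates", "mutation_operators", "constraints", "productivity_model",
    "scores", "thresholds", "decisions", "io-contracts", "glyphs", "domain_packs",
}

blueprint = {
    "name": "Morphological Language Module",
    "intent": "Governed word-formation & mutation",
    "units": ["morpheme", "allomorph", "stem", "template", "mutation"],
    "templates": ["ROOT + NOMICS", "ROOT + O + MICS", "LAN + O + MICS"],
    "mutation_operators": ["VOWEL_SWAP", "CONSONANT_ALTER", "AFFIX_GRAFT", "BOUNDARY_SHIFT"],
    "scores": ["morphIntegrity", "productivity", "separability", "ambiguityRisk", "resonance"],
    "thresholds": {"integrity": 0.85, "productivity": 0.65, "separability": 0.60, "ambig": 0.30},
    "decisions": ["ACCEPT", "REVIEW", "REJECT"],
}

def missing_blueprint_keys(bp: dict) -> list[str]:
    """Return the required keys the blueprint does not yet declare, sorted."""
    return sorted(REQUIRED_KEYS - set(bp))

# This partial blueprint still lacks several sections, e.g. "constraints".
print(missing_blueprint_keys(blueprint))
```

A dry run could refuse to mint while this list is non-empty.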

C2. Affix Pack Blueprints (optional)

  • Lists of sanctioned affixes with semantics, selectional restrictions (what stems they attach to), and phonological adjustments.

C3. Seeds Blueprint

  • Exemplars like LINOMICS, LANOMICS, LINAMICS, LANAMICS, plus a handful of negative controls (illegal builds) for testing.

D. Templates to Mint Later (content requirements)

  1. DB Schema (templates/db/schema.sql.tmpl)
    • Tables: morphemes, allomorphs, templates, term_builds, build_checks, decisions, audit_events
    • Link to terms (ledger) via term_id.
    • View: v_term_buildsheet (what parts + what rules produced the word).
  2. JSON Schema (templates/schemas/term_build.json.tmpl)
    • Fields: morphemes[], allomorphs[], template_id, mutations[], constraints_applied[], scores{}, decision, explain[], glyphs.
  3. OpenAPI
    • Core addendum: /mlm/verify (POST) → {decision, scores, glyphSignature, buildsheet, explain[]}
    • Optional: /mlm/suggest (POST) to propose legal variants from a stem.
  4. Rulebook (templates/rules/mlm_rulebook.md.tmpl)
    • R0 Morpheme Evidence, R1 Boundary Integrity, R2 Legal Templates, R3 Allomorphy Compliance, R4 Operator Legality, R5 Productivity, R6 Separability, R7 Ambiguity, R8 Ethics passthrough, R9 Overrides.
  5. FST/Rewrite Rules (templates/rules/mlm_fst.txt.tmpl)
    • Finite-state morphotactics + rewrite rules for allomorphy & sandhi (e.g., I→A before N/M, O→A in unstressed position before -mics, S↔Z voicing).
  6. Productivity Tables (templates/data/productivity.yaml.tmpl)
    • Prior probabilities of affix success by domain; cooldowns for overused patterns.
  7. Seeds (templates/data/mlm_seeds.jsonl.tmpl)
    • Positive and negative constructions with expected decisions.
  8. Tests (templates/tests/mlm_cases.json.tmpl)
    • ACCEPT/REVIEW/REJECT with explicit rule IDs and minimal scores.
  9. Generator/Validator Stubs
    • Renderers; FST compiler checks; affix pack lints; selectional restriction checks.
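
To make the DB schema item concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and view names come from item 1 above; every column name is an assumption, and only three of the listed tables are shown.

```python
import sqlite3

# Hypothetical minimal schema: table/view names follow the template list,
# column names are assumptions made for illustration.
DDL = """
CREATE TABLE morphemes (id INTEGER PRIMARY KEY, form TEXT NOT NULL, kind TEXT NOT NULL);
CREATE TABLE templates (id INTEGER PRIMARY KEY, pattern TEXT NOT NULL);
CREATE TABLE term_builds (
    id INTEGER PRIMARY KEY,
    term_id INTEGER NOT NULL,  -- link to the ledger's terms
    template_id INTEGER REFERENCES templates(id),
    morpheme_id INTEGER REFERENCES morphemes(id),
    position INTEGER NOT NULL
);
CREATE VIEW v_term_buildsheet AS
SELECT b.term_id, t.pattern, m.form, m.kind, b.position
FROM term_builds b
JOIN templates t ON t.id = b.template_id
JOIN morphemes m ON m.id = b.morpheme_id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute(
    "INSERT INTO morphemes VALUES (1, 'lan', 'root'), (2, 'o', 'linker'), (3, 'mics', 'suffix')"
)
conn.execute("INSERT INTO templates VALUES (1, 'LAN + O + MICS')")
conn.executemany(
    "INSERT INTO term_builds (term_id, template_id, morpheme_id, position) VALUES (?, ?, ?, ?)",
    [(100, 1, 1, 0), (100, 1, 2, 1), (100, 1, 3, 2)],
)

# Read the buildsheet back in slot order and reassemble the surface form.
rows = conn.execute(
    "SELECT form FROM v_term_buildsheet WHERE term_id = 100 ORDER BY position"
).fetchall()
surface = "".join(form for (form,) in rows)
print(surface)  # lanomics
```

The view answers the "what parts produced the word" half of the buildsheet; recording which rules fired would need the build_checks table, which this sketch leaves out.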

E. Processing Pipeline (runtime contract to implement later)

Input → Segment → Select Allomorphs → Apply Template → Mutate (if legal) → Enforce Constraints → Score → Decide → Explain

  1. Segment: Identify candidate morphemes, propose boundaries.
  2. Allomorphs: Choose surface forms conditioned by phonology/register.
  3. Template: Fit into a legal pattern (e.g., ROOT + NOMICS).
  4. Mutate: Apply only whitelisted operators at licensed sites (the vowel/consonant interchanges live here).
  5. Constraints: Block boundary violations, illegal clusters, unlicensed scripts.
  6. Score:
    • morphIntegrity: boundary preservation + template legality
    • productivity: affix/stem compatibility + domain priors
    • separability: embedding distance from siblings
    • ambiguityRisk: confusability with neighbors
    • resonance: phonotactic flow for intended channel
  7. Decide: ACCEPT / REVIEW / REJECT (with ✠ check inherited from ethics layer).
  8. Explain: Emit buildsheet (what attached where, which rule fired) + human bullets.
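
The stage chain above can be sketched as a sequence of functions over a shared build record. This is a toy under stated assumptions: the inventory, the template ID, and the Build class are hypothetical; only Segment, Template, and Decide are modeled; scoring and the REVIEW outcome are omitted. Rule IDs R0 and R2 refer to the rulebook entries in section D.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy inventory; a real run would load these from the blueprint.
ROOTS = {"lin", "lan"}
TEMPLATES = {("ROOT", "O", "MICS"): "T_ROOT_O_MICS"}

@dataclass
class Build:
    term: str
    morphemes: list = field(default_factory=list)
    template_id: Optional[str] = None
    rules_fired: list = field(default_factory=list)
    decision: str = "REJECT"

def segment(build: Build) -> Build:
    """Stage 1: propose boundaries by matching a known root plus 'o' + 'mics'."""
    word = build.term.lower()
    for root in sorted(ROOTS):
        if word == root + "omics":
            build.morphemes = [("ROOT", root), ("O", "o"), ("MICS", "mics")]
            build.rules_fired.append("R0")  # R0 Morpheme Evidence
    return build

def apply_template(build: Build) -> Build:
    """Stage 3: accept only slot sequences registered as legal templates."""
    slots = tuple(slot for slot, _ in build.morphemes)
    build.template_id = TEMPLATES.get(slots)
    if build.template_id:
        build.rules_fired.append("R2")  # R2 Legal Templates
    return build

def decide(build: Build) -> Build:
    """Stage 7, simplified: with scoring omitted, template legality decides."""
    build.decision = "ACCEPT" if build.template_id else "REJECT"
    return build

def run_pipeline(term: str) -> Build:
    build = Build(term)
    for stage in (segment, apply_template, decide):
        build = stage(build)
    return build

result = run_pipeline("LANOMICS")
print(result.decision, result.rules_fired)  # ACCEPT ['R0', 'R2']
```

Keeping each stage as a function from Build to Build means the omitted stages can be slotted into the tuple later without changing the driver.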

F. Scoring (deterministic skeleton)

  • morphIntegrity = base score for passing morphotactics − penalties for boundary/cluster violations.
  • productivity = P(affix|stem, domain) × historical success × novelty dampening.
  • separability = 1 − max cosine similarity between the new form's embedding and its nearest neighbors in the domain.
  • ambiguityRisk = confusion index combining edit-distance closeness, phonetic similarity, and domain frequency of the neighbor.
  • resonance = phonotactic well-formedness × channel weight.
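
Two of these scores lend themselves to a short sketch: separability as one minus the highest cosine similarity to a neighbor, and the edit-distance component of ambiguityRisk. The character-bigram vector below is only a stand-in for a real embedding (an assumption), so its values are not calibrated to the default thresholds.

```python
import math
from collections import Counter

def bigram_vector(form: str) -> Counter:
    """Stand-in embedding (assumption): counts of character bigrams."""
    s = form.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def separability(new_form: str, neighbors: list[str]) -> float:
    """1 minus the highest cosine similarity to any neighbor in the domain."""
    if not neighbors:
        return 1.0
    target = bigram_vector(new_form)
    return 1.0 - max(cosine(target, bigram_vector(n)) for n in neighbors)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(edit_distance("lanomics", "linomics"))  # 1
print(round(separability("lanomics", ["linomics"]), 3))
```

A confusion index would still need to fold in phonetic similarity and domain frequency; how those are weighted is left open here.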

Default pass (tunable):
morphIntegrity ≥ 0.85 ∧ productivity ≥ 0.65 ∧ separability ≥ 0.60 ∧ ambiguityRisk ≤ 0.30 ∧ ethicsPass = true.
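
The default pass condition translates directly into a predicate. The function name and dictionary layout are hypothetical; the numbers are the defaults stated above. The split between REVIEW and REJECT for failing forms is not defined by this rule, so the sketch returns only pass or fail.

```python
# Floors and ceiling taken from the default pass rule; names are illustrative.
FLOORS = {"morphIntegrity": 0.85, "productivity": 0.65, "separability": 0.60}
MAX_AMBIGUITY = 0.30

def passes_default_gate(scores: dict, ethics_pass: bool) -> bool:
    """True only if every floor is met, ambiguity is low enough, and ethics passed."""
    floors_ok = all(scores[name] >= floor for name, floor in FLOORS.items())
    return floors_ok and scores["ambiguityRisk"] <= MAX_AMBIGUITY and ethics_pass

candidate = {"morphIntegrity": 0.93, "productivity": 0.78,
             "separability": 0.71, "ambiguityRisk": 0.18}
print(passes_default_gate(candidate, ethics_pass=True))  # True
```

A domain pack could override FLOORS and MAX_AMBIGUITY without touching the predicate itself.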


G. Validators (what “good” means)

  • JSON Schema present & valid (draft 2020-12).
  • OpenAPI fully typed; examples provided.
  • FST compiles; no orphan states; deterministic where required.
  • Affix packs: every affix has selectional restrictions + examples.
  • Seeds round-trip: proposed build → expected decision.
  • Tests pass and cite rule IDs fired/blocked.
  • Drift check: high-frequency affixes throttled by productivity rules (prevents buzzword spirals).
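
The "no orphan states" check can be read as a reachability test over the compiled machine. The sketch below assumes a hypothetical in-memory representation (state → {label: next state}); that shape also makes the machine deterministic by construction, since each label maps to one target.

```python
from collections import deque

def orphan_states(transitions: dict, start: str) -> set:
    """Return states present in the machine but unreachable from the start state."""
    all_states = set(transitions)
    for arcs in transitions.values():
        all_states.update(arcs.values())
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first walk from the start state
        state = queue.popleft()
        for nxt in transitions.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return all_states - seen

# Toy morphotactics: ROOT, then linker O, then MICS. "dangling" is never reached.
fst = {
    "q0": {"ROOT": "q1"},
    "q1": {"O": "q2"},
    "q2": {"MICS": "qf"},
    "dangling": {"MICS": "qf"},
}
print(orphan_states(fst, "q0"))  # {'dangling'}
```

A validator could fail the build whenever this set is non-empty.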

H. Policies & Overrides

  • Legal templates only; custom templates require curator sign-off.
  • Operator safelist: only declared mutation operators may run.
  • Domain packs can tighten/loosen productivity & ambiguity thresholds (e.g., telecom stricter than blog posts).
  • Override allowed with rationale; logged immutably in audit.

I. Playbooks (ops steps)

  1. Author MLM Blueprint (/blueprints/mlm.yaml) with morpheme inventory, operators, templates, constraints, thresholds.
  2. Dry run: validate blueprint, compile FST, lint affix packs.
  3. Mint: render into /build/MLM/... the DB schema, JSON Schema, OpenAPI spec, rulebook, FST rules, seeds, and tests.
  4. Prove: run tests; check seeds; ensure productivity throttles behave.
  5. Publish: ship artifacts to ledger + service repos; expose /mlm/verify.

J. Content Requirements (per artifact, when minted)

  • schema.sql: morphemes, allomorphs, templates, term_builds, build_checks, decisions, audit_events; v_term_buildsheet.
  • term_build.json: morphemes, allomorphs, template_id, mutations, constraints_applied, scores, decision, explain, glyphs.
  • OpenAPI: /mlm/verify, optionally /mlm/suggest.
  • rulebook: R0–R9 with examples and curator guidance.
  • mlm_fst.txt: morphotactics + rewrite rules, each with RULE_ID, CONTEXT, SCORE/PENALTY.
  • productivity.yaml: priors, cooldowns, domain weights.
  • seeds/tests: LINOMICS-family exemplars plus counterexamples.
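
A rewrite rule carrying RULE_ID, CONTEXT, and PENALTY might look like the record below. Regular expressions stand in for a compiled FST here (an assumption), and the rule ID scheme and penalty value are illustrative, not taken from the spec.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class RewriteRule:
    rule_id: str      # hypothetical ID scheme
    pattern: str      # CONTEXT, expressed here as a regular expression
    replacement: str
    penalty: float    # illustrative value

RULES = [
    # I→A before N/M, the first rewrite example listed in section D.
    RewriteRule("RW-001", r"i(?=[nm])", "a", 0.02),
]

def apply_rules(form: str, rules=RULES):
    """Apply each rule once, in order; report which rules changed the form."""
    fired, penalty = [], 0.0
    for rule in rules:
        new_form = re.sub(rule.pattern, rule.replacement, form, count=1)
        if new_form != form:
            fired.append(rule.rule_id)
            penalty += rule.penalty
            form = new_form
    return form, fired, penalty

print(apply_rules("linomics"))  # ('lanomics', ['RW-001'], 0.02)
```

The list of fired rule IDs is what a buildsheet or test case would cite.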

K. Runtime Endpoints (to implement after mint)

  • POST /mlm/verify { term, domain?, channel? }
    → { decision, scores, glyphSignature, buildsheet, explain[] }
  • POST /mlm/suggest { stem, domain?, operators? }
    → { candidates: [form...], traces: [buildsheet...] }
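
The /mlm/verify contract can be pinned down as typed shapes plus a placeholder handler. Field names follow the endpoint description above; the class names, the stub behavior (always REVIEW), and the inner layout of buildsheet are assumptions.

```python
from typing import TypedDict

class VerifyRequest(TypedDict, total=False):
    term: str      # required by the handler below
    domain: str    # optional
    channel: str   # optional

class VerifyResponse(TypedDict):
    decision: str
    scores: dict
    glyphSignature: str
    buildsheet: dict
    explain: list

def verify_stub(req: VerifyRequest) -> VerifyResponse:
    """Placeholder handler: returns REVIEW with empty scores until the pipeline exists."""
    if not req.get("term"):
        raise ValueError("term is required")
    return {
        "decision": "REVIEW",
        "scores": {},
        "glyphSignature": "Ξ",
        "buildsheet": {"term": req["term"], "morphemes": []},
        "explain": ["stub: pipeline not yet implemented"],
    }

print(verify_stub({"term": "LANOMICS"})["decision"])  # REVIEW
```

Returning the full response shape from day one lets gateways and tests integrate before scoring is implemented.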

L. SolveForce Integration (practical hooks)

  • When MLM returns ACCEPT, attach headers for downstream:
    X-MLM-Template: <template-id>
    X-MLM-Operators: [VOWEL_SWAP(I→A), ...]
    X-Glyph-Status: Ξ|∴ (∴ once minted by ledger)
  • Gateways and editor plugins surface buildsheets as “how this word was made” receipts.
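
A small helper could assemble these headers from an accepted build. The function name and the sample template ID are hypothetical. Many HTTP stacks expect ASCII header values, so the glyphs and arrows may need an encoding step that this sketch does not cover.

```python
def mlm_headers(template_id: str, operators: list[str], minted: bool) -> dict[str, str]:
    """Build the downstream headers for an ACCEPT decision."""
    return {
        "X-MLM-Template": template_id,
        "X-MLM-Operators": "[" + ", ".join(operators) + "]",
        # Ξ marks a validation trail; ∴ applies once the ledger has minted the term.
        "X-Glyph-Status": "∴" if minted else "Ξ",
    }

headers = mlm_headers("T_ROOT_NOMICS", ["VOWEL_SWAP(I→A)"], minted=False)
print(headers["X-MLM-Operators"])  # [VOWEL_SWAP(I→A)]
```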

M. Acceptance Criteria (done = done)

  1. Factory mints MLM artifacts from blueprint with zero manual edits.
  2. FST compiles; affix packs lint; seeds round-trip; tests green.
  3. /mlm/verify stub returns decisions with buildsheets & rule-linked explanations.
  4. Audit events written on verify/mint with hashes and operator lists.
  5. Productivity throttles prevent high-collision spam.
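
For criterion 4, an audit event with a hash and an operator list might be built as below. SHA-256 and the chaining of each event to the previous hash are assumptions; the criteria only require that hashes and operator lists be recorded.

```python
import hashlib
import json

def audit_event(action: str, term: str, operators: list[str], prev_hash: str = "") -> dict:
    """Build an audit record whose hash covers its content and the previous hash."""
    body = {"action": action, "term": term, "operators": operators, "prev": prev_hash}
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    canonical = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return {**body, "hash": hashlib.sha256(canonical).hexdigest()}

first = audit_event("verify", "LANOMICS", ["VOWEL_SWAP(I→A)"])
second = audit_event("mint", "LANOMICS", [], prev_hash=first["hash"])
print(second["prev"] == first["hash"])  # True
```

Chaining makes after-the-fact edits detectable: changing an earlier event changes its hash and breaks every later link.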

N. Roadmap (after v1)

  • Cross-lingual packs (Semitic templatic morphology; Slavic derivational richness; Romance suffix productivity).
  • Orthography adapters (multiscript builds with diacritic policies).
  • Learned operator priors (telemetry tunes productivity over time).
  • Editor-side morphology coach (inline suggestions + legal build previews).

O. Micro-Examples (to calibrate the seeds later)

  1. ACCEPT — LANOMICS
    • Template: ROOT(lan) + NOMICS (the root-final n and the affix-initial n overlap on the surface)
    • Operator: VOWEL_SWAP(I→A) from LINOMICS parent
    • Scores: morphIntegrity 0.93; productivity 0.78; separability 0.71; ambiguityRisk 0.18; all within the default pass thresholds → ACCEPT.
  2. REVIEW — LINAMICS
    • Template: ROOT(lin) + AMICS (nonstandard)
    • Allomorphy mismatch; productivity borderline → REVIEW with recommendation: use -nomics or justify -amics.
  3. REJECT — LANONICS
    • Illegal resegmentation creates an ambiguous -onics collision in the electronics domain; high confusion index → REJECT.
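
To calibrate seeds against these examples, the VOWEL_SWAP operator from example 1 can be sketched as a checked single-site substitution, alongside seed records that pair each term with its expected decision. The function signature and the record layout are assumptions.

```python
def vowel_swap(form: str, site: int, source: str, target: str) -> str:
    """Swap one vowel at a licensed site; refuse if the site holds another letter."""
    if form[site] != source:
        raise ValueError(f"site {site} holds {form[site]!r}, not {source!r}")
    return form[:site] + target + form[site + 1:]

# Expected decisions as stated in the three micro-examples.
SEEDS = [
    {"term": "LANOMICS", "expected": "ACCEPT"},
    {"term": "LINAMICS", "expected": "REVIEW"},
    {"term": "LANONICS", "expected": "REJECT"},
]

child = vowel_swap("LINOMICS", 1, "I", "A")
print(child)  # LANOMICS
```

The same helper with O→A at site 3 yields the surface form LINAMICS, although example 2 analyzes that word as a template mismatch rather than a licensed swap.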