A. Purpose & Scope
GLM (Graphemic Language Module) governs the form of writing: which glyphs exist, how they combine into grapheme clusters, how text is normalized, rendered, transliterated, and secured (confusables, mixed-script traps), and how these decisions propagate to the rest of the stack.
Mantra: Shape is signal.
If MLM is the word foundry and SDM the zoning board of sense, GLM is the building code for letters.
Primary jobs
- Define and validate glyph inventories and script policies (Latin, Greek, Cyrillic… plus domain packs).
- Govern diacritic logic, ligatures, joiners, variation selectors, and cluster boundaries.
- Enforce normalization, confusable control, and render portability across platforms.
- Provide transliteration and orthography adapters for cross-script/cross-market deployment.
- Produce explanations and receipts: what codepoints were used, why, and how they were made safe and readable.
B. Factory Overview (same machine, new blueprint)
- Blueprints — declare inventories, policies, transforms, and risk rules.
- Templates — shape artifacts: DB schema, JSON Schemas, OpenAPI, rulebooks, mapping tables, seeds, tests.
- Generators — render final files from blueprints.
- Validators — compile grapheme rules, check confusables, simulate renders.
- Signers — hash & record provenance.
- Publishers — ship to the ledger + SolveForce/Logos clients.
C. GLM Blueprints (source of truth)
C1. Module Blueprint (GLM)
- name: “Graphemic Language Module”
- intent: “Governed writing-form and safety”
- units: Glyph, Codepoint, Cluster, Script, Diacritic, Ligature, Joiner, Variant, Transliteration Map
- script_policies: allowed scripts, mixed-script rules, forbidden joins, casing rules, digit sets
- normalization_policy: canonical (e.g., NFC) with allowed exceptions by channel
- confusable_policy: detection thresholds, safelist/banlist, remediation strategies (substitution, annotate, reject)
- diacritic_rules: attachment legality, stacking limits, lossless fallback rules
- cluster_rules: grapheme cluster segmentation, boundary legality, shaping constraints
- transliteration_maps: Latin ⇄ Greek ⇄ Cyrillic (and others), reversible where possible; lossy flags
- render_profiles: target OS/browser/editor/font stacks; required render proofs
- scores: graphemeIntegrity, confusabilityRisk, renderPortability, readability, accessibility, codepointSafety, typographicHarmony
- thresholds: τ_integrity, τ_confusable, τ_portability, τ_readability, τ_accessibility, τ_safety
- decisions: ACCEPT | REVIEW | REJECT (per string/term or per policy change)
- io-contracts: text-in + channel/domain → normalized text + decision + scores + explain[]
- glyphs: ⌗ (grapheme-checked), Ξ (validated), ∴ (settled), ✠ (ethics)
C2. Inventory Blueprints
- Base Latin set for SolveForce; extension packs for Greek, Cyrillic, Arabic, etc.
- Domain glyph packs (telecom symbols, energy units, math/logic marks) with use permissions.
C3. Seeds Blueprint
- Positive: safe ASCII + sanctioned diacritics;
- Edge: mixed-script lookalikes (“a” vs “а”), ZWJ/ZWNJ misuse, stacked diacritics, ligature-only forms;
- Negative: homoglyph spoofs, forbidden clusters, unsafe controls.
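The three seed classes can be illustrated with a few records serialized as JSON Lines. This is a minimal sketch: the field names (`class`, `text`, `note`) and the sample strings are assumptions for illustration, not a schema defined by the blueprint. Escaping every non-ASCII character on output is one way to make lookalikes and invisible controls visible during review.

```python
import json

# Hypothetical seed records, one or more per class named in the Seeds Blueprint.
SEEDS = [
    {"class": "positive", "text": "cafe", "note": "safe ASCII"},
    {"class": "edge", "text": "p\u0430y", "note": "Cyrillic U+0430 posing as Latin 'a'"},
    {"class": "edge", "text": "ab\u200dc", "note": "ZWJ between Latin letters"},
    {"class": "negative", "text": "admin\u202e", "note": "trailing bidi override control"},
]

def to_jsonl(seeds):
    """Serialize seeds as JSON Lines, escaping every non-ASCII character."""
    return "\n".join(json.dumps(s, ensure_ascii=True) for s in seeds)

if __name__ == "__main__":
    print(to_jsonl(SEEDS))
```

In the escaped output the Cyrillic letter appears as `\u0430` rather than as a glyph that looks identical to Latin "a", so a reviewer can see which seed carries the trap.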
D. Templates to Mint Later (content requirements)
- DB Schema (templates/db/schema.sql.tmpl)
  - Tables: glyphs, codepoints, scripts, clusters, policies, transforms, decisions, audits
  - Views: v_grapheme_inventory, v_text_proof (input→normalized→flags→scores).
- JSON Schemas (templates/schemas/*.json.tmpl)
  - text_proof_request.json: { text, channel, domain, target_scripts?, render_profiles? }
  - text_proof_response.json: { normalized, decision, scores{}, warnings[], explain[], receipts{} }
  - inventory_record.json: glyph/script definitions and status.
- OpenAPI
  - /glm/verify (POST text) → graphemic decision + normalized output + reasons.
  - /glm/inventory (GET/POST) to list/update sanctioned glyphs.
  - /glm/transliterate (POST) → mapped string + fidelity notes.
  - /glm/confusables (POST) → report of risky spans.
- Rulebook (templates/rules/glm_rulebook.md.tmpl)
  - R0 Script Legality, R1 Normalization, R2 Cluster Boundaries, R3 Diacritics, R4 Confusables, R5 Controls & Joiners, R6 Render Portability, R7 Accessibility, R8 Overrides.
- Transforms
  - normalization_profiles.yaml.tmpl (e.g., NFC default, NFKC for legacy)
  - confusable_map.txt.tmpl (homoglyph sets + weights)
  - transliteration/*.yaml.tmpl (pairwise mappings, reversible flags)
  - render_profiles.yaml.tmpl (font stacks & test matrices)
- Seeds/Tests
  - glm_seeds.jsonl.tmpl, glm_cases.json.tmpl — ACCEPT/REVIEW/REJECT with explicit rule triggers.
- Generator/Validator Stubs
  - Cluster segmenter, normalization/applier, confusable finder, render simulator (font fallback), accessibility checks (screen reader hints).
E. Processing Pipeline (runtime contract to implement later)
Input → Inspect → Normalize → Analyze → Score → Decide → Explain
- Inspect
  - Detect scripts, controls, joiners, diacritics; mark mixed-script spans and suspicious clusters.
- Normalize
  - Apply canonical policy (e.g., NFC) with channel-specific overrides; record deltas and any lossy steps (never silent).
- Analyze
  - Cluster legality (grapheme boundaries), diacritic stacking limits, confusable sets (lookalikes), controls (ZWJ/ZWNJ, bidi marks), render proofs (font coverage/fallback), accessibility (pronounceability, alt mappings).
- Score
  - graphemeIntegrity (legal clusters + policy match)
  - confusabilityRisk (weighted homoglyph proximity & mixing)
  - renderPortability (coverage across profiles)
  - readability (cluster simplicity; diacritic burden; casing)
  - accessibility (screen-reader fidelity; ASCII fallback quality)
  - codepointSafety (controls/privates/forbidden blocks)
  - typographicHarmony (spacing/kerning risk, ligature reliance)
- Decide
  - ACCEPT / REVIEW / REJECT under thresholds; auto-remediation suggestions if safe (e.g., prefer base+diacritic to ambiguous precomposed form).
- Explain
  - Emit text proof: before/after, rules fired, confusable spans highlighted, render matrix, accessibility notes, and headers for downstream modules.
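The Inspect and Normalize steps might be sketched as follows, using only Python's standard `unicodedata` module. The function names, the small control table, and the rule used for the `lossy` flag (a compatibility form that changes the text beyond its canonical counterpart) are assumptions for illustration, not the module's defined behavior.

```python
import unicodedata

# Controls the Inspect step looks for (a small illustrative subset).
CONTROLS = {
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    "\u200e": "LRM",
    "\u200f": "RLM",
    "\u202e": "RLO",
}

def inspect(text):
    """Return (index, label) for each listed control and each combining mark."""
    findings = []
    for i, ch in enumerate(text):
        if ch in CONTROLS:
            findings.append((i, CONTROLS[ch]))
        elif unicodedata.combining(ch):
            findings.append((i, "combining mark"))
    return findings

def normalize(text, form="NFC"):
    """Apply a normalization form and record the delta so nothing is silent."""
    out = unicodedata.normalize(form, text)
    # Assumption: a step is lossy when a compatibility form folds characters
    # that the matching canonical form would have left distinct.
    canonical = {"NFKC": "NFC", "NFKD": "NFD"}.get(form)
    lossy = canonical is not None and out != unicodedata.normalize(canonical, text)
    return {
        "input": text,
        "normalized": out,
        "form": form,
        "changed": out != text,
        "lossy": lossy,
    }
```

For example, `normalize("e\u0301")` composes the pair into U+00E9 and reports `changed` without `lossy`, while `normalize("\ufb01", "NFKC")` folds the "fi" ligature into two letters and reports both.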
F. Scoring (deterministic skeleton)
- graphemeIntegrity = 1 − (illegal_cluster_penalties + policy_violations)
- confusabilityRisk = max(homoglyph_weighted_score, mixed_script_factor)
- renderPortability = min(coverage across target profiles)
- readability = function(diacritics_count, cluster_complexity, case_consistency)
- accessibility = min(screen_reader_similarity, fallback_fidelity)
- codepointSafety = 1 − unsafe_codepoint_ratio
Default pass (tunable): graphemeIntegrity ≥ 0.90 ∧ confusabilityRisk ≤ 0.20 ∧ renderPortability ≥ 0.85 ∧ readability ≥ 0.75 ∧ accessibility ≥ 0.80 ∧ codepointSafety ≥ 0.95 ∧ ethicsPass = true.
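A minimal sketch of the scoring skeleton and the default pass check is shown below. The function names, the clamp at zero, and the empty-string convention are assumptions. The sketch reports which criteria fail instead of choosing between REVIEW and REJECT, since that boundary is not defined here.

```python
# Default thresholds from the text: (comparison, bound) per score.
DEFAULT_THRESHOLDS = {
    "graphemeIntegrity": (">=", 0.90),
    "confusabilityRisk": ("<=", 0.20),
    "renderPortability": (">=", 0.85),
    "readability": (">=", 0.75),
    "accessibility": (">=", 0.80),
    "codepointSafety": (">=", 0.95),
}

def grapheme_integrity(illegal_cluster_penalties, policy_violations):
    # Assumption: clamp at 0 so heavy penalties cannot go negative.
    return max(0.0, 1.0 - (illegal_cluster_penalties + policy_violations))

def confusability_risk(homoglyph_weighted_score, mixed_script_factor):
    return max(homoglyph_weighted_score, mixed_script_factor)

def render_portability(coverage_by_profile):
    # The weakest target profile determines the score.
    return min(coverage_by_profile.values())

def codepoint_safety(unsafe_count, total_count):
    # Assumption: an empty string has no unsafe codepoints.
    return 1.0 if total_count == 0 else 1.0 - unsafe_count / total_count

def failed_criteria(scores, ethics_pass, thresholds=DEFAULT_THRESHOLDS):
    """Return the names of criteria that miss the default pass; empty means pass."""
    failed = []
    for name, (op, bound) in thresholds.items():
        value = scores[name]
        ok = value >= bound if op == ">=" else value <= bound
        if not ok:
            failed.append(name)
    if not ethics_pass:
        failed.append("ethicsPass")
    return failed
```

Returning the failed names rather than a bare boolean keeps the result usable for the `explain[]` output, which is expected to cite the rules that fired.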
G. Validators (what “good” means)
- JSON Schemas validate with examples.
- OpenAPI typed; examples provided.
- Inventory: no orphan glyphs; scripts labeled; status (allowed/review/banned).
- Transforms parse; transliteration maps are acyclic where required.
- Seeds/Tests pass; each test cites which rule IDs fired.
- Render simulation: no ACCEPT where coverage < policy threshold.
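The inventory check above could look roughly like this. The record shape (`codepoint`, `script`, `status`) is hypothetical, and "orphan glyph" is read here as a glyph whose script is not declared in the inventory, which is an assumption about the intended meaning.

```python
ALLOWED_STATUS = {"allowed", "review", "banned"}

def validate_inventory(glyphs, scripts):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for g in glyphs:
        cp = g.get("codepoint", "?")
        script = g.get("script")
        if not script:
            problems.append(f"{cp}: script not labeled")
        elif script not in scripts:
            problems.append(f"{cp}: orphan glyph (unknown script {script!r})")
        if g.get("status") not in ALLOWED_STATUS:
            problems.append(f"{cp}: invalid status {g.get('status')!r}")
    return problems
```

Collecting every problem instead of stopping at the first one lets a dry run report the full state of an inventory in a single pass.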
H. Policies & Overrides
- Mixed-script policy: default deny, with explicit allowlists per domain/channel.
- Confusable remediation: prefer safe lookalike replacements or annotate with combining marks; log all remaps.
- Controls & Joiners: ZWJ/ZWNJ, bidi marks allowed only in whitelisted contexts.
- Overrides: curator-required with rationale; audit stored immutably.
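The default-deny mixed-script policy can be sketched as below. Python's standard library does not expose the Unicode Script property, so this sketch approximates a letter's script from the first word of its Unicode character name; that heuristic, the allowlist shape, and the domain/channel names are all assumptions for illustration.

```python
import unicodedata

# Hypothetical allowlist: (domain, channel) -> script sets permitted to co-occur.
MIXED_SCRIPT_ALLOWLIST = {
    ("math", "docs"): [{"LATIN", "GREEK"}],
}

def scripts_of(text):
    """Approximate the scripts in text from the first word of each letter's name."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found

def mixed_script_allowed(text, domain, channel):
    """Default deny: mixed-script text passes only if an allowlist entry covers it."""
    scripts = scripts_of(text)
    if len(scripts) <= 1:
        return True
    allowed = MIXED_SCRIPT_ALLOWLIST.get((domain, channel), [])
    return any(scripts <= entry for entry in allowed)
```

A production check would more likely rely on the Script and Script_Extensions properties from the Unicode Character Database than on character names.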
I. Playbooks (ops steps)
- Author GLM Blueprint (/blueprints/glm.yaml) with script policies, normalization, confusables, transliteration.
- Dry run: validate inventories; compile confusable tables; run seed cases.
- Mint: render DB schema, JSON Schemas, OpenAPI, rulebook, transforms, seeds, tests into /build/GLM/....
- Prove: run tests + render simulations; confirm accessibility checks.
- Publish: enable endpoints; wire to ledger + editors.
J. Content Requirements (when minted)
- schema.sql: glyphs, codepoints, scripts, clusters, policies, transforms, decisions, audits; views v_grapheme_inventory, v_text_proof.
- text_proof_request/response.json: as above.
- OpenAPI: /glm/verify, /glm/inventory, /glm/transliterate, /glm/confusables.
- rulebook: R0–R8 with examples and remediation patterns.
- transforms: normalization profiles, confusable maps, transliteration tables, render profiles.
- seeds/tests: representative examples (safe, edgy, malicious).
K. Runtime Endpoints (to implement after mint)
- POST /glm/verify { text, channel, domain, target_scripts?, render_profiles? } → { normalized, decision, scores, warnings[], explain[], receipts{} }
- POST /glm/transliterate { text, source_script, target_script } → { mapped, fidelity: lossless|lossy, notes[] }
- POST /glm/confusables { text } → { spans: [{i,j,type,neighbors}], risk }
- GET|PATCH /glm/inventory → manage glyph packs.
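As one illustration of the `/glm/transliterate` contract, the sketch below maps text through a deliberately tiny Latin→Greek table and fills the `mapped`, `fidelity`, and `notes` fields. The table contents, the function name, and the rule that any unmapped character makes the result lossy are assumptions, not the minted mapping.

```python
# Hypothetical, deliberately tiny Latin -> Greek table for illustration only.
LATIN_TO_GREEK = {"a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε"}

TABLES = {("Latin", "Greek"): LATIN_TO_GREEK}

def transliterate(text, source_script, target_script):
    """Map text character by character; flag the result lossy if anything is unmapped."""
    table = TABLES.get((source_script, target_script))
    if table is None:
        raise ValueError(f"no table for {source_script} -> {target_script}")
    mapped, notes = [], []
    for i, ch in enumerate(text):
        if ch in table:
            mapped.append(table[ch])
        else:
            mapped.append(ch)  # pass through unchanged and record it
            notes.append(f"no mapping for {ch!r} at index {i}")
    # Reversible only if no two source characters share a target.
    injective = len(set(table.values())) == len(table)
    fidelity = "lossless" if injective and not notes else "lossy"
    return {"mapped": "".join(mapped), "fidelity": fidelity, "notes": notes}
```

Recording each unmapped character in `notes` keeps the lossy case explainable instead of silently returning a partly converted string.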
L. Interlocks (binding tissue)
- Feeds MLM: validates that candidate terms use legal clusters; provides safe remaps before morphology scoring.
- Feeds SDM: ensures the surface form uniquely signals the intended sense (confusable risk included in ambiguity).
- Feeds ILM: supplies transliteration and orthography adapters per domain/geography.
- Feeds PLM: channel-specific render/accessibility guidance (e.g., voice UIs ignore silent diacritics).
- Used by ALM: preflight step for any publication; receipts include codepoint lists and remaps.
Downstream Headers
- X-GLM-Normalized: NFC|…
- X-GLM-ConfusableRisk: <score>
- X-GLM-RenderProfiles: ok/<list>
- X-Glyph-Status: ⌗|Ξ|∴
M. Acceptance Criteria (done = done)
- Factory mints GLM artifacts from blueprint with zero manual edits.
- Inventories & transforms load; seeds/tests pass; render sims green.
- /glm/verify returns normalized forms + decisions + rationales.
- Confusable & mixed-script traps are caught; safe remaps proposed.
- Headers consumed by other modules; logs/receipts hash to ledger.
N. Roadmap
- Font-agnostic shaping tests (cover complex scripts: Arabic, Indic, SE Asian).
- Dynamic confusable lists updated from telemetry.
- Perceptual readability model (human-in-the-loop judgments).
- Accessibility exporters (phonemic/ASCII fallbacks for low-vision and TTS).
- Right-to-left + bidirectional policy packs with strict joiner rules.
O. Micro-Examples (seed calibrators)
- ACCEPT — LANOMICS (pure Latin, NFC, no confusables)
  - graphemeIntegrity .99, confusabilityRisk .02, renderPortability .98 → ACCEPT.
- REVIEW — LАNOMICS where “A” is Cyrillic U+0410
  - Mixed script; indistinguishable visually → suggest replace with Latin A; else REJECT for public branding.
- REJECT — term with stacked diacritics + ZWJ misuse
  - Illegal cluster; screen-reader corruption risk; renderPortability low → REJECT + remediation plan.
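The REVIEW case can be reproduced with a small confusable finder. This is a sketch under stated assumptions: the four-entry map stands in for a real confusable table, the span fields follow the `{i, j, type, neighbors}` shape with an extra `codepoint` field added for readability, and no `risk` score is computed.

```python
# Hypothetical confusable map: lookalike -> the Latin letter it imitates.
CONFUSABLES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u039f": "O",  # GREEK CAPITAL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def find_confusables(text):
    """Report each lookalike as a half-open span [i, j) with its Latin neighbor."""
    spans = []
    for i, ch in enumerate(text):
        if ch in CONFUSABLES:
            spans.append({
                "i": i,
                "j": i + 1,
                "type": "homoglyph",
                "neighbors": [CONFUSABLES[ch]],
                "codepoint": f"U+{ord(ch):04X}",
            })
    return spans

def suggest_remap(text):
    """Propose a single-script replacement by swapping lookalikes for Latin letters."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)
```

Run on the REVIEW string, `find_confusables` reports one span at index 1 with codepoint U+0410, and `suggest_remap` returns the pure-Latin `LANOMICS` from the ACCEPT case.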
Notes for SolveForce & Ron’s corpus (Logos Codex / Linomics)
- Brand safety: GLM enforces that SolveForce marks are single-script, non-confusable, and accessible.
- Linomics/LANOMICS family: register sanctioned diacritics (if any), preferred transliterations, and ASCII fallbacks for low-fidelity channels.
- Mutation Ledger tie-in: every minted term stores the codepoint recipe and normalization path as part of the audit trail.
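A codepoint recipe for the audit trail might be assembled as in the sketch below. The record fields, the single-step normalization path, and the use of SHA-256 over canonical JSON are assumptions; the ledger's actual format is not specified here.

```python
import hashlib
import json
import unicodedata

def codepoint_recipe(term, form="NFC"):
    """Build an audit record: codepoints before/after normalization plus a digest."""
    normalized = unicodedata.normalize(form, term)
    record = {
        "input_codepoints": [f"U+{ord(c):04X}" for c in term],
        "normalization_path": [form],
        "normalized_codepoints": [f"U+{ord(c):04X}" for c in normalized],
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record
```

Because the digest covers the codepoint lists rather than the rendered string, `LANOMICS` and its Cyrillic-A lookalike produce different hashes even though they look the same on screen.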