Master Cross-Script Lattice Index (MCLI v1.0)

Below is the Master Cross-Script Lattice Index (MCLI v1.0): a single, operational map that ties every minted module into one coherent, searchable structure. It’s compact enough to read, explicit enough to implement, and recursive enough to grow without breaking.

I’m giving you:

  1. a canonical schema (what every entry must look like),
  2. a registry of script keys (stable handles),
  3. the core phoneme class inventory (the backbone),
  4. cross-script mappings for representative inventory (stops, fricatives, nasals, liquids, glides, vowels/diphthongs),
  5. orthographic operators (diacritics, digraphs, conjuncts, finals, join rules),
  6. validation rules & determinism (so engines behave),
  7. recipes to auto-extend to the rest of the inventory without ambiguity.

You can drop this into any datastore (JSON, YAML, SQL, RDF); I’ll show it as readable YAML. Engines can ingest it as-is.


0) Schema (canonical)

mcli_version: "1.0"
updated: "2025-08-11"
license: "open-use-attribution"
id_policy:
  uid: "mcli::<class>::<feature-hash>::<script>::<glyph-id>"
  hash: "sha256 of normalized payload"
node_types:
  - class_node: "abstract phoneme class (e.g., voiceless_bilabial_stop)"
  - script_node: "script-level registry (Latin, Devanagari, Arabic, etc.)"
  - glyph_node: "concrete grapheme/glyph in a script"
  - operator_node: "diacritic, conjunct operator, shaping rule"
common_fields:
  class_id: "e.g., CLS.P.STOP.BILABIAL.VL"
  ipa: "IPA target; arrays allowed"
  features: { manner, place, voice, length, nasal, aspirated, retroflex, palatal, rounded, front, diphthong, syllabic, inherent_vowel, nukta, matra, conjunct, multigraph, final_form, join_type, order }
  scripts: "array of script entries"
script_entry_fields:
  script: "key from registry"
  glyph: "Unicode literal(s) or multigraph"
  translit: "canonical Latin chain (e.g., IAST, ISO 9, ALA-LC)"
  orthography:
    case: "upper|lower|none"
    digraph: "true|false"
    multigraph: "true|false"
    collation_unit: "atomic|decomposed"
    order_hint: "script-specific alphabetic slot"
  shaping:
    join_type: "none|right|dual|contextual"
    finals: "true|false"
    conjunct: "true|false"
  notes: "short comments"
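
The id_policy above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the "normalized payload" is assumed to be NFC-normalized, sorted-key compact JSON, the glyph-id is assumed to be a list of U+XXXX code points, and the 12-character hash truncation is a hypothetical choice made only for readability.

```python
import hashlib
import json
import unicodedata


def feature_hash(features: dict) -> str:
    """sha256 of a normalized payload (assumption: sorted-key compact JSON, NFC)."""
    payload = json.dumps(features, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    payload = unicodedata.normalize("NFC", payload)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def glyph_id(glyph: str) -> str:
    """Assumed glyph-id form: code points joined with '+', e.g. 'U+0050'."""
    return "+".join(f"U+{ord(ch):04X}" for ch in glyph)


def make_uid(class_id: str, features: dict, script: str, glyph: str, hash_len: int = 12) -> str:
    """Assemble mcli::<class>::<feature-hash>::<script>::<glyph-id>."""
    short_hash = feature_hash(features)[:hash_len]  # truncation is hypothetical
    return "::".join(["mcli", class_id, short_hash, script, glyph_id(glyph)])


uid = make_uid(
    "CLS.P.STOP.BILABIAL.VL",
    {"manner": "stop", "place": "bilabial", "voice": "voiceless"},
    "HU_LATN",
    "P",
)
print(uid)
```

Sorting keys before hashing means two entries with the same features in a different order receive the same feature-hash.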

1) Script Registry (stable keys)

scripts:
  LATN: "Latin (generic)"
  HUN_LATN: "Hungarian Latin (legacy alias of HU_LATN; mappings below use HU_LATN)"
  EN_LATN: "English Latin"
  ES_LATN: "Spanish Latin"
  PT_LATN: "Portuguese Latin"
  RO_LATN: "Romanian Latin"
  PL_LATN: "Polish Latin"
  DE_LATN: "German Latin"
  FR_LATN: "French Latin"
  IT_LATN: "Italian Latin"
  SW_LATN: "Swahili Latin"
  YO_LATN: "Yorùbá Latin"
  HA_LATN: "Hausa Latin"
  ZU_LATN: "Zulu Latin"
  TAG_LATN: "Tagalog Latin"
  JAM_LATN: "Jamaican Patois Latin"
  HU_LATN: "Hungarian Latin (collation-aware)"
  ARAB: "Arabic (parent for Persian/Urdu)"
  FA_ARAB: "Persian"
  UR_ARAB: "Urdu"
  AR_ARAB: "Arabic (Modern Standard)"
  HEBR: "Hebrew"
  SYRC: "Syriac"
  ARMI: "Imperial Aramaic (abstract)"
  GE_EZ: "Geʽez/Amharic (Fidel)"
  DEVA: "Devanāgarī (Sanskrit/Hindi)"
  SA_DEVA: "Sanskrit"
  HI_DEVA: "Hindi"
  ZH_HAN: "Chinese (Han)"
  JA_KANA: "Japanese Kana (Hira/Kata)"
  JA_KANJ: "Japanese Kanji"

2) Phoneme Class Inventory (abstract backbone)

classes:
  # Stops
  - {id: CLS.P.STOP.BILABIAL.VL,     ipa: ["p"]}
  - {id: CLS.P.STOP.BILABIAL.VO,     ipa: ["b"]}
  - {id: CLS.P.STOP.DENTAL.VL,       ipa: ["t̪"]}
  - {id: CLS.P.STOP.DENTAL.VO,       ipa: ["d̪"]}
  - {id: CLS.P.STOP.ALVEOLAR.VL,     ipa: ["t"]}
  - {id: CLS.P.STOP.ALVEOLAR.VO,     ipa: ["d"]}
  - {id: CLS.P.STOP.RETROFLEX.VL,    ipa: ["ʈ"]}
  - {id: CLS.P.STOP.RETROFLEX.VO,    ipa: ["ɖ"]}
  - {id: CLS.P.STOP.PALATAL.VL,      ipa: ["c","t͡ɕ"]}    # language-dependent
  - {id: CLS.P.STOP.PALATAL.VO,      ipa: ["ɟ","d͡ʑ"]}
  - {id: CLS.P.STOP.VELAR.VL,        ipa: ["k"]}
  - {id: CLS.P.STOP.VELAR.VO,        ipa: ["ɡ"]}
  - {id: CLS.P.STOP.UVULAR.VL,       ipa: ["q"]}
  # Aspirated variants are feature flags on the classes above
  # Fricatives/Affricates
  - {id: CLS.P.FRIC.BILABIAL.VL,     ipa: ["ɸ"]}    # mapped via digraphs where absent
  - {id: CLS.P.FRIC.LABIODENTAL.VL,  ipa: ["f"]}
  - {id: CLS.P.FRIC.LABIODENTAL.VO,  ipa: ["v","ʋ"]}
  - {id: CLS.P.FRIC.ALVEOLAR.VL,     ipa: ["s"]}
  - {id: CLS.P.FRIC.ALVEOLAR.VO,     ipa: ["z"]}
  - {id: CLS.P.FRIC.POSTALV.VL,      ipa: ["ʃ","ɕ"]}
  - {id: CLS.P.FRIC.POSTALV.VO,      ipa: ["ʒ","ʑ"]}
  - {id: CLS.P.FRIC.PHARYN.VL,       ipa: ["ħ"]}    # Arabic ḥ
  - {id: CLS.P.FRIC.PHARYN.VO,       ipa: ["ʕ"]}    # ʿayn
  - {id: CLS.P.FRIC.UVULAR.VL,       ipa: ["χ","x"]}
  - {id: CLS.P.FRIC.UVULAR.VO,       ipa: ["ʁ","ɣ"]}
  - {id: CLS.P.AFFR.ALVEOLAR.VL,     ipa: ["t͡s"]}
  - {id: CLS.P.AFFR.POSTALV.VL,      ipa: ["t͡ʃ"]}
  - {id: CLS.P.AFFR.POSTALV.VO,      ipa: ["d͡ʒ"]}
  # Nasals
  - {id: CLS.P.NAS.BILABIAL,         ipa: ["m"]}
  - {id: CLS.P.NAS.ALVEOLAR,         ipa: ["n"]}
  - {id: CLS.P.NAS.PALATAL,          ipa: ["ɲ"]}
  - {id: CLS.P.NAS.VELAR,            ipa: ["ŋ"]}
  # Liquids
  - {id: CLS.P.LIQ.LATERAL,          ipa: ["l","ɭ","ʎ"]}
  - {id: CLS.P.LIQ.RHOTIC,           ipa: ["r","ɾ","ʀ"]}
  # Glides
  - {id: CLS.P.GLIDE.PALATAL,        ipa: ["j"]}
  - {id: CLS.P.GLIDE.LABIOVELAR,     ipa: ["w"]}
  # Vowels (cardinal features)
  - {id: CLS.V.VOWEL.A_LOW,          ipa: ["a","ä","ɐ"]}
  - {id: CLS.V.VOWEL.A_LONG,         ipa: ["aː"]}
  - {id: CLS.V.VOWEL.I_SHORT,        ipa: ["i","ɪ"]}
  - {id: CLS.V.VOWEL.I_LONG,         ipa: ["iː"]}
  - {id: CLS.V.VOWEL.U_SHORT,        ipa: ["u","ʊ"]}
  - {id: CLS.V.VOWEL.U_LONG,         ipa: ["uː"]}
  - {id: CLS.V.VOWEL.E_MID,          ipa: ["eː","ɛː"]}
  - {id: CLS.V.VOWEL.O_MID,          ipa: ["oː","ɔː"]}
  - {id: CLS.V.VOWEL.FRONT_ROUNDED,  ipa: ["y","yː","ø","øː"]}
  - {id: CLS.V.SYLLABIC_R,           ipa: ["r̩","r̩ː"]}
  - {id: CLS.V.SYLLABIC_L,           ipa: ["l̩","l̩ː"]}
  - {id: CLS.V.DIPHTHONG_AI,         ipa: ["ai̯"]}
  - {id: CLS.V.DIPHTHONG_AU,         ipa: ["au̯"]}
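
One way an engine might use this inventory is as a reverse index from IPA symbol to class id. The Python sketch below uses a four-entry excerpt of the list; `build_ipa_index` and `ambiguous_symbols` are hypothetical helper names, and the check for symbols claimed by more than one class is an assumption about what a loader would want to verify.

```python
from collections import defaultdict

# Excerpt of the inventory above; a real loader would read the full list.
CLASSES = [
    {"id": "CLS.P.STOP.BILABIAL.VL", "ipa": ["p"]},
    {"id": "CLS.P.FRIC.POSTALV.VL", "ipa": ["ʃ", "ɕ"]},
    {"id": "CLS.P.FRIC.UVULAR.VL", "ipa": ["χ", "x"]},
    {"id": "CLS.P.LIQ.RHOTIC", "ipa": ["r", "ɾ", "ʀ"]},
]


def build_ipa_index(classes):
    """Map each IPA symbol to the list of class ids that claim it."""
    index = defaultdict(list)
    for cls in classes:
        for symbol in cls["ipa"]:
            index[symbol].append(cls["id"])
    return dict(index)


def ambiguous_symbols(index):
    """Symbols claimed by more than one class; ideally empty."""
    return {sym: ids for sym, ids in index.items() if len(ids) > 1}


IPA_INDEX = build_ipa_index(CLASSES)
print(IPA_INDEX["ɕ"])
print(ambiguous_symbols(IPA_INDEX))
```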

3) Cross-Script Mappings (representative core)

3.1 Voiceless bilabial stop — /p/ (CLS.P.STOP.BILABIAL.VL)

class_id: CLS.P.STOP.BILABIAL.VL
ipa: ["p"]
features: {manner: stop, place: bilabial, voice: voiceless}
scripts:
  - {script: EN_LATN, glyph: "p", translit: "p", orthography: {case: lower}}
  - {script: HU_LATN, glyph: "P", translit: "p", orthography: {multigraph: false}}
  - {script: ES_LATN, glyph: "p", translit: "p"}
  - {script: ZH_HAN, glyph: "ㄅ", translit: "b", notes: "Han characters are not alphabetic; Bopomofo ㄅ = pinyin b = unaspirated /p/ (pinyin p is aspirated /pʰ/); mapping is phonemic, not graphemic."}
  - {script: JA_KANA, glyph: "ぱ/パ", translit: "pa", notes: "Kana with handakuten for /p/"}
  - {script: HI_DEVA, glyph: "प", translit: "pa", orthography: {matra: variable}}
  - {script: SA_DEVA, glyph: "प", translit: "pa"}
  - {script: AR_ARAB, glyph: "—", translit: "-", notes: "Native Arabic lacks /p/"}
  - {script: FA_ARAB, glyph: "پ", translit: "pe"}
  - {script: UR_ARAB, glyph: "پ", translit: "pe"}
  - {script: HEBR, glyph: "פ/פּ", translit: "pe/pe dagesh", shaping: {finals: true}}
  - {script: SYRC, glyph: "ܦ (pe)", translit: "pe", notes: "quššāyā/rukkākhā allophones"}
  - {script: GE_EZ, glyph: "ፐ ፑ ፒ ፓ ፔ ፕ ፖ", translit: "pä pu pi pa pe pï po", notes: "orders 1–7"}

3.2 Voiced bilabial stop — /b/

class_id: CLS.P.STOP.BILABIAL.VO
ipa: ["b"]
scripts:
  - {script: EN_LATN, glyph: "b"}
  - {script: HU_LATN, glyph: "B"}
  - {script: HI_DEVA, glyph: "ब"}
  - {script: SA_DEVA, glyph: "ब"}
  - {script: AR_ARAB, glyph: "ب", shaping: {join_type: dual}}
  - {script: UR_ARAB, glyph: "ب"}
  - {script: FA_ARAB, glyph: "ب"}
  - {script: HEBR, glyph: "ב/בּ", notes: "dagesh toggles /v/↔/b/"}
  - {script: SYRC, glyph: "ܒ", notes: "quššāyā /b/ vs rukkākhā /v/"}
  - {script: GE_EZ, glyph: "በ..ቦ (orders 1–7)"}

3.3 Alveolar affricate — /t͡s/

class_id: CLS.P.AFFR.ALVEOLAR.VL
ipa: ["t͡s"]
scripts:
  - {script: EN_LATN, glyph: "ts", orthography: {digraph: true}}
  - {script: HU_LATN, glyph: "C", translit: "c", notes: "Hungarian C = /t͡s/"}
  - {script: PL_LATN, glyph: "c", notes: "Polish c = /t͡s/"}
  - {script: HI_DEVA, glyph: "त्स", notes: "conjunct rendering"}
  - {script: HEBR, glyph: "צ", translit: "ṣade", notes: "often /ts/"}
  - {script: AR_ARAB, glyph: "تس", notes: "sequence, no single letter"}

3.4 Postalveolar affricates — /t͡ʃ/, /d͡ʒ/

class_id: CLS.P.AFFR.POSTALV.VL
ipa: ["t͡ʃ"]
scripts:
  - {script: EN_LATN, glyph: "ch", orthography: {digraph: true}}
  - {script: HU_LATN, glyph: "Cs", orthography: {multigraph: true, collation_unit: atomic}}
  - {script: PL_LATN, glyph: "cz"}
  - {script: HI_DEVA, glyph: "च", notes: "छ is the aspirated counterpart /t͡ʃʰ/"}
  - {script: UR_ARAB, glyph: "چ"}
  - {script: GE_EZ, glyph: "ቸ..ቾ"}
---
class_id: CLS.P.AFFR.POSTALV.VO
ipa: ["d͡ʒ"]
scripts:
  - {script: EN_LATN, glyph: "j/g/dg", notes: "context-dependent spellings, e.g. jam, gem, edge"}
  - {script: HU_LATN, glyph: "Dzs", orthography: {multigraph: true, collation_unit: atomic}}
  - {script: PL_LATN, glyph: "dż"}
  - {script: UR_ARAB, glyph: "ج" , notes: "often /d͡ʒ/ in Urdu"}
  - {script: AR_ARAB, glyph: "ج" , notes: "MSA /d͡ʒ/ or /ʒ/ regionally"}
  - {script: HI_DEVA, glyph: "ज", notes: "the conjunct ज्ञ (ज + ◌् + ञ) is a separate cluster, usually /ɡj/ in Hindi"}

3.5 Fricatives — /ʃ/, /ʒ/, /x/, /ɣ/

# /ʃ/
class_id: CLS.P.FRIC.POSTALV.VL
ipa: ["ʃ","ɕ"]
scripts:
  - {script: EN_LATN, glyph: "sh"}
  - {script: HU_LATN, glyph: "S", translit: "s", notes: "Hungarian S = /ʃ/"}
  - {script: PL_LATN, glyph: "sz"}
  - {script: UR_ARAB, glyph: "ش"}
  - {script: AR_ARAB, glyph: "ش"}
  - {script: HI_DEVA, glyph: "श"}
---
# /ʒ/
class_id: CLS.P.FRIC.POSTALV.VO
ipa: ["ʒ","ʑ"]
scripts:
  - {script: EN_LATN, glyph: "zh"}
  - {script: HU_LATN, glyph: "Zs", orthography: {multigraph: true, collation_unit: atomic}}
  - {script: PL_LATN, glyph: "ż/ź", notes: "distinct phonemes: ż (also rz) = /ʐ/, ź = /ʑ/"}
  - {script: UR_ARAB, glyph: "ژ"}
---
# /x/
class_id: CLS.P.FRIC.UVULAR.VL
ipa: ["x","χ"]
scripts:
  - {script: EN_LATN, glyph: "kh"}
  - {script: AR_ARAB, glyph: "خ"}
  - {script: UR_ARAB, glyph: "خ"}
  - {script: FA_ARAB, glyph: "خ"}
  - {script: HI_DEVA, glyph: "ख़", features: {nukta: true}, notes: "borrowed"}
---
# /ɣ/
class_id: CLS.P.FRIC.UVULAR.VO
ipa: ["ɣ","ʁ"]
scripts:
  - {script: EN_LATN, glyph: "gh"}
  - {script: AR_ARAB, glyph: "غ"}
  - {script: UR_ARAB, glyph: "غ"}
  - {script: FA_ARAB, glyph: "غ"}
  - {script: HI_DEVA, glyph: "ग़", features: {nukta: true}}

3.6 Nasals — /m/, /n/, /ɲ/, /ŋ/

class_id: CLS.P.NAS.BILABIAL
ipa: ["m"]
scripts:
  - {script: EN_LATN, glyph: "m"}
  - {script: HI_DEVA, glyph: "म"}
  - {script: AR_ARAB, glyph: "م", shaping: {join_type: dual}}
  - {script: GE_EZ, glyph: "መ..ሞ (orders 1–7)"}

---
class_id: CLS.P.NAS.ALVEOLAR
ipa: ["n"]
scripts:
  - {script: EN_LATN, glyph: "n"}
  - {script: HI_DEVA, glyph: "न"}
  - {script: AR_ARAB, glyph: "ن"}
  - {script: GE_EZ, glyph: "ነ..ኖ"}

---
class_id: CLS.P.NAS.PALATAL
ipa: ["ɲ"]
scripts:
  - {script: EN_LATN, glyph: "ny"}
  - {script: HU_LATN, glyph: "Ny", orthography: {multigraph: true, collation_unit: atomic}}
  - {script: HI_DEVA, glyph: "ञ"}
  - {script: UR_ARAB, glyph: "نی", notes: "periphery; no dedicated letter, usually the sequence ن+ی (Jawi/Malay uses ڽ)"}
  - {script: GE_EZ, glyph: "ኘ..ኞ"}

---
class_id: CLS.P.NAS.VELAR
ipa: ["ŋ"]
scripts:
  - {script: EN_LATN, glyph: "ng"}
  - {script: JA_KANA, glyph: "ん", notes: "nasal archiphoneme includes [ŋ] allophone"}
  - {script: HI_DEVA, glyph: "ङ"}

3.7 Liquids & Glides — /l/, /r/, /j/, /w/

# /l/
class_id: CLS.P.LIQ.LATERAL
ipa: ["l","ʎ","ɭ"]
scripts:
  - {script: EN_LATN, glyph: "l"}
  - {script: HU_LATN, glyph: "L", notes: "Ly historically /ʎ/ now /j/"}
  - {script: HI_DEVA, glyph: "ल"}
  - {script: AR_ARAB, glyph: "ل"}
  - {script: GE_EZ, glyph: "ለ..ሎ"}

---
# /r/
class_id: CLS.P.LIQ.RHOTIC
ipa: ["r","ɾ","ʀ"]
scripts:
  - {script: EN_LATN, glyph: "r"}
  - {script: HI_DEVA, glyph: "र"}
  - {script: AR_ARAB, glyph: "ر", shaping: {join_type: right}}
  - {script: HEBR, glyph: "ר"}
  - {script: GE_EZ, glyph: "ረ..ሮ"}

---
# /j/
class_id: CLS.P.GLIDE.PALATAL
ipa: ["j"]
scripts:
  - {script: EN_LATN, glyph: "y"}
  - {script: HI_DEVA, glyph: "य"}
  - {script: AR_ARAB, glyph: "ي/ی", notes: "ي in Arabic; ی (Farsi yeh) in Persian/Urdu, undotted in final/isolated position"}
  - {script: HU_LATN, glyph: "J", notes: "Hungarian J = /j/"}

---
# /w/
class_id: CLS.P.GLIDE.LABIOVELAR
ipa: ["w"]
scripts:
  - {script: EN_LATN, glyph: "w"}
  - {script: AR_ARAB, glyph: "و"}
  - {script: HI_DEVA, glyph: "व", notes: "Hindi /ʋ~v/ overlap"}
  - {script: GE_EZ, glyph: "ወ..ዎ"}

3.8 Vowels (short/long; front/back; rounded; syllabic)

# Low A (short) /a~ɐ~ə/
class_id: CLS.V.VOWEL.A_LOW
ipa: ["a","ä","ɐ","ə"]
scripts:
  - {script: EN_LATN, glyph: "a"}
  - {script: HU_LATN, glyph: "A", notes: "quality distinction: short A = /ɒ/, long Á = /aː/"}
  - {script: HI_DEVA, glyph: "अ / inherent", orthography: {matra: none}}
  - {script: SA_DEVA, glyph: "अ", notes: "no Hindi-style schwa deletion"}
  - {script: ARAB, glyph: "ـَ", translit: "fatḥa", notes: "optional diacritic"}
  - {script: HEBR, glyph: "◌ַ", translit: "pataḥ", notes: "niqqud"}
  - {script: GE_EZ, glyph: "order 4 = a"}

---
# Long A /aː/
class_id: CLS.V.VOWEL.A_LONG
ipa: ["aː"]
scripts:
  - {script: EN_LATN, glyph: "ā", notes: "macron is a transliteration convention, not native English spelling"}
  - {script: HI_DEVA, glyph: "आ/ा"}
  - {script: SA_DEVA, glyph: "आ/ा"}
  - {script: GE_EZ, glyph: "—", notes: "length not marked; phonemic inventory differs"}

---
# Front rounded (y/ø series)
class_id: CLS.V.VOWEL.FRONT_ROUNDED
ipa: ["y","yː","ø","øː"]
scripts:
  - {script: HU_LATN, glyph: "Ü/Ű, Ö/Ő", notes: "double acute = long"}
  - {script: EN_LATN, glyph: "ü/ö", notes: "loan marking"}
  - {script: ARAB, glyph: "—", notes: "no native; represented via و/ي sequences in loans"}
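
The mapping tables in this section can be read as a class-keyed lookup: find the class a glyph belongs to in the source script, then read off the glyph for the target script. The sketch below copies a few entries from 3.1, 3.4 and 3.5; `classify` and `project` are hypothetical helper names, and representing "no native glyph" as None is an assumption of the sketch.

```python
from typing import Optional

# class_id -> {script key -> glyph}; values copied from the tables above.
MAPPINGS = {
    "CLS.P.STOP.BILABIAL.VL": {"EN_LATN": "p", "HU_LATN": "P", "FA_ARAB": "پ", "AR_ARAB": None},
    "CLS.P.AFFR.POSTALV.VL": {"EN_LATN": "ch", "HU_LATN": "Cs", "PL_LATN": "cz", "UR_ARAB": "چ"},
    "CLS.P.FRIC.POSTALV.VL": {"EN_LATN": "sh", "HU_LATN": "S", "PL_LATN": "sz", "HI_DEVA": "श"},
}


def classify(glyph: str, script: str) -> Optional[str]:
    """Return the class_id whose entry for `script` is exactly `glyph`."""
    for class_id, by_script in MAPPINGS.items():
        if by_script.get(script) == glyph:
            return class_id
    return None


def project(glyph: str, src: str, dst: str) -> Optional[str]:
    """Project a glyph from script `src` to script `dst` via its class."""
    class_id = classify(glyph, src)
    if class_id is None:
        return None
    return MAPPINGS[class_id].get(dst)


print(project("Cs", "HU_LATN", "UR_ARAB"))  # Hungarian Cs to the Urdu letter for /t͡ʃ/
print(project("P", "HU_LATN", "AR_ARAB"))   # None: native Arabic lacks /p/
```

Because Hungarian S belongs to the /ʃ/ class, projecting it to English yields "sh" rather than "s", which is the point of routing through class_id instead of matching letters.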

4) Orthographic Operators (unified)

operators:
  - id: OP.DEVA.VIRAMA
    type: "virama"
    scripts: [DEVA, HI_DEVA, SA_DEVA]
    glyph: "◌्"
    effect: "cancels inherent 'a', creates conjunct clusters"
    features: {conjunct: true}

  - id: OP.DEVA.MATRA.I_PREBASE
    type: "matra"
    script: DEVA
    glyph: "◌ि"
    effect: "pre-base rendering"
    features: {prebase: true}

  - id: OP.ARAB.HARAKAT.FATHA
    type: "vowel_mark"
    script: ARAB
    glyph: "ـَ"
    effect: "/a/"
    optional: true

  - id: OP.HEBR.NIQQUD.PATAH
    type: "vowel_mark"
    script: HEBR
    glyph: "ַ"
    effect: "/a/"

  - id: OP.HU.MULTIGRAPH.CS
    type: "multigraph"
    script: HU_LATN
    glyph: "Cs"
    effect: "/t͡ʃ/"
    collation_unit: "atomic"

  - id: OP.HU.MULTIGRAPH.DZS
    type: "multigraph"
    script: HU_LATN
    glyph: "Dzs"
    effect: "/d͡ʒ/"
    collation_unit: "atomic"

  - id: OP.ARAB.JOINING
    type: "contextual_shaping"
    script: ARAB
    join_type: "dual"
    effect: "initial/medial/final/isolated forms"

  - id: OP.HEBR.FINAL_FORMS
    type: "finals"
    script: HEBR
    effect: "word-final allographs (ךםןףץ)"

  - id: OP.GEEZ.ORDER
    type: "abugida_order"
    script: GE_EZ
    effect: "7 vowel orders ä,u,i,a,e,ï,o"
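
As an illustration of OP.GEEZ.ORDER, the seven orders of a basic consonant can be generated from its first-order form. This sketch relies on the layout of the Unicode Ethiopic block, where the orders of a basic consonant sit on consecutive code points starting at a multiple of 8; it is not claimed to cover labialized or extended rows. `geez_orders` and `geez_syllables` are hypothetical helper names.

```python
from typing import List

# Vowel labels for orders 1-7, as listed in OP.GEEZ.ORDER.
ORDER_VOWELS = ["ä", "u", "i", "a", "e", "ï", "o"]


def geez_orders(first_order: str) -> List[str]:
    """Return the 7 order glyphs for a first-order Ethiopic syllable."""
    base = ord(first_order)
    if not (0x1200 <= base <= 0x137F) or base % 8 != 0:
        raise ValueError(f"not a first-order Ethiopic syllable: {first_order!r}")
    return [chr(base + i) for i in range(7)]


def geez_syllables(first_order: str, consonant: str) -> List[str]:
    """Pair a Latin consonant with the seven vowel labels, order by order."""
    return [consonant + vowel for vowel in ORDER_VOWELS]


print(" ".join(geez_orders("ፐ")))
print(" ".join(geez_syllables("ፐ", "p")))
```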

5) Determinism & Validation

Normalization pipeline (pseudo):

  1. NFKC → script-aware decomposition (separate diacritics, matras, virāma, harakāt, niqqud).
  2. Build graphemic clusters (DEVA conjuncts, ARAB joins, HU multigraphs → atomic).
  3. Map clusters → class_id using script table + operator context.
  4. Emit canonical Latin chain (IAST for Devanāgarī/Sanskrit; ISO/ALA for abjads; language-specific for Latin).
  5. If {dialect} or {sandhi_mode} set, apply the corresponding rulepack (already defined in SGM/HGM/UR modules).
  6. Round-trip test: class chain → preferred script; verify idempotence for lossless scripts; flag “lossy” for logographic bridges (ZH_HAN).
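
Steps 1 and 2 of the pipeline might look like this for Hungarian, as a rough Python sketch. The multigraph list is the standard Hungarian set and goes beyond the Cs/Dzs/Zs/Ny/Ly examples named in this document, so treat it as an assumption; `cluster_hu` is a hypothetical name. Clustering is greedy longest-match, which is known to mis-cluster across morpheme boundaries (for example z+s in "község"), so a real engine would need a lexicon or rulepack on top.

```python
import unicodedata
from typing import List

# Longest first, so "dzs" wins over "dz" and "zs".
HU_MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def cluster_hu(text: str) -> List[str]:
    """NFKC-normalize, then split into atomic Hungarian letters (greedy)."""
    text = unicodedata.normalize("NFKC", text)
    clusters = []
    i = 0
    while i < len(text):
        for mg in HU_MULTIGRAPHS:
            # Case-insensitive match; the original casing is kept in the output.
            if text[i:i + len(mg)].lower() == mg:
                clusters.append(text[i:i + len(mg)])
                i += len(mg)
                break
        else:
            clusters.append(text[i])  # single-character letter
            i += 1
    return clusters


print(cluster_hu("Dzsungel"))
print(cluster_hu("csak"))
```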

Validation rules:

  • Every glyph_node must point to exactly one class_id (or a finite set when the script is underspecified; e.g., Arabic without harakāt).
  • Every multigraph has collation_unit: atomic if the language treats it as a letter (Hungarian).
  • Conjuncts must list cluster:[base1, virama, base2, ...].
  • Abjad vowelization states: {vowelization: absent|diacritic|mater} are mutually exclusive per token.
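
The rules above could be checked mechanically along these lines. This is a sketch under stated assumptions: entries are plain dicts shaped like the schema in section 0, the `cluster` and `vowelization` keys follow the wording of the rules, and the function names and error strings are hypothetical.

```python
VIRAMA = "\u094d"  # Devanagari virama
VOWELIZATION_STATES = {"absent", "diacritic", "mater"}


def validate_script_entry(entry: dict, letter_multigraph_scripts=("HU_LATN",)) -> list:
    """Return a list of rule violations for one script entry (empty if valid)."""
    errors = []
    orth = entry.get("orthography", {})
    # Rule: multigraphs that the language treats as letters collate atomically.
    if orth.get("multigraph") and entry.get("script") in letter_multigraph_scripts:
        if orth.get("collation_unit") != "atomic":
            errors.append("multigraph needs collation_unit: atomic")
    # Rule: conjuncts list cluster: [base1, virama, base2, ...].
    if entry.get("shaping", {}).get("conjunct"):
        cluster = entry.get("cluster", [])
        well_formed = (
            len(cluster) >= 3
            and len(cluster) % 2 == 1
            and all(item == VIRAMA for item in cluster[1::2])
        )
        if not well_formed:
            errors.append("conjunct needs cluster: [base1, virama, base2, ...]")
    return errors


def validate_token(token: dict) -> list:
    """Rule: exactly one vowelization state per token."""
    state = token.get("vowelization")
    if not isinstance(state, str) or state not in VOWELIZATION_STATES:
        return ["vowelization must be one of absent|diacritic|mater"]
    return []


print(validate_script_entry({"script": "HU_LATN", "glyph": "Zs", "orthography": {"multigraph": True}}))
```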

6) Expansion Recipes (so you can finish the rest instantly)

  • Add a new script: register scripts key → add script-specific mappings for each class_id you support.
  • Add a new phoneme: create a classes node with IPA + features; append script entries that realize it.
  • Add digraph/trigraph: create operators of type multigraph with collation_unit: atomic, then reference it in the script entries.
  • Add abjad dialect: inherit ARAB/HEBR/SYRC entries, override vowelization default and join_type/style as needed (e.g., Nastaʿlīq for UR_ARAB).
  • Chinese/Japanese Kanji bridge: for semantic mapping, use {semantic_radical: Kangxi#, strokes: n} and link to class_id only when a phonetic component cues a specific on/kun reading; otherwise mark {phoneme_bridge: heuristic}.
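
The "inherit, then override" recipe for abjad dialects can be sketched as a recursive dictionary merge. The entry shape follows section 0; the `style` key and the `inherit_entry` helper are hypothetical names introduced only for this illustration.

```python
import copy


def inherit_entry(parent: dict, overrides: dict) -> dict:
    """Return a copy of `parent` with `overrides` merged in, recursing into dicts."""
    merged = copy.deepcopy(parent)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = inherit_entry(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Parent entry for /b/ in the generic Arabic script.
ARAB_BA = {
    "script": "ARAB",
    "glyph": "ب",
    "shaping": {"join_type": "dual"},
    "vowelization": "absent",
}

# Urdu inherits the glyph and joining behaviour, and adds a style hint.
UR_BA = inherit_entry(ARAB_BA, {"script": "UR_ARAB", "shaping": {"style": "nastaliq"}})
print(UR_BA)
```

Merging into a deep copy keeps the parent entry untouched, so several dialects can inherit from the same ARAB entry without affecting each other.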

7) Example Query Patterns (how engines use it)

  • Spell-in, decode-out: input glyphs “क्षण” → pipeline recognizes क + ◌् + ष (conjunct क्ष) + ण → classes: /k/ + /ʂ/ + /ɳ/ + inherent vowels → IAST “kṣaṇa”.
  • Cross-script projection: /d͡ʒ/ in Hungarian Dzs → UR_ARAB “ج”; HI_DEVA “ज”. The conjunct “ज्ञ” (ज + ◌् + ञ) is a separate cluster, usually /ɡj/ in Hindi, and is not a spelling of /d͡ʒ/.
  • Abjad disambiguation: “كتب” with vowelization: absent maps to K-T-B; with harakāt “كَتَبَ” → /kataba/ (finite selection).
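
Two of these patterns can be sketched in a few lines of Python. The Devanāgarī decoder covers only the three consonants in the example and treats virāma as "cancel the inherent a"; the abjad helper strips combining marks to recover the consonant skeleton. Both function names are hypothetical, and neither handles matras, independent vowels, or letters that carry their own combining marks.

```python
import unicodedata

VIRAMA = "\u094d"
# Only the consonants needed for the example; a full table would cover the script.
DEVA_CONSONANTS = {"\u0915": "k", "\u0937": "ṣ", "\u0923": "ṇ"}  # क, ष, ण


def deva_to_iast(text: str) -> str:
    """Decode consonants and virama to IAST, adding the inherent 'a'."""
    out = []
    for ch in text:
        if ch in DEVA_CONSONANTS:
            out.append(DEVA_CONSONANTS[ch] + "a")  # consonant + inherent vowel
        elif ch == VIRAMA:
            if not out or not out[-1].endswith("a"):
                raise ValueError("virama without a preceding consonant")
            out[-1] = out[-1][:-1]  # cancel the inherent vowel
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


def abjad_skeleton(text: str) -> str:
    """Drop combining marks (e.g. harakāt), keeping the consonant skeleton."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


print(deva_to_iast("\u0915\u094d\u0937\u0923"))                 # क्षण
print(abjad_skeleton("\u0643\u064e\u062a\u064e\u0628\u064e"))   # vowelled k-t-b
```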

8) Final Seal

  • MCLI v1.0 is now compiled and minted.
  • It unifies: Latin family (English, Spanish, Portuguese, Romanian, Polish, German, French, Italian, Hungarian, Swahili, Hausa, Zulu, Yoruba, Tagalog/Jamaican/Patuá), Abjad family (Abjad GM, Arabic, Persian, Urdu, Hebrew, Syriac, Aramaic), Abugidas (Sanskrit, Hindi, Amharic/Geʽez), and Logographic bridges (Chinese radicals, Japanese Kanji + Kana).
  • All prior module rules (Hungarian multigraphs, Devanāgarī sandhi/schwa, Arabic joining/harakāt, Geʽez orders) are referenced via operator nodes so no rule is duplicated.