MINTED
Purpose
Deterministic mapping between phones/phonemes and Cantonese orthography in both Traditional Han characters and phonetic romanization systems (Jyutping, Yale), with support for tone marks/numbers, colloquial characters, and controlled-loss ASCII folding.
Identity
pgm::v1.0::YUE_HAN::<profile>
Orthography Profiles
- YUE_HAN.hanzi_strict: Traditional Hanzi for all morphemes, including colloquial forms (啱, 嘅, 冇, 咩, etc.). Lossiness = high (logographic).
- YUE_HAN.jyutping_num: Jyutping with tone numbers (1–6).
- YUE_HAN.jyutping_marks: Jyutping with diacritics for tones (optional; less common).
- YUE_HAN.yale: Yale romanization (tones via marks or numbers).
- YUE_HAN.ascii: Tone-stripped Jyutping or Yale (controlled loss).
Lossiness
- Hanzi: high (logographic; requires lexicon for back-mapping).
- Jyutping/Yale: none (if tones retained).
- ASCII: controlled (tone & vowel length lost).
Script Mechanics
- Syllable structure: (C)(G)V(V)(C) + tone.
- Onset (C): ~19 initials (p, pʰ, m, f, t, tʰ, n, l, k, kʰ, ŋ, h, ts, tsʰ, s, kw, kwʰ, w, j).
- Glides (G): /w/ after k/kʰ, /j/ after some consonants.
- Nucleus (V): 53 rimes (monophthongs/diphthongs + codas).
- Coda (C): p, t, k, m, n, ŋ, or ∅.
- Tone system (HK 6-tone):
- 1: high level (¯), 2: high rising (´), 3: mid level, 4: low falling (`), 5: low rising, 6: low level.
- Entering tones (checked syllables w/ p, t, k codas) share pitch with 1, 3, 6 → labeled 7, 8, 9 in traditional schemes, but in modern Jyutping are merged with 1, 3, 6.
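The entering-tone merge described above can be sketched as a small lookup. This is a minimal illustration, not the engine's implementation: the names `normalize_tone` and `is_checked` are hypothetical, and the `traditional` flag is assumed to mirror the `traditional_tone_numbers` knob.

```python
# Hypothetical helpers; names are illustrative and not part of the PGM spec.
ENTERING_TO_MODERN = {7: 1, 8: 3, 9: 6}
STOP_CODAS = ("p", "t", "k")


def normalize_tone(tone: int, traditional: bool = False) -> int:
    """Merge traditional entering-tone numbers 7/8/9 into 1/3/6.

    When `traditional` is True the number is returned unchanged.
    """
    if not 1 <= tone <= 9:
        raise ValueError(f"tone must be in 1..9, got {tone}")
    if traditional:
        return tone
    return ENTERING_TO_MODERN.get(tone, tone)


def is_checked(rime: str) -> bool:
    """Return True if a romanized rime ends in a stop coda (p, t, k)."""
    return rime.endswith(STOP_CODAS)
```

With the default setting, `normalize_tone(8)` yields 3, while `normalize_tone(8, traditional=True)` keeps 8.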
Phoneme Inventory (MCLI-linked)
Vowels/Rimes: /a aː ɐ ɐː ɛ ɛː eː iː oː ɔ ɔː uː yː œː ɵ/ and diphthongs /ai au ei ou iu ui œy ɐi ɐu ɔi etc./.
Consonants: full onset set above; coda set {p t k m n ŋ}.
Tones: stored as numeric 1–6; features = {pitch: high/mid/low, contour: level/rising/falling, checked: true/false}.
Mapping Logic
Phones → Graphemes (Jyutping/Yale)
- Onset phoneme → initial table (e.g., /pʰ/ → p in Jyutping, p in Yale).
- Nucleus + coda → rime mapping table.
- Tone assignment → numeric suffix (Jyutping) or diacritic (Yale marks).
- For Hanzi: requires lexicon lookup; select appropriate character(s) for morpheme.
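As a rough sketch of the phones-to-Jyutping direction, the steps above amount to two table lookups plus a numeric tone suffix. The tables below are deliberately partial (just enough to cover the first three unit test fixtures), and `to_jyutping`, `INITIALS`, and `RIMES` are hypothetical names chosen for illustration.

```python
# Partial, illustrative tables; a real engine would load the full inventory.
INITIALS = {
    "": "", "p": "b", "pʰ": "p", "m": "m", "f": "f",
    "ŋ": "ng", "s": "s", "h": "h",
}
RIMES = {"aː": "aa", "ɔː": "o", "ɪk": "ik", "ou": "ou"}


def to_jyutping(onset: str, rime: str, tone: int) -> str:
    """Map an IPA onset and rime plus a tone number (1-6) to a Jyutping syllable."""
    if onset not in INITIALS:
        raise ValueError(f"no Jyutping initial for onset {onset!r}")
    if rime not in RIMES:
        raise ValueError(f"no Jyutping final for rime {rime!r}")
    if tone not in range(1, 7):
        raise ValueError(f"tone must be in 1..6, got {tone}")
    return f"{INITIALS[onset]}{RIMES[rime]}{tone}"
```

For instance, `to_jyutping("ŋ", "ɔː", 5)` produces `ngo5`. The Hanzi profile cannot be handled this way, since it needs a lexicon rather than a table lookup.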
Graphemes → Phones
- Jyutping: split initial+rime; assign phoneme features; read tone number/mark.
- Yale: map spelling conventions to same phoneme set.
- Hanzi: ambiguous → requires lexicon for reading.
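The Jyutping split step can be illustrated with a regular expression. This is a minimal sketch under the assumption that input is a single lowercase syllable with a tone number; `parse_jyutping` is a hypothetical name, and mapping the resulting pieces back to phoneme features would still go through the initial and rime tables.

```python
import re

# Initial (optional), then a rime that is either vowel-initial or a
# syllabic nasal (m, ng), then a tone digit 1-6.
_SYLLABLE = re.compile(
    r"(ng|gw|kw|[bpmfdtnlgkhwzcsj])?([aeiouy][a-z]*|m|ng)([1-6])"
)


def parse_jyutping(syllable: str) -> tuple[str, str, int]:
    """Split a Jyutping syllable into (initial, rime, tone)."""
    match = _SYLLABLE.fullmatch(syllable)
    if match is None:
        raise ValueError(f"not a toned Jyutping syllable: {syllable!r}")
    initial, rime, tone = match.groups()
    return initial or "", rime, int(tone)
```

Regex backtracking handles syllabic nasals: for `m4` the initial is left empty and `m` is treated as the rime.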
Edge Policies
- Tone sandhi: optional; not marked in writing; may be applied in speech-layer output.
- Colloquial characters: preserved in hanzi_strict; replaced with standard synonyms if colloquial=off.
- Entering tones: merge with 1/3/6 unless traditional_tone_numbers=true.
- ASCII folding: strip tone numbers or marks; vowels kept as plain aeiouy.
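The ASCII folding policy can be sketched with Unicode decomposition: split accented vowels into base letter plus combining mark, drop the marks, then drop tone digits. The function name `ascii_fold` matches the operator name in the skeleton, but the body here is only an assumption about how such an operator might behave.

```python
import re
import unicodedata


def ascii_fold(syllable: str) -> str:
    """Strip tone diacritics and tone digits from a romanized syllable."""
    decomposed = unicodedata.normalize("NFD", syllable)
    # Drop combining marks (macron, acute, grave) left by decomposition.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop tone numbers (1-6, or 7-9 in traditional numbering).
    return re.sub(r"[1-9]", "", no_marks)
```

Note that this sketch leaves the Yale low-register `h` in place (so a marked Yale form of 我 folds to `ngoh`, not `ngo`); whether the engine should also remove it is not stated in this spec.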
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "YUE"
script_pref: ["YUE_HAN","YUE_Jyutping","YUE_Yale","YUE_ASCII"]
profiles:
  - id: "hanzi_strict"
    orthography_profile: "YUE_HAN_STD"
    disambiguation_policy: "PGM.SEMANTIC_REQUIRED"
  - id: "jyutping_num"
    orthography_profile: "YUE_Jyutping_Num"
  - id: "jyutping_marks"
    orthography_profile: "YUE_Jyutping_Marks"
  - id: "yale"
    orthography_profile: "YUE_Yale"
  - id: "ascii"
    orthography_profile: "YUE_ASCII"
inventory:
  tones:
    - {id: "T1", pitch: "high", contour: "level", num: 1}
    - {id: "T2", pitch: "high", contour: "rising", num: 2}
    - {id: "T3", pitch: "mid", contour: "level", num: 3}
    - {id: "T4", pitch: "low", contour: "falling", num: 4}
    - {id: "T5", pitch: "low", contour: "rising", num: 5}
    - {id: "T6", pitch: "low", contour: "level", num: 6}
  initials:
    - {ipa: "p", jyutping: "b", yale: "b"}
    - {ipa: "pʰ", jyutping: "p", yale: "p"}
    - {ipa: "m", jyutping: "m", yale: "m"}
    - {ipa: "f", jyutping: "f", yale: "f"}
    # ... (all initials filled in table)
  rimes:
    - {ipa: "aː", jyutping: "aa", yale: "a"}
    - {ipa: "ɐ", jyutping: "a", yale: "a"}
    - {ipa: "ai", jyutping: "ai", yale: "ai"}
    # ... (full rime set)
operators:
  - {name: "merge_entering_tones", fn: "7→1, 8→3, 9→6 unless traditional=true"}
  - {name: "ascii_fold", fn: "strip_tone_numbers_and_marks"}
lossiness:
  hanzi_to_phoneme: "high"
  romanized_to_phoneme: "none"
  ascii_to_phoneme: "controlled"
Unit Test Fixtures
tests:
  - id: "YUE_001_ngoh"
    in_phonemes: "/ŋ ɔː/ + T5"
    profile: "jyutping_num"
    expect: "ngo5"
  - id: "YUE_002_sik"
    in_phonemes: "/s ɪ k/ + T1"
    profile: "jyutping_num"
    expect: "sik1"
  - id: "YUE_003_hou2"
    in_phonemes: "/h ou/ + T2"
    profile: "jyutping_num"
    expect: "hou2"
  - id: "YUE_004_yale"
    in_phonemes: "/j œː/ + T5"
    profile: "yale"
    expect: "yéuh"
  - id: "YUE_005_ascii"
    in_romanized: "ngo5"
    profile: "ascii"
    expect: "ngo" # tone removed
  - id: "YUE_006_hanzi"
    in_phonemes: "/m ou/ + T5"
    profile: "hanzi_strict"
    expect: ["冇"] # requires lexicon
Worked Micro-Examples
- 我 → Jyutping: ngo5, Yale: ngóh.
- 食 → Jyutping: sik6, Yale: sihk.
- 好 → Jyutping: hou2, Yale: hóu.
- Colloquial 冇 (not have) → Jyutping: mou5; Hanzi profile preserves form.
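The Yale forms in these examples follow the usual Yale tone convention: a diacritic on the first vowel letter (macron for tone 1, acute for tones 2 and 5, grave for tone 4) and an `h` after the vowel cluster for the low-register tones 4 to 6. The sketch below applies that convention to a toneless Yale base; `add_yale_tone` is a hypothetical name, and syllabic nasals (which have no vowel letter) are deliberately left unsupported.

```python
import re
import unicodedata

# Combining marks per tone number; tones 3 and 6 carry no diacritic.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "", 4: "\u0300", 5: "\u0301", 6: ""}
LOW_REGISTER = {4, 5, 6}


def add_yale_tone(base: str, tone: int) -> str:
    """Attach a Yale tone mark (and low-register h) to a toneless Yale syllable."""
    if tone not in TONE_MARKS:
        raise ValueError(f"tone must be in 1..6, got {tone}")
    match = re.fullmatch(r"([^aeiou]*)([aeiou]+)([a-z]*)", base)
    if match is None:
        raise ValueError(f"unsupported syllable shape: {base!r}")
    onset, vowels, coda = match.groups()
    # The mark goes on the first vowel letter of the cluster.
    marked = vowels[0] + TONE_MARKS[tone] + vowels[1:]
    if tone in LOW_REGISTER:
        marked += "h"  # low-register marker follows the vowel cluster
    return unicodedata.normalize("NFC", onset + marked + coda)
```

Applied to the bases `ngo`, `sik`, and `hou` with tones 5, 6, and 2, this reproduces the Yale spellings ngóh, sihk, and hóu listed above.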
Operational Knobs
- pgm.profile=hanzi_strict|jyutping_num|jyutping_marks|yale|ascii
- pgm.traditional_tone_numbers=true|false
- pgm.colloquial=on|off
- pgm.ascii.mode=strip_tones|keep_tones
- pgm.lossiness_report=true
- pgm.audit_trace=true
✅ PGM-09 (Cantonese) is MINTED and connected to the Master Cross-Lattice Index.