MINTED
Purpose
Round-trip, loss-aware mapping between phones/phonemes (incl. tone features) and Vietnamese Latin orthography (Quốc Ngữ), with dialect toggles (Hanoi/Northern vs. Saigon/Southern), correct tone mark placement, stacked diacritics, and ASCII/IME foldings (TELEX/VNI/“không dấu”).
Identity
pgm::v1.0::VI_LATN::<profile>
Orthography Profiles
- VI_LATN.strict: Canonical Quốc Ngữ with full diacritics & tones (NFC normalized).
- VI_LATN.north: Northern phoneme layer (distinct /z/≈“d/gi”, /ʐ/≈“r”, full tone contrasts incl. hỏi vs. ngã).
- VI_LATN.south: Southern phoneme layer; hỏi~ngã collapse into one contour and the rhotics merge; spelling remains standard.
- VI_LATN.ascii: “không dấu” (no marks). Loss = tone + vowel quality (controlled).
- VI_LATN.telex: TELEX transliteration (aa=â, aw=ă, ee=ê, oo=ô, ow=ơ, uw=ư, dd=đ; tone letters s/f/r/x/j).
- VI_LATN.vni: VNI transliteration (digits 1–5 for tones, e.g. a1…a5; a6=â, a8=ă, e6=ê, o6=ô, o7=ơ, u7=ư, d9=đ).
Lossiness
- strict/north/south: none.
- telex/vni: none (reversible to strict).
- ascii: controlled (tones & vowel quality lost).
Script Mechanics (Quốc Ngữ essentials)
- Special letters: ă â ê ô ơ ư đ.
- Tones (6): ngang (no mark), sắc (´), huyền (`), hỏi (hook above, ◌̉), ngã (˜), nặng (dot below, .), stored as tone={0,1,2,3,4,5}.
- Tone placement rule: place the tone on the nuclear vowel of the rime; with digraphs/trigraphs follow standard priority (e.g., iê/yê/ươ/uô: mark on ê/ơ/ô; open-syllable ia/ưa/ua: mark on the first vowel; oa/oe/uy: mark on a/e/y; if only one vowel, mark it).
- Coda set: {p, t, c/k, ch, m, n, ng, nh}.
- Onset peculiarities: orthographic gi-, d-, r- map to dialect-specific phones; qu = /kw~w/ (strict spelling preserved).
Phoneme Inventory (core; tied to MCLI)
Vowels/Rimes (heads): /a ă ɐ ə ɤ e ɛ i o ɔ u ɯ ɨ y/ with length/quality encoded via â ê ô ơ ư ă. The centering diphthongs /iə uə ɯə/ are spelled iê/yê, uô, ươ before a coda and reduced to ia/ya, ua, ưa in open syllables.
Consonants: onsets /p t k m n ŋ f v s z ʂ ʐ x h j w l ɲ ʈ ɟ/; affricate allophones handled in dialect rules.
Tones: numeric 0–5 (see above), with south collapsing 3~4 to one contour at the phone level (orthography unchanged).
Mapping Logic
Phones → Graphemes
- Assemble syllable: onset + nucleus (monograph/digraph/trigraph) + coda + tone.
- Choose grapheme set for nucleus:
- /ă/ → ă, /ɤ/ → ơ, /ɯ/ → ư, /ɐ~aː/ as a/â per rime; /i e/ → i/ê; /o ɤ/ → o/ơ; /u oː/ → u/ô.
- Rimes /iə/ → iê (closed) | ia (open); /uə/ → uô|ua; /ɯə/ → ươ|ưa.
- Apply tone placement priority:
- Multi-vowel: mark the head vowel (ê, ơ, ô outrank a/ă/o/u/i/y); for iê/ươ/uô, place tone on ê/ơ/ô (e.g., tiếng, sướng, muỗng).
- oa/oe/uy clusters: put tone on a/e/y (e.g., hoả, khoẻ, thuỷ).
- Onset rules (dialectal phones → spelling):
- d/gi ↔ /z/ (north) but d/gi ↔ /j/ (south) at the phone layer; spelling unchanged.
- qu for /w/ before back/round vowels; gi for palatal approximant contexts where lexicalized.
- Coda normalization: /k/ spelled c after front vowels and k otherwise in loans; native codas use c. /ŋ/ → ng, /ɲ/ → nh, /t͡ɕ/ coda → ch.
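The tone placement priority above can be sketched in Python. This is a minimal illustration and not the engine's actual `place_tone_on_nuclear_vowel` operator: the names `tone_index` and `place_tone` are hypothetical, input is assumed to be a single lowercase toneless syllable, and only the cases named in this section are covered (single vowel, ê/ơ/ô heads, oa/oe/uy, open ia/ua/ưa, onset qu/gi). Clusters such as uây are out of scope.

```python
import unicodedata

# Combining marks indexed by tone id 0..5 (ngang, sắc, huyền, hỏi, ngã, nặng).
TONE_MARKS = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]
VOWELS = set(unicodedata.normalize("NFC", "aăâeêioôơuưy"))
HEADS = set(unicodedata.normalize("NFC", "êơô"))  # outrank plain vowels


def tone_index(syllable: str) -> int:
    """Index of the letter that carries the tone (hypothetical helper)."""
    s = syllable.lower()
    idx = [i for i, ch in enumerate(s) if ch in VOWELS]
    # In onsets "qu" and "gi", u/i belong to the onset when another vowel follows.
    if s[:2] in ("qu", "gi") and len(idx) > 1:
        idx = idx[1:]
    if not idx:
        raise ValueError(f"no vowel in {syllable!r}")
    if len(idx) == 1:
        return idx[0]
    for i in idx:  # ê/ơ/ô take the mark in iê, ươ, uô
        if s[i] in HEADS:
            return i
    if s[idx[0]:idx[0] + 2] in ("oa", "oe", "uy"):
        return idx[1]  # mark on a/e/y
    has_coda = idx[-1] < len(s) - 1
    # Assumption: closed syllables mark the last vowel; open ia/ua/ưa mark the first.
    return idx[-1] if has_coda else idx[0]


def place_tone(syllable: str, tone: int) -> str:
    """Attach tone 0..5 to a toneless syllable and return NFC output."""
    base = unicodedata.normalize("NFC", syllable)
    i = tone_index(base)
    marked = base[:i + 1] + TONE_MARKS[tone] + base[i + 1:]
    return unicodedata.normalize("NFC", marked)


print(place_tone("tiêng", 1), place_tone("quôc", 1), place_tone("ưa", 5))
```

Inserting the combining mark and letting NFC compose it avoids a hand-written table of precomposed letters; canonical reordering handles stacked cases such as ê plus nặng.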
Graphemes → Phones
- Decode vowel quality from diacritic base; read coda; attach tone feature from diacritic.
- Apply dialect map for d/gi/r at phone layer; preserve orthography.
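The decode direction can be illustrated with a short Python sketch that separates the tone feature from the vowel-quality diacritics via NFD decomposition. The function names and the merged-tone label `T3~4` are hypothetical; the dialect codes HN/HCM follow the skeleton in this document.

```python
import unicodedata

# Combining tone marks mapped to tone ids 1..5 (0 = ngang, no mark).
TONE_BY_MARK = {"\u0301": 1, "\u0300": 2, "\u0309": 3, "\u0303": 4, "\u0323": 5}


def split_tone(syllable: str) -> tuple[str, int]:
    """Return (toneless syllable in NFC, tone id 0..5)."""
    tone = 0
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_BY_MARK:
            tone = TONE_BY_MARK[ch]  # tone mark becomes a feature
        else:
            kept.append(ch)          # base letters and quality marks stay
    return unicodedata.normalize("NFC", "".join(kept)), tone


def phone_tone(tone: int, dialect: str = "HN") -> str:
    """Phone-layer tone label; HCM merges hỏi (3) and ngã (4)."""
    if dialect == "HCM" and tone in (3, 4):
        return "T3~4"  # hypothetical label for the merged contour
    return f"T{tone}"


base, tone = split_tone("rỗi")
print(base, tone, phone_tone(tone, "HN"), phone_tone(tone, "HCM"))
```

The orthographic string is never rewritten here; only the phone-layer label changes with the dialect, which matches the "preserve orthography" policy.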
Edge Policies
- i/y alternation: word-initial y vs medial i preserved; iê/yê contextual (after consonant → iê, word-initial → yê unless lexically fixed).
- Open syllable trigraph reduction: iê → ia, uô → ua, ươ → ưa when no coda (spelled that way).
- ASCII folding:
- ascii.smart: ă→a, â→a, ê→e, ô→o, ơ→o, ư→u, đ→d, and tone marks stripped; nothing else is altered.
- ascii.flat: same as smart; additionally allows optional w hints (aw, ow, uw), off by default (handled by the TELEX/VNI profiles).
- Normalization: output NFC; combine tone + quality marks per canonical order.
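A minimal Python sketch of the smart ASCII folding, assuming it can be implemented by decomposing to NFD and dropping every combining mark. The function name `ascii_fold` is hypothetical. Note that đ has no canonical decomposition, so it needs an explicit mapping.

```python
import unicodedata

# đ/Đ do not decompose under NFD, so they are folded by table lookup.
STROKE_D = {"\u0111": "d", "\u0110": "D"}


def ascii_fold(text: str) -> str:
    """Fold Quốc Ngữ to "không dấu": drop tone and quality marks, đ → d."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue  # tone marks, breve, circumflex, and horn are all combining
        out.append(STROKE_D.get(ch, ch))
    return "".join(out)


print(ascii_fold("Quốc Ngữ"), ascii_fold("thủy"), ascii_fold("đường"))
```

This direction is lossy by design: tiếng, tiền, and tiên all fold to the same ASCII string, which is why the ascii profile is reported as controlled loss.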
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "VI"
script_pref: ["VI_LATN","VI_Telex","VI_VNI","VI_ASCII"]
profiles:
  - id: "strict"
    orthography_profile: "VI_STD_2025"
  - id: "north"
    orthography_profile: "VI_STD_2025"
    dialect: "HN"
  - id: "south"
    orthography_profile: "VI_STD_2025"
    dialect: "HCM"
  - id: "telex"
    orthography_profile: "VI_TELEX_2025"
  - id: "vni"
    orthography_profile: "VI_VNI_2025"
  - id: "ascii"
    orthography_profile: "VI_ASCII_2025"
inventory:
  tones:
    - {id: "T0", name: "ngang", diacritic: null, telex: "", vni: "0"}
    - {id: "T1", name: "sắc", diacritic: "´", telex: "s", vni: "1"}
    - {id: "T2", name: "huyền", diacritic: "`", telex: "f", vni: "2"}
    - {id: "T3", name: "hỏi", diacritic: "\u0309", telex: "r", vni: "3"}  # combining hook above
    - {id: "T4", name: "ngã", diacritic: "˜", telex: "x", vni: "4"}
    - {id: "T5", name: "nặng", diacritic: ".", telex: "j", vni: "5"}
  vowels:
    - {base: "a", quality: "a"}
    - {base: "ă", quality: "ă", telex: "aw", vni: "a8"}
    - {base: "â", quality: "â", telex: "aa", vni: "a6"}
    - {base: "e", quality: "e"}
    - {base: "ê", quality: "ê", telex: "ee", vni: "e6"}
    - {base: "i", quality: "i"}
    - {base: "o", quality: "o"}
    - {base: "ô", quality: "ô", telex: "oo", vni: "o6"}
    - {base: "ơ", quality: "ơ", telex: "ow", vni: "o7"}
    - {base: "u", quality: "u"}
    - {base: "ư", quality: "ư", telex: "uw", vni: "u7"}
    - {base: "y", quality: "iY"}
  specials:
    - {letter: "đ", telex: "dd", vni: "d9", ascii: "d"}
operators:
  - {name: "tone_place", fn: "place_tone_on_nuclear_vowel"}
  - {name: "trigraph_rules", fn: "iê/ươ/uô selection & open-syllable reduction"}
  - {name: "dialect_map", fn: "onset d/gi/r → phones(HN|HCM)"}
  - {name: "normalize_nfc", fn: "compose_diacritics"}
policies:
  ascii:
    mode: "smart"
  dialect:
    default: "HN"
lossiness:
  strict_to_ascii: "controlled"
  telex_vni: "none"
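To show how the vni columns of the skeleton could drive a decoder, here is a hedged Python sketch. The tables are copied by hand from the inventory rather than loaded from YAML, the function name `vni_to_strict` is hypothetical, and it assumes lowercase input where each digit is typed immediately after the letter it modifies.

```python
import unicodedata

# (base letter, digit) → letter with vowel quality, per the vni column above.
VNI_QUALITY = {
    ("a", "6"): "\u00e2", ("a", "8"): "\u0103", ("e", "6"): "\u00ea",
    ("o", "6"): "\u00f4", ("o", "7"): "\u01a1", ("u", "7"): "\u01b0",
    ("d", "9"): "\u0111",
}
# Tone digit → combining mark (sắc, huyền, hỏi, ngã, nặng).
VNI_TONE = {"1": "\u0301", "2": "\u0300", "3": "\u0309", "4": "\u0303", "5": "\u0323"}


def vni_to_strict(keys: str) -> str:
    """Decode VNI keystrokes to NFC Quốc Ngữ (sketch, see assumptions above)."""
    out: list[str] = []
    for k in keys:
        if k.isdigit() and out:
            if (out[-1], k) in VNI_QUALITY:
                out[-1] = VNI_QUALITY[(out[-1], k)]  # e + 6 → ê
            elif k in VNI_TONE:
                out[-1] += VNI_TONE[k]               # ê + 1 → ế after NFC
            continue  # other digits (including 0 for ngang) add no mark
        out.append(k)
    return unicodedata.normalize("NFC", "".join(out))


print(vni_to_strict("tie61ng"), vni_to_strict("d9u7o72ng"))
```

A production decoder would also accept tone digits typed at the end of the syllable and relocate them with the tone placement rule; that step is omitted here.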
Unit Test Fixtures
tests:
  - id: "VI_001_tieng"
    in_phonemes: "/t iə ŋ/ + T1" # sắc
    profile: "strict"
    expect_graphemes: "tiếng" # tone on ê
    roundtrip_ok: true
  - id: "VI_002_suong"
    in_phonemes: "/s ɯə ŋ/ + T1"
    profile: "strict"
    expect_graphemes: "sướng" # ươ + sắc on ơ
  - id: "VI_003_quoc"
    in_phonemes: "/kw ok/ + T1"
    profile: "strict"
    expect_graphemes: "quốc" # ô + sắc
  - id: "VI_004_hoa"
    in_phonemes: "/h w a/ + T3"
    profile: "strict"
    expect_graphemes: "hoả" # hỏi on a in oa
  - id: "VI_005_telex_roundtrip"
    in_graphemes: "tieng"
    profile: "telex"
    add: {diacritics: "ieesng"} # user types "tie^'ng" → "tiếng"
    expect: "tiếng"
  - id: "VI_006_vni_roundtrip"
    in_vni: "tie61ng" # e6 = ê, 1 = sắc
    profile: "vni"
    expect: "tiếng"
  - id: "VI_007_ascii_fold"
    in_graphemes: "thủy"
    profile: "ascii"
    expect: "thuy"
  - id: "VI_008_south_collapse"
    in_graphemes: "rỗi"
    profile: "south"
    expect_phonemes: "/ɹ~j oi/ + T(merged_3_4)" # orthography unchanged
Worked Micro-Examples
- /t iə ŋ/ + sắc → tiếng (tone sits on ê).
- /s ɯə ŋ/ + huyền → sường (didactic); with sắc → sướng.
- /ɯə/ (open) + nặng → ựa (tone on ư); closed syllable vượng keeps ươ with the tone on ơ.
- ASCII/IME: tiếng → TELEX tieesng or VNI tie61ng → back to tiếng losslessly.
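The TELEX leg of the round trip can be sketched in Python as follows. The function name `strict_to_telex` is hypothetical, input is assumed to be lowercase, and the tone letter is emitted right after the vowel it marks (one of several keystroke orders that TELEX input methods accept).

```python
import unicodedata

# Combining tone mark → TELEX tone letter.
TELEX_TONE = {"\u0301": "s", "\u0300": "f", "\u0309": "r", "\u0303": "x", "\u0323": "j"}
# Quality-marked letter → TELEX digraph.
TELEX_QUALITY = {
    "\u0103": "aw", "\u00e2": "aa", "\u00ea": "ee", "\u00f4": "oo",
    "\u01a1": "ow", "\u01b0": "uw", "\u0111": "dd",
}


def strict_to_telex(text: str) -> str:
    """Encode NFC Quốc Ngữ as TELEX keystrokes (sketch, see assumptions above)."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        tone = ""
        rest = []
        for part in unicodedata.normalize("NFD", ch):
            if part in TELEX_TONE:
                tone = TELEX_TONE[part]  # peel the tone off the letter
            else:
                rest.append(part)
        base = unicodedata.normalize("NFC", "".join(rest))
        out.append(TELEX_QUALITY.get(base, base) + tone)
    return "".join(out)


print(strict_to_telex("tiếng"))
```

Because both the quality digraph and the tone letter are kept, no information is dropped, which is what makes the telex profile reversible to strict.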
Operational Knobs
- pgm.profile=strict|north|south|telex|vni|ascii
- pgm.dialect=HN|HCM (overrides north/south defaults)
- pgm.ascii.mode=smart|flat
- pgm.normalize=NFC|NFD
- pgm.lossiness_report=true, pgm.audit_trace=true
✅ PGM-07 (Vietnamese) is MINTED and registered in the Master Cross-Lattice Index.