MINTED
Purpose
Deterministic, round-trip mapping between phones/phonemes and Hangul with full handling of jamo synthesis, batchim (finals), liaison (연음), tensification (경음화), nasal assimilation, palatalization, aspiration, and spacing. Loss-controlled romanization (RR, MR, Yale) included.
Identity
pgm::v1.0::KR_Hangul::<profile>
Orthography Profiles
- KR_STD: Modern Hangul, standard spacing and orthographic forms (국립국어원, National Institute of Korean Language, norms).
- KR_PHON: Phonemic surface (applies liaison/tensification/assimilation in phones, keeps orthography standard).
- KR_RR: Revised Romanization (default).
- KR_YALE: Yale (phoneme-faithful; training).
- KR_MR: McCune–Reischauer (legacy).
- KR_ASCII: ASCII fallback (controlled loss).
Lossiness
- Hangul (KR_STD/PHON): none.
- Romanizations (RR/YALE/MR): none (to phoneme layer); spacing may require a lexicon for perfect rejoin.
- ASCII: controlled (diacritics absent; tense/aspirate marked by h/x hints).
Script Mechanics (Jamo & Blocks)
- Jamo sets:
  - Choseong (initials): 19 (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ)
  - Jungseong (vowels): 21 (ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ)
  - Jongseong (finals): 27 + null (ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ)
- Syllable composition: (C)V(C) → one block via the U+AC00 algorithm.
- Batchim: surface coda set collapses to seven basic finals /p t k m n ŋ l/ in pronunciation; orthography preserves complex finals (e.g., 읽다 /익따/).
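The block arithmetic behind the U+AC00 algorithm can be sketched as follows. This is a minimal illustration, not the engine's code: the names `compose_block` and `decompose_block` echo the operator names in the YAML skeleton, but their signatures here are assumptions, and the jamo are taken as compatibility jamo in standard Unicode order.

```python
# Jamo tables in Unicode syllable-composition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initials
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # null + 27 finals

S_BASE = 0xAC00                 # code point of 가
S_COUNT = 19 * 21 * 28          # 11172 precomposed syllables


def compose_block(initial, vowel, final=""):
    """Pack (initial, vowel, optional final) into one precomposed syllable."""
    l = CHOSEONG.index(initial)
    v = JUNGSEONG.index(vowel)
    t = JONGSEONG.index(final)
    return chr(S_BASE + (l * 21 + v) * 28 + t)


def decompose_block(syllable):
    """Split one precomposed syllable into (initial, vowel, final)."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    l, rest = divmod(offset, 21 * 28)
    v, t = divmod(rest, 28)
    return CHOSEONG[l], JUNGSEONG[v], JONGSEONG[t]


print(compose_block("ㅎ", "ㅏ", "ㄴ"))   # 한
print(decompose_block("읽"))            # ('ㅇ', 'ㅣ', 'ㄺ')
```

A vowel-initial syllable is written with the null onset ㅇ, so the initial is never omitted at this layer.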
Phoneme Inventory (MCLI-linked)
Vowels: monophthongs /a ɛ e ʌ o u ɯ i/; diphthongs /ja jɛ jʌ je jo ju wa wɛ wʌ we we̞ wi ɰi/.
Consonants:
- Lenis: /p t k s t͡ɕ/
- Aspirated: /pʰ tʰ kʰ t͡ɕʰ/ (ㅅ has no aspirated counterpart)
- Tense (fortis): /p͈ t͈ k͈ s͈ t͡ɕ͈/
- Sonorants: /m n ŋ l~ɾ/
- Glides: /j w/.
- H: /h/ with sandhi effects.
Sound Rules (surface from orthography)
- Liaison (연음): Coda consonant moves to next syllable onset when following syllable starts with vowel: 밖에 → [바께] (orthography kept; phones reflect).
- Coda neutralization: finals realized as {p t k m n ŋ l}; complex finals feed liaison and tensification.
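- Tensification (경음화): a lenis onset becomes tense after an obstruent coda and after certain suffixes/particles, e.g., 갑자기 [갑짜기]. 국물 [궁물] shows no tensification because its onset ㅁ is a sonorant; it undergoes nasal assimilation instead. The rule table is provided in the engine.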
- Nasal assimilation: /p t k/ before /m n/ → [m n ŋ]; e.g., 국물 /kuk.mul/ → [궁물].
- L-R alternation: ㄹ + ㄴ across morpheme boundary → [ll]; e.g., 설날 [설랄].
- Palatalization: coda ㄷ/ㅌ before a suffix beginning with 이 (or 히) surfaces as [t͡ɕ]/[t͡ɕʰ], e.g., 같이 [가치], 굳이 [구지]; orthography keeps ㄷ/ㅌ.
- Aspiration spread: coda ㅎ or ㅎ-bearing clusters trigger aspiration of following lenis onset: 좋다 [조타], 놓고 [노코].
- Realization of 의: pronounced /ɰi/ word-initially; may surface as [i] in non-initial syllables and as [e] when it is the genitive particle; orthography retained.
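Two of the rules above, nasal assimilation and the ㄹ+ㄴ alternation, can be sketched as rewrites over jamo triples. This is an illustrative sketch only: the function names and the `(initial, vowel, final)` tuple representation are assumptions, codas are assumed to be already neutralized, and morpheme-boundary conditions are not checked.

```python
# Neutralized obstruent coda -> homorganic nasal (assumed representation).
NASAL_OF = {"ㄱ": "ㅇ", "ㄷ": "ㄴ", "ㅂ": "ㅁ"}


def nasal_assimilate(syllables):
    """Coda /k t p/ before onset ㄴ or ㅁ becomes [ŋ n m]."""
    out = [list(s) for s in syllables]
    for cur, nxt in zip(out, out[1:]):
        if cur[2] in NASAL_OF and nxt[0] in ("ㄴ", "ㅁ"):
            cur[2] = NASAL_OF[cur[2]]
    return [tuple(s) for s in out]


def lateralize(syllables):
    """Coda ㄹ followed by onset ㄴ yields [ll]: the onset becomes ㄹ."""
    out = [list(s) for s in syllables]
    for cur, nxt in zip(out, out[1:]):
        if cur[2] == "ㄹ" and nxt[0] == "ㄴ":
            nxt[0] = "ㄹ"
    return [tuple(s) for s in out]


gungmul = nasal_assimilate([("ㄱ", "ㅜ", "ㄱ"), ("ㅁ", "ㅜ", "ㄹ")])   # 국물
seollal = lateralize([("ㅅ", "ㅓ", "ㄹ"), ("ㄴ", "ㅏ", "ㄹ")])         # 설날
print(gungmul)   # coda of the first syllable is now ㅇ, i.e. [궁물]
print(seollal)   # onset of the second syllable is now ㄹ, i.e. [설랄]
```

Both functions return new lists and leave the input untouched, so the underlying form stays available for round-trip.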
Mapping Logic
Phones → Graphemes
- Morph layer (optional): choose stems/suffix allomorphs; mark tense/aspiration features.
- Syllabify: pack (C)V(C) into blocks; if the surface phones reflect sandhi such as nasal assimilation, write the underlying orthography (국+물, not 궁+물).
- Choose jamo: lenis/aspirate/tense mapped to ㄱ/ㅋ/ㄲ, ㄷ/ㅌ/ㄸ, ㅂ/ㅍ/ㅃ, ㅈ/ㅊ/ㅉ, ㅅ/ㅆ.
- Vowels: map phones to Jungseong; ㅢ rules applied by position tag.
- Complex finals: if morphologically specified, write lexically correct batchim (읽-/없-), else write simple final by default policy.
Graphemes → Phones
- Decompose into jamo; read initial/medial/final.
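- Apply coda neutralization to /p t k m n ŋ l/ for codas that are not resyllabified by liaison; liaison itself takes the underlying final (밖에 → [바께], not [바게]).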
- Apply sandhi chain in order: Liaison → Aspiration spread (ㅎ) → Tensification → Nasal assimilation → L-R alternation → Palatalization.
- Output phones.surface and phonemes.underlying (for round-trip).
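The first part of this chain can be sketched end to end. The code below is a simplified illustration under several assumptions: the name `to_surface` and all tables are hypothetical, the surface form is returned as Hangul rather than IPA, coda neutralization is placed after liaison so that the underlying final is what moves, lexical exceptions (e.g., 밟-) and morpheme boundaries are ignored, and the later steps (nasal assimilation, L-R alternation, palatalization) are left out.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Cluster finals under liaison: (part that stays as coda, part that moves).
SPLIT = {"ㄳ": ("ㄱ", "ㅅ"), "ㄵ": ("ㄴ", "ㅈ"), "ㄶ": ("ㄴ", "ㅎ"), "ㄺ": ("ㄹ", "ㄱ"),
         "ㄻ": ("ㄹ", "ㅁ"), "ㄼ": ("ㄹ", "ㅂ"), "ㄽ": ("ㄹ", "ㅅ"), "ㄾ": ("ㄹ", "ㅌ"),
         "ㄿ": ("ㄹ", "ㅍ"), "ㅀ": ("ㄹ", "ㅎ"), "ㅄ": ("ㅂ", "ㅅ")}
# Default neutralization of every final to one of the seven basic codas.
GROUPS = {"ㄱ": "ㄱㄲㅋㄳㄺ", "ㄴ": "ㄴㄵㄶ", "ㄷ": "ㄷㅅㅆㅈㅊㅌㅎ",
          "ㄹ": "ㄹㄼㄽㄾㅀ", "ㅁ": "ㅁㄻ", "ㅂ": "ㅂㅍㅄㄿ", "ㅇ": "ㅇ"}
NEUTRAL = {m: coda for coda, members in GROUPS.items() for m in members}
H_CODA = {"ㅎ": "", "ㄶ": "ㄴ", "ㅀ": "ㄹ"}      # what remains once ㅎ is absorbed
ASPIRATE = {"ㄱ": "ㅋ", "ㄷ": "ㅌ", "ㅈ": "ㅊ"}
TENSE = {"ㄱ": "ㄲ", "ㄷ": "ㄸ", "ㅂ": "ㅃ", "ㅅ": "ㅆ", "ㅈ": "ㅉ"}


def decompose(word):
    """Split each precomposed syllable into a mutable [initial, vowel, final]."""
    syls = []
    for ch in word:
        l, rest = divmod(ord(ch) - 0xAC00, 588)
        v, t = divmod(rest, 28)
        syls.append([CHO[l], JUNG[v], JONG[t]])
    return syls


def compose(syls):
    return "".join(chr(0xAC00 + (CHO.index(l) * 21 + JUNG.index(v)) * 28 + JONG.index(t))
                   for l, v, t in syls)


def to_surface(word):
    """Liaison -> aspiration spread -> coda neutralization -> tensification."""
    syls = decompose(word)
    pairs = list(zip(syls, syls[1:]))
    for cur, nxt in pairs:                      # 1. liaison
        if nxt[0] == "ㅇ" and cur[2] not in ("", "ㅇ"):
            stay, move = SPLIT.get(cur[2], ("", cur[2]))
            cur[2] = stay
            if move != "ㅎ":                    # ㅎ is silent before a vowel
                nxt[0] = move
    for cur, nxt in pairs:                      # 2. aspiration spread
        if cur[2] in H_CODA and nxt[0] in ASPIRATE:
            nxt[0] = ASPIRATE[nxt[0]]
            cur[2] = H_CODA[cur[2]]
    for syl in syls:                            # 3. coda neutralization
        syl[2] = NEUTRAL.get(syl[2], "")
    for cur, nxt in pairs:                      # 4. tensification
        if cur[2] in {"ㄱ", "ㄷ", "ㅂ"} and nxt[0] in TENSE:
            nxt[0] = TENSE[nxt[0]]
    return compose(syls)


for w in ["밖에", "좋다", "읽다", "갑자기", "한국어"]:
    print(w, "→", to_surface(w))
```

Keeping the untouched input string alongside the returned surface string corresponds to emitting both the underlying and the surface layer.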
Edge Policies
- Spacing: spacing=std|morph (std follows orthography; morph joins compounds pragmatically for audit).
- Loanwords: Q/W/X rendered via ㅋ/ㅜ/ㄱ+ㅅ as per common patterns (e.g., 박스 for "box"); optional RR back-map for names.
- Particle alternations: 이/가, 은/는, 을/를 chosen by vowel/consonant environment in synthesis mode.
- Dialect toggle (optional): KR_SE for Seoul default; placeholders for Gyeongsang/Jeju (future).
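The particle alternation reduces to checking whether the last syllable of the lemma carries a final. A minimal sketch, assuming a hypothetical helper `choose_particle` that accepts the pair in either order (the fixture below writes it as "가/이"):

```python
# Allomorphs used after a consonant-final (batchim-bearing) lemma.
AFTER_CONSONANT = {"이", "은", "을"}


def has_batchim(word):
    """True if the last syllable of `word` ends in a final consonant."""
    offset = ord(word[-1]) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError(f"last character is not a Hangul syllable: {word[-1]!r}")
    return offset % 28 != 0


def choose_particle(lemma, pair):
    """Attach the right allomorph from a pair such as '이/가' or '가/이'."""
    a, b = pair.split("/")
    with_coda, without_coda = (a, b) if a in AFTER_CONSONANT else (b, a)
    return lemma + (with_coda if has_batchim(lemma) else without_coda)


print(choose_particle("책", "가/이"))     # 책이
print(choose_particle("나무", "이/가"))   # 나무가
```

Pairs with extra conditions, such as (으)로 after ㄹ-final stems, would need more than this coda check.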
Romanization Policies
- RR (default): ㅇ null onset dropped; ㅓ→eo, ㅡ→eu, ㅢ→ui/yi; finals mapped by coda value; tensification not overtly marked.
- Yale: tense marked by doubling (pp tt kk ss cc), ㅡ→u, ㅓ→e; preserves morphophonemics.
- MR: uses breves/apostrophes; optional.
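The RR policy can be sketched as table lookups applied after liaison. This is a partial illustration, not the engine's romanizer: `romanize_rr` and its tables are hypothetical names, ㅢ is always written ui, only simple finals take part in liaison, other sandhi (e.g., nasal assimilation) is not reflected, and output is lowercase.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# RR spellings, index-aligned with the jamo tables above.
RR_ONSET = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
RR_VOWEL = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
            "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
RR_CODA = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
           "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]


def romanize_rr(text):
    """Romanize Hangul syllables; other characters pass through unchanged."""
    syls = []
    for ch in text:
        off = ord(ch) - 0xAC00
        if 0 <= off < 11172:
            l, rest = divmod(off, 588)
            v, t = divmod(rest, 28)
            syls.append([CHO[l], JUNG[v], JONG[t]])
        else:
            syls.append(ch)
    # Liaison: a simple final moves into a following null onset.
    for cur, nxt in zip(syls, syls[1:]):
        if isinstance(cur, str) or isinstance(nxt, str):
            continue
        coda = cur[2]
        if nxt[0] == "ㅇ" and coda and coda != "ㅇ" and coda in CHO:
            nxt[0] = "ㅇ" if coda == "ㅎ" else coda   # ㅎ is dropped
            cur[2] = ""
    out, prev_coda = [], ""
    for syl in syls:
        if isinstance(syl, str):
            out.append(syl)
            prev_coda = ""
            continue
        l, v, t = syl
        # ㄹ after a ㄹ coda is written l, giving "ll".
        onset = "l" if (l == "ㄹ" and prev_coda == "ㄹ") else RR_ONSET[CHO.index(l)]
        out.append(onset + RR_VOWEL[JUNG.index(v)] + RR_CODA[JONG.index(t)])
        prev_coda = t
    return "".join(out)


print(romanize_rr("한국어").capitalize())   # Hangugeo
print(romanize_rr("서울").capitalize())     # Seoul
```

Because the coda table maps several finals to the same letter, mapping back to Hangul from this output alone would need the underlying layer or a lexicon.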
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "KO"
script_pref: ["KR_Hangul","KR_RR","KR_YALE","KR_MR","KR_ASCII"]
profiles:
  - id: "KR_STD"
    ruleset: "NIKL_2025"
  - id: "KR_PHON"
    ruleset: "NIKL_2025"
    surface: true
  - id: "KR_RR"
    romanization: "RR_2000"
  - id: "KR_YALE"
    romanization: "Yale"
  - id: "KR_MR"
    romanization: "MR"
  - id: "KR_ASCII"
    romanization: "ASCII_FLAT"
inventory:
  jamo:
    choseong: ["ㄱ","ㄲ","ㄴ","ㄷ","ㄸ","ㄹ","ㅁ","ㅂ","ㅃ","ㅅ","ㅆ","ㅇ","ㅈ","ㅉ","ㅊ","ㅋ","ㅌ","ㅍ","ㅎ"]
    jungseong: ["ㅏ","ㅐ","ㅑ","ㅒ","ㅓ","ㅔ","ㅕ","ㅖ","ㅗ","ㅘ","ㅙ","ㅚ","ㅛ","ㅜ","ㅝ","ㅞ","ㅟ","ㅠ","ㅡ","ㅢ","ㅣ"]
    jongseong: ["∅","ㄱ","ㄲ","ㄳ","ㄴ","ㄵ","ㄶ","ㄷ","ㄹ","ㄺ","ㄻ","ㄼ","ㄽ","ㄾ","ㄿ","ㅀ","ㅁ","ㅂ","ㅄ","ㅅ","ㅆ","ㅇ","ㅈ","ㅊ","ㅋ","ㅌ","ㅍ","ㅎ"]
operators:
  - {name: "compose_block", fn: "jamo_to_syllable"}
  - {name: "decompose_block", fn: "syllable_to_jamo"}
  - {name: "liaison", fn: "move_final_to_next_onset_if_vowel_initial"}
  - {name: "coda_neutralize", fn: "map_complex_coda_to_{p,t,k,m,n,ŋ,l}"}
  - {name: "tensify", fn: "lenis→tense after eligible codas/particles"}
  - {name: "nasal_assim", fn: "ptk→m n ŋ before m/n"}
  - {name: "aspiration_spread", fn: "ㅎ triggers following onset aspiration"}
  - {name: "palatalize", fn: "t/d+s/i/j contexts → t͡ɕ/ɕ allophones"}
policies:
  spacing: {mode: "std"}
  romanization:
    rr: {ui_rule: "positional"}
    yale: {tense_doubling: true}
    ascii: {hint_tense: "x", hint_aspirate: "h"}
  lossiness:
    to_ascii: "controlled"
Unit Test Fixtures
tests:
  - id: "KO_001_hanguk"
    in_graphemes: "한국어"
    profile: "KR_RR"
    expect: "Hangugeo"
    phones_surface: "[한구거]"
  - id: "KO_002_goodbye"
    in_graphemes: "안녕"
    profile: "KR_PHON"
    expect_phones: "/a n n j ʌ ŋ/" # ㄴ may surface as palatalized [ɲ] before /j/; surface [안녕]
  - id: "KO_003_read"
    in_graphemes: "읽다"
    profile: "KR_PHON"
    expect_phones: "[익따]" # ㄺ coda→[k], tensification of ㄷ→ㄸ
  - id: "KO_004_liason"
    in_graphemes: "밖에"
    profile: "KR_PHON"
    expect_phones: "[바께]"
  - id: "KO_005_aspiration"
    in_graphemes: "좋다"
    profile: "KR_PHON"
    expect_phones: "[조타]"
  - id: "KO_006_rr_yale"
    in_graphemes: "서울"
    profile: "KR_RR"
    expect: "Seoul"
    also:
      profile: "KR_YALE"
      expect: "sewul"
  - id: "KO_007_particles"
    synth:
      lemma: "책"
      particle: "가/이"
    expect: "책이" # coda present → 이
Worked Micro-Examples
- 국물 → phones [궁물] (nasal assimilation /k.m/ → [ŋm]), orthography stable.
- 갑자기 → [갑짜기] (tensification of ㅈ).
- 설날 → [설랄] (ㄹ+ㄴ → ll).
- 의사: word-initial 의 → /ɰi/; RR uisa (contextual ui rule).
Operational Knobs
- pgm.profile=KR_STD|KR_PHON|KR_RR|KR_YALE|KR_MR|KR_ASCII
- pgm.spacing=std|morph
- pgm.romanization.ui_rule=positional|ui|yi
- pgm.lossiness_report=true, pgm.audit_trace=true