PGM-08 — Korean (KR_Hangul)

MINTED

Purpose

Deterministic, round-trip mapping between phones/phonemes and Hangul with full handling of jamo synthesis, batchim (finals), liaison (연음), tensification (경음화), nasal assimilation, palatalization, aspiration, and spacing. Loss-controlled romanization (RR, MR, Yale) included.

Identity

pgm::v1.0::KR_Hangul::<profile>

Orthography Profiles

  • KR_STD — Modern Hangul, standard spacing & orthographic forms (국립국어원 norms).
  • KR_PHON — Phonetic surface (applies liaison, tensification, and assimilation to the phone layer; orthography stays standard).
  • KR_RR — Revised Romanization (default).
  • KR_YALE — Yale (phoneme-faithful; training).
  • KR_MR — McCune–Reischauer (legacy).
  • KR_ASCII — ASCII fallback (controlled loss).

Lossiness

  • Hangul (KR_STD/PHON): none.
  • Romanizations (RR/YALE/MR): none (to phoneme layer); spacing may require a lexicon for perfect rejoin.
  • ASCII: controlled (diacritics absent; tense/aspirate marked by h/x hints).

Script Mechanics (Jamo & Blocks)

  • Jamo sets:
    Choseong (initials): 19 (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ)
    Jungseong (vowels): 21 (ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ)
    Jongseong (finals): 27 + null (ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ)
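  • Syllable composition: (C)V(C) → one block via the U+AC00 algorithm: code point = 0xAC00 + (initial × 21 + vowel) × 28 + final. A vowel-initial syllable is written with the placeholder initial ㅇ.
  • Batchim: the surface coda set collapses to seven basic finals /p t k m n ŋ l/ in pronunciation; orthography preserves complex finals (e.g., 읽다 [익따]).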
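
The block arithmetic above can be sketched in a few lines of Python. The function names compose_block and decompose_block mirror the operator names in the YAML skeleton, but this body is an illustrative assumption, not the engine's implementation. It uses compatibility jamo (U+3131 onward) as table entries.

```python
# Minimal sketch of Hangul syllable arithmetic (illustrative, not the engine's code).
S_BASE = 0xAC00               # first precomposed syllable, 가
N_VOWELS, N_FINALS = 21, 28   # 28 = 27 finals + the null final

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose_block(initial, vowel, final=""):
    """Pack initial + vowel (+ final) jamo into one precomposed syllable."""
    l = CHOSEONG.index(initial)
    v = JUNGSEONG.index(vowel)
    t = JONGSEONG.index(final)
    return chr(S_BASE + (l * N_VOWELS + v) * N_FINALS + t)


def decompose_block(syllable):
    """Split one precomposed syllable into (initial, vowel, final) jamo."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < len(CHOSEONG) * N_VOWELS * N_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    l, rest = divmod(offset, N_VOWELS * N_FINALS)
    v, t = divmod(rest, N_FINALS)
    return CHOSEONG[l], JUNGSEONG[v], JONGSEONG[t]
```

For example, decompose_block("읽") yields the initial ㅇ, the vowel ㅣ, and the complex final ㄺ, and compose_block reverses it.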

Phoneme Inventory (MCLI-linked)

Vowels (monophthongs/diphthongs): /a ɛ e ʌ o u ɯ i/; /ja jɛ jʌ je jo ju wa wɛ wʌ we wi ɰi/.
Consonants:

  • Lenis: /p t k s t͡ɕ/
  • Aspirated: /pʰ tʰ kʰ t͡ɕʰ/ (ㅅ has no aspirated counterpart)
  • Tense (fortis): /p͈ t͈ k͈ s͈ t͡ɕ͈/
  • Sonorants: /m n ŋ l~ɾ/
  • Glides: /j w/
  • H: /h/ with sandhi effects.

Sound Rules (surface from orthography)

  • Liaison (연음): Coda consonant moves to next syllable onset when following syllable starts with vowel: 밖에 → [바께] (orthography kept; phones reflect).
  • Coda neutralization: finals realized as {p t k m n ŋ l}; complex finals feed liaison and tensification.
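  • Tensification (경음화): a lenis onset becomes tense after an obstruent coda and after certain particles and suffixes, e.g., 갑자기 [갑짜기]. A sonorant onset is not tensified: 국물 surfaces as [궁물] through nasal assimilation instead. The full rule table is provided in the engine.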
  • Nasal assimilation: /p t k/ before /m n/ → [m n ŋ]; e.g., 국물 /kuk.mul/ → [궁물].
  • L-R alternation: ㄹ + ㄴ (and typically ㄴ + ㄹ) across a morpheme boundary → [ll]; e.g., 설날 [설랄].
  • Palatalization: coda ㄷ/ㅌ before a suffix or particle beginning with 이 (or 히) surfaces as [t͡ɕ]/[t͡ɕʰ]; e.g., 굳이 [구지], 같이 [가치]. Orthography keeps ㄷ/ㅌ.
  • Aspiration spread: coda ㅎ or ㅎ-bearing clusters trigger aspiration of following lenis onset: 좋다 [조타], 놓고 [노코].
  • ㅢ realization: 의 is pronounced /ɰi/ word-initially, may surface as [i] in non-initial syllables, and as [e] when it is the possessive particle; orthography retained.
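
The segment-level rules above (coda neutralization, tensification, nasal assimilation, L-R alternation) can be sketched as one pass over adjacent syllables. This is a simplified illustration under stated assumptions: input is a single word of precomposed Hangul, vowel-initial following syllables are skipped (they belong to liaison), the neutralization table picks one default reading per complex final (lexical exceptions such as 밟- are ignored), and only obstruent-triggered tensification is covered. The names surface, NEUTRAL, TENSE, and NASAL are hypothetical.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Each written final -> one of the seven surface codas (default reading only).
NEUTRAL = {src: tgt for tgt, srcs in {
    "ㄱ": "ㄱㄲㅋㄳㄺ", "ㄴ": "ㄴㄵㄶ", "ㄷ": "ㄷㅅㅆㅈㅊㅌㅎ",
    "ㄹ": "ㄹㄼㄽㄾㅀ", "ㅁ": "ㅁㄻ", "ㅂ": "ㅂㅍㅄㄿ", "ㅇ": "ㅇ",
}.items() for src in srcs}
TENSE = dict(zip("ㄱㄷㅂㅅㅈ", "ㄲㄸㅃㅆㅉ"))   # lenis onset -> tense onset
NASAL = dict(zip("ㄱㄷㅂ", "ㅇㄴㅁ"))            # obstruent coda -> nasal coda


def decompose(syllable):
    l, rest = divmod(ord(syllable) - 0xAC00, 21 * 28)
    v, t = divmod(rest, 28)
    return [CHO[l], JUNG[v], JONG[t]]


def compose(l, v, t):
    return chr(0xAC00 + (CHO.index(l) * 21 + JUNG.index(v)) * 28 + JONG.index(t))


def surface(word):
    """Return a Hangul respelling of the surface form of one word."""
    syls = [decompose(ch) for ch in word]
    for i, cur in enumerate(syls):
        nxt = syls[i + 1] if i + 1 < len(syls) else None
        if not cur[2] or (nxt is not None and nxt[0] == "ㅇ"):
            continue  # no coda, or next syllable is vowel-initial (liaison's job)
        cur[2] = NEUTRAL[cur[2]]                      # coda neutralization
        if nxt is None:
            continue
        if cur[2] in "ㄱㄷㅂ" and nxt[0] in TENSE:     # tensification
            nxt[0] = TENSE[nxt[0]]
        elif cur[2] in NASAL and nxt[0] in "ㄴㅁ":     # nasal assimilation
            cur[2] = NASAL[cur[2]]
        elif {cur[2], nxt[0]} == {"ㄹ", "ㄴ"}:          # L-R alternation
            cur[2] = nxt[0] = "ㄹ"
    return "".join(compose(*s) for s in syls)
```

With these tables, surface("국물") gives 궁물, surface("갑자기") gives 갑짜기, and surface("설날") gives 설랄, matching the examples in the rule list.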

Mapping Logic

Phones → Graphemes

  1. Morph layer (optional): choose stems/suffix allomorphs; mark tense/aspiration features.
  2. Syllabify: pack (C)V(C) to blocks; if the surface phones reflect sandhi (liaison, assimilation), write the underlying orthography (국+물 not 궁+물).
  3. Choose jamo: lenis/aspirate/tense mapped to ㄱ/ㅋ/ㄲ, ㄷ/ㅌ/ㄸ, ㅂ/ㅍ/ㅃ, ㅈ/ㅊ/ㅉ, ㅅ/ㅆ.
  4. Vowels: map phones to Jungseong; ㅢ rules applied by position tag.
  5. Complex finals: if morphologically specified, write lexically correct batchim (읽-/없-), else write simple final by default policy.

Graphemes → Phones

  1. Decompose into jamo; read initial/medial/final.
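  2. Apply coda neutralization to /p t k m n ŋ l/ for codas that stay in coda position; a coda that liaison or aspiration spread will consume keeps its underlying value (밖에 → [바께], not [바게]).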
  3. Apply sandhi chain in order:
    • Liaison → Aspiration spread (ㅎ) → Tensification → Nasal assimilation → L-R alternation → Palatalization.
  4. Output phones.surface and phonemes.underlying (for round-trip).
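
The ordering matters, which a small operator chain makes visible. The sketch below covers only liaison, aspiration spread from coda ㅎ, and neutralization of whatever codas remain; tensification, nasal assimilation, L-R alternation, and palatalization would be further functions appended to the same list. Note that it neutralizes only codas left in place after liaison and aspiration spread, since neutralizing first would turn 밖에 into [바게]. All function and table names are assumptions for illustration, and input is assumed to be one word of precomposed Hangul.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Complex final -> (part that stays in the coda, part that can move).
CLUSTER = {"ㄳ": "ㄱㅅ", "ㄵ": "ㄴㅈ", "ㄶ": "ㄴㅎ", "ㄺ": "ㄹㄱ", "ㄻ": "ㄹㅁ", "ㄼ": "ㄹㅂ",
           "ㄽ": "ㄹㅅ", "ㄾ": "ㄹㅌ", "ㄿ": "ㄹㅍ", "ㅀ": "ㄹㅎ", "ㅄ": "ㅂㅅ"}
H_CODA = {"ㅎ": "", "ㄶ": "ㄴ", "ㅀ": "ㄹ"}      # what remains once ㅎ is consumed
ASPIRATE = dict(zip("ㄱㄷㅂㅈ", "ㅋㅌㅍㅊ"))
NEUTRAL = {src: tgt for tgt, srcs in {
    "ㄱ": "ㄱㄲㅋㄳㄺ", "ㄴ": "ㄴㄵㄶ", "ㄷ": "ㄷㅅㅆㅈㅊㅌㅎ",
    "ㄹ": "ㄹㄼㄽㄾㅀ", "ㅁ": "ㅁㄻ", "ㅂ": "ㅂㅍㅄㄿ", "ㅇ": "ㅇ",
}.items() for src in srcs}


def decompose(syllable):
    l, rest = divmod(ord(syllable) - 0xAC00, 21 * 28)
    v, t = divmod(rest, 28)
    return [CHO[l], JUNG[v], JONG[t]]


def compose(l, v, t):
    return chr(0xAC00 + (CHO.index(l) * 21 + JUNG.index(v)) * 28 + JONG.index(t))


def liaison(syls):
    for cur, nxt in zip(syls, syls[1:]):
        if nxt[0] != "ㅇ" or cur[2] in ("", "ㅇ"):
            continue                      # needs a vowel-initial next syllable; ㅇ never moves
        stay, move = CLUSTER.get(cur[2], ("", cur[2]))
        if move == "ㅎ":                  # ㅎ drops before a vowel; the rest moves instead
            stay, move = "", stay
        elif stay and move == "ㅅ":       # ㅅ from a cluster surfaces tense: 없어 → [업써]
            move = "ㅆ"
        cur[2] = stay
        if move:
            nxt[0] = move


def aspiration_spread(syls):
    for cur, nxt in zip(syls, syls[1:]):
        if cur[2] in H_CODA and nxt[0] in ASPIRATE:
            nxt[0] = ASPIRATE[nxt[0]]
            cur[2] = H_CODA[cur[2]]


def neutralize(syls):
    for syl in syls:
        if syl[2]:
            syl[2] = NEUTRAL[syl[2]]


SANDHI_CHAIN = [liaison, aspiration_spread, neutralize]  # order is significant


def to_surface(word):
    syls = [decompose(ch) for ch in word]
    for op in SANDHI_CHAIN:
        op(syls)                          # each operator mutates the syllable list in place
    return "".join(compose(*s) for s in syls)
```

Keeping the chain as a plain list makes the order auditable and lets a profile swap or extend operators without touching the driver.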

Edge Policies

  • Spacing: spacing=std|morph (std follows orthography; morph joins compounds pragmatically for audit).
  • Loanwords: Q/W/X rendered via ㅋ/ㅜ/ㄱ+ㅅ as per common patterns (X as /ks/, e.g., 엑스); optional RR back-map for names.
  • Particle alternations: 이/가, 은/는, 을/를 chosen by vowel/consonant environment in synthesis mode.
  • Dialect toggle (optional): KR_SE for Seoul default; placeholders for Gyeongsang/Jeju (future).
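
The particle alternation policy reduces to one check: whether the last syllable of the lemma has a final. A minimal sketch follows, with attach_particle and the PARTICLES table as hypothetical names. It handles only the three pairs listed above and assumes the lemma ends in a precomposed Hangul syllable.

```python
# Each pair: (form after a consonant-final lemma, form after a vowel-final lemma).
PARTICLES = {
    "이/가": ("이", "가"),
    "은/는": ("은", "는"),
    "을/를": ("을", "를"),
}


def has_batchim(syllable):
    """True if the precomposed syllable carries a final consonant."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return offset % 28 != 0  # final index 0 is the null final


def attach_particle(lemma, pair):
    """Append the allomorph of `pair` selected by the lemma's last syllable."""
    after_consonant, after_vowel = PARTICLES[pair]
    return lemma + (after_consonant if has_batchim(lemma[-1]) else after_vowel)
```

For instance, attach_particle("책", "이/가") returns 책이 because 책 ends in the final ㄱ, while a vowel-final lemma such as 나무 takes 가, 는, or 를.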

Romanization Policies

  • RR (default): null onset dropped; ㅓ→eo, ㅡ→eu, ㅢ→ui/yi; finals mapped by coda value; tensification not overtly marked.
  • Yale: tense marked by doubling (pp tt kk ss cc), ㅡ→u, ㅓ→e; preserves morphophonemics.
  • MR: uses breves (ŏ, ŭ) and apostrophes for aspirates (k', t', p', ch'); optional.
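
The RR policy (null onset dropped, finals mapped by coda value) can be illustrated with a table-driven sketch. It is deliberately narrow: it assumes a single word of precomposed Hangul, applies only simple liaison of a single-consonant final, and does not apply assimilation, the ㄹㄹ → ll rule, or capitalization. The function name romanize_rr and the table names are assumptions, not engine identifiers.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

RR_CHO = dict(zip(CHO, ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
                        "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]))
RR_JUNG = dict(zip(JUNG, ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
                          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]))
# Finals are romanized by their neutralized coda value (k, n, t, l, m, p, ng).
RR_JONG = dict(zip(JONG, ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
                          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]))


def romanize_rr(word):
    """Romanize one Hangul word, lowercase, with simple liaison only."""
    syls = []
    for ch in word:
        l, rest = divmod(ord(ch) - 0xAC00, 21 * 28)
        v, t = divmod(rest, 28)
        syls.append([CHO[l], JUNG[v], JONG[t]])
    for cur, nxt in zip(syls, syls[1:]):
        # A single-consonant final moves into an empty onset; ㅇ stays, ㅎ drops.
        if nxt[0] == "ㅇ" and cur[2] and cur[2] != "ㅇ" and cur[2] in CHO:
            if cur[2] != "ㅎ":
                nxt[0] = cur[2]
            cur[2] = ""
    return "".join(RR_CHO[l] + RR_JUNG[v] + RR_JONG[t] for l, v, t in syls)
```

Here romanize_rr("한국어") yields hangugeo: the final ㄱ of 국 moves to the next onset and is therefore written g, not k. Callers that need the fixture casing can apply str.capitalize() to the result.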

YAML Skeleton (engine spec)

pgm_version: "1.0"
language: "KO"
script_pref: ["KR_Hangul","KR_RR","KR_YALE","KR_MR","KR_ASCII"]

profiles:
  - id: "KR_STD"
    ruleset: "NIKL_2025"
  - id: "KR_PHON"
    ruleset: "NIKL_2025"
    surface: true
  - id: "KR_RR"
    romanization: "RR_2000"
  - id: "KR_YALE"
    romanization: "Yale"
  - id: "KR_MR"
    romanization: "MR"
  - id: "KR_ASCII"
    romanization: "ASCII_FLAT"

inventory:
  jamo:
    choseong: ["ㄱ","ㄲ","ㄴ","ㄷ","ㄸ","ㄹ","ㅁ","ㅂ","ㅃ","ㅅ","ㅆ","ㅇ","ㅈ","ㅉ","ㅊ","ㅋ","ㅌ","ㅍ","ㅎ"]
    jungseong: ["ㅏ","ㅐ","ㅑ","ㅒ","ㅓ","ㅔ","ㅕ","ㅖ","ㅗ","ㅘ","ㅙ","ㅚ","ㅛ","ㅜ","ㅝ","ㅞ","ㅟ","ㅠ","ㅡ","ㅢ","ㅣ"]
    jongseong: ["∅","ㄱ","ㄲ","ㄳ","ㄴ","ㄵ","ㄶ","ㄷ","ㄹ","ㄺ","ㄻ","ㄼ","ㄽ","ㄾ","ㄿ","ㅀ","ㅁ","ㅂ","ㅄ","ㅅ","ㅆ","ㅇ","ㅈ","ㅊ","ㅋ","ㅌ","ㅍ","ㅎ"]
operators:
  - {name: "compose_block", fn: "jamo_to_syllable"}
  - {name: "decompose_block", fn: "syllable_to_jamo"}
  - {name: "liaison", fn: "move_final_to_next_onset_if_vowel_initial"}
  - {name: "coda_neutralize", fn: "map_complex_coda_to_{p,t,k,m,n,ŋ,l}"}
  - {name: "tensify", fn: "lenis→tense after eligible codas/particles"}
  - {name: "nasal_assim", fn: "ptk→m n ŋ before m/n"}
  - {name: "aspiration_spread", fn: "ㅎ triggers following onset aspiration"}
  - {name: "palatalize", fn: "coda ㄷ/ㅌ + 이/히 → ㅈ/ㅊ (t͡ɕ/t͡ɕʰ)"}
policies:
  spacing: {mode: "std"}
  romanization:
    rr: {ui_rule: "positional"}
    yale: {tense_doubling: true}
    ascii: {hint_tense: "x", hint_aspirate: "h"}
lossiness:
  to_ascii: "controlled"

Unit Test Fixtures

tests:
  - id: "KO_001_hanguk"
    in_graphemes: "한국어"
    profile: "KR_RR"
    expect: "Hangugeo"
    phones_surface: "[한구거]"

  - id: "KO_002_goodbye"
    in_graphemes: "안녕"
    profile: "KR_PHON"
    expect_phones: "[안녕]"  # no sandhi applies; IPA [an.njʌŋ]

  - id: "KO_003_read"
    in_graphemes: "읽다"
    profile: "KR_PHON"
    expect_phones: "[익따]"  # ㄺ coda→[k], tensification of ㄷ→ㄸ

  - id: "KO_004_liaison"
    in_graphemes: "밖에"
    profile: "KR_PHON"
    expect_phones: "[바께]"

  - id: "KO_005_aspiration"
    in_graphemes: "좋다"
    profile: "KR_PHON"
    expect_phones: "[조타]"

  - id: "KO_006_rr_yale"
    in_graphemes: "서울"
    profile: "KR_RR"
    expect: "Seoul"
    also:
      profile: "KR_YALE"
      expect: "sewul"

  - id: "KO_007_particles"
    synth:
      lemma: "책"
      particle: "가/이"
    expect: "책이"  # coda present → 이

Worked Micro-Examples

  • 국물 → phones [궁물] (nasal assimilation /k.m/ → [ŋm]), orthography stable.
  • 갑자기 → phones [갑짜기] (tensification of ㅈ).
  • 설날 → phones [설랄] (ㄹ+ㄴ → ll).
  • 의사: word-initial → /ɰi/; RR uisa (contextual ui rule).

Operational Knobs

  • pgm.profile=KR_STD|KR_PHON|KR_RR|KR_YALE|KR_MR|KR_ASCII
  • pgm.spacing=std|morph
  • pgm.romanization.ui_rule=positional|ui|yi
  • pgm.lossiness_report=true, pgm.audit_trace=true