PGM-06 — Turkish (TR_LATN)


MINTED

Purpose

A precise, round-trip mapping between phones/phonemes and the modern Turkish Latin alphabet, with robust handling of vowel harmony, dotted/dotless I, word-final devoicing, and the special behavior of Ğ (yumuşak g). Provides a loss-controlled ASCII fallback and pedagogy knobs.

Identity

pgm::v1.0::TR_LATN::<profile>
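
As an illustration of how the identity string above might be consumed, here is a minimal sketch that splits it into fields. The names `PgmIdentity`, `parse_identity`, and `KNOWN_PROFILES` are hypothetical and not part of the spec; the profile set is taken from the Orthography Profiles section.

```python
from typing import NamedTuple

# Profile names as listed under Orthography Profiles.
KNOWN_PROFILES = {"strict", "extended", "ascii"}


class PgmIdentity(NamedTuple):
    """Hypothetical container for the fields of a PGM identity string."""
    version: str
    script: str
    profile: str


def parse_identity(identity: str) -> PgmIdentity:
    """Split 'pgm::v<version>::<script>::<profile>' into its fields."""
    parts = identity.split("::")
    if len(parts) != 4 or parts[0] != "pgm" or not parts[1].startswith("v"):
        raise ValueError(f"not a PGM identity: {identity!r}")
    _, version, script, profile = parts
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    # Drop the leading 'v' so the version is a plain "1.0".
    return PgmIdentity(version[1:], script, profile)
```

For example, `parse_identity("pgm::v1.0::TR_LATN::ascii")` yields the fields `("1.0", "TR_LATN", "ascii")`.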

Orthography Profiles

  • TR_LATN.strict — Canonical 29-letter orthography (A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z). No Q/W/X.
  • TR_LATN.extended — Accept Q/W/X for proper names/loans; normalizes to strict on request.
  • TR_LATN.ascii — Diacritic-stripped fallback (Ç→C, Ş→S, İ→I, I→I, Ö→O, Ü→U, Ğ→G or vowel length mark). Loss controlled via policy.

Lossiness

  • strict: none (orthography ≈ phonemic).
  • extended: none (within its inventory).
  • ascii: controlled (diacritics removed; ğ behavior approximated).

Script Mechanics (core rules)

  • Simple one-grapheme ≈ one-phoneme mapping, with notable exceptions:
    • C /d͡ʒ/, Ç /t͡ʃ/, Ş /ʃ/, J /ʒ/, Y /j/.
    • Ğ (yumuşak g): generally no consonant closure; lengthens or glides the preceding vowel (e.g., dağ [daː]). Between front vowels often yields a light /j/-like glide.
    • I ı /ɯ/ (dotless) vs İ i /i/ (dotted). Case mapping is context-sensitive; ASCII folding must respect Turkish casing rules.
  • Word-final devoicing affects b c d g → p ç t k in phonetics; spelling stays lexical (kitap /kitap/, renk /reŋk/).
  • Vowel harmony guides suffix vowels (two-way: A↔E; four-way: I set {ı i u ü}). PGM emits correct allomorphs from feature sets.
  • Buffer consonants in morphology: (y, s, n, ş) inserted by rule for certain suffixes (e.g., ev-i → evi, araba-sı → arabası).
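
The vowel harmony rule above can be sketched as follows. This is a minimal illustration, not the PGM harmony engine: it assumes lowercase stems, templates written without the parentheses and hyphen (e.g. "lAr", "Im"), and no lexical exceptions or buffer consonants. The function names are hypothetical.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("oöuü")
VOWELS = BACK_VOWELS | FRONT_VOWELS

# Four-way I set, keyed by (is_back, is_rounded) of the preceding vowel.
FOUR_WAY = {
    (True, False): "ı",
    (True, True): "u",
    (False, False): "i",
    (False, True): "ü",
}


def last_vowel(stem: str) -> str:
    """Return the final vowel of a lowercase stem."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")


def harmonize(stem: str, template: str) -> str:
    """Fill 'A' (two-way) and 'I' (four-way) slots, left to right."""
    v = last_vowel(stem)
    out = []
    for ch in template:
        if ch == "A":
            ch = "a" if v in BACK_VOWELS else "e"
        elif ch == "I":
            ch = FOUR_WAY[(v in BACK_VOWELS, v in ROUNDED_VOWELS)]
        if ch in VOWELS:
            v = ch  # later slots harmonize with the vowel just emitted
        out.append(ch)
    return stem + "".join(out)
```

Tracking the most recent vowel while walking the template lets stacked suffixes harmonize with each other rather than only with the stem, so "gün" with the template "lArI" comes out as "günleri".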

Phoneme Inventory (tied to MCLI class_ids)

Vowels: /a e ɯ i o ø u y/ (written A E I İ O Ö U Ü). Length emerges via Ğ or lexical loans; PGM can mark ː in phoneme layer.

Consonants (selection):

  • Stops: /p b t d c ɟ k ɡ/ (orthographic K, G surface as palatal [c ɟ] before front vowels).
  • Affricates: /t͡ʃ d͡ʒ/ (Ç, C).
  • Fricatives: /f v s z ʃ ʒ h/.
  • Nasals: /m n ŋ/ (ŋ allophonic before /k ɡ/).
  • Liquids/Approximants: /l ɫ r j/ (clear/dark L allophony).
  • Special: Ğ → /¯/ (length/glide operator over the previous vowel).

Mapping Logic

Phones → Graphemes

  1. Vowels: direct map; apply harmony engine for suffix templates (e.g., -(lAr) → -lar/-ler by backness; -(I) → -ı/-i/-u/-ü by backness+rounding).
  2. Consonants: direct map with fronting rules for <k,g,ğ> contexts recorded in phoneme features (no spelling change).
  3. Length/Ğ: if the phoneme layer marks length (ː) caused by historical /ɣ/, write Ğ after the vowel (daː → dağ). Intervocalic V-Ğ-V may surface as VjV; spelling remains Ğ.
  4. Final devoicing: do not alter spelling; only phonetics.

Graphemes → Phones

  • Read letters literally, then:
    • If Ğ: attach length/glide to preceding vowel; no consonant closure.
    • If final b/c/d/g: set [p/ç/t/k] allophones in surface layer.
    • Set /ŋ/ allophone for n before k/g clusters.
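
The reading steps above can be sketched roughly as below. This is a simplified illustration under stated assumptions: input is a single lowercase word in strict orthography, Ğ is always realized as length (the glide variant is not modeled), and clear/dark L and palatal-stop allophony are omitted. `read_graphemes` and the tables are hypothetical names.

```python
LETTER_TO_IPA = {
    "a": "a", "b": "b", "c": "d͡ʒ", "ç": "t͡ʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "h": "h", "ı": "ɯ", "i": "i", "j": "ʒ",
    "k": "k", "l": "l", "m": "m", "n": "n", "o": "o", "ö": "ø",
    "p": "p", "r": "r", "s": "s", "ş": "ʃ", "t": "t", "u": "u",
    "ü": "y", "v": "v", "y": "j", "z": "z",
}
# Surface allophones for word-final voiced obstruents, built from the
# letter table so the IPA strings match exactly.
FINAL_DEVOICE = {
    LETTER_TO_IPA["b"]: LETTER_TO_IPA["p"],
    LETTER_TO_IPA["c"]: LETTER_TO_IPA["ç"],
    LETTER_TO_IPA["d"]: LETTER_TO_IPA["t"],
    LETTER_TO_IPA["g"]: LETTER_TO_IPA["k"],
}
VOWELS_IPA = set("aeɯioøuy")


def read_graphemes(word: str) -> list[str]:
    """Map a lowercase Turkish word to a list of surface phones."""
    phones: list[str] = []
    for i, ch in enumerate(word):
        if ch == "ğ":
            # No consonant closure: lengthen the preceding vowel instead.
            if phones and phones[-1] in VOWELS_IPA:
                phones[-1] += "ː"
            continue
        if ch == "n" and word[i + 1:i + 2] in ("k", "g"):
            phones.append("ŋ")  # velar nasal allophone before k/g
            continue
        phones.append(LETTER_TO_IPA[ch])
    if phones and phones[-1] in FINAL_DEVOICE:
        phones[-1] = FINAL_DEVOICE[phones[-1]]
    return phones
```

Under these assumptions, "dağ" reads as the two phones d and aː, and "renk" as r, e, ŋ, k.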

Edge Policies & Disambiguation

  • ASCII folding (ascii.mode=smart|flat):
    • smart: Ç→C, Ş→S, Ö→Oe, Ü→Ue, İ→I, I→I; Ğ→g or omitted, with ː marked on the preceding vowel if available.
    • flat: all diacritics stripped (Oe→O, Ue→U).
  • Casing (case.turkish=true): honors dotted/dotless I rules (uppercase: i→İ, ı→I; lowercase: İ→i, I→ı).
  • Loans/QWX: in extended, preserve Q/W/X; in strict, normalize Q→K, W→V, X→KS (with audit note).
  • Stress: default final stress; exceptions (e.g., İstanbul) maintained via lexicon hook; PGM stores stress=pos metadata.
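
The casing and ASCII-folding policies above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and Ğ is folded to plain g in both modes as a simplifying assumption (the omit-and-lengthen variant is not modeled).

```python
def upper_tr(text: str) -> str:
    """Uppercase with Turkish I rules: i→İ, ı→I."""
    # Handle the two I pairs first; str.upper() alone would map i→I.
    return text.replace("i", "İ").replace("ı", "I").upper()


def lower_tr(text: str) -> str:
    """Lowercase with Turkish I rules: İ→i, I→ı."""
    # Handle the two I pairs first; str.lower() alone would map I→i.
    return text.replace("İ", "i").replace("I", "ı").lower()


# Flat folding table: every Turkish-specific letter to its base letter.
FLAT_TABLE = str.maketrans("çÇşŞöÖüÜıİğĞ", "cCsSoOuUiIgG")
# Smart mode expands the rounded front vowels to digraphs first.
SMART_DIGRAPHS = {"ö": "oe", "Ö": "Oe", "ü": "ue", "Ü": "Ue"}


def fold_ascii(text: str, mode: str = "smart") -> str:
    """Fold Turkish letters to ASCII in 'smart' or 'flat' mode."""
    if mode not in ("smart", "flat"):
        raise ValueError(f"unknown ascii mode: {mode!r}")
    if mode == "smart":
        for src, dst in SMART_DIGRAPHS.items():
            text = text.replace(src, dst)
    return text.translate(FLAT_TABLE)
```

Replacing the I pairs before calling the built-in case methods matters because Python's default case mapping is locale-independent and treats i/I as a pair, which is wrong for Turkish text.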

YAML Skeleton (engine spec)

pgm_version: "1.0"
language: "TR"
script_pref: ["TR_LATN","TR_LATN_ASCII"]

profiles:
  - id: "strict"
    orthography_profile: "TR_STRICT_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "extended"
    orthography_profile: "TR_EXT_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "ascii"
    orthography_profile: "TR_ASCII_2025"
    disambiguation_policy: "PGM.PEDAGOGIC"

inventory:
  vowels:
    - {id: "PH.TR.a", ipa: "a", map: {TR_LATN: "a", ASCII: "a"}}
    - {id: "PH.TR.e", ipa: "e", map: {TR_LATN: "e", ASCII: "e"}}
    - {id: "PH.TR.ɯ", ipa: "ɯ", map: {TR_LATN: "ı", ASCII: "i"}}
    - {id: "PH.TR.i", ipa: "i", map: {TR_LATN: "i/İ", ASCII: "i"}}
    - {id: "PH.TR.o", ipa: "o", map: {TR_LATN: "o", ASCII: "o"}}
    - {id: "PH.TR.ø", ipa: "ø", map: {TR_LATN: "ö", ASCII: "oe"}}
    - {id: "PH.TR.u", ipa: "u", map: {TR_LATN: "u", ASCII: "u"}}
    - {id: "PH.TR.y", ipa: "y", map: {TR_LATN: "ü", ASCII: "ue"}}
  consonants:
    - {id: "PH.TR.dzh", ipa: "d͡ʒ", map: {TR_LATN: "c", ASCII: "c"}}
    - {id: "PH.TR.tsh", ipa: "t͡ʃ", map: {TR_LATN: "ç", ASCII: "c"}}
    - {id: "PH.TR.sh",  ipa: "ʃ",  map: {TR_LATN: "ş", ASCII: "s"}}
    - {id: "PH.TR.zh",  ipa: "ʒ",  map: {TR_LATN: "j", ASCII: "j"}}
    - {id: "PH.TR.j",   ipa: "j",  map: {TR_LATN: "y", ASCII: "y"}}
    - {id: "PH.TR.g_soft", ipa: "Vː_op", map: {TR_LATN: "ğ", ASCII: ""}}
  operators:
    - {name: "length_from_ğ", fn: "attach_length_to_prev_vowel"}
    - {name: "final_devoice", fn: "surface_allophone_b→p,c→ç,d→t,g→k"}
    - {name: "harmony_2way",  fn: "A↔E selection by backness"}
    - {name: "harmony_4way",  fn: "I-set {ı,i,u,ü} by backness+rounding"}
policies:
  ascii:
    mode: "smart"
  case:
    turkish: true
  loans:
    allow_qwx_in_extended: true
lossiness:
  strict_to_ascii: "controlled"

Unit Test Fixtures

tests:
  - id: "TR_001_turkiye"
    in_graphemes: "Türkiye"
    profile: "strict"
    expect_phonemes: "/t y r c i j e/"   # surface /tyɾcije/; orthographic 'y' is the /j/ glide before 'e'
    roundtrip_ok: true

  - id: "TR_002_dag_soft_g"
    in_graphemes: "dağ"
    profile: "strict"
    expect_phonemes: "/d aː/"             # ğ realized as vowel length
    ascii: {profile: "ascii", expect: ["dag", "daa"]}  # either one, depending on ascii.mode

  - id: "TR_003_calisma"
    in_graphemes: "çalışma"
    profile: "strict"
    expect_phonemes: "/t͡ʃ a ɫ ɯ ʃ m a/"
    ascii: {profile: "ascii", expect: "calisma"}

  - id: "TR_004_suffix_harmony"
    in_lemma: "kitap"      # 'book'
    suffix_template: "-(lAr)"  # plural
    profile: "strict"
    expect_form: "kitaplar"

  - id: "TR_005_suffix_4way"
    in_lemma: "gün"        # 'day' (front, rounded)
    suffix_template: "-(I)m"   # 1sg poss
    profile: "strict"
    expect_form: "günüm"

  - id: "TR_006_casing"
    in_graphemes: "istanbul"
    op: "uppercase_turkish"
    expect: "İSTANBUL"     # dotted-I rule honored

Worked Micro-Examples

  • “dağlar” (mountains): dağ + plural -(lAr) → dağlar (/daːlar/).
  • “evim” (my house): ev + -(I)m → evim (front unrounded → i).
  • “İzmir’de” (in İzmir): proper name + locative; apostrophe separation preserved; casing keeps dotted İ.

Operational Knobs

  • pgm.profile=strict|extended|ascii
  • pgm.harmony=auto|off (for raw transliteration vs morphosynthesis)
  • pgm.ascii.mode=smart|flat
  • pgm.case.turkish=true|false
  • pgm.lossiness_report=true, pgm.audit_trace=true
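
One way the knobs above might be validated when supplied as key=value strings is sketched below. `PgmKnobs` and `parse_knobs` are hypothetical names. The defaults for ascii mode and Turkish casing follow the YAML skeleton; the remaining defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PgmKnobs:
    """Hypothetical settings holder; several defaults are assumptions."""
    profile: str = "strict"          # assumed default
    harmony: str = "auto"            # assumed default
    ascii_mode: str = "smart"        # as in the YAML skeleton
    case_turkish: bool = True        # as in the YAML skeleton
    lossiness_report: bool = False   # assumed default
    audit_trace: bool = False        # assumed default


ENUM_KNOBS = {
    "pgm.profile": ("profile", {"strict", "extended", "ascii"}),
    "pgm.harmony": ("harmony", {"auto", "off"}),
    "pgm.ascii.mode": ("ascii_mode", {"smart", "flat"}),
}
BOOL_KNOBS = {
    "pgm.case.turkish": "case_turkish",
    "pgm.lossiness_report": "lossiness_report",
    "pgm.audit_trace": "audit_trace",
}


def parse_knobs(settings: list[str]) -> PgmKnobs:
    """Build a PgmKnobs from 'key=value' strings, rejecting bad input."""
    knobs = PgmKnobs()
    for item in settings:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key in ENUM_KNOBS:
            attr, allowed = ENUM_KNOBS[key]
            if value not in allowed:
                raise ValueError(f"bad value for {key}: {value!r}")
            setattr(knobs, attr, value)
        elif key in BOOL_KNOBS:
            if value not in ("true", "false"):
                raise ValueError(f"bad value for {key}: {value!r}")
            setattr(knobs, BOOL_KNOBS[key], value == "true")
        else:
            raise ValueError(f"unknown knob: {key!r}")
    return knobs
```

Rejecting unknown keys and values up front keeps a typo such as pgm.ascii.mode=smrt from silently falling back to a default.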

PGM-06 (Turkish) is MINTED and connected to the Master Cross-Lattice Index.