MINTED
Purpose
A precise, round-trip mapping between phones/phonemes and the modern Turkish Latin alphabet, with robust handling of vowel harmony, dotted/dotless I, word-final devoicing, and the special behavior of Ğ (yumuşak g). Provides a loss-controlled ASCII fallback and pedagogy knobs.
Identity
pgm::v1.0::TR_LATN::<profile>
Orthography Profiles
- TR_LATN.strict: Canonical 29-letter orthography (A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z). No Q/W/X.
- TR_LATN.extended: Accepts Q/W/X for proper names and loans; normalizes to strict on request.
- TR_LATN.ascii: Diacritic-stripped fallback (Ç→C, Ş→S, İ→I, I→I, Ö→O, Ü→U, Ğ→G or vowel length mark). Loss is controlled via policy.
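The letter inventories of the strict and extended profiles can be checked mechanically. The following is a minimal Python sketch, not PGM's implementation: `violations`, `INVENTORY` and the other names are hypothetical, and the lowercase inventory is assumed to mirror the uppercase letters listed for strict.

```python
# Hypothetical helper: every name here is illustrative, not part of PGM.
STRICT_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"  # the 29 canonical letters
STRICT_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"  # assumed lowercase counterparts
QWX = "QWXqwx"                                  # accepted only by the extended profile

INVENTORY = {
    "strict": frozenset(STRICT_UPPER + STRICT_LOWER),
    "extended": frozenset(STRICT_UPPER + STRICT_LOWER + QWX),
}


def violations(text: str, profile: str = "strict") -> set:
    """Return the letters in `text` that fall outside the profile's inventory."""
    allowed = INVENTORY[profile]  # KeyError for profiles this sketch does not cover
    # Only alphabetic characters are checked; spaces, digits and apostrophes pass.
    return {ch for ch in text if ch.isalpha() and ch not in allowed}
```

Under these assumptions, `violations("taxi")` reports `x` for strict, while `violations("taxi", "extended")` returns an empty set.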
Lossiness
- strict: none (orthography ≈ phonemic).
- extended: none (within its inventory).
- ascii: controlled (diacritics removed; ğ behavior approximated).
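The controlled loss of the ascii profile can be illustrated with a small folding function. This is a hedged Python sketch rather than the engine's code: `fold_ascii` and its parameters are hypothetical, the lowercase rows are assumed by symmetry with the uppercase arrows in the spec, and because the spec does not say which `ascii.mode` lengthens the vowel for ğ, that choice is exposed here as a separate flag.

```python
# Hypothetical sketch of ASCII folding; not PGM's actual implementation.
_COMMON = {"ç": "c", "Ç": "C", "ş": "s", "Ş": "S",
           "ı": "i", "İ": "I", "ğ": "g", "Ğ": "G"}
_SMART = {**_COMMON, "ö": "oe", "Ö": "Oe", "ü": "ue", "Ü": "Ue"}  # digraphs kept
_FLAT = {**_COMMON, "ö": "o", "Ö": "O", "ü": "u", "Ü": "U"}       # diacritics stripped
_VOWELS = frozenset("aeıioöuüAEIİOÖUÜ")


def fold_ascii(text: str, mode: str = "smart", soft_g_as_length: bool = False) -> str:
    """Fold Turkish letters to ASCII; other characters pass through unchanged."""
    if mode == "smart":
        table = _SMART
    elif mode == "flat":
        table = _FLAT
    else:
        raise ValueError(f"unknown ascii mode: {mode!r}")
    out = []
    prev = ""  # previous source character, used to detect a vowel before ğ
    for ch in text:
        if ch in "ğĞ" and soft_g_as_length:
            # Assumption: length is written by repeating the last emitted letter;
            # a ğ that does not follow a vowel is simply omitted.
            if prev in _VOWELS and out:
                out.append(out[-1][-1])
        else:
            out.append(table.get(ch, ch))
        prev = ch
    return "".join(out)
```

With this sketch, `fold_ascii("dağ")` gives `dag` and `fold_ascii("dağ", soft_g_as_length=True)` gives `daa`. The repeat-the-letter rule is naive after a folded digraph such as `oe`, so a real engine would need a more careful length notation.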
Script Mechanics (core rules)
- Simple one-grapheme ≈ one-phoneme mapping, with notable exceptions:
- C /d͡ʒ/, Ç /t͡ʃ/, Ş /ʃ/, J /ʒ/, Y /j/.
- Ğ (yumuşak g): generally no consonant closure; lengthens or glides the preceding vowel (e.g., dağ [daː]). Between front vowels often yields a light /j/-like glide.
- I ı /ɯ/ (dotless) vs İ i /i/ (dotted). Case mapping is context-sensitive; ASCII folding must respect Turkish casing rules.
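- Word-final devoicing turns b c d g into p ç t k phonetically. Turkish spelling usually already writes the devoiced consonant in the bare stem (kitap /kitap/, renk /reŋk/; compare kitabı, rengi before a vowel-initial suffix); PGM never alters the spelling it is given.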
- Vowel harmony guides suffix vowels (two-way: A↔E; four-way: I set {ı i u ü}). PGM emits correct allomorphs from feature sets.
- Buffer consonants in morphology: y, s, n, ş are inserted by rule when certain suffixes follow a vowel-final stem (e.g., araba-sı → arabası; consonant-final ev-i → evi needs none).
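The harmony rule in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PGM's harmony engine: `realize_suffix` and the table names are hypothetical, the stem is assumed to be lowercase, and buffer consonants, vowel-final stems and irregular loans are deliberately left out.

```python
# Hypothetical sketch of two-way (A) and four-way (I) vowel harmony.
VOWELS = frozenset("aeıioöuü")
BACK = frozenset("aıou")
ROUNDED = frozenset("oöuü")

# (is_back, is_rounded) of the preceding vowel -> realization of archiphoneme I
FOUR_WAY = {
    (True, False): "ı",
    (False, False): "i",
    (True, True): "u",
    (False, True): "ü",
}


def last_vowel(stem: str) -> str:
    """Return the final vowel of a lowercase stem."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")


def realize_suffix(stem: str, template: str) -> str:
    """Attach a template such as '-(lAr)' or '-(I)m' to a consonant-final stem."""
    prev = last_vowel(stem)
    out = []
    for ch in template:
        if ch in "-()":
            continue  # template punctuation carries no segment in this sketch
        if ch == "A":
            ch = "a" if prev in BACK else "e"
        elif ch == "I":
            ch = FOUR_WAY[(prev in BACK, prev in ROUNDED)]
        if ch in VOWELS:
            prev = ch  # each suffix vowel harmonizes with the vowel before it
        out.append(ch)
    return stem + "".join(out)
```

For instance, `realize_suffix("kitap", "-(lAr)")` yields `kitaplar` and `realize_suffix("gün", "-(I)m")` yields `günüm`. Updating `prev` inside the loop lets a stacked template such as `-lArI` harmonize its second vowel with the first.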
Phoneme Inventory (tied to MCLI class_ids)
Vowels: /a e ɯ i o ø u y/ (written A E I İ O Ö U Ü). Length emerges via Ğ or lexical loans; PGM can mark ː in phoneme layer.
Consonants (selection):
- Stops: /p b t d c ɟ k ɡ/ (orthographic k, g surface as [c ɟ] before front vowels).
- Affricates: /t͡ʃ d͡ʒ/ (Ç, C).
- Fricatives: /f v s z ʃ ʒ h/.
- Nasals: /m n ŋ/ (ŋ allophonic before /k ɡ/).
- Liquids/Approximants: /l ɫ r j/ (clear/dark L allophony).
- Special: Ğ → /¯/ (length/glide operator over the previous vowel).
Mapping Logic
Phones → Graphemes
- Vowels: direct map; apply harmony engine for suffix templates (e.g., -(lAr) → -lar/-ler by backness; -(I) → -ı/-i/-u/-ü by backness+rounding).
- Consonants: direct map with fronting rules for <k,g,ğ> contexts recorded in phoneme features (no spelling change).
- Length/Ğ: if phoneme layer marks Vː caused by historical /ɣ/ → write Ğ after the vowel (daː → dağ). Intervocalic V-Ğ-V may surface as VjV; spelling remains Ğ.
- Final devoicing: do not alter spelling; only phonetics.
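The phones-to-graphemes rules above can be sketched as follows. This is a hedged Python illustration, not the engine itself: `to_graphemes`, the table and the `length_from_soft_g` flag are hypothetical, and the sketch assumes the phoneme layer arrives as a list of IPA strings where a trailing ː marks length. Length that does not come from ğ is simply left unwritten here, which is an assumption of the sketch.

```python
# Hypothetical sketch of the phoneme -> spelling direction.
TIE = "\u0361"     # IPA tie bar used in affricates
LENGTH = "\u02d0"  # IPA length mark ː
DZH = "d" + TIE + "ʒ"
TSH = "t" + TIE + "ʃ"

IPA_TO_LETTER = {
    "a": "a", "e": "e", "ɯ": "ı", "i": "i", "o": "o", "ø": "ö", "u": "u", "y": "ü",
    DZH: "c", TSH: "ç", "ʃ": "ş", "ʒ": "j", "j": "y",
    "\u0261": "g",       # IPA ɡ
    "c": "k", "ɟ": "g",  # fronted stops keep their lexical spelling
    "ŋ": "n", "ɫ": "l",  # allophones collapse to the plain letter
}


def to_graphemes(phonemes, length_from_soft_g: bool = True) -> str:
    """Spell a list of phoneme strings; a long vowel is written as vowel + ğ."""
    out = []
    for ph in phonemes:
        is_long = ph.endswith(LENGTH)
        base = ph[:-len(LENGTH)] if is_long else ph
        out.append(IPA_TO_LETTER.get(base, base))  # unknown symbols pass through
        if is_long and length_from_soft_g:
            out.append("ğ")
    return "".join(out)
```

As an example, `to_graphemes(["d", "a" + LENGTH])` returns `dağ`. No devoicing step appears because, per the rule above, spelling is never altered for surface phonetics.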
Graphemes → Phones
- Read letters literally, then:
- If Ğ: attach length/glide to preceding vowel; no consonant closure.
- If final b/c/d/g: set the devoiced allophones [p t͡ʃ t k] in the surface layer.
- Set /ŋ/ allophone for n before k/g clusters.
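The three reading rules above can be sketched in Python as follows. This is an illustrative sketch, not PGM's reader: `to_phones` and the tables are hypothetical, input is assumed to be a single lowercase word, and the fronting of k/g and the clear/dark L allophony are intentionally not modeled.

```python
# Hypothetical sketch of the spelling -> surface phones direction.
TIE = "\u0361"     # IPA tie bar used in affricates
LENGTH = "\u02d0"  # IPA length mark ː
NG = "\u014b"      # IPA ŋ
G_IPA = "\u0261"   # IPA ɡ
DZH = "d" + TIE + "ʒ"
TSH = "t" + TIE + "ʃ"

LETTER_TO_IPA = {
    "ı": "ɯ", "ö": "ø", "ü": "y",
    "c": DZH, "ç": TSH, "ş": "ʃ", "j": "ʒ", "y": "j", "g": G_IPA,
}
VOWELS_IPA = frozenset(["a", "e", "ɯ", "i", "o", "ø", "u", "y"])
FINAL_DEVOICE = {"b": "p", DZH: TSH, "d": "t", G_IPA: "k"}


def to_phones(word: str) -> list:
    """Read letters literally, then apply the ğ, nasal and final-devoicing rules."""
    phones = []
    for ch in word:
        if ch == "ğ":
            # ğ adds length to a preceding short vowel; no consonant is emitted.
            if phones and phones[-1] in VOWELS_IPA:
                phones[-1] += LENGTH
            continue
        phones.append(LETTER_TO_IPA.get(ch, ch))  # other letters map to themselves
    for i in range(len(phones) - 1):
        if phones[i] == "n" and phones[i + 1] in ("k", G_IPA):
            phones[i] = NG  # n assimilates before a velar stop
    if phones and phones[-1] in FINAL_DEVOICE:
        phones[-1] = FINAL_DEVOICE[phones[-1]]  # surface-layer devoicing only
    return phones
```

For example, `to_phones("renk")` gives `r e ŋ k` and `to_phones("ad")` ends in `t`, while the spelling itself is untouched.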
Edge Policies & Disambiguation
- ASCII folding (ascii.mode=smart|flat):
  - smart: Ç→C, Ş→S, Ö→Oe, Ü→Ue, İ→I, I→I; Ğ→g or omitted, marking the preceding vowel ː if available.
  - flat: all diacritics stripped (Oe→O, Ue→U).
- Casing (case.turkish=true): honors dotted/dotless I rules (uppercasing maps i→İ and ı→I; lowercasing maps İ→i and I→ı).
- Loans/QWX: in extended, preserve Q/W/X; in strict, normalize Q→K, W→V, X→KS (with audit note).
- Stress: default final stress; exceptions (e.g., İstanbul) maintained via lexicon hook; PGM stores stress=pos metadata.
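The casing policy matters because locale-independent case mapping gets Turkish wrong: Python's built-in `str.upper()` turns `i` into `I`, and `str.lower()` turns `I` into `i`. A minimal sketch of Turkish-aware casing follows; `upper_tr` and `lower_tr` are hypothetical names, not PGM functions.

```python
# Hypothetical sketch of case.turkish=true behavior.
def upper_tr(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    # Map the two special letters first, then let the default mapping do the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def lower_tr(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()
```

Pre-mapping the two special pairs before calling the built-in methods keeps every other letter on the standard Unicode path, so `upper_tr("istanbul")` yields `İSTANBUL` where plain `upper()` would yield `ISTANBUL`.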
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "TR"
script_pref: ["TR_LATN", "TR_LATN_ASCII"]
profiles:
  - id: "strict"
    orthography_profile: "TR_STRICT_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "extended"
    orthography_profile: "TR_EXT_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "ascii"
    orthography_profile: "TR_ASCII_2025"
    disambiguation_policy: "PGM.PEDAGOGIC"
inventory:
  vowels:
    - {id: "PH.TR.a", ipa: "a", map: {TR_LATN: "a", ASCII: "a"}}
    - {id: "PH.TR.e", ipa: "e", map: {TR_LATN: "e", ASCII: "e"}}
    - {id: "PH.TR.ɯ", ipa: "ɯ", map: {TR_LATN: "ı", ASCII: "i"}}
    - {id: "PH.TR.i", ipa: "i", map: {TR_LATN: "i/İ", ASCII: "i"}}
    - {id: "PH.TR.o", ipa: "o", map: {TR_LATN: "o", ASCII: "o"}}
    - {id: "PH.TR.ø", ipa: "ø", map: {TR_LATN: "ö", ASCII: "oe"}}
    - {id: "PH.TR.u", ipa: "u", map: {TR_LATN: "u", ASCII: "u"}}
    - {id: "PH.TR.y", ipa: "y", map: {TR_LATN: "ü", ASCII: "ue"}}
  consonants:
    - {id: "PH.TR.dzh", ipa: "d͡ʒ", map: {TR_LATN: "c", ASCII: "c"}}
    - {id: "PH.TR.tsh", ipa: "t͡ʃ", map: {TR_LATN: "ç", ASCII: "c"}}
    - {id: "PH.TR.sh", ipa: "ʃ", map: {TR_LATN: "ş", ASCII: "s"}}
    - {id: "PH.TR.zh", ipa: "ʒ", map: {TR_LATN: "j", ASCII: "j"}}
    - {id: "PH.TR.j", ipa: "j", map: {TR_LATN: "y", ASCII: "y"}}
    - {id: "PH.TR.g_soft", ipa: "Vː_op", map: {TR_LATN: "ğ", ASCII: ""}}
operators:
  - {name: "length_from_ğ", fn: "attach_length_to_prev_vowel"}
  - {name: "final_devoice", fn: "surface_allophone_b→p,c→ç,d→t,g→k"}
  - {name: "harmony_2way", fn: "A↔E selection by backness"}
  - {name: "harmony_4way", fn: "I-set {ı,i,u,ü} by backness+rounding"}
policies:
  ascii:
    mode: "smart"
  case:
    turkish: true
  loans:
    allow_qwx_in_extended: true
lossiness:
  strict_to_ascii: "controlled"
Unit Test Fixtures
tests:
  - id: "TR_001_turkiye"
    in_graphemes: "Türkiye"
    profile: "strict"
    expect_phonemes: "/t y r c i j e/"  # surface /tyɾcije/; written y is the /j/ glide before 'e'
    roundtrip_ok: true
  - id: "TR_002_dag_soft_g"
    in_graphemes: "dağ"
    profile: "strict"
    expect_phonemes: "/d aː/"  # ğ realized as vowel length
    ascii: {profile: "ascii", expect: ["dag", "daa"]}  # either form, depending on ascii.mode
  - id: "TR_003_calisma"
    in_graphemes: "çalışma"
    profile: "strict"
    expect_phonemes: "/t͡ʃ a ɫ ɯ ʃ m a/"
    ascii: {profile: "ascii", expect: "calisma"}
  - id: "TR_004_suffix_harmony"
    in_lemma: "kitap"  # 'book'
    suffix_template: "-(lAr)"  # plural
    profile: "strict"
    expect_form: "kitaplar"
  - id: "TR_005_suffix_4way"
    in_lemma: "gün"  # 'day' (front, rounded)
    suffix_template: "-(I)m"  # 1sg poss
    profile: "strict"
    expect_form: "günüm"
  - id: "TR_006_casing"
    in_graphemes: "istanbul"
    op: "uppercase_turkish"
    expect: "İSTANBUL"  # dotted-I rule honored
Worked Micro-Examples
- “dağlar” (mountains): dağ + plural -(lAr) → dağlar (/daːlar/).
- “evim” (my house): ev + -(I)m → evim (front unrounded → i).
- “İzmir’de” (in İzmir): proper name + locative; apostrophe separation preserved; casing keeps dotted İ.
Operational Knobs
- pgm.profile=strict|extended|ascii
- pgm.harmony=auto|off (for raw transliteration vs morphosynthesis)
- pgm.ascii.mode=smart|flat
- pgm.case.turkish=true|false
- pgm.lossiness_report=true, pgm.audit_trace=true
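One way to consume these knobs is to parse `key=value` strings into a validated mapping. The sketch below is hypothetical throughout: `parse_knobs`, `ALLOWED` and `DEFAULTS` are not PGM names, the defaults marked "assumed" are not stated in the spec, and a `false` value for the two reporting knobs is likewise an assumption.

```python
# Hypothetical knob parser; not part of PGM.
ALLOWED = {
    "pgm.profile": ("strict", "extended", "ascii"),
    "pgm.harmony": ("auto", "off"),
    "pgm.ascii.mode": ("smart", "flat"),
    "pgm.case.turkish": ("true", "false"),
    "pgm.lossiness_report": ("true", "false"),  # "false" is assumed
    "pgm.audit_trace": ("true", "false"),       # "false" is assumed
}
DEFAULTS = {
    "pgm.profile": "strict",          # assumed
    "pgm.harmony": "auto",            # assumed
    "pgm.ascii.mode": "smart",        # matches the YAML policies block
    "pgm.case.turkish": "true",       # matches the YAML policies block
    "pgm.lossiness_report": "false",  # assumed
    "pgm.audit_trace": "false",       # assumed
}


def parse_knobs(settings) -> dict:
    """Merge 'key=value' strings over the defaults, rejecting unknown input."""
    config = dict(DEFAULTS)
    for item in settings:
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in ALLOWED:
            raise ValueError(f"unknown knob: {item!r}")
        if value not in ALLOWED[key]:
            raise ValueError(f"invalid value for {key}: {value!r}")
        config[key] = value
    return config
```

Rejecting unknown keys and values early keeps a misspelled knob from silently falling back to a default.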
✅ PGM-06 (Turkish) is MINTED and connected to the Master Cross-Lattice Index.