MINTED
Purpose
Deterministic, auditable mapping between phones/phonemes and Bengali script (Eastern Nagari), with clean round-trips to Latin transliteration. Tied to the MCLI so cross-script projection (e.g., Latin/Devanagari) is consistent.
Identity
pgm::v1.0::BN_BENG::<profile>
Orthography Profiles
BN_BENG.std— Modern standard Bangla (Kolkata/Dhaka compromise), inherent vowel /ɔ/ (“ô”) behavior typical; conjuncts allowed; nukta letters where used.BN_BENG.phonemic— Closer to surface phonology (neutralize historical spellings where safe; optional nasalization marks).BN_LATN.iso15919— ISO 15919 transliteration (diacritics).BN_LATN.bgn— BGN/PCGN-style Latin (diacritics minimized; pedagogy-friendly).
Lossiness
- Bengali script:
controlled(inherent vowel, optional nasalization, sandhi, conjunct suppression). - Latin (ISO 15919):
none. - Latin (BGN):
controlled(diacritics dropped).
Script Mechanics (Eastern Nagari essentials)
- Inherent vowel: অ “a/ô” ≈ /ɔ/ after bare consonants unless suppressed.
- Dependent vowel signs (mātrā): ি/ী i, া a, ু/ূ u, ে e, ৈ ai, ো o, ৌ au, ৃ r̥, ৄ r̥̄ (rare).
- Virāma / Hasanta: ◌্ (kills inherent vowel; builds conjuncts).
- Anusvāra: ং (ṃ) — homorganic nasal or vowel nasalization (context/policy).
- Candrabindu: ঁ (˜) — explicit vowel nasalization (optional in modern print).
- Visarga: ঃ (ḥ) — rare/learned/Sanskritic.
- Khanda ta: ৎ (t̪̚) — final unreleased dental stop (special orthographic case).
- Nukta letters: ড় (ṛ), ঢ় (ṛh), য-ফলা (y-phala), র-ফলা (r-phala) manage palatal/retroflex and liquid clusters.
Phoneme Inventory (core; pointers to MCLI class_ids)
Vowels (phonemic targets):
/a ~ ɔ/, /o/, /e/, /i iː/, /u uː/, (loan/learned) /æ, ə, ɯ/; vocalic /r̩/ (ঋ), long /r̩ː/ (ঌ~rare).
Consonants (selected):
Stops: /p pʰ b b̤/, /t̪ t̪ʰ d̪ d̪̤/, /ʈ ʈʰ ɖ ɖ̤/, /k kʰ ɡ ɡ̤/
Affricates: /t͡ʃ t͡ʃʰ d͡ʒ d͡ʒ̤/
Fricatives: /ʃ ʂ s h/ (orthography favors শ ষ স distinctions)
Nasals: /m n ŋ ɲ/
Liquids/Glides: /l r j w/
(“̤” denotes breathy-voice legacy contrasts; modern Bangla tends toward aspiration/voicing contrasts; map via profile.)
Key Mapping Logic
- Phones → Phonemes: normalize aspiration/voicing; handle alveolar~dental and retroflex contrasts per profile.
- Phonemes → Graphemes:
- If vowel-initial syllable → independent vowel letter (অ আ ই …).
- Else use consonant + mātrā or hasanta conjuncts.
- Apply r-phala (্র) and y-phala (্য) per cluster rules.
- Optional candrabindu for nasalized vowels (policy-driven).
- Inherent vowel policy (
inherent=auto|strict_off|strict_on):auto(std): realize /ɔ/ after bare consonants unless: word-final C#, before virāma conjunct, or in prescribed lexical exceptions.
- Latin output:
iso15919preserves all contrasts (e/ē, o/ō, ṛ, ṃ, ḥ).bgnuses plain letters (sh/ch/jh/ph/kh; no macrons).
Edge Policies & Disambiguation
- Homorganic nasal: ং before stops → map nasal place to following stop in phoneme layer.
- Candrabindu vs anusvāra: prefer anusvāra; promote to candrabindu in
phonemicprofile when vowel nasalization is phonemic. - Retroflex vs dental: preserve orthographic শ/ষ/স; phoneme layer collapses to /ʃ s/ where appropriate; keep retroflexes /ʈ ɖ/ distinct.
- Khanda ta (ৎ): emit on
final_t_dental_unreleased=true. - Loan alignment: English loans may keep cluster vowels (স্কুল → /skul/ “school”); profile can insert epenthetic ə if
etymology_weight<0.5.
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "BN"
script_pref: ["BN_BENG", "LATN_ISO15919", "LATN_BGN"]
profiles:
- id: "std"
orthography_profile: "BN_STD_2025"
disambiguation_policy: "PGM.DFLT"
- id: "phonemic"
orthography_profile: "BN_PHONEMIC_2025"
disambiguation_policy: "PGM.PEDAGOGIC"
- id: "iso15919"
orthography_profile: "BN_LATN_ISO_2025"
disambiguation_policy: "PGM.DFLT"
- id: "bgn"
orthography_profile: "BN_LATN_BGN_2025"
disambiguation_policy: "PGM.DFLT"
inventory:
phonemes:
- id: "PH.BN.t_dental_VL"
ipa: "t̪"
features: {place: dental, manner: stop, voice: VL}
grapheme_map:
class_id: "CLS.BN.C.T_DENTAL"
scripts:
- {script: BN_BENG, glyph: "ত"}
- {script: LATN_ISO15919, mapping: "t"}
- {script: LATN_BGN, mapping: "t"}
- id: "PH.BN.ɔ"
ipa: "ɔ"
features: {vowel: true}
grapheme_map:
class_id: "CLS.BN.V.O_OPEN"
scripts:
- {script: BN_BENG, glyph: "অ", matra: null, inherent: true}
- {script: LATN_ISO15919, mapping: "ô"}
- {script: LATN_BGN, mapping: "o"}
diacritics:
- {name: "anusvara", glyph: "ং", function: "nasal_place_assim"}
- {name: "candrabindu", glyph: "ঁ", function: "vowel_nasalization"}
- {name: "visarga", glyph: "ঃ", function: "voiceless_breath"}
operators:
- {name: "hasanta", glyph: "্", fn: "suppress_inherent; build_conjunct"}
spelling_policies:
BN_STD_2025:
inherent: "auto"
conjuncts: "preferred"
nasalization: "anusvara_default"
retroflex_preserve: true
khanda_t_final: "auto"
BN_PHONEMIC_2025:
inherent: "strict_off" # always write a vowel if present
conjuncts: "minimal"
nasalization: "candrabindu_on_phonemic"
BN_LATN_ISO_2025:
scheme: "ISO15919"
BN_LATN_BGN_2025:
scheme: "BGN"
lossiness:
to_grapheme: "controlled"
to_latin_iso: "none"
to_latin_bgn: "controlled"
Unit Test Fixtures
tests:
- id: "BN_001_word_bangla"
in_phonemes: "/b a ŋ l a/"
profile: "std"
expect_graphemes: "বাংলা"
roundtrip_ok: true
- id: "BN_002_inherent_off"
in_phonemes: "/b a ŋ g l a d e ʃ/"
profile: "phonemic"
expect_graphemes: "বাাংলাদেশ" # pedagogy may force explicit vowels; engine may refine to "বাংলাদেশ"
notes: "pedagogic explicit vowels; real orthography prefers conventional form"
roundtrip_ok: true
- id: "BN_003_iso_out"
in_graphemes: "শিক্ষা"
profile: "iso15919"
expect: "śikṣā"
- id: "BN_004_bgn_out"
in_graphemes: "বাংলা"
profile: "bgn"
expect: "Bangla"
- id: "BN_005_nasalization"
in_phonemes: "/dʒ o ñ/"
profile: "phonemic"
expect_graphemes: "জোঁন" # example shows candrabindu on vowel; exact lexeme may vary
roundtrip_ok: true
- id: "BN_006_khanda_t"
in_phonemes: "/b a t̪̚/" # final unreleased dental stop
profile: "std"
expect_graphemes: "বৎ"
Worked Micro-Examples
- /ʃ i k ʃ aː/ → শিক্ষা (śikṣā) →
iso15919: śikṣā,bgn: shiksha. - /b a ŋ l a/ → বাংলা →
iso15919: bāṅlā,bgn: Bangla. - /r̩/ (ঋ) in learned forms → ঋষি (ṛṣi) →
iso15919: ṛṣi,bgn: rishi. - Anusvāra before /k g/ → লঙ্কা (Loŋkā) “Lanka”: nasal becomes ŋ (homorganic).
Operational Knobs
pgm.profile=std|phonemic|iso15919|bgnpgm.inherent=auto|strict_off|strict_onpgm.conjuncts=preferred|minimal|nonepgm.nasalization=candrabindu|anusvara|autopgm.dialect=dhaka|kolkata(fine-tunes /eæ/, /oɔ/ realizations in Latin output)pgm.lossiness_report=true,pgm.audit_trace=true
✅ PGM-02 (Bengali) is MINTED and linked to MCLI classes.