MINTED
Purpose
Round-trip, loss-aware mapping between phones/phonemes and Marathi Devanāgarī, including explicit treatment of schwa preservation, vowel length, retroflex/dental contrasts, and loanword integration from Persian, Arabic, English, and Sanskrit.
Identity
pgm::v1.0::MR_DEVA::<profile>
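A minimal sketch of how an identity string in the format above could be split into its fields. The `PgmIdentity` type and `parse_identity` function are hypothetical names introduced for illustration; the spec only fixes the `pgm::<version>::<script>::<profile>` layout.

```python
from typing import NamedTuple


class PgmIdentity(NamedTuple):
    version: str
    script: str
    profile: str


def parse_identity(identity: str) -> PgmIdentity:
    """Split 'pgm::<version>::<script>::<profile>' into its three fields."""
    parts = identity.split("::")
    # Exactly four non-empty fields, and the first must be the literal 'pgm'.
    if len(parts) != 4 or parts[0] != "pgm" or not all(parts):
        raise ValueError(f"not a PGM identity: {identity!r}")
    _, version, script, profile = parts
    return PgmIdentity(version, script, profile)


print(parse_identity("pgm::v1.0::MR_DEVA::std"))
```

The sketch deliberately does not check the profile against a fixed list, since the spec does not say which profiles are valid under which script tag.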
Orthography Profiles
- MR_DEVA.std: Standard orthography (schwa largely preserved, Sanskrit loans intact).
- MR_DEVA.colloquial: More phonemic; applies colloquial vowel reduction/deletion patterns.
- MR_LATN.iso15919: ISO 15919 transliteration (full diacritics).
- MR_LATN.bgn: BGN/PCGN-style Latin (fewer diacritics).
Lossiness
- Devanāgarī (std): none. Spelling closely follows pronunciation.
- Colloquial profile: controlled, due to vowel reduction and sandhi assimilation.
- Latin ISO: none.
- Latin BGN: controlled (diacritic loss).
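To make "controlled (diacritic loss)" concrete, the sketch below reduces an ISO 15919 string to a BGN-like form and shows two distinct inputs collapsing to one output. The `DIGRAPHS` table and the function name are assumptions for illustration only; this is not the official BGN/PCGN romanization table, and it does not apply capitalization.

```python
import unicodedata

# Hypothetical substitutions applied before diacritics are stripped.
DIGRAPHS = {
    "\u1e63": "sh",  # s with dot below (retroflex sibilant)
    "\u015b": "sh",  # s with acute (palatal sibilant)
}


def iso_to_bgn_like(iso: str) -> str:
    """Reduce ISO 15919 text to plain ASCII-style Latin (lossy)."""
    composed = unicodedata.normalize("NFC", iso)
    replaced = "".join(DIGRAPHS.get(ch, ch) for ch in composed)
    # NFD splits each remaining letter into base + combining marks,
    # and the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(iso_to_bgn_like("mah\u0101r\u0101\u1e63\u1e6dra"))  # maharashtra
# Long and short a are no longer distinguishable after reduction:
print(iso_to_bgn_like("k\u0101la") == iso_to_bgn_like("kala"))  # True
```

Because the reduction is many-to-one, a BGN-style string cannot be mapped back to a unique ISO form without extra information, which is why this direction is reported as controlled rather than none.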
Script Mechanics
- Inherent vowel: /ə/ after bare consonants, generally retained orthographically in Marathi.
- Dependent vowel signs (mātrā): ा ā, ि/ी i/ī, ु/ू u/ū, ृ r̥, ॄ r̥̄, े e, ै ai, ो o, ौ au.
- Nukta letters: क़ q, ख़ x, ग़ ġ, ज़ z, फ़ f, ड़ ṛ, ढ़ ṛh for Persian/Arabic loans.
- Anusvāra: ◌ं /ṃ/ — often homorganic nasal.
- Candrabindu: ◌ँ — vowel nasalization.
- Visarga: ◌ः /ḥ/ — rare, in Sanskritic forms.
- Virāma (halant): ◌् — cancels inherent vowel; used in conjunct clusters.
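The marks listed above correspond to single Unicode code points, and the nukta letters exist both as precomposed characters and as base letter plus combining nukta. The sketch below, using only Python's standard `unicodedata` module, shows one way to bring both spellings to a single form before mapping. The `MARKS` table and helper names are illustrative assumptions.

```python
import unicodedata

MARKS = {
    "candrabindu": "\u0901",
    "anusvara": "\u0902",
    "visarga": "\u0903",
    "nukta": "\u093c",
    "virama": "\u094d",
}


def canonical(text: str) -> str:
    """Normalize to NFC so that each nukta letter has one spelling."""
    return unicodedata.normalize("NFC", text)


def has_nukta(text: str) -> bool:
    """True if the text contains a nukta letter in either spelling."""
    return MARKS["nukta"] in canonical(text)


for name, mark in MARKS.items():
    print(f"{name}: U+{ord(mark):04X} {unicodedata.name(mark)}")

# Precomposed QA (U+0958) and KA + NUKTA normalize to the same sequence.
print(canonical("\u0958") == canonical("\u0915\u093c"))  # True
```

Note that NFC leaves these letters in decomposed form (base letter followed by U+093C), because the precomposed Devanāgarī nukta letters are excluded from canonical composition; a lookup table keyed on the base letter plus nukta therefore covers both input spellings.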
Phoneme Inventory (core; MCLI-linked)
Vowels: /a aː i iː u uː e eː o oː ai au ə/; syllabic /r̩ r̩ː l̩ l̩ː/ in learned forms.
Consonants: full Indo-Aryan stop grid; fricatives /s ʃ ʂ ɦ/; nasals /m n ɳ ŋ/; liquids /l r ɾ ɽ/; glides /j ʋ/.
Aspirated vs unaspirated stops maintained in writing; retroflex/dental distinctions preserved.
Mapping Logic
- Phones → Phonemes: preserve retroflex/dental, aspirated/unaspirated, vowel length.
- Phonemes → Graphemes:
  - Vowel-initial → independent vowel letter (अ आ इ ई उ ऊ ए ऐ ओ औ).
  - Else: consonant + appropriate mātrā, or virāma for clusters.
  - Anusvāra for a nasal homorganic to the following stop (context-aware).
  - Candrabindu for a nasalized vowel (optional policy).
  - Nukta letters for loan phonemes.
- Graphemes → Phonemes:
  - Decode mātrās; inherent /ə/ unless a virāma or cluster suppresses it.
  - Optional schwa deletion if the colloquial profile is active.
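The Phonemes → Graphemes rules above can be sketched as a small encoder. This is an illustration under stated assumptions, not the engine's implementation: the inventory is a hand-picked subset, the `encode` name is hypothetical, and anusvāra, candrabindu, and nukta handling are left out.

```python
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

# Illustrative subset of the inventory, not the full table.
CONSONANTS = {"m": "म", "ɦ": "ह", "r": "र", "ʂ": "ष", "ʈ": "ट",
              "ɡ": "ग", "ɳ": "ण", "ʃ": "श"}
# phoneme -> (independent letter, dependent sign); schwa has no sign.
VOWELS = {
    "ə": ("अ", ""),
    "aː": ("आ", "\u093e"),
    "eː": ("ए", "\u0947"),
}


def encode(phoneme_string: str) -> str:
    """Map a space-separated phoneme string such as '/m ə/' to Devanagari."""
    out = []
    after_consonant = False  # last unit is a consonant with no vowel yet
    for ph in phoneme_string.strip("/").split():
        if ph in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: suppress inherent vowel
            out.append(CONSONANTS[ph])
            after_consonant = True
        elif ph in VOWELS:
            independent, sign = VOWELS[ph]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            raise ValueError(f"phoneme not in sketch inventory: {ph!r}")
    return "".join(out)


print(encode("/m ə ɦ aː r aː ʂ ʈ r ə/"))
print(encode("/ɡ ə ɳ eː ʃ/"))
```

In this sketch a word-final consonant is written bare, so a form with and without final /ə/ yields the same spelling. That is the ambiguity the Graphemes → Phonemes direction has to resolve by profile.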
Edge Policies
- Schwa deletion:
  - std: schwa preserved except in a few Sanskritic visarga contexts.
  - colloquial: delete final schwa, and medial schwa per spoken norms.
- Loanwords: keep etymological spelling in std; allow phonemic respelling in colloquial.
- Nasalization: choose anusvāra vs candrabindu according to profile.
- Conjuncts: retain in std; may split in colloquial.
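As a rough illustration of the schwa policy, the sketch below applies the two policy values to a phoneme list. The concrete deletion rule is an assumption made for this example (drop a word-final /ə/ only in a vowel + single consonant + /ə/ context); the spec itself only says "per spoken norms", and medial deletion is not modelled here.

```python
VOWELS = {"ə", "a", "aː", "i", "iː", "u", "uː",
          "e", "eː", "o", "oː", "ai", "au"}


def apply_schwa_policy(phonemes: list[str], policy: str) -> list[str]:
    """Return a copy of the phoneme list with the schwa policy applied."""
    if policy == "preserve":
        return list(phonemes)
    if policy != "delete_per_speech":
        raise ValueError(f"unknown schwa policy: {policy!r}")
    result = list(phonemes)
    # Assumed rule: delete final schwa only after V + single consonant,
    # so a final cluster keeps its vowel.
    if (len(result) >= 3 and result[-1] == "ə"
            and result[-2] not in VOWELS and result[-3] in VOWELS):
        result.pop()
    return result


print(apply_schwa_policy("ɡ ə ɳ eː ʃ ə".split(), "delete_per_speech"))
print(apply_schwa_policy("m ə ɦ aː r aː ʂ ʈ r ə".split(), "delete_per_speech"))
```

Under this assumed rule the first call drops the final /ə/, while the second keeps it because the schwa follows a cluster.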
YAML Skeleton (engine spec)
pgm_version: "1.0"
language: "MR"
script_pref: ["MR_DEVA", "LATN_ISO15919", "LATN_BGN"]
profiles:
  - id: "std"
    orthography_profile: "MR_STD_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "colloquial"
    orthography_profile: "MR_COLLOQ_2025"
    disambiguation_policy: "PGM.PEDAGOGIC"
  - id: "iso15919"
    orthography_profile: "MR_LATN_ISO_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "bgn"
    orthography_profile: "MR_LATN_BGN_2025"
    disambiguation_policy: "PGM.DFLT"
inventory:
  phonemes:
    - id: "PH.MR.t_dental_VL"
      ipa: "t̪"
      features: {place: dental, manner: stop, voice: VL}
      grapheme_map:
        class_id: "CLS.MR.C.T_DENTAL"
        scripts:
          - {script: MR_DEVA, glyph: "त"}
          - {script: LATN_ISO15919, mapping: "t"}
          - {script: LATN_BGN, mapping: "t"}
    - id: "PH.MR.a_long"
      ipa: "aː"
      features: {vowel: true, length: long}
      grapheme_map:
        class_id: "CLS.MR.V.A_LONG"
        scripts:
          - {script: MR_DEVA, glyph: ["आ","ा"], select: "independent|matra"}
          - {script: LATN_ISO15919, mapping: "ā"}
          - {script: LATN_BGN, mapping: "a"}
  diacritics:
    - {name: "anusvara", glyph: "ं", function: "nasal_place_assim"}
    - {name: "candrabindu", glyph: "ँ", function: "vowel_nasalization"}
    - {name: "visarga", glyph: "ः", function: "voiceless_breath"}
  operators:
    - {name: "virama", glyph: "्", fn: "suppress_inherent; build_conjunct"}
    - {name: "nukta", glyph: "़", fn: "add_non_native_consonant"}
spelling_policies:
  MR_STD_2025:
    schwa: "preserve"
    nasalization: "context"
    conjuncts: "retain"
    nukta: "on_loans_only"
  MR_COLLOQ_2025:
    schwa: "delete_per_speech"
    nasalization: "explicit_on_phonemic"
    conjuncts: "simplify"
lossiness:
  to_grapheme: "none"
  to_latin_iso: "none"
  to_latin_bgn: "controlled"
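One consistency check that a loader for this skeleton might run is whether every profile's orthography_profile has a matching entry under spelling_policies. The sketch below mirrors the relevant keys as a plain Python dict so it runs without a YAML library; `SPEC` and `profiles_without_policy` are hypothetical names, and whether the Latin profiles are meant to have spelling policies at all is not stated in the skeleton.

```python
# Hand-copied subset of the skeleton above (only the keys the check needs).
SPEC = {
    "profiles": [
        {"id": "std", "orthography_profile": "MR_STD_2025"},
        {"id": "colloquial", "orthography_profile": "MR_COLLOQ_2025"},
        {"id": "iso15919", "orthography_profile": "MR_LATN_ISO_2025"},
        {"id": "bgn", "orthography_profile": "MR_LATN_BGN_2025"},
    ],
    "spelling_policies": {
        "MR_STD_2025": {"schwa": "preserve"},
        "MR_COLLOQ_2025": {"schwa": "delete_per_speech"},
    },
}


def profiles_without_policy(spec: dict) -> list[str]:
    """List profile ids whose orthography_profile has no spelling policy."""
    defined = set(spec.get("spelling_policies", {}))
    return [
        profile["id"]
        for profile in spec.get("profiles", [])
        if profile["orthography_profile"] not in defined
    ]


print(profiles_without_policy(SPEC))  # ['iso15919', 'bgn']
```

With the skeleton as written, the two Latin profiles are reported. That may be intentional, since spelling policies such as schwa handling arguably apply only to the Devanāgarī profiles, but an engine would need to decide explicitly how to treat the missing entries.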
Unit Test Fixtures
tests:
  - id: "MR_001_maharashtra"
    in_phonemes: "/m ə ɦ aː r aː ʂ ʈ r ə/"
    profile: "std"
    expect_graphemes: "महाराष्ट्र"
    roundtrip_ok: true
  - id: "MR_002_colloquial_schwa_drop"
    in_phonemes: "/m ə ɦ aː r aː ʂ ʈ r ə/"
    profile: "colloquial"
    expect_graphemes: "महाराष्ट्र"  # same orthography, schwa deletion marked in phoneme layer
    roundtrip_ok: true
  - id: "MR_003_iso_out"
    in_graphemes: "महाराष्ट्र"
    profile: "iso15919"
    expect: "mahārāṣṭra"
  - id: "MR_004_bgn_out"
    in_graphemes: "महाराष्ट्र"
    profile: "bgn"
    expect: "Maharashtra"
  - id: "MR_005_loans"
    in_phonemes: "/q aː m/"
    profile: "std"
    expect_graphemes: "क़ाम"
Worked Micro-Examples
- /m ə ɦ aː r aː ʂ ʈ r ə/ → महाराष्ट्र → ISO: mahārāṣṭra, BGN: Maharashtra.
- /ɡ ə ɳ eː ʃ/ → गणेश → ISO: gaṇeśa, BGN: Ganesh.
- Loan /q aː m/ → क़ाम in std; काम if colloquial and phonemically merged.
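The grapheme-to-ISO step in these examples can be sketched as a single pass that tracks whether the last consonant still carries its inherent vowel. The tables below cover only the letters needed for महाराष्ट्र and क़ाम, and the function name `to_iso15919` is hypothetical; a real table would cover the whole inventory.

```python
import unicodedata

VIRAMA = "\u094d"
NUKTA = "\u093c"

# Illustrative subset only.
CONSONANTS = {
    "म": "m", "ह": "h", "र": "r", "क": "k",
    "ष": "\u1e63",  # s with dot below
    "ट": "\u1e6d",  # t with dot below
}
NUKTA_FORMS = {"k": "q"}       # Latin base -> Latin form with nukta
MATRAS = {"\u093e": "\u0101"}  # vowel sign AA -> a with macron


def to_iso15919(text: str) -> str:
    """Transliterate a Devanagari string (sketch subset) to ISO 15919."""
    out = []
    inherent = False  # last consonant still carries its inherent 'a'
    for ch in unicodedata.normalize("NFC", text):
        if ch in CONSONANTS:
            if inherent:
                out.append("a")
            out.append(CONSONANTS[ch])
            inherent = True
        elif ch == NUKTA and inherent and out[-1] in NUKTA_FORMS:
            out[-1] = NUKTA_FORMS[out[-1]]
        elif ch in MATRAS and inherent:
            out.append(MATRAS[ch])  # dependent sign replaces inherent vowel
            inherent = False
        elif ch == VIRAMA and inherent:
            inherent = False        # virama cancels the inherent vowel
        else:
            raise ValueError(f"character not handled by sketch: {ch!r}")
    if inherent:
        out.append("a")
    return "".join(out)


print(to_iso15919("महाराष्ट्र"))
print(to_iso15919("क़ाम"))
```

Because the pass works from graphemes, the final inherent vowel is always written out in the Latin form, matching the trailing a in the ISO renderings above.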
Operational Knobs
- pgm.profile=std|colloquial|iso15919|bgn
- pgm.schwa=preserve|delete_per_speech
- pgm.nukta=on|off
- pgm.nasalization=candrabindu|anusvara|auto
- pgm.lossiness_report=true
- pgm.audit_trace=true
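A minimal sketch of validating knob settings against the values listed above. The `ALLOWED` table and `parse_knobs` function are hypothetical; the value "false" for the two reporting knobs is an assumption, since only "true" appears in the list.

```python
ALLOWED = {
    "pgm.profile": {"std", "colloquial", "iso15919", "bgn"},
    "pgm.schwa": {"preserve", "delete_per_speech"},
    "pgm.nukta": {"on", "off"},
    "pgm.nasalization": {"candrabindu", "anusvara", "auto"},
    "pgm.lossiness_report": {"true", "false"},  # "false" is assumed
    "pgm.audit_trace": {"true", "false"},       # "false" is assumed
}


def parse_knobs(settings: list[str]) -> dict[str, str]:
    """Parse 'key=value' strings, rejecting unknown keys and values."""
    knobs = {}
    for item in settings:
        key, sep, value = item.partition("=")
        if not sep or key not in ALLOWED:
            raise ValueError(f"unknown knob: {item!r}")
        if value not in ALLOWED[key]:
            raise ValueError(f"invalid value for {key}: {value!r}")
        knobs[key] = value
    return knobs


print(parse_knobs(["pgm.profile=colloquial", "pgm.schwa=delete_per_speech"]))
```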
✅ PGM-04 (Marathi) is MINTED and wired into the MCLI.