PGM-04 — Marathi (MR_DEVA)

MINTED


Purpose

Round-trip, loss-aware mapping between phones/phonemes and Marathi Devanāgarī, including explicit treatment of schwa preservation, vowel length, retroflex/dental contrasts, and loanword integration from Persian, Arabic, English, and Sanskrit.

Identity

pgm::v1.0::MR_DEVA::<profile>
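
A minimal sketch of how a consumer might split this identifier into its fields. The names PgmId and parse_pgm_id are hypothetical and not part of the spec; the only assumption taken from the text is the "pgm::v<version>::<script>::<profile>" layout.

```python
from typing import NamedTuple


class PgmId(NamedTuple):
    version: str
    script: str
    profile: str


def parse_pgm_id(identifier: str) -> PgmId:
    """Split 'pgm::v<version>::<script>::<profile>' into its fields."""
    parts = identifier.split("::")
    if len(parts) != 4 or parts[0] != "pgm" or not parts[1].startswith("v"):
        raise ValueError(f"not a PGM identifier: {identifier!r}")
    # Drop the leading "v" so the version can be compared as "1.0".
    return PgmId(version=parts[1][1:], script=parts[2], profile=parts[3])


print(parse_pgm_id("pgm::v1.0::MR_DEVA::std"))
# PgmId(version='1.0', script='MR_DEVA', profile='std')
```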

Orthography Profiles

  • MR_DEVA.std — Standard orthography (schwa largely preserved, Sanskrit loans intact).
  • MR_DEVA.colloquial — More phonemic, applies colloquial vowel reduction/deletion patterns.
  • MR_LATN.iso15919 — ISO 15919 transliteration (full diacritics).
  • MR_LATN.bgn — BGN/PCGN-style Latin (fewer diacritics).

Lossiness

  • Devanāgarī (std): none — spelling closely follows pronunciation.
  • Colloquial profile: controlled — due to vowel reduction and sandhi assimilation.
  • Latin ISO: none.
  • Latin BGN: controlled (diacritic loss).
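
One way to make "controlled" loss concrete is to check whether a phoneme-to-Latin table is injective. The sketch below uses only the two vowel mappings visible in this document's examples (schwa and long a); the helper name collisions is hypothetical, and a real table would cover the full inventory.

```python
from collections import defaultdict

# Vowel mappings as they appear in the examples of this document.
ISO_VOWELS = {"ə": "a", "aː": "ā"}
BGN_VOWELS = {"ə": "a", "aː": "a"}


def collisions(mapping):
    """Return Latin outputs that more than one phoneme maps to."""
    by_output = defaultdict(list)
    for phoneme, latin in mapping.items():
        by_output[latin].append(phoneme)
    return {latin: phs for latin, phs in by_output.items() if len(phs) > 1}


print(collisions(ISO_VOWELS))  # {}
print(collisions(BGN_VOWELS))  # {'a': ['ə', 'aː']}
```

An empty result corresponds to "none"; any collision is a place where the reverse mapping needs a lexicon or a loss report.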

Script Mechanics

  • Inherent vowel: /ə/ after bare consonants, generally retained orthographically in Marathi.
  • Dependent vowel signs (mātrā): ा ā, ि/ी i/ī, ु/ू u/ū, ृ r̥, ॄ r̥̄, े e, ै ai, ो o, ौ au.
  • Nukta letters: क़ q, ख़ x, ग़ ġ, ज़ z, फ़ f for Persian/Arabic loan phonemes; ड़ ṛ, ढ़ ṛh for the retroflex flaps.
  • Anusvāra: ◌ं /ṃ/ — often homorganic nasal.
  • Candrabindu: ◌ँ — vowel nasalization.
  • Visarga: ◌ः /ḥ/ — rare, in Sanskritic forms.
  • Virāma (halant): ◌् — cancels inherent vowel; used in conjunct clusters.
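
The signs above are separate Unicode code points, and nukta letters can arrive either precomposed (for example U+0958) or as base letter plus U+093C. A minimal sketch, assuming the engine normalizes input to NFC before matching; the helper names are hypothetical. Under NFC the precomposed nukta letters are decomposed, so a single code path can handle both spellings.

```python
import unicodedata

# Combining signs discussed in this section, keyed by code point.
SIGNS = {
    "\u0901": "candrabindu",
    "\u0902": "anusvara",
    "\u0903": "visarga",
    "\u093C": "nukta",
    "\u094D": "virama",
}


def normalize(text):
    """Normalize to NFC so nukta letters have one canonical spelling."""
    return unicodedata.normalize("NFC", text)


def signs_in(text):
    """List the script-mechanics signs in order of appearance."""
    return [SIGNS[ch] for ch in normalize(text) if ch in SIGNS]


print(signs_in("महाराष्ट्र"))
print([f"U+{ord(ch):04X}" for ch in normalize("\u0958")])  # ['U+0915', 'U+093C']
```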

Phoneme Inventory (core; MCLI-linked)

Vowels: /a aː i iː u uː e eː o oː ai au ə/; syllabic /r̩ r̩ː l̩ l̩ː/ in learned forms.
Consonants: full Indo-Aryan stop grid; fricatives /s ʃ ʂ ɦ/; nasals /m n ɳ ŋ/; liquids /l r ɾ ɽ/; glides /j ʋ/.
Aspirated vs unaspirated stops maintained in writing; retroflex/dental distinctions preserved.


Mapping Logic

  1. Phones → Phonemes: preserve retroflex/dental, aspirated/unaspirated, vowel length.
  2. Phonemes → Graphemes:
    • Vowel-initial → independent vowel letter (अ आ इ ई उ ऊ ए ऐ ओ औ).
    • Else: consonant + appropriate mātrā, or virāma for clusters.
    • Anusvāra for nasal homorganic to following stop (context-aware).
    • Candrabindu for nasalized vowel (optional policy).
    • Nukta letters for loan phonemes.
  3. Graphemes → Phonemes:
    • Decode mātrās; assign inherent /ə/ to any consonant that is not followed by a mātrā or a virāma (including the virāma that joins a conjunct cluster).
    • Optional schwa deletion if colloquial profile active.
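
Step 2 can be sketched as a small encoder. This is an illustration under stated assumptions, not the engine's implementation: the tables cover only a handful of phonemes, encode is a hypothetical name, anusvāra and candrabindu handling is omitted, and a word-final bare consonant is written without a virāma (following the गणेश example later in this document).

```python
VIRAMA = "\u094D"

# phoneme -> consonant letter (small illustrative subset)
CONSONANTS = {
    "m": "म", "ɦ": "ह", "r": "र", "ʂ": "ष", "ʈ": "ट",
    "ɳ": "ण", "ʃ": "श",
    "q": "\u0915\u093C",  # क + nukta for the loan phoneme
}

# phoneme -> (independent letter, dependent sign); schwa has no sign
VOWELS = {
    "ə": ("अ", ""),
    "aː": ("आ", "\u093E"),
    "eː": ("ए", "\u0947"),
}


def encode(phonemes):
    """Map a space-separated phoneme string such as '/q aː m/' to Devanagari."""
    out = []
    bare_consonant = False  # True while the last consonant has no vowel yet
    for ph in phonemes.strip("/").split():
        if ph in CONSONANTS:
            if bare_consonant:
                out.append(VIRAMA)  # cluster: suppress the inherent vowel
            out.append(CONSONANTS[ph])
            bare_consonant = True
        elif ph in VOWELS:
            independent, sign = VOWELS[ph]
            # After a consonant use the sign; otherwise the independent letter.
            out.append(sign if bare_consonant else independent)
            bare_consonant = False
        else:
            raise KeyError(f"no grapheme rule for {ph!r}")
    return "".join(out)


print(encode("/m ə ɦ aː r aː ʂ ʈ r ə/"))
print(encode("/q aː m/"))
```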

Edge Policies

  • Schwa deletion:
    • std: schwa preserved except in a few Sanskritic visarga contexts.
    • colloquial: delete final schwa, and in medial positions per spoken norms.
  • Loanwords: keep etymological spelling in std; allow phonemic respelling in colloquial.
  • Nasalization: choose anusvāra vs candrabindu according to profile.
  • Conjuncts: retain in std; may split in colloquial.

YAML Skeleton (engine spec)

pgm_version: "1.0"
language: "MR"
script_pref: ["MR_DEVA", "LATN_ISO15919", "LATN_BGN"]

profiles:
  - id: "std"
    orthography_profile: "MR_STD_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "colloquial"
    orthography_profile: "MR_COLLOQ_2025"
    disambiguation_policy: "PGM.PEDAGOGIC"
  - id: "iso15919"
    orthography_profile: "MR_LATN_ISO_2025"
    disambiguation_policy: "PGM.DFLT"
  - id: "bgn"
    orthography_profile: "MR_LATN_BGN_2025"
    disambiguation_policy: "PGM.DFLT"

inventory:
  phonemes:
    - id: "PH.MR.t_dental_VL"
      ipa: "t̪"
      features: {place: dental, manner: stop, voice: VL}
      grapheme_map:
        class_id: "CLS.MR.C.T_DENTAL"
        scripts:
          - {script: MR_DEVA, glyph: "त"}
          - {script: LATN_ISO15919, mapping: "t"}
          - {script: LATN_BGN, mapping: "t"}
    - id: "PH.MR.a_long"
      ipa: "aː"
      features: {vowel: true, length: long}
      grapheme_map:
        class_id: "CLS.MR.V.A_LONG"
        scripts:
          - {script: MR_DEVA, glyph: ["आ","ा"], select: "independent|matra"}
          - {script: LATN_ISO15919, mapping: "ā"}
          - {script: LATN_BGN, mapping: "a"}
  diacritics:
    - {name: "anusvara", glyph: "ं", function: "nasal_place_assim"}
    - {name: "candrabindu", glyph: "ँ", function: "vowel_nasalization"}
    - {name: "visarga", glyph: "ः", function: "voiceless_breath"}
operators:
  - {name: "virama", glyph: "्", fn: "suppress_inherent; build_conjunct"}
  - {name: "nukta", glyph: "़", fn: "add_non_native_consonant"}
spelling_policies:
  MR_STD_2025:
    schwa: "preserve"
    nasalization: "context"
    conjuncts: "retain"
    nukta: "on_loans_only"
  MR_COLLOQ_2025:
    schwa: "delete_per_speech"
    nasalization: "explicit_on_phonemic"
    conjuncts: "simplify"
lossiness:
  to_grapheme: "none"
  to_latin_iso: "none"
  to_latin_bgn: "controlled"
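
A loader could cross-check the skeleton before use. The sketch below mirrors part of the YAML as a plain Python dict (no YAML library is assumed) and reports profiles whose orthography_profile has no entry under spelling_policies; undefined_policies is a hypothetical helper. Whether the Latin profiles need their own policy entries is a decision for the spec, so the result is a prompt for review rather than an error.

```python
# Hand-copied subset of the skeleton above.
spec = {
    "profiles": [
        {"id": "std", "orthography_profile": "MR_STD_2025"},
        {"id": "colloquial", "orthography_profile": "MR_COLLOQ_2025"},
        {"id": "iso15919", "orthography_profile": "MR_LATN_ISO_2025"},
        {"id": "bgn", "orthography_profile": "MR_LATN_BGN_2025"},
    ],
    "spelling_policies": {
        "MR_STD_2025": {"schwa": "preserve", "nukta": "on_loans_only"},
        "MR_COLLOQ_2025": {"schwa": "delete_per_speech"},
    },
}


def undefined_policies(spec):
    """Return orthography profiles referenced by a profile but never defined."""
    defined = set(spec.get("spelling_policies", {}))
    return [
        p["orthography_profile"]
        for p in spec.get("profiles", [])
        if p["orthography_profile"] not in defined
    ]


print(undefined_policies(spec))  # ['MR_LATN_ISO_2025', 'MR_LATN_BGN_2025']
```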

Unit Test Fixtures

tests:
  - id: "MR_001_maharashtra"
    in_phonemes: "/m ə ɦ aː r aː ʂ ʈ r ə/"
    profile: "std"
    expect_graphemes: "महाराष्ट्र"
    roundtrip_ok: true

  - id: "MR_002_colloquial_schwa_drop"
    in_phonemes: "/m ə ɦ aː r aː ʂ ʈ r ə/"
    profile: "colloquial"
    expect_graphemes: "महाराष्ट्र"   # same orthography, schwa deletion marked in phoneme layer
    roundtrip_ok: true

  - id: "MR_003_iso_out"
    in_graphemes: "महाराष्ट्र"
    profile: "iso15919"
    expect: "mahārāṣṭra"

  - id: "MR_004_bgn_out"
    in_graphemes: "महाराष्ट्र"
    profile: "bgn"
    expect: "Maharashtra"

  - id: "MR_005_loans"
    in_phonemes: "/q aː m/"
    profile: "std"
    expect_graphemes: "क़ाम"

Worked Micro-Examples

  • /m ə ɦ aː r aː ʂ ʈ r ə/ → महाराष्ट्र → ISO: mahārāṣṭra, BGN: Maharashtra.
  • /ɡ ə ɳ eː ʃ/ → गणेश → ISO: gaṇeśa, BGN: Ganesh.
  • Loan /q aː m/ → क़ाम in std; काम if colloquial and phonemically merged.
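
The Devanāgarī-to-ISO direction of the examples above can be sketched as a single pass that tracks whether the last consonant still owes its inherent vowel. The tables are a small illustrative subset, the vowel values follow the mātrā list in this document (े written as e), and to_iso is a hypothetical name.

```python
import unicodedata

VIRAMA = "\u094D"

CONSONANTS = {"म": "m", "ह": "h", "र": "r", "ष": "ṣ", "ट": "ṭ",
              "ग": "g", "ण": "ṇ", "श": "ś"}
MATRAS = {"\u093E": "ā", "\u0947": "e"}
INDEPENDENT = {"अ": "a", "आ": "ā", "ए": "e"}


def to_iso(text):
    """Transliterate a Devanagari word using the tables above."""
    out = []
    pending_a = False  # last consonant has not yet received a vowel
    for ch in unicodedata.normalize("NFC", text):
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")  # previous consonant keeps its inherent vowel
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in MATRAS:
            out.append(MATRAS[ch])  # sign replaces the inherent vowel
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False  # inherent vowel suppressed
        elif ch in INDEPENDENT:
            if pending_a:
                out.append("a")
            out.append(INDEPENDENT[ch])
            pending_a = False
        else:
            raise KeyError(f"no transliteration rule for U+{ord(ch):04X}")
    if pending_a:
        out.append("a")  # word-final inherent vowel is written out
    return unicodedata.normalize("NFC", "".join(out))


print(to_iso("महाराष्ट्र"))
print(to_iso("गणेश"))
```

Writing out the word-final inherent vowel matches the ISO forms shown above; the BGN forms drop it in Ganesh, which would need a separate rule.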

Operational Knobs

  • pgm.profile=std|colloquial|iso15919|bgn
  • pgm.schwa=preserve|delete_per_speech
  • pgm.nukta=on|off
  • pgm.nasalization=candrabindu|anusvara|auto
  • pgm.lossiness_report=true, pgm.audit_trace=true
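
A minimal sketch of validating knob settings given as "key=value" strings. The allowed values are copied from the list above, except that "false" for the two boolean knobs is an assumption (only "true" is shown here); parse_knobs is a hypothetical name.

```python
ALLOWED = {
    "pgm.profile": {"std", "colloquial", "iso15919", "bgn"},
    "pgm.schwa": {"preserve", "delete_per_speech"},
    "pgm.nukta": {"on", "off"},
    "pgm.nasalization": {"candrabindu", "anusvara", "auto"},
    "pgm.lossiness_report": {"true", "false"},  # "false" is assumed
    "pgm.audit_trace": {"true", "false"},       # "false" is assumed
}


def parse_knobs(settings):
    """Validate 'key=value' strings and return them as a dict."""
    knobs = {}
    for item in settings:
        key, sep, value = item.partition("=")
        if not sep or key not in ALLOWED:
            raise ValueError(f"unknown knob: {item!r}")
        if value not in ALLOWED[key]:
            raise ValueError(
                f"{key} must be one of {sorted(ALLOWED[key])}, got {value!r}"
            )
        knobs[key] = value
    return knobs


print(parse_knobs(["pgm.profile=colloquial", "pgm.schwa=delete_per_speech"]))
# {'pgm.profile': 'colloquial', 'pgm.schwa': 'delete_per_speech'}
```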

PGM-04 (Marathi) is MINTED and wired into the MCLI.