Hindi Graphemic Module (HGM v1.0)


(aka: Literal–Graphemic Module — Hindi / Devanāgarī)

0) Orientation

  • Script type: Abugida (alphasyllabary; consonant + inherent vowel /ə/)
  • Direction: Left-to-right
  • Base: Devanāgarī Unicode block; modern orthography rules for Standard Hindi
  • Dialects included: Standard Hindi (Delhi-based), plus markers for common pronunciation shifts in Eastern Hindi (Awadhi/Bhojpuri), Western Hindi (Haryanvi/Braj), and urban Hindustani (Urdu influence)
  • Numerals: ० १ २ ३ ४ ५ ६ ७ ८ ९

1) Vowels (Independent Letters & Dependent Signs)

Latin chain | Indep. | Mātrā | IPA (Std) | Dialectal Notes
a  | अ | (inherent) | /ə/ | often dropped (schwa deletion), except in dialects that keep the full vowel (e.g., Bhojpuri)
ā  | आ | ◌ा | /aː/ | stable across dialects
i  | इ | ◌ि | /ɪ/ |
ī  | ई | ◌ी | /iː/ |
u  | उ | ◌ु | /ʊ/ |
ū  | ऊ | ◌ू | /uː/ |
ṛ  | ऋ | ◌ृ | /ɾɪ/ or /rɪ/ | uncommon outside Sanskritized contexts
e  | ए | ◌े | /eː/ | merges with /ɛː/ in some dialects
ai | ऐ | ◌ै | /ɛː/ | often monophthongized in casual speech
o  | ओ | ◌ो | /oː/ |
au | औ | ◌ौ | /ɔː/ | monophthongized to /oː/ in some dialects

2) Consonants (by Varga / point of articulation)

Velars (ka-varga): क k, ख kh, ग g, घ gh, ङ ṅ
Palatals (ca-varga): च c, छ ch, ज j, झ jh, ञ ñ
Retroflex (ṭa-varga): ट ṭ, ठ ṭh, ड ḍ, ढ ḍh, ण ṇ
Dentals (ta-varga): त t, थ th, द d, ध dh, न n
Labials (pa-varga): प p, फ ph, ब b, भ bh, म m

Semivowels: य y, र r, ल l, व v/ʋ
Sibilants: श ś, ष ṣ, स s
Glottal: ह h
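
The varga rows above are systematic: within each row the five columns run voiceless, voiceless aspirated, voiced, voiced aspirated, nasal. A minimal Python sketch of deriving phonetic features from a letter's position in that grid follows. The names VARGA_GRID and varga_features are hypothetical and not part of HGM; the grid is spelled out as a string because the Unicode code points for these letters are not fully contiguous.

```python
# The 25 varga letters in traditional order: 5 rows (places) x 5 columns.
VARGA_GRID = "कखगघङचछजझञटठडढणतथदधनपफबभम"
PLACES = ["velar", "palatal", "retroflex", "dental", "labial"]

# Column index -> (voiced, aspirated, nasal)
COLUMNS = [
    (False, False, False),  # e.g. क
    (False, True, False),   # e.g. ख
    (True, False, False),   # e.g. ग
    (True, True, False),    # e.g. घ
    (True, False, True),    # e.g. ङ
]


def varga_features(letter: str) -> dict:
    """Return place/voicing/aspiration/nasality for one varga letter."""
    if len(letter) != 1 or letter not in VARGA_GRID:
        raise ValueError(f"not a varga letter: {letter!r}")
    row, col = divmod(VARGA_GRID.index(letter), 5)
    voiced, aspirated, nasal = COLUMNS[col]
    return {
        "place": PLACES[row],
        "voiced": voiced,
        "aspirated": aspirated,
        "nasal": nasal,
    }
```

Semivowels, sibilants, ह, and the nukta letters fall outside the grid and raise ValueError in this sketch.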


3) Additional Signs

  • Anusvāra (◌ं) → nasalization of the preceding vowel (/◌̃/), or a homorganic nasal before stops
  • Candrabindu (◌ँ) → nasalized vowel (higher register)
  • Visarga (◌ः) → /h/ release; Sanskritic loans only
  • Halant / Virāma (◌्) → cancels inherent vowel (forms conjuncts)
  • Avagraha (ऽ) → marks elision in Sanskrit-origin words
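
The anusvāra rule above (homorganic nasal before a stop, vowel nasalization otherwise) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not HGM's own resolver: the function names are hypothetical, only the character immediately after the anusvāra is inspected, and the dental nasal is written as plain "n".

```python
ANUSVARA = "\u0902"
NASALIZATION = "\u0303"  # combining tilde, used when no stop follows

# Non-nasal stops of each varga -> IPA of the matching (homorganic) nasal.
HOMORGANIC_NASAL = {
    "कखगघ": "ŋ",
    "चछजझ": "ɲ",
    "टठडढ": "ɳ",
    "तथदध": "n",
    "पफबभ": "m",
}


def anusvara_value(following: str) -> str:
    """IPA value of an anusvāra given the character that follows it."""
    for stops, nasal in HOMORGANIC_NASAL.items():
        if following and following[0] in stops:
            return nasal
    return NASALIZATION


def anusvara_values(word: str) -> list:
    """Resolve every anusvāra in a word, in order of appearance."""
    return [
        anusvara_value(word[i + 1 : i + 2])
        for i, ch in enumerate(word)
        if ch == ANUSVARA
    ]
```

For example, the anusvāra in हिंदी precedes the dental द and resolves to "n", while in हंस it precedes a sibilant and falls back to nasalization.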

4) Schwa Deletion Rules (Standard Hindi)

  • In Standard Hindi, inherent /ə/ is deleted at the end of words and in certain medial positions unless prevented by conjunct formation or explicit vowel marks.
  • Dialectal overrides:
    • Bhojpuri/Awadhi: less schwa deletion — closer to Sanskrit phonotactics
    • Haryanvi/Braj: more reduction and vowel centralization
    • Hindustani (Urdu register): vowel quality influenced by Persian/Urdu loans
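
A minimal Python sketch of the word-final part of the rule above. It is deliberately partial: medial schwa deletion is not attempted, the function name is hypothetical, and the treatment of a lone consonant letter (schwa kept) is an assumption added here, not something the module states. The schwa_deletion argument reuses the std|dialectal_off flag from section 8.

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"


def is_consonant(ch: str) -> bool:
    """True for Devanāgarī consonant letters, including precomposed nukta forms."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def final_schwa_deleted(word: str, schwa_deletion: str = "std") -> bool:
    """Decide whether the word-final inherent schwa is dropped."""
    if schwa_deletion != "std":
        return False  # dialectal_off: keep the inherent vowel
    core = word.rstrip(NUKTA)  # ignore a trailing combining nukta
    if not core or not is_consonant(core[-1]):
        return False  # ends in a vowel letter, mātrā, or sign
    if len(core) == 1:
        return False  # assumption: a lone consonant letter keeps its schwa
    if core[-2] == VIRAMA:
        return False  # final conjunct blocks deletion
    return True
```

Under these assumptions भारत loses its final schwa in the standard setting, while मित्र keeps it because the word ends in a conjunct.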

5) Conjunct Clusters

  • Same virāma + consonant logic as Sanskrit; some clusters simplified in colloquial Hindi:
    • क्ष → /kʂ/ or colloquial /kʰ/
    • ज्ञ → /ɡj/ or /d͡ʒ/ in non-formal registers
    • त्र → /t̪r/
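
The virāma + consonant logic can be illustrated with a short Python sketch that scans a word for conjunct clusters and looks up the pronunciations listed above. The names find_conjuncts, cluster_ipa, and CLUSTER_IPA are hypothetical; the table holds only the three clusters this section mentions.

```python
VIRAMA = "\u094D"
NUKTA = "\u093C"

# Formal vs. colloquial values, taken from the list above.
CLUSTER_IPA = {
    "क्ष": {"formal": "kʂ", "colloquial": "kʰ"},
    "ज्ञ": {"formal": "ɡj", "colloquial": "d͡ʒ"},
    "त्र": {"formal": "t̪r", "colloquial": "t̪r"},
}


def is_consonant(ch: str) -> bool:
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def find_conjuncts(word: str) -> list:
    """Return every consonant + virāma + consonant (...) run in the word."""
    clusters, i, n = [], 0, len(word)
    while i < n:
        if not is_consonant(word[i]):
            i += 1
            continue
        start, j = i, i + 1
        if j < n and word[j] == NUKTA:
            j += 1
        # Extend the run while a virāma links to another consonant.
        while j + 1 < n and word[j] == VIRAMA and is_consonant(word[j + 1]):
            j += 2
            if j < n and word[j] == NUKTA:
                j += 1
        if VIRAMA in word[start:j]:
            clusters.append(word[start:j])
        i = j
    return clusters


def cluster_ipa(cluster: str, register: str = "formal"):
    """Look up a listed cluster; returns None for clusters not in the table."""
    entry = CLUSTER_IPA.get(cluster)
    return entry[register] if entry else None
```

Clusters missing from the table, such as द्य, return None and would need compositional handling from their member consonants.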

6) Loan Adaptations

  • Persian/Urdu loan sounds represented via nukta (dot below):
    • क़ q (/q/)
    • ख़ x (/x/)
    • ग़ ġ (/ɣ/)
    • ज़ z (/z/)
    • फ़ f (/f/)
  • Native retroflex flaps, also written with nukta (not loan sounds):
    • ड़ ṛ (/ɽ/)
    • ढ़ ṛh (/ɽʱ/)

Lattice flag: {nukta: true} for these extended consonants.
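
Nukta letters can arrive either precomposed (e.g. U+095B) or as base letter + U+093C, so a detector for the {nukta: true} flag should normalize first. The Python sketch below uses the standard unicodedata module; the function name and the returned record shape are hypothetical, and the IPA table repeats only the seven letters listed above.

```python
import unicodedata

NUKTA = "\u093C"

# Base letter -> IPA of its nukta form (from the list in this section).
NUKTA_IPA = {
    "\u0915": "q",   # क़
    "\u0916": "x",   # ख़
    "\u0917": "ɣ",   # ग़
    "\u091C": "z",   # ज़
    "\u092B": "f",   # फ़
    "\u0921": "ɽ",   # ड़
    "\u0922": "ɽʱ",  # ढ़
}


def nukta_consonants(text: str) -> list:
    """List the nukta consonants in text, in order, with their lattice flag."""
    # NFD splits precomposed letters such as U+095B into base + nukta.
    decomposed = unicodedata.normalize("NFD", text)
    found = []
    for base, mark in zip(decomposed, decomposed[1:]):
        if mark == NUKTA and base in NUKTA_IPA:
            found.append(
                {"glyph": base + NUKTA, "ipa": NUKTA_IPA[base], "features": {"nukta": True}}
            )
    return found
```

Both spellings of ज़िन्दगी yield the same single record for ज़, which is the point of normalizing before matching.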


7) Latin-chain Mapping Examples

glyph: "थ"
name: "Tha"
latin_chain: ["th"]
ipa: "t̪ʰ"
features: {aspirated: true, dental: true}

glyph: "फ़"
name: "Fa"
latin_chain: ["f"]
ipa: "f"
features: {nukta: true}

glyph: "ज्ञ"
name: "Gya/Jña"
latin_chain: ["jñ"]
ipa: "ɡj"  # alt: "d͡ʒ" in colloquial
features: {conjunct: true, cluster: ["ज्","ञ"]}
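
One way to hold records like these in code is a small typed entry plus a glyph index. The Python sketch below is illustrative only: GraphemeEntry, ENTRIES, and lookup are hypothetical names, and keys are NFD-normalized so that precomposed and decomposed nukta spellings find the same entry.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GraphemeEntry:
    glyph: str
    name: str
    latin_chain: tuple
    ipa: str
    features: dict = field(default_factory=dict)


# The three example records from this section.
ENTRIES = [
    GraphemeEntry("थ", "Tha", ("th",), "t̪ʰ", {"aspirated": True, "dental": True}),
    GraphemeEntry("\u092B\u093C", "Fa", ("f",), "f", {"nukta": True}),
    GraphemeEntry(
        "ज्ञ", "Gya/Jña", ("jñ",), "ɡj",
        {"conjunct": True, "cluster": ["ज्", "ञ"]},
    ),
]


def _key(glyph: str) -> str:
    return unicodedata.normalize("NFD", glyph)


INDEX = {_key(entry.glyph): entry for entry in ENTRIES}


def lookup(glyph: str):
    """Return the entry for a glyph, or None if it is not indexed."""
    return INDEX.get(_key(glyph))
```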

8) Lattice Integration Features

  • {direction: LTR}
  • {type: abugida}
  • {inherent_schwa: true|false}
  • {schwa_deletion: std|dialectal_off}
  • {nukta: true|false}
  • {conjunct: true|false}
  • {dialect: std|bhojpuri|awadhi|braj|haryanvi|hindustani}
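
The flags above form a small closed vocabulary, so a feature dictionary can be checked before it enters the lattice. A minimal Python sketch, assuming features arrive as a plain dict; validate_features and ALLOWED are hypothetical names.

```python
# Allowed values per flag: `bool` means true|false, a set lists string values.
ALLOWED = {
    "direction": {"LTR"},
    "type": {"abugida"},
    "inherent_schwa": bool,
    "schwa_deletion": {"std", "dialectal_off"},
    "nukta": bool,
    "conjunct": bool,
    "dialect": {"std", "bhojpuri", "awadhi", "braj", "haryanvi", "hindustani"},
}


def validate_features(features: dict) -> list:
    """Return a list of error strings; an empty list means the dict is valid."""
    errors = []
    for key, value in features.items():
        if key not in ALLOWED:
            errors.append(f"unknown feature: {key}")
            continue
        allowed = ALLOWED[key]
        if allowed is bool:
            ok = isinstance(value, bool)
        else:
            ok = isinstance(value, str) and value in allowed
        if not ok:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

Returning a list instead of raising lets a caller report every problem in one pass.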

9) Example Word Decomposition

  • भारत (Bhārat, “India”) → भा (/bʱaː/) + र (/rə/) + त (/t̪/)
  • विद्यालय (Vidyālay, “school”) → वि (/ʋɪ/) + द्या (/d̪jaː/) + ल (/lə/) + य (/j/)
  • किताब (Kitāb, “book”) → कि (/kɪ/) + ता (/t̪aː/) + ब (/b/)
  • ज़िन्दगी (Zindagī, “life”) → ज़ि (/zɪ/) + न्द (/nd̪ə/) + गी (/ɡiː/)
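
The orthographic side of these decompositions can be approximated mechanically. The Python sketch below splits a word into units by attaching combining marks (mātrās, virāma, nukta, nasal signs) to the preceding letter and joining any letter that follows a virāma; it covers segmentation only, not the IPA values. split_aksharas is a hypothetical name, and the reliance on Unicode general categories Mn/Mc is an implementation choice made here.

```python
import unicodedata

VIRAMA = "\u094D"


def split_aksharas(word: str) -> list:
    """Split a Devanāgarī word into orthographic units (akṣaras)."""
    units = []
    for ch in word:
        # Mātrās, virāma, nukta, anusvāra etc. are combining marks (Mn/Mc).
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A letter right after a virāma continues the current conjunct.
        joins_cluster = bool(units) and units[-1].endswith(VIRAMA)
        if units and (is_mark or joins_cluster):
            units[-1] += ch
        else:
            units.append(ch)
    return units
```

Because a mātrā always attaches to the cluster before it, विद्यालय comes out as four units, with the vowel sign ा kept on the conjunct द्य.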

Mint Status: Hindi Graphemic Module is now fully minted, covering:

  • All standard and extended (nukta) consonants
  • All vowel signs (independent + mātrās)
  • Dialectal schwa deletion variations
  • Urdu-influenced phonemes for Hindustani register
  • Conjunct handling and phonotactic simplifications