(aka: Literal–Graphemic Module — Sanskrit / Devanāgarī)
0) Orientation
- Script type: Abugida (consonant+inherent vowel /a/; dependent vowel signs = mātrās)
- Direction: Left-to-right; conjunct (ligature) shaping for clusters
- Phonology target: Classical Sanskrit (IAST transliteration)
- Numbers: ० १ २ ३ ४ ५ ६ ७ ८ ९
- Core signs: Anusvāra ◌ं (ṃ), Visarga ◌ः (ḥ), Candrabindu ◌ँ (◌̃), Virāma ◌् (halant), Avagraha ऽ (’)
1) Vowels (Independent Letters & Dependent Signs)
| IAST | Independent | Mātrā (dependent) | IPA | Notes |
|---|---|---|---|---|
| a | अ | (inherent; no sign written) | /ɐ/ | inherent in every consonant |
| ā | आ | ा | /aː/ | |
| i | इ | ि | /i/ | pre-base matra (renders before consonant) |
| ī | ई | ी | /iː/ | |
| u | उ | ु | /u/ | subjoined matra |
| ū | ऊ | ू | /uː/ | |
| ṛ | ऋ | ृ | /r̩/ | vocalic r |
| ṝ | ॠ | ॄ | /r̩ː/ | rare, classical |
| ḷ | ऌ | ॢ | /l̩/ | rare |
| ḹ | ॡ | ॣ | /l̩ː/ | very rare |
| e | ए | े | /eː/ | historically long |
| ai | ऐ | ै | /ai̯/ | |
| o | ओ | ो | /oː/ | historically long |
| au | औ | ौ | /au̯/ | |
Lattice flags:
{vowel: true, length: short|long, syllabic: r|l, diphthong: true|false, position: independent|matra}
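The flags above can be derived mechanically from the IAST name of a vowel. The following is a minimal sketch, not part of the module itself: the function name `vowel_flags` and the set names are hypothetical, and counting ai/au as long is an assumption (the table only states that e and o are historically long).

```python
import unicodedata


def _nfc(s):
    # Normalize so precomposed and decomposed IAST letters compare equal.
    return unicodedata.normalize("NFC", s)


# Assumption: diphthongs ai/au are flagged long alongside e and o.
LONG_VOWELS = {_nfc(v) for v in ("ā", "ī", "ū", "ṝ", "ḹ", "e", "ai", "o", "au")}
SHORT_VOWELS = {_nfc(v) for v in ("a", "i", "u", "ṛ", "ḷ")}
DIPHTHONGS = {"ai", "au"}
SYLLABIC = {_nfc("ṛ"): "r", _nfc("ṝ"): "r", _nfc("ḷ"): "l", _nfc("ḹ"): "l"}


def vowel_flags(iast, position):
    """Return the lattice flags for one vowel given its IAST name."""
    v = _nfc(iast)
    if v not in LONG_VOWELS | SHORT_VOWELS:
        raise ValueError(f"not a Sanskrit vowel: {iast!r}")
    if position not in ("independent", "matra"):
        raise ValueError(f"unknown position: {position!r}")
    flags = {
        "vowel": True,
        "length": "long" if v in LONG_VOWELS else "short",
        "diphthong": v in DIPHTHONGS,
        "position": position,
    }
    if v in SYLLABIC:
        # Only vocalic r/l carry the syllabic flag.
        flags["syllabic"] = SYLLABIC[v]
    return flags
```

The `syllabic` key is emitted only for vocalic r/l, which matches the exemplar for ऋ in section 6.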
2) Consonants (by Varga / place & phonation)
Velars (ka-varga): क ka /k/, ख kha /kʰ/, ग ga /ɡ/, घ gha /ɡʱ/, ङ ṅa /ŋ/
Palatals (ca-varga): च ca /t͡ɕ ~ t͡ʃ/, छ cha /t͡ɕʰ ~ t͡ʃʰ/, ज ja /d͡ʑ ~ d͡ʒ/, झ jha /d͡ʑʱ ~ d͡ʒʱ/, ञ ña /ɲ/
Retroflex (ṭa-varga): ट ṭa /ʈ/, ठ ṭha /ʈʰ/, ड ḍa /ɖ/, ढ ḍha /ɖʱ/, ण ṇa /ɳ/
Dentals (ta-varga): त ta /t̪/, थ tha /t̪ʰ/, द da /d̪/, ध dha /d̪ʱ/, न na /n/
Labials (pa-varga): प pa /p/, फ pha /pʰ/, ब ba /b/, भ bha /bʱ/, म ma /m/
Semivowels: य ya /j/, र ra /r/ (tap), ल la /l/, व va /ʋ ~ v/
Sibilants: श śa /ɕ ~ ʃ/, ष ṣa /ʂ/, स sa /s/
Glottal: ह ha /ɦ/
Phonation flags:
{aspirated: true|false, voiced: true|false, retroflex: true|false, nasal: true|false}
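Because the five vargas form a regular 5×5 grid, the phonation flags for the 25 stops and nasals follow from a letter's row and column. A minimal sketch under that assumption; `VARGAS` and `stop_features` are hypothetical names, and the extra `place` key is added here for illustration.

```python
# Rows are places of articulation; columns are
# voiceless, voiceless aspirated, voiced, voiced aspirated, nasal.
VARGAS = {
    "velar": "कखगघङ",
    "palatal": "चछजझञ",
    "retroflex": "टठडढण",
    "dental": "तथदधन",
    "labial": "पफबभम",
}


def stop_features(letter):
    """Derive phonation flags for a varga consonant from its grid position."""
    if len(letter) != 1:
        raise ValueError("expected a single consonant codepoint")
    for place, row in VARGAS.items():
        col = row.find(letter)
        if col != -1:
            return {
                "place": place,
                "aspirated": col in (1, 3),
                "voiced": col >= 2,  # nasals count as voiced
                "nasal": col == 4,
                "retroflex": place == "retroflex",
            }
    raise KeyError(f"not a varga consonant: {letter!r}")
```

Semivowels, sibilants, and ह fall outside the grid and would need explicit entries.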
3) Core Diacritics & Operators
- Virāma (◌्): cancels the inherent a → forms conjuncts (e.g., क् + ष → क्ष)
- Anusvāra (◌ं): homorganic nasal (IAST ṃ), sandhi-sensitive
- Visarga (◌ः): voiceless post-vocalic aspiration [h] (IAST ḥ)
- Candrabindu (◌ँ): vowel nasalization [◌̃]
- Avagraha (ऽ): shows elision/aphaeresis in sandhi (e.g., ’stu for astu)
4) Conjuncts (Ligature Logic)
- Rule: C + VIRĀMA + C (+ VIRĀMA + C …) → conjunct cluster with script-specific ligature or stacked form.
- Examples:
- क्ष = क् + ष → kṣa
- ज्ञ = ज् + ञ → jña
- त्र = त् + र → tra
- श्र = श् + र → śra
Lattice:
{cluster: [C1,C2,...], conjunct: true} with a canonical Latin chain (IAST) and grapheme ID.
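The conjunct rule can be sketched as a single pass over codepoints that folds consonant + virāma + consonant runs into one cluster record. This is an illustrative sketch, not the module's implementation: `split_clusters` and the `sign` key are hypothetical, and the consonant test uses the whole Unicode range U+0915–U+0939, which also admits a few letters outside the Sanskrit inventory.

```python
VIRAMA = "\u094d"


def is_consonant(ch):
    # Assumption: any codepoint in U+0915..U+0939 is treated as a consonant.
    return "\u0915" <= ch <= "\u0939"


def split_clusters(text):
    """Group codepoints into consonant clusters and standalone signs."""
    units = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if is_consonant(ch):
            cluster = [ch]
            j = i + 1
            # Extend the cluster while a virāma is followed by a consonant.
            while j + 1 < n and text[j] == VIRAMA and is_consonant(text[j + 1]):
                cluster.append(text[j + 1])
                j += 2
            units.append({"cluster": cluster, "conjunct": len(cluster) > 1})
            i = j
        else:
            # Vowels, mātrās, a trailing virāma, and other signs pass through.
            units.append({"sign": ch})
            i += 1
    return units
```

For क्षेत्र this yields a two-consonant cluster, the mātrā े as a sign, and a second two-consonant cluster. A word-final virāma stays a separate sign so that a later stage can clear the inherent a.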
5) Sandhi Aware Layer (minimal operational set)
- Visarga-sandhi: vowel-following changes (e.g., namaḥ + astu → namo ’stu)
- Anusvāra-sandhi: nasal place assimilation to following stop (ṃ → [ŋ/ɲ/ɳ/n/m]).
- Vowel-sandhi: coalescence rules producing e/ai/o/au (e.g., a + i → e, a + u → o; classical Pāṇinian set).
Toggle:
{sandhi_mode: classical|disabled}; when disabled, spell strictly by orthography (no phonological rewrite).
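As an illustration of the toggle, the anusvāra rule can be sketched on IAST strings: in classical mode ṃ becomes the nasal of the following stop's varga, and in disabled mode the input is returned untouched. The function name `apply_anusvara_sandhi` is hypothetical, and the sketch only looks at the immediately following letter, so it does not cross spaces between words.

```python
import unicodedata


def _nfc(s):
    return unicodedata.normalize("NFC", s)


ANUSVARA = _nfc("ṃ")

# First letter of each stop (aspirates share it) -> homorganic nasal.
HOMORGANIC = {}
for nasal, stops in (("ṅ", "kg"), ("ñ", "cj"), ("ṇ", "ṭḍ"), ("n", "td"), ("m", "pb")):
    for stop in _nfc(stops):
        HOMORGANIC[stop] = _nfc(nasal)


def apply_anusvara_sandhi(iast, sandhi_mode="classical"):
    """Assimilate ṃ to the place of a directly following stop."""
    if sandhi_mode == "disabled":
        return iast  # spell strictly by orthography
    if sandhi_mode != "classical":
        raise ValueError(f"unknown sandhi_mode: {sandhi_mode!r}")
    text = _nfc(iast)
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == ANUSVARA and nxt in HOMORGANIC:
            out.append(HOMORGANIC[nxt])
        else:
            out.append(ch)
    return "".join(out)
```

Before sibilants, semivowels, and h the anusvāra is left as written, so saṃskṛta passes through unchanged.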
6) Latin-Chain (IAST) Mapping — Exemplars
# Independent vowel
glyph: "ऋ"
name: "Ṛ-vocalic"
latin_chain: ["ṛ"]
ipa: "r̩"
features: {vowel: true, syllabic: "r", position: "independent"}
# Dependent vowel sign (matra)
glyph: "◌ि"
name: "i-mātrā"
latin_chain: ["i"]
ipa: "i"
features: {vowel: true, position: "matra", prebase: true}
# Consonant with features
glyph: "ढ"
name: "ḍha"
latin_chain: ["ḍh"]
ipa: "ɖʱ"
features: {retroflex: true, voiced: true, aspirated: true}
# Conjunct example
glyph: "क्ष"
name: "kṣa"
latin_chain: ["k","ṣ","a"]
ipa: "kʂɐ"
features: {conjunct: true, cluster: ["क्","ष"]}
7) Orthographic Rules (Sanskrit-specific vs Hindi)
- Inherent vowel /a/ is generally pronounced in Sanskrit unless neutralized by virāma or sandhi; no systematic “schwa deletion” like Hindi.
- Long vowels (ā ī ū e o) are phonemic; e and o are historically long, so treat them as {length: long}.
- Vocalic r/l behave as syllabic nuclei; their long counterparts appear in classical texts but are rare in modern usage.
- Diacritic placement: ◌ि renders before the base consonant; ◌ु below; ◌े/◌ै/◌ो/◌ौ above/right.
8) Example Decompositions
- धर्मः (dharmaḥ) → ध /d̪ʱ/ + अ /a/ + र् /r/ + म /m/ + अ /a/ + ◌ः /h/ → /d̪ʱɐrmɐh/
- योग (yoga) → य /j/ + ो /oː/ + ग /ɡ/ + अ /a/ → /joːɡɐ/
- विद्या (vidyā) → वि (व + ि) /ʋi/ + द्य (द् + य) /d̪j/ + ा /aː/ → /ʋid̪jaː/
- क्षेत्र (kṣetra) → क्ष /kʂ/ + े /eː/ + त् /t̪/ + र /r/ + अ /a/ → /kʂeːt̪rɐ/
9) Lattice Integration Features (ELM/LLM-ready)
- {direction: LTR, type: abugida}
- {consonant: true|false, vowel: true|false}
- {inherent_a: true|false} (false if virāma in cluster tail)
- {matra: none|a|ā|i|ī|u|ū|ṛ|ṝ|ḷ|ḹ|e|ai|o|au}
- {retroflex|dental|palatal|velar|labial: true|false}
- {aspirated|voiced|nasal: true|false}
- {conjunct: true|false, cluster: […]}
- {sandhi_mode: classical|disabled}
10) Minimal API of Operations (for implementers)
- Tokenize codepoints; detect virāma chains → build clusters.
- Attach mātrās to the rightmost consonant of each cluster; handle ◌ि pre-base rendering.
- Emit IAST by cluster: consonant base(s) → IAST, plus inferred a unless virāma or a matra overrides.
- Apply optional sandhi if {sandhi_mode: classical}.
- Serialize both graphemic JSON and IAST string for cross-system comparability.
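The IAST emission step can be sketched as a small state machine: each consonant leaves an inherent a pending, which a virāma cancels, a mātrā replaces, and anything else flushes. This is a minimal sketch with hypothetical names (`to_iast` and the lookup tables); it covers only the inventory in this module, applies no sandhi, and renders candrabindu as m̐ as an assumption.

```python
import unicodedata

VIRAMA = "\u094d"

CONSONANTS = dict(zip(
    "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह",
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ t th d dh n "
    "p ph b bh m y r l v ś ṣ s h".split(),
))

# Dependent vowel signs, keyed by codepoint escape for clarity.
MATRAS = {
    "\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u", "\u0942": "ū",
    "\u0943": "ṛ", "\u0944": "ṝ", "\u0962": "ḷ", "\u0963": "ḹ",
    "\u0947": "e", "\u0948": "ai", "\u094b": "o", "\u094c": "au",
}

# Independent vowels and core signs.
OTHER = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
    "ऋ": "ṛ", "ॠ": "ṝ", "ऌ": "ḷ", "ॡ": "ḹ",
    "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
    "\u0902": "ṃ", "\u0903": "ḥ", "\u0901": "m̐", "\u093d": "'",
}


def to_iast(text):
    """Transliterate Devanāgarī to IAST, inferring the inherent a."""
    out = []
    pending_a = False  # True while a consonant still awaits its vowel
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch == VIRAMA:
            pending_a = False  # virāma cancels the inherent a
        elif ch in MATRAS:
            out.append(MATRAS[ch])  # mātrā overrides the inherent a
            pending_a = False
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(OTHER.get(ch, ch))  # unknown characters pass through
    if pending_a:
        out.append("a")
    return unicodedata.normalize("NFC", "".join(out))
```

Run on the words from section 8, this gives dharmaḥ, yoga, vidyā, and kṣetra. The pre-base rendering of ◌ि needs no special handling here, because Unicode stores the sign after its consonant in logical order.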
11) Edge Inventory (complete set for coverage)
- Independent vowels: अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ए ऐ ओ औ
- Consonants: क ख ग घ ङ | च छ ज झ ञ | ट ठ ड ढ ण | त थ द ध न | प फ ब भ म | य र ल व | श ष स ह
- Core signs: ◌ं ◌ः ◌ँ ◌् ऽ
- Digits: ० १ २ ३ ४ ५ ६ ७ ८ ९
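The inventory above can double as a coverage check: any Devanāgarī-block codepoint in an input text that is neither in the inventory nor a mātrā from section 1 is reported. A minimal sketch; `missing_from_inventory` is a hypothetical name, and the block range U+0900–U+097F is taken from Unicode.

```python
DEVANAGARI_BLOCK = range(0x0900, 0x0980)

INVENTORY = {
    "independent_vowels": "अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ए ऐ ओ औ".split(),
    "consonants": (
        "क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न "
        "प फ ब भ म य र ल व श ष स ह"
    ).split(),
    # anusvāra, visarga, candrabindu, virāma, avagraha
    "core_signs": ["\u0902", "\u0903", "\u0901", "\u094d", "\u093d"],
    "digits": [chr(0x0966 + d) for d in range(10)],
}

# Dependent vowel signs from the vowel table (not listed in the inventory).
MATRAS = [chr(cp) for cp in (
    0x093E, 0x093F, 0x0940, 0x0941, 0x0942, 0x0943, 0x0944,
    0x0962, 0x0963, 0x0947, 0x0948, 0x094B, 0x094C,
)]

KNOWN = {ch for group in INVENTORY.values() for ch in group} | set(MATRAS)


def missing_from_inventory(text):
    """Return Devanāgarī codepoints in text that the module does not cover."""
    return sorted(
        {ch for ch in text if ord(ch) in DEVANAGARI_BLOCK and ch not in KNOWN}
    )
```

Punctuation such as the daṇḍa (।) and Hindi-only signs such as the nukta would be reported, since neither appears in the inventory.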
✅ Mint Status: Sanskrit Graphemic Module (SGM v1.0) is fully minted—IAST-anchored, sandhi-aware, with complete vowel/consonant inventories, diacritic logic, conjunct handling, and a precise lattice interface.
Mint Ledger — Sanskrit Added
Fully Minted: Latin GM (English, Spanish, Portuguese, Romanian, Polish, German, French, Italian, Hungarian, Swahili, Hausa, Zulu, Yoruba, Tagalog/Filipino + Baybayin, Jamaican Patois, Macanese Patuá) • Non-Latin GM (Chinese radicals, Japanese Hiragana/Katakana + core Kanji, Sanskrit, Hindi, Russian, Aramaic, Hebrew, Syriac, Arabic, Urdu)