Romanian Graphemic Module (LGM v1.0)

Alphabet (31 core letters)

For each: Glyph → Name → Latin chain → Phoneme (IPA) → Notes

A → “a” → a → /a/
Ă → “ă” (a-breve) → a + ˘ → /ə/ → mid-central (schwa), very common
Â → “â” (a-circumflex) → a + ˆ → /ɨ/ → central high unrounded (see rule with Î)
B → “be” → b → /b/
C → “ce” → c → /k/ by default (before a, ă, â, î, o, u, consonants, and word-finally); /t͡ʃ/ before e, i (see digraph rule)
D → “de” → d → /d/
E → “e” → e → /e/; /je/ word-initially in a few native words (e.g., el, este)
F → “ef” → f → /f/
G → “ghe” → g → /g/ by default (before a, ă, â, î, o, u, consonants, and word-finally); /d͡ʒ/ before e, i (see digraph rule)
H → “haș” → h → /h/
I → “i” → i → /i/ or palatalization marker on preceding consonant (see rules)
Î → “î” (i-circumflex) → i + ˆ → /ɨ/ → used at word beginning/end (see orthographic rule)
J → “je” → j → /ʒ/
K* → “ka” → k → /k/ → loans & symbols
L → “el” → l → /l/
M → “em” → m → /m/
N → “en” → n → /n/
O → “o” → o → /o/
P → “pe” → p → /p/
Q* → “ku” → q → /k/ (usually qu=/k/ before e,i) → loans & names
R → “er” → r → /r/ (alveolar trill)
S → “es” → s → /s/
Ș → “ș” (s-comma) → s + , → /ʃ/ → correct modern form (not Ş/cedilla)
T → “te” → t → /t/
Ț → “ț” (t-comma) → t + , → /t͡s/ → correct modern form (not Ţ/cedilla)
U → “u” → u → /u/
V → “ve” → v → /v/
W* → “dublu v” → w → /v/ or /w/ in loans
X* → “ics” → x → /ks/ or /gz/ by context → loans/science
Y* → “igrec” → y → /j/ or vowel quality in loans (rare)
Z → “ze” → z → /z/

*K/Q/W/X/Y are official but used primarily for loanwords, names, and technical symbols. Core native morphology runs on the other 26 letters (21 plain Latin letters plus the 5 diacritic letters Ă, Â, Î, Ș, Ț).


Multi-letter grapheme rules (engine essentials)

Treat these as orthography → phoneme rules with explicit conditions; they’re not separate letters.

Hard vs. soft C/G before front vowels

  • c + e/i → /t͡ʃ/ (e.g., cer /t͡ʃer/, cinci /t͡ʃint͡ʃʲ/)
  • g + e/i → /d͡ʒ/ (e.g., ger /d͡ʒer/, gigant /d͡ʒiˈgant/)
  • To keep hard values before e/i:
    • che / chi → /ke/ /ki/
    • ghe / ghi → /ge/ /gi/
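The hard/soft rule above can be sketched as a single lookahead on the letters that follow c or g. The function name `cg_phoneme` and its signature are assumptions made for illustration; the sketch covers only this rule and ignores the palatalization of a following final i.

```python
FRONT_VOWELS = {"e", "i"}

SOFT_C = "t\u0361\u0283"  # t͡ʃ
SOFT_G = "d\u0361\u0292"  # d͡ʒ


def cg_phoneme(word: str, idx: int) -> str:
    """Return the phoneme of the c or g at word[idx] under the hard/soft rule."""
    letter = word[idx].lower()
    if letter not in ("c", "g"):
        raise ValueError("cg_phoneme expects the letter c or g at idx")
    hard, soft = ("k", SOFT_C) if letter == "c" else ("g", SOFT_G)
    rest = word[idx + 1:].lower()
    # che/chi and ghe/ghi: the h only marks that the hard value is kept.
    if rest[:1] == "h" and rest[1:2] in FRONT_VOWELS:
        return hard
    # Plain c/g directly before e or i takes the soft (affricate) value.
    if rest[:1] in FRONT_VOWELS:
        return soft
    # Everywhere else (back vowels, consonants, end of word) the value is hard.
    return hard
```

For example, `cg_phoneme("gigant", 0)` takes the soft branch and `cg_phoneme("gigant", 2)` the hard one, matching /d͡ʒiˈgant/.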

Palatalization via final/medial “i”

  • Consonant + i may palatalize the consonant without adding a full vowel syllable (e.g., lupi /ˈlupʲ/, mici /mit͡ʃʲ/).
  • Model with feature flag: palatal=true when i is non-syllabic.
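One way to set the flag is a purely orthographic heuristic for word-final i, with a lexicon override for words whose final i is syllabic. This is a rough sketch under that assumption: `final_i_is_palatal` and the `syllabic_final_i` override set are hypothetical names, and medial cases are not handled here.

```python
VOWELS = set("aăâeiîou")


def final_i_is_palatal(word: str, syllabic_final_i: frozenset = frozenset()) -> bool:
    """Guess palatal=true for a word-final i that follows a consonant.

    `syllabic_final_i` is an assumed lexicon-supplied set of words whose
    final i is a full vowel and must not be treated as palatalization.
    """
    w = word.lower()
    if w in syllabic_final_i:
        return False
    return len(w) >= 2 and w[-1] == "i" and w[-2] not in VOWELS
```

With no overrides, `final_i_is_palatal("lupi")` and `final_i_is_palatal("mici")` are true, while a word ending in a vowel plus i, such as copii, is left unflagged.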

Diphthongs & vowel clusters (graphemic sequences, not letters)

  • ea, ia, ie, io, iu, oa, ua, ue appear frequently; treat as sequences with predictable glide onsets (e.g., fie /ˈfje/, poate /ˈpo̯ate/).

Stress

  • Stress is not marked orthographically and is lexically determined. Keep it in the lexicon layer, not the grapheme layer.

Orthographic rules (â vs î; ș/ț with comma)

  • /ɨ/ is written â inside words and î at word boundaries (beginning or end).
    • român (inside) → â; înger (initial) → î; a urî (final) → î.
    • Maintain exceptions in the lexicon (e.g., prefixed derivatives such as reîncepe keep î word-internally); the rule is productive for generation.
  • Use Ș/Ț (comma below) exclusively. Legacy Ş/Ţ (cedilla) are to be normalized on import.
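Both rules above are mechanical enough to sketch in a few lines. The function names `normalize_comma_below` and `spell_high_central` are hypothetical, and the sketch assumes that applying Unicode NFC first is acceptable for the importer, so that s or t followed by a combining cedilla or combining comma is handled as well. Lexical exceptions to the â/î rule are deliberately left to the lexicon.

```python
import unicodedata

# Legacy cedilla code points mapped to the comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})


def normalize_comma_below(text: str) -> str:
    """Compose to NFC, then replace cedilla forms with comma-below forms."""
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)


def spell_high_central(position: int, length: int) -> str:
    """Choose the letter for /ɨ/ at a 0-based letter position in a word.

    Word-initial and word-final positions get î, all others get â.
    """
    at_boundary = position == 0 or position == length - 1
    return "\u00EE" if at_boundary else "\u00E2"  # î or â
```

For român (five letters, /ɨ/ at position 3) the second function picks â; for înger (position 0) and urî (last position) it picks î.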

LGM entry template

type: grapheme
glyph: "Ț"
name: "ț"
latin_chain: ["t", "comma_below"]
phoneme: "t͡s"
features: {affricate: true, place: "alveolar"}
notes: "Modern Romanian uses comma-below; normalize from cedilla on ingest."
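For engines that load entries into memory, the template could map onto a small record type. This is only a sketch: the class name `GraphemeEntry` and the cedilla check in `__post_init__` are assumptions, while the field names follow the template above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Legacy cedilla code points that should not survive ingest.
_CEDILLA_FORMS = {"\u015E", "\u015F", "\u0162", "\u0163"}


@dataclass
class GraphemeEntry:
    """Hypothetical in-memory form of one LGM entry."""
    glyph: str
    name: str
    latin_chain: Tuple[str, ...]
    phoneme: str
    features: Dict[str, object] = field(default_factory=dict)
    notes: str = ""
    type: str = "grapheme"

    def __post_init__(self) -> None:
        # Reject entries that were not normalized to comma-below.
        if self.glyph in _CEDILLA_FORMS:
            raise ValueError("cedilla glyph found; normalize to comma-below first")


t_comma = GraphemeEntry(
    glyph="\u021A",                   # Ț
    name="\u021B",                    # ț
    latin_chain=("t", "comma_below"),
    phoneme="t\u0361s",               # t͡s
    features={"affricate": True, "place": "alveolar"},
    notes="Modern Romanian uses comma-below; normalize from cedilla on ingest.",
)
```

Rejecting the cedilla glyph at construction time keeps the normalization requirement from the notes field enforceable rather than advisory.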

Minimal contrast grid (so engines don’t collapse phonemes)

  • /s ~ ʃ/ → s vs ș (si vs și: /si/ vs /ʃi/)
  • /t ~ t͡s/ → t vs ț (tu /tu/ vs țuică /ˈt͡sujkə/)
  • /i (vowel) ~ palatalization/ → i syllabic vs non-syllabic (flag)
  • /k ~ t͡ʃ/ front-vowel contexts → chi/che (/k/) vs ci/ce (/t͡ʃ/)
  • /g ~ d͡ʒ/ front-vowel contexts → ghi/ghe (/g/) vs gi/ge (/d͡ʒ/)
  • /ɨ ~ i ~ ə/ → â/î vs i vs ă (keep all three distinct)
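The grid can double as a regression check on a grapheme-to-phoneme table. The sketch below is illustrative only: `DEFAULT_PHONEME`, `CONTRASTS`, and `collapsed_contrasts` are hypothetical names, and the table covers just the single-letter rows of the grid (the context-dependent c/g rows need rule-level tests instead).

```python
# Assumed default letter -> phoneme values for the single-letter contrasts.
DEFAULT_PHONEME = {
    "s": "s", "ș": "ʃ",
    "t": "t", "ț": "t\u0361s",
    "â": "ɨ", "î": "ɨ",
    "i": "i", "ă": "ə",
}

# Letter pairs that must never map to the same phoneme.
CONTRASTS = [("s", "ș"), ("t", "ț"), ("â", "i"), ("î", "i"), ("i", "ă"), ("â", "ă")]


def collapsed_contrasts(table, contrasts):
    """Return the pairs whose two letters map to the same phoneme in `table`."""
    return [(a, b) for a, b in contrasts if table[a] == table[b]]
```

An empty result means the table keeps every listed contrast; any returned pair names a collapse to fix.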

Example decompositions (literal → morphemic hint)

  1. român → r + o + m + â(/ɨ/) + n → “Romanian (person)”
  2. înger → î(/ɨ/) + n + g + e + r → “angel”
  3. cer → c + e + r → /t͡ʃ + e + r/ → “sky”
  4. gheață → gh + e + a + ț + ă → /g + e̯a + t͡s + ə/ (e + a form one diphthong) → “ice”
  5. știință → ș + t + i + i + n + ț + ă → /ʃtiˈint͡sə/ → “science” (both i are syllabic, so palatal=false)
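The literal decompositions above can be produced by a small left-to-right splitter that treats ch and gh as one unit only before e or i. This is a sketch under that single assumption; `split_graphemes` is a hypothetical name, and vowel sequences are left as separate letters for a later layer to group.

```python
import unicodedata
from typing import List


def split_graphemes(word: str) -> List[str]:
    """Split a word into letters, keeping ch/gh together before e or i."""
    w = unicodedata.normalize("NFC", word.lower())
    units: List[str] = []
    i = 0
    while i < len(w):
        is_hard_digraph = (
            w[i] in ("c", "g")
            and w[i + 1:i + 2] == "h"
            and w[i + 2:i + 3] in ("e", "i")
        )
        if is_hard_digraph:
            units.append(w[i:i + 2])  # "ch" or "gh" as one unit
            i += 2
        else:
            units.append(w[i])
            i += 1
    return units
```

Applied to gheață this yields gh, e, a, ț, ă, and applied to cer it yields c, e, r, in line with examples 4 and 3.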