U-Family
Design goal: Make Latin script the master key that indexes every cross-script grapheme, style, and usage for the U family — tying together Unicode, graphemes, morphemes → lexemes, semantics (etymology × morphology), and pragmatics (context & convention) so disparate disciplines can converge and communicate.
A) Latin master key (index) — “U”
Canonical key: U / u (ASCII)
Primary phonemes (IPA): /u uː ʊ y/ (language-dependent)
A1. Core Latin code points (selected, extensible)
| U+ | Char | Unicode Name | Fold→Base | Feature tags (examples) |
|---|---|---|---|---|
| U+0055 | U | LATIN CAPITAL LETTER U | U | latin |
| U+0075 | u | LATIN SMALL LETTER U | u | latin |
| U+00D9/00F9 | Ù/ù | … WITH GRAVE | u | diacritic:grave |
| U+00DA/00FA | Ú/ú | … WITH ACUTE | u | diacritic:acute |
| U+00DB/00FB | Û/û | … WITH CIRCUMFLEX | u | diacritic:circumflex |
| U+00DC/00FC | Ü/ü | … WITH DIAERESIS | u | diacritic:diaeresis |
| U+0168/0169 | Ũ/ũ | … WITH TILDE | u | nasal (lang-specific) |
| U+016A/016B | Ū/ū | … WITH MACRON | u | length:long (ū) |
| U+016C/016D | Ŭ/ŭ | … WITH BREVE | u | length:short |
| U+016E/016F | Ů/ů | … WITH RING ABOVE | u | lang:cs |
| U+0170/0171 | Ű/ű | … WITH DOUBLE ACUTE | u | lang:hu |
| U+01AF/01B0 | Ư/ư | … WITH HORN | u | lang:vi |
| U+01D3/01D4 | Ǔ/ǔ | … WITH CARON | u | tone/phonology |
| U+1EE4/1EE5 | Ụ/ụ | … WITH DOT BELOW | u | lang:vi/niger-congo |
| U+01D5/01D7/01D9/01DB | Ǖ/Ǘ/Ǚ/Ǜ | … WITH DIAERESIS AND MACRON/ACUTE/CARON/GRAVE (capitals; lowercase at U+01D6/01D8/01DA/01DC) | u | stacked:diaeresis+… |
Normalization: all “styled/diacritic” forms retain metadata but fold to base u for ASCII-keying; language rules decide whether to keep length/diacritic features in downstream morphology.
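The folding rule above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not a definitive implementation: the function name `fold_u` is hypothetical, and it assumes every styled form has a canonical (NFD) decomposition into a base letter plus combining marks, which holds for the code points in the table above.

```python
import unicodedata

def fold_u(ch: str) -> tuple[str, list[str]]:
    """Fold one styled Latin letter to its ASCII key, keeping the marks as metadata.

    Hypothetical helper: decomposes with NFD, separates the base letter from
    the combining marks, and lowercases the base to obtain the key.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]
    return base.lower(), marks

for ch in "UÜŪưǙ":
    print(ch, fold_u(ch))
```

The mark names returned here are Unicode character names; mapping them to feature tags such as diacritic:diaeresis or length:long would be a separate, language-aware step.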
B) Cross-script grapheme set, keyed by Latin “U”
B1. Greek (Grek) — Upsilon
| U+ | Char | Name | Token | Core phonemes | Notes |
|---|---|---|---|---|---|
| U+03A5 | Υ | GREEK CAPITAL LETTER UPSILON | :U.grek:Upsilon: | Ancient /u/ → later /y/ | lineage to Latin V/U/Y |
| U+03C5 | υ | GREEK SMALL LETTER UPSILON | :U.grek:upsilon: | /y/ (Koine/Attic), /i/ in Modern | |
| U+038E | Ύ | CAPITAL UPSILON WITH TONOS | :U.grek:Upsilon:accent | stress mark | Modern Greek |
| U+03B0 | ΰ | SMALL UPSILON DIALYTIKA TONOS | :U.grek:upsilon:diaeresis+accent | prevents diphthong | |
| U+03D2 | ϒ | UPSILON WITH HOOK SYMBOL | :U.grek:Upsilon:symbol | physics typography | variant |
Word spellings (Latin → token): upsilon, UPSILON, Y (translit contexts) → :U.grek:upsilon: (do not Latinize glyphs; keep tokens).
B2. Hebrew (Hebr) — Vav/Waw ו
| U+ | Char | Name | Token | Roles | Notes |
|---|---|---|---|---|---|
| U+05D5 | ו | HEBREW LETTER VAV | :U.hebr:Vav: | consonant /v/; mater lectionis for /u o/ | |
| U+05D5 U+05BC | וּ | VAV + DAGESH = SHURUK (long ū) | :U.hebr:Vav:shuruk | vowel ū | niqqud combining |
| U+05D5 U+05B9 | וֹ | VAV + HOLAM (ō mater) | :U.hebr:Vav:holam | vowel ō | |
| U+05D5 U+05B0 | וְ | VAV + SHEVA | :U.hebr:Vav:conj | conjunction “and” /və/ (morpheme) | “vav-consecutive” pragmatics in Tanakh |
Latin spellings: vav, waw → :U.hebr:Vav: (aliases kept).
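Because the pointed forms are two-code-point sequences (vav plus a combining point), a tokenizer has to look one character ahead. The sketch below shows that lookahead under simplifying assumptions: the names `tag_vav` and `POINTED_VAV` are hypothetical, the point is assumed to follow the vav directly, and the labels are taken at face value even though vav + U+05BC can also be a doubled consonant and vav + sheva is only the conjunction word-initially.

```python
VAV = "\u05D5"

# Combining point -> token, as in the table above (labels are heuristic).
POINTED_VAV = {
    "\u05BC": ":U.hebr:Vav:shuruk",  # DAGESH OR MAPIQ point, read here as shuruk
    "\u05B9": ":U.hebr:Vav:holam",   # HOLAM
    "\u05B0": ":U.hebr:Vav:conj",    # SHEVA
}

def tag_vav(text: str) -> list[tuple[str, str]]:
    """Return (surface, token) pairs for every vav in text, preferring pointed forms."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == VAV:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in POINTED_VAV:
                # Two-code-point match: keep vav and its point together.
                out.append((text[i:i + 2], POINTED_VAV[nxt]))
                i += 2
                continue
            out.append((VAV, ":U.hebr:Vav:"))
        i += 1
    return out

# shalom, fully pointed: shin, shin dot, qamats, lamed, vav, holam, final mem
print(tag_vav("\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"))
```

For unpointed text every vav falls through to the bare :U.hebr:Vav: token, so role disambiguation has to come from a lexicon or context.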
B3. Arabic (Arab) — Wāw و
| U+ | Char | Name | Token | Roles | Notes |
|---|---|---|---|---|---|
| U+0648 | و | ARABIC LETTER WAW | :U.arab:Waw: | consonant /w/; long ū | long vowel via preceding ḍamma |
| U+064F | ُ | ARABIC DAMMA | :U.arab:damma | short /u/ | diacritic (combining) |
| U+0624 | ؤ | WAW WITH HAMZA ABOVE | :U.arab:Waw:hamza | /ʔu/ context | orthographic |
| U+0648 (Perso/Urdu use) | و | ARABIC LETTER WAW | :U.arab:Waw: | /u o v w/ | language-specific pragmatics |
Latin spellings: waw, wau, vāw → :U.arab:Waw:.
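The note that waw reads as long ū after a ḍamma can be turned into a small heuristic. This is a sketch that assumes fully vocalized Arabic text; the function name `waw_role` and the rule that a waw carrying its own vowel mark or shadda stays consonantal are illustrative assumptions, not a complete orthographic model.

```python
WAW, DAMMA = "\u0648", "\u064F"

# Marks that, when sitting on the waw itself, indicate a consonantal reading:
# tanwin (U+064B..U+064D), fatha, damma, kasra, shadda. Sukun is excluded.
OWN_MARKS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651")

def waw_role(word: str, i: int) -> str:
    """Classify the waw at index i of a vocalized word (heuristic)."""
    if word[i] != WAW:
        raise ValueError("index does not point at waw")
    prev = word[i - 1] if i > 0 else ""
    nxt = word[i + 1] if i + 1 < len(word) else ""
    if prev == DAMMA and nxt not in OWN_MARKS:
        return ":U.arab:Waw:long"   # damma + bare waw: long u
    return ":U.arab:Waw:"           # otherwise treated as consonant /w/

nur = "\u0646\u064F\u0648\u0631"    # nun, damma, waw, ra (nur, "light")
huwa = "\u0647\u064F\u0648\u064E"   # ha, damma, waw, fatha (huwa, "he")
print(waw_role(nur, 2), waw_role(huwa, 2))
```

Unvocalized text gives the heuristic nothing to work with, so in that case the bare token is returned and the decision is left to a lexicon.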
B4. Sanskrit/Hindi (Deva) — उ / ऊ; vowel signs ु / ू
| U+ | Char | Name | Token | Phoneme | Role |
|---|---|---|---|---|---|
| U+0909 | उ | DEVANAGARI LETTER U | :U.deva:U: | /u/ | independent vowel |
| U+090A | ऊ | DEVANAGARI LETTER UU | :U.deva:UU: | /uː/ | independent vowel |
| U+0941 | ु | DEVANAGARI VOWEL SIGN U | :U.deva:sign-U | /u/ | matra (combining) |
| U+0942 | ू | DEVANAGARI VOWEL SIGN UU | :U.deva:sign-UU | /uː/ | matra (combining) |
Latin spellings (IAST/ISO 15919): u → :U.deva:U: / sign-U; ū → :U.deva:UU: / sign-UU.
C) Morphology → Lexeme semantics (how U behaves inside words)
| Script | Morpheme pattern | Lexeme example | Gloss | Semantic note |
|---|---|---|---|---|
| Greek | ὑ- (hypo- with upsilon) | ὑπό → hypo- | “under, sub-” | upsilon feeds English “hypo-/hyper-” pair |
| Hebrew | ו־ prefix | וְאָמַר | “and (he) said” | vav as clausal connector; under the “vav-consecutive” (also called “vav-conversive”) reading, the prefixed verb shifts tense/aspect in narrative |
| Arabic | case/short vowel -u | كِتَابُ (kitābu) | nominative ending | damma marks short /u/; long ū often by <و> |
| Sanskrit/Devanagari | length contrast u/ū | गुरु guru vs भू bhū (root) | heavy/light syllables | vowel length is morphophonemic; drives meter and derivation |
| Latin | U/V alternation | SERVVS → SERVUS → U/V split | letterform history | Latin “V” served u and v; later split into U, V, W |
D) Etymology (lineage & convergence)
Phoenician 𐤅 waw → (via Aramaic) Hebrew ו / Arabic و; in parallel, Phoenician 𐤅 waw → Greek Υ (upsilon) → Latin V → later split into U, V, W, with Y borrowed separately from Greek Υ.
Indo-Aryan “U” (Devanagari उ/ऊ) descends independently via Brāhmī (e.g., 𑀉), yet converges functionally on /u/ ~ /uː/.
E) Pragmatics (context rules)
- Transliteration:
  - Greek → Latin: Υ/υ is usually y in classical loans (hyper), but u/i in Modern Greek romanization; choose a system (ISO 843, BGN/PCGN).
  - Hebrew: Vav can be consonant v or vowel carrier u/o; niqqud disambiguates; many modern texts omit niqqud → use lexicon/context.
  - Arabic: و = /w/ consonant or long ū; damma marks short u; system: DIN 31635 / ALA-LC.
  - Sanskrit/Hindi: in IAST/ISO 15919, ū (macron) is phonemic; never drop the macron in scholarly modes.
- Normalization policy: keep the Latin master key (u); attach script+role as features (e.g., :U.arab:Waw:long, :U.hebr:Vav:conj).
- Search/compare: compare at the key level (key = 'u'); refine by script, length, or role if required.
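The two-level comparison (match on the key first, then optionally refine) might look like the sketch below. Everything here is illustrative: the `Entry` record, the tiny `INDEX`, the `same` function, and the script code "latn" for Latin are assumptions made for the example, not part of the spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    surface: str
    key: str                      # Latin master key
    script: str                   # lowercase script tag, as in the tokens
    features: frozenset = frozenset()

# Hypothetical mini-index; a real one would be generated from the spec.
INDEX = {
    "u": Entry("u", "u", "latn"),
    "\u016B": Entry("\u016B", "u", "latn", frozenset({"length:long"})),  # u with macron
    "\u03C5": Entry("\u03C5", "u", "grek"),                              # upsilon
    "\u05D5": Entry("\u05D5", "u", "hebr"),                              # vav
    "\u0648": Entry("\u0648", "u", "arab"),                              # waw
    "\u090A": Entry("\u090A", "u", "deva", frozenset({"length:long"})),  # Devanagari UU
}

def same(a: str, b: str, refine: tuple = ()) -> bool:
    """Compare at key level; every field named in refine must also agree."""
    ea, eb = INDEX[a], INDEX[b]
    if ea.key != eb.key:
        return False
    return all(getattr(ea, field) == getattr(eb, field) for field in refine)

print(same("u", "\u03C5"))                              # key-level match
print(same("u", "\u03C5", refine=("script",)))          # refined by script
print(same("\u016B", "\u090A", refine=("features",)))   # refined by length feature
```

Keeping refinement optional means a coarse search can still find every member of the family, while a stricter query narrows to one script or one length value.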
F) Machine-readable spec (drop-in)
{
"ULR": {
"U": {
"key": "u",
"latin": {
"base": ["U+0055","U+0075"],
"diacritics": [
"U+00D9","U+00F9","U+00DA","U+00FA","U+00DB","U+00FB","U+00DC","U+00FC",
"U+0168","U+0169","U+016A","U+016B","U+016C","U+016D","U+016E","U+016F",
"U+0170","U+0171","U+01AF","U+01B0","U+01D3","U+01D4","U+1EE4","U+1EE5",
"U+01D5","U+01D7","U+01D9","U+01DB"
],
"fold": "u"
},
"grek": {
"name": "upsilon",
"cap": {"cp":"U+03A5","token":":U.grek:Upsilon:"},
"low": {"cp":"U+03C5","token":":U.grek:upsilon:"},
"variants": [
{"cp":"U+038E","feat":"accent"},
{"cp":"U+03B0","feat":"diaeresis+accent"},
{"cp":"U+03D2","feat":"symbol"}
],
"aliases": ["upsilon","UPSILON","y (classical translit)"]
},
"hebr": {
"name": "vav",
"base": {"cp":"U+05D5","token":":U.hebr:Vav:"},
"niqqud": [
{"seq":["U+05D5","U+05BC"],"token":":U.hebr:Vav:shuruk","value":"ū"},
{"seq":["U+05D5","U+05B9"],"token":":U.hebr:Vav:holam","value":"ō"},
{"seq":["U+05D5","U+05B0"],"token":":U.hebr:Vav:conj","value":"and"}
],
"aliases": ["vav","waw"]
},
"arab": {
"name": "waw",
"base": {"cp":"U+0648","token":":U.arab:Waw:"},
"diacritics": [
{"cp":"U+064F","token":":U.arab:damma","value":"u"},
{"cp":"U+0624","token":":U.arab:Waw:hamza","value":"ʔu"}
],
"aliases": ["waw","wau","vāw"]
},
"deva": {
"name": "u/uu",
"independent": [
{"cp":"U+0909","token":":U.deva:U:","value":"u"},
{"cp":"U+090A","token":":U.deva:UU:","value":"ū"}
],
"matras": [
{"cp":"U+0941","token":":U.deva:sign-U","attach":"consonant"},
{"cp":"U+0942","token":":U.deva:sign-UU","attach":"consonant"}
],
"aliases": ["u","ū"]
},
"etymology": [
"Phoenician 𐤅 waw",
"Phoenician waw → (via Aramaic) Hebrew ו / Arabic و (waw/vav)",
"Phoenician waw → Greek Υ upsilon",
"Greek Υ → Latin V → split into U, V, W; Y borrowed separately from Greek Υ",
"Indo-Aryan: Brāhmī 𑀉 → Devanagari उ/ऊ"
],
"phonology": {"values":["/u/","/uː/","/ʊ/","/y/"],"dependent_on":"language"},
"morphology": [
{"lang":"hebrew","pattern":"ו־","role":"conjunction","gloss":"and"},
{"lang":"arabic","pattern":"damma ُ","role":"short vowel","gloss":"u"},
{"lang":"sanskrit","pattern":"ū","role":"length contrast","gloss":"long u"}
],
"pragmatics": {
"transliteration_systems": {
"greek":"ISO 843 / BGN-PCGN",
"hebrew":"ISO 259 / SBL",
"arabic":"DIN 31635 / ALA-LC",
"sanskrit_hindi":"IAST / ISO 15919"
},
"policy":"Latin key = 'u'; preserve script tokens; fold diacritics to features"
}
}
}
}
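One way to consume the spec is to walk it and build a reverse lookup from native glyph to script token. The sketch below uses a trimmed copy of the JSON so it stays self-contained; `cp_to_char` and `build_lookup` are hypothetical helper names, and only objects that carry both "cp" and "token" are indexed, so multi-code-point "seq" entries and feature-only variants are deliberately skipped.

```python
import json

# Trimmed fragment of the spec above, embedded so the example runs on its own.
SPEC_FRAGMENT = """
{"ULR": {"U": {
  "key": "u",
  "grek": {"cap": {"cp": "U+03A5", "token": ":U.grek:Upsilon:"},
           "low": {"cp": "U+03C5", "token": ":U.grek:upsilon:"},
           "variants": [{"cp": "U+038E", "feat": "accent"}]},
  "hebr": {"base": {"cp": "U+05D5", "token": ":U.hebr:Vav:"}},
  "arab": {"base": {"cp": "U+0648", "token": ":U.arab:Waw:"}}
}}}
"""

def cp_to_char(cp: str) -> str:
    """Convert a 'U+XXXX' string to the character it names."""
    return chr(int(cp[2:], 16))

def build_lookup(node, table=None):
    """Recursively collect {char: token} from every object with both 'cp' and 'token'."""
    table = {} if table is None else table
    if isinstance(node, dict):
        if "cp" in node and "token" in node:
            table[cp_to_char(node["cp"])] = node["token"]
        for value in node.values():
            build_lookup(value, table)
    elif isinstance(node, list):
        for item in node:
            build_lookup(item, table)
    return table

spec = json.loads(SPEC_FRAGMENT)
LOOKUP = build_lookup(spec["ULR"]["U"])
print(LOOKUP["\u03C5"])
```

Because the walk is structural rather than tied to particular script names, adding a new script block to the spec would extend the lookup without code changes.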
G) How to use the ULR (U-family)
- Index by Latin key u.
- Map surface (any script) → script token (e.g., :U.arab:Waw:) without glyph loss.
- Attach features (length, accent, niqqud, case diacritics) as metadata.
- Interpret meaning with morphology (e.g., Hebrew ו־ as and; Arabic damma as short /u/; Sanskrit ū as long vowel).
- Render or reason per context (typeset as native glyphs, romanize, or compute phono-semantic profiles).
Closing thought
One Latin key, many doors: U threads from waw to upsilon, from unpointed Hebrew to Devanagari matras. By pinning everything to a single Latin index and layering script tokens + feature tags, we keep form, function, and meaning intact — and we make interdisciplinary dialogue mechanically interoperable.
Letterscope – SolveForce Communications