Universal Letter Record (ULR)

U-Family

Design goal: Use the Latin letter as the master key that indexes every cross-script grapheme, style, and usage in the U family. The record ties together Unicode code points, graphemes, morphemes → lexemes, semantics (etymology × morphology), and pragmatics (context and convention), so that different disciplines can refer to the same entry and exchange data about it.


A) Latin master key (index) — “U”

Canonical key: U / u (ASCII)
Primary phonemes (IPA): /u uː ʊ y/ (language-dependent)

A1. Core Latin code points (selected, extensible)

| U+ | Char | Unicode Name | Fold→Base | Feature tags (examples) |
|---|---|---|---|---|
| U+0055 | U | LATIN CAPITAL LETTER U | U | latin |
| U+0075 | u | LATIN SMALL LETTER U | u | latin |
| U+00D9/00F9 | Ù/ù | … WITH GRAVE | u | diacritic:grave |
| U+00DA/00FA | Ú/ú | … WITH ACUTE | u | diacritic:acute |
| U+00DB/00FB | Û/û | … WITH CIRCUMFLEX | u | diacritic:circumflex |
| U+00DC/00FC | Ü/ü | … WITH DIAERESIS | u | diacritic:diaeresis |
| U+0168/0169 | Ũ/ũ | … WITH TILDE | u | nasal (lang-specific) |
| U+016A/016B | Ū/ū | … WITH MACRON | u | length:long (ū) |
| U+016C/016D | Ŭ/ŭ | … WITH BREVE | u | length:short |
| U+016E/016F | Ů/ů | … WITH RING ABOVE | u | lang:cs |
| U+0170/0171 | Ű/ű | … WITH DOUBLE ACUTE | u | lang:hu |
| U+01AF/01B0 | Ư/ư | … WITH HORN | u | lang:vi |
| U+01D3/01D4 | Ǔ/ǔ | … WITH CARON | u | tone/phonology |
| U+1EE4/1EE5 | Ụ/ụ | … WITH DOT BELOW | u | lang:vi/niger-congo |
| U+01D5/01D7/01D9/01DB | Ǖ/Ǘ/Ǚ/Ǜ | DIAERESIS+MACRON/ACUTE/CARON/GRAVE | u | stacked:diaeresis+… |

Normalization: all “styled/diacritic” forms retain metadata but fold to base u for ASCII-keying; language rules decide whether to keep length/diacritic features in downstream morphology.
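The folding rule above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not the ULR's own implementation: the function name `fold_to_key` and the exact feature-tag spelling are assumptions, and the tags are derived from Unicode combining-mark names rather than from the tag vocabulary in table A1.

```python
import unicodedata


def fold_to_key(ch: str) -> tuple[str, list[str]]:
    """Fold one Latin U-family character to its ASCII key, keeping marks as features.

    Hypothetical helper: relies on canonical decomposition (NFD), so it only
    works for characters that decompose to a base letter plus combining marks.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    features = ["case:upper" if base.isupper() else "case:lower"]
    for mark in marks:
        # e.g. "COMBINING GRAVE ACCENT" -> "grave-accent"
        name = unicodedata.name(mark).removeprefix("COMBINING ").lower()
        features.append("diacritic:" + name.replace(" ", "-"))
    return base.lower(), features


if __name__ == "__main__":
    for ch in "U\u00f9\u00dc\u016b\u01b0\u01da":
        print(ch, fold_to_key(ch))
```

Every form keeps its diacritics as metadata while the first element of the result is always the ASCII key. Mapping the derived names (for example `diacritic:macron`) onto ULR tags such as `length:long` would be a separate, language-aware step.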


B) Cross-script grapheme set, keyed by Latin “U”

B1. Greek (Grek) — Upsilon

| U+ | Char | Name | Token | Core phonemes | Notes |
|---|---|---|---|---|---|
| U+03A5 | Υ | GREEK CAPITAL LETTER UPSILON | :U.grek:Upsilon: | Ancient /u/ → later /y/ | lineage to Latin V/U/Y |
| U+03C5 | υ | GREEK SMALL LETTER UPSILON | :U.grek:upsilon: | /y/ (Koine/Attic), /i/ in Modern | |
| U+038E | Ύ | CAPITAL UPSILON WITH TONOS | :U.grek:Upsilon:accent | stress mark | Modern Greek |
| U+03B0 | ΰ | SMALL UPSILON DIALYTIKA TONOS | :U.grek:upsilon:diaeresis+accent | prevents diphthong | |
| U+03D2 | ϒ | UPSILON WITH HOOK SYMBOL | :U.grek:Upsilon:symbol | physics typography | variant |

Word spellings (Latin → token): upsilon, UPSILON, Y (translit contexts) → :U.grek:upsilon: (do not Latinize glyphs; keep tokens).


B2. Hebrew (Hebr) — Vav/Waw ו

| U+ | Char | Name | Token | Roles | Notes |
|---|---|---|---|---|---|
| U+05D5 | ו | HEBREW LETTER VAV | :U.hebr:Vav: | consonant /v/; mater lectionis for /u o/ | |
| ו + U+05BC (DAGESH) | וּ | SHURUK (long ū) | :U.hebr:Vav:shuruk | vowel ū | niqqud combining |
| ו + U+05B9 (HOLAM) | וֹ | ō mater | :U.hebr:Vav:holam | vowel ō | |
| ו + U+05B0 (SHEVA) | וְ | conjunction “and” /və/ | :U.hebr:Vav:conj | morpheme | “vav-consecutive” pragmatics in Tanakh |

Latin spellings: vav, waw → :U.hebr:Vav: (aliases kept).
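As a rough sketch of how the niqqud rows in B2 could be applied to text, the snippet below scans a string for vav and picks a token from the point that follows it. The function name `vav_tokens` is hypothetical, and the logic is deliberately simplified: it assumes decomposed input (base letter followed by its point), and it treats every vav + sheva as the conjunction and every vav + U+05BC as shuruk, which real Hebrew text does not guarantee.

```python
VAV = "\u05D5"

# Point following vav -> ULR token (taken from the B2 rows).
NIQQUD_TOKENS = {
    "\u05BC": ":U.hebr:Vav:shuruk",  # dagesh/shuruk point
    "\u05B9": ":U.hebr:Vav:holam",   # holam
    "\u05B0": ":U.hebr:Vav:conj",    # sheva (assumed conjunction; a simplification)
}


def vav_tokens(text: str) -> list[str]:
    """Return one ULR token per vav in text, in logical (storage) order."""
    tokens = []
    for i, ch in enumerate(text):
        if ch != VAV:
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        # Unpointed vav falls back to the bare letter token.
        tokens.append(NIQQUD_TOKENS.get(following, ":U.hebr:Vav:"))
    return tokens


if __name__ == "__main__":
    print(vav_tokens("\u05D5\u05B0\u05D0\u05B8\u05DE\u05B7\u05E8"))  # the word וְאָמַר
```

For unpointed modern text every vav comes back as the bare `:U.hebr:Vav:` token, and the choice between consonant and vowel carrier has to come from a lexicon or context.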


B3. Arabic (Arab) — Wāw و

| U+ | Char | Name | Token | Roles | Notes |
|---|---|---|---|---|---|
| U+0648 | و | ARABIC LETTER WAW | :U.arab:Waw: | consonant /w/; long ū | long vowel via preceding ḍamma |
| U+064F | ◌ُ | ARABIC DAMMA | :U.arab:damma | short /u/ | diacritic (combining) |
| U+0624 | ؤ | WAW WITH HAMZA ABOVE | :U.arab:Waw:hamza | /ʔu/ context | orthographic |
| (Perso/Urdu) | و | Waw | :U.arab:Waw: | /u o v w/ | language-specific pragmatics |

Latin spellings: waw, wau, vāw → :U.arab:Waw:.


B4. Sanskrit/Hindi (Deva) — उ / ऊ; vowel signs ु / ू

| U+ | Char | Name | Token | Phoneme | Role |
|---|---|---|---|---|---|
| U+0909 | उ | DEVANAGARI LETTER U | :U.deva:U: | /u/ | independent vowel |
| U+090A | ऊ | DEVANAGARI LETTER UU | :U.deva:UU: | /uː/ | independent vowel |
| U+0941 | ◌ु | DEVANAGARI VOWEL SIGN U | :U.deva:sign-U | /u/ | matra (combining) |
| U+0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | :U.deva:sign-UU | /uː/ | matra (combining) |

Latin spellings (IAST/ISO 15919): u → :U.deva:U: / sign-U; ū → :U.deva:UU: / sign-UU.
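The "Latin spellings" lines in B1 to B4 amount to an alias table. A small sketch of a resolver follows; the names `ALIASES`, `LOOKUP`, and `resolve` are assumptions for illustration. It returns a list because one spelling can belong to more than one script: "waw" is listed as an alias for both the Hebrew and the Arabic letter.

```python
import unicodedata

# Token -> Latin spellings, copied from the B1-B4 "Latin spellings" lines.
ALIASES = {
    ":U.grek:upsilon:": ["upsilon", "y"],
    ":U.hebr:Vav:": ["vav", "waw"],
    ":U.arab:Waw:": ["waw", "wau", "v\u0101w"],
    ":U.deva:U:": ["u"],
    ":U.deva:UU:": ["\u016b"],
}


def _norm(spelling: str) -> str:
    """Case-insensitive, NFC-normalized form so 'u' + macron matches precomposed ū."""
    return unicodedata.normalize("NFC", spelling.strip()).casefold()


# Reverse index: normalized spelling -> list of candidate tokens.
LOOKUP: dict[str, list[str]] = {}
for token, spellings in ALIASES.items():
    for spelling in spellings:
        LOOKUP.setdefault(_norm(spelling), []).append(token)


def resolve(spelling: str) -> list[str]:
    """Return every script token a Latin spelling may refer to (possibly none)."""
    return LOOKUP.get(_norm(spelling), [])


if __name__ == "__main__":
    for word in ["UPSILON", "waw", "u\u0304", "x"]:
        print(word, resolve(word))
```

Returning all candidates rather than picking one leaves disambiguation to the caller, who usually knows the source language or transliteration system.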


C) Morphology → Lexeme semantics (how U behaves inside words)

| Script | Morpheme pattern | Lexeme example | Gloss | Semantic note |
|---|---|---|---|---|
| Greek | ὑ- (hypo- with upsilon) | ὑπό hypo- | “under, sub-” | upsilon feeds English “hypo-/hyper-” pair |
| Hebrew | ו־ prefix | וְאָמַר | “and (he) said” | vav as clausal connector; “vav-conversive” tense/aspect pragmatics in narrative |
| Arabic | case/short vowel -u | كِتَابُ (kitābu) | nominative ending | damma marks short /u/; long ū often by <و> |
| Sanskrit/Devanagari | length contrast u/ū | गुरु guru vs भू bhū (root) | heavy/light syllables | vowel length is morphophonemic; drives meter and derivation |
| Latin | U/V alternation | SERVVS → SERVUS | U/V split (letterform history) | Latin “V” served u and v; later split into U, V, W |

D) Etymology (lineage & convergence)

Phoenician 𐤅 waw → Hebrew ו / Arabic و (Semitic branch).
Phoenician 𐤅 waw → Greek Υ (upsilon) → Latin V (with Y borrowed later, directly from Υ) → later split into U, V, W.
Indo-Aryan “U” (Devanagari उ/ऊ) descends independently via Brāhmī (e.g., 𑀉), yet converges functionally on /u/ ~ /uː/.


E) Pragmatics (context rules)

  • Transliteration:
    • Greek → Latin: Υ/υ is often y in classical loans (hyper), but u or i in Modern Greek contexts; choose a system (ISO 843, BGN/PCGN).
    • Hebrew: Vav can be consonant v or vowel carrier u/o; niqqud disambiguates; many modern texts omit niqqud → use lexicon/context.
    • Arabic: و = /w/ consonant or long ū; damma marks short u; system: DIN 31635 / ALA-LC.
    • Sanskrit/Hindi: IAST/15919 ū (macron) is phonemic; never drop the macron in scholarly modes.
  • Normalization policy: keep Latin master key (u); attach script+role as features (e.g., :U.arab:Waw:long, :U.hebr:Vav:conj).
  • Search/compare: compare first at the key level (key = ‘u’); refine by script, length, or role if required.
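The search/compare rule can be pictured as a two-stage test: match on the Latin key, then optionally require agreement on selected features. The sketch below assumes records are plain dictionaries with `key`, `script`, `length`, and `role` entries; that record shape and the name `equivalent` are illustrative, not part of the ULR spec.

```python
def equivalent(a: dict, b: dict, refine: tuple[str, ...] = ()) -> bool:
    """Compare two grapheme records at key level, then on each refinement feature."""
    if a["key"] != b["key"]:
        return False
    # A feature missing from both records counts as agreement (None == None).
    return all(a.get(feature) == b.get(feature) for feature in refine)


# Hypothetical records for Arabic waw used as long ū and Devanagari ऊ.
waw_long = {"key": "u", "script": "arab", "length": "long", "token": ":U.arab:Waw:"}
deva_uu = {"key": "u", "script": "deva", "length": "long", "token": ":U.deva:UU:"}

if __name__ == "__main__":
    print(equivalent(waw_long, deva_uu))                       # key only
    print(equivalent(waw_long, deva_uu, refine=("length",)))   # key + length
    print(equivalent(waw_long, deva_uu, refine=("script",)))   # key + script
```

Keeping refinement optional means a broad cross-script search and a strict same-script match use the same function and differ only in the `refine` argument.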

F) Machine-readable spec (drop-in)

{
  "ULR": {
    "U": {
      "key": "u",
      "latin": {
        "base": ["U+0055","U+0075"],
        "diacritics": [
          "U+00D9","U+00F9","U+00DA","U+00FA","U+00DB","U+00FB","U+00DC","U+00FC",
          "U+0168","U+0169","U+016A","U+016B","U+016C","U+016D","U+016E","U+016F",
          "U+0170","U+0171","U+01AF","U+01B0","U+01D3","U+01D4","U+1EE4","U+1EE5",
          "U+01D5","U+01D7","U+01D9","U+01DB"
        ],
        "fold": "u"
      },
      "grek": {
        "name": "upsilon",
        "cap": {"cp":"U+03A5","token":":U.grek:Upsilon:"},
        "low": {"cp":"U+03C5","token":":U.grek:upsilon:"},
        "variants": [
          {"cp":"U+038E","feat":"accent"},
          {"cp":"U+03B0","feat":"diaeresis+accent"},
          {"cp":"U+03D2","feat":"symbol"}
        ],
        "aliases": ["upsilon","UPSILON","y (classical translit)"]
      },
      "hebr": {
        "name": "vav",
        "base": {"cp":"U+05D5","token":":U.hebr:Vav:"},
        "niqqud": [
          {"seq":["U+05D5","U+05BC"],"token":":U.hebr:Vav:shuruk","value":"ū"},
          {"seq":["U+05D5","U+05B9"],"token":":U.hebr:Vav:holam","value":"ō"},
          {"seq":["U+05D5","U+05B0"],"token":":U.hebr:Vav:conj","value":"and"}
        ],
        "aliases": ["vav","waw"]
      },
      "arab": {
        "name": "waw",
        "base": {"cp":"U+0648","token":":U.arab:Waw:"},
        "diacritics": [
          {"cp":"U+064F","token":":U.arab:damma","value":"u"},
          {"cp":"U+0624","token":":U.arab:Waw:hamza","value":"ʔu"}
        ],
        "aliases": ["waw","wau","vāw"]
      },
      "deva": {
        "name": "u/uu",
        "independent": [
          {"cp":"U+0909","token":":U.deva:U:","value":"u"},
          {"cp":"U+090A","token":":U.deva:UU:","value":"ū"}
        ],
        "matras": [
          {"cp":"U+0941","token":":U.deva:sign-U","attach":"consonant"},
          {"cp":"U+0942","token":":U.deva:sign-UU","attach":"consonant"}
        ],
        "aliases": ["u","ū"]
      },
      "etymology": [
        "Phoenician 𐤅 waw",
        "Hebrew ו / Arabic و (waw/vav)",
        "Greek Υ upsilon",
        "Latin V → split into U, V, W, Y",
        "Indo-Aryan: Brāhmī 𑀉 → Devanagari उ/ऊ"
      ],
      "phonology": {"values":["/u/","/uː/","/ʊ/","/y/"],"dependent_on":"language"},
      "morphology": [
        {"lang":"hebrew","pattern":"ו־","role":"conjunction","gloss":"and"},
        {"lang":"arabic","pattern":"damma ُ","role":"short vowel","gloss":"u"},
        {"lang":"sanskrit","pattern":"ū","role":"length contrast","gloss":"long u"}
      ],
      "pragmatics": {
        "transliteration_systems": {
          "greek":"ISO 843 / BGN-PCGN",
          "hebrew":"ISO 259 / SBL",
          "arabic":"DIN 31635 / ALA-LC",
          "sanskrit_hindi":"IAST / ISO 15919"
        },
        "policy":"Latin key = 'u'; preserve script tokens; fold diacritics to features"
      }
    }
  }
}
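One way to consume the spec is to walk it and build a reverse index from characters to script tokens. The sketch below works on a short excerpt of the JSON above, embedded as a string so the example is self-contained. The helper names `cp_to_char` and `build_index` are assumptions, and only entries that carry both a `cp` and a `token` field are indexed.

```python
import json

# Abbreviated excerpt of the spec above (same field names, fewer entries).
SPEC_EXCERPT = """
{"ULR": {"U": {"key": "u",
  "grek": {"cap": {"cp": "U+03A5", "token": ":U.grek:Upsilon:"},
           "low": {"cp": "U+03C5", "token": ":U.grek:upsilon:"}},
  "arab": {"base": {"cp": "U+0648", "token": ":U.arab:Waw:"},
           "diacritics": [{"cp": "U+064F", "token": ":U.arab:damma", "value": "u"}]}
}}}
"""


def cp_to_char(cp: str) -> str:
    """Convert a 'U+XXXX' string to the character it names."""
    return chr(int(cp.removeprefix("U+"), 16))


def build_index(node, index=None) -> dict[str, str]:
    """Recursively collect {character: token} from every object with cp and token."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        if "cp" in node and "token" in node:
            index[cp_to_char(node["cp"])] = node["token"]
        for value in node.values():
            build_index(value, index)
    elif isinstance(node, list):
        for value in node:
            build_index(value, index)
    return index


INDEX = build_index(json.loads(SPEC_EXCERPT))

if __name__ == "__main__":
    for char, token in INDEX.items():
        print(f"U+{ord(char):04X}", token)
```

Entries that list a code point without a token, such as the Greek `variants` in the full spec, are skipped by this walk and would need tokens added before they could be indexed.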

G) How to use the ULR (U-family)

  1. Index by Latin key u.
  2. Map surface (any script) → script token (e.g., :U.arab:Waw:) without glyph loss.
  3. Attach features (length, accent, niqqud, case diacritics) as metadata.
  4. Interpret meaning with morphology (e.g., Hebrew ו־ as and; Arabic damma as short /u/; Sanskrit ū as long vowel).
  5. Render or reason per context (typeset as native glyphs, romanize, or compute phono-semantic profiles).
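The five steps can be strung together in a few lines. The sketch below is illustrative only: the table `SURFACE_TABLE`, the functions `to_record` and `render`, and the romanization values are assumptions chosen for the example, not values fixed by the ULR.

```python
# Surface glyph -> (script token, features). Romanizations are example values only.
SURFACE_TABLE = {
    "\u0648": (":U.arab:Waw:", {"script": "arab", "roman": "w"}),
    "\u05D5": (":U.hebr:Vav:", {"script": "hebr", "roman": "v"}),
    "\u03C5": (":U.grek:upsilon:", {"script": "grek", "roman": "y"}),
    "\u090A": (":U.deva:UU:", {"script": "deva", "roman": "\u016b", "length": "long"}),
}


def to_record(surface: str) -> dict:
    """Steps 1-3: index by key 'u', map surface to token, attach features."""
    token, features = SURFACE_TABLE[surface]
    return {"key": "u", "surface": surface, "token": token, "features": dict(features)}


def render(record: dict, mode: str = "native") -> str:
    """Step 5: typeset the native glyph, romanize, or emit the script token."""
    if mode == "native":
        return record["surface"]
    if mode == "roman":
        return record["features"]["roman"]
    if mode == "token":
        return record["token"]
    raise ValueError(f"unknown render mode: {mode}")


if __name__ == "__main__":
    for glyph in SURFACE_TABLE:
        rec = to_record(glyph)
        print(rec["key"], render(rec, "token"), render(rec, "roman"))
```

Step 4, interpreting meaning through morphology, is left out here because it needs language-specific rules and a lexicon rather than a lookup table.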

Closing thought

One Latin key, many doors: U threads from waw to upsilon, from unpointed Hebrew to Devanagari matras. By pinning everything to a single Latin index and layering script tokens + feature tags, we keep form, function, and meaning intact — and we make interdisciplinary dialogue mechanically interoperable.


Letterscope – SolveForce Communications