Architecture 33 — Semantic Layer, Knowledge Graph & Enterprise Search Fabric

Reference Architecture Diagram + Narrative (unify meaning → retrieve truth → answer with provenance)

                         ┌────────────────────────────────────────────────────┐
                         │            ENTERPRISE CONTENT UNIVERSE            │
  Docs/Email/Wiki │ ECM/CMS │ Data Lakehouse │ APIs/Logs │ Tickets/CMDB │ OT/IoT
  SharePoint/GDrive │ Confluence │ S3/ADLS  │ REST/GraphQL │ ITSM/KB   │ BMS/EPMS
                         └───────────────┬───────────────┬───────────────┬──────┘
                                         │               │               │
                                         ▼               ▼               ▼
     ┌─────────────────────────────────────────────────────────────────────────┐
     │        CONNECTORS & INGEST (continuous crawl + webhooks + CDC)         │
     │  • Text/PDF/HTML/Images/AV → OCR/ASR                                    │
     │  • Table/JSON/Log → schema normalize                                    │
     │  • PII/PHI/PCI scrubbing  • Access ACL capture (owner/tenant/labels)    │
     └───────────────┬──────────────────────────────┬──────────────────────────┘
                     │                              │
                     ▼                              ▼
   ┌─────────────────────────────────────┐  ┌────────────────────────────────────┐
   │   SEMANTIC PROCESSORS               │  │   KNOWLEDGE MODELING               │
   │  • Embeddings (domain-tuned)        │  │  • Ontologies/Taxonomies          │
   │  • Chunking/summarization           │  │  • Entity/Relation extract (NER)  │
   │  • Metadata/enrichment (topics,     │  │  • Triples graph (RDF/SPARQL)     │
   │    intents, sensitivity, recency)   │  │  • Synonyms/aliases (thesaurus)   │
   └──────────────┬──────────────────────┘  └──────────────┬────────────────────┘
                  │                                         │
                  ▼                                         ▼
   ┌─────────────────────────────────────┐   ┌───────────────────────────────────┐
   │   VECTOR INDEX (ANN/RAG)            │   │  KNOWLEDGE GRAPH STORE            │
   │  • Per-tenant/label partitions      │   │  • Typed entities, lineage,      │
   │  • Freshness/decay scoring          │   │    provenance, and policies       │
   │  • Hybrid re-rank (BM25 + vector)   │   │  • SPARQL/Gremlin APIs            │
   └──────────────┬──────────────────────┘   └──────────────┬───────────────────┘
                  │                                         │
                  ▼                                         ▼
        ┌──────────────────────────────────────────────────────────┐
        │   QUERY ORCHESTRATOR & GUARDRAILS                        │
        │  • Router: keyword ↔ hybrid ↔ KG lookup                   │
        │  • RAG pipelines (LLM-safe): retrieval allow-lists        │
        │  • Policy-as-code (residency, sensitivity, tenancy)       │
        │  • Prompt shields: redaction, cite-or-block if no source  │
        └──────────────┬──────────────────────────────┬────────────┘
                       │                              │
                       ▼                              ▼
    ┌────────────────────────────────┐     ┌──────────────────────────────────┐
    │  ANSWER SURFACES               │     │  AUTHORING & CATALOG             │
    │  • Enterprise Search (UI/API)  │     │  • Curation/KB publishing        │
    │  • Q&A Assistants (chat/tools) │     │  • Model cards / dataset pages   │
    │  • Agents (CRM/ITSM/IDE plugs) │     │  • Content lifecycle & reviews   │
    └────────────────────────────────┘     └──────────────────────────────────┘

  Cross-cuts ──► IdP/MFA • ZTNA/ABAC • HSM/KMS (keys/tokens) • SIEM/SOAR • AIOps • ITSM/CMDB • GRC/WORM (citations & consent)

Narrative (how we turn sprawling content into trustworthy, answerable knowledge)

1) Purpose & posture

  • Objective: Build a semantic fabric that makes every document, dataset, ticket, and log findable, explainable, and answerable—with citations, permissions, and provenance respected by default.
  • Posture: Policy-first RAG (retrieval-augmented generation), hybrid search (lexical + vector + KG), least-privilege answers (results filtered by ACL and labels).

2) Ingest & normalization (syntax of sources)

  • Continuous connectors crawl ECM and shared drives, cloud storage, wikis, email, knowledge bases, APIs, and logs; OCR/ASR extracts text from scans and media; CDC and webhooks keep the index current with deltas.
  • Scrubbers mask PII/PHI/PCI on ingest per data policy; connectors capture ACLs, owners, and labels for downstream enforcement.
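
The ingest bullets above can be sketched as a scrub-then-record step. This is a minimal illustration, not the connector implementation: the three regular expressions, the IngestRecord shape, and the ingest() helper are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): a production scrubber would rely on
# a vetted PII/PHI/PCI detector rather than three regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


@dataclass(frozen=True)
class IngestRecord:
    """Hypothetical record shape: scrubbed text plus captured access metadata."""
    doc_id: str
    text: str
    owner: str
    tenant: str
    labels: frozenset


def scrub(text: str) -> str:
    """Replace each match with a typed placeholder such as [EMAIL]."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name}]", text)
    return text


def ingest(doc_id: str, raw_text: str, owner: str, tenant: str, labels) -> IngestRecord:
    """Mask sensitive values and keep owner/tenant/labels for later enforcement."""
    return IngestRecord(doc_id, scrub(raw_text), owner, tenant, frozenset(labels))


record = ingest(
    "doc-1",
    "Contact jane.doe@example.com, SSN 123-45-6789.",
    owner="jane",
    tenant="acme",
    labels=["internal"],
)
print(record.text)  # Contact [EMAIL], SSN [SSN].
```

Masking before indexing means neither the vector index nor the graph ever holds the raw values, while the owner, tenant, and labels travel with the record so later stages can filter on them.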

3) Semantics & knowledge modeling (meaning)

  • Semantic processors: domain-tuned embeddings, smart chunking, extractive summaries, topic/intent tagging, recency/decay signals.
  • Knowledge modeling: ontologies and controlled vocabularies define the types; NER and relation extraction bind entities to them; graph triples store relations, lineage, and provenance; synonym/alias maps reconcile vocabulary (e.g., SIP trunk = voice gateway).
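
As a rough sketch of two of the steps above, the block below shows overlapping word-window chunking and alias canonicalization. The window sizes, the chunk_words() and canonicalize() helpers, and the choice of "sip trunk" as the canonical term are assumptions for illustration; the text only says the two terms are aliases.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical thesaurus entry: alias -> canonical term.
ALIASES = {"voice gateway": "sip trunk"}


def canonicalize(query: str) -> str:
    """Lower-case the query and rewrite known aliases to their canonical term."""
    normalized = query.lower()
    for alias, canonical in ALIASES.items():
        normalized = normalized.replace(alias, canonical)
    return normalized


sample = " ".join(f"w{i}" for i in range(12))
print(chunk_words(sample, size=5, overlap=2))
print(canonicalize("Voice Gateway outage at Site X"))
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk; applying the same canonicalization at index time and query time lets both vocabularies hit the same entries.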

4) Dual stores (how we remember)

  • Vector index for similarity / semantic retrieval (per-tenant partitions; hybrid re-rank BM25+ANN).
  • Knowledge graph for explicit reasoning: “show all circuits serving Site X with SLA Y and recent trouble tickets.” Both stores reference the same document IDs and ACLs.
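
To make the two stores concrete, here is a small sketch. Reciprocal rank fusion is used as one common way to combine a BM25 ranking with an ANN ranking; the source does not say which fusion method the architecture uses, so treat that choice, the constant k=60, and the toy triples as assumptions. The graph half answers the "circuits serving Site X with SLA Y and recent trouble tickets" question over in-memory triples rather than a real SPARQL or Gremlin endpoint.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; doc_id breaks ties deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc-7", "doc-2", "doc-9"]  # lexical ranking (made-up IDs)
ann_hits = ["doc-2", "doc-4", "doc-7"]   # vector ranking (made-up IDs)
fused = rrf([bm25_hits, ann_hits])
print(fused)

# Toy triple set standing in for the knowledge graph store.
TRIPLES = {
    ("circuit-1", "serves", "site-x"),
    ("circuit-1", "has_sla", "gold"),
    ("circuit-2", "serves", "site-x"),
    ("circuit-2", "has_sla", "silver"),
    ("circuit-3", "serves", "site-y"),
    ("circuit-3", "has_sla", "gold"),
    ("ticket-42", "affects", "circuit-1"),
}


def subjects(predicate: str, obj: str) -> set[str]:
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def circuits_with_tickets(site: str, sla: str) -> dict[str, list[str]]:
    """Circuits serving `site` with `sla`, mapped to the tickets that affect them."""
    circuits = subjects("serves", site) & subjects("has_sla", sla)
    return {c: sorted(subjects("affects", c)) for c in sorted(circuits)}


print(circuits_with_tickets("site-x", "gold"))
```

Because both halves key on the same document and entity IDs, a fused hit list and a graph result can be joined and filtered by one ACL check.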

5) Query orchestration & guardrails (law at the boundary)

  • Query router chooses keyword ↔ hybrid ↔ KG based on intent.
  • RAG pipelines pull only from allowed corpora (retrieval allow-lists); prompt shields redact sensitive fields and force citations; when no permitted source is retrieved, the assistant refuses to answer and offers a fallback search instead.
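
The routing and cite-or-block behavior described above might look like the following sketch. The routing rules, the ALLOWED_CORPORA set, the Passage type, and the generate callable (a stand-in for whatever LLM call is used) are hypothetical; the point is only that retrieval is filtered by the allow-list first and that an empty result produces a refusal rather than generated text.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_CORPORA = {"kb", "wiki"}  # hypothetical retrieval allow-list


@dataclass(frozen=True)
class Passage:
    doc_id: str
    corpus: str
    text: str


def route(query: str) -> str:
    """Toy intent router; these rules are illustrative assumptions."""
    q = query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return "keyword"  # quoted phrase: exact lexical match
    if q.lower().startswith(("show all", "list all")):
        return "kg"       # enumerations over entities go to the graph
    return "hybrid"


def answer(query: str, passages: list[Passage],
           generate: Callable[[str, list[Passage]], str]) -> dict:
    """Cite-or-block: generate only when at least one allowed passage exists."""
    allowed = [p for p in passages if p.corpus in ALLOWED_CORPORA]
    if not allowed:
        return {"status": "refused", "answer": None, "citations": [],
                "fallback": f"search:{query}"}
    return {"status": "answered", "answer": generate(query, allowed),
            "citations": [p.doc_id for p in allowed]}


def stub_llm(query: str, passages: list[Passage]) -> str:
    """Stand-in for the model call; it only sees allow-listed passages."""
    return f"Answer based on {len(passages)} passage(s)."


hits = [Passage("doc-1", "kb", "Reset MFA in the portal."),
        Passage("doc-2", "hr-private", "Not for retrieval.")]
print(route("how do I reset MFA?"), answer("how do I reset MFA?", hits, stub_llm))
print(answer("salary bands", hits[1:], stub_llm))
```

Filtering before the model is called means a disallowed passage never reaches the prompt, so the refusal path does not depend on the model behaving well.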

6) Answers & authoring (talk and write with proof)

  • The search UI/API returns passages with citations and access badges.
  • Q&A assistants produce grounded answers with inline citations, snippet diffs, and “open in source” links.
  • Authoring tools let SMEs curate answers and promote them to canonical KB pages; dataset/model cards live in the same catalog.

7) Security, governance & evidence (trust)

  • IdP/MFA + ZTNA/ABAC enforce who sees what; HSM/KMS protects tokens/keys.
  • SIEM/SOAR watches for mass export/PII misuse; playbooks can revoke sessions, tighten allow-lists, and open ITSM cases.
  • GRC/WORM stores citations, prompts, responses, and consent artifacts for audits.
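
A minimal sketch of two of the controls above: an attribute-based read check applied to search results, and a hash-chained audit log. The User and Doc attributes and the subset rule for labels are assumptions, and a hash chain only makes tampering detectable; it is not a substitute for WORM storage.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    tenant: str
    clearances: frozenset


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant: str
    labels: frozenset


def can_read(user: User, doc: Doc) -> bool:
    """Assumed ABAC rule: same tenant, and every doc label is within the user's clearances."""
    return user.tenant == doc.tenant and doc.labels <= user.clearances


def filter_results(user: User, docs: list[Doc]) -> list[Doc]:
    return [d for d in docs if can_read(user, d)]


GENESIS = "0" * 64  # previous-hash value for the first entry


def _digest(prev: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


def append_audit(log: list[dict], event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify_audit(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


alice = User("alice", "acme", frozenset({"internal"}))
docs = [Doc("d1", "acme", frozenset({"internal"})),
        Doc("d2", "acme", frozenset({"internal", "restricted"})),
        Doc("d3", "globex", frozenset()),
        Doc("d4", "acme", frozenset())]
visible = filter_results(alice, docs)

audit_log: list[dict] = []
append_audit(audit_log, {"user": "alice", "query": "reset MFA", "citations": ["d1"]})
append_audit(audit_log, {"user": "alice", "query": "vpn setup", "citations": ["d4"]})
print([d.doc_id for d in visible], verify_audit(audit_log))
```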

8) Resilience & freshness (never stale)

  • Near-real-time indexers (webhooks/CDC) pick up deltas; AIOps checks for stale content, orphaned docs, and broken links; a policy or ACL change triggers an automatic reindex.
  • Multi-region replicas for the vector and graph stores; eventual-consistency SLAs are set per corpus.
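
One way to picture the freshness check is a comparison of source state against index state. This is a sketch under assumptions: plan_reindex(), the (modified_at, acl_version) tuples, and the 5-minute lag budget (borrowed from the KPI list) are illustrative, not a description of a specific AIOps product.

```python
from datetime import datetime, timedelta, timezone


def plan_reindex(source: dict, index: dict, now: datetime,
                 max_lag: timedelta = timedelta(minutes=5)):
    """Compare doc_id -> (modified_at, acl_version) maps for source and index.

    Returns (reindex, violations, orphaned):
      reindex    - docs with a pending content delta or a changed ACL version
      violations - pending content deltas older than max_lag
      orphaned   - docs still indexed but no longer present at the source
    """
    reindex, violations = [], []
    for doc_id, (modified_at, acl_version) in source.items():
        seen = index.get(doc_id)
        content_pending = seen is None or seen[0] < modified_at
        acl_changed = seen is not None and seen[1] != acl_version
        if content_pending or acl_changed:
            reindex.append(doc_id)
        if content_pending and now - modified_at > max_lag:
            violations.append(doc_id)
    orphaned = [doc_id for doc_id in index if doc_id not in source]
    return sorted(reindex), sorted(violations), sorted(orphaned)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
hour_ago = now - timedelta(hours=1)
source_state = {
    "a": (now - timedelta(minutes=10), 1),  # edited 10 min ago, not yet indexed
    "b": (now - timedelta(minutes=1), 1),   # edited 1 min ago, still within budget
    "c": (hour_ago, 2),                     # content unchanged, ACL version bumped
    "d": (hour_ago, 1),                     # fully up to date
}
index_state = {
    "a": (hour_ago, 1),
    "b": (hour_ago, 1),
    "c": (hour_ago, 1),
    "d": (hour_ago, 1),
    "z": (now - timedelta(hours=2), 1),     # deleted at the source
}
reindex, violations, orphaned = plan_reindex(source_state, index_state, now)
print(reindex, violations, orphaned)
```

Treating an ACL version bump as a reindex trigger, independent of content changes, is what keeps permission changes from lagging behind the index.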

Reference KPIs

  • Grounded answer rate (with citations): ≥98%
  • No-source refusal integrity: 100% (never fabricate)
  • Time-to-index for deltas (95th percentile): <5 minutes (most corpora)
  • Permission leakage: 0 incidents (ABAC enforced)
  • Search satisfaction/NPS: +10 points within two quarters
  • Freshness violations auto-remediated: ≥95%
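
Two of these KPIs can be computed directly from logs, as the sketch below suggests. The sample lags, the answer-record shape, and the nearest-rank percentile definition are assumptions chosen for the example, not measured data.

```python
def percentile_95(values) -> float:
    """Nearest-rank 95th percentile of a non-empty collection."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one sample")
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def grounded_answer_rate(answers):
    """Share of answered (non-refused) responses that carry at least one citation."""
    answered = [a for a in answers if a["status"] == "answered"]
    if not answered:
        return None  # no answers yet, so the rate is undefined
    return sum(1 for a in answered if a["citations"]) / len(answered)


# Made-up samples: seconds between a source change and its appearance in the index.
index_lags_s = [30, 45, 60, 90, 120, 150, 180, 200, 240, 400]
p95_lag = percentile_95(index_lags_s)
meets_index_target = p95_lag < 5 * 60

# Made-up answer log in the shape {"status": ..., "citations": [...]}.
answer_log = [
    {"status": "answered", "citations": ["doc-1"]},
    {"status": "answered", "citations": ["doc-2", "doc-7"]},
    {"status": "answered", "citations": ["doc-4"]},
    {"status": "answered", "citations": []},
    {"status": "refused", "citations": []},
]
rate = grounded_answer_rate(answer_log)
print(p95_lag, meets_index_target, rate)
```

In this made-up sample a single slow delta pushes the 95th percentile past the 5-minute target, which is why the KPI is stated as a percentile rather than an average.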

Minimal BOM (aligned to the fabric)

  • Connectors & ingest: Connectors (ECM/Drives/Email/Wiki/APIs/Logs), OCR/ASR, Scrubbers/PII maskers
  • Semantics & knowledge modeling: Embedding service (domain-tuned), Ontology/Thesaurus service
  • Stores & retrieval: Hybrid search (BM25+ANN), Vector DB (ANN), Knowledge Graph (RDF/SPARQL or LPG/Gremlin)
  • Orchestration & guardrails: Query router/orchestrator, RAG pipelines with allow-lists & prompt shields
  • Answer surfaces & authoring: Search UI/API + Chat assistants, Authoring/KB catalog
  • Cross-cuts: IdP/MFA, ZTNA/ABAC, HSM/KMS, SIEM/SOAR, AIOps freshness monitors, ITSM/CMDB, GRC/WORM