Reference Architecture Diagram + Narrative (sync APIs + async events, policy-by-design)
┌──────────────────────────────────────────────┐
│ PRODUCERS / CONSUMERS │
Domain Apps (#1–14) │ Mobile/Web │ Partners/3rd Parties │ Internal Services
└───────────────┬─────────────────┬─────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────┐
│ EDGE API GATEWAY & WAF (NORTH–SOUTH) │
│ • REST/GraphQL/gRPC • OAuth2/OIDC • TLS/mTLS  │
│ • Rate/Quota • Threat/WAF • Tokenization/DLP │
│ • Geo/data residency rules • Monetization/Billing │
└───────────────┬────────────────────────────┬────────────┘
│ │
▼ ▼
┌────────────────────────────────────┐ ┌──────────────────────────┐
│ SERVICE MESH (EAST–WEST) │ │ EVENT BACKBONE │
│ • mTLS/Id-based policy (OPA) │ │ • Kafka/Pulsar topics │
│ • Traffic mgmt (A/B, canary) │ │ • Schema registry │
│ • Zero-trust microsegmentation │ │ • Contracts/compat chk │
└───────────┬────────────────────────┘ └───────────┬─────────────┘
│ │
▼ ▼
┌────────────────────────────────┐ ┌────────────────────────────────┐
│ CONTRACTS & CATALOG │ │ TRANSFORMS & MEDIATION │
│ • API catalog (versions, SLA) │ │ • Protocol/map (REST↔gRPC) │
│ • Async schemas & compat │ │ • Enrichment/redaction │
│ • Usage plans & keys │ │ • Dead-letter & replay │
└───────────────┬────────────────┘ └───────────────┬────────────────┘
│ │
▼ ▼
┌────────────────────────────────┐ ┌─────────────────────────────────┐
│ DEVELOPER / PARTNER PORTAL │ │ GOVERNANCE, KEYS, & EVIDENCE │
│ • Onboarding & self-service │ │ • IdP/MFA • PAM/JIT • HSM/KMS │
│ • Test keys, sandboxes │ │ • SIEM/SOAR • GRC/WORM │
│ • Docs, SDKs, Postman/OAS │ │ • Residency/retention policies │
└────────────────────────────────┘ └─────────────────────────────────┘
Telemetry / FinOps ──► AIOps (latency, error, saturation) • ITSM/CMDB • Cost/Monetization • SLA dashboards
Narrative (how APIs and events stay fast, safe, and contractually true)
1) Purpose & posture
- Objective: Provide a single, governed fabric for synchronous APIs (REST/GraphQL/gRPC) and asynchronous events (Kafka/Pulsar) that connects internal domains, external partners, and channels securely, with predictable performance, and with auditable contracts.
- Posture: Zero-trust, identity-aware, policy-as-code, with schema/contract discipline and geo/data-residency enforced at the edge.
2) Edge API gateway (syntax at the boundary)
- Terminates TLS/mTLS, validates OAuth2/OIDC tokens (JWT), and applies WAF/API-security rules, rate limits and quotas, and tokenization/DLP for sensitive fields (PCI/PHI/PII).
- Applies monetization and usage plans (per partner/app) and developer keys; geo-pinned routing enforces regional data-residency rules and SLAs.
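The rate/quota control named above is commonly implemented as a token bucket per API key. The following is a minimal sketch under that assumption, not a description of any particular gateway product; the class names, the per-key granularity, and the injectable clock are all illustrative choices.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: `capacity` bounds bursts, `refill_per_sec` is the sustained rate."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity          # start full so a new client may burst
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per API key (hypothetical usage-plan granularity)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self._buckets[api_key] = bucket
        return bucket.allow()
```

A rejected call would typically map to an HTTP 429 response at the edge. In a multi-node gateway the bucket state would have to live in a shared store; this sketch keeps it in process memory for brevity.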
3) Service mesh (semantics inside)
- mTLS between services, OPA/ABAC policies for who/what can talk, A/B & canary traffic shaping, and microsegmentation across namespaces/tenants.
- Ties directly to DevSecOps (Arch 19) for progressive delivery and to Identity/Keys (Arch 20) for cert/secret rotation.
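The "who/what can talk" decision can be illustrated with a default-deny attribute check. In practice this logic would be expressed as policy in OPA rather than in application code; the Python below is only a stand-in to show the shape of the decision, and the service names, tenant attribute, and rule fields are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Rule:
    source: str                 # caller identity, e.g. taken from its mTLS certificate
    destination: str            # target service
    methods: FrozenSet[str]     # permitted operations
    tenant: str                 # tenant/namespace the rule applies to


def is_allowed(rules: Iterable[Rule], source: str, destination: str,
               method: str, tenant: str) -> bool:
    """Default deny: a request passes only if some rule matches every attribute."""
    for rule in rules:
        if (rule.source == source
                and rule.destination == destination
                and rule.tenant == tenant
                and method in rule.methods):
            return True
    return False


# Hypothetical policy: only `orders` may call `payments`, only with POST, in tenant-a.
POLICY = [Rule("orders", "payments", frozenset({"POST"}), "tenant-a")]
```

Because nothing is allowed unless a rule matches, adding a new service to the mesh grants it no access until a rule is written for it, which is the zero-trust property the section describes.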
4) Event backbone (meaning in motion)
- Kafka/Pulsar topics with schema registry and compatibility gates prevent producer/consumer drift.
- Dead-letter and replay patterns ensure resilience; ordering and idempotence are baked into contracts.
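A compatibility gate of the kind mentioned above can be sketched as a check that a consumer on the new schema can still read events written with the old one. This is a deliberately simplified model (flat fields, exact type match, no type promotion) and not the algorithm of any specific schema registry; the `Field` type and the example field names are assumptions.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    type: str
    has_default: bool = False


Schema = Dict[str, Field]


def backward_violations(old: Schema, new: Schema) -> List[str]:
    """Return reasons why readers on `new` could not decode data written with `old`.

    Simplified rules:
      - a field added in `new` must carry a default (old data lacks it);
      - a field present in both must keep its type;
      - removing a field is allowed (readers simply ignore it).
    """
    problems: List[str] = []
    for name, field in new.items():
        previous = old.get(name)
        if previous is None:
            if not field.has_default:
                problems.append(f"added field '{name}' has no default")
        elif previous.type != field.type:
            problems.append(
                f"field '{name}' changed type {previous.type} -> {field.type}")
    return problems


def gate(old: Schema, new: Schema) -> bool:
    """True when the new schema version may be registered."""
    return not backward_violations(old, new)
```

Run in CI before a producer deploys, a failing gate blocks the release, which is how the "0 violations in production" target becomes enforceable rather than aspirational.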
5) Contracts, catalog, and mediation
- API catalog (versions, owners, SLAs, examples) and event catalog (topics, schemas, retention).
- Transform/mediation services normalize protocols (REST↔gRPC, XML↔JSON), enrich headers, redact fields, and handle protocol upgrades without breaking consumers.
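The enrich-and-redact step of a mediation service can be sketched as a pure function over headers and a JSON-like body. The marker string, the header name in the usage note, and the choice to match sensitive keys case-insensitively are assumptions made for illustration.

```python
from typing import Any, Dict, FrozenSet, Iterable, Tuple

REDACTED = "[REDACTED]"


def redact(value: Any, sensitive: FrozenSet[str]) -> Any:
    """Recursively replace values whose key is sensitive; inputs are not mutated."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key.lower() in sensitive else redact(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive) for item in value]
    return value


def mediate(headers: Dict[str, str], body: Any,
            sensitive_keys: Iterable[str],
            extra_headers: Dict[str, str]) -> Tuple[Dict[str, str], Any]:
    """Enrich headers and redact sensitive body fields in one pass."""
    sensitive = frozenset(key.lower() for key in sensitive_keys)
    enriched = {**headers, **extra_headers}   # extra headers win on conflict
    return enriched, redact(body, sensitive)
```

For example, `mediate({"accept": "application/json"}, {"card_number": "4111"}, ["card_number"], {"x-region": "eu"})` would return the headers with `x-region` added and the body with `card_number` replaced by the marker. Returning new objects rather than mutating the request keeps the original available for audit sampling.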
6) Developer & partner experience
- Self-service portal for onboarding, sandbox keys, SDKs, Postman collections and OpenAPI (OAS) specs, try-it consoles, and mock servers.
- Automated legal & technical gates (terms, scopes, data use) before production promotion.
7) Security, governance, and evidence
- IdP/MFA for consoles, PAM/JIT for privileged changes, HSM/KMS/PKI for key custody and webhook signing.
- SIEM/SOAR correlates abuse, exfiltration, and traffic spikes; playbooks throttle or revoke tokens, block IP ranges, and open ITSM incidents.
- GRC/WORM stores contracts, policy versions, consent artifacts, and request/response samples for audits.
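Webhook signing, mentioned above, is often done with an HMAC over a timestamp plus the payload so receivers can detect both tampering and stale replays. The sketch below assumes HMAC-SHA256 with a shared secret held in memory; in the architecture described here the key would stay in the HSM/KMS, and the message layout and five-minute tolerance are illustrative choices.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, timestamp: int, body: bytes) -> str:
    """Return a hex HMAC-SHA256 over '<timestamp>.<body>'."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, timestamp: int, body: bytes,
                   signature: str, now: int, tolerance_s: int = 300) -> bool:
    """Reject stale deliveries first, then compare signatures in constant time."""
    if abs(now - timestamp) > tolerance_s:
        return False
    expected = sign_webhook(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

The sender would attach the timestamp and signature as request headers; the receiver recomputes the signature from the raw request bytes, not from a re-serialized payload, since re-serialization can change byte order or whitespace.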
8) Resilience patterns
- Multi-region API edges, active/active brokers, global DNS/Anycast; graceful degradation (cached responses, stubbed transforms) during upstream incidents.
- Back-pressure and circuit-breaker protections guard downstream domains.
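The circuit-breaker protection named above can be sketched as a small state machine: closed (calls pass), open (calls fail fast), and half-open (one trial call after a cool-down). Thresholds, the cool-down length, and the class and exception names below are assumptions for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"      # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("downstream call short-circuited")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures while closed, (re)opens.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller that catches `CircuitOpenError` can serve the cached or stubbed response described under graceful degradation instead of waiting on a failing upstream.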
9) Telemetry, cost, and SLA proof
- AIOps watches RED/USE metrics (Rate/Errors/Duration; Utilization/Saturation/Errors), tail latencies (p95/p99), and consumer lag.
- FinOps tracks cost per call/topic and partner billback; SLA dashboards expose uptime/latency/error budgets.
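Tail latencies such as p95 and p99 can be computed from raw samples with the nearest-rank method, sketched below. Production telemetry systems usually estimate percentiles from histograms or sketches instead of sorting raw samples; the function names and the 150 ms default (taken from the latency KPI in this document) are the only assumptions here.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def meets_latency_kpi(samples_ms: Sequence[float],
                      threshold_ms: float = 150.0) -> bool:
    """True when p95 latency is strictly below the threshold."""
    return percentile(samples_ms, 95) < threshold_ms
```

Averages hide the slow tail: a service where 94% of calls take 100 ms and 6% take 400 ms has a mean of 118 ms yet fails a 150 ms p95 target, which is why the dashboards track percentiles.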
Reference KPIs
- Edge p95 latency (sync APIs): <150 ms (regional)
- Availability (edges/brokers): ≥99.99%
- Schema/contract violations: 0 in production (gated)
- Consumer lag (events): p99 < 1 s (business-critical streams)
- Security incident MTTR (API abuse/exfil): ≤1 h
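To make the availability target concrete, an availability percentage can be converted into an error budget of allowed downtime. The sketch below shows the arithmetic only; the 30-day and 365-day windows are assumptions, since the KPI list does not state a measurement window.

```python
def allowed_downtime_minutes(availability: float, window_days: int) -> float:
    """Minutes of downtime permitted in the window at the given availability."""
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return window_days * 24 * 60 * (1.0 - availability)


def budget_remaining(availability: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = allowed_downtime_minutes(availability, window_days)
    return (budget - downtime_minutes) / budget


# 99.99% leaves roughly 4.32 minutes per 30 days, or 52.56 minutes per 365 days.
monthly_budget = allowed_downtime_minutes(0.9999, 30)
yearly_budget = allowed_downtime_minutes(0.9999, 365)
```

At this target a single two-minute incident consumes nearly half of a 30-day budget, which is why the resilience patterns above emphasize multi-region, active/active operation.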
Minimal BOM (aligned with prior stack)
Edge API gateway + WAF/API-security, OAuth2/OIDC, mTLS/PKI, Service mesh (mTLS + OPA), Kafka/Pulsar + schema registry, Transform/mediation services, Developer/partner portal, Tokenization/DLP, HSM/KMS, SIEM/SOAR, AIOps/Observability, ITSM/CMDB, GRC/WORM, FinOps cost hooks.