Runbooks — Government & Defense Architecture


1. Onboarding Runbook (Deploying a New Embassy/Base Site)

Objective: Bring a new government facility into the secure global WAN, ensuring sovereignty and compliance.

Step Sequence:

  1. Pre-Validation
    • Confirm DIA/MPLS circuits are provisioned and carrier SLAs are in place.
    • Validate satellite fallback availability (LEO/MEO/GEO).
    • Assign ASN/VRF and IP ranges.
  2. Edge Installation
    • Ship SD-WAN appliance (ZTP ready).
    • Onsite staff connect power and fiber; an LTE stick provides bootstrap connectivity.
  3. Zero-Touch Provisioning (ZTP)
    • Device phones home to controller over LTE.
    • Configuration template applied (policies, ZTNA, VRF segmentation).
  4. Security Enrollment
    • Site joins PKI (certificates loaded via HSM/KMS).
    • Agents register with SIEM/SOAR and logging starts.
  5. Functional Tests
    • Run ping/traceroute to HQ, cloud, and sovereign apps.
    • Simulate ZTNA logins for each role (diplomat, IT admin, contractor).
  6. Handover
    • Site marked “Production” in CMDB/ITSM.
    • NOC monitoring thresholds enabled.
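
The pre-validation and handover steps can be sketched as a small tracker. This is a minimal illustration, not an IPAM or CMDB integration: the regional pool (10.64.0.0/16), the /24-per-site size, the private ASN, the VRF name, and the stage identifiers are all hypothetical. It carves the next free subnet with Python's ipaddress module and only reports a site as Production once every stage has completed in runbook order.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical aggregate reserved for new sites; real ranges come from the IPAM plan.
REGIONAL_POOL = ipaddress.ip_network("10.64.0.0/16")
SITE_PREFIXLEN = 24

# Runbook stages, in the order they must complete (identifiers are illustrative).
STAGES = ["pre_validation", "edge_installation", "ztp",
          "security_enrollment", "functional_tests", "handover"]


def allocate_subnet(used: set) -> ipaddress.IPv4Network:
    """Return the first /24 in the pool that is not already assigned."""
    for candidate in REGIONAL_POOL.subnets(new_prefix=SITE_PREFIXLEN):
        if candidate not in used:
            used.add(candidate)
            return candidate
    raise RuntimeError("regional pool exhausted")


@dataclass
class Site:
    name: str
    asn: int  # e.g. a private ASN (64512-65534)
    vrf: str
    subnet: ipaddress.IPv4Network
    completed: list = field(default_factory=list)

    def complete(self, stage: str) -> None:
        """Record a stage, enforcing the runbook order."""
        if len(self.completed) == len(STAGES):
            raise ValueError(f"{self.name} is already in production")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"{self.name}: expected {expected!r}, got {stage!r}")
        self.completed.append(stage)

    @property
    def production(self) -> bool:
        # Only mark the site "Production" in CMDB once every stage is done.
        return self.completed == STAGES


# Usage: allocate a subnet and walk one site through every stage.
used: set = set()
site = Site("embassy-a", 65001, "VRF-GOV", allocate_subnet(used))
for stage in STAGES:
    site.complete(stage)
print(site.subnet, site.production)
```

Calling complete() out of order raises ValueError, which mirrors the rule that a site cannot reach handover without passing security enrollment and functional tests first.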

Logos mapping:

  • Syntax: Circuits + SD-WAN edges joined.
  • Semantics: Security and apps defined.
  • Pragmatics: Operations/monitoring established.

2. Failover Runbook (Primary DIA/MPLS Loss)

Objective: Maintain service if terrestrial primary link fails.

Step Sequence:

  1. Detection
    • AIOps raises an alarm when packet loss or jitter exceeds threshold.
    • SD-WAN reports circuit down.
  2. Automatic Failover
    • SD-WAN reroutes traffic via satellite (VSAT) or LTE backup.
    • Critical apps prioritized (voice, defense apps); crew Wi-Fi deprioritized.
  3. Validation
    • Synthetic probes confirm apps reachable (HQ, gov cloud).
    • MOS probes confirm voice quality is acceptable (MOS > 3.8).
  4. Notification
    • NOC generates ticket; ITSM notifies agency stakeholders.
    • Carrier ticket opened for root-cause analysis.
  5. Recovery
    • When the primary link is restored, SD-WAN gracefully shifts traffic back.
    • Outage report auto-generated.
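
A minimal sketch of the failover and validation logic, assuming hypothetical loss/jitter thresholds, path names, and traffic classes. Path selection walks a preference list (MPLS, DIA, VSAT, LTE) and takes the first healthy path; on a backup path, crew Wi-Fi is rate-limited while voice and defense apps keep priority. The voice check converts an E-model R-factor to estimated MOS using the ITU-T G.107 mapping; the R value of 80 in the usage is a made-up measurement, and deriving R from network statistics is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical health thresholds for declaring a path usable.
MAX_LOSS_PCT = 1.0
MAX_JITTER_MS = 30.0

# Terrestrial primaries first, then satellite, then LTE.
PATH_PREFERENCE = ["mpls", "dia", "vsat", "lte"]
BACKUP_PATHS = {"vsat", "lte"}


@dataclass
class PathStats:
    name: str
    up: bool
    loss_pct: float
    jitter_ms: float


def is_healthy(p: PathStats) -> bool:
    return p.up and p.loss_pct <= MAX_LOSS_PCT and p.jitter_ms <= MAX_JITTER_MS


def select_path(paths: list[PathStats]) -> str | None:
    """Return the most-preferred healthy path, or None if nothing is usable."""
    by_name = {p.name: p for p in paths}
    for name in PATH_PREFERENCE:
        p = by_name.get(name)
        if p is not None and is_healthy(p):
            return name
    return None


def class_policy(active_path: str) -> dict[str, str]:
    """Treatment per traffic class on the active path."""
    bulk = "rate_limited" if active_path in BACKUP_PATHS else "best_effort"
    return {"voice": "priority", "defense_apps": "priority", "crew_wifi": bulk}


def r_to_mos(r: float) -> float:
    """Map an E-model R-factor to estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


# Usage: MPLS is down and DIA is lossy, so traffic moves to VSAT.
paths = [
    PathStats("mpls", up=False, loss_pct=100.0, jitter_ms=0.0),
    PathStats("dia", up=True, loss_pct=5.0, jitter_ms=10.0),
    PathStats("vsat", up=True, loss_pct=0.2, jitter_ms=20.0),
    PathStats("lte", up=True, loss_pct=0.5, jitter_ms=40.0),
]
active = select_path(paths)
voice_ok = r_to_mos(80.0) > 3.8  # validation threshold from the runbook
print(active, class_policy(active)["crew_wifi"], voice_ok)
```

Under the G.107 mapping, MOS 3.8 corresponds to an R-factor of roughly 75, which gives a rough sense of how much impairment the satellite path can absorb before voice validation fails.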

3. Incident Response Runbook (Cyber Intrusion Attempt)

Objective: Contain and remediate suspected breach.

Step Sequence:

  1. Alert
    • SIEM flags anomalous login or east-west movement.
    • SOAR triggers enrichment (geo-IP, device posture, role).
  2. Containment
    • ZTNA revokes token/session.
    • SD-WAN policy isolates the suspect enclave's VRF.
    • NGFW/FWaaS blocks flows.
  3. Eradication
    • SOC uses EDR to clean the affected host.
    • NDR verifies no residual beaconing.
  4. Recovery
    • Re-image the device from the golden image.
    • Reset credentials and revalidate MFA.
  5. Postmortem
    • SOAR produces report.
    • Update SIEM rules; lessons fed to AIOps.
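
The alert and containment steps can be sketched as a SOAR-style playbook. The alert fields, scoring weights, threshold, and action methods are hypothetical stubs that only record what they would do; a real playbook would call the ZTNA, SD-WAN, and NGFW/FWaaS APIs, but in the same order the runbook lists them.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    user: str
    host: str
    vrf: str
    src_country: str        # from geo-IP enrichment
    expected_country: str   # e.g. the user's assigned post
    lateral_movement: bool  # east-west movement flagged by SIEM/NDR


@dataclass
class Playbook:
    threshold: int = 50
    actions: list = field(default_factory=list)

    # Hypothetical stubs: each records the call a real playbook would make.
    def revoke_session(self, user: str) -> None:
        self.actions.append(("ztna_revoke", user))

    def isolate_vrf(self, vrf: str) -> None:
        self.actions.append(("sdwan_isolate", vrf))

    def block_host(self, host: str) -> None:
        self.actions.append(("ngfw_block", host))

    def score(self, a: Alert) -> int:
        # Illustrative weights only; tune against real SIEM data.
        s = 0
        if a.src_country != a.expected_country:
            s += 50
        if a.lateral_movement:
            s += 50
        return s

    def run(self, a: Alert) -> bool:
        """Contain in runbook order if the enriched score meets the threshold."""
        if self.score(a) < self.threshold:
            return False
        self.revoke_session(a.user)
        self.isolate_vrf(a.vrf)
        self.block_host(a.host)
        return True


# Usage: an off-post login with lateral movement triggers full containment.
pb = Playbook()
contained = pb.run(Alert("jdoe", "ws-17", "VRF-ADMIN", "ZZ", "US", True))
print(contained, [name for name, _ in pb.actions])
```

Revoking the session first cuts the attacker's identity-based access before network isolation, so a stolen token cannot be reused from another enclave while the VRF policy propagates.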

4. Disaster Recovery Drill Runbook

Objective: Rehearse full site loss (e.g., embassy evacuation, natural disaster).

Step Sequence:

  1. Scenario Trigger
    • Simulate catastrophic fiber/cable cut or physical site evacuation.
  2. Failover Activation
    • All workloads rerouted via alternate site/colo.
    • Staff log in via mobile/LTE, with ZTNA enforcing policies.
  3. Critical Apps Validation
    • Secure voice/video bridges confirmed up.
    • Classified workloads accessible only from secondary sovereign enclave.
  4. Time-to-Recover Measurement
    • Record RTO/RPO achieved.
    • Measure MOS/jitter for comms.
  5. Debrief
    • Document lessons, update runbook.
    • Adjust DR posture (add capacity, refine failover paths).
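
Time-to-recover measurement reduces to timestamp arithmetic, sketched below under the assumption that the drill records when service was lost, when it was restored, and when the last good replica was taken. Jitter uses the interarrival estimator from RFC 3550 (section 6.4.1), which smooths the absolute change in transit time between consecutive packets with a gain of 1/16. All timestamps in the usage are illustrative, not drill results.

```python
from datetime import datetime, timedelta


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    # Time from loss of service until critical apps are validated as restored.
    return service_restored - outage_start


def achieved_rpo(outage_start: datetime, last_good_replica: datetime) -> timedelta:
    # Data-loss window: age of the newest recoverable copy when the site was lost.
    return outage_start - last_good_replica


def rfc3550_jitter(send_ts: list, recv_ts: list) -> float:
    """Interarrival jitter (RFC 3550, 6.4.1); result is in the input's time units."""
    j = 0.0
    for i in range(1, len(send_ts)):
        # D(i-1, i): change in one-way transit time between consecutive packets.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        j += (abs(d) - j) / 16
    return j


# Usage with illustrative drill timestamps (not real results).
start = datetime(2024, 5, 1, 10, 0)
rto = achieved_rto(start, datetime(2024, 5, 1, 10, 45))
rpo = achieved_rpo(start, datetime(2024, 5, 1, 9, 55))
jitter_ms = rfc3550_jitter([0, 20, 40], [100, 130, 150])
print(rto, rpo)  # 0:45:00 0:05:00
```

Because the estimator only uses differences, sender and receiver clocks need not be synchronized; a constant offset between them cancels out.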

Roles & Responsibilities

  • NOC: Circuit monitoring, failover, comms.
  • SOC: Cybersecurity, incident isolation, logging.
  • Field Techs: Physical install/maintenance.
  • Agency IT: Access approval, compliance.
  • Exec Stakeholders: Approve DR posture, debrief reports.

KPIs (Tracked in Runbooks)

  • Onboarding time (site live <7 days from carrier delivery).
  • Failover time (<60 s, detection plus reroute).
  • Incident MTTR (<2 hrs).
  • DR drill success rate (≥95%).
  • Compliance alignment (ISO, SOC 2, NIST, CJIS, GDPR where relevant).
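
The numeric KPI targets above can be checked mechanically. This sketch encodes each one as a predicate (compliance alignment is left out because it is not a single number); the KPI keys and helper names are hypothetical, and the measured values would come from NOC/SOC tooling. The numbers in the usage are illustrative only.

```python
from datetime import timedelta

# Predicates mirror the numeric targets in the KPI list.
KPI_TARGETS = {
    "onboarding_time": lambda v: v < timedelta(days=7),
    "failover_time": lambda v: v < timedelta(seconds=60),
    "incident_mttr": lambda v: v < timedelta(hours=2),
    "dr_success_rate": lambda v: v >= 0.95,
}


def mean_time(durations: list) -> timedelta:
    """Mean of incident durations (MTTR when each is detection-to-restore)."""
    return sum(durations, timedelta()) / len(durations)


def success_rate(outcomes: list) -> float:
    """Fraction of DR drills that passed."""
    return sum(outcomes) / len(outcomes)


def evaluate(measured: dict) -> dict:
    """Return pass/fail per KPI for every measured value that has a target."""
    return {k: KPI_TARGETS[k](v) for k, v in measured.items() if k in KPI_TARGETS}


# Usage with illustrative numbers, not real measurements.
report = evaluate({
    "onboarding_time": timedelta(days=9),
    "failover_time": timedelta(seconds=42),
    "incident_mttr": mean_time([timedelta(hours=1), timedelta(hours=2)]),
    "dr_success_rate": success_rate([True] * 19 + [False]),
})
print(report)
```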

⚖️ Logos Framing:
Each runbook is a grammar of action.

  • Onboarding = writing a new word into the sentence of the network.
  • Failover = substituting synonyms without breaking syntax.
  • Incident response = correcting meaning when semantics drift.
  • DR drill = recursive rehearsal to preserve pragmatic truth.