1. Onboarding Runbook (Deploying a New Embassy/Base Site)
Objective: Bring a new government facility into the secure global WAN while meeting data-sovereignty and compliance requirements.
Step Sequence:
- Pre-Validation
  - Confirm DIA/MPLS circuits are provisioned and carrier SLAs are in place.
  - Validate satellite fallback availability (LEO/MEO/GEO).
  - Assign ASN, VRFs, and IP ranges.
- Edge Installation
  - Ship a ZTP-ready SD-WAN appliance.
  - On-site staff connect power and fiber; an LTE stick provides bootstrap connectivity.
- Zero-Touch Provisioning (ZTP)
  - Device phones home to the controller over LTE.
  - Configuration template is applied (policies, ZTNA, VRF segmentation).
- Security Enrollment
  - Site joins the PKI (certificates loaded via HSM/KMS).
  - Agents register with SIEM/SOAR, and logging starts.
- Functional Tests
  - Run ping/traceroute to HQ, cloud, and sovereign apps.
  - Simulate ZTNA logins for each role (diplomat, IT admin, contractor).
- Handover
  - Site is marked “Production” in CMDB/ITSM.
  - NOC monitoring thresholds are enabled.
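The pre-validation step "Assign ASN/VRF and IP ranges" can be scripted against an address pool. The sketch below is a minimal illustration using Python's standard `ipaddress` module. The 10.64.0.0/16 site pool, the one-/24-per-site rule, the VRF names, and the `allocate_site` helper are all hypothetical; the ASN is drawn from the 16-bit private-use range reserved by RFC 6996 (64512 to 65534).

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical planning values; a real deployment takes these from the agency IPAM.
SITE_POOL = ipaddress.ip_network("10.64.0.0/16")  # assumed pool reserved for new sites
SITE_PREFIXLEN = 24                               # assumed: one /24 per site
PRIVATE_ASNS = range(64512, 65535)                # 16-bit private-use ASNs (RFC 6996)


@dataclass
class SiteAllocation:
    site: str
    asn: int
    vrf_subnets: Dict[str, ipaddress.IPv4Network]


def allocate_site(site: str,
                  used_subnets: Set[ipaddress.IPv4Network],
                  used_asns: Set[int],
                  vrf_names: List[str]) -> SiteAllocation:
    """Take the first free site block and private ASN, then split the block per VRF."""
    block = next((s for s in SITE_POOL.subnets(new_prefix=SITE_PREFIXLEN)
                  if s not in used_subnets), None)
    asn = next((a for a in PRIVATE_ASNS if a not in used_asns), None)
    if block is None or asn is None:
        raise ValueError("site pool or private ASN range exhausted")
    # Borrow just enough bits to give every VRF its own equal-sized subnet,
    # e.g. 4 VRFs -> prefixlen_diff=2 -> four /26s.
    bits = (len(vrf_names) - 1).bit_length()
    subnets = list(block.subnets(prefixlen_diff=bits))
    return SiteAllocation(site, asn, dict(zip(vrf_names, subnets)))


# Example: 10.64.0.0/24 and ASN 64512 are already taken by an existing site.
alloc = allocate_site("new-embassy", {ipaddress.ip_network("10.64.0.0/24")},
                      {64512}, ["corp", "voice", "mgmt", "guest"])
print(alloc.asn, alloc.vrf_subnets)
```

First-free allocation keeps the sketch short; a real IPAM would also reserve blocks atomically so two sites being onboarded in parallel cannot receive the same range.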
Logos mapping:
- Syntax: Circuits + SD-WAN edges joined.
- Semantics: Security and apps defined.
- Pragmatics: Operations/monitoring established.
2. Failover Runbook (Primary DIA/MPLS Loss)
Objective: Maintain service if terrestrial primary link fails.
Step Sequence:
- Detection
  - AIOps raises an alarm when packet loss or jitter exceeds its threshold.
  - SD-WAN reports the circuit down.
- Automatic Failover
  - SD-WAN reroutes traffic via satellite (VSAT) or LTE backup.
  - Critical apps (voice, defense apps) are prioritized; crew Wi-Fi is deprioritized.
- Validation
  - Synthetic probes confirm apps are reachable (HQ, gov cloud).
  - MOS probe confirms voice quality is acceptable (MOS > 3.8).
- Notification
  - NOC generates a ticket; ITSM notifies agency stakeholders.
  - A ticket is opened with the carrier for root-cause analysis.
- Recovery
  - When the primary link is restored, SD-WAN shifts traffic back gracefully.
  - An outage report is generated automatically.
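The Detection and Recovery steps amount to a small state machine: fail over as soon as the primary breaches its thresholds, and fail back only after the primary has stayed healthy for a hold-down period, which is one common way to make the return graceful and avoid flapping. The sketch below assumes illustrative thresholds (2% loss, 30 ms jitter, 300 s hold-down) because the runbook does not specify them; `PathSample` and `FailoverController` are hypothetical names, not a vendor SD-WAN API. For brevity it fails over on a single bad sample, whereas production edges usually require several consecutive failed probes.

```python
from dataclasses import dataclass

# Assumed values; the runbook only says "> threshold", so tune per site and SLA.
LOSS_THRESHOLD_PCT = 2.0
JITTER_THRESHOLD_MS = 30.0
FAILBACK_HOLD_S = 300.0  # primary must stay healthy this long before failback


@dataclass
class PathSample:
    t: float          # seconds on a monotonic clock
    up: bool          # circuit state as reported by the SD-WAN edge
    loss_pct: float
    jitter_ms: float


def healthy(s: PathSample) -> bool:
    return s.up and s.loss_pct <= LOSS_THRESHOLD_PCT and s.jitter_ms <= JITTER_THRESHOLD_MS


class FailoverController:
    """Tracks which path carries traffic: 'primary' or 'backup' (VSAT/LTE)."""

    def __init__(self) -> None:
        self.active = "primary"
        self.healthy_since = None  # time the primary most recently became healthy

    def observe(self, primary: PathSample) -> str:
        if healthy(primary):
            if self.healthy_since is None:
                self.healthy_since = primary.t
            # Graceful failback: return only after a stable hold-down period.
            if (self.active == "backup"
                    and primary.t - self.healthy_since >= FAILBACK_HOLD_S):
                self.active = "primary"
        else:
            self.healthy_since = None
            self.active = "backup"  # fail over immediately on a breach
        return self.active


ctl = FailoverController()
for s in [PathSample(0, True, 0.1, 5.0),      # healthy: stay on primary
          PathSample(10, False, 100.0, 0.0),  # circuit down: move to backup
          PathSample(20, True, 0.1, 5.0),     # recovered: hold-down starts
          PathSample(320, True, 0.1, 5.0)]:   # stable for 300 s: fail back
    print(s.t, ctl.observe(s))
```

The asymmetry is deliberate: leaving a failing primary quickly protects voice and defense apps, while returning slowly keeps a flapping circuit from bouncing traffic between paths.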
3. Incident Response Runbook (Cyber Intrusion Attempt)
Objective: Contain and remediate suspected breach.
Step Sequence:
- Alert
  - SIEM flags an anomalous login or east-west (lateral) movement.
  - SOAR triggers enrichment (geo-IP, device posture, role).
- Containment
  - ZTNA revokes the token/session.
  - SD-WAN policy isolates the suspect enclave's VRF.
  - NGFW/FWaaS blocks the flows.
- Eradication
  - SOC uses EDR to clean the host.
  - NDR verifies there is no residual beaconing.
- Recovery
  - Re-image the device from the golden image.
  - Reset the identity and revalidate MFA.
- Postmortem
  - SOAR produces the incident report.
  - SIEM rules are updated; lessons are fed to AIOps.
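SOAR playbooks commonly encode phases like these as an ordered sequence with a stop-and-escalate rule. The sketch below assumes each action is a callable returning success or failure; the action names (`revoke_ztna_session`, `isolate_vrf`, and so on) are hypothetical stubs standing in for real ZTNA, SD-WAN, NGFW, and EDR integrations, not any product's API. The timeline it builds is the kind of record the Postmortem step can turn into a report.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Phase order taken from the runbook; "alert" covers SIEM detection and SOAR enrichment.
PHASES = ["alert", "containment", "eradication", "recovery", "postmortem"]


@dataclass
class Incident:
    host: str
    user: str
    vrf: str
    timeline: List[str] = field(default_factory=list)


Action = Callable[[Incident], bool]


def run_playbook(incident: Incident, actions: Dict[str, List[Action]]) -> str:
    """Run phases in order; on the first failed action, stop and escalate to a human.

    Returns the last fully completed phase so the SOC knows where to resume.
    """
    last_completed = "none"
    for phase in PHASES:
        for action in actions.get(phase, []):
            ok = action(incident)
            incident.timeline.append(f"{phase}:{action.__name__}:{'ok' if ok else 'FAILED'}")
            if not ok:
                return last_completed
        last_completed = phase
    return last_completed


# Hypothetical stubs standing in for real integrations.
def revoke_ztna_session(i: Incident) -> bool:
    return True


def isolate_vrf(i: Incident) -> bool:
    return True


def block_flows(i: Incident) -> bool:
    return True


def edr_clean_host(i: Incident) -> bool:
    return False  # simulate an EDR cleanup that did not succeed


incident = Incident(host="ws-042", user="contractor-7", vrf="suspect-enclave")
done = run_playbook(incident, {
    "containment": [revoke_ztna_session, isolate_vrf, block_flows],
    "eradication": [edr_clean_host],
})
print(done)  # containment
print(incident.timeline)
```

Stopping at the first failure keeps containment in place while a human takes over, rather than letting automation proceed to recovery on a host that may still be compromised.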
4. Disaster Recovery Drill Runbook
Objective: Rehearse full site loss (e.g., embassy evacuation, natural disaster).
Step Sequence:
- Scenario Trigger
  - Simulate a catastrophic fiber/cable cut or a physical site evacuation.
- Failover Activation
  - All workloads are rerouted via the alternate site/colo.
  - Staff log in via mobile/LTE, with ZTNA enforcing policies.
- Critical Apps Validation
  - Secure voice/video bridging is up.
  - Classified workloads are accessible only from the secondary sovereign enclave.
- Time-to-Recover Measurement
  - Record the RTO/RPO achieved.
  - Measure MOS/jitter for comms.
- Debrief
  - Document lessons and update the runbook.
  - Adjust DR posture (add capacity, refine failover paths).
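For the Time-to-Recover Measurement step, RTO achieved is the time from the start of the outage to service restoration, and RPO achieved is the age of the last good replica at the moment of the outage. The sketch below also includes two standard voice-quality calculations: the ITU-T G.107 E-model mapping from R-factor to estimated MOS, and the RFC 3550 interarrival-jitter estimator. It assumes the R-factor and per-packet timestamps are supplied by existing probes; the function names and the example drill times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def rto_rpo(outage_start: datetime, service_restored: datetime,
            last_good_replica: datetime) -> Tuple[timedelta, timedelta]:
    """Return (RTO achieved, RPO achieved) for one drill."""
    rto = service_restored - outage_start   # how long services were down
    rpo = outage_start - last_good_replica  # how much recent data could be lost
    return rto, rpo


def r_to_mos(r: float) -> float:
    """ITU-T G.107 E-model: map transmission rating R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def rfc3550_jitter(send_ts: List[float], recv_ts: List[float]) -> float:
    """Interarrival jitter per RFC 3550: J += (|D| - J) / 16, in the input's time units."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D = change in one-way transit time between consecutive packets
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Example drill: outage at 10:00, services back at 10:42, last replica at 09:55.
rto, rpo = rto_rpo(datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 42),
                   datetime(2024, 5, 1, 9, 55))
print(rto, rpo)                # 0:42:00 0:05:00
print(round(r_to_mos(80), 3))  # 4.024
```

Under this mapping, the failover runbook's MOS > 3.8 target corresponds to an R-factor of roughly 75 or higher.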
Roles & Responsibilities
- NOC: Circuit monitoring, failover, comms.
- SOC: Cybersecurity, incident isolation, logging.
- Field Techs: Physical install/maintenance.
- Agency IT: Access approval, compliance.
- Exec Stakeholders: Approve DR posture, review debrief reports.
KPIs (Tracked in Runbooks)
- Onboarding time (site live <7 days from carrier delivery).
- Failover time (<60 sec detection + reroute).
- Incident MTTR (<2 hrs).
- DR drill success rate (≥95%).
- Compliance alignment (ISO, SOC 2, NIST, CJIS, GDPR where relevant).
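The quantitative KPIs above can be checked mechanically after each reporting period. The sketch below assumes the measurements are already collected, takes the thresholds directly from the list, and treats MTTR as the mean incident duration; `KpiTargets` and `evaluate_kpis` are hypothetical names. Compliance alignment is not a numeric threshold, so it is left out.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class KpiTargets:
    onboarding: timedelta = timedelta(days=7)      # site live < 7 days from carrier delivery
    failover: timedelta = timedelta(seconds=60)    # detection + reroute < 60 s
    incident_mttr: timedelta = timedelta(hours=2)  # mean incident duration < 2 h
    dr_success_rate: float = 0.95                  # DR drill success rate >= 95%


def evaluate_kpis(onboarding: timedelta, failover: timedelta,
                  incident_durations: List[timedelta],
                  drills_passed: int, drills_total: int,
                  targets: KpiTargets = KpiTargets()) -> Dict[str, bool]:
    """Return pass/fail per KPI; strict '<' and inclusive '>=' mirror the list above."""
    mttr = sum(incident_durations, timedelta()) / len(incident_durations)
    return {
        "onboarding": onboarding < targets.onboarding,
        "failover": failover < targets.failover,
        "incident_mttr": mttr < targets.incident_mttr,
        "dr_success_rate": drills_passed / drills_total >= targets.dr_success_rate,
    }


# Example: an MTTR of exactly 2 h fails the strict "< 2 hrs" target.
print(evaluate_kpis(timedelta(days=5), timedelta(seconds=45),
                    [timedelta(hours=1), timedelta(hours=3)], 19, 20))
```

Keeping the comparison operators identical to the KPI wording matters at the boundaries: 19 of 20 drills (95%) passes, while a 2-hour MTTR does not.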
⚖️ Logos Framing:
Each runbook is a grammar of action.
- Onboarding = writing a new word into the sentence of the network.
- Failover = substituting synonyms without breaking syntax.
- Incident response = correcting meaning when semantics drift.
- DR drill = recursive rehearsal to preserve pragmatic truth.