Deployment Runbook
Operational guidance for platform engineers and SREs taking ownership of a self-hosted PhronEdge deployment. This document covers the structure of a production deployment, the phases of rollout, and ongoing operational cadence.
Detailed operational playbooks, including specific monitoring thresholds, DR procedures, and incident response workflows, are available to registered enterprise customers with an active support contract.
Rollout phases
Every production PhronEdge deployment follows four phases.
Phase 1: Evaluation (Day 1)
Goal: Confirm the binary works on your infrastructure before wiring in a production KMS or database.
Start with development defaults (SIGNER_BACKEND=dev_file, local Postgres). Sign a sample policy through the CLI. Run a governed tool call. Verify the chain integrity. Exit this phase with a working end-to-end smoke test.
Duration: one afternoon for a standard deployment.
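To make the "verify the chain integrity" step concrete, the following is a minimal sketch of how a hash-linked audit log can be checked. It assumes each event stores the SHA-256 hash of its predecessor. The record layout (`prev_hash`, `payload`, `hash`) and the `GENESIS` anchor are hypothetical illustrations, not PhronEdge's actual chain format, and the product's own verification command remains the tool to use in practice.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical anchor value for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash an event's payload together with the hash of its predecessor."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append an event linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev, "payload": payload, "hash": event_hash(prev, payload)}
    )


def verify_chain(chain: list) -> int:
    """Return -1 if the chain is intact, else the index of the first bad event."""
    prev = GENESIS
    for i, event in enumerate(chain):
        linked = event["prev_hash"] == prev
        unmodified = event["hash"] == event_hash(prev, event["payload"])
        if not (linked and unmodified):
            return i
        prev = event["hash"]
    return -1
```

Because every hash covers the previous hash, editing or removing any event makes verification fail from that point onward, which is why a detected gap is worth investigating.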
Phase 2: KMS integration (Day 2)
Goal: Move private key operations into your KMS.
Switch SIGNER_BACKEND to your cloud KMS (AWS, GCP, or Azure). Sign a new policy and verify the signature was computed in your KMS. Test key rotation through the Console. Verify previously signed policies still validate under the archived key.
Exit criteria: all policy signatures are computed inside your KMS. Independent verification works with only the public key.
Phase 3: Database integration (Day 3)
Goal: All governance state lives in your database.
Provision Postgres. Switch VAULT_BACKEND=postgres. Tables are created automatically on first connection. Sign a policy and confirm credentials persist. Configure your standard backup and encryption-at-rest.
Exit criteria: governance data is under your retention policy, your backup schedule, and your access controls.
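Once Phases 2 and 3 are complete, no development default should remain in the production configuration. A preflight check along these lines can catch that before traffic is routed. This is a sketch under assumptions: `SIGNER_BACKEND`, `dev_file`, `VAULT_BACKEND`, and `postgres` come from the phases above, while the function name and the assumption that an unset `SIGNER_BACKEND` falls back to `dev_file` are illustrative only.

```python
import os


def production_config_problems(env: dict) -> list:
    """Return human-readable problems found in a deployment's environment."""
    problems = []
    # Assumption: an unset SIGNER_BACKEND behaves like the dev_file default.
    if env.get("SIGNER_BACKEND", "dev_file") == "dev_file":
        problems.append("SIGNER_BACKEND is still dev_file; signing is not in your KMS")
    if env.get("VAULT_BACKEND") != "postgres":
        problems.append("VAULT_BACKEND is not postgres; state is not in your database")
    return problems


if __name__ == "__main__":
    for problem in production_config_problems(dict(os.environ)):
        print("CONFIG:", problem)
```

Running the check as a deployment gate keeps an evaluation configuration from being promoted by accident.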
Phase 4: Production hardening (Week 2)
Goal: Production-ready with full operational ownership.
This phase includes:
- Penetration testing against the gateway
- Load testing at expected traffic
- Monitoring and alerting wire-up
- DR runbook documentation
- Incident response training for your SOC team
- Periodic chain verification scheduling
Specific monitoring thresholds, load test targets, and incident response procedures are provided to registered enterprise customers as part of onboarding.
Ongoing operational cadence
A mature PhronEdge deployment follows a standard operational rhythm.
Daily
- Review Observer dashboard for anomalies
- Automated chain verification
- Review quarantine events from the previous 24 hours
Weekly
- Review block rate trends by checkpoint
- Review PII and prompt injection detection rates
- Review agent registrations against change management
Monthly
- Rotate API keys used in CI/CD
- Review team permissions in Console Settings
- Run ungoverned tool scan against all agent repositories
- Validate backup restore with a test restore into a scratch environment
Quarterly
- Rotate signing keys per your key rotation policy
- Archive audit chain per retention requirements
- Review and update the signed policy
- Validate DR procedures with a tabletop exercise
Annually
- Full penetration test
- Full verification of the audit chain across the retention window
- Review threat model against current regulatory landscape
- Re-run deployment checklist against a fresh environment
Disaster recovery
Production deployments require documented DR procedures for four scenarios:
1. KMS outage during policy sign: Cached credentials continue governing. New policies cannot be signed until KMS recovers.
2. Database corruption: Restore from latest backup. Re-verify chain integrity. Investigate any detected gap.
3. Key compromise suspected: Rotate the signing key immediately. Audit KMS logs for unexpected sign operations.
4. Gateway outage: Agents with valid cached credentials continue governing for the cache TTL. After expiry, calls fail closed.
Detailed recovery procedures, specific RTO/RPO targets, and communication templates for each scenario are provided to registered enterprise customers as part of onboarding.
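The fail-closed behavior described for a gateway outage can be pictured with a small TTL cache. This is an illustrative sketch only: the `CredentialCache` class, its method names, and the injectable clock are hypothetical and do not describe PhronEdge's internal implementation.

```python
import time
from typing import Callable, Optional


class CredentialCache:
    """Holds one credential and authorizes calls only while it is fresh."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._credential: Optional[str] = None
        self._fetched_at = 0.0

    def store(self, credential: str) -> None:
        """Cache a credential obtained while the gateway was reachable."""
        self._credential = credential
        self._fetched_at = self._clock()

    def authorize(self) -> bool:
        """Fail closed: deny when nothing is cached or the TTL has elapsed."""
        if self._credential is None:
            return False
        return (self._clock() - self._fetched_at) < self._ttl
```

The design choice worth noting is the default: with no credential or an expired one, the answer is a denial, so an outage longer than the TTL stops governed calls instead of letting them proceed unchecked.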
Incident response
Three procedures every SOC team needs to know.
Quarantine. Immediate, reversible suspension of an agent. Use when a compromise is suspected and investigation is pending. Available in Console Observer or via CLI.
Reinstate. Restore a quarantined agent. Use when investigation concludes the incident was a false positive or has been remediated.
Kill. Permanent, irreversible termination. Console-only with a confirmation prompt. Use when a breach is confirmed, a regulator has mandated shutdown, or the agent has acted maliciously.
All three operations anchor events to the audit chain with the initiator identity and reason.
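The three operations can be modeled as a small state machine, which is a useful mental check during SOC training. The sketch below is hypothetical: the state names, the `Agent` class, and the event fields are illustrations, and allowing a kill from either the active or the quarantined state is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(Enum):
    ACTIVE = "active"
    QUARANTINED = "quarantined"
    KILLED = "killed"  # terminal: no transition leads out of this state


# operation -> (states it may be applied from, resulting state)
TRANSITIONS = {
    "quarantine": ({AgentState.ACTIVE}, AgentState.QUARANTINED),
    "reinstate": ({AgentState.QUARANTINED}, AgentState.ACTIVE),
    "kill": ({AgentState.ACTIVE, AgentState.QUARANTINED}, AgentState.KILLED),
}


@dataclass
class Agent:
    agent_id: str
    state: AgentState = AgentState.ACTIVE
    audit_log: list = field(default_factory=list)

    def apply(self, operation: str, initiator: str, reason: str) -> None:
        """Apply an operation and record who initiated it and why."""
        allowed_from, new_state = TRANSITIONS[operation]
        if self.state not in allowed_from:
            raise ValueError(f"cannot {operation} an agent in state {self.state.value}")
        self.state = new_state
        self.audit_log.append(
            {
                "agent_id": self.agent_id,
                "operation": operation,
                "initiator": initiator,
                "reason": reason,
            }
        )
```

In this model a killed agent rejects every later operation, which mirrors the distinction between the reversible quarantine and the irreversible kill.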
Monitoring
A PhronEdge deployment should be monitored for:
- Gateway latency
- Requests allowed and blocked per unit time
- Block rate by checkpoint
- PII and prompt injection detection rates
- Any tamper events
- Chain verification results
- KMS latency (dependency health)
- Database connection pool utilization
Critical alerts should fire on: chain integrity failure, vault tamper events, kill switch activation, and sustained latency degradation.
Specific thresholds, alert configurations, and dashboard templates are provided to registered enterprise customers.
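Two of the signals above, block rate by checkpoint and sustained latency degradation, can be sketched as follows. All names, the event shape, and the idea of "sustained" meaning a full window of breaching samples are assumptions for illustration; real thresholds should come from your onboarding material and your own baselines.

```python
from collections import Counter, deque


def block_rate_by_checkpoint(events):
    """events: iterable of (checkpoint, decision) pairs, decision 'allowed' or 'blocked'."""
    totals, blocked = Counter(), Counter()
    for checkpoint, decision in events:
        totals[checkpoint] += 1
        if decision == "blocked":
            blocked[checkpoint] += 1
    return {cp: blocked[cp] / totals[cp] for cp in totals}


class SustainedLatencyAlert:
    """Fires only when every sample in a full window exceeds the threshold."""

    def __init__(self, threshold_ms: float, window: int):
        self._threshold_ms = threshold_ms
        self._samples = deque(maxlen=window)  # oldest sample drops out automatically

    def observe(self, latency_ms: float) -> bool:
        """Record one sample and report whether the alert condition holds."""
        self._samples.append(latency_ms)
        window_full = len(self._samples) == self._samples.maxlen
        return window_full and all(s > self._threshold_ms for s in self._samples)
```

Requiring a full window of breaches avoids paging on a single slow request while still catching degradation that persists.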
Upgrade procedure
PhronEdge releases follow semantic versioning.
- Minor upgrades (2.4 to 2.5) are backward compatible. Use your standard canary-then-rollout procedure.
- Major upgrades (2.x to 3.x) may require policy re-signing. Follow the release-specific migration guide.
After any upgrade, run chain verification and confirm integrity. Keep the previous container image available for rollback.
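A deployment pipeline can decide which upgrade path applies by comparing version strings. The helper below is a minimal sketch that assumes plain numeric versions such as `2.4` or `2.4.1`; the function names are hypothetical, and pre-release or build suffixes are not handled.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.4' or '2.4.1' into a (major, minor, patch) tuple of ints."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad missing components with zeros
    return tuple(parts[:3])


def classify_upgrade(current: str, target: str) -> str:
    """Return 'major', 'minor', or 'patch' for an upgrade from current to target."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError(f"{target} is not newer than {current}")
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch"
```

A result of `major` is the cue to read the release-specific migration guide and plan for possible policy re-signing before any rollout begins.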
Compliance verification cadence
For regulated deployments:
| Action | Frequency | Output |
|---|---|---|
| Chain verification | Daily (automated) | Integrity receipt |
| Full audit export | Per regulator cadence | Signed audit pack |
| Policy re-sign | On change | Policy amendment event |
| Key rotation | Per your key rotation policy | Key rotation event |
| Independent verification | Per audit | External validation script output |
Every output is cryptographically signed. Every output is independently verifiable with only your public key.
Support
Enterprise support channels are documented in your support contract. For SaaS deployments, use the Console support surface at phronedge.com/brain.
When opening a support ticket, include:
- Tenant ID
- Approximate timestamp of the issue
- Relevant event IDs from the audit chain
- Output of health verification commands
Do not include:
- Your API key
- Business data
- Customer PII
Next steps
- Enterprise deployment. Configuration reference
- Compliance matrix. Regulation to control mapping
- Threat model. What PhronEdge protects against
- Console guide. UI for CISOs and platform teams
- CLI reference. Operational commands