oncall-runbook
When an incident fires, run a structured triage: fetch the incident, investigate across three independent angles in parallel (for_each fan-out), rank the findings, and produce a one-page summary for the oncall engineer.
What it demonstrates
- tool → agent chaining across two phases
- for_each fan-out on the investigate node (one agent call per dimension)
- array_append merge strategy for accumulating findings from parallel iterations
- Sequential synthesis: ranked findings → prose summary
- Multi-LLM compatibility (ANTHROPIC_API_KEY or OPENAI_API_KEY)
The graph
    ┌────────────────┐     ┌─────────────────┐
    │ fetch_incident │     │ seed_dimensions │
    └───────┬────────┘     └────────┬────────┘
            │     triage phase      │
            └───────────┬───────────┘
                        │
                        ▼
               ┌─────────────────┐
               │   investigate   │  ← for_each over dimensions (×3)
               └────────┬────────┘
                        │ synthesis phase
                        ▼
                 ┌────────────┐
                 │ prioritize │
                 └──────┬─────┘
                        ▼
                   ┌─────────┐
                   │ summary │
                   └─────────┘

Phases: triage → synthesis.
State schema
| Field | Type | Merge | Description |
|---|---|---|---|
| incident | object | — | Raw incident payload |
| dimensions | array<object> | — | Investigation angles seeded by list_dimensions.mjs |
| findings | array<object> | array_append | Accumulated from each investigate iteration |
| prioritized_findings | array<object> | — | Ranked by the prioritize agent |
| summary | string | — | Oncall-facing one-pager |
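To make the Merge column concrete, here is a minimal sketch of how an array_append strategy could behave when several iterations write to the same field. The names mergeStrategies and applyUpdate are hypothetical and are not taken from the oe codebase; the sketch also assumes that fields without a merge strategy are simply overwritten.

```javascript
// Hypothetical sketch: these names are illustrative, not part of the oe API.
const mergeStrategies = { findings: "array_append" };

// Returns a new state with `update` merged in, field by field.
function applyUpdate(state, update) {
  const next = { ...state };
  for (const [field, value] of Object.entries(update)) {
    if (mergeStrategies[field] === "array_append") {
      // Concatenate onto whatever earlier iterations already wrote.
      next[field] = [...(state[field] ?? []), ...value];
    } else {
      // Assumption: fields with no merge strategy are overwritten.
      next[field] = value;
    }
  }
  return next;
}

// Two iterations each return their own findings array.
let mergedState = { incident: { id: "example" }, findings: [] };
mergedState = applyUpdate(mergedState, { findings: [{ title: "finding A" }] });
mergedState = applyUpdate(mergedState, {
  findings: [{ title: "finding B" }, { title: "finding C" }],
});
console.log(mergedState.findings.map((f) => f.title));
```

The result is one flat list holding every finding in arrival order, which is the shape the prioritize agent reads.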
How it runs
    export ANTHROPIC_API_KEY=sk-... # or OPENAI_API_KEY=...
    oe run examples/oncall-runbook --tui

No special CLIs required. The --tui flag shows a live dashboard of node statuses and elapsed time.
What happens
- triage phase: fetch_incident loads fixtures/incident.json into state. seed_dimensions writes three investigation angles: service_health, infra_layer, dependency_failures.
- fan-out: investigate runs once per dimension (three calls, which V1's sequential scheduler executes one after another rather than truly concurrently). Each call reads incident and the current $item dimension and returns structured findings[]. The array_append merge concatenates all three result arrays into one flat findings list.
- synthesis phase: prioritize reads all findings plus the incident and returns a ranked prioritized_findings[]. summary reads the incident plus the ranked findings and writes a human-readable summary string.
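The fan-out step above can be sketched as a plain loop. This is only an illustration of the control flow under a sequential scheduler: fanOut and investigateStub are hypothetical names, and the stub stands in for the real agent call, which would go to an LLM.

```javascript
// Stand-in for the investigate agent; a real run would call an LLM here.
function investigateStub(incident, item) {
  return [{ dimension: item.key, title: `${item.key} check for ${incident.id}` }];
}

// Runs `investigate` once per dimension and flattens the results.
function fanOut(incident, dimensions, investigate) {
  let findings = [];
  for (const item of dimensions) {
    // `item` plays the role of $item: the dimension for this iteration.
    const result = investigate(incident, item);
    // array_append-style merge: concatenate each iteration's array.
    findings = findings.concat(result);
  }
  return findings;
}

const sampleDimensions = [
  { key: "service_health" },
  { key: "infra_layer" },
  { key: "dependency_failures" },
];
const fanOutFindings = fanOut({ id: "example-incident" }, sampleDimensions, investigateStub);
console.log(fanOutFindings.map((f) => f.dimension));
```

Because the loop is driven by the dimensions array, adding a dimension adds an iteration without any change to the loop itself.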
Sample final state:
    $ oe state summary
    [oncall one-pager prose...]
    $ oe state prioritized_findings | head -5
    [{"title": "Payment service 500 rate 12%", "priority": "P1"}, ...]

Fixtures
fixtures/incident.json contains a sample PagerDuty-style incident payload. Replace it with a real incident to triage a live event.
Try it: variations
1. Add a fourth dimension. Edit list_dimensions.mjs to return a fourth object, e.g. { key: "data_layer", focus: "database slow queries" }. The for_each fan-out automatically picks it up — no graph changes needed.
2. Point at a real PagerDuty API. Replace fetch_incident.mjs with a fetch call to the PagerDuty REST API (using the incident ID from --args). The rest of the graph is unchanged.
3. Add a Slack-post tool node. Append a post_summary tool node after summary that POSTs summary to a Slack webhook. Chain { from: summary, to: post_summary }.
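For variation 3, the core of a post_summary tool could look roughly like the sketch below. It does not show how oe passes state into a tool node or how the node is exported, since neither is covered here. buildSlackRequest, postSummary, and the SLACK_WEBHOOK_URL environment variable are assumed names. It relies on the global fetch available in Node 18 and later, and on the fact that Slack incoming webhooks accept a JSON body with a text field.

```javascript
// Hypothetical helper: builds the request a post_summary tool might send.
function buildSlackRequest(summary) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Slack incoming webhooks accept a JSON body with a `text` field.
    body: JSON.stringify({ text: summary }),
  };
}

// Sends the summary. The env variable name is an assumption for this sketch.
async function postSummary(summary, webhookUrl = process.env.SLACK_WEBHOOK_URL) {
  if (!webhookUrl) throw new Error("SLACK_WEBHOOK_URL is not set");
  const response = await fetch(webhookUrl, buildSlackRequest(summary));
  if (!response.ok) throw new Error(`Slack webhook returned ${response.status}`);
}

// Building the request is side-effect free, so it can be inspected offline.
const slackRequest = buildSlackRequest("Payment service degraded; see ranked findings.");
console.log(slackRequest.body);
```

Keeping request construction separate from the network call lets the payload be checked without a live webhook.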