Author → run → evolve loop
OpenExpertise's continuous improvement cycle — author a workflow, run it for real, let the advisor propose what to fix.
When you need this
- You have a first-pass experience (hand-written or generated by oe ultra) and want to improve it systematically.
- A run revealed coverage gaps or performance issues, and you want data-driven proposals rather than guesses.
- You want to understand when to iterate via oe evolve versus when to re-author with oe ultra or hand-edit.
- You are setting up a recurring SOP that improves with each execution.
The minimal example
```shell
# 1. Author (one-shot LLM authoring), then validate the draft
oe ultra "Review PRs against our security checklist"
mv .openexpertise/drafts/security-pr-review examples/security-pr-review
oe validate examples/security-pr-review

# 2. Run
oe run examples/security-pr-review
# → run_id: abc123  status: partial (two nodes skipped)

# 3. Evolve
oe evolve examples/security-pr-review --run-id abc123
oe diff examples/security-pr-review
# → proposal: add 'injection' dimension (confidence: high)
# → proposal: tune retry to 3 attempts (confidence: medium)

# 4. Apply what you agree with. This awk filter extracts EVERY diff block in
#    the proposal file, so remove the blocks you reject before piping it on.
awk '/^```diff/{found=1; next} found && /^```/{found=0; next} found' \
  examples/security-pr-review/.openexpertise/evolution/abc123.md | git apply
oe validate examples/security-pr-review

# 5. Re-run
oe run examples/security-pr-review
```

Repeat from step 2.
How it works
Author: oe ultra (or hand-authoring) produces a first draft. Generated tool stubs contain // TODO: markers; fill them in before the first meaningful run. Validate the draft with oe validate before you do anything else.
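Before the first meaningful run, it can help to confirm mechanically that no stub still carries a // TODO: marker. This is a minimal sketch under stated assumptions: todo_count and ready_to_run are hypothetical helper names, and because this page does not say where generated stubs live, the search covers every file under the experience directory. The only OpenExpertise command it calls is oe validate.

```shell
# Hypothetical helper: count lines containing "// TODO:" anywhere under an
# experience directory. Searches every file recursively because the stub
# layout is an assumption, not something this page specifies.
todo_count() {
  grep -r -F -- '// TODO:' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical gate: refuse to continue while stubs are unfinished; once no
# markers remain, hand over to oe validate.
ready_to_run() {
  n=$(todo_count "$1")
  if [ "$n" -gt 0 ]; then
    echo "$n unfinished TODO marker(s) in $1; fill in the stubs first" >&2
    return 1
  fi
  oe validate "$1"
}

# Usage sketch:
#   ready_to_run examples/security-pr-review
```

Calling ready_to_run examples/security-pr-review reports the number of unfinished markers and stops; it only reaches oe validate once the count is zero.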
Run: oe run dispatches the graph. The scheduler writes every state mutation to SQLite and every event to a JSONL run log. A partial status means at least one node was skipped (because of a failure or a when: condition); a failed status means a fail_run policy fired.
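The two non-success statuses are easy to mix up, so a small lookup can encode what each one means. This is a hypothetical shell helper (explain_status is not part of the oe CLI); it covers only the partial and failed statuses described above and passes any other value through unchanged. Reading the status from --json output assumes a status field exists; only run_id is confirmed by the examples on this page.

```shell
# Hypothetical helper: translate a run status into the meaning described on
# this page. Only "partial" and "failed" are documented here, so every other
# value falls through to the default branch.
explain_status() {
  case "$1" in
    partial) echo "partial: at least one node was skipped (a failure or a when: condition)" ;;
    failed)  echo "failed: a fail_run policy fired" ;;
    *)       echo "$1: not partial or failed" ;;
  esac
}

# Usage sketch. ASSUMPTION: the --json output carries a "status" field next
# to "run_id". Capture the JSON once so the experience is not run twice.
#   out=$(oe run examples/my-experience --json)
#   explain_status "$(printf '%s' "$out" | jq -r .status)"
```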
Evolve: oe evolve reads the run log and the state history and sends them to the evolution advisor (an LLM with a structured-output tool). The advisor proposes up to 5 edits: new nodes, tuned parameters, or added dataset cases. It cannot propose removals, edge changes, or schema changes; its edits are additive only. Proposals land as a Markdown file in .openexpertise/evolution/.
Apply selectively: review each proposal's rationale and confidence, extract the diff blocks you agree with, and git apply them. Ignore or delete the rest; the runtime never auto-applies anything.
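The awk filter in the minimal example pipes every diff block into git apply at once, so applying proposals one at a time needs the blocks split apart first. This is a minimal sketch, assuming each proposal's patch sits in its own fenced diff block (the same assumption that awk filter makes); split_proposals and the proposal-N.diff file names are hypothetical. git apply --check is a standard git option that tests whether a patch applies without touching the working tree.

```shell
# Hypothetical helper: write each ```diff block of a proposal file to its own
# numbered patch file (proposal-1.diff, proposal-2.diff, ...) in an output
# directory, then print how many blocks were found.
split_proposals() {
  mkdir -p "$2"
  awk -v out="$2" '
    /^```diff/      { n++; file = out "/proposal-" n ".diff"; found = 1; next }
    found && /^```/ { found = 0; close(file); next }
    found           { print > file }
    END             { print n + 0 }
  ' "$1"
}

# Usage sketch (paths follow the minimal example):
#   split_proposals examples/security-pr-review/.openexpertise/evolution/abc123.md /tmp/abc123
#   git apply --check /tmp/abc123/proposal-1.diff && git apply /tmp/abc123/proposal-1.diff
#   oe validate examples/security-pr-review
```

Keeping one file per proposal makes the accept/reject decision explicit: you apply the files you agree with and simply never touch the others.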
When to evolve vs hand-edit:
| Situation | Recommended action |
|---|---|
| Missing coverage area noticed in a real run | oe evolve — the advisor will see it in the state diff |
| You know exactly what to change | Hand-edit experience.yaml directly |
| The prompt needs a complete rewrite | Hand-edit the .md file; oe evolve proposes parameter tweaks, not prompt rewrites |
| First draft is structurally wrong | oe ultra again with a better task description |
| Retry policy too aggressive or too lenient | oe evolve — tune-param proposals handle this well |
When not to evolve: if a run produced no useful state (all nodes skipped, fixture data only, or the state diff is empty), the advisor has nothing to work from. Fix the structural issue first, get a real run, then evolve.
Variations
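Run → evolve → re-run in a tight cycle, pausing after each iteration so you can apply proposals before the next run:

```shell
for i in 1 2 3; do
  RUN=$(oe run examples/my-experience --json | jq -r .run_id)
  oe evolve examples/my-experience --run-id "$RUN"
  oe diff examples/my-experience
  # Review the proposals and apply what makes sense before continuing;
  # without this pause the next run would use the unchanged experience.
  printf 'Iteration %s: apply proposals, then press Enter to re-run... ' "$i"
  read -r _
done
```

Use the MCP server to run the full loop from inside Claude Code:

"Run examples/review-branch, then evolve it and show me the proposals."

Claude Code calls oe_run, then oe_evolve, then reads the proposal file.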
Accumulate state across runs: fields declared with merge: array_append collect findings from every run into the same SQLite table, so the evolution advisor sees the growing history in the state_diff.
Promote a stable experience to the examples/ directory:

```shell
# Once the experience is stable enough to be a reference
cp -r .openexpertise/drafts/my-experience examples/my-experience
git add examples/my-experience
git commit -m "feat(examples): add my-experience"
```

Gotchas
- oe evolve is not a replacement for oe ultra. Evolve proposes additive edits to an existing experience. If the experience needs a structural redesign (different phases, different node kinds), re-author with oe ultra or hand-edit.
- Empty runs produce low-quality proposals. If your experience ran with fixture data and all nodes returned stubbed outputs, the advisor has nothing real to learn from. Run with real inputs first.
- Applied proposals must pass oe validate. Always run oe validate examples/my-experience after applying a patch; a malformed patch can produce invalid YAML.
- The loop has no automatic stopping condition. There is no "done" signal: keep repeating run → evolve until the experience behaves reliably on real inputs and new proposals are consistently low-confidence.
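Since the loop has no stopping signal, one way to make "consistently low-confidence" concrete is a check run after each oe evolve. This sketch rests on a loud assumption: that the proposal file states each proposal's confidence as "confidence: high", "confidence: medium", or "confidence: low", mirroring the oe diff output in the minimal example. The real file format may differ. only_low_confidence is a hypothetical helper name, and the evolution/<run_id>.md path follows the one used in the minimal example.

```shell
# Hypothetical stopping check. ASSUMPTION: each proposal states its
# confidence as "confidence: high|medium|low" (case-insensitive), one per
# line. Succeeds only if at least one proposal is present and none are rated
# high or medium.
only_low_confidence() {
  [ -f "$1" ] || return 1
  # grep -c prints 0 but exits 1 on no match; "|| true" keeps set -e happy.
  total=$(grep -c -i -E 'confidence: *(high|medium|low)' "$1" || true)
  strong=$(grep -c -i -E 'confidence: *(high|medium)' "$1" || true)
  [ "$total" -gt 0 ] && [ "$strong" -eq 0 ]
}

# Usage sketch:
#   oe evolve examples/my-experience --run-id "$RUN"
#   if only_low_confidence "examples/my-experience/.openexpertise/evolution/$RUN.md"; then
#     echo "only low-confidence proposals left; consider stopping"
#   fi
```

A single passing check is weak evidence; requiring it to pass on two or three consecutive iterations is closer to what "consistently" means.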
See also
- oe ultra — LLM authors for you
- The evolution advisor
- Applying proposals
- Evolution loop concept
- MCP server — run the loop from inside Claude Code