Author → run → evolve loop

OpenExpertise's continuous improvement cycle — author a workflow, run it for real, let the advisor propose what to fix.

When you need this

You have a first-pass experience (hand-written or generated by oe ultra) and want to improve it systematically.
A run revealed coverage gaps or performance issues and you want data-driven proposals rather than guessing.
You want to understand when to iterate via oe evolve versus when to re-author with oe ultra or hand-edit.
You are setting up a recurring SOP that improves with each execution.

The minimal example

```bash
# 1. Author (one-shot LLM authoring)
oe ultra "Review PRs against our security checklist"
mv .openexpertise/drafts/security-pr-review examples/security-pr-review

# 2. Run
oe run examples/security-pr-review
# → run_id: abc123   status: partial (two nodes skipped)

# 3. Evolve
oe evolve examples/security-pr-review --run-id abc123
oe diff examples/security-pr-review
# → proposal: add 'injection' dimension  (confidence: high)
# → proposal: tune retry to 3 attempts  (confidence: medium)

# 4. Apply what you agree with
# The awk filter extracts every diff block in the proposal file, so delete
# the proposals you reject from that file before piping it to git apply.
awk '/^```diff/{found=1; next} found && /^```/{found=0; next} found' \
  examples/security-pr-review/.openexpertise/evolution/abc123.md | git apply

# 5. Re-run
oe run examples/security-pr-review
```

Repeat from step 2.

How it works

Author — oe ultra (or hand-authoring) produces a first draft. Generated tool stubs have // TODO: markers; you fill them in before the first meaningful run. The draft is validated by oe validate before you do anything else.
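Before the first real run, it can help to confirm that no generated stub is still unfinished. A minimal sketch, assuming the stubs live under the experience directory and carry the // TODO: marker described above; the function name list_unfilled_stubs is hypothetical and not part of the oe CLI:

```bash
# List every line that still carries the generated "// TODO:" marker.
# Exit status is grep's own: 0 when at least one marker remains, 1 when none do.
list_unfilled_stubs() {
  local dir=$1
  grep -rnF -- '// TODO:' "$dir"
}

# Example:
#   if list_unfilled_stubs examples/security-pr-review; then
#     echo "Fill in the stubs above before running." >&2
#   fi
```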

Run — oe run dispatches the graph. The scheduler writes every state mutation to SQLite and every event to a JSONL run log. A partial status means at least one node was skipped (due to a failure or when: condition). A failed status means a fail_run policy fired.
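To see at a glance why a run ended as partial, the JSONL run log can be summarised with jq. This is a rough sketch: the "type" field is an assumption made for illustration, since the event schema is not shown here, and summarize_run_log is a hypothetical helper. Adjust the field name to whatever your run log actually contains.

```bash
# Count events per type in a JSONL run log (one JSON object per line),
# most frequent first.
# ASSUMPTION: each event has a string field named "type"; events without
# it are counted as "unknown". The real schema may differ.
summarize_run_log() {
  local log_file=$1
  jq -r '.type // "unknown"' "$log_file" | sort | uniq -c | sort -rn
}
```

Pass the path of the run log as the only argument.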

Evolve — oe evolve reads the run log and the state history and sends them to the evolution advisor (an LLM with a structured-output tool). The advisor proposes up to 5 edits: new nodes, tuned parameters, or added dataset cases. It cannot propose removals, edge changes, or schema changes — only additive edits. Proposals land as a Markdown file in .openexpertise/evolution/.

Apply selectively — you review each proposal's rationale and confidence. You extract and git apply the diff blocks you agree with. You ignore or delete the rest. The runtime never auto-applies anything.
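The awk one-liner in the minimal example pipes every diff block to git apply. To apply proposals one at a time, a small helper can print only the N-th block. This sketch assumes, as that one-liner does, that each proposal's patch sits in its own fenced diff block; extract_diff_block is a hypothetical name, not an oe command.

```bash
# Print the body of the N-th fenced diff block (1-based) from a proposal file.
extract_diff_block() {
  local file=$1 n=$2
  awk -v want="$n" '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }               # three backticks
    index($0, fence "diff") == 1 { in_block = 1; count++; next }  # opening fence
    in_block && index($0, fence) == 1 { in_block = 0; next }      # closing fence
    in_block && count == want                                     # print wanted block only
  ' "$file"
}

# Example (path as in the minimal example):
#   extract_diff_block examples/security-pr-review/.openexpertise/evolution/abc123.md 1 | git apply
```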

When to evolve vs hand-edit:

| Situation | Recommended action |
| --- | --- |
| Missing coverage area noticed in a real run | `oe evolve`: the advisor will see it in the state diff |
| You know exactly what to change | Hand-edit `experience.yaml` directly |
| The prompt needs a complete rewrite | Hand-edit the `.md` file; `oe evolve` proposes parameter tweaks, not prompt rewrites |
| First draft is structurally wrong | `oe ultra` again with a better task description |
| Retry policy too aggressive or too lenient | `oe evolve`: `tune-param` proposals handle this well |

When not to evolve: if a run produced no useful state (all nodes skipped, fixture data only, or the state diff is empty), the advisor has nothing to work from. Fix the structural issue first, get a real run, then evolve.

Variations

Run → evolve → re-run in a tight cycle:

```bash
for i in 1 2 3; do
  RUN=$(oe run examples/my-experience --json | jq -r .run_id)
  oe evolve examples/my-experience --run-id "$RUN"
  oe diff examples/my-experience
  # Review the proposals manually and apply what makes sense,
  # then press Enter to start the next iteration.
  read -r -p "Press Enter to re-run... " _
done
```

Use the MCP server to run the full loop from inside Claude Code:

Run examples/review-branch, then evolve it and show me the proposals.

Claude Code calls oe_run, then oe_evolve, then reads the proposal file.

Accumulate state across runs — fields with merge: array_append collect findings from every run into the same SQLite table. The evolution advisor sees the growing history in the state_diff.

Evolve across several runs — once you have a handful of real runs, pass them together so the advisor only proposes changes the data corroborates more than once:

```bash
oe evolve --experience examples/my-experience --runs abc123,def456,ghi789
# → STABLE patterns (recur in ≥2 runs) ranked high/medium; one-off blips dropped or low
```

See the advisor's cross-run analysis.
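The --runs flag in the example above takes a comma-separated list. If run IDs are collected one per line (for instance, appended to a file after each oe run), a small helper can build that list. This is a sketch; join_run_ids and the runs.txt file are hypothetical conveniences, not part of the CLI.

```bash
# Join run IDs (one per line on stdin) into a comma-separated list,
# skipping blank lines and keeping only the last N (default 3).
join_run_ids() {
  local keep=${1:-3}
  grep -v '^[[:space:]]*$' | tail -n "$keep" | paste -sd, -
}

# Example:
#   oe evolve --experience examples/my-experience --runs "$(join_run_ids 3 < runs.txt)"
```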

Promote a stable experience to the examples/ directory:

```bash
# Once the experience is stable enough to be a reference
cp -r .openexpertise/drafts/my-experience examples/my-experience
git add examples/my-experience
git commit -m "feat(examples): add my-experience"
```

Gotchas

oe evolve is not a replacement for oe ultra. Evolve proposes additive edits to an existing experience. If the experience needs structural redesign (different phases, different node kinds), re-author with oe ultra or hand-edit.
Empty runs produce low-quality proposals. If your experience ran with fixture data and all nodes returned stubbed outputs, the advisor has nothing real to learn from. Run with real inputs first.
Applied proposals must pass oe validate. Always run oe validate examples/my-experience after applying a patch. A malformed patch can produce invalid YAML.
The loop has no automatic stopping condition. There is no "done" signal. Run → evolve until the experience behaves reliably on real inputs and new proposals are consistently low-confidence.
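The validation gotcha above can be folded into the apply step. The following sketch checks that a patch applies cleanly, applies it, runs a validator, and reverses the patch if validation fails. apply_and_validate is a hypothetical helper; the validator is passed in as a command, so with oe it would be oe validate followed by the experience path.

```bash
# Apply a patch, then run a validator command; roll the patch back if it fails.
# Returns 0 on success, 1 if the patch does not apply, 2 if validation fails.
apply_and_validate() {
  local patch_file=$1
  shift
  git apply --check "$patch_file" || return 1   # dry run; nothing is modified
  git apply "$patch_file" || return 1
  if ! "$@"; then
    git apply -R "$patch_file"                  # reverse the patch just applied
    return 2
  fi
}

# Example (proposal-1.patch is a placeholder file name):
#   apply_and_validate proposal-1.patch oe validate examples/my-experience
```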
