
Author → run → evolve loop

OpenExpertise's continuous improvement cycle — author a workflow, run it for real, let the advisor propose what to fix.

When you need this

  • You have a first-pass experience (hand-written or generated by oe ultra) and want to improve it systematically.
  • A run revealed coverage gaps or performance issues and you want data-driven proposals rather than guessing.
  • You want to understand when to iterate via oe evolve versus when to re-author with oe ultra or hand-edit.
  • You are setting up a recurring SOP that improves with each execution.

The minimal example

```bash
# 1. Author (one-shot LLM authoring), then validate the draft
oe ultra "Review PRs against our security checklist"
mv .openexpertise/drafts/security-pr-review examples/security-pr-review
oe validate examples/security-pr-review

# 2. Run
oe run examples/security-pr-review
# → run_id: abc123   status: partial (two nodes skipped)

# 3. Evolve
oe evolve examples/security-pr-review --run-id abc123
oe diff examples/security-pr-review
# → proposal: add 'injection' dimension  (confidence: high)
# → proposal: tune retry to 3 attempts  (confidence: medium)

# 4. Apply what you agree with. The awk filter extracts EVERY diff block in the
#    proposal file, so delete the proposals you reject from that file first.
awk '/^```diff/{found=1; next} found && /^```/{found=0; next} found' \
  examples/security-pr-review/.openexpertise/evolution/abc123.md > /tmp/abc123.patch
git apply --check /tmp/abc123.patch && git apply /tmp/abc123.patch
oe validate examples/security-pr-review

# 5. Re-run
oe run examples/security-pr-review
```

Repeat from step 2.

How it works

Author: oe ultra (or hand-authoring) produces a first draft. Generated tool stubs contain // TODO: markers; fill them in before the first meaningful run. Validate the draft with oe validate before you do anything else.
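Before the first meaningful run, a quick scripted check for leftover stubs can save a wasted run. The sketch below is a hypothetical helper, not an oe command; it assumes the generated stubs live somewhere under the experience directory and carry the literal marker // TODO: exactly as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the oe CLI): list generated tool stubs
# that still contain the "// TODO:" marker left by oe ultra.
# Returns 0 when none remain, 1 when some do, 2 when the path is not a directory.
check_todo_stubs() {
  local dir="$1"
  local hits
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 2
  fi
  # -r: recurse, -n: show line numbers, -F: match the marker as a fixed string.
  # "|| true" keeps grep's no-match status (1) from aborting set -e callers.
  hits=$(grep -rnF '// TODO:' "$dir" || true)
  if [ -n "$hits" ]; then
    printf 'Unfilled stubs:\n%s\n' "$hits"
    return 1
  fi
  echo "No TODO markers left in $dir"
}
```

For example, check_todo_stubs examples/security-pr-review && oe validate examples/security-pr-review only reaches validation once every stub has been filled in.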

Run: oe run dispatches the graph. The scheduler writes every state mutation to SQLite and every event to a JSONL run log. A partial status means at least one node was skipped, either because of a failure or because its when: condition excluded it. A failed status means a fail_run policy fired.

Evolve: oe evolve reads the run log and the state history and sends them to the evolution advisor (an LLM with a structured-output tool). The advisor proposes up to 5 edits: new nodes, tuned parameters, or added dataset cases. It can only propose additive edits, never removals, edge changes, or schema changes. Proposals land as a Markdown file in .openexpertise/evolution/.

Apply selectively: review each proposal's rationale and confidence, extract and git apply only the diff blocks you agree with, and ignore or delete the rest. The runtime never auto-applies anything.
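If you would rather apply proposals one at a time than pipe every diff block into git apply at once, you can extract a single block by position. This is a minimal sketch under two assumptions: each proposal's patch sits in its own fenced diff block in the proposal file (the same assumption the awk one-liner in the minimal example makes), and blocks are numbered from 1 in file order. extract_diff_block and count_diff_blocks are hypothetical helpers, not part of oe.

```shell
#!/bin/sh
# Hypothetical helper: print only the Nth fenced diff block (1-based) from a
# proposal Markdown file. Blocks are counted in the order they appear.
extract_diff_block() {
  local file="$1"
  local want="$2"
  awk -v want="$want" '
    /^```diff/     { n++; keep = (n == want); next }  # opening fence: is this the block we want?
    keep && /^```/ { keep = 0; next }                 # closing fence of the wanted block
    keep           { print }                          # patch body lines
  ' "$file"
}

# Hypothetical helper: count the diff blocks so you know which indices exist.
# grep -c prints 0 (and exits 1) when the file has no diff blocks.
count_diff_blocks() {
  grep -c '^```diff' "$1"
}
```

For example, extract block 2 from the abc123.md proposal file into a patch file, run git apply --check on it, then git apply it and re-run oe validate. Selecting by position keeps the helper independent of how the advisor titles each proposal, which this page does not specify.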

When to evolve vs hand-edit:

| Situation | Recommended action |
| --- | --- |
| Missing coverage area noticed in a real run | oe evolve (the advisor will see it in the state diff) |
| You know exactly what to change | Hand-edit experience.yaml directly |
| The prompt needs a complete rewrite | Hand-edit the .md file; oe evolve proposes parameter tweaks, not prompt rewrites |
| First draft is structurally wrong | oe ultra again with a better task description |
| Retry policy too aggressive or too lenient | oe evolve (tune-param proposals handle this well) |

When not to evolve: if a run produced no useful state (all nodes skipped, fixture data only, or the state diff is empty), the advisor has nothing to work from. Fix the structural issue first, get a real run, then evolve.

Variations

Run → evolve → re-run in a tight cycle:

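```bash
for i in 1 2 3; do
  RUN=$(oe run examples/my-experience --json | jq -r .run_id)
  # Stop if the run did not report a run_id (jq -r prints "null" for a missing key)
  if [ -z "$RUN" ] || [ "$RUN" = "null" ]; then echo "no run_id returned" >&2; break; fi
  oe evolve examples/my-experience --run-id "$RUN"
  oe diff examples/my-experience
  # Pause so you can review the proposals and apply the ones that make sense
  read -r -p "Iteration $i: apply the proposals you accept, then press Enter... "
done
```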

Use the MCP server to run the full loop from inside Claude Code:

Run examples/review-branch, then evolve it and show me the proposals.

Claude Code calls oe_run, then oe_evolve, then reads the proposal file.

Accumulate state across runs: fields with merge: array_append collect findings from every run into the same SQLite table, so the evolution advisor sees the growing history in the state_diff.

Promote a stable experience to the examples/ directory:

```bash
# Once the experience is stable enough to be a reference
cp -r .openexpertise/drafts/my-experience examples/my-experience
oe validate examples/my-experience
git add examples/my-experience
git commit -m "feat(examples): add my-experience"
```

Gotchas

  • oe evolve is not a replacement for oe ultra. Evolve proposes additive edits to an existing experience. If the experience needs structural redesign (different phases, different node kinds), re-author with oe ultra or hand-edit.
  • Empty runs produce low-quality proposals. If your experience ran with fixture data and all nodes returned stubbed outputs, the advisor has nothing real to learn from. Run with real inputs first.
  • Applied proposals must pass oe validate. Always run oe validate examples/my-experience after applying a patch. A malformed patch can produce invalid YAML.
  • The loop has no automatic stopping condition. There is no "done" signal: keep iterating run → evolve until the experience behaves reliably on real inputs and new proposals are consistently low-confidence.
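The second half of that stopping criterion can be scripted. A minimal sketch, assuming oe diff prints one line per proposal ending in a (confidence: high|medium|low) suffix, as in the sample output of the minimal example; should_stop_iterating is a hypothetical helper. It judges a single evolve pass, so "consistently" is still your call, and it says nothing about whether the experience behaves reliably on real inputs.

```shell
#!/bin/sh
# Hypothetical helper: read oe diff output on stdin and succeed (exit 0)
# when no proposal is tagged medium or high confidence, i.e. every proposal
# is low-confidence or there are none at all.
# Assumes the "(confidence: <level>)" suffix shown in the sample output.
should_stop_iterating() {
  if grep -Eq '\(confidence: (high|medium)\)'; then
    return 1  # at least one proposal is still worth reviewing
  fi
  return 0
}
```

For example, in the tight-cycle loop under Variations you could pipe oe diff examples/my-experience into it and break once it has succeeded on two iterations in a row.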


Released under the MIT License.