
OpenExpertise

AI-era Makefile

Codify expert workflows as version-controlled YAML graphs. Run them with deterministic control flow and LLM-powered nodes. Let the LLM evolve the graph after each run.

292 tests · 15 packages · 12 examples · MIT licensed · Node 20 / 22 / 24 · live-API verified

What is it, really?

OpenExpertise is the orchestration layer above Claude Code, Codex, and Gemini.

```
┌─────────────────────────────────────┐    ┌───────────────────────────────────────┐
│   Claude Code / Codex / Gemini      │    │   OpenExpertise                       │
│   "AI bash"                         │    │   "AI Makefile"                       │
│                                     │ vs │                                       │
│   - improvised each run             │    │   - same DAG every run                │
│   - opaque trajectory               │    │   - JSONL event log + SQLite state    │
│   - one-shot, no memory             │    │   - evolves itself across runs        │
│   - general-purpose                 │    │   - codifies a specific SOP           │
└─────────────────────────────────────┘    └───────────────────────────────────────┘

         autonomous worker                            workflow conductor
                                                     (can call the workers)
```

It is not an autonomous agent. It is the orchestration layer that lets you wire deterministic code, LLM agents, and CLI agents into reproducible, persistent, self-improving pipelines.
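As a rough illustration of why an append-only event log makes a run replayable, the sketch below folds a JSONL log back into per-run state. The event shape (`type`, `node`, `key`, `value`) and the `parseJsonl`/`replay` helpers are hypothetical, not OpenExpertise's actual log format, and an in-memory `Map` stands in for the SQLite state store.

```typescript
// Hypothetical event shape for illustration: NOT OpenExpertise's log schema.
type RunEvent =
  | { type: "node_started"; node: string }
  | { type: "state_written"; node: string; key: string; value: unknown }
  | { type: "node_finished"; node: string };

// JSONL: one JSON object per line; blank lines are skipped.
function parseJsonl(text: string): RunEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunEvent);
}

// Fold the events in log order. The fold is deterministic, so replaying
// the same log always rebuilds the same state and completion order.
function replay(events: RunEvent[]): { state: Map<string, unknown>; finished: string[] } {
  const state = new Map<string, unknown>(); // stand-in for the SQLite state store
  const finished: string[] = [];
  for (const ev of events) {
    if (ev.type === "state_written") {
      state.set(ev.key, ev.value); // later writes to a key overwrite earlier ones
    } else if (ev.type === "node_finished") {
      finished.push(ev.node);
    }
    // node_started carries no state; it stays in the log for the audit trail.
  }
  return { state, finished };
}

const eventLog = [
  '{"type":"node_started","node":"summarize"}',
  '{"type":"state_written","node":"summarize","key":"summary","value":"draft"}',
  '{"type":"state_written","node":"summarize","key":"summary","value":"final"}',
  '{"type":"node_finished","node":"summarize"}',
].join("\n");

const replayed = replay(parseJsonl(eventLog));
console.log(replayed.state.get("summary"), replayed.finished);
```

In this sketch the state is derived entirely from the log, so it can always be rebuilt from the file rather than trusted as a separate source of truth.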

If your team has an SOP that someone has to follow every Monday morning (code review, incident triage, release gates, compliance checks, customer onboarding), and you want it to run the same way every time, leave a trail, and get better at it, this is for you.


The 60-second story

```bash
export ANTHROPIC_API_KEY=sk-...
oe run examples/review-branch --tui
```

Three reviewers (bugs / perf / tests) fan out over a Python diff. They find a missing null-check, a missing test, and an unclosed cursor, but they miss the SQL injection.

```
ⓘ run-2026-05-26-a1b2c3 finished
  findings: 3 issues
  risk_score: 0.30
```
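The fan-out step described above can be pictured as one reviewer per dimension running concurrently, with their findings merged afterwards. A minimal sketch, assuming hypothetical `Finding` and `Reviewer` types and a stub reviewer in place of the LLM agent nodes; it does not model the verifier or how `risk_score` is computed.

```typescript
// Hypothetical types for illustration: NOT OpenExpertise's real node API.
type Finding = { dimension: string; message: string };
type Reviewer = (dimension: string, diff: string) => Promise<Finding[]>;

// Fan out: start one review per dimension concurrently.
// Fan in: Promise.all preserves input order, so flattening keeps findings
// grouped by dimension in the order the dimensions were listed.
async function fanOutReview(
  dimensions: readonly string[],
  diff: string,
  review: Reviewer,
): Promise<Finding[]> {
  const perDimension = await Promise.all(dimensions.map((d) => review(d, diff)));
  return perDimension.flat();
}

// Stub reviewer standing in for an LLM agent node: it reports one
// placeholder finding per dimension so the wiring can be checked offline.
const stubReviewer: Reviewer = async (dimension, diff) => [
  { dimension, message: `${dimension}: reviewed ${diff.split("\n").length} line(s)` },
];

const reviewDiff = "def get_user(id):\n    cur = db.cursor()\n    return cur.fetchone()";
const pending = fanOutReview(["bugs", "perf", "tests"], reviewDiff, stubReviewer);
pending.then((findings) => console.log(findings.map((f) => f.message)));
```

Under this picture, adding a dimension to the list adds one more concurrent reviewer without touching the merge logic.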

Now ask the evolution advisor what's missing:

```bash
oe evolve run-2026-05-26-a1b2c3
# → wrote .openexpertise/evolution/run-2026-05-26-a1b2c3.md
#   proposal: "Add `security` dimension: default reviewers focus on
#              logic/tests; injection bugs need a dedicated reviewer."
```

Apply the one-line YAML patch from the proposal and re-run:

```
ⓘ run-2026-05-26-d4e5f6 finished
  findings: 4 issues (+ SQL injection in /users/<id>)
  risk_score: 0.85
```

The experience improved itself. Author → run → evolve, all driven by the same LLM provider.

→ Full walkthrough: examples/review-branch.


Three rival AI coding CLIs talking to each other, in one graph

No other workflow framework does this today.

```yaml
graph:
  nodes:
    - id: summarize
      kind: cli-agent
      provider: claude-code
      prompt: 'Summarize this topic in one sentence: {{topic}}'
      writes: [summary]
    - id: critique
      kind: cli-agent
      provider: codex
      prompt: 'What does this summary miss? {{summary}}'
      reads: [summary]
      writes: [critique]
    - id: verdict
      kind: cli-agent
      provider: gemini
      prompt: 'Verdict on production-readiness given: {{summary}} + {{critique}}'
      reads: [summary, critique]
      writes: [verdict]
  edges:
    - { from: summarize, to: critique }
    - { from: critique, to: verdict }
```

One DAG, three vendors' CLIs, shared SQLite state, and a replayable event log: the run took 37 s of real wall-clock time and produced a single trace.
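The execution order of the graph above comes from its `edges`, and each `{{name}}` placeholder is filled from state that an earlier node `writes`. As a rough sketch of that idea (not OpenExpertise's engine), the TypeScript below orders nodes with Kahn's topological sort and renders prompts from an in-memory map. `topoOrder`, `render`, `runGraph`, and the `callAgent` stub are hypothetical names; `reads` is left out because the sketch takes dependencies from `edges` alone, and SQLite and the real CLI adapters are not modeled.

```typescript
// Minimal mirror of the YAML above (field names follow it; `reads` omitted).
type GraphNode = { id: string; provider: string; prompt: string; writes: string[] };
type Edge = { from: string; to: string };

// Kahn's algorithm: repeatedly emit nodes whose incoming edges are all done.
// Ties follow declaration and edge-list order, so a given file always
// yields the same order.
function topoOrder(nodes: GraphNode[], edges: Edge[]): string[] {
  const indegree = new Map<string, number>(nodes.map((n): [string, number] => [n.id, 0]));
  for (const e of edges) indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  const ready = nodes.filter((n) => indegree.get(n.id) === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const e of edges.filter((x) => x.from === id)) {
      const left = indegree.get(e.to)! - 1;
      indegree.set(e.to, left);
      if (left === 0) ready.push(e.to);
    }
  }
  if (order.length !== nodes.length) throw new Error("graph has a cycle");
  return order;
}

// Replace each {{key}} with its state value; a missing key throws instead of
// silently sending a half-filled prompt.
function render(template: string, state: Map<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = state.get(key);
    if (value === undefined) throw new Error(`missing state key: ${key}`);
    return value;
  });
}

// Stub standing in for a real CLI agent call (claude-code, codex, gemini).
function callAgent(provider: string, prompt: string): string {
  return `[${provider}] ${prompt}`;
}

function runGraph(nodes: GraphNode[], edges: Edge[], inputs: Record<string, string>): Map<string, string> {
  const state = new Map<string, string>(Object.entries(inputs));
  const byId = new Map(nodes.map((n): [string, GraphNode] => [n.id, n]));
  for (const id of topoOrder(nodes, edges)) {
    const node = byId.get(id)!;
    const output = callAgent(node.provider, render(node.prompt, state));
    for (const key of node.writes) state.set(key, output);
  }
  return state;
}

const graphNodes: GraphNode[] = [
  { id: "summarize", provider: "claude-code", prompt: "Summarize this topic in one sentence: {{topic}}", writes: ["summary"] },
  { id: "critique", provider: "codex", prompt: "What does this summary miss? {{summary}}", writes: ["critique"] },
  { id: "verdict", provider: "gemini", prompt: "Verdict on production-readiness given: {{summary}} + {{critique}}", writes: ["verdict"] },
];
const graphEdges: Edge[] = [
  { from: "summarize", to: "critique" },
  { from: "critique", to: "verdict" },
];

const finalState = runGraph(graphNodes, graphEdges, { topic: "WebAssembly" });
console.log(topoOrder(graphNodes, graphEdges));
console.log(finalState.get("verdict"));
```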

→ See it in action: examples/tri-cli-orchestration.


Pick the example closest to your use case

| Example | What it shows | Featuring |
| --- | --- | --- |
| `hello-tool` | Smallest possible flow | tool |
| `agent-echo` | Single LLM agent with structured output | agent |
| `dataset-aggregate` | Load CSV → aggregate | dataset + tool |
| `review-branch` ★ | The hero demo: multi-dimension review + verifier + score + evolution | tool + agent ×3 |
| `oncall-runbook` | Fan out an investigation across 3 dimensions | `for_each` |
| `issue-triage` | Classify → search dupes → conditional dedup → route | `when:` edges |
| `release-gates` | License + changelog + coverage + Claude Code security scan → gate | tool + cli-agent + agent |
| `cli-orchestration` | Claude Code summarizes; Codex critiques | cli-agent ×2 |
| `tri-cli-orchestration` ★ | Claude → Codex → Gemini in one DAG | cli-agent ×3 |
| `deep-research` | Multi-source research with cross-referencing | agent fan-in |
| `systematic-debugging` | Hypothesize → localize → fix → verify loop | tool + agent |
| `brainstorming` | Diverge → cluster → critique → synthesize top 3 | cli-agent fan-out + agent |
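The `issue-triage` example relies on `when:` edges. The general idea of a conditional edge can be sketched as an edge that is followed only when a predicate over the current state holds. How OpenExpertise expresses and evaluates `when:` is not described on this page, so the `when` field below is a hypothetical TypeScript predicate rather than the YAML syntax, and the node and key names are made up.

```typescript
// Hypothetical conditional edge: `when` is a plain predicate over state in
// this sketch; the real `when:` syntax in OpenExpertise YAML may differ.
type RunState = Record<string, unknown>;
type ConditionalEdge = { from: string; to: string; when?: (s: RunState) => boolean };

// Successors of `node` whose condition holds. Edges without `when` are
// unconditional. Results keep the order in which edges were declared.
function nextNodes(node: string, edges: ConditionalEdge[], state: RunState): string[] {
  return edges
    .filter((e) => e.from === node && (e.when === undefined || e.when(state)))
    .map((e) => e.to);
}

// Illustrative triage routing: dedup only when a duplicate was found;
// otherwise go straight to routing. The two conditions are complementary.
const hasDuplicate = (s: RunState): boolean => typeof s.duplicate_of === "string";

const triageEdges: ConditionalEdge[] = [
  { from: "search_dupes", to: "dedup", when: hasDuplicate },
  { from: "search_dupes", to: "route", when: (s) => !hasDuplicate(s) },
  { from: "dedup", to: "route" },
];

console.log(nextNodes("search_dupes", triageEdges, { duplicate_of: "ISSUE-1" }));
console.log(nextNodes("search_dupes", triageEdges, {}));
```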

All 12 examples ship with mocked-LLM e2e tests so the structure is verifiable without API keys.
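A mocked-LLM test generally swaps each model call for a deterministic fake. A minimal sketch of that pattern, assuming a hypothetical `LlmProvider` interface and `runPipeline` helper (the project's actual test harness is not described on this page): the fake returns canned replies keyed by node id and records every call, so the test needs no network access or API keys.

```typescript
// Hypothetical provider interface: NOT OpenExpertise's real adapter API.
interface LlmProvider {
  complete(nodeId: string, prompt: string): Promise<string>;
}

// Deterministic fake: canned replies keyed by node id, plus a log of every
// call so a test can assert on structure. Unknown ids fail loudly.
class MockProvider implements LlmProvider {
  readonly calls: { nodeId: string; prompt: string }[] = [];
  private readonly replies: Map<string, string>;

  constructor(replies: Record<string, string>) {
    this.replies = new Map(Object.entries(replies));
  }

  async complete(nodeId: string, prompt: string): Promise<string> {
    this.calls.push({ nodeId, prompt });
    const reply = this.replies.get(nodeId);
    if (reply === undefined) throw new Error(`no canned reply for node "${nodeId}"`);
    return reply;
  }
}

// Tiny linear pipeline under test: each node's output becomes the next prompt.
async function runPipeline(provider: LlmProvider, nodeIds: string[], input: string): Promise<string> {
  let current = input;
  for (const id of nodeIds) current = await provider.complete(id, current);
  return current;
}

const mock = new MockProvider({ classify: "bug", route: "team-backend" });
const outcome = runPipeline(mock, ["classify", "route"], "App crashes on login");
outcome.then((result) => console.log(result, mock.calls.map((c) => c.nodeId)));
```

Failing on an unknown node id, rather than returning an empty string, makes a test break as soon as the graph gains a node the fixture does not cover.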


When should I reach for OpenExpertise?

Are you trying to ...

```
  automate a recurring, multi-step process that
  mixes deterministic logic + LLM judgment?
                      │
              ┌───────┴────────┐
             YES               NO
              │                │
       ┌──────┴──────┐    Use Claude Code
       │             │    or Codex directly.
 Need it durable,   One-shot
 reproducible,      exploration?
 evolvable?             │
       │                └─ Use Claude Code.
       └─ ▶ Use OpenExpertise.
```

If you want a chat-based assistant or one-off task automation, use the underlying CLI directly (Claude Code, Codex, Gemini). OpenExpertise sits above those tools, not beside them.

→ Compare in detail: vs the alternatives.


Build expert workflows once. Run them forever. Watch them get better at it.

Start building → · Copy a recipe → · Got a question?

Released under the MIT License.