FAQ + Troubleshooting

The questions and errors that come up first. If you don't find it here, search the site (/) — every page is indexed.

Concept questions

Is this an agent framework?

No. OpenExpertise is the orchestration layer above agent frameworks. The graph is deterministic YAML; the LLM only fills in the gaps inside nodes (a single agent's structured response, or a single skill's reply). The graph never changes at runtime.

If you want an autonomous agent that explores freely, use Claude Code, Codex, or Gemini directly. If you want to wrap one of those agents in a reproducible workflow, use OE. See "vs Claude Code directly" for the side-by-side comparison.

How is this different from LangGraph / CrewAI / Mastra?

The full matrix is at Compare. Short version:

vs LangGraph — they're Python and code-as-graph; we're YAML-first and dispatcher-modular. You define the same DAG, but ours lives in source-control as a single artifact that humans can review.
vs CrewAI — they're agent-conversation-centric; we're graph-execution-centric. Less "agents talking", more "DAG executes with LLM inside specific nodes".
vs Mastra — closest spiritual neighbor. Different design: we treat the graph + state + events as the durable substrate; agent-internals are pluggable dispatchers.
vs Inngest / Temporal — they're durable execution backends without LLM batteries-included; we ship LLM dispatchers + 429 retry + structured output + evolution loop on top of a similar primitive.

Why YAML and not Python / TypeScript / JSX?

Three reasons:

Git-reviewable. A YAML graph is a flat artifact. Five reviewers can look at the same experience.yaml and see the whole shape.
Tool-friendly. Schema-validated, LLM-authorable (oe ultra writes them), diffable, type-narrowed in editors with the published JSON schema.
Single source of truth. The same file feeds the validator, the runtime, the TUI, and the evolution advisor. Nothing is hidden behind a Python decorator.

If you genuinely want a code-first DAG, see the programmatic runExperience API — but most production users prefer YAML once they've tried both.

What's the difference between a `tool` and a `cli-agent`?

| `tool` | `cli-agent` |
| --- | --- |
| Pure JS function. Deterministic. You write the .mjs. | Subprocess that spawns `claude` / `codex` / `gemini`. Non-deterministic. |
| Synchronous-ish (returns when the function returns). | Async; takes seconds to minutes per invocation. |
| No tokens, no metrics, no LLM dependency. | Tokens spent inside the CLI's session. Not always exposed. |
| Use for: HTTP fetch, file IO, parsing, deterministic logic, classification. | Use for: anything where you want a different vendor's CLI to do agentic work. |

What's the difference between an `agent` and a `skill`?

Both are LLM calls. Difference is where the prompt lives:

agent — inline prompt path + inline JSON schema in experience.yaml. Optimized for one specific structured output you control fully.
skill — points at a SKILL.md package (frontmatter + body, à la Anthropic skills). Optimized for reusable, redistributable capabilities. The SKILL.md travels independent of the experience.

If you'd reuse the same prompt + schema across multiple experiences, package it as a skill. Otherwise inline as an agent.

Does it work without an LLM?

Yes for tool and dataset nodes. You can build a deterministic-only workflow (the hello-tool and dataset-aggregate examples don't touch an LLM).

For agent / skill / cli-agent nodes you need a configured provider (ANTHROPIC_API_KEY, or OPENAI_API_KEY + OPENAI_BASE_URL, or the CLI installed locally). The 12 examples ship with mocked-LLM e2e tests that exercise the structure without burning real tokens.

Can I run experiences offline?

Yes — point OPENAI_BASE_URL at a local vLLM / Ollama / LM Studio server. See Self-hosted LLMs. tool / dataset nodes don't need any model.

How good is `oe ultra`'s output — does it just one-shot the YAML?

No — oe ultra runs an internal critique→revise quality loop (default 1 round). After the first draft, a critic scores it on decomposition + prompt quality, deterministic validation/preflight errors feed an incremental reviser, and it keeps the best-scoring round. There's a monotonicity gate, so the result is never worse than the one-shot. Tune it with --max-rounds <n> (0 disables the loop), OE_ULTRA_SCORE_BAR (default 80), and OE_ULTRA_CRITIC_MODEL. Already have a draft? oe ultra-revise <draftPath> "<feedback>" applies natural-language feedback using the same critique→revise roles. See oe ultra.
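
The "keep the best-scoring round" behaviour can be pictured with a small sketch. It is an illustration only, not oe ultra's code: the function name `pickBestRound`, the plain array of scores, and the early stop once the score bar is met are all assumptions made for the example.

```javascript
// Illustrative only: models "keep the best-scoring round" with made-up scores.
// scores[0] is the one-shot draft; scores[1..] are critique/revise rounds.
function pickBestRound(scores, { maxRounds = 1, scoreBar = 80 } = {}) {
  let best = { round: 0, score: scores[0] };
  for (let round = 1; round <= maxRounds && round < scores.length; round++) {
    if (best.score >= scoreBar) break; // assumed: stop once the bar is met
    if (scores[round] > best.score) {
      best = { round, score: scores[round] };
    }
  }
  return best;
}

pickBestRound([70, 65]); // { round: 0, score: 70 }, the revision scored lower so the draft wins
pickBestRound([70, 85]); // { round: 1, score: 85 }
```

Because a later round only replaces the current best when it scores strictly higher, the returned score can never fall below the first draft's score.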

How do I see the graph or share a run with my team?

oe graph [path] renders the experience's DAG as a Mermaid flowchart (phase subgraphs, per-kind shapes/colors, for_each + when edge labels) straight to stdout; paste it into a GitHub README and it renders natively. Add --html for a self-contained page. For a finished run, oe inspect <run-id> --html writes a self-contained HTML report: the DAG colored by each node's status plus an events timeline and per-node tokens & duration. Both take -o/--out <file> and --lr. See Visualize & report.

Do I get autocomplete for `experience.yaml`?

Yes. oe init scaffolds an experience.schema.json and a # yaml-language-server: $schema=./experience.schema.json header into new experiences, so VS Code (or any yaml-language-server editor) gives autocomplete, hover docs, and inline validation. For existing projects: oe schema --write plus the same header. See Editor support.

Will my workflow change between runs?

No. The graph is fixed by your YAML. The LLM only fills in nodes' content (findings, summary, etc.). The shape is stable, the path is stable, the state schema is stable.

If you want to change the graph, that's oe evolve — but it writes a markdown file with a unified diff, and you git apply it consciously. Nothing auto-applies.

Setup questions

What Node version?

20.x, 22.x, or 24.x. We CI on all three. Node 18 is expected to work but isn't tested.

Why `pnpm` specifically?

@openexpertise/* ships as a pnpm-workspace monorepo with internal workspace:* links. pnpm install resolves them transparently. Downstream consumers installing from npm can use npm / yarn, but for developing inside the repo, pnpm is required.

If pnpm install fails with ERR_PNPM_UNSUPPORTED_ENGINE, you need pnpm@9+. Install it with npm i -g pnpm@9.

Why does `better-sqlite3` install say it's compiling native code?

better-sqlite3 is a native module — first install on a new machine rebuilds it for your platform. If you see a build error here, you're missing build tools:

macOS: xcode-select --install
Ubuntu: sudo apt install build-essential python3
Windows: install Visual Studio Build Tools (the official node-gyp README has the recipe)

Subsequent installs are cached.

Where does state live?

<cwd>/.openexpertise/state.sqlite (one DB per workspace). Events live at <cwd>/.openexpertise/runs/<run-id>.jsonl. Per-run cache at <cwd>/.openexpertise/cache/<key>.json.

All three are git-ignorable by default. If you want a different location, see the OE_DATA_DIR env var documented in Architecture.

How do I run only one node from a graph?

You don't directly — but oe resume <run-id> will skip every cached step and only re-execute the ones that are dirty or downstream of dirty. Combine with oe reset-state <field> to mark a field stale and force re-execution.
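
The "dirty or downstream of dirty" rule is a plain graph traversal. The sketch below uses a hypothetical four-node graph and a hypothetical helper, `nodesToRerun`; it shows the idea and is not taken from OE's resume logic.

```javascript
// Hypothetical graph: each key lists the nodes that consume its output.
const downstream = {
  fetch: ["parse"],
  parse: ["review", "score"],
  review: ["score"],
  score: [],
};

// Returns every node that is dirty or reachable from a dirty node.
function nodesToRerun(edges, dirty) {
  const rerun = new Set(dirty);
  const queue = [...dirty];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const next of edges[node] ?? []) {
      if (!rerun.has(next)) {
        rerun.add(next);
        queue.push(next);
      }
    }
  }
  return [...rerun].sort();
}

nodesToRerun(downstream, ["review"]); // ["review", "score"]
```

In this toy graph, marking `review` stale leaves `fetch` and `parse` untouched, which is the effect described above.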

Common errors

`Error: Node "X" has unknown kind "xyz"`

Your YAML has kind: xyz, which isn't one of tool / agent / skill / dataset / experience / cli-agent. Check spelling. The 6 valid kinds are documented at Node kinds.

`Error: cycle detected in graph: A → B → A`

oe validate will tell you which edges form the cycle. Common cause: copy-paste mistake creating a back-edge. Use oe inspect <run-id> after a clean run to see the actual ordering.
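
For intuition, a back-edge can be found with a depth-first search that remembers which nodes are on the current path. This is a generic sketch, not oe validate's implementation; `findCycle` and the adjacency-map input are assumptions for the example.

```javascript
// Depth-first search; returns one cycle as a path of node IDs, or null.
function findCycle(edges) {
  const state = new Map(); // absent = unvisited, "active" = on current path, "done" = finished
  const path = [];

  function visit(node) {
    state.set(node, "active");
    path.push(node);
    for (const next of edges[node] ?? []) {
      if (state.get(next) === "active") {
        // Found an edge back into the current path: that closes a cycle.
        return [...path.slice(path.indexOf(next)), next];
      }
      if (!state.has(next)) {
        const found = visit(next);
        if (found) return found;
      }
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(edges)) {
    if (!state.has(node)) {
      const found = visit(node);
      if (found) return found;
    }
  }
  return null;
}

findCycle({ A: ["B"], B: ["A"] }).join(" → "); // "A → B → A"
```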

`Error: schema validation failed for node "X"`

The agent node's structured output didn't match its declared schema. Look at the error path — AJV will say exactly which property failed. Most common:

Missing required property → either tighten the prompt or make the field optional in schema.
additionalProperties: false rejected an unexpected key → the model added a field; either allow it or refine the prompt.
Type mismatch → e.g., model returned "5" (string) where you wanted 5 (number).

The agent dispatcher retries once if the output is shape-invalid. If it fails twice, the node errors and the on_error policy kicks in.
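
The three failure modes are easy to reproduce in isolation. The sketch below uses a tiny hand-rolled checker for flat objects as a stand-in; it is not AJV and not OE's validator, and `checkShape` and `findingSchema` are hypothetical names.

```javascript
// Tiny hand-rolled checker for flat object schemas; a stand-in for illustration, not AJV.
function checkShape(schema, value) {
  const errors = [];
  for (const key of schema.required ?? []) {
    if (!(key in value)) errors.push(`missing required property "${key}"`);
  }
  for (const [key, actual] of Object.entries(value)) {
    const rule = schema.properties[key];
    if (!rule) {
      if (schema.additionalProperties === false) {
        errors.push(`unexpected property "${key}"`);
      }
    } else if (typeof actual !== rule.type) {
      errors.push(`"${key}" should be ${rule.type}, got ${typeof actual}`);
    }
  }
  return errors;
}

const findingSchema = {
  type: "object",
  required: ["summary", "score"],
  additionalProperties: false,
  properties: { summary: { type: "string" }, score: { type: "number" } },
};

// One output that trips all three checks: no summary, a string score, an extra key.
checkShape(findingSchema, { score: "5", notes: "extra" });
```

A real validator reports the same kinds of problems with a path to the offending property, which is the part of the error worth reading first.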

`Error: provider "X" not configured`

Your YAML references a provider (anthropic, openai, claude-code, codex, gemini) that isn't set up in environment or in runtime.providers. For LLM SDKs:

bash

export ANTHROPIC_API_KEY=sk-ant-...
# or
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1  # default OK

For cli-agent providers, the CLI must be installed: see cli-agent usage.

`Error: Tool impl ./tools/X.mjs not found`

Path is relative to the experience directory (the dir containing experience.yaml), not the cwd. If you ran oe run examples/review-branch from the repo root, paths like ./tools/foo.mjs resolve to examples/review-branch/tools/foo.mjs.
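
The resolution rule can be mimicked with the standard `URL` class. This is a sketch for POSIX-style paths only; `resolveToolPath` is a hypothetical helper and the `/repo/...` paths are made up.

```javascript
// Resolve a tool path against the experience directory, not the process cwd.
function resolveToolPath(experienceDir, impl) {
  const dir = experienceDir.endsWith("/") ? experienceDir : experienceDir + "/";
  return new URL(impl, `file://${dir}`).pathname;
}

resolveToolPath("/repo/examples/review-branch", "./tools/foo.mjs");
// "/repo/examples/review-branch/tools/foo.mjs"
```

If a tool is reported as not found, comparing the path you expected against this kind of resolution usually shows whether the experience directory or the cwd was assumed.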

`Error: Tool returned non-object: undefined`

Your tool's default export didn't return a { state_delta } object. Even if you do nothing, return { state_delta: {} }. See Tool stubs.
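
A minimal sketch of a tool body that always returns the expected shape. The `input` parameter and its `text` field are assumptions about what the runtime passes, and `countLines` is a hypothetical name; in a real tools/*.mjs file the function would be the default export.

```javascript
// Hypothetical tool body. In a real .mjs tool file this would be exported with
// `export default countLines;` (left out here so the sketch runs as a plain script).
function countLines(input) {
  const text = typeof input?.text === "string" ? input.text : "";
  if (text === "") {
    return { state_delta: {} }; // nothing to report, but still an object
  }
  return { state_delta: { line_count: text.split("\n").length } };
}

countLines(undefined);            // { state_delta: {} }
countLines({ text: "a\nb\nc" });  // { state_delta: { line_count: 3 } }
```

Every code path ends in a `return` with a `state_delta` key, so an early exit never yields `undefined`.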

`429 Too Many Requests`

The LLM provider is rate-limiting you. The Anthropic and OpenAI clients retry with exponential backoff (250ms, 500ms, 1s, 2s, ...) up to 5 attempts. If you keep hitting 429:

Lower --concurrency (or the YAML's runtime.concurrency)
Lower for_each.concurrency on the busy node
Get a higher-tier API key

See Concurrency + 429 retry for the details.
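
To see how long the retries can hold a node, the schedule can be written out. The sketch assumes the first attempt is immediate, that waits happen only between attempts, and that there is no jitter; none of that is confirmed here, and `backoffDelays` is a hypothetical helper.

```javascript
// Delays before each retry: baseMs doubling, for at most maxAttempts tries in total.
function backoffDelays(maxAttempts = 5, baseMs = 250) {
  const delays = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(baseMs * 2 ** retry);
  }
  return delays;
}

const delays = backoffDelays();                   // [250, 500, 1000, 2000]
const totalMs = delays.reduce((a, b) => a + b, 0); // 3750
```

Under these assumptions a node gives up after roughly 3.75 seconds of waiting, so sustained rate limiting is better addressed by lowering concurrency than by waiting it out.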

TUI shows `?` for a node that did run

The TUI tracks nodes by ID. If you've edited the YAML mid-run with oe resume, old run files may have IDs that don't exist anymore. Easiest fix: start a fresh run with oe run.

`oe evolve` produced a diff that won't `git apply`

The evolution advisor's diff isn't pre-validated. If the diff line numbers drifted between when the run happened and when you applied (because you edited the source in between), git apply will reject it. Options:

Edit by hand using the proposal as guidance (it tells you the operation + rationale)
Re-run + re-evolve from a clean state to get a fresh diff
Use git apply --reject to apply hunks that still match and merge the rejected .rej manually

This is documented as a known V1 limitation at Evolution loop.

Performance & cost

How much does a typical run cost?

Depends entirely on the graph. As reference points from our test suite:

| Example | Tokens (in/out) | API cost (USD, Claude 3.5 Sonnet) |
| --- | --- | --- |
| `hello-tool` | 0 / 0 | $0.00 |
| `agent-echo` | ~600 / ~80 | $0.003 |
| `review-branch` (3 dimensions + verifier + score) | ~12K / ~2K | $0.07 |
| `tri-cli-orchestration` (Claude + Codex + Gemini chain) | Varies by tier | $0.05 – $0.15 |
| `deep-research` (multi-source synthesis) | ~40K / ~6K | $0.20 |

Use oe inspect <run-id> to see exact tokens per node. Use node.tokens events to integrate with your own cost tracker. See Observability.
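
Token counts convert to dollars with simple arithmetic. The sketch assumes Claude 3.5 Sonnet list prices of $3 per million input tokens and $15 per million output tokens; those prices and the name `estimateCostUsd` are assumptions for the example, so check your provider's current pricing.

```javascript
// Assumed list prices in USD per million tokens.
const PRICE_PER_MTOK = { input: 3, output: 15 };

function estimateCostUsd(inputTokens, outputTokens, price = PRICE_PER_MTOK) {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

estimateCostUsd(600, 80);       // agent-echo: about 0.003
estimateCostUsd(12_000, 2_000); // review-branch: about 0.066
```

With those prices the results land close to the rounded figures in the table. Output tokens cost five times as much as input tokens here, so long responses dominate the bill.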

Why is my run slow?

Three common causes:

No concurrency — set runtime.concurrency: 4 (or pass --concurrency 4) so independent nodes can run in parallel.
Sequential for_each — default concurrency in for_each is 1. Set concurrency: N to fan out properly.
Long LLM responses — if you're asking for a 5-page-long summary, that's the bottleneck. Tighten the schema's max sizes or split into multiple agents.

oe inspect <run-id> shows per-node duration. Look for the long tail.
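
A back-of-the-envelope estimate shows why for_each concurrency matters. The sketch assumes every item takes about the same time and ignores rate limits; `estimateFanOutSeconds` is a hypothetical helper.

```javascript
// Rough wall-clock estimate for a fan-out where items run in batches of `concurrency`.
function estimateFanOutSeconds(items, secondsPerItem, concurrency = 1) {
  return Math.ceil(items / concurrency) * secondsPerItem;
}

estimateFanOutSeconds(8, 30);    // 240 (sequential, the default)
estimateFanOutSeconds(8, 30, 4); // 60
```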

Production questions

Can I run this as a service?

Yes. Use the programmatic API (runExperience) inside your service, or oe run as a subprocess. See Deployment for cron / queue / containerized patterns.

How do I monitor production runs?

Subscribe to the EventBus and pipe events to Prometheus / Datadog / your APM of choice. The shape is stable — node.tokens for spend, node.failed for alerts, node.activity for live tracing. See Observability for the recipe.
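
A sketch of the handler side of such a subscription. Only the event type names come from this page; the payload fields (`node`, `input`, `output`) and the `createMetrics` helper are assumptions, and how the handler is attached to the EventBus is left out because it depends on the actual API.

```javascript
// Hypothetical event shapes: the type names are from the docs,
// the `node`, `input` and `output` fields are assumed for illustration.
function createMetrics() {
  const totals = { inputTokens: 0, outputTokens: 0, failures: [] };
  function onEvent(event) {
    switch (event.type) {
      case "node.tokens":
        totals.inputTokens += event.input ?? 0;
        totals.outputTokens += event.output ?? 0;
        break;
      case "node.failed":
        totals.failures.push(event.node);
        break;
      default:
        break; // node.activity and other events are ignored in this sketch
    }
  }
  return { totals, onEvent };
}

const metrics = createMetrics();
metrics.onEvent({ type: "node.tokens", node: "review", input: 1200, output: 300 });
metrics.onEvent({ type: "node.failed", node: "score" });
```

In a service, the `totals` object would be replaced by counters from whichever metrics client you already use.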

Is there a hosted version?

No. OpenExpertise is open-source (MIT) and runs entirely on your infrastructure. Your state, your events, and your tokens never touch our servers.

What about secrets management?

OE has no secrets store of its own; it only reads env vars (ANTHROPIC_API_KEY etc.) at runtime. Manage them however you'd manage any env var: 1Password CLI, AWS Secrets Manager, Doppler, etc.

tool nodes that need their own credentials (a tool calling GitHub API, for example) read those from env too. See Deployment for patterns.
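
Inside a tool, a small guard makes a missing credential fail with a clear message instead of an obscure HTTP error later. This is a generic sketch; `requireEnv` is a hypothetical helper and `GITHUB_TOKEN` is only an example variable name, not something OE defines.

```javascript
// Read a credential from the environment and fail loudly if it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example call with an explicit env object so the sketch does not depend on your shell:
requireEnv("GITHUB_TOKEN", { GITHUB_TOKEN: "example-value" }); // "example-value"
```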

→ Still stuck? Open an issue at github.com/xingchengxu/OpenExpertise/issues — please include oe doctor output and the relevant oe inspect excerpt.
