Hybrid LLM routing
Problem
Different tasks have different cost/quality trade-offs. You want frontier-model quality for the final synthesis but a cheaper or faster model for bulk classification passes. Or you want to compare providers on the same task. Or your org already runs a self-hosted vLLM for non-sensitive tasks and only sends sensitive data to Anthropic.
Solution
# experience.yaml
meta:
  runtime:
    providers:
      # Default: Anthropic for untagged agent nodes
      default: anthropic
      # Named provider alias for OpenAI-protocol models
      fast:
        kind: openai
        model: gpt-4o-mini
        base_url: '{{$env.OPENAI_BASE_URL}}' # override for vLLM / Together AI / Groq
        api_key: '{{$env.OPENAI_API_KEY}}'
      # Another alias for a self-hosted model
      local:
        kind: openai
        model: meta-llama/Meta-Llama-3.1-8B-Instruct
        base_url: 'http://localhost:8000/v1'
        api_key: 'not-used'
state:
  schema:
    items: { type: array, items: { type: object }, merge: set_once }
    classified: { type: array, items: { type: object }, merge: array_append }
    synthesis: { type: string }
graph:
  nodes:
    - id: load_items
      kind: tool
      impl: ./tools/load.mjs
      writes: [items]
    # Bulk classification: cheap model, runs in parallel
    - id: classify
      kind: agent
      provider: fast # ← uses the 'fast' alias above
      prompt: ./prompts/classify.md
      reads: [items]
      for_each:
        source: $.items
        concurrency: 8
      schema:
        type: object
        properties:
          classified:
            type: array
            items: { type: object }
        required: [classified]
      writes: [classified]
    # Final synthesis: frontier model, runs once
    - id: synthesize
      kind: agent
      # provider: anthropic is implicit (default)
      prompt: ./prompts/synthesize.md
      reads: [classified]
      schema:
        type: object
        properties:
          synthesis: { type: string }
        required: [synthesis]
        additionalProperties: false
      writes: [synthesis]
  edges:
    - { from: load_items, to: classify }
    - { from: classify, to: synthesize }

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# For local vLLM: export OPENAI_BASE_URL=http://localhost:8000/v1
oe run experience.yaml

Walkthrough
meta.runtime.providers is a map of provider aliases. The special key default sets which provider is used for agent nodes that don't specify a provider: key. Any other key is an alias you choose; reference it with provider: <alias> on any agent node.
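The lookup rule described above can be sketched in a few lines. This is an illustration rather than the runtime's actual code: the `ProviderDef` and `ProvidersMap` types, the `resolveProvider` function, and the treatment of `anthropic` and `openai` as built-in names are all assumptions made for the example.

```typescript
// Hypothetical model of meta.runtime.providers; not the runtime's real types.
type ProviderKind = "anthropic" | "openai";

interface ProviderDef {
  kind: ProviderKind;
  model?: string;
  base_url?: string;
  api_key?: string;
}

// `default` holds a provider name; every other key holds a definition.
type ProvidersMap = Record<string, ProviderDef | string>;

function resolveProvider(providers: ProvidersMap, node: { provider?: string }): ProviderDef {
  const fallback = providers["default"];
  // Assumption: with no `default` key at all, fall back to "anthropic".
  const name = node.provider ?? (typeof fallback === "string" ? fallback : "anthropic");
  const entry = providers[name];
  if (typeof entry === "object") return entry; // user-defined alias
  // Assumption: bare kind names resolve to a definition with no overrides.
  if (name === "anthropic" || name === "openai") return { kind: name };
  throw new Error(`Unknown provider alias: ${name}`);
}

const providers: ProvidersMap = {
  default: "anthropic",
  fast: { kind: "openai", model: "gpt-4o-mini" },
};

console.log(resolveProvider(providers, { provider: "fast" })); // the 'fast' definition
console.log(resolveProvider(providers, {})); // falls back to the default provider
```

Failing loudly on an unknown alias is a design choice in this sketch: a typo in `provider:` surfaces at load time instead of silently routing to the default.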
The kind inside a provider definition is either anthropic (the default SDK) or openai (the @openexpertise/llm-openai adapter). Any endpoint that speaks the OpenAI chat-completions API works with kind: openai, for example Azure OpenAI, Together AI, Groq, Fireworks, self-hosted vLLM, Ollama, or LM Studio.
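To make "speaking the OpenAI chat-completions API" concrete, here is a rough sketch of the HTTP request such an endpoint expects: a POST to `<base_url>/chat/completions` with a bearer token and a JSON body carrying `model` and `messages`. The `buildChatRequest` helper is hypothetical and says nothing about how the adapter is implemented internally; it only shows the wire-level shape.

```typescript
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) a chat-completions request.
function buildChatRequest(baseUrl: string, apiKey: string, model: string, prompt: string): ChatRequest {
  return {
    // Strip trailing slashes so "…/v1" and "…/v1/" produce the same URL.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Values taken from the 'local' alias in the recipe above.
const req = buildChatRequest(
  "http://localhost:8000/v1",
  "not-used",
  "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "Classify this item.",
);
console.log(req.url); // http://localhost:8000/v1/chat/completions
```

Because only the base URL, key, and model name differ between providers, swapping a hosted model for a self-hosted one is a configuration change rather than a code change.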
base_url and api_key support {{$env.VAR}} interpolation so secrets never live in YAML. If base_url is omitted, the OpenAI adapter uses https://api.openai.com/v1.
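As an illustration of the interpolation and the base URL fallback, the sketch below expands `{{$env.VAR}}` placeholders from an environment map. The function names are hypothetical, and throwing on a missing variable is an assumption of this example; the runtime's actual behavior for unset variables is not documented here.

```typescript
// Matches {{$env.NAME}} with optional whitespace inside the braces.
const ENV_REF = /\{\{\s*\$env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

type Env = Record<string, string | undefined>;

// Hypothetical helper: replaces every {{$env.NAME}} with env[NAME].
function interpolateEnv(template: string, env: Env): string {
  return template.replace(ENV_REF, (_match: string, name: string) => {
    const value = env[name];
    // Assumption: a missing variable is an error rather than an empty string.
    if (value === undefined) throw new Error(`Missing environment variable: ${name}`);
    return value;
  });
}

const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

// An omitted base_url falls back to the public OpenAI endpoint.
function resolveBaseUrl(raw: string | undefined, env: Env): string {
  return raw === undefined ? DEFAULT_OPENAI_BASE_URL : interpolateEnv(raw, env);
}

const env: Env = { OPENAI_BASE_URL: "http://localhost:8000/v1" };
console.log(resolveBaseUrl("{{$env.OPENAI_BASE_URL}}", env)); // http://localhost:8000/v1
console.log(resolveBaseUrl(undefined, env)); // https://api.openai.com/v1
```

The environment is passed in as a plain object so the sketch stays independent of any particular runtime; in Node you would pass `process.env`.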
The classify node with provider: fast fans out over items and processes up to 8 of them at a time on the cheap model. The synthesize node has no explicit provider, so it falls back to default: anthropic and gets frontier-model quality for the one call that matters most.
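The fan-out that for_each with concurrency: 8 describes can be approximated with a small worker pool. This is a minimal sketch of bounded concurrency in general, not the runtime's scheduler; `mapWithConcurrency` is a hypothetical name, and the worker below stands in for a model call.

```typescript
// Runs `worker` over all items with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next index until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Usage: a stand-in worker instead of a real classification call.
mapWithConcurrency(["a", "b", "c"], 8, async (s) => s.toUpperCase())
  .then((out) => console.log(out)); // resolves to ["A", "B", "C"]
```

A fixed number of lanes keeps the provider's rate limits in view: raising the limit speeds up the bulk pass only until the endpoint starts throttling.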
Variations
- A/B test providers: Run the same experience twice with different default values by using --set meta.runtime.providers.default=fast on the CLI. Compare outputs with oe diff <run-id-a> <run-id-b>.
- Per-environment routing: Keep a providers-prod.yaml and providers-dev.yaml and compose them with oe run --config providers-dev.yaml.
- Cost logging: Every node.tokens event in the event log includes provider, model, input_tokens, and output_tokens. Aggregate with oe inspect <run-id> or the programmatic EventBus.
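For the cost-logging variation, a per-provider rollup could look something like the sketch below. It assumes you have already collected the node.tokens events into an array; the `TokenEvent` shape uses the four fields named above plus a `type` discriminator that is an assumption, and the sample numbers and the `example-model` name are made up for illustration. The EventBus API itself is not shown.

```typescript
// Assumed event shape: only provider, model, input_tokens, and output_tokens
// come from the description above; `type` is a guess at the discriminator.
interface TokenEvent {
  type: "node.tokens";
  provider: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
  calls: number;
}

// Sums token counts per "provider/model" pair.
function aggregateUsage(events: readonly TokenEvent[]): Map<string, Usage> {
  const totals = new Map<string, Usage>();
  for (const e of events) {
    const key = `${e.provider}/${e.model}`;
    const usage = totals.get(key) ?? { input_tokens: 0, output_tokens: 0, calls: 0 };
    usage.input_tokens += e.input_tokens;
    usage.output_tokens += e.output_tokens;
    usage.calls += 1;
    totals.set(key, usage);
  }
  return totals;
}

// Made-up sample data: two cheap classification calls and one synthesis call.
const sample: TokenEvent[] = [
  { type: "node.tokens", provider: "fast", model: "gpt-4o-mini", input_tokens: 120, output_tokens: 8 },
  { type: "node.tokens", provider: "fast", model: "gpt-4o-mini", input_tokens: 95, output_tokens: 6 },
  { type: "node.tokens", provider: "anthropic", model: "example-model", input_tokens: 900, output_tokens: 300 },
];

for (const [key, usage] of aggregateUsage(sample)) {
  console.log(key, usage);
}
```

Multiplying each total by the provider's per-token price then gives a cost estimate per route, which is the number you need when deciding whether the cheap pass is worth keeping.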