dataset-aggregate
A two-node pipeline: a dataset node loads a CSV into state, a tool node aggregates it — no LLM involved.
What it demonstrates
- The dataset node kind with a local CSV source
- Sequential edge: load_rows → aggregate
- A tool reading from state via the reads contract
- Pure-compute pipelines (no LLM, no external services)
The graph
graph:
  nodes:
    - id: load_rows
      kind: dataset
      source:
        type: file
        uri: ./data/sample.csv
        format: csv
      writes: [rows]
    - id: aggregate
      kind: tool
      impl: ./tools/aggregate.mjs
      reads: [rows]
      writes: [total]
  edges:
    - { from: load_rows, to: aggregate }
State schema
| Field | Type | Direction | Description |
|---|---|---|---|
| rows | array<object> | intermediate | CSV rows loaded by load_rows |
| total | number | out | Sum of the amount column |
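
One way to picture how the reads and writes declarations relate to the state fields in the table is a toy runner. This is a minimal sketch, not the OpenExpertise runtime: runPipeline and pick are hypothetical helpers, the dataset node is replaced by hard-coded rows, and the args._state / state_delta shapes are assumed from the aggregate.mjs tool in this example.

```javascript
// Toy model of the two-node graph; NOT the OpenExpertise runtime.
const nodes = [
  {
    id: "load_rows",
    reads: [],
    writes: ["rows"],
    // Stand-in for the dataset node: hard-coded rows instead of a CSV file.
    run: () => ({ state_delta: { rows: [{ amount: "5" }, { amount: "10" }] } }),
  },
  {
    id: "aggregate",
    reads: ["rows"],
    writes: ["total"],
    run: (args) => {
      const rows = args._state.rows ?? [];
      const total = rows.reduce((acc, r) => acc + Number(r.amount ?? 0), 0);
      return { state_delta: { total } };
    },
  },
];

// Copy only the named fields that exist on the source object.
function pick(source, fields) {
  const out = {};
  for (const f of fields) if (f in source) out[f] = source[f];
  return out;
}

// Run nodes in list order, which matches the single edge load_rows -> aggregate.
function runPipeline(nodeList) {
  const state = {};
  for (const node of nodeList) {
    // A node sees only the fields it declared in reads...
    const result = node.run({ _state: pick(state, node.reads) });
    // ...and may update only the fields it declared in writes.
    Object.assign(state, pick(result.state_delta ?? {}, node.writes));
  }
  return state;
}

const finalState = runPipeline(nodes);
console.log(finalState.total); // 15
```

In this model, rows is intermediate because it exists only to feed aggregate, while total is the field a caller would read at the end.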
How it runs
oe run examples/dataset-aggregate
No ANTHROPIC_API_KEY required. No CLIs required.
What happens
1. load_rows (dataset node) reads data/sample.csv, parses it as CSV, and writes each row as an object into rows.
2. aggregate receives the full rows array via its reads: [rows] binding, sums the amount column, and writes total.
$ oe state total
60
data/sample.csv contains five rows with amounts 5, 10, 15, 20, 10, for a total of 60.
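
The two steps can be reproduced by hand to check the arithmetic. The sketch below is an illustration only: parseCsv is a hypothetical, deliberately simplified parser (comma-separated, header row, no quoted fields), not the dataset node's implementation, and the label column in the stand-in CSV is invented; only the five amount values come from this example. It also assumes cells arrive as strings, which would explain why aggregate.mjs wraps each value in Number().

```javascript
// Simplified CSV parsing: first line is the header, fields split on commas,
// no support for quoted fields. Real CSV parsers handle more cases.
function parseCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const header = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    // Build one object per row, keyed by column name; values stay strings.
    return Object.fromEntries(header.map((name, i) => [name, (cells[i] ?? "").trim()]));
  });
}

// Stand-in for data/sample.csv; the label column is hypothetical.
const sampleCsv = "label,amount\na,5\nb,10\nc,15\nd,20\ne,10\n";

const rows = parseCsv(sampleCsv);
const total = rows.reduce((acc, r) => acc + Number(r.amount ?? 0), 0);

console.log(rows[0]); // { label: 'a', amount: '5' }
console.log(total);   // 60
```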
How aggregate.mjs reads state
export default async function aggregate(args) {
  // Fields listed in reads arrive on args._state; default to an empty array.
  const rows = args._state?.rows ?? []
  // Sum the amount column, treating a missing amount as 0.
  const total = rows.reduce((acc, r) => acc + Number(r.amount ?? 0), 0)
  return { state_delta: { total } }
}
State fields declared in reads are available on args._state. This is the canonical pattern for tools that consume upstream state.
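
Because the tool is a plain function of args, it can be exercised without running the graph. The sketch below copies the function body from aggregate.mjs (without the export, so the snippet runs on its own) and calls it with hand-built args objects; the exact shape the runtime passes beyond _state is not shown in this example, so treat the inputs as assumptions.

```javascript
// Copy of the tool body from aggregate.mjs, minus the export.
async function aggregate(args) {
  const rows = args._state?.rows ?? [];
  const total = rows.reduce((acc, r) => acc + Number(r.amount ?? 0), 0);
  return { state_delta: { total } };
}

// Hand-built state using the five sample amounts (as strings, like CSV cells).
const sampleArgs = {
  _state: { rows: ["5", "10", "15", "20", "10"].map((amount) => ({ amount })) },
};

aggregate(sampleArgs).then((result) => console.log(result));
// { state_delta: { total: 60 } }

// A row without an amount counts as 0.
aggregate({ _state: { rows: [{ amount: "7" }, {}] } })
  .then((r) => console.log(r.state_delta.total)); // 7

// Missing state falls back to an empty array, so the total is 0.
aggregate({}).then((r) => console.log(r.state_delta.total)); // 0
```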
Try it: variations
1. Swap the CSV. Replace data/sample.csv with any CSV that has an amount column (or update the aggregation formula in aggregate.mjs).
2. Add a filter node. Insert a filter tool between load_rows and aggregate that writes filtered_rows from rows (e.g., only rows where amount > 10). Update the edge chain and the reads binding on aggregate.
3. Replace CSV with JSON. Change format: csv to format: json and point uri at a .json file containing an array of objects. The dataset node handles both formats.
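
For variation 2, the filter tool could look roughly like the sketch below. The file name ./tools/filter.mjs and the node id filter are hypothetical, and the code assumes the same args._state / state_delta contract that aggregate.mjs uses. In the graph, the new node would declare reads: [rows] and writes: [filtered_rows], the edges would become load_rows → filter → aggregate, and aggregate would switch to reads: [filtered_rows].

```javascript
// Hypothetical body for ./tools/filter.mjs. In a tool file this function
// would be the default export, as in aggregate.mjs.
async function filter(args) {
  const rows = args._state?.rows ?? [];
  // Keep only rows whose amount is strictly greater than 10.
  const filtered_rows = rows.filter((r) => Number(r.amount ?? 0) > 10);
  return { state_delta: { filtered_rows } };
}

// aggregate, adjusted to read filtered_rows instead of rows.
async function aggregate(args) {
  const rows = args._state?.filtered_rows ?? [];
  const total = rows.reduce((acc, r) => acc + Number(r.amount ?? 0), 0);
  return { state_delta: { total } };
}

// Chain the two tools by hand using the sample amounts.
const sample = [5, 10, 15, 20, 10].map((n) => ({ amount: String(n) }));

filter({ _state: { rows: sample } })
  .then((out) => aggregate({ _state: out.state_delta }))
  .then((out) => console.log(out.state_delta.total)); // 35
```

With the sample data, only the rows with amounts 15 and 20 pass the filter, so the total drops from 60 to 35.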
LLM-free pipeline
This example shows that OpenExpertise is not only for AI workflows. Any deterministic data pipeline (ETL, file transforms, CI artifact processing) can be expressed as a graph of dataset and tool nodes.