Define AI data transformations in YAML. Classify, extract, and generate at scale with type-safe outputs, automatic retries, and rate-limit handling.
Define your data source, transformations, and output schema in a single file. AffineBox handles batching, retries, rate limits, and output validation.
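As a sketch of what that single file might look like — key names here are illustrative guesses, except `transform.output_schema`, which the changelog below confirms:

```yaml
# Hypothetical pipeline file. Key names are illustrative, not official;
# only transform.output_schema is confirmed by the changelog.
source:
  type: csv
  path: tickets.csv

transform:
  model: gpt-4o-mini
  prompt: |
    Classify this support ticket: {{ row.body }}
  output_schema:
    category:
      type: enum
      values: [billing, bug, feature_request, other]
    urgency:
      type: enum
      values: [low, medium, high]

output:
  type: csv
  path: classified.csv
```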
Real pipelines our beta users run in production.
A SaaS team routes their Zendesk export through AffineBox to auto-classify tickets by category and urgency. Results feed back into their routing rules.
An ops team pulls key fields (vendor, amount, date, line items) from scanned invoices. Schema validation ensures every row has the right types before hitting the DB.
An e-commerce company generates SEO-friendly descriptions from a CSV of product attributes. Outputs are validated against a max-length schema and tone guide.
Not a prompt playground. AffineBox is infrastructure for running LLM transformations on real data at scale.
Define output schemas with types, enums, min/max, regex. Every LLM response is validated before writing. Failed validations trigger retries automatically.
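A hedged sketch of what such a schema might look like; the constraint vocabulary (types, enums, min/max, regex) comes from the text above, but the exact key names are assumptions:

```yaml
# Illustrative output schema -- constraint key names are assumed.
output_schema:
  vendor:
    type: string
    regex: "^[A-Za-z0-9 .,&'-]+$"   # reject garbage extractions
  amount:
    type: number
    min: 0
    max: 1000000
  currency:
    type: enum
    values: [USD, EUR, GBP]
  line_items:
    type: array   # nested objects and arrays, per the changelog
```

A response that fails any constraint is not written; it triggers an automatic retry instead.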
Read from CSV, Postgres, S3, BigQuery, or HTTP APIs. Write to any database, file, or webhook. Connection strings from env vars.
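For example, a Postgres source paired with a webhook sink might look like this (key names are assumptions; the env-var pattern for connection strings is from the text above):

```yaml
# Illustrative connector config -- key names are assumed.
source:
  type: postgres
  connection: ${DATABASE_URL}   # connection strings come from env vars
  query: "SELECT id, body FROM tickets WHERE classified IS NULL"

output:
  type: webhook
  url: ${RESULTS_WEBHOOK_URL}
```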
Automatic rate-limit detection and backoff. Configurable concurrency and batch sizes. Resume from where you left off on failures.
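Rate-limit backoff generally follows the standard exponential-backoff-with-jitter pattern. A minimal, generic Python sketch of that pattern (not AffineBox's actual implementation):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for a provider's 429 / rate-limit error."""


def with_backoff(fn, max_retries=5, base=1.0, cap=60.0):
    """Call fn(), retrying on RateLimitError with exponential backoff.

    Delay doubles each attempt (capped at `cap`), and "full jitter"
    picks a random sleep in [0, delay] to avoid thundering herds.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            delay = min(cap, base * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

The cap keeps worst-case waits bounded, and jitter matters whenever many concurrent workers hit the same provider limit.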
OpenAI, Anthropic, Google, or any OpenAI-compatible endpoint. Switch models by changing one line. Cost tracking per run.
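The one-line switch might look like this (the `model` and `base_url` keys are assumptions):

```yaml
transform:
  model: gpt-4o-mini          # swap providers by changing this line
  # model: claude-3-5-haiku-latest
  # base_url: https://gateway.example.com/v1   # any OpenAI-compatible endpoint
```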
For complex logic, use the Python SDK instead of YAML. Same engine under the hood. Compose pipelines programmatically.
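The SDK's actual API isn't shown here, so the following is only a generic illustration of programmatic composition — a toy stand-in `Pipeline` class, not AffineBox's real interface:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pipeline:
    """Toy stand-in showing how transform steps compose programmatically."""
    steps: list = field(default_factory=list)

    def then(self, fn: Callable[[dict], dict]) -> "Pipeline":
        self.steps.append(fn)
        return self  # chaining lets complex logic read top-to-bottom

    def run(self, rows: list[dict]) -> list[dict]:
        for fn in self.steps:
            rows = [fn(row) for row in rows]
        return rows


pipe = (
    Pipeline()
    .then(lambda r: {**r, "body": r["body"].strip()})
    .then(lambda r: {**r, "urgent": "refund" in r["body"].lower()})
)
```

Branching, conditional steps, and loops are exactly the kind of logic that is awkward in declarative YAML and natural in code.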
Every run is logged with inputs, outputs, latency, cost, and error rates. Export to your own observability stack or use the local dashboard.
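The shape of an exported per-record log entry might look something like this (field names are assumptions; the logged dimensions — inputs, outputs, latency, cost, errors — are from the text above):

```json
{
  "run_id": "run_2026_001",
  "record": {"input": "Where is my refund?", "output": {"category": "billing"}},
  "latency_ms": 412,
  "cost_usd": 0.00031,
  "error": null
}
```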
Free during beta. Paid tiers launching Q3 2026. Pricing is per record processed through a pipeline (you bring your own LLM API keys).
We ship weekly. Here is what landed recently.
Added resume-from-checkpoint for interrupted runs. Fixed a bug where Postgres output would silently drop rows with NULL primary keys. Improved rate-limit backoff for Anthropic API.
Python SDK now in public beta. New connector: BigQuery (read & write). Output schema now supports nested objects and arrays. Breaking: renamed transform.schema to transform.output_schema.
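For the breaking rename above, migration is a one-key change:

```yaml
# Before
transform:
  schema: { ... }

# After
transform:
  output_schema: { ... }
```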
Added cost tracking per run (prints total $ spent at end). New flag: --dry-run to validate pipeline config without processing. S3 connector now supports IAM role auth.
Initial support for Anthropic Claude models. Added --concurrency flag override. Fixed template rendering for nested row fields.
Install the CLI, write a pipeline YAML, and run it. No sign-up required for the free tier.
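A hypothetical quickstart, assuming the CLI binary is named `affinebox` and installs via pip — the install command and `run` subcommand are assumptions; only the `--dry-run` and `--concurrency` flags appear in the changelog above:

```shell
# Install command is an assumption -- check the project's docs.
pip install affinebox

# Validate the pipeline config without processing any records
affinebox run pipeline.yaml --dry-run

# Full run with a concurrency override
affinebox run pipeline.yaml --concurrency 8
```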