Vellum
Infrastructure for shipping AI products.
Versioned prompts, evals on every change, and a deployment pipeline that doesn't get in your way. Built for teams that ship daily.
TRUSTED BY ENGINEERING TEAMS AT
LINEAR · NOTION · REPLIT · CURSOR · PERPLEX.ARC
EVALS
See regressions before they reach production.
Run thousands of test cases on every prompt change. Diffs are visual. Pass/fail is one number.
$ vellum eval run --suite prod-v3
✓ 847 / 850 passing · 99.6%
Δ vs v2.9: +3 cases · -1.2s latency
PROMPTS
Version control for what your model says.
Git-style branching for prompts. Rollback in one click.
DEPLOY
Canary, A/B, instant rollback.
Traffic-shape new prompt versions. Compare metrics live.
OBSERVABILITY
Trace every prompt, every tool call, every retry.
Structured logs, cost per request, and p99 latency, broken down by model, by deployment, and by customer.
1.2M requests/day
38 ms p99 overhead
$0.003 cost/request
"Vellum cut our prompt iteration time from days to hours. Evals on every commit is non-negotiable now."
— Sarah Chen, Head of AI at Cursor