evalflow is a CLI quality gate for LLM prompts. When you change a prompt, evalflow runs your test suite and blocks CI if quality regresses — the same way pytest blocks CI when tests fail.

Quick Start

Install evalflow and run your first quality gate in 10 minutes.

Core Concepts

Understand quality gates, baselines, and eval methods.

CLI Reference

Complete reference for every evalflow command and flag.

CI/CD Integration

Add evalflow to GitHub Actions, GitLab CI, or CircleCI.

How it works

1. Install: pip install evalflow
2. Initialize: run evalflow init to create your config, prompt files, and dataset.
3. Write test cases: add test cases to evals/dataset.json with expected outputs and eval methods.
4. Run the gate: run evalflow eval. Exit code 0 means pass, 1 means regression, 2 means error, so it slots directly into CI.
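The dataset from step 3 might look like the sketch below. The field names (cases, input, expected, eval) and the eval method names are assumptions for illustration; check the file that evalflow init generates for the actual schema.

```json
{
  "cases": [
    {
      "input": "Summarize: The meeting covered Q3 budget cuts and hiring plans.",
      "expected": "A summary that mentions Q3 budget cuts",
      "eval": "llm_judge"
    },
    {
      "input": "Classify sentiment: 'The product arrived broken.'",
      "expected": "negative",
      "eval": "exact_match"
    }
  ]
}
```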
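The exit codes from step 4 can be pictured with a toy sketch of the gate logic: compare per-task scores against a stored baseline and map the outcome to an exit code. This is illustrative only, not evalflow's implementation.

```python
# Toy sketch of the gate semantics -- illustrative only, not
# evalflow's implementation. Scores are assumed to be 0-1 floats.
PASS, REGRESSION, ERROR = 0, 1, 2

def gate(baseline: dict, current: dict, tolerance: float = 0.0) -> int:
    """Map an eval run to evalflow-style exit codes."""
    if baseline.keys() != current.keys():
        return ERROR  # task sets don't line up: treat as a config error
    regressed = [t for t in baseline if current[t] < baseline[t] - tolerance]
    return REGRESSION if regressed else PASS

baseline = {"summarization": 0.82, "classification": 0.91}
current = {"summarization": 0.85, "classification": 0.74}
code = gate(baseline, current)  # classification dropped, so code == 1
```

Because the result is an exit code rather than a report you have to read, any CI system that fails a job on non-zero exit can enforce the gate with no extra glue.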

Why evalflow

Traditional unit tests can't tell you when a prompt tweak quietly degrades output quality on another task. evalflow gives you a local quality gate that behaves the same on your laptop and in CI.
You changed one prompt.
Summarization improved.
Classification silently broke.
Nobody noticed for 4 days.

evalflow catches this in CI before it ships.
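A minimal GitHub Actions job might look like the sketch below. The step details are assumptions (your provider, secrets, and Python version will differ); see the CI/CD Integration guide for supported setups.

```yaml
name: prompt-quality
on: [pull_request]

jobs:
  evalflow:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install evalflow
      # evalflow eval exits non-zero on regression (1) or error (2),
      # which fails the job and blocks the merge.
      - run: evalflow eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```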

Supported providers

evalflow works with OpenAI, Anthropic, Groq, Gemini, and Ollama. Switch providers with a single flag — no code changes required.
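Switching providers might look like the following; the flag name --provider is a hypothetical stand-in, so check the CLI Reference for the actual spelling.

```shell
# Hypothetical flag name -- see the CLI Reference for the real one.
evalflow eval --provider openai
evalflow eval --provider ollama   # same suite against a local model
```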