Use Anthropic when your production stack already depends on Claude models. Running evals against the same model family your app ships with keeps eval results closer to real-world behavior.
Add an `anthropic` block under `providers` and set `eval.default_provider` to `anthropic`:
```yaml
providers:
  anthropic:
    api_key_env: "ANTHROPIC_API_KEY"
    default_model: "claude-3-5-haiku-20241022"

eval:
  default_provider: "anthropic"
```
`api_key_env` names the environment variable that holds your key; evalflow reads the variable at runtime and never stores the key itself.
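evalflow's internals aren't shown in this doc, but the indirection that `api_key_env` describes is a common pattern and is easy to sketch. The function name `resolve_api_key` below is illustrative, not part of evalflow's API:

```python
import os

def resolve_api_key(api_key_env: str) -> str:
    """Look up the key named by api_key_env at runtime.

    Only the variable *name* lives in the config file; the secret
    itself stays in the environment and is never written to disk.
    """
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"Environment variable {api_key_env!r} is unset or empty")
    return key
```

With `ANTHROPIC_API_KEY` exported, `resolve_api_key("ANTHROPIC_API_KEY")` returns the key; an unset variable fails fast instead of sending an empty credential.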
## Set your API key
```shell
export ANTHROPIC_API_KEY="your-key-here"
```
Add this line to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.) or a `.env` file so you do not have to re-export it each session. Never commit your `.env` file to version control.
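This doc doesn't say whether evalflow loads `.env` files itself, so something in your workflow has to export them; in practice most projects use a library like python-dotenv. As a rough sketch of what such a loader does (a deliberately minimal assumption, not evalflow's behavior):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Export simple KEY=VALUE lines from a .env file.

    Skips blank lines and comments, and never overwrites a variable
    that is already set in the real environment.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Because the loader uses `setdefault`, a key exported in your shell always wins over the `.env` file, which keeps local overrides predictable.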
## Verify the connection
Run `evalflow doctor` to confirm evalflow can see the key before running any evals:

```shell
evalflow doctor
```
## Run evals
```shell
evalflow eval --provider anthropic
```

Example output:

```
Running test cases against claude-3-5-haiku-20241022...
Quality Gate: PASS
```
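The doc doesn't define how the quality gate decides PASS or FAIL, but gates of this kind typically compare an aggregate score against a threshold. A sketch under that assumption (the function name and the 0.8 threshold are illustrative, not evalflow's actual values):

```python
def quality_gate(scores: list[float], threshold: float = 0.8) -> str:
    """PASS when the mean eval score meets or beats the threshold."""
    mean = sum(scores) / len(scores)
    return "PASS" if mean >= threshold else "FAIL"

print(f"Quality Gate: {quality_gate([0.9, 0.85, 0.95])}")  # prints: Quality Gate: PASS
```

A mean-based gate is forgiving of a single weak case; a stricter variant would fail if *any* score dips below the threshold.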
If `eval.default_provider` is already set to `anthropic` in your `evalflow.yaml`, you can omit the `--provider` flag and run `evalflow eval` on its own.
## Provider notes
- Default model: `claude-3-5-haiku-20241022`. You can override it by setting `default_model` to any Claude model your account can access.
- API key required: An Anthropic account and API key are required. Requests without a valid key will fail immediately.
- Judge model: By default, evalflow uses Groq as the LLM judge. If you want Anthropic to serve as both the model under test and the judge, update the `judge` block in `evalflow.yaml`:
```yaml
judge:
  provider: "anthropic"
  model: "claude-3-5-haiku-20241022"
```
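To see how an override like this interacts with the Groq default, here is the usual config-merge pattern sketched in Python. The default dict, the placeholder model string, and `effective_judge` are illustrative assumptions, not evalflow source:

```python
# Assumed built-in defaults: Groq judges unless the config says otherwise.
# "<groq-default-model>" is a placeholder, not a real model name.
DEFAULT_JUDGE = {"provider": "groq", "model": "<groq-default-model>"}

def effective_judge(user_config: dict) -> dict:
    """Merge the user's judge block over the defaults, key by key."""
    return {**DEFAULT_JUDGE, **user_config.get("judge", {})}
```

With the YAML above parsed into a dict, `effective_judge` would report `anthropic` as the judge provider; with no `judge` block at all, the Groq defaults apply unchanged.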