Use Anthropic when your production stack already depends on Claude models. Running evals against the same model family your app ships with keeps eval results closer to real-world behavior.

Configure evalflow.yaml

Add an anthropic block under providers and set default_provider to anthropic:
providers:
  anthropic:
    api_key_env: "ANTHROPIC_API_KEY"
    default_model: "claude-3-5-haiku-20241022"

eval:
  default_provider: "anthropic"
api_key_env is the name of the environment variable that holds your key — evalflow reads the variable at runtime and never stores the key itself.
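The indirection is worth spelling out: the config file stores only the *name* of the variable, and the value is resolved from the process environment when evalflow runs. A minimal shell sketch of that lookup (the placeholder key below is illustrative, not a real key):

```shell
# Simulate the runtime lookup: the config stores only the variable NAME;
# the actual secret lives in the process environment.
export ANTHROPIC_API_KEY="sk-placeholder-not-a-real-key"

key_env="ANTHROPIC_API_KEY"          # the value of api_key_env in evalflow.yaml
key_value="$(printenv "$key_env")"   # what evalflow would read at runtime

if [ -n "$key_value" ]; then
  echo "key resolved via \$$key_env"
fi
```

Because the config only ever names the variable, evalflow.yaml stays safe to commit.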

Set your API key

export ANTHROPIC_API_KEY="your-key-here"
Add this line to your shell profile (~/.bashrc, ~/.zshrc, etc.) or a .env file so you do not have to re-export it each session. Never commit your .env file to version control.
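One way to wire up the .env approach while keeping the key out of git (a sketch; adjust paths to your project, and note that "your-key-here" is a placeholder):

```shell
# Write the key to .env ("your-key-here" is a placeholder, not a real key)
# and make sure git ignores the file.
printf 'ANTHROPIC_API_KEY=your-key-here\n' > .env
grep -qx '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```

Running the second line is idempotent: it appends `.env` to .gitignore only if the entry is not already present.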

Verify the connection

Run evalflow doctor to confirm evalflow can see the key before running any evals:
evalflow doctor
✓ ANTHROPIC_API_KEY set

Run evals

evalflow eval --provider anthropic
Running test cases against claude-3-5-haiku-20241022...
Quality Gate: PASS
If eval.default_provider is already set to anthropic in your evalflow.yaml, you can omit the --provider flag:
evalflow eval

Provider notes

  • Default model: claude-3-5-haiku-20241022. You can override it by setting default_model to any Claude model your account can access.
  • API key required: An Anthropic account and API key are required. Requests without a valid key will fail immediately.
  • Judge model: By default, evalflow uses Groq as the LLM judge. If you want Anthropic to serve as both the model under test and the judge, update the judge block in evalflow.yaml:
judge:
  provider: "anthropic"
  model: "claude-3-5-haiku-20241022"
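Putting the pieces together, a full evalflow.yaml with Anthropic as both the model under test and the judge would look like this. This is a sketch assembled from the fragments shown above; it introduces no keys beyond those already covered in this guide:

```yaml
providers:
  anthropic:
    api_key_env: "ANTHROPIC_API_KEY"
    default_model: "claude-3-5-haiku-20241022"

eval:
  default_provider: "anthropic"

judge:
  provider: "anthropic"
  model: "claude-3-5-haiku-20241022"
```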