Use OpenAI when you want the simplest setup and broad model coverage. It is a good default if your application already calls OpenAI in production, since your evals will run against the same models your users interact with.
Add an `openai` block under `providers` and set `eval.default_provider` so evalflow knows which provider to use when you run `evalflow eval` without a `--provider` flag:
```yaml
providers:
  openai:
    api_key_env: "OPENAI_API_KEY"
    default_model: "gpt-4o-mini"

eval:
  default_provider: "openai"
```
`api_key_env` is the name of the environment variable that holds your key. evalflow reads the variable at runtime and never stores the key itself.
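That resolution step can be sketched in Python. This is an illustration of the described behavior, not evalflow's actual code, and the function name `resolve_api_key` is hypothetical:

```python
import os

def resolve_api_key(api_key_env: str) -> str:
    """Look up the variable named in config at call time; never persist the value."""
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"environment variable {api_key_env} is not set")
    return key
```

Only the variable *name* ever appears in `evalflow.yaml`; the secret itself lives solely in your shell environment.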
### Set your API key
```shell
export OPENAI_API_KEY="your-key-here"
```
Add this line to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.) or a `.env` file so you do not have to re-export it each session. Never commit your `.env` file to version control.
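For reference, `.env` loading can be sketched minimally in Python. Real projects typically use a library such as python-dotenv; this illustrative parser handles only simple `NAME=value` lines:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: skips blanks and comments, keeps existing exports."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            # setdefault: values already exported in the shell take precedence
            os.environ.setdefault(name.strip(), value.strip().strip('"'))
```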
### Verify the connection
Run `evalflow doctor` to confirm evalflow can see the key before running any evals.
### Run evals
```shell
$ evalflow eval --provider openai
Running test cases against gpt-4o-mini...
Quality Gate: PASS
```
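The `Quality Gate: PASS` verdict is an aggregate over individual test-case results. As a hypothetical sketch, a gate might pass when the pass rate meets a threshold; the rule and the 0.9 default below are assumptions, not evalflow's documented behavior:

```python
def quality_gate(results: list[bool], threshold: float = 0.9) -> str:
    # Assumed rule: PASS when the fraction of passing cases meets the threshold.
    pass_rate = sum(results) / len(results)
    return "PASS" if pass_rate >= threshold else "FAIL"
```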
If `eval.default_provider` is already set to `openai` in your `evalflow.yaml`, you can omit the `--provider` flag and run `evalflow eval` on its own.
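The precedence described above (flag wins, config is the fallback) can be sketched as follows; `pick_provider` is an illustrative helper, not part of evalflow:

```python
from typing import Optional

def pick_provider(config: dict, cli_provider: Optional[str] = None) -> str:
    # An explicit --provider flag wins; otherwise use eval.default_provider.
    if cli_provider:
        return cli_provider
    return config["eval"]["default_provider"]
```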
### Provider notes
- Default model: `gpt-4o-mini`. You can override it by setting `default_model` to any model name your account has access to (for example, `gpt-4o` or `o1-mini`).
- API key required: OpenAI requires a paid account or active free-tier credits. Requests without a valid key will fail immediately.
- Judge model: By default, evalflow uses Groq as the LLM judge. If you want OpenAI to serve as both the model under test and the judge, update the `judge` block in `evalflow.yaml`:

  ```yaml
  judge:
    provider: "openai"
    model: "gpt-4o-mini"
  ```
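Conceptually, an LLM judge receives the test case and the model's answer and returns a verdict. The sketch below shows prompt construction only; the actual prompt evalflow sends is internal and will differ, and `build_judge_prompt` is a hypothetical name:

```python
def build_judge_prompt(question: str, answer: str, criteria: str) -> str:
    # Illustrative only: evalflow's real judge prompt is not documented here.
    return (
        "You are an impartial judge. Evaluate the answer below.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Criteria: {criteria}\n"
        "Reply with PASS or FAIL and a one-sentence reason."
    )
```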