Ensure high-quality LLM outputs with automatic evals. Use a representative sample of user inputs to reduce subjectivity when tuning prompts. Use built-in metrics, LLM-graded evals, or define your own custom metrics. Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow. Use OpenAI, Anthropic, and open-source models like Llama and Vicuna, or add a custom provider for any other LLM API.
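As a quick illustration, the library can be driven directly from Node. The sketch below follows promptfoo's documented `evaluate()` entry point and `openai:<model>` provider id format; the prompt wording and test inputs are made up for the example, so verify the exact schema against the promptfoo docs.

```typescript
// Minimal sketch: evaluate two candidate prompts against a small,
// representative sample of user inputs via promptfoo's Node API.
import promptfoo from 'promptfoo';

async function main() {
  const results = await promptfoo.evaluate({
    // {{body}} is substituted from each test case's vars.
    prompts: [
      'Rephrase this in French: {{body}}',
      "Rephrase this like I'm five: {{body}}",
    ],
    // Provider id format is "openai:<model>"; Anthropic and
    // open-source models are addressed the same way.
    providers: ['openai:gpt-3.5-turbo'],
    // Representative user inputs reduce subjectivity when tuning prompts.
    tests: [
      { vars: { body: 'Hello world' } },
      { vars: { body: "I'm hungry" } },
    ],
  });
  console.log(JSON.stringify(results, null, 2));
}

main();
```

The same suite can also be written declaratively in a `promptfooconfig.yaml` file and run from the CLI with `promptfoo eval`, which is how it typically slots into a test/CI pipeline.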
Features
- Create a list of test cases
- Set up evaluation metrics
- Select the best prompt & model (see the sketch below)
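To make these steps concrete, here is a hedged sketch of a test suite with evaluation metrics attached. The `contains` (built-in) and `llm-rubric` (LLM-graded) assertion types come from promptfoo's documentation; the article text, rubric wording, and the abbreviated result handling are illustrative.

```typescript
// Sketch: test cases combining a built-in metric and an LLM-graded metric.
import promptfoo from 'promptfoo';

async function main() {
  const results = await promptfoo.evaluate({
    prompts: ['Write a one-sentence summary of: {{article}}'],
    providers: ['openai:gpt-3.5-turbo'],
    tests: [
      {
        vars: { article: 'promptfoo is a library for testing LLM prompts...' },
        assert: [
          // Built-in deterministic check: the output must mention the product.
          { type: 'contains', value: 'promptfoo' },
          // LLM-graded check: a grader model scores the output against a rubric.
          {
            type: 'llm-rubric',
            value: 'Is a single sentence and is consistent with the article',
          },
        ],
      },
    ],
  });

  // Each result row records per-assertion pass/fail, which feeds the
  // side-by-side comparison view and can gate a CI job.
  for (const row of results.results) {
    console.log(row.success);
  }
}

main();
```

Custom metrics plug in the same way, for example via a `javascript` assertion that points at your own scoring function.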
Categories
Large Language Models (LLM)

License
MIT License