Ensure high-quality LLM outputs with automatic evals. Use a representative sample of user inputs to reduce subjectivity when tuning prompts. Use built-in metrics, LLM-graded evals, or define your own custom metrics. Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow. Use OpenAI, Anthropic, and open-source models like Llama and Vicuna, or integrate custom API providers for any LLM API.

Features

  • Create a list of test cases
  • Set up evaluation metrics
  • Select the best prompt & model
  • Use a representative sample of user inputs to reduce subjectivity when tuning prompts
  • Use built-in metrics, LLM-graded evals, or define your own custom metrics
  • Compare prompts and model outputs side-by-side, or integrate the library into your existing test/CI workflow
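
The workflow in the list above (test cases, a metric, then picking the best prompt) can be sketched in a few lines. This is an illustrative sketch only and does not use promptfoo's actual API: the names `TestCase`, `Provider`, `render`, `containsMetric`, `evaluate`, and `pickBest` are hypothetical, and `echoProvider` is a stub that stands in for a real LLM call.

```typescript
// Hypothetical types for illustration; these are NOT promptfoo's API.
type TestCase = { vars: Record<string, string>; expectContains: string };
type Provider = (prompt: string) => string;
type Result = { prompt: string; score: number };

// Fill {{name}} placeholders in a prompt template with test-case variables.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => vars[key] ?? "");
}

// Custom metric: 1 if the output contains the expected text (case-insensitive), else 0.
function containsMetric(output: string, expected: string): number {
  return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

// Score each prompt as the mean metric value over all test cases.
function evaluate(prompts: string[], tests: TestCase[], provider: Provider): Result[] {
  return prompts.map((prompt) => {
    let total = 0;
    for (const t of tests) {
      total += containsMetric(provider(render(prompt, t.vars)), t.expectContains);
    }
    return { prompt, score: tests.length > 0 ? total / tests.length : 0 };
  });
}

// Return the highest-scoring result (the first one wins on ties).
function pickBest(results: Result[]): Result | undefined {
  let best: Result | undefined;
  for (const r of results) {
    if (!best || r.score > best.score) best = r;
  }
  return best;
}

// Stub provider (assumption): echoes the prompt instead of calling a model.
const echoProvider: Provider = (prompt) => prompt;

const prompts = ["Translate to French: {{text}}", "Say something."];
const tests: TestCase[] = [
  { vars: { text: "hello" }, expectContains: "hello" },
  { vars: { text: "goodbye" }, expectContains: "goodbye" },
];

const results = evaluate(prompts, tests, echoProvider);
const best = pickBest(results);
console.log(results, best?.prompt);
```

Because the metric is a plain function that returns a number, the same loop can run inside an existing test or CI job and fail the build when the best score drops below a chosen threshold.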

License

MIT License

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Large Language Models (LLM)

Registered

2023-08-25