Arthur Bench

Bench is a tool for evaluating LLMs for production use cases. Whether you are comparing different LLMs, considering different prompts, or testing generation hyperparameters such as temperature and the number of tokens, Bench provides a single touch point for all your LLM performance evaluation.

Features

Standardizes the workflow of LLM evaluation with a common interface across tasks and use cases
Lets you test whether open-source LLMs can do as well as the top closed-source LLM API providers on your specific data
Translates the rankings on LLM leaderboards and benchmarks into scores that you care about for your actual use case
Install Bench into your Python environment with optional dependencies for serving results locally
Alternatively, install Bench into your Python environment with minimal dependencies
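The common-interface idea can be sketched in plain Python. This is a minimal illustration under stated assumptions and not Bench's actual API: a suite fixes a set of inputs, reference outputs, and one scoring method, and each model, prompt, or temperature setting is scored as a separate run against the same references. The class `EvalSuite`, the `exact_match` scorer, and the run names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature: (candidate, reference) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def exact_match(candidate: str, reference: str) -> float:
    """Return 1.0 if the case- and whitespace-normalized strings match, else 0.0."""
    return float(candidate.strip().lower() == reference.strip().lower())


@dataclass
class EvalSuite:
    """Hypothetical suite: fixed inputs and references, scored with one method."""
    name: str
    scorer: Scorer
    inputs: List[str]       # prompts sent to each model (not used by exact_match)
    references: List[str]   # expected outputs, one per input
    runs: Dict[str, List[float]] = field(default_factory=dict)

    def run(self, run_name: str, candidates: List[str]) -> float:
        """Score one model/prompt/setting's outputs and return the mean score."""
        if len(candidates) != len(self.references):
            raise ValueError("need exactly one candidate output per reference")
        scores = [self.scorer(c, r) for c, r in zip(candidates, self.references)]
        self.runs[run_name] = scores
        return sum(scores) / len(scores)

    def leaderboard(self) -> List[Tuple[str, float]]:
        """Rank all stored runs by mean score, best first."""
        means = [(name, sum(s) / len(s)) for name, s in self.runs.items()]
        return sorted(means, key=lambda item: item[1], reverse=True)


# Usage: compare two hypothetical model/temperature settings on the same data.
suite = EvalSuite(
    name="capitals",
    scorer=exact_match,
    inputs=["Capital of France?", "Capital of Japan?"],
    references=["Paris", "Tokyo"],
)
suite.run("model_a_temp_0.0", ["Paris", "Tokyo"])
suite.run("model_b_temp_0.7", ["paris", "Kyoto"])
print(suite.leaderboard())
```

Because every run is scored against the same references with the same scorer, the mean scores are directly comparable, which is what allows an open-source model to be checked against a closed-source API on your own data.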

License

MIT License

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Artificial Intelligence Software

Registered

2023-08-21
