Announcing LMEval: An Open Source Framework for Cross-Model Evaluation – blog.google

Spread the love

The latest news from Google on open source releases, major projects, events, and outreach programs for early career developers.
At InCyber Forum Europe in April, we open sourced LMEval, a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets. This announcement coincided with a joint talk with Giskard about our collaboration to increase trust in model safety and security. Giskard uses LMeval to run the Phare benchmark that independently evaluates popular models’ security and safety.
New Large Language Models (LLMs) are released constantly, often promising improvements and new features. To keep up with this fast-paced lifecycle, developers, researchers, and organizations must quickly and reliably evaluate if those newer models are better suited for their specific applications. So far, rapid model evaluation has proven difficult, as it requires tools that allow scalable, accurate, easy-to-use, cross-provider benchmarking.
To address this challenge, we are excited to introduce LMEval (Large Model Evaluator), an open source framework that Google developed to streamline the evaluation of LLMs across diverse benchmark datasets and model providers. LMEval is designed from the ground up to be accurate, multimodal, and easy-to-use. Its key features include:

Creating and running evaluations with LMEval is designed to be intuitive. Here’s a simplified example demonstrating how to evaluate two Gemini model versions on a benchmark:
The LMEval GitHub repository includes example notebooks to help you get started.
Understanding benchmark results requires more than just summary statistics. To help with this, LMEval includes LMEvalboard, a companion dashboard tool that offers an interactive visualization of how models stack up against each other. LMEvalboard provides valuable insights into model strengths and weaknesses, complementing traditional raw evaluation data.
LMEvalboard allows you to:
We invite you to explore LMEval, use it for your own evaluations, and contribute to its development by heading to the LMEval GitHub repository: https://github.com/google/lmeval
LMEval would not have been possible without the help of many people, including: Luca Invernizzi, Lenin Simicich, Marianna Tishchenko, Amanda Walker, and many other Googlers.

source

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top