Agent Leaderboard, Galileo Technologies, Inc., San Francisco, California, USA


How to Evaluate Agents: Galileo’s Agentic Evaluations in Action

Jan 22, 2025

Evaluating AI agents is one of the toughest challenges in the world of LLMs, but it doesn’t have to be. In this video, we walk you through Galileo’s Agentic Evaluations capabilities, showing how you can systematically assess and refine the performance of agents in real-world scenarios.

Through a hands-on demo, you’ll see how Galileo evaluates a food-ordering agent, highlighting critical metrics and pitfalls often missed in traditional testing. Discover why evaluating agents is so challenging (open-ended conversations, edge cases, and task-specific goals) and how Galileo’s tools help you overcome these hurdles with confidence.
 