Agent Leaderboard, Galileo Technologies, Inc., San Francisco, California, USA

Administrator · Feb 15, 2025

Developer - Galileo Technologies, Inc.

galileo.ai/blog/agent-leaderboard

huggingface.co/spaces/galileo-ai/agent-leaderboard

docs.galileo.ai/galileo/gen-ai-studio-products/galileo-evaluate

Administrator · Feb 15, 2025

How to Evaluate Agents: Galileo’s Agentic Evaluations in Action

Jan 22, 2025

Evaluating AI agents is one of the toughest challenges in the world of LLMs, but it doesn’t have to be. In this video, we walk you through Galileo’s cutting-edge Agentic Evaluations capabilities, showing how you can systematically assess and refine the performance of agents in real-world scenarios.

Through a hands-on demo, you’ll see how Galileo evaluates a food-ordering agent, highlighting critical metrics and pitfalls often missed in traditional testing. Discover why evaluating agents is so challenging (handling open-ended conversations, edge cases, and task-specific goals) and how Galileo’s tools empower you to overcome these hurdles with confidence.
