General-purpose robots promise a future where household assistance is ubiquitous and aging in place is supported by reliable, intelligent help. These robots will unlock human potential by enabling people to shape and interact with the physical world in transformative new ways. At the core of this transformation are Large Behavior Models (LBMs): embodied AI systems that take in robot sensor data and output actions. LBMs are pretrained on large, diverse manipulation datasets and offer the key to realizing robust, general-purpose robotic intelligence.

Yet despite their growing popularity, we still know surprisingly little about what today’s LBMs actually offer, and at what cost. This uncertainty stems from the difficulty of conducting rigorous, large-scale evaluations in real-world robotics. As a result, algorithm and dataset design is often guided by intuition rather than evidence, which hampers progress. Our work aims to change that.