Jan Leike from Anthropic presenting 'Supervising AI on hard tasks' on July 21, 2024 at the Vienna Alignment Workshop.

Key Highlights:
- Overcoming challenges in supervising AI on difficult tasks
- Implementing scalable oversight
- Eliciting AI capabilities with precision
- Applying tampering and adversarial evaluations

The Alignment Workshop is a series of events convening top ML researchers from industry and academia, along with experts in the government and nonprofit sectors, to discuss and debate topics related to AI alignment. The goal is to enable researchers and policymakers to better understand potential risks from advanced AI and strategies for mitigating them.