At an Anthropic Research Salon event in San Francisco, four of our researchers—Alex Tamkin, Jan Leike, Amanda Askell, and Josh Batson—discussed alignment science, interpretability, and the future of AI research.
0:00 Introduction
0:30 An overview of alignment
4:48 Challenges of scaling
8:08 Role of interpretability
12:02 How models can help
14:31 Signs of whether alignment is easy or hard
18:28 Q&A — Multi-agent deliberation
20:38 Q&A — Model alignment epiphenomenon
23:43 Q&A — What solving alignment could look like