What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes," or is something more complicated going on? How do we even study these questions scientifically? Join Anthropic researchers Josh Batson, Emmanuel Ameisen, and Jack Lindsey as they discuss the latest research on AI interpretability.

Read more about Anthropic's interpretability research: https://www.anthropic.com/news/tracing-thoughts-language-model

Sections:
Introduction [00:00]
The biology of AI models [01:37]
Scientific methods to open the black box [06:43]
Some surprising features inside Claude's mind [10:35]
Can we trust what a model claims it's thinking? [20:39]
Why do AI models hallucinate? [25:17]
AI models planning ahead [34:15]
Why interpretability matters [38:30]
The future of interpretability [53:35]