Today we’re joined by Robert Osazuwa Ness, a senior researcher at Microsoft Research, professor at Northeastern University, and founder of Altdeep.ai. In our conversation with Robert, we explore whether large language models, specifically GPT-3, 3.5, and 4, are good at causal reasoning. We discuss the benchmarks used to evaluate these models and their limitations in answering specific causal reasoning questions, and Robert highlights the need for access to a model’s weights, training data, and architecture to answer these questions definitively. We also discuss the challenge of generalization in causal relationships and the importance of incorporating inductive biases, explore the models’ ability to generalize beyond the provided benchmarks, and consider the role of causal factors in decision-making processes.
CHAPTERS
00:00:00 - Intro
00:03:30 - Causal analysis in large language models
00:07:45 - Do GPT-3.5 and GPT-4 excel in causal inference?
00:10:37 - LLMs show potential in tackling causal problems
00:20:59 - Benchmarking causal tasks in LLMs
00:33:29 - GPT models struggle with causal generalization
00:38:05 - Constructing causal graphs using variable relationships
00:42:33 - Large language models for causal inference are useful but flawed
00:50:05 - Unlocking causal reasoning in GPT-4
01:01:29 - Current projects and future directions