How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

Oct 16, 2025

What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI’s VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn’t). We trace the evolution from o1 (a tech demo good at puzzles) to o3 (the tool-use shift) to GPT-5 (which Jerry calls “o3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.

We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don’t), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI?

This is the MAD Podcast: AI for the 99%. If you’re curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.

00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide Thinking Time
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - Before OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - Collaboration vs IP Protection at OpenAI
29:32 - Shipping Fast While Doing Deep Research
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models
 

Why One of OpenAI’s Top Researchers Walked Away - EP 53 Jerry Tworek

Jan 22, 2026

We bring you an exit interview with Jerry Tworek, one of the most influential (and least public) figures behind the modern AI boom. Tworek recently left OpenAI after nearly seven years at the company, where he worked on or led some of its most important breakthroughs, including the shift toward reasoning-driven models. In a candid, wide-ranging conversation, Tworek reflects on OpenAI’s transformation from a 30-person research lab into a global AI powerhouse, what it’s like to build frontier models amid nonstop pressure, and why he ultimately decided it was time to walk away.

We dig into what Tworek thinks the entire AI industry is getting wrong, why nearly every major lab is now converging on the same ideas, and where the next real breakthroughs might come from. He talks openly about risk, research culture, reinforcement learning, architectural dead ends, AGI timelines, and the psychological toll of working at the center of the AI arms race.

Chapters
00:00:00 Intro - Core Memory Year One
00:03:37 Jerry Tworek Joins the Pod
00:05:34 Reflecting on Seven Years of Hypergrowth at OpenAI
00:09:08 Why Jerry Left: Risk, Org Charts, and Research Freedom
00:12:28 Beyond Pre-Training: The Need for New Scaling Methods
00:16:19 The "Sad" Homogeneity of Current AI Labs
00:23:35 The Mavericks: Carmack, Ilya, and LeCun
00:26:32 Training AI on Video Games
00:32:14 Two Big Bets: New Architectures & Continual Learning
00:34:32 Updating AGI Timelines
00:37:16 Q-Star, Strawberry, and the Fear of New Tech
00:41:10 The Coup and the Soap Opera
00:43:29 Is the Hype Justified?
00:49:23 The Polish Mafia
00:53:15 Google's Comeback and OpenAI's Fumble
00:59:27 Why Anthropic is Impressive
01:03:01 Jerry's Next Chapter: Taking Big Research Bets
01:11:56 Is AI Research Star-Driven?
01:14:32 Meditation & Conclusion
 