Feb 27, 2026
AI is not just getting smarter. It is getting faster
by learning to optimize the code that runs on its own hardware.
In this episode, Sharon Zhou, VP of AI at AMD and former Stanford
AI researcher, explains how language models are beginning to write
and optimize their own GPU kernel code. We explore what
self-improving AI actually means, how reinforcement learning is used
in post-training, and why kernel optimization could be one of the most
overlooked scaling levers in modern AI.
Sharon breaks down how GPU efficiency impacts the cost of training
and inference, why catastrophic forgetting remains a challenge in
continual learning, and how verifiable rewards from hardware
profiling can help models improve themselves. The conversation also
dives into compute economics, synthetic data, RLHF, and why
infrastructure may define the next phase of AI progress.
If you want to understand where AI scaling is really happening,
beyond bigger models and more data, this episode goes under the
hood.
Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI
(00:00) Preview and Intro
(00:25) Sharon Zhou’s Background and Transition to AMD
(02:00) What Is Self-Improving AI?
(04:16) What Is a GPU Kernel and Why It Matters
(07:01) Using AI Agents and Evolutionary Strategies to Write Kernels
(11:31) Just-In-Time Optimization and Continual Learning
(13:59) Self-Improving AI at the Infrastructure Layer
(16:15) Synthetic Data and Models Generating Their Own Training Data
(20:48) AMD’s AI Strategy: Research Meets Product
(23:22) Inside the NeurIPS Tutorial on AI-Generated Kernels
(30:59) Reinforcement Learning Beyond RLHF
(39:09) 10x Faster Kernels vs 10x More Compute
(41:50) Will Efficiency Reduce Chip Demand?
(42:18) Beyond Language Models: Diffusion, JEPA, and Robotics
(45:34) Educating the Next Generation of AI Builders