Outset Capital
April 2026

MatX CEO Reiner Pope on the Dwarkesh Podcast

Reiner Pope, CEO of our portfolio company MatX, joined Dwarkesh Patel for a technical deep-dive into the math behind how LLMs are trained and served. The conversation walks through how batch size shapes token cost and inference latency, how expert parallelism is laid out across GPU racks in mixture-of-experts models, and why pipeline parallelism helps training but does little for inference.
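To make the batch-size trade-off concrete, here is a back-of-the-envelope sketch of our own (not from the episode), assuming a memory-bandwidth-bound decode step: every step must stream the full model weights from HBM, so a larger batch amortizes that fixed traffic across more tokens, while per-sequence KV-cache reads grow with the batch and gradually push up step latency. All model and hardware figures below are hypothetical round numbers.

```python
# Toy decode-cost model: one decode step streams the model weights once
# plus each sequence's KV cache. All numbers are illustrative assumptions.

WEIGHT_BYTES = 140e9        # hypothetical 70B-parameter model in bf16
KV_BYTES_PER_SEQ = 1.3e9    # ~4k-token KV cache per sequence (assumed)
HBM_BANDWIDTH = 3.35e12     # roughly one H100's HBM bandwidth, bytes/s

def decode_step_seconds(batch_size: int) -> float:
    """Time for one memory-bound decode step over the whole batch."""
    bytes_moved = WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ
    return bytes_moved / HBM_BANDWIDTH

for batch in (1, 8, 64, 256):
    step = decode_step_seconds(batch)   # latency per generated token
    per_token = step / batch            # GPU-seconds paid per token
    print(f"batch={batch:4d}  step={step*1e3:7.2f} ms  "
          f"gpu-s/token={per_token*1e3:7.3f} ms")
```

In this toy model, per-token cost falls almost linearly with batch size until KV-cache traffic rivals the weight traffic, at which point step latency starts climbing instead; that tension is what the episode digs into.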

Reiner also covers why both memory bandwidth and memory capacity constrain context length and model scaling, why reinforcement learning is pushing models roughly 100x past the Chinchilla-optimal token count, what public API pricing reveals about a model's prefill and decode costs, and the jump in scale-up domain size from 8-GPU Hopper systems to 72-GPU Blackwell racks to 500+ GPU Rubin systems.
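As one illustration of the capacity side (our own numbers, not Reiner's): the KV cache grows linearly in both batch size and context length, so whatever HBM remains after the weights caps their product. A hypothetical grouped-query 70B-class model on an 8-GPU node might pencil out like this:

```python
# KV-cache capacity sketch with assumed model and hardware parameters.
LAYERS, KV_HEADS, HEAD_DIM, DTYPE_BYTES = 80, 8, 128, 2  # hypothetical GQA model
KV_BYTES_PER_TOKEN = 2 * LAYERS * KV_HEADS * HEAD_DIM * DTYPE_BYTES  # K and V

NODE_HBM = 8 * 80e9       # eight 80 GB GPUs
WEIGHT_BYTES = 140e9      # 70B parameters in bf16, sharded across the node

free_bytes = NODE_HBM - WEIGHT_BYTES
max_cached_tokens = free_bytes / KV_BYTES_PER_TOKEN  # batch x context budget

print(f"KV cache: {KV_BYTES_PER_TOKEN/1e3:.0f} KB per token")
print(f"Token budget: {max_cached_tokens:.2e} (batch x context must fit)")
print(f"At batch 64: max context ~= {max_cached_tokens/64:,.0f} tokens")
```

Longer contexts also mean more KV bytes streamed on every decode step, which is where the bandwidth bound bites; the episode walks through both constraints.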

Read the full write-up on Dwarkesh.