Here’s the story behind why mixture-of-experts (MoE) has become the default architecture for cutting-edge AI models, and how NVIDIA’s GB200 NVL72 removes the scaling bottlenecks that have held MoE back.