MLX benchmarks are in and I did not expect these results. The M5 Max blows the M3 Ultra out of the water, despite the M3 Ultra having more GPU cores and higher memory bandwidth. Compute-bound prefill is much faster (up to 2x) thanks to the new M5 Neural Accelerators, but memory-bound decoding is faster too, as long as you use MoE models instead of dense models. The M5 Ultra will be a beast. Can’t wait to see those numbers.