Think in Diffusion, Talk in Autoregression: Why TiDAR Actually Matters

How TiDAR turns “free token slots” into real throughput instead of wasted GPU cycles.

Nov 26, 2025

∙ Paid

Autoregressive decoding has owned high-quality language generation for years. It also completely disrespects how GPUs actually work.

One decoding step emits exactly one token, but it drags the entire model’s weights and the full KV cache through memory. You pay the cost of a huge forward pass to say a single word. On modern GPUs like the H100, that means…

Continue reading this post for free, courtesy of Manu.

Or purchase a paid subscription.