thou that art

thou that art

Think in Diffusion, Talk in Autoregression: Why TiDAR Actually Matters

How TiDAR turns “free token slots” into real throughput instead of wasted GPU cycles.

Manu's avatar
Manu
Nov 26, 2025
∙ Paid

Autoregressive decoding has owned high-quality language generation for years. It also completely disrespects how GPUs actually work.

One decoding step emits exactly one token, but it drags the entire model’s weights and the full KV cache through memory. You pay the cost of a huge forward pass to say a single word. On modern GPUs like the H100, that means…

User's avatar

Continue reading this post for free, courtesy of Manu.

Or purchase a paid subscription.
© 2026 Manu Bhardwaj · Publisher Privacy ∙ Publisher Terms
Substack · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture