Think in Diffusion, Talk in Autoregression: Why TiDAR Actually Matters
How TiDAR turns “free token slots” into real throughput instead of wasted GPU cycles.
Autoregressive decoding has owned high-quality language generation for years. It also completely disrespects how GPUs actually work.
One decoding step emits exactly one token, but it drags the entire model’s weights and the full KV cache through memory. You pay the cost of a huge forward pass to say a single word. On modern GPUs like the H100, that means…



