Classifier guidance is effective for steering diffusion language models, but it often causes update-forgetting—token-level edits made at one step are overwritten later—degrading fluency and controllability. We formalize this phenomenon and propose TTA-DIFFUSION, an inference-time approach that assigns per-token timesteps and reallocates denoising effort where refinement is needed. Stable tokens receive smaller timesteps, while uncertain or classifier-critical tokens are updated more aggressively via linear or gradient-adaptive schedules.
Built on a simplex-space diffusion LM with progressive step reduction, TTA-DIFFUSION achieves strong control with far fewer steps. On sentiment control it delivers >20% higher accuracy and ~2× lower perplexity than prior diffusion baselines while using fewer than one-fifth as many steps, and on detoxification it reduces toxicity while maintaining diversity. The method highlights timestep allocation as a principled, efficient mechanism for stable, controllable text generation.
Token Timestep Allocation. Each token x_i is assigned its own timestep t_i = f(i, t). We use (i) a linear schedule that gradually increases timesteps across positions, and (ii) an adaptive schedule that maps normalized classifier-gradient magnitudes to timesteps, so that high-importance tokens receive smaller timesteps and earlier edits are preserved.
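Below is a minimal PyTorch sketch of the two schedules; the function names, the `t_min` floor, and the min-max normalization are illustrative assumptions rather than the released implementation.

```python
import torch

def linear_schedule(seq_len: int, t: int, t_min: int = 0) -> torch.Tensor:
    # Per-token timesteps that increase linearly across positions:
    # earlier (already stable) tokens get smaller timesteps, later tokens larger ones.
    return torch.linspace(t_min, t, steps=seq_len).round().long()


def gradient_adaptive_schedule(grad_norms: torch.Tensor, t: int, t_min: int = 0) -> torch.Tensor:
    # grad_norms: per-token classifier-gradient magnitudes, shape [seq_len].
    # Normalize to [0, 1] so the mapping is scale-invariant.
    g = (grad_norms - grad_norms.min()) / (grad_norms.max() - grad_norms.min() + 1e-8)
    # High importance (g near 1) -> timestep near t_min; low importance -> near t.
    return (t_min + (1.0 - g) * (t - t_min)).round().long()
```

The inversion `1.0 - g` captures the design intent: the stronger the classifier gradient at a position, the smaller its timestep, so that position is only lightly re-noised and its edit survives later denoising passes.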
Simplex Diffusion Extension. We operate directly in vocabulary space via a logit simplex mapping (SSD-style), enabling seamless classifier integration without embedding-space mismatch.
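For concreteness, here is a hedged sketch of the vocabulary-space representation and one guidance step; it assumes an SSD-style almost-one-hot logit mapping with magnitude `k` and a hypothetical `classifier` that consumes a softmax distribution over the vocabulary.

```python
import torch
import torch.nn.functional as F

def tokens_to_simplex_logits(token_ids: torch.Tensor, vocab_size: int, k: float = 5.0) -> torch.Tensor:
    # "Almost one-hot" representation: +k on the true token, -k elsewhere,
    # so softmax of the logits sits near a vertex of the vocabulary simplex.
    one_hot = F.one_hot(token_ids, num_classes=vocab_size).float()
    return k * (2.0 * one_hot - 1.0)


def guidance_step(logits: torch.Tensor, classifier, target: torch.Tensor, step_size: float = 1.0) -> torch.Tensor:
    # One classifier-guidance update performed directly in vocabulary space.
    # `classifier` (hypothetical) maps a softmax distribution over the vocabulary
    # to class logits, so no embedding-space projection is needed.
    logits = logits.detach().requires_grad_(True)
    probs = torch.softmax(logits, dim=-1)
    loss = F.cross_entropy(classifier(probs), target)
    (grad,) = torch.autograd.grad(loss, logits)
    return (logits - step_size * grad).detach()
```

Because the diffusion state and the classifier input live on the same simplex, the gradient can be applied to the logits directly.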
Progressive Step Reduction. We fine-tune models to run with fewer diffusion steps (e.g., 100→50) to cut inference cost while maintaining control fidelity and fluency.
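A rough sketch of this recipe, assuming a placeholder `train_fn` training loop and an evenly subsampled timestep grid (both hypothetical, not the paper's code):

```python
import torch

def subsample_timesteps(t_full: int, t_reduced: int) -> torch.Tensor:
    # Evenly spaced subset of the original timestep grid, e.g. 100 -> 50.
    return torch.linspace(0, t_full - 1, steps=t_reduced).round().long()


def progressive_step_reduction(model, train_fn, stages=(100, 50)):
    # Fine-tune the same weights stage by stage on ever coarser schedules,
    # so inference can later run at the final (smallest) step count.
    for num_steps in stages:
        train_fn(model, timesteps=subsample_timesteps(stages[0], num_steps))
    return model
```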
Summary of detoxification and sentiment control. Higher is better for Acc / Dist-3; lower is better for toxicity and perplexity (PPL). TTA-DIFFUSION attains strong control and fluency with substantially fewer steps.
| Model | Detox. Avg. tox ↓ | Detox. Max tox ↓ | Detox. PPL ↓ | Sent. Acc ↑ | Sent. PPL ↓ | Sent. Dist-3 ↑ |
|---|---|---|---|---|---|---|
| PPLM | 30.6 | 59.7 | 107.4 | 42.6 | 201.1 | 0.94 |
| GeDi | 22.0 | 36.1 | 98.8 | 79.9 | 98.6 | 0.91 |
| DExperts | 15.1 | 32.0 | 48.0 | 83.2 | 31.8 | 0.93 |
| Air-decoding | 18.5 | 40.4 | 49.0 | 82.6 | 27.1 | 0.94 |
| LM-Steer | 19.1 | 47.0 | 44.4 | 85.4 | 78.8 | 0.86 |
| Diffusion-LM (T=2000) | 21.8 | – | 131.2 | 72.8 | 89.3 | 0.94 |
| SSD-LM (T=1000) | 24.6 | 50.3 | 58.3 | 76.2 | 51.1 | 0.94 |
| LD4LG (T=250) | 14.5 | – | 296.4 | 59.9 | 70.7 | 0.95 |
| TESS (T=1000) | 14.6 | 32.3 | 58.8 | 71.1 | 31.7 | 0.85 |
| TTA-DIFFUSION (T=200) | 12.2 | 26.0 | 40.6 | 94.7 | 20.5 | 0.85 |
| TTA-DIFFUSION (T=100) | 12.2 | 26.7 | 46.3 | 91.0 | 25.4 | 0.86 |
| TTA-DIFFUSION (T=50) | 12.5 | 27.3 | 59.5 | 85.6 | 42.0 | 0.88 |

Numbers adapted from the paper draft. “–” indicates a value not reported.
@inproceedings{kim2025tta,
title = {Don’t Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation},
author = {Kim, Woojin and Do, Jaeyoung},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2025}
}