NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
NVIDIA's Gated DeltaNet-2 improves linear attention by decoupling the delta rule's erase and write operations, and it outperforms Mamba-2 in efficiency and memory management.
Gated DeltaNet-2 addresses the memory bottleneck in linear attention models by using separate channel-wise gates: one for erasing old content and one for writing new data. This refinement allows more precise updates to the fixed-size recurrent state. In experiments with models trained on 100B tokens, it outperforms previous architectures such as Mamba-2.
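
To make the erase/write distinction concrete, here is a minimal NumPy sketch of a single recurrent step. It is an illustration under stated assumptions, not NVIDIA's published formulation: the function names `delta_rule_step` and `decoupled_step`, the gate vectors `erase` and `write`, and the exact form of the update are all hypothetical. The classic delta rule uses one scalar `beta` to both remove the value currently stored under a key and write the new one; the sketch simply replaces that single scalar with two independent per-channel gates.

```python
import numpy as np

def delta_rule_step(S, k, v, beta):
    """Classic delta rule: one scalar beta controls both erase and write."""
    # S: state of shape (d_v, d_k); k: key of shape (d_k,); v: value of shape (d_v,)
    old_v = S @ k  # value the state currently returns for key k
    return S - beta * np.outer(old_v, k) + beta * np.outer(v, k)

def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled update (assumption, not the paper's equation):
    separate channel-wise gates, each of shape (d_k,), for erasing and writing."""
    old_v = S @ k
    return S - np.outer(old_v, erase * k) + np.outer(v, write * k)

rng = np.random.default_rng(0)
d_k, d_v = 4, 3
S = rng.normal(size=(d_v, d_k))
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)  # unit-norm key, a common convention
v = rng.normal(size=d_v)

# Tying both gates to the same constant recovers the classic delta rule.
beta = 0.5
tied = decoupled_step(S, k, v, np.full(d_k, beta), np.full(d_k, beta))
classic = delta_rule_step(S, k, v, beta)
print(np.allclose(tied, classic))  # True

# With the erase gate closed, old content is kept and the new pair is added on top.
write_only = decoupled_step(S, k, v, np.zeros(d_k), np.ones(d_k))
print(np.allclose(write_only, S + np.outer(v, k)))  # True
```

The state `S` keeps the same shape no matter how many steps are applied, which is the fixed-size memory constraint the article refers to. In a real layer the gates would be computed from the input at each token rather than set by hand as they are here.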