DeepSeek tests sparse attention to reduce AI costs
Chinese AI firm DeepSeek is experimenting with sparse attention mechanisms to sharply lower the processing costs of large language models. The approach concentrates computation on the most relevant parts of the input, potentially halving resource demands, and could make advanced AI more accessible amid rising concerns over the technology's energy use.
DeepSeek, a prominent Chinese AI startup, announced in September 2025 that it is testing sparse attention techniques aimed at slashing the computational costs of transformer-based models. Standard attention mechanisms in large language models (LLMs) compare every token against every other token, so compute grows quadratically with the length of the input. Sparse attention, by contrast, attends only to a selected subset of relevant tokens, reducing this burden.
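DeepSeek has not published implementation details, so the following is only a minimal sketch of the general idea: a sliding-window sparsity pattern written in NumPy, in which each query attends to a fixed number of recent tokens rather than the full sequence. The function names, the `window` parameter, and the windowed pattern itself are illustrative assumptions, not DeepSeek's method.

```python
import numpy as np

def dense_attention(q, k, v):
    """Standard attention: every query attends to every key, so compute
    and memory grow quadratically with the sequence length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def windowed_sparse_attention(q, k, v, window=64):
    """Illustrative sparse variant: each query attends only to the
    `window` most recent keys, so per-query cost is constant and the
    total cost grows roughly linearly with sequence length."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)                  # local neighbourhood only
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]
    return out

if __name__ == "__main__":
    n, d = 512, 32
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    dense_attention(q, k, v)
    windowed_sparse_attention(q, k, v, window=64)
    # Dense attention scores n * n = 262,144 token pairs; the windowed
    # variant scores only about n * 64 = 32,768.
    print("score pairs, dense :", n * n)
    print("score pairs, sparse:", n * 64)
```

Production systems typically use richer patterns, such as block-sparse layouts or learned token selection, but the cost asymmetry shown above is the core of the savings DeepSeek is targeting.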
The initiative builds on earlier research, including work by Google and Meta, but DeepSeek's implementation targets practical deployment in resource-constrained environments. According to the company's blog post, initial benchmarks on its DeepSeek-V2 model showed a 40-60% reduction in inference costs without notable drops in performance. "By optimizing attention patterns, we're not just cutting costs—we're enabling broader AI adoption," stated DeepSeek's chief scientist, Liang Wang, in the announcement.
DeepSeek began internal testing in early 2025, following the May release of its open-source V2 model. The sparse attention module integrates with existing architectures and can be retrofitted to models of up to 236 billion parameters. The work arrives amid growing scrutiny of AI's environmental footprint: training a single large model can consume as much energy in a year as hundreds of households.
Experts note potential implications for edge computing and mobile AI. However, challenges remain, including ensuring sparsity doesn't compromise long-context understanding. DeepSeek plans to release the technique as open-source by year's end, inviting community feedback.
Details reported so far are consistent across the company's technical previews and early coverage.