Tuesday, 31 March, 2026
DeepSeek Launches Sparse Attention Model, Cuts Long-Context API Costs by Up to Half
By Isha

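DeepSeek has unveiled V3.2-exp, an experimental sparse-attention model built to reduce inference costs. It pairs a “lightning indexer”, which scores the context to find the most relevant parts, with fine-grained token selection, so that attention is computed over a chosen subset of tokens rather than the full context. According to DeepSeek, this can cut the cost of an API call by as much as 50% in long-context applications. The model is open-weight and available on Hugging Face, so third parties can verify the cost claims and build on the model.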
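The brief describes the mechanism only at a high level. The sketch below is a simplified, hypothetical illustration of the general idea of indexer-guided sparse attention for a single query. It is not DeepSeek's implementation: the function names, the plain dot-product score used as the "indexer", and the top-`k` parameter are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def indexer_scores(query_idx: Vector, keys_idx: List[Vector]) -> List[float]:
    # Hypothetical lightweight indexer: one cheap score per cached token,
    # computed on (assumed) small indexer feature vectors.
    return [dot(query_idx, k) for k in keys_idx]


def select_top_k(scores: List[float], k: int) -> List[int]:
    # Keep the positions of the k highest-scoring tokens, in original order.
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def sparse_attention(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    query_idx: Vector,
    keys_idx: List[Vector],
    k: int,
) -> Tuple[Vector, List[int]]:
    # Step 1: the indexer scans every token cheaply and picks k of them.
    selected = select_top_k(indexer_scores(query_idx, keys_idx), k)

    # Step 2: ordinary scaled dot-product attention, but only over the
    # selected tokens (numerically stable softmax).
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Step 3: mix the selected value vectors with the attention weights.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d in range(len(out)):
            out[d] += w * values[i][d]
    return out, selected


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    query = [0.5, 1.0]
    # For this toy example the indexer reuses the full query/key vectors.
    out, chosen = sparse_attention(query, keys, values, query, keys, k=2)
    print(chosen, out)
```

When `k` equals the number of cached tokens, the function reduces to ordinary dense attention. With a smaller `k`, the softmax and value mixing touch only `k` tokens, which is where a sparse design can save work on long contexts, while the indexer still visits every token but with a cheaper score.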
Read full story at TechCrunch