IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

>This repository provides a patch for SGLang and vLLM that enables IndexCache inference acceleration for models using DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.
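
As a rough illustration of what "cross-layer index reuse" could mean, here is a toy sketch in plain Python. It is not the patch's code and does not use the SGLang or vLLM APIs. The dot-product indexer, the "recompute every n-th layer" schedule, and all function names are assumptions made for illustration only; the paper [1] is the authority on the actual method.

```python
import random


def indexer_scores(query, keys):
    """Toy stand-in for a sparse-attention indexer: dot-product score per key."""
    return [sum(q * k for q, k in zip(query, key)) for key in keys]


def top_k_indices(scores, k):
    """Positions of the k highest scores, returned in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_indices_per_layer(queries_per_layer, keys_per_layer, k, reuse_every):
    """Run the indexer only on every `reuse_every`-th layer (hypothetical schedule).

    All other layers reuse the most recently computed index set instead of
    scoring every token again.
    """
    cached = None
    indexer_calls = 0
    selected = []
    for layer, (query, keys) in enumerate(zip(queries_per_layer, keys_per_layer)):
        if layer % reuse_every == 0:
            cached = top_k_indices(indexer_scores(query, keys), k)
            indexer_calls += 1
        selected.append(cached)
    return selected, indexer_calls


if __name__ == "__main__":
    rng = random.Random(0)
    num_layers, num_tokens, dim = 8, 16, 4
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
    keys = [
        [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
        for _ in range(num_layers)
    ]
    selected, calls = select_indices_per_layer(queries, keys, k=4, reuse_every=4)
    print(f"indexer ran on {calls} of {num_layers} layers")
```

In this sketch the saving comes only from skipping the scoring pass on the layers that reuse a cached index set. Whether reused indices preserve model quality is an empirical question that the toy example does not address.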

The paper is available on arXiv [1].

[1] IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse. https://arxiv.org/abs/2603.12201
