IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
Posted by teleforce 5 hours ago
Comments
Comment by teleforce 5 hours ago
>This repository provides a patch for SGLang and vLLM that enables IndexCache inference acceleration for models using DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.
Paper here [1].
[1] IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse: