AsyncTLS: 4.7x Faster Long-Context LLM Inference With Two-Level Sparse Attention
AsyncTLS sparse attention fuses block filtering, token selection, and async KV cache offloading for 1.3-4.7x throughput gains at 48k-96k token contexts.
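The "two-level" selection named above can be illustrated with a minimal sketch: a coarse first level scores whole KV blocks (e.g. against mean-pooled keys) and discards most of them, and an exact second level scores only the tokens inside the surviving blocks. This is a hypothetical illustration of block filtering followed by token selection, not the AsyncTLS implementation; the pooling rule, block size, and budgets here are assumptions.

```python
import numpy as np

def two_level_select(q, k, block_size=4, top_blocks=2, top_tokens=4):
    """Hypothetical two-level sparse selection: block filter, then token pick.

    q: (d,) query vector; k: (n, d) cached keys.
    Returns sorted indices of the tokens kept for attention.
    """
    n_blocks = k.shape[0] // block_size
    # Level 1: cheap block filtering via mean-pooled keys (an assumed proxy).
    pooled = k[: n_blocks * block_size].reshape(n_blocks, block_size, -1).mean(axis=1)
    keep_blocks = np.argsort(pooled @ q)[-top_blocks:]
    # Level 2: exact token scores, but only inside the surviving blocks.
    cand = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in keep_blocks]
    )
    token_scores = k[cand] @ q
    return np.sort(cand[np.argsort(token_scores)[-top_tokens:]])

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal((16, 8))
idx = two_level_select(q, k)
print(idx)
```

At long contexts the payoff is that level 2's exact scoring touches only `top_blocks * block_size` tokens instead of all `n`, which is what makes room for overlapping KV cache transfers with compute.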