This repository contains optimized CUDA kernels for InfLLM V2's two-stage sparse attention mechanism. Our implementation provides high-performance kernels for both Stage 1 (Top-K ...
This repository contains everything needed to compile and run xformers on the NVIDIA RTX 5090 D (Blackwell architecture, sm_120), including all necessary patches, scripts, and documentation ...
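Before building, it can help to confirm that the GPU actually reports compute capability 12.0. The following is a minimal sketch. It assumes PyTorch is installed and uses only `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`. The helper names `arch_flags` and `detect_capability` are hypothetical and are not part of this repository. The sketch also prints the matching `TORCH_CUDA_ARCH_LIST` value, the environment variable that PyTorch extension builds commonly read to select target architectures.

```python
def arch_flags(major: int, minor: int) -> tuple:
    """Map a CUDA compute capability to (TORCH_CUDA_ARCH_LIST entry, nvcc sm_ name).

    Hypothetical helper for illustration, e.g. (12, 0) -> ("12.0", "sm_120").
    """
    return f"{major}.{minor}", f"sm_{major}{minor}"


def detect_capability():
    """Return the first GPU's compute capability via PyTorch, or None.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple, e.g. (12, 0).
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        arch_list, sm = arch_flags(*cap)
        print(f"Detected {sm}; a build could use TORCH_CUDA_ARCH_LIST={arch_list}")
        if sm != "sm_120":
            print("Warning: this GPU does not report sm_120")
```

On an RTX 5090 D this should report `sm_120`. Any other value suggests the build targets in the accompanying scripts may need adjusting.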