# HuggingFace Kernel Hub Build Guide
Package and publish the Flash Sparse Attention Triton kernels to the HuggingFace Kernel Hub.
## Prerequisites

- Nix
- kernel-builder: HuggingFace's Nix-based tool for building and packaging Hub kernels
- HuggingFace authentication: a Hub login with write access (see the sketch after this list)
- The target repo created on huggingface.co
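A quick way to cover the last two prerequisites from the command line. The repo name matches this guide's later examples; substitute your own:

```bash
# Authenticate so uploads to the Hub are authorized
huggingface-cli login

# Create the target repo under your namespace
huggingface-cli repo create flash-sparse-attention
```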
## Build
### Generate kernel package
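kernel-builder reads its configuration from a `build.toml` at the project root. Below is a minimal sketch for a pure-Triton kernel, assuming kernel-builder's "universal" (Python-only) kernel mode; the package name is this guide's example, and the exact fields should be verified against the kernel-builder documentation:

```bash
# Sketch: declare a universal (pure-Python/Triton) kernel for kernel-builder.
# The [torch] universal flag and the package name are assumptions to verify.
cat > build.toml <<'EOF'
[general]
name = "flash_sparse_attn"

[torch]
universal = true
EOF
```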
### Configure Nix environment and build

The Cachix substituter lets Nix fetch prebuilt dependencies instead of compiling them locally, while `NIX_BUILD_CORES=1` and `max-jobs = 1` cap parallelism to keep memory usage manageable:

```bash
cd huggingface_kernels
export NIX_BUILD_CORES=1
export NIX_CONFIG="max-jobs = 1
extra-substituters = https://huggingface.cachix.org
extra-trusted-public-keys = huggingface.cachix.org-1:ynTPbLS0W8ofXd9fDjk1KvoFky9K2jhxe6r4nXAkc/o=
"
kernel-builder build-and-copy -L
```
## Upload to Hub
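One way to publish, assuming `build-and-copy` left the built artifacts in a local `build/` directory and following the Kernel Hub convention of keeping built variants under `build/` in the repo:

```bash
# Upload the local build/ directory to build/ in the Hub repo
huggingface-cli upload JingzeShi/flash-sparse-attention build build
```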
## Usage
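Install the `kernels` client library first (`pip install kernels`). `get_kernel` downloads the kernel from the Hub on first use, so consumers need no local build.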
```python
from kernels import get_kernel

fsa = get_kernel("JingzeShi/flash-sparse-attention", version=1)

# q, k, v (and alpha, delta for the gated variant) are attention tensors
# prepared by the caller; they are not defined in this snippet.

# Dense forward
out = fsa.flash_dense_attn_func(q, k, v, is_causal=True)

# Sparse attention
out = fsa.flash_sparse_attn_func(q, k, v, is_causal=True, softmax_threshold=0.01)

# Gated attention
out = fsa.flash_gated_attn_func(q, k, v, alpha, delta, is_causal=True)

# Decode with KV cache
out = fsa.flash_dense_attn_with_kvcache_func(q, k, v)
```