HuggingFace Kernel Hub Build Guide

Package and publish the Flash Sparse Attention Triton kernels to the HuggingFace Kernel Hub.

Prerequisites

  • Nix
  • kernel-builder:
    curl -fsSL https://raw.githubusercontent.com/huggingface/kernels/main/install.sh | bash
    
  • HuggingFace authentication:
    hf auth login
    
  • Create the target model repo on huggingface.co (a sketch using huggingface_hub follows this list)
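
The target repo can also be created from Python; a minimal sketch using huggingface_hub, with the repo id taken from the Usage section below (adjust to your own namespace):

from huggingface_hub import create_repo

# Create the model repo that will hold the kernel; no-op if it already exists.
create_repo("JingzeShi/flash-sparse-attention", repo_type="model", exist_ok=True)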

Build

Generate kernel package

python scripts/build_hf_kernels.py --clean
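
The script writes a kernel-builder project into huggingface_kernels/. The exact contents depend on the script, but a kernel-builder project typically looks roughly like this (file and directory names here are illustrative):

huggingface_kernels/
├── build.toml               # kernel-builder build configuration
├── flake.nix                # Nix flake pinning the kernel-builder toolchain
└── torch-ext/
    └── flash_sparse_attn/   # Python/Triton sources exposed via the kernels library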

Configure Nix environment and build

cd huggingface_kernels

The single-job settings keep the build sequential (helpful on memory-constrained machines), and the extra substituter lets Nix pull prebuilt dependencies from the HuggingFace Cachix cache:

export NIX_BUILD_CORES=1
export NIX_CONFIG="max-jobs = 1
extra-substituters = https://huggingface.cachix.org
extra-trusted-public-keys = huggingface.cachix.org-1:ynTPbLS0W8ofXd9fDjk1KvoFky9K2jhxe6r4nXAkc/o=
"

kernel-builder build-and-copy -L
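
Before uploading, it can be worth checking that the built artifacts import cleanly. A hedged sketch, assuming your version of the kernels library exposes get_local_kernel, that build-and-copy placed its outputs under build/ in the current directory, and that the package is named flash_sparse_attn (all assumptions):

from pathlib import Path
from kernels import get_local_kernel

# Load the kernel straight from the local build output instead of the Hub.
fsa = get_local_kernel(Path("."), "flash_sparse_attn")
print([name for name in dir(fsa) if name.startswith("flash_")])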

Upload to Hub

kernel-builder upload --repo-type model
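
Alternatively, the built artifacts can be pushed with huggingface_hub directly; a minimal sketch, assuming the outputs sit under build/ and the repo id from the Usage section below (both assumptions):

from huggingface_hub import upload_folder

# Push the local build/ directory into the model repo's build/ folder.
upload_folder(
    repo_id="JingzeShi/flash-sparse-attention",
    folder_path="build",
    path_in_repo="build",
    repo_type="model",
)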

Usage

from kernels import get_kernel

fsa = get_kernel("JingzeShi/flash-sparse-attention", version=1)

# Dense forward
out = fsa.flash_dense_attn_func(q, k, v, is_causal=True)

# Sparse attention
out = fsa.flash_sparse_attn_func(q, k, v, is_causal=True, softmax_threshold=0.01)

# Gated attention
out = fsa.flash_gated_attn_func(q, k, v, alpha, delta, is_causal=True)

# Decode with KV cache
out = fsa.flash_dense_attn_with_kvcache_func(q, k, v)
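
A more complete call, assuming the usual FlashAttention-style tensor layout of (batch, seq_len, num_heads, head_dim) and fp16 CUDA inputs; shape and dtype are assumptions, so check the kernel's docstrings:

import torch
from kernels import get_kernel

fsa = get_kernel("JingzeShi/flash-sparse-attention", version=1)

# Assumed layout: (batch, seq_len, num_heads, head_dim) in half precision on GPU.
q = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)

out = fsa.flash_dense_attn_func(q, k, v, is_causal=True)
print(out.shape)  # expected to match q: (2, 1024, 16, 64)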