For now, here is how to configure the HF CLI for the fastest downloads it can currently manage. Environment variables are set differently on each OS, so adapt the export lines below to your environment.
Below is a focused, step-by-step playbook that actually increases Hugging Face (HF) CLI download speed, with background for each step and concrete commands. The knobs and commands are current as of v1.1.x of huggingface_hub.
Background: why fast ISP ≠ fast model pulls
Large files on the Hub are served via a CDN and, by default, downloaded through the HF CLI using the Rust-based hf-xet path. Xet reconstructs files from chunks; this favors SSD/NVMe and high concurrency, and it exposes environment variables you can tune. Route/POP (region) variability also matters, so the same 500 Mb/s line can feel “very fast” or “very slow” depending on path and local I/O. (Hugging Face)
1) Put the cache on a fast SSD and update the tools
Why: Eliminates a common disk bottleneck and ensures you have recent CLI/Xet behavior. HF_HOME controls where the Hub cache lives. (Hugging Face)
# References:
# - HF env vars (HF_HOME, HF_HUB_CACHE, timeouts, Xet knobs): https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
# - CLI install/usage: https://huggingface.co/docs/huggingface_hub/en/guides/cli
python -m pip install -U "huggingface_hub" hf_xet
export HF_HOME="/fast-ssd/.cache/huggingface" # move cache to SSD/NVMe
export HF_HUB_DOWNLOAD_TIMEOUT=60 # reduce spurious timeouts on slow routes
HF_HOME sets the base cache dir; HF_HUB_DOWNLOAD_TIMEOUT raises the per-request timeout from the 10 s default to something more forgiving. (Hugging Face)
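If you drive downloads from Python instead of the shell, the same variables apply, but huggingface_hub reads them at import time. A minimal sketch (the constants attributes below reflect how the library resolves the variables; treat the exact attribute names as an assumption that may shift between versions):
import os

# Set cache location and timeout before importing huggingface_hub,
# because the library reads these variables when it is imported.
os.environ["HF_HOME"] = "/fast-ssd/.cache/huggingface"
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"

from huggingface_hub import constants

# Sanity check: confirm the cache really resolves to the fast disk.
print(constants.HF_HOME)       # /fast-ssd/.cache/huggingface
print(constants.HF_HUB_CACHE)  # <HF_HOME>/hub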
2) Turn on Xet “high performance” and set explicit concurrency
Why: The Xet backend exposes controls that directly impact throughput. High-perf mode tries to saturate network/CPU; the per-file concurrency knob (HF_XET_NUM_CONCURRENT_RANGE_GETS) increases parallel chunk reads beyond its conservative default of 16. (Hugging Face)
# References (HF_XET_HIGH_PERFORMANCE, HF_XET_NUM_CONCURRENT_RANGE_GETS):
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
export HF_XET_HIGH_PERFORMANCE=1
export HF_XET_NUM_CONCURRENT_RANGE_GETS=24 # try 24–32 on fast SSD/CPU; 8–16 on weaker machines
These variables are officially documented; raising concurrency helps if you have headroom. (Hugging Face)
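If you would rather not export these globally, you can scope the tuning to a single CLI invocation. A minimal Python sketch (the repo and file are the same ones used in step 4 below; substitute your own):
import os
import subprocess

# Pass the Xet tuning only to this one "hf download" call,
# leaving the rest of the shell environment untouched.
env = dict(
    os.environ,
    HF_XET_HIGH_PERFORMANCE="1",
    HF_XET_NUM_CONCURRENT_RANGE_GETS="24",  # 8-16 on weaker machines
)
subprocess.run(
    [
        "hf", "download", "Comfy-Org/Qwen-Image-Edit_ComfyUI",
        "--include", "split_files/diffusion_models/qwen_image_edit_fp8_e4m3fn.safetensors",
        "--local-dir", "/path/to/ComfyUI/models/diffusion_models",
    ],
    env=env,
    check=True,
)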
3) If the destination is an HDD, force sequential writes
Why: Parallel random writes that help SSDs can thrash spinning disks. The Xet toggle below switches to sequential writes and is specifically recommended for HDDs. (Hugging Face)
# Reference: HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY (HDD-only)
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
export HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=1
4) Download only what you need; use --include and --local-dir
Why: Avoids pulling entire repos and keeps writes local to your target folder. The CLI implements patterns and a direct “download-to-folder” mode. (Hugging Face)
# References:
# - CLI download patterns & --local-dir: https://huggingface.co/docs/huggingface_hub/en/guides/cli
# - Download guide (hf_hub_download/snapshot_download): https://huggingface.co/docs/huggingface_hub/en/guides/download
hf download Comfy-Org/Qwen-Image-Edit_ComfyUI \
--include "split_files/diffusion_models/qwen_image_edit_fp8_e4m3fn.safetensors" \
--local-dir "/path/to/ComfyUI/models/diffusion_models"
5) If speeds are spiky under the CLI, A/B test the transfer path
Why: Isolate whether Xet routing/behavior is the issue on your machine/route. You can explicitly disable Xet; set this before importing or invoking HF code so it’s honored. (Hugging Face)
# References:
# - HF_HUB_DISABLE_XET (must be set prior to import/CLI): https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
# - Note about setting it before use: https://github.com/huggingface/huggingface_hub/issues/3266
export HF_HUB_DISABLE_XET=1
hf download <repo> --include "<file>" --local-dir "<dir>"
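The same applies from Python, with the caveat from the issue above: the variable must be set before huggingface_hub is imported. A minimal sketch reusing the step 4 file:
import os

# Must be set before "import huggingface_hub", otherwise the toggle is ignored.
os.environ["HF_HUB_DISABLE_XET"] = "1"

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Comfy-Org/Qwen-Image-Edit_ComfyUI",
    filename="split_files/diffusion_models/qwen_image_edit_fp8_e4m3fn.safetensors",
    local_dir="/path/to/ComfyUI/models/diffusion_models",
)
print(path)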
6) Use a multi-connection HTTP downloader for one-off big pulls
Why: When a single TCP stream underperforms on your current path, multiple HTTP range connections often achieve higher steady throughput. aria2c is a simple, robust option. (DEV Community)
# Reference/recipe: https://dev.to/susumuota/faster-and-more-reliable-hugging-face-downloads-using-aria2-and-gnu-parallel-4f2b
# 1) Get the “Copy download link” (?download=true) from the file’s page on the Hub
aria2c -x16 -s16 -j4 -c \
"https://huggingface.co/<repo>/resolve/main/<path/to/large-file>?download=true"
# -x connections per server; -s splits per file; -j parallel jobs; -c resume
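If you prefer not to copy the link by hand, huggingface_hub's hf_hub_url builds the same resolve URL; a minimal sketch that hands it to aria2c (assumes a public file and aria2c on PATH):
import subprocess
from huggingface_hub import hf_hub_url

# Build the https://huggingface.co/<repo>/resolve/main/<path> URL programmatically.
url = hf_hub_url(
    repo_id="Comfy-Org/Qwen-Image-Edit_ComfyUI",
    filename="split_files/diffusion_models/qwen_image_edit_fp8_e4m3fn.safetensors",
)

# 16 connections, 16 splits per file, resumable (same -x/-s/-c flags as above).
subprocess.run(["aria2c", "-x16", "-s16", "-c", f"{url}?download=true"], check=True)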
7) Change network egress (region) when speeds are unusually low
Why: The Hub’s CDN and peering vary by POP and time. Toggling VPN on/off or switching to a nearby region often changes anemic 1–2 MB/s to tens of MB/s. Forum reports and timeout traces on cdn-lfs*.hf.co illustrate this effect. (Hugging Face Forums)
8) Know what’s deprecated (so you don’t chase the wrong lever)
Why: Many older posts recommend hf_transfer. In current huggingface_hub v1.x, hf_transfer is removed and HF_HUB_ENABLE_HF_TRANSFER is ignored. Use the Xet knobs (HF_XET_HIGH_PERFORMANCE, etc.) instead. (Hugging Face)
60-second diagnosis (copy/paste)
This quickly tells you where the bottleneck is on your system/route.
# 0) Prep
python -m pip install -U "huggingface_hub" hf_xet # https://huggingface.co/docs/huggingface_hub/en/guides/cli
export HF_HOME="/fast-ssd/.cache/huggingface" # https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
# 1) CLI with tuned Xet (expect best performance on SSD)
export HF_XET_HIGH_PERFORMANCE=1 # https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
export HF_XET_NUM_CONCURRENT_RANGE_GETS=24
hf download <repo> --include "<bigfile>" --local-dir "<dir>"
# 2) Same CLI, but Xet off (if #1 was spiky)
export HF_HUB_DISABLE_XET=1 # set BEFORE running HF code
hf download <repo> --include "<bigfile>" --local-dir "<dir>"
# 3) Multi-connection HTTP (expect steady speeds if single-stream path is weak)
aria2c -x16 -s16 -j4 -c "<direct ?download=true URL>" # https://dev.to/susumuota/faster-and-more-reliable-hugging-face-downloads-using-aria2-and-gnu-parallel-4f2b
Interpretation:
• #1 fast and stable → keep Xet high-perf + SSD.
• #1 spiky but #2 steady → Xet/route interaction on this box; run with Xet disabled here. (GitHub)
• Both slow but aria2c fast → single-stream/POP issue; keep CLI for convenience but prefer multi-connection pulls for huge files. (DEV Community)
• All slow → switch region (VPN/off-VPN) and retry. (Hugging Face Forums)
Common pitfalls (quick checks)
- Downloading whole repos by accident. Use --include to pull only the large file(s) you need. (Hugging Face)
- Tiny default timeouts. Raise HF_HUB_DOWNLOAD_TIMEOUT to reduce intermittent read timeouts on cdn-lfs*.hf.co. (Hugging Face)
- HDD as destination. If you must, add HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=1. (Hugging Face)
- Expecting hf_transfer to help. It's gone in v1.x; use Xet high-perf instead. (Hugging Face)
Short, curated references
Official docs
- Environment variables (HF_HOME/HF_HUB_CACHE, timeouts, Xet knobs, disable Xet). Useful to confirm every variable above. (Hugging Face)
- CLI guide (hf download, --include, --local-dir, timeouts). Step-by-step usage examples. (Hugging Face)
- Migration note: hf_transfer removed; HF_HUB_ENABLE_HF_TRANSFER ignored; use HF_XET_HIGH_PERFORMANCE. (Hugging Face)
GitHub issues (real-world behavior)
- “HF_HUB_DISABLE_XET not disabling unless set before import” — clarifies when the toggle takes effect. (GitHub)
- “Large downloads stuck/slow with VPN” — intermittent slow/0% cases on big files with CLI. (GitHub)
Community threads/recipes
- Intermittent slow speeds, disconnects, and region changes as fixes. Background on route/POP sensitivity. (Hugging Face Forums)
- Practical aria2c recipe for multi-connection, resumable downloads. (DEV Community)