Why Hugging Face breaks into multiple network “shapes”
Unlike a single npm registry POST that mostly talks to one well-known hostname, Hugging Face orchestration fans out. The browser or the huggingface_hub Python library might call huggingface.co for API and HTML, while Git clients resolve hf.co short hosts for repositories. Large files frequently move through storage and CDN front domains that are not identical to the pretty Hub URL in your address bar. Git LFS batch APIs and standalone download URLs can therefore appear as different registrable suffixes in your Clash logs even though, from the user’s perspective, it is “one model download.”
That diversity matters because split routing is fundamentally about intent per flow. You might want a low-latency, quota-friendly path for interactive browsing while giving multi-gigabyte Git LFS traffic a different outbound that tolerates long sessions. You might also want domestic DIRECT access to a regional mirror while still using a proxy for the canonical Hub API. Clash cannot infer product semantics; it matches domains, IPs, and geolocation tags. Your job is to map what git, git-lfs, or huggingface-cli actually hit—then encode that into ordered rules that point at the right policy groups.
This guide intentionally diverges from our Cursor and npm developer proxy article: registries and editor CDNs are chatty in a different way than terabyte-scale model weights. Here the emphasis is CDN object hosts, LFS negotiation, resumable transfers, and the failure modes you see when only half the call graph is proxied correctly.
Hostnames and surfaces you should verify in logs
CDN vendors and storage backends evolve. Treat any static list—including this article—as a starting point you confirm on your own machine during a controlled download. Common families you will see in 2026-era Hub traffic include DOMAIN-SUFFIX,huggingface.co for the main site and APIs, DOMAIN-SUFFIX,hf.co for Git-related short domains, and various storage or CDN domains that appear when Git LFS fetches large blobs. Some downloads use presigned URLs that decode to cloud object storage hostnames outside the huggingface.co tree entirely; if your rules stop at a single suffix, those flows fall through to your catch-all and may pick the wrong policy group.
Practical workflow: enable verbose-enough logging in your Clash Meta (mihomo) GUI or tail your core log while running a small huggingface-cli download of a tiny shard, or run GIT_TRACE_PACKET=1 GIT_CURL_VERBOSE=1 git lfs pull for a controlled slice of traffic. Collect the set of HTTPS hosts that appear, then collapse them into DOMAIN-SUFFIX lines where safe. Prefer suffix coverage for stable registrable bases; use exact DOMAIN matches when one sibling must differ. If you mirror weights internally, add your mirror's suffix explicitly so internal fetches never wander out through a foreign exit by accident.
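The collapse step can be sketched as plain text processing. The trace excerpt below is fabricated for illustration; a real one comes from running the verbose pull described above with stderr redirected to a file:

```shell
# Fabricated sample of a verbose git-lfs/curl trace; generate a real one with:
#   GIT_CURL_VERBOSE=1 GIT_TRACE_PACKET=1 git lfs pull 2> lfs-trace.log
cat > lfs-trace.log <<'EOF'
* Connected to huggingface.co port 443
> Host: huggingface.co
* Connected to cdn-lfs.hf.co port 443
> Host: cdn-lfs.hf.co
> Host: cdn-lfs.hf.co
EOF

# Collapse the observed Host headers into a unique candidate list for rules
grep -oE 'Host: [^ ]+' lfs-trace.log | awk '{print $2}' | sort -u
```

The sorted, deduplicated output is exactly the set of hostnames you then review for DOMAIN-SUFFIX versus exact DOMAIN coverage.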
Remember that not every stall is geopolitical routing. Disk bandwidth, antivirus scanning, Wi-Fi airtime, and Python GIL contention can all masquerade as “proxy slow.” Clash optimizes path selection, not your NVMe queue depth.
Prefer DOMAIN-SUFFIX coverage built from the hostnames you actually see. Copying mega-lists from stale gists often misses the one CDN your region hits hardest—the exact failure we are trying to avoid.
Policy groups: give LFS, CDN blobs, and the Hub different lanes
Policy groups are the named outbounds your rules target. A pragmatic layout for Hugging Face introduces at least two lanes beyond your generic browser bucket: one group optimized for long, high-throughput downloads—call it HF_LFS—and another for interactive Hub UI and REST calls—call it HF_WEB. Some teams add a third HF_CDN when logs show large object traffic routinely detaches to a different suffix family than the LFS batch API hosts. The point is not ornamental YAML; it is operational clarity when midnight training runs fail and you need to swap only the fat-pipe exit without touching unrelated tabs.
Outer select groups make human overrides easy; inner url-test groups help when you want automatic selection among similar nodes. For huge sequential downloads, flapping automatic picks can reset partially fetched objects and amplify retries. When you observe that pattern, pin a stable node inside HF_LFS temporarily to isolate whether instability comes from the auto selector or from upstream throttling. The structural patterns mirror our YAML policy groups guide—names must stay honest six months later.
```yaml
proxy-groups:
  - name: "HF_WEB"
    type: select
    proxies:
      - "Auto-US"
      - "Direct"
  - name: "HF_LFS"
    type: select
    proxies:
      - "Large-US"
      - "Direct"
  - name: "Large-US"
    type: url-test
    proxies:
      - "node-us-1"
      - "node-us-2"
    url: "https://www.gstatic.com/generate_204"
    interval: 300
```
The sketch is illustrative—your subscription’s node names and probe URL should follow operator guidance. The structural point is to give model weights a throughput-oriented lane you can reason about independently from quick Hub JSON calls.
Rules: ordering, catch-alls, and storage exceptions
Clash walks rules from top to bottom until one matches. Put narrower, intent-specific lines before broad GEOIP or final MATCH entries. If a presigned object hostname is missing, that flow will slide into whatever giant bucket sits below—often defeating the purpose of your carefully named HF_LFS group. When you adopt Rule Providers for community lists, keep Hugging Face–critical suffixes in a short, reviewable block you control; remote lists should not silently override a storage domain you rely on for training.
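One way to keep that reviewable block under your control while still consuming a community list is to place your inline suffixes above the RULE-SET reference. The provider name and URL below are placeholders, and the field set follows common mihomo provider syntax; confirm against your core's documentation:

```yaml
rule-providers:
  community:
    type: http
    behavior: domain
    url: "https://example.com/rules/community.yaml"  # placeholder URL
    interval: 86400

rules:
  # Reviewable HF-critical block stays inline and matches first
  - DOMAIN-SUFFIX,huggingface.co,HF_WEB
  - DOMAIN-SUFFIX,hf.co,HF_WEB
  # The remote list only catches what the inline block did not claim
  - RULE-SET,community,PROXY
```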
A minimal skeleton might route confirmed Hub and Git short domains to HF_WEB, while sending known large-object storage suffixes you verified in logs to HF_LFS. Exact policy names must match proxy-groups verbatim.
```yaml
rules:
  # Put specific blob/CDN suffixes before broad hf.co catch-alls
  - DOMAIN-SUFFIX,cdn-lfs.huggingface.co,HF_LFS
  - DOMAIN-SUFFIX,cdn-lfs.hf.co,HF_LFS
  - DOMAIN-SUFFIX,xethub.hf.co,HF_LFS
  - DOMAIN-SUFFIX,huggingface.co,HF_WEB
  - DOMAIN-SUFFIX,hf.co,HF_WEB
  # Presigned third-party URLs: add exact hosts from logs—avoid ultra-broad STORAGE rules
  # ... domestic DIRECT / LAN bypass ...
  - MATCH,PROXY
```
Regional LFS edges sometimes appear as more specific names such as cdn-lfs-us-1.hf.co or cdn-lfs-eu-1.hf.co; a parent suffix like cdn-lfs.hf.co may or may not cover every variant your resolver returns—verify once per network. Xet-related traffic may show under xethub.hf.co subdomains for bridge and transfer roles. Avoid keyword rules that are broader than you intend; they are easy to overfit and can pull unrelated traffic into HF_LFS. When two rules could match, the earlier one wins—reordering is a legitimate fix, not a hack.
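A hedged sketch of that pattern: exact DOMAIN lines for the regional edges you have actually observed in logs, kept above the parent suffix. The edge names here are examples from the paragraph above, not a guaranteed-current list:

```yaml
rules:
  # DOMAIN-SUFFIX,cdn-lfs.hf.co matches cdn-lfs.hf.co and its subdomains,
  # not sibling names like cdn-lfs-us-1.hf.co -- siblings need their own lines
  - DOMAIN,cdn-lfs-us-1.hf.co,HF_LFS
  - DOMAIN,cdn-lfs-eu-1.hf.co,HF_LFS
  - DOMAIN-SUFFIX,cdn-lfs.hf.co,HF_LFS
```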
Timeouts, retries, and how to correlate logs with rules
Large downloads retry at multiple layers: curl inside Git, Git LFS itself, Python’s HTTP stack in huggingface_hub, and sometimes custom accelerators such as hf_transfer. When a middle hop returns 403, 416, or a truncated chunk, clients may restart with different range requests—your Clash log will show new connections that are not morally “new downloads,” even though the UI percentage rewinds.
Use logs to answer a crisp question: did this TCP flow pick the policy I expected? If the hostname is correct but the outbound is wrong, an earlier rule matched—reorder or narrow the broader matcher. If the outbound is correct but throughput collapses, suspect the node, QUIC or MTU issues, or upstream fair-use throttling rather than YAML syntax. If TLS handshakes fail before bytes move, confirm system time and core freshness; an outdated Clash Meta core should not be the reason you cannot complete a modern TLS session to a storage host.
| Symptom | First checks |
|---|---|
| Model page loads; LFS hangs at 0% | Missing suffix for batch API or object host; compare Git verbose logs with active rules |
| Speed starts strong, then stair-steps down | Auto selector flapping; pin HF_LFS to one node; watch for parallel uploaders on same Wi-Fi |
| Works in browser, fails in WSL | Split env: see WSL2 and Windows Clash for mirrored networking and proxy variables |
| Random TLS or certificate errors | MITM security software, wrong system clock, or stale core—review Meta upgrade path |
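The "did this flow pick the policy I expected" check can be partially automated. The log excerpt below is fabricated and the exact line format varies across core versions and GUI clients, so treat the extraction pattern as a template to adapt, not a stable interface:

```shell
# Fabricated mihomo-style log lines; real format differs across core versions
cat > clash.log <<'EOF'
[TCP] 192.168.1.5:54321 --> cdn-lfs.hf.co:443 match DomainSuffix(cdn-lfs.hf.co) using HF_LFS
[TCP] 192.168.1.5:54322 --> huggingface.co:443 match DomainSuffix(huggingface.co) using HF_WEB
EOF

# Print "host -> chosen policy" so mismatches stand out at a glance
sed -nE 's/.*--> ([^:]+):443 .* using (.*)$/\1 -> \2/p' clash.log
```

If a host shows up paired with the wrong group name, an earlier rule matched it; that is your cue to reorder or narrow, per the table above.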
TUN versus explicit Git proxy settings
TUN mode captures eligible IP traffic transparently, which helps tools that ignore environment variables. Classic system proxy ports require explicit configuration: Git’s http.proxy and https.proxy, or lowercase http_proxy variables in shells. Mixing both half-configured layers produces “mysterious” partial success—some subprocess inherits TUN-routed packets while another follows a dead SOCKS port.
For reproducible team behavior, many shops standardize on explicit proxy variables inside CI and on developer laptops, while reserving TUN for GUI apps that cannot be wrangled otherwise. Either approach can work with Hugging Face as long as DNS mode aligns with how your rules expect to see domains at match time. Symptoms that look like “wrong proxy” are sometimes resolver interaction between fake-IP and redir-host modes; validate with a short test download before blaming the Singapore node.
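A minimal sketch of the explicit layer. Port 7890 is a common Clash default mixed port, which is an assumption here; match your own profile. Writing to a local config file first lets you review the settings before promoting them with --global:

```shell
# 127.0.0.1:7890 is a common Clash mixed-port default -- an assumption; check yours
PROXY_URL="http://127.0.0.1:7890"

# Stage Git proxy settings in a local file you can inspect before using --global
git config --file ./proxy.gitconfig http.proxy "$PROXY_URL"
git config --file ./proxy.gitconfig https.proxy "$PROXY_URL"

# Lowercase variables honored by most CLI tools (curl, pip, huggingface-cli)
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"

# Verify what Git will actually use
git config --file ./proxy.gitconfig --get http.proxy
```

Keeping these in one place per machine avoids the half-configured mix described above, where one subprocess rides TUN and another follows a dead port.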
Client-side knobs that pair with Clash lanes
Routing fixes the path; clients fix parallelism. The Hugging Face ecosystem ships fast-moving knobs—high-performance Xet settings, concurrent range-get limits, library verbosity flags, and cache directories on fast disks—documented upstream rather than duplicated here. Check the current huggingface_hub release notes for the exact variable names in your version, then combine a stable HF_LFS outbound with sensible parallelism instead of endlessly tuning probe intervals alone.
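As a hedged example, the variable names below exist in recent huggingface_hub releases but are exactly the kind of fast-moving knob this paragraph warns about; confirm them against your installed version's documentation before relying on them:

```shell
# Cache on fast local storage (this path is an illustration, not a recommendation)
export HF_HOME="$PWD/hf-cache"

# Opt into the accelerated transfer backend where your version supports it
export HF_HUB_ENABLE_HF_TRANSFER=1

# Then run a small controlled fetch through your HF_LFS lane, e.g.:
#   huggingface-cli download <repo-id> <small-file>
echo "HF_HOME=$HF_HOME, hf_transfer=$HF_HUB_ENABLE_HF_TRANSFER"
```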
For authenticated private models or higher rate limits, tokens still ride over HTTPS—your proxy must not strip headers or break WebSocket upgrades where APIs use them. Keep corporate SSL inspection in mind: some enterprises decrypt and re-sign TLS inside the LAN, which breaks in subtle ways for large streaming bodies.
DNS alignment for long CDN flows
When Clash synthesizes answers or forwards queries inconsistently with how applications resolve names, domain-based rules may not see the same strings you typed in the browser. If downloads intermittently bypass intended policy groups, capture both the queried name and the eventual SNI from logs. Our Fake-IP versus redir-host article explains the tradeoffs without repeating the entire DNS chapter here; apply the same discipline to storage hosts, not only human-readable Hub pages.
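A sketch of the fake-ip side using standard Clash Meta DNS keys; the values are illustrative, and the point is only that rule matching sees domain names when fake-ip is in effect, so align this with how your rules expect to match:

```yaml
dns:
  enable: true
  enhanced-mode: fake-ip           # rules match on the queried domain name
  fake-ip-range: 198.18.0.1/16     # conventional fake-ip pool
  fake-ip-filter:
    - "*.lan"                      # keep LAN names resolving to real addresses
  nameserver:
    - "https://1.1.1.1/dns-query"  # example DoH upstream; use your own
```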
Core headroom and staying current
Object storage endpoints negotiate modern TLS aggressively. Running a current Clash Meta (mihomo) core avoids handshake failures that masquerade as bad rules. The Meta upgrade guide walks through replacing the engine safely on desktop clients. Routing logic still lives in your YAML, but an outdated core should not be the reason you cannot complete a session to a CDN edge.
Open source and where to read more
Clash Meta moves quickly; syntax and defaults evolve. For authoritative behavior, keep upstream documentation and release notes next to your profile. The mihomo repository is the right place for deep issues and advanced examples—separate from day-to-day installer downloads, which we keep on our site for clarity.
Closing thoughts
Hugging Face is a living CDN puzzle: Git LFS negotiation, HTML and JSON from the Hub, and multi-terabyte model weights pulled from storage domains that do not always share a single cute suffix. Treat those flows as distinct intents in Clash: name policy groups honestly, keep storage exceptions ahead of blunt catch-alls, and validate behavior with logs rather than folklore. Compared with generic “proxy everything,” disciplined split routing ages well—when a vendor adds a new object host, you adjust a short block instead of guessing which mega-list swallowed your traffic.
Unlike the npm-and-editor story, the dominant risk here is not localhost debugging loops but missed object-host suffixes and unstable auto selectors during hour-long transfers. Fix those first; then tune clients for parallelism. That pairing is what turns fragile overnight downloads into boring infrastructure.
For match order and providers in depth, continue with the YAML routing guide; for developer registry patterns that complement this piece, see Cursor and npm split routing. Browse the full tech column for more.