POOL_FAILOVER_LOOP Warning

Pool Failover Instant Fallback Loop Between Primary/Backup

A Stratum miner with two configured pools (primary + backup) cycles between them continuously when both are intermittent: primary fails, miner switches to backup, backup fails, miner switches back to primary, repeat. Aggressive failback-on-recovery defaults turn momentary outages into chronic flapping. Hashrate drops 5-15%, rejected shares spike, and neither pool sees a sustained connection. The structural fix is a tertiary pool, weight-based routing, and a 600-second failback cooldown.

Warning — Should be addressed soon

Affected Models: All ASIC miners (Antminer S9 / S17 / S19 / S21 series, Whatsminer M30S / M50S / M60S, Avalon 1166 / 1246 / 1346 / 1566, Innosilicon T2T / A10) and all open-source miners (Bitaxe Supra / Ultra / Gamma / GT / Hex, NerdMiner, NerdAxe, NerdQAxe, NerdNOS, PiAxe, StealthMiner) running any firmware that implements Stratum priority-based failover with default failback policy.

Quick Answer

Symptoms

Miner dashboard or `cgminer -api stats` shows rapid alternation between configured pools — connection bouncing every 30 seconds to 5 minutes
Pool-side dashboards on both configured pools show the worker connecting and disconnecting in a tight loop, never sustaining 15+ minutes
Realized hashrate is 5-15% below nameplate despite chips reporting healthy temperatures and HW% under 2%
Stratum log lines like `Pool 0 alive`, `Pool 0 not responding`, `Pool 1 alive`, `Pool 1 not responding` repeating in cycles
Rejected share rate spikes during the seconds following each pool switch (stale work submitted to the new pool from previous pool's job context)
`mining.subscribe` and `mining.authorize` log lines appear repeatedly without the matching long quiet window of healthy hashing
`Detected new block` notifications come through out of order relative to wall-clock — block hashes alternating between two pools' views
On Bitaxe / NerdMiner / open-source firmware: `stratumURL` field flips between the two configured URLs every API refresh
Backup pool, configured "for emergencies" and not expected to take real traffic, now shows the same hashrate share as primary
Both pools report normal status independently when checked from a different connection (pools aren't down — your path to them is intermittent)
Total payout this period is materially lower than your hashrate-weighted expectation, with shares scattered across both pool accounts
Switching the miner to a third unconfigured pool exclusively eliminates the symptoms — confirms failover-policy loop, not a pool-side problem
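The log pattern in the symptoms above can be checked mechanically. The following is a minimal sketch, assuming your firmware writes status lines in the `Pool N alive` / `Pool N not responding` form quoted above; the function name `summarize_pool_events` is hypothetical, and real log formats vary by firmware, so the regular expression may need adjusting.

```python
import re
from collections import Counter

# Matches the status lines quoted in the symptoms list (an assumed format).
POOL_EVENT = re.compile(r"Pool (\d+) (alive|not responding)")

def summarize_pool_events(log_lines):
    """Count 'not responding' events per pool and switches of the live pool."""
    failures = Counter()
    switches = 0
    active = None
    for line in log_lines:
        match = POOL_EVENT.search(line)
        if not match:
            continue
        pool, state = int(match.group(1)), match.group(2)
        if state == "not responding":
            failures[pool] += 1
        elif pool != active:
            # An "alive" line for a different pool means the miner moved.
            if active is not None:
                switches += 1
            active = pool
    return {"failures": dict(failures), "switches": switches}

sample = [
    "Pool 0 alive",
    "Pool 0 not responding",
    "Pool 1 alive",
    "Pool 1 not responding",
    "Pool 0 alive",
]
print(summarize_pool_events(sample))
```

A healthy miner should show zero or very few switches over a 30-minute capture; a switch count that grows every few minutes points at the loop described here.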

Step-by-Step Fix

Open the miner's pool configuration UI and add a tertiary pool. Most firmware exposes pool1 / pool2 / pool3 (some up to 8). Pick a third pool with a different operator and a different network path — Ocean, F2Pool, ViaBTC, public-pool.io, solo.ckpool.org, or your self-hosted node, depending on your strategy. The third pool gives the failover state machine somewhere to actually rest. Save, reboot, observe 30 minutes — the loop should immediately quiet down because the miner now has a stable target when both original pools are flapping.

Verify pool order matches your strategy. Pool 1 = your preferred (highest weight, lowest latency, best fee). Pool 2 = same payout model as pool 1 (don't mix solo with FPPS unless intentional). Pool 3 = catch-all you'd be okay running on for a week if both top pools went down. Reboot the miner. Confirm the dashboard shows pool 1 as the active connection within 60 seconds of boot.

Hard power-cycle the miner — 30 seconds off at the breaker, then back on. Clears any wedged Stratum task state from the previous loop and confirms the new config takes effect cleanly. Watch the logs for the first 5 minutes after boot — you want to see one clean `mining.subscribe` succeed, then long quiet runs of `mining.notify`, not a rapid sequence of subscribe / authorize / disconnect / reconnect.

Verify each configured pool is reachable. From a laptop on the same network: `nc -zv pool-host 3333` for each (substitute correct port). All three should return `succeeded`. If one fails, that pool is wrong for your network path — replace with a different operator or different region. Run the test three times over 15 minutes to catch intermittent pools.
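The repeated reachability test can also be scripted. This is a rough Python equivalent of running `nc -zv` three times over 15 minutes; the function names and the 450-second pause are assumptions chosen so that three rounds span roughly 15 minutes, and the host names in the usage note are placeholders.

```python
import socket
import time

def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def probe_repeatedly(targets, rounds=3, pause_s=450.0, sleep=time.sleep):
    """Probe every (host, port) target `rounds` times; return successes per target."""
    results = {target: 0 for target in targets}
    for i in range(rounds):
        for target in targets:
            if probe(*target):
                results[target] += 1
        # Wait between rounds, but not after the last one.
        if i < rounds - 1:
            sleep(pause_s)
    return results
```

Calling `probe_repeatedly([("pool-host", 3333)])` with your real pool host and port returns the number of successful rounds per target. Note that an open TCP port only shows the path is up; it does not prove the pool accepts your worker credentials.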

Disable any unused pool slots. Some firmware retains stale configurations from previous testing. If pool4 through pool8 have leftover URLs, blank them out completely. This ensures the failover state machine isn't probing dead URLs and adding noise to its decision-making.

Set a failback cooldown / minimum dwell time on firmware that exposes it. On DCENT_OS, Braiins OS+, LuxOS, Vnish, NerdNOS: look for `failback_delay`, `min_dwell_time`, or `pool_settle_time`. Set to 600 seconds (10 minutes). Once the miner is on a fallback pool, it will not fail back to a higher-priority pool until that pool has been continuously available for 10 minutes. This is the single most effective software fix for the loop. Stock Bitmain / MicroBT firmware doesn't expose this; see Tier 3 for the firmware flash.
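To make the cooldown rule concrete, here is a minimal sketch of priority failover with a failback delay. It is an illustration of the policy described above, not the implementation used by any of the named firmware projects; the class name `FailoverPolicy` and its methods are hypothetical.

```python
class FailoverPolicy:
    """Priority failover with a failback cooldown (lower index = higher priority)."""

    def __init__(self, pool_count, failback_delay_s=600):
        self.failback_delay_s = failback_delay_s
        self.active = 0
        # Time since which each pool has been continuously up (None = down).
        self.up_since = [0.0] * pool_count

    def report(self, pool, is_up, now):
        """Record a health observation for one pool at time `now` (seconds)."""
        if not is_up:
            self.up_since[pool] = None
        elif self.up_since[pool] is None:
            self.up_since[pool] = now

    def choose(self, now):
        """Return the pool index to use at time `now`."""
        if self.up_since[self.active] is None:
            # Active pool is down: fail over immediately to the best live pool.
            live = [i for i, t in enumerate(self.up_since) if t is not None]
            if live:
                self.active = live[0]
            return self.active
        # Active pool is healthy: fail back only to a higher-priority pool
        # that has stayed up for the whole cooldown.
        for i in range(self.active):
            since = self.up_since[i]
            if since is not None and now - since >= self.failback_delay_s:
                self.active = i
                break
        return self.active
```

The key design choice is that any new outage on the higher-priority pool resets its `up_since` timestamp, so a pool that blips every few minutes never satisfies the 600-second window and the miner stays put on the fallback.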

Enable TCP-level keepalives. Stratum has no protocol-level keepalive, so client-side `SO_KEEPALIVE` is the only defence against silent NAT eviction. On firmware that exposes it, set `tcp_keepalive_idle = 60`, `tcp_keepalive_interval = 30`, `tcp_keepalive_count = 5`. The miner now sends a TCP-level probe after 60 seconds of idle, keeping the NAT entry warm, and then re-probes every 30 seconds if there is no reply. A dead connection is declared after 5 unanswered probes, roughly 210 seconds after the last traffic (60 s idle + 5 × 30 s), instead of 300+.
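For anyone writing their own Stratum client or proxy, the same three timers can be set on a socket directly. This is a sketch assuming a Linux-like platform; option names differ by operating system, so the code only applies the options that Python's `socket` module actually exposes. The helper names are hypothetical. As a rough model, the time to detect a dead peer is idle + interval × count.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=30, count=5):
    """Turn on TCP keepalive and apply the timers where the platform exposes them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Option names differ by OS: Linux uses TCP_KEEPIDLE, macOS uses TCP_KEEPALIVE.
    idle_opt = getattr(socket, "TCP_KEEPIDLE", None) or getattr(socket, "TCP_KEEPALIVE", None)
    if idle_opt is not None:
        sock.setsockopt(socket.IPPROTO_TCP, idle_opt, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

def worst_case_detection_s(idle_s=60, interval_s=30, count=5):
    """Seconds from last traffic until the kernel declares the peer dead."""
    return idle_s + interval_s * count
```

With the values from this step, `worst_case_detection_s()` evaluates to 210 seconds. Shortening the interval or the probe count brings detection time down at the cost of more false positives on a lossy link.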

Tune router NAT table. Web UI of router → Advanced → NAT or Connection Tracking. Increase `tcp_established` timeout from default (often 60-600 s) to 7200 s. Increase max NAT entries if your router exposes the setting. If multiple miners share the router, check connection-tracking table size against actual count of miner sessions × 3 pools each. ISP-supplied gateways are often the bottleneck — consider a Mikrotik or OPNsense replacement (see Tier 3).

Switch to weight-based load-balance routing instead of strict priority. cgminer: `--load-balance` plus per-pool `--quota` settings. Braiins OS+ / DCENT_OS / LuxOS: `balance` mode in pool config, with weights (e.g. 70/20/10 for primary/backup/tertiary). This eliminates the loop structurally — every pool is always in use, no failover transition. Trade-off: variance increases slightly, especially if pool 3 has a different fee/payout model. Right answer for chronic-flap environments where uptime matters more than purity.
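To illustrate what weight-based routing does, the sketch below distributes work units across pools with smooth weighted round-robin. This is a generic scheduling technique chosen for illustration; it is not a description of how cgminer's `--load-balance` or any firmware's `balance` mode is implemented internally. The pool names and the function name are hypothetical.

```python
def weighted_schedule(weights, n):
    """Return n pool names chosen by smooth weighted round-robin."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every pool earns credit in proportion to its weight...
        for name, w in weights.items():
            current[name] += w
        # ...and the pool with the most credit is used and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

weights = {"primary": 70, "backup": 20, "tertiary": 10}
print(weighted_schedule(weights, 10))
```

Over any 10 consecutive picks with 70/20/10 weights, the primary pool is used 7 times, the backup twice, and the tertiary once, interleaved rather than in bursts. Because every pool is always carrying some work, there is no "recovered, switch back" event to trigger a loop.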

Verify per-pool worker credentials. If your pool requires worker-specific authorization, ensure each of your configured pools has the right worker name (e.g. `wallet.worker1` for pool 1, possibly different format for pool 3). A failed `mining.authorize` looks identical to a failed `mining.subscribe` in some firmware logs and contributes to false-positive failovers. Verify by tail-watching the log for one full pool transition cycle.
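When the firmware log does not distinguish the two failures, looking at the raw Stratum V1 messages does. The sketch below builds the two handshake requests and classifies a pool's reply by request id. It assumes the common Stratum V1 JSON-RPC shape (newline-delimited JSON with `id`, `result`, and `error` fields); the fixed ids 1 and 2, the agent string, and the function names are assumptions made for illustration.

```python
import json

def build_handshake(worker, password="x", agent="example-miner/0.1"):
    """Return the subscribe and authorize requests as newline-delimited JSON."""
    subscribe = {"id": 1, "method": "mining.subscribe", "params": [agent]}
    authorize = {"id": 2, "method": "mining.authorize", "params": [worker, password]}
    return [json.dumps(subscribe) + "\n", json.dumps(authorize) + "\n"]

def classify_response(line):
    """Name which handshake step a pool reply belongs to and whether it passed."""
    msg = json.loads(line)
    # Replies echo the request id; server notifications carry a null id.
    step = {1: "subscribe", 2: "authorize"}.get(msg.get("id"), "other")
    ok = msg.get("error") is None and bool(msg.get("result"))
    return step, ok
```

A reply classified as `("authorize", False)` means the TCP path and subscription worked and only the worker name or password is wrong for that pool, which is a credentials fix rather than a network or failover problem.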

Antminer only: flash DCENT_OS for full failover-policy controls. Stock Bitmain firmware gives you exactly one failover knob (pool order). DCENT_OS — D-Central's open-source Antminer firmware — exposes the full set: failback cooldown, minimum dwell time, weight-based load balancing, TCP keepalives, per-pool keep-warm probing. Built by Mining Hackers, fully open-source, no licensing. Flash, configure 3 pools with weights 60/30/10 and a 10-minute failback cooldown, reboot, observe 30 minutes. Loop should be gone. Alternatives: Braiins OS+, LuxOS, Vnish — all expose the same controls. Stock Bitmain / MicroBT does not.

Whatsminer / Avalon: use vendor-tool failover settings where they exist. MicroBT btminer firmware exposes a partial set of failover knobs through the BTMiner app and `btminer.conf`. Avalon AvalonMiner has a similar but more limited config. Both are less flexible than DCENT_OS / Braiins OS+ / LuxOS — but those firmware projects don't currently support Whatsminer / Avalon hardware (DCENT_OS Whatsminer/Avalon support is on D-Central's roadmap, not shipping today). Use what's there: set `pool_keepalive=true` and `pool_failover_minimum_dwell=600` if exposed.

Bitaxe / NerdMiner / open-source: read the upstream issue tracker. ESP-Miner #1618 on GitHub is the canonical thread for failback policy on Bitaxe. NerdNOS handles failover differently and is generally more configurable. If you're seeing a failover loop (not stickiness — see the related Bitaxe Fallback error) on a Bitaxe specifically, your two pools are flapping faster than ESP-Miner's settle window — the structural fix is to add a tertiary or to switch to NerdNOS firmware where supported on your hardware variant.

Run a watchtower script. A small script on a Raspberry Pi, Home Assistant, or any always-on machine: every 60 s, poll each miner's `/api/system/info` (or cgminer API) and snapshot `current_pool`. If the last 5 snapshots show 3 or more pool switches, trigger a `POST /api/system/restart` and log an alert. Cap restart attempts at 3 per hour to prevent restart-loops on top of failover-loops. This is reactive, not preventative, but it catches loops you didn't predict and provides the data for tuning your structural fix.
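The decision logic of such a watchtower can be sketched as follows. This sketch counts switches between consecutive snapshots, because a miner with two or three configured pools cannot report four distinct ones; the threshold of 3 switches is an assumption to tune. The class and function names are hypothetical, and the `current_pool` field name is taken from the step above and may differ on your firmware (the symptoms list mentions `stratumURL` on Bitaxe-style firmware).

```python
import json
import urllib.request
from collections import deque

class Watchtower:
    """Track the active pool of one miner and decide when a restart is warranted."""

    def __init__(self, window=5, min_switches=3, max_restarts_per_hour=3):
        self.snapshots = deque(maxlen=window)
        self.min_switches = min_switches
        self.max_restarts = max_restarts_per_hour
        self.restart_times = []

    def switches(self):
        """Number of times the active pool changed between consecutive snapshots."""
        snaps = list(self.snapshots)
        return sum(1 for a, b in zip(snaps, snaps[1:]) if a != b)

    def observe(self, pool, now):
        """Record a snapshot; return True if the caller should restart the miner."""
        self.snapshots.append(pool)
        if len(self.snapshots) < self.snapshots.maxlen:
            return False
        if self.switches() < self.min_switches:
            return False
        # Enforce the restart cap over a sliding one-hour window.
        self.restart_times = [t for t in self.restart_times if now - t < 3600]
        if len(self.restart_times) >= self.max_restarts:
            return False
        self.restart_times.append(now)
        self.snapshots.clear()
        return True

def fetch_active_pool(base_url, field="current_pool"):
    """Read the active-pool field from the miner's info endpoint (assumed JSON)."""
    with urllib.request.urlopen(f"{base_url}/api/system/info", timeout=10) as resp:
        return json.load(resp).get(field)

def restart_miner(base_url):
    """Ask the miner to restart via its HTTP API."""
    req = urllib.request.Request(f"{base_url}/api/system/restart", data=b"", method="POST")
    urllib.request.urlopen(req, timeout=10).close()
```

A driver loop would call `fetch_active_pool` every 60 seconds, pass the value and `time.time()` to `observe`, and call `restart_miner` plus your alerting hook whenever `observe` returns True. Keeping the decision logic separate from the HTTP calls makes it easy to replay recorded snapshots when tuning the thresholds.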

Run a stratum proxy in front of all miners (fleet-scale). Braiins Farm Proxy, ckpool's `ckproxy`, or stratum-proxy let you configure failover policy centrally — your miners point to one proxy, the proxy handles all upstream failover with whatever logic you write. Trade-off: extra moving part, single point of failure if the proxy itself goes down. For fleets of 5+ miners this is the right architecture; for a single home miner, Tier 1 + Tier 2 alone is enough. Run the proxy on a small Pi 4 or thin-client mini-PC; configure 3 upstream pools with weights.

When to stop DIY: Tier 1 + Tier 2 + Tier 3 deployed correctly and the loop persists, AND you've proven via single-pool testing that each pool sustains independently. At this point you have a network-path issue (ISP, DNS, NAT) beyond pool config. Open a D-Central support ticket with: 30-minute log capture from the affected miner(s), NAT table size of your router, router model, ISP name, list of pools tested, single-pool test results. We'll triage the upstream cause.

Pool-strategy consultation: if the failover loop revealed your pool selection doesn't match your mining strategy (e.g. solo-to-FPPS mixed without intent), book a Mining Consulting session. We'll structure pool selection, weights, and failover policy against your actual revenue and variance goals — Bitcoin maximalist, Canadian-power-cost-aware, no corporate filler. Typical engagement: 1-2 hours, $150-$400 CAD, output is a written pool-strategy spec you can hand to whoever runs your fleet.

Firmware flash service: if you want DCENT_OS deployed on an Antminer fleet but don't want to flash yourself, D-Central offers firmware flash service alongside ASIC Repair. Drop the controllers off (or ship them in for fleet operators), pick them up flashed and configured. Typical turnaround 5-10 business days. Pricing per controller — check the ASIC Repair service page or contact us for fleet quotes.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Frequently Asked Questions

What does the POOL_FAILOVER_LOOP error mean?

A Stratum miner with two configured pools (primary + backup) cycles between them continuously when both are intermittent: primary fails, miner switches to backup, backup fails, miner switches back to primary, repeat. Aggressive failback-on-recovery defaults turn momentary outages into chronic flapping. Hashrate drops 5-15%, rejected shares spike, and neither pool sees a sustained connection. The structural fix is a tertiary pool, weight-based routing, and a 600-second failback cooldown. Commonly reported on: All ASIC miners (Antminer S9 / S17 / S19 / S21 series, Whatsminer M30S / M50S / M60S, Avalon 1166 / 1246 / 1346 / 1566, Innosilicon T2T / A10) and all open-source miners (Bitaxe Supra / Ultra / Gamma / GT / Hex, NerdMiner, NerdAxe, NerdQAxe, NerdNOS, PiAxe, StealthMiner) running any firmware that implements Stratum priority-based failover with default failback policy.

Can I fix the POOL_FAILOVER_LOOP error myself?

This is a moderate fix that is mostly configuration work rather than board-level repair. Start by opening the miner's pool configuration UI and adding a tertiary pool, then work through the remaining steps above in order. If you are not comfortable making these changes, D-Central can diagnose it at our Laval bench.

How much does it cost to repair?

Many cases are a free or low-cost DIY fix; a full bench repair runs up to $450 CAD. D-Central can diagnose and quote it.

Key Terms in This Fault

Jump to the full definition of the technical terms involved in this fault:

Hashrate, PSU, Firmware, Nonce, Mining pool

Own your firmware — DCENT_OS (Antminer first)

DCENT_OS is D-Central’s open-source, GPL-3.0 firmware effort, now in public beta on Antminer (SHA-256) hardware — signed S9 and S19j Pro (Zynq/XIL) images are free to download. It is experimental and not production-ready. We build on the shoulders of the open-firmware projects that came before us, and we are starting with Antminer before widening hardware support. If you run Antminer gear, or just want firmware you can fully own and audit, grab the beta image. This is a free public beta, never a pre-order — collection only, we will not email you anything else yet.

Printable quick-reference cards

ASIC Miner Error-Code Quick-Reference Card — print-to-PDF one-pager
Stratum Share-Rejection Error Decoder Card — print-to-PDF one-pager
ASIC PSU & Connector Pinout Quick-Reference Card — print-to-PDF one-pager

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.

Related Error Codes

ASIC Miner - Mining Pool Authentication Failed

ASIC Miner - Frequent Pool Disconnections

Antminer - Stratum V2 Connection Issue

Antminer S19 - Cannot Connect to Pool

Antminer S19 - High Rejected Share Rate

Bitaxe - Pool Rejected Shares High Rate

ASIC Miner - DNS Resolution Failed

ASIC Miner - VPN Interfering with Pool Connection
