Whatsminer M50S++ – Firmware Boot Hang On Splash
Warning — Should be addressed soon
Symptoms
- Web UI at `http://<miner-ip>` is unreachable or shows a stale dashboard frozen on a timestamp 5+ minutes old
- WhatsminerTool reports the miner as `Offline` or `Timeout` even though the miner pings and SSH responds
- BTMiner API calls (`get_miner_status`, `get_error_code`) hang with no response or fail with `connection refused` on port `4028`
- Hashrate on the pool side flatlines to zero within 1-3 stratum windows after the freeze
- `dmesg` shows entries like `BUG: scheduling while atomic`, `Kernel panic - not syncing`, or `Out of memory: Killed process … btminer`
- `top` shows `btminer` RSS climbing steadily over days/weeks — starts ~150 MB, climbs to 350-500+ MB before hang
- `/var/log/messages` or `/var/log/btminer.log` ends mid-line with no graceful shutdown entry
- The kernel watchdog threshold in `/proc/sys/kernel/watchdog_thresh` is at its default of `60` s, yet the system never reboots after the hang
- After a manual hard power-cycle the miner boots clean and runs at nameplate for hours-to-days before hanging again
- Pattern: hangs cluster at the same time of day, after the same elapsed runtime, or after specific stratum events
- Control-board LED stuck solid green (kernel up, BTMiner dead), slow amber (watchdog tripped but reboot stalled), or off (full lockup)
- Fans either hold at the last commanded RPM indefinitely or ramp to 100% (control loop crashed open); they do not drop to idle
Step-by-Step Fix
Hard power-cycle at the breaker for 60 seconds. Do not use the soft reset in WhatsminerTool; that command is handled by the same daemon that is hung. Cutting power at the breaker forces a full cold boot of the control board, which clears DDR memory, releases all kernel resources, and gives BTMiner a clean slate. Wait for full power discharge (control-board status LED dark for at least 20 seconds) before re-energizing. Confirm hashrate returns to nameplate within 7 minutes.
Schedule a cron reboot at low-revenue hours. SSH to the miner and run `crontab -e`. Add the line `30 3 * * * /sbin/reboot` and save. This reboots the miner at 03:30 local time daily, costing roughly 5-6 minutes of hashing per day, or under 0.5% of daily revenue (6 of 1,440 minutes is about 0.42%). On stock builds older than `20240315`, this single workaround prevents most hang events. Verify with `crontab -l`, then check `/var/log/messages` after the first scheduled reboot to confirm it ran cleanly.
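If you roll this workaround out to several miners by script, it helps to make the crontab edit idempotent so that re-running it never stacks duplicate reboot lines. The sketch below uses hypothetical helper names (`ensure_cron_line`, `downtime_fraction`; neither is part of any MicroBT tooling): it takes the current crontab text, for example the output of `crontab -l`, and returns it with the 03:30 reboot entry present exactly once. It also reproduces the downtime arithmetic behind the revenue estimate in this step.

```python
REBOOT_LINE = "30 3 * * * /sbin/reboot"  # the entry recommended in this step
MINUTES_PER_DAY = 24 * 60


def ensure_cron_line(crontab_text: str, line: str = REBOOT_LINE) -> str:
    """Return crontab text that contains `line` exactly once.

    Blank lines are dropped, any existing copy of `line` (compared with
    whitespace normalised) is removed and the entry is re-appended at the end.
    A trailing newline is always added, since some cron implementations
    ignore a final line that lacks one.
    """
    lines = [l for l in crontab_text.splitlines() if l.strip()]
    normalised = " ".join(line.split())
    kept = [l for l in lines if " ".join(l.split()) != normalised]
    kept.append(line)
    return "\n".join(kept) + "\n"


def downtime_fraction(minutes_down_per_day: float) -> float:
    """Fraction of each day spent rebooting instead of hashing."""
    return minutes_down_per_day / MINUTES_PER_DAY
```

One way to use it: save `crontab -l` to a file, run the helper over that text, write the result to a new file and install it with `crontab <file>`. Running the whole process twice leaves the crontab unchanged.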
Confirm BTMiner is the latest build for your hardware revision. Web UI → System → Version → record `fwver` and `hwver`. Cross-reference them against MicroBT's firmware matrix on the WhatsminerTool download page. If you are more than two trains behind, schedule an update for after your next clean reboot; never update a miner while it is hung. Flashing a build meant for a different hardware revision will brick the control board's flash.
Disable any aggressive overclock or custom power profile. Web UI → Mining → Power Mode → set to `Normal` (not `High` or `Performance+`). Aggressive profiles increase PSU communication frequency and chip-side I2C traffic, both of which provoke the rare-but-real driver deadlock paths. Run at stock for 7 days; if the hangs stop, your tuning was provoking the bug. Tune back up slowly with a 24-hour stability window between steps.
Verify ambient and PSU rail under full load. IR thermometer at the intake grille (target ≤ 30 °C); multimeter on the PSU output rail under full hash (target ≥ 13.8 V sustained on a stock M50S++). Thermal stress and PSU sag do not cause the hang directly, but they accelerate the conditions (chip retries, I2C error storms) that trigger it.
Capture a hang in real time. Set up a watcher from your laptop running every 60 seconds: poll `get_miner_status` on port 4028 with a 5-second timeout and log any failure with a timestamp. When the watcher logs a hang, immediately SSH (from a persistent screen/tmux session) and capture `dmesg | tail -100`, `top -b -n 1 | head -20`, `cat /proc/meminfo | head`, `ps aux | grep btminer`, `cat /var/log/messages | tail -200`. Save everything before the watchdog reboots — the data is gone after.
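The watcher in this step can be sketched in Python as follows. This is a minimal illustration, not MicroBT tooling, and it rests on an assumption you should check against the API documentation for your firmware: that the BTMiner API accepts a plain JSON request such as `{"cmd": "get_miner_status"}` over TCP on port 4028 and replies with a JSON body before closing the connection (some builds may require authentication or use a different command set). Any refused connection, timeout, or unparseable reply counts as a hang and is logged with a UTC timestamp.

```python
import json
import socket
import time
from datetime import datetime, timezone

API_PORT = 4028        # BTMiner API port named in this step
POLL_TIMEOUT_S = 5.0   # per-poll timeout from this step
POLL_INTERVAL_S = 60   # seconds between polls from this step


def poll_status(host: str, port: int = API_PORT, timeout: float = POLL_TIMEOUT_S) -> bool:
    """Return True if the API sent back parseable JSON, False otherwise.

    ASSUMPTION: request format {"cmd": "get_miner_status"} and a JSON reply
    that ends when the miner closes the socket.
    """
    request = json.dumps({"cmd": "get_miner_status"}).encode()
    try:
        # create_connection applies `timeout` to connect() and to later recv() calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection: reply complete
                    break
                chunks.append(data)
        json.loads(b"".join(chunks))
        return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers empty, non-UTF-8, or malformed JSON replies.
        return False


def watch(host: str, log_path: str = "miner_watch.log") -> None:
    """Poll forever and append one timestamped line per failed poll.

    `log_path` is a hypothetical file name; point it anywhere persistent.
    """
    while True:
        if not poll_status(host):
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"{stamp} {host} no API response on port {API_PORT}\n")
        time.sleep(POLL_INTERVAL_S)
```

Run `watch()` with your miner's IP inside tmux or screen on the laptop so that it survives disconnects. As soon as a new line appears in the log, run the capture commands listed in this step.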
Update firmware to the latest train for your hardware revision. Download the correct `.bin` from MicroBT's firmware matrix (V1 boards → V1 train, V2 boards → V2 train). Use WhatsminerTool's batch-upgrade feature, not the legacy single-miner CGI page (the CGI path has been observed to time out leaving the miner in a half-flashed state). Verify post-upgrade: `get_version` returns the new `fwver`, `get_miner_status` returns `Mining`, hashrate returns to nameplate within 10 minutes.
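The `fwver` part of the post-upgrade check can be scripted against the decoded `get_version` reply. The exact nesting of that reply is not documented in this guide, so the hypothetical helpers below (`find_key`, `upgrade_confirmed`) search the JSON recursively for a key named `fwver` instead of assuming a fixed path. Treat this as a sketch and tighten it once you have seen a real reply from your firmware.

```python
from typing import Any, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search of nested dicts/lists for the first value under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def upgrade_confirmed(version_reply: dict, expected_fwver: str) -> bool:
    """True if the reply reports exactly the firmware version that was flashed."""
    reported = find_key(version_reply, "fwver")
    return reported is not None and str(reported).strip() == expected_fwver.strip()
```

This only covers the version string. The `Mining` status and the return to nameplate hashrate within 10 minutes still need their own checks.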
Lower watchdog timeout for faster auto-recovery. SSH and edit `/etc/sysctl.conf` adding the line `kernel.watchdog_thresh = 30`, then reload with `sysctl -p`. This halves the time between a hang and a watchdog-triggered reboot, cutting dead-hash-time accordingly. Does not fix the hang, but reduces revenue impact while you work on the root cause. Confirm with `cat /proc/sys/kernel/watchdog_thresh` returning `30`.
Quantify the memory leak. Run a script that records the output of `ps -o rss= -C btminer` to a CSV every 5 minutes for at least 72 hours, then plot the RSS curve. Healthy: stable at roughly `150-250 MB`. Leaking: a monotonic climb of `+5-20 MB/hour`. If RSS hits `400+ MB` on a 512 MB board or `300+ MB` on a 256 MB board, an OOM event is imminent. A confirmed leak means a firmware update is the real fix; the cron reboot is the workaround until then.
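A minimal Python sketch of the sampler and the trend test follows. Assumptions: the `ps` command runs wherever the script runs, or you pass a command that wraps it in `ssh` to the miner; `ps` reports RSS in KiB, which the script converts to MiB (the "MB" used in this guide); and BusyBox builds of `ps` may not support `-C`, in which case you would read `VmRSS` from `/proc/<pid>/status` instead. The leak rate is an ordinary least-squares slope over the whole run, so a few noisy samples do not read as a trend. The 5 MB/hour cut-off and the OOM limits come from this step.

```python
import csv
import subprocess
import time
from datetime import datetime, timezone
from typing import List, Optional, Sequence, Tuple

PS_CMD = ["ps", "-o", "rss=", "-C", "btminer"]  # command from this step
SAMPLE_INTERVAL_S = 5 * 60                       # one sample every 5 minutes
LEAK_THRESHOLD_MB_PER_H = 5.0                    # low end of the "leaking" range
OOM_LIMITS_MB = {512: 400.0, 256: 300.0}         # board RAM -> danger RSS


def read_rss_mb(cmd: Sequence[str] = PS_CMD) -> Optional[float]:
    """Total btminer RSS in MiB, or None if no process was reported."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    kib = [int(tok) for tok in result.stdout.split() if tok.isdigit()]
    return sum(kib) / 1024 if kib else None  # KiB -> MiB


def log_samples(csv_path: str, cmd: Sequence[str] = PS_CMD) -> None:
    """Append (UTC timestamp, RSS MiB) rows forever; stop after 72 h or more."""
    while True:
        rss = read_rss_mb(cmd)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(csv_path, "a", newline="") as f:
            csv.writer(f).writerow([stamp, "" if rss is None else f"{rss:.1f}"])
        time.sleep(SAMPLE_INTERVAL_S)


def leak_rate_mb_per_hour(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of RSS (MiB) against elapsed time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


def is_leaking(samples: List[Tuple[float, float]]) -> bool:
    return leak_rate_mb_per_hour(samples) >= LEAK_THRESHOLD_MB_PER_H


def near_oom(rss_mb: float, board_ram_mb: int) -> bool:
    """Apply this step's OOM thresholds for 512 MB and 256 MB boards."""
    return rss_mb >= OOM_LIMITS_MB[board_ram_mb]
```

To analyse a finished run, convert each CSV timestamp to hours since the first sample and pass the `(hours, MiB)` pairs to `leak_rate_mb_per_hour`.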
Reseat all hashboard data and power connectors. Power off at the breaker. Open the chassis (Phillips #2 + Torx T10). Pull each hashboard half-out, inspect the data-side connectors for oxidation or bent pins, reseat firmly until the click. Pay special attention to the I2C / SMBus pins — those are the ones that, when intermittent, provoke the kernel deadlock path. Same for the PSU communication harness — a wiggly connector here is a low-probability but real hang trigger.
SD-card recovery flash. If the hang correlates with a botched firmware update or you suspect flash corruption: download the official recovery image for your hardware revision, write to a microSD with `dd` or BalenaEtcher, insert into the control board's SD slot, hold the recovery button (or short the recovery jumper — varies by board rev) while powering on. The board flashes itself from SD over 5-10 minutes; status LED transitions through green-amber-green when complete. Power off, remove SD, power on. The miner should boot clean from internal flash.
Hashboard chip reseat / reflow if I2C errors point at one board. If hashboard isolation testing has narrowed the hang to a specific board, pull that board onto the bench. Inspect the BGA edge of every ASIC chip under magnification, looking for cracked solder fillets at the corner balls (the most common failure point). Marginal chips that pass continuity but have intermittent I2C connectivity are the trigger. Reflow with preheat (`150 °C` bottom) plus hot air (`310-330 °C` top, `30 s`). Apply fresh thermal paste, reassemble, reinstall, and observe for 72 hours.
Roll firmware back if the latest train introduced a regression. Rare but real — MicroBT has shipped builds that introduced hang behaviour on specific hardware revisions. If your hangs started immediately after a recent firmware update, roll back one train and observe for 7 days. Use WhatsminerTool's downgrade path (some builds have signature checks that block downgrade — if so, SD recovery is the fallback).
Fleet-side: implement reboot-on-hang automation. If you operate more than ~5 M50S++ units, build a polling daemon on your management host that runs `get_miner_status` against each miner on a 60-second loop and triggers a remote power-cycle (smart PDU API) on any miner that fails to respond for 3 consecutive polls. This catches hangs before the hardware watchdog needs to (often 5-10 minutes faster) and gives you a hang-event log per miner for trending.
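The decision logic of such a daemon fits in a small state machine, sketched below in Python. It is hypothetical throughout: `power_cycle` stands in for whatever call your smart PDU's API provides (no vendor is assumed), and `poll` is any function that returns True when a miner answered `get_miner_status` within its timeout. A fleet daemon also needs a grace period that this step does not mention. A freshly power-cycled miner takes several minutes to come back, so counting its boot-time misses would trigger a second cycle mid-boot. The default of 10 polls is an assumption based on the 7-10 minute recovery times quoted in this guide.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple

FAILS_BEFORE_CYCLE = 3  # consecutive missed polls, from this step


@dataclass
class HangTracker:
    power_cycle: Callable[[str], None]   # hypothetical smart-PDU wrapper
    threshold: int = FAILS_BEFORE_CYCLE
    grace_polls: int = 10                # ASSUMPTION: polls ignored while rebooting
    misses: Dict[str, int] = field(default_factory=dict)
    grace: Dict[str, int] = field(default_factory=dict)
    events: List[Tuple[datetime, str]] = field(default_factory=list)  # hang log

    def record(self, miner: str, responded: bool) -> bool:
        """Update state for one poll; return True if a power-cycle was fired."""
        if responded:
            self.misses[miner] = 0
            self.grace.pop(miner, None)
            return False
        if self.grace.get(miner, 0) > 0:  # still booting after a power-cycle
            self.grace[miner] -= 1
            return False
        self.misses[miner] = self.misses.get(miner, 0) + 1
        if self.misses[miner] < self.threshold:
            return False
        self.power_cycle(miner)
        self.events.append((datetime.now(timezone.utc), miner))
        self.misses[miner] = 0
        self.grace[miner] = self.grace_polls
        return True


def run_fleet(miners: Iterable[str], poll: Callable[[str], bool],
              tracker: HangTracker, interval_s: float = 60) -> None:
    """Poll every miner once per interval, forever."""
    miner_list = list(miners)
    while True:
        for miner in miner_list:
            tracker.record(miner, poll(miner))
        time.sleep(interval_s)
```

`tracker.events` is the per-miner hang log mentioned in this step. Grouping it by miner and by hour of day shows whether hangs cluster after a fixed runtime or at a fixed time.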
Inspect for PSU firmware mismatch. The M50S++ PSU (typically a P21E or P221B unit, depending on production batch) runs its own firmware. MicroBT periodically ships PSU firmware updates that fix communication-link bugs, and WhatsminerTool exposes both the PSU firmware version and the update path. If your PSU firmware is more than 18 months old, update it in the same maintenance window. A botched PSU-firmware update is itself a brick risk, so make sure the update window is uninterrupted.
Stop DIY when any of the following applies:
- The latest firmware on the correct hardware revision still hangs within 48 hours
- You have reseated all hashboards and the I2C error pattern in `dmesg` persists
- SD recovery flash succeeded but the miner re-hangs within a week
- The control board fails to boot at all after a flash attempt
- You see capacitor bulging, burnt traces, or any visible damage near the SoC, DDR, or PMIC

Book a D-Central ASIC Repair slot.
D-Central bench process: SoC-side full firmware re-flash with engineering builds, JTAG-level kernel debug if available for that board revision, capacitor and PMIC bench-test under load. Hashboard-side: fixture-test each board with the M-series test rig, isolate marginal chips by per-position load test, replace or reflow as appropriate. Burn-in: 48-hour nameplate hash test post-repair, with hang-watchdog automation running — if it hangs in the 48 hours, the repair is not shipped.
Ship safely. Pack the miner in its original M50S++ foam if you have it; otherwise double-box with at least 5 cm of foam on every side. Hashboards in anti-static bags if you ship boards only. Include a note with: firmware version and `hwver` strings, pattern of hangs (every X hours, after Y events, etc.), what you have already tried, and whether you have captured `dmesg` from a hang event (attach the log file or include it on a USB stick). The diagnostic detail you ship with the unit is directly proportional to how cheap the repair is.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
