Whatsminer M53S+ – Thermal Runaway Shutdown
Critical — Immediate action required
Symptoms
- BTMiner / WhatsMinerTool fault log shows `TEMP_OVER`, `Error 350` (hashboard high-temp protection), or `Error 275` (power overtemp warning) — sometimes all three within seconds on the M53S+ event timeline
- Hashboard temperature on `SM0` / `SM1` / `SM2` climbs above `90 °C` and keeps rising despite firmware throttle attempts — the throttle is not winning
- Coolant outlet temperature hits or exceeds `65-70 °C` while inlet stays nominal — cold plate is dumping heat the loop cannot pull away
- Coolant inlet temperature creeps above the MicroBT spec window — `20-50 °C` Normal mode, `20-40 °C` HPM, `±2 °C` accuracy required
- Inlet/outlet ΔT collapses below `~6 °C` (loop stalled) or balloons above `~18 °C` (flow good but primary-side capacity blown)
- Audible cavitation, whine, or rattle from the primary pump that was not there last week
- Hashrate cliff — chassis drops from `~210 TH/s` to under `100 TH/s` in seconds as the firmware throttles, then to `0` if `TEMP_OVER` trips
- Front-panel LED transitions from steady green → slow amber/red blink → solid red within the same minute
- BTMiner journal repeating `hashboard temp sensor over threshold`, `cooling demand exceeded`, `chip frequency forced down`, or `emergency power cut` lines
- Coolant in the sight glass looks darker, cloudier, or has visible particulate vs commissioning baseline — chemistry has degraded
- Pressure-relief valve at the manifold has vented (wet residue, salt creep, or operator-reported pop) — system pressure exceeded `4 bar`
- Just came out of summer or facility ambient has crept above `35 °C` — cooling-tower / dry-cooler approach temperature is killing loop capacity
- Chassis was recently moved or freshly commissioned, or the coolant was topped up: this strongly suggests an airlock in the cold plate rather than a true thermal failure
- Sister M53S+ or M53S++ in the same loop reports the same event simultaneously — primary-loop fault, not miner-local
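The numeric symptoms above can be checked mechanically against a single loop reading. The sketch below is a minimal illustration that assumes you already log inlet, outlet, and per-board temperatures. The thresholds are copied from the list (inlet window per mode, `65 °C` outlet, `90 °C` hashboard, `6-18 °C` ΔT band). `LoopReading`, `triage`, and the flag names are hypothetical and not part of BTMiner or WhatsMinerTool.

```python
from dataclasses import dataclass

# Thresholds taken from the symptom list; field rules of thumb, not official limits.
INLET_WINDOW_C = {"normal": (20.0, 50.0), "hpm": (20.0, 40.0)}
OUTLET_ALARM_C = 65.0
HASHBOARD_ALARM_C = 90.0
DELTA_T_LOW_C = 6.0    # below this: loop stalled
DELTA_T_HIGH_C = 18.0  # above this: primary-side capacity blown


@dataclass
class LoopReading:
    mode: str                 # "normal" or "hpm"
    inlet_c: float
    outlet_c: float
    hashboard_c: list[float]  # SM0, SM1, SM2 in order


def triage(r: LoopReading) -> list[str]:
    """Return the symptom flags tripped by one reading (hypothetical helper)."""
    flags = []
    lo, hi = INLET_WINDOW_C[r.mode]
    if not lo <= r.inlet_c <= hi:
        flags.append("inlet_out_of_window")
    if r.outlet_c >= OUTLET_ALARM_C:
        flags.append("outlet_high")
    delta_t = r.outlet_c - r.inlet_c
    if delta_t < DELTA_T_LOW_C:
        flags.append("delta_t_collapsed")
    elif delta_t > DELTA_T_HIGH_C:
        flags.append("delta_t_ballooned")
    for i, temp in enumerate(r.hashboard_c):
        if temp > HASHBOARD_ALARM_C:
            flags.append(f"SM{i}_over_90C")
    return flags
```

For example, `triage(LoopReading("normal", 38.0, 66.5, [92.0, 88.0, 87.5]))` flags a hot outlet, a ballooned ΔT, and SM0 over `90 °C`: the pattern of good flow but an overwhelmed primary side.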
Step-by-Step Fix
Drop the miner to `Low-Power` mode immediately via WhatsMinerTool (Remote Ctrl → Power Mode → `Low`) or pull the network cable so the firmware idles. This buys minutes to diagnose without further chip stress. If chips are already past `95 °C`, hit the breaker and let the loop pull residual heat for `5-10 minutes` before re-energising.
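The first-response choice reduces to one threshold on the hottest chip. This tiny sketch only encodes the `95 °C` rule from the paragraph above. The function name and the returned action labels are hypothetical placeholders for whatever your runbook or automation uses.

```python
def first_response(hottest_chip_c: float) -> str:
    """Pick the immediate action from the hottest chip reading in °C (hypothetical)."""
    if hottest_chip_c > 95.0:
        # Past 95 °C: cut power at the breaker, then let the loop pull
        # residual heat for 5-10 minutes before re-energising.
        return "breaker_off_then_wait_5_to_10_min"
    # Otherwise drop to Low-Power mode (or pull the network cable) to buy time.
    return "low_power_mode"
```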
Confirm the primary pump is running and not cavitating. Walk to the pump skid and listen for a steady hum, not 'marbles in a can.' Verify the discharge pressure gauge sits at its normal operating value. If the duty pump is dead, cut over to the standby pump. If neither pump is running, kill the M53S+ at the breaker before the loop fully stagnates.
Inspect the sight glass for steady non-aerated flow. Bubbles, foam, or stop-and-start patterns mean air ingestion or a pump suction problem. A solid, steady column means the loop is at least moving.
Verify ambient at the dry cooler / primary heat exchanger is in spec. If facility ambient has climbed (heatwave, AC failure, neighbouring equipment dumping heat), that is the upstream cause. Restore HVAC or add ventilation before re-loading the miner.
Bleed the M53S+ cold plate if the chassis was recently moved, freshly commissioned, or the loop was opened. Crack the outlet quick-connect a quarter-turn with the pump running until non-aerated coolant flows steady for `~30 seconds`. Re-seat. Air in the serial three-section cold plate triggers thermal events on whichever section traps the bubble.
Pull and clean the `100-mesh` main-line strainer. Isolate, drain, inspect: any strainer that has gone six months or more without cleaning will have debris. If your loop never had a `100-mesh` strainer (or had a coarser mesh), install one before re-energising. Particulate in the cold-plate microchannels is one of the slowest, most expensive failures to undo.
IR-verify all temperature sensors against the dashboard with a `±1 °C` IR thermometer. Shoot inlet and outlet manifolds, cold-plate skin over each `SMx` zone, and the PMIC packages. Any sensor reading more than `5 °C` cooler than IR is lying — stop running the chassis until it is replaced or recalibrated. A lying sensor is more dangerous than an honest over-temp.
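The IR cross-check is a simple comparison once both sets of readings are written down. This sketch assumes you record dashboard and IR values per location in two dictionaries keyed by the same names (for example `"SM0"` or `"inlet"`; the keys are illustrative). It flags only sensors reading more than `5 °C` cooler than the IR shot, matching the rule above.

```python
def lying_sensors(dashboard_c: dict[str, float], ir_c: dict[str, float],
                  tolerance_c: float = 5.0) -> list[str]:
    """Names of sensors reading more than tolerance_c cooler than the IR value."""
    suspects = []
    for name, ir_value in ir_c.items():
        reported = dashboard_c.get(name)
        if reported is None:
            continue  # no dashboard value to compare against
        if ir_value - reported > tolerance_c:
            suspects.append(name)
    return suspects
```

With `{"SM0": 71.0, "SM1": 84.0}` on the dashboard and `{"SM0": 83.5, "SM1": 86.0}` from the IR gun, only `SM0` is returned. A sensor that reads hotter than IR is deliberately not flagged: it errs on the safe side.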
Quantify loop flow with a clamp-on ultrasonic flow meter at the M53S+ inlet. Target `≥10 L/min` per chassis, steady. Flow oscillation more than `±15 %` points at pump cavitation or a cycling valve. Hard-low flow points at pump degradation, valve closure, or strainer block.
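The two flow criteria (mean at or above `10 L/min`, oscillation within `±15 %`) can be evaluated from a short series of clamp-on meter readings. This is a sketch that assumes evenly spaced samples in L/min. Oscillation is measured here as the largest excursion from the mean, as a percentage of the mean; that definition is an assumption, since the text does not specify one.

```python
from statistics import mean


def assess_flow(samples_lpm: list[float], target_lpm: float = 10.0,
                osc_limit_pct: float = 15.0) -> list[str]:
    """Return findings for a series of flow readings in L/min."""
    avg = mean(samples_lpm)
    if avg <= 0:
        return ["no_flow"]  # avoid dividing by zero below
    # Largest excursion from the average, as a percentage of the average.
    swing_pct = max(abs(s - avg) for s in samples_lpm) / avg * 100.0
    findings = []
    if avg < target_lpm:
        findings.append("low_flow")      # pump wear, closed valve, or blocked strainer
    if swing_pct > osc_limit_pct:
        findings.append("oscillating")   # cavitation or a cycling valve
    return findings
```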
Measure differential pressure across the miner — inlet pressure tap minus outlet pressure tap. Expected `~30-60 kPa` at rated flow on a clean cold plate. Elevated ΔP means blockage. Document the value before any cleaning so you have a baseline for next time.
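Recording ΔP together with its change against the documented baseline makes the next inspection a direct comparison. This sketch assumes both taps are read in kPa on the same reference and uses the `30-60 kPa` clean-plate band from the paragraph above. `assess_dp` and `DpResult` are hypothetical names.

```python
from typing import NamedTuple, Optional

CLEAN_DP_BAND_KPA = (30.0, 60.0)  # expected at rated flow on a clean cold plate


class DpResult(NamedTuple):
    dp_kpa: float
    status: str
    change_vs_baseline_pct: Optional[float]


def assess_dp(inlet_kpa: float, outlet_kpa: float,
              baseline_kpa: Optional[float] = None) -> DpResult:
    """Differential pressure across the miner, classified against the clean band."""
    dp = inlet_kpa - outlet_kpa
    lo, hi = CLEAN_DP_BAND_KPA
    if dp > hi:
        status = "elevated"   # points at blockage
    elif dp < lo:
        status = "below_band"
    else:
        status = "in_band"
    change = None
    if baseline_kpa:  # skips None and 0 (no usable baseline)
        change = (dp - baseline_kpa) / baseline_kpa * 100.0
    return DpResult(dp, status, change)
```

For instance, `assess_dp(260.0, 180.0, baseline_kpa=40.0)` returns an `80 kPa` ΔP, `"elevated"`, and a `100 %` rise over baseline.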
Pull a coolant sample (`~250 mL` from the sample port). Fresh PG25 (25 % propylene glycol) has a faint pink, yellow, or green tint depending on inhibitor brand. Brown, black, cloudy, or particulate-laden coolant means the chemistry is exhausted: drain, flush, and refill before this miner runs another full-power shift.
Reverse-flush the M53S+ cold plate if the differential-pressure measurement showed elevated ΔP. Disconnect it from the loop, connect a clean demin-water source to the *outlet* quick-connect, and drain through the inlet to a waste container. Use a MicroBT-approved cleaning additive. Repeat until the flush water runs clean, then reconnect and re-bleed the cold plate as in the bleeding step above.
Inspect and torque every quick-connect to manufacturer spec. Cross-threaded or partially-engaged connectors cause flow-loss events that masquerade as thermal events. A drop of approved thread sealant or a fresh o-ring on a worn fitting takes a chronic intermittent fault out of your life.
Send a coolant sample for full chemistry analysis — pH, inhibitor reserve, dissolved iron / copper / aluminium, total dissolved solids, microbial count. Local industrial water-treatment lab does this in `48-72 h`. Use the result to schedule full coolant changeout (`12-24 months` for PG25 in well-maintained loops; `6-12 months` for under-inhibited or contaminated).
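Turning the lab verdict into calendar dates avoids guesswork at the next service. This sketch maps the two intervals from the paragraph above (`12-24 months` well maintained, `6-12 months` under-inhibited or contaminated) onto a date window. The boolean `well_maintained` stands in for your reading of the lab report and is an assumption, as are the helper names.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. 31 Aug + 6 months -> 28 Feb)."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def changeout_window(last_change: date, well_maintained: bool) -> tuple[date, date]:
    """Earliest and latest recommended PG25 changeout dates."""
    lo, hi = (12, 24) if well_maintained else (6, 12)
    return add_months(last_change, lo), add_months(last_change, hi)
```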
Service the primary heat exchanger. Plate-and-frame: crack and chemically clean plates, replace gaskets if hardened. Dry cooler: backwash fins top-down with low-pressure water (do not bend the fins), confirm every fan at full RPM, replace any fan with audible bearing wear. Cooling tower: blowdown, basin clean, fill inspection, makeup-water chemistry verification.
Replace the primary pump or its impeller if the flow-meter check showed cavitation that persisted after suction-side fixes. Cavitation bubbles collapse against the impeller and erode it, and an eroded impeller cavitates worse. Spec the replacement for `+25 %` head margin over peak demand, not a nameplate match.
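The `+25 %` head margin is easiest to apply once the loop's peak pressure drop is converted to metres of head, the unit pump curves use. A worked sketch follows. It assumes you know the total loop pressure drop in kPa at peak flow. The PG25 density of `1025 kg/m³` is an assumed round figure, so check your coolant's datasheet.

```python
G = 9.81  # gravitational acceleration, m/s^2


def head_from_pressure_m(dp_kpa: float, density_kg_m3: float = 1025.0) -> float:
    """Convert a pressure drop to pump head: H = dP / (rho * g)."""
    return dp_kpa * 1000.0 / (density_kg_m3 * G)


def required_head_m(peak_dp_kpa: float, margin: float = 0.25,
                    density_kg_m3: float = 1025.0) -> float:
    """Head to spec for: peak-demand head plus the chosen margin."""
    return head_from_pressure_m(peak_dp_kpa, density_kg_m3) * (1.0 + margin)
```

A loop that drops `100 kPa` at peak flow needs roughly `9.9 m` of head, so the replacement should be specced for about `12.4 m` at that flow.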
Tune BTMiner power mode and frequency targets to match what the loop can actually reject. If primary-side capacity is within `±10 %` of the rated (Normal-mode) heat load and HPM adds roughly `15 %` of heat over Normal, the loop cannot keep up with HPM: drop HPM and run Normal year-round. The hashrate cost (`~10-15 %`) is cheaper than one runaway-induced repair.
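The mode decision is a capacity comparison. This sketch assumes that essentially all electrical input ends up as heat in the loop, so the summed Normal-mode wall power of every chassis on the loop stands in for its heat load. The `15 %` HPM uplift is the figure from the paragraph above. `recommend_mode` is a hypothetical helper.

```python
def recommend_mode(primary_capacity_kw: float, normal_heat_kw: float,
                   hpm_uplift: float = 0.15) -> str:
    """Allow HPM only if the primary side can reject the extra HPM heat."""
    hpm_heat_kw = normal_heat_kw * (1.0 + hpm_uplift)
    return "hpm_ok" if primary_capacity_kw >= hpm_heat_kw else "run_normal"
```

A primary side rated at `110 kW` serving `100 kW` of Normal-mode load sits within `±10 %` of the load, so `recommend_mode(110.0, 100.0)` returns `"run_normal"`.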
Update BTMiner firmware to current stable; thermal-management logic gets revised version-to-version. Roll back one version if the runaway started immediately after a firmware update. Document firmware version and date alongside the fault timeline so you can correlate.
Add loop instrumentation if you do not have it: continuous inlet/outlet temp logging, pressure logging at pump discharge and miner inlets, flow-meter logging, coolant-level switch with low-level alarm, leak-detector cable along the floor under the rack. Instrumentation that catches a runaway at minute one instead of minute thirty pays for itself the first event.
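Logging pays off only if something watches the log. One way to catch a runaway at minute one is a rate-of-rise alarm on outlet or hashboard temperature. The sketch below assumes samples arrive in time order. The `60 s` window and `3 °C` rise limit are hypothetical defaults to tune against your own baseline logs, not MicroBT figures.

```python
from collections import deque


class RateOfRiseAlarm:
    """Alarm when temperature climbs more than max_rise_c within window_s seconds."""

    def __init__(self, window_s: float = 60.0, max_rise_c: float = 3.0):
        self.window_s = window_s
        self.max_rise_c = max_rise_c
        self.samples = deque()  # (timestamp_s, temp_c) pairs, oldest first

    def update(self, t_s: float, temp_c: float) -> bool:
        """Add a sample; return True if the rise over the window exceeds the limit."""
        self.samples.append((t_s, temp_c))
        # Drop samples that have fallen out of the window.
        while t_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        oldest_temp = self.samples[0][1]
        return temp_c - oldest_temp > self.max_rise_c
```

Fed one outlet reading every `10 s`, the alarm stays quiet on a flat `45 °C` trace but trips once the outlet climbs more than `3 °C` inside a minute, long before an absolute threshold such as `65 °C` is reached.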
Stop DIY when silicon damage is visible, IR confirms a lying sensor and you have no bench fix, the runaway recurs within `30 days` of a field fix from the steps above, the cold plate is suspected blocked beyond reverse-flush recovery, or the chassis was over-temp for more than `~5 minutes`. The M53S+ silicon is not a forgiving platform; book D-Central hydro service.
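The stop-DIY criteria work as a checklist where any single hit is enough. This sketch encodes the five conditions from the paragraph above. `ChassisHistory` and its field names are hypothetical; fill them from your own fault timeline and repair notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChassisHistory:
    silicon_damage_visible: bool = False
    lying_sensor_no_bench_fix: bool = False
    blocked_beyond_reverse_flush: bool = False
    minutes_over_temp: float = 0.0
    days_from_fix_to_recurrence: Optional[int] = None  # None = no recurrence yet


def bench_service_reasons(h: ChassisHistory) -> list[str]:
    """Every stop-DIY criterion this chassis meets; an empty list means keep going."""
    reasons = []
    if h.silicon_damage_visible:
        reasons.append("visible silicon damage")
    if h.lying_sensor_no_bench_fix:
        reasons.append("lying sensor, no bench fix")
    if h.days_from_fix_to_recurrence is not None and h.days_from_fix_to_recurrence <= 30:
        reasons.append("runaway recurred within 30 days of a fix")
    if h.blocked_beyond_reverse_flush:
        reasons.append("cold plate blocked beyond reverse-flush recovery")
    if h.minutes_over_temp > 5.0:
        reasons.append("over-temp for more than ~5 minutes")
    return reasons
```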
D-Central hydro bench: the chassis goes onto our calibrated hydro test loop with flow, pressure, and temperature instrumentation across inlet and outlet. We isolate each `SMx` section, IR-map the cold plate at full power, run BTMiner diagnostics with chip-level visibility, and replace any combination of cold-plate gaskets and seals, hydro flow and temperature sensors, hashing chips, and PMIC components, finishing with a full coolant flush and refill. Donor chip inventory is available for unrecoverable boards.
Ship safely. Drain the chassis as fully as you can (a little residual coolant in transit is fine; expect the drain to be partial). Cap both quick-connects with supplied or industrial blanking caps. Anti-static-bag any pulled board. Double-box. Include a printed note with observed fault codes and timestamps, firmware version, operating mode (Normal / HPM), coolant brand and last changeout date, and your contact details. Every minute of diagnostic time saved at our bench is repair cost saved on your invoice.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
