Whatsminer M30S – Temperature Too High
Critical — Immediate action required
Symptoms
- WhatsminerTool / web UI or miner.log displays TEMP_OVER or numeric codes 350, 351, 352, 600, 610, 233-235, or 243-245
- Miner enters protection mode (auto-shutdown or locked in down-clocked state) and refuses to restart until the code clears
- LED indicator flashes red (slow-red on environment/sensor; fast-red on genuine hashboard overheat)
- Hashrate drops to zero after the trip, or falls 5-20% below nameplate (86-112 TH/s M30S family) in the minutes before the trip
- WhatsminerTool shows temperature errors with SM0/SM1/SM2 readings climbing above 85 C or environment reading above 40 C
- Fans audibly ramp to 100% duty (all four 12038-class axials at ~6000 RPM) in the seconds before shutdown
- power.log shows PSU internal temperature climbing above its sensor threshold (codes 243/244/245)
- Per-hashboard temp spread is wide - e.g. SM0 at 68 C, SM2 at 92 C (one board is the culprit)
- Error only appears at specific times of day (afternoon/evening) - ambient-tracking or grid-voltage-sag fault
- Intake air feels cool by hand but the miner still reports a high environment temperature (button-board sensor lying, not real heat)
- Error recurs within minutes of restart with no change in ambient - sensor drift, paste failure, or PSU thermal loop
- Visible dust mat on intake grille, packed-dust insulation in heatsink fins, or burnt-dust smell on first power-up
- M30S has been running stock firmware 18+ months with no paste refresh (typical paste degradation window on M30 class)
Step-by-Step Fix
Cut power hard at the PDU or breaker (not a soft shutdown), keep it off for at least 60 seconds, and let the miner cool for 15 minutes. A hard cut fully de-energizes the sensor bus and clears any wedged I2C state, which is the root cause in roughly 15% of TEMP_OVER tickets after a firmware upgrade or grid brownout. While you wait, confirm the intake grille has at least 15 cm of clearance and that nothing is recirculating exhaust back into the intake - a common home-rack mistake. When you power back up, the code should either be gone (wedged state) or return quickly (real fault).
Measure intake air temperature with an IR thermometer aimed at the front grille - not the room, not the hallway. Target is 30 C or below for high-performance mode on M30S and 35 C or below for normal mode. Record the number so you can compare it against what the miner reports in the sensor sanity check below and catch a lying sensor. If real ambient is over 30 C, fix the room before you touch silicon: improve ventilation, add intake air, duct the exhaust, or relocate the miner. Canadian basements are usually fine in winter; summer attics never are.
Pull the exact error-code payload from WhatsminerTool (Remote Ctrl > ExportLog). Note whether it is 600/610 (environment sensor), 350-352 (hashboard SMx), or 233-235 / 243-245 (PSU-side). The code identity tells you which subsystem to chase - do NOT assume all temperature codes mean the same thing. Screenshot the status page before restarting, because some codes clear on reboot and hide the real diagnosis. Different codes route to different fixes below; skipping this step is how you end up replacing a PSU to fix a $30 button-board sensor.
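The routing rule above can be written down as a lookup, which helps when you triage several exported logs at once. This is a minimal sketch: `route_code`, `route_codes`, and the subsystem labels are hypothetical names, and the code sets cover only the temperature codes named in this guide.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical routing table built from the codes named in this guide.
# Anything not listed falls through to "unknown" rather than being guessed.
ROUTES: Dict[str, Set[int]] = {
    "environment sensor (button board)": {600, 610},
    "hashboard SMx": {350, 351, 352},
    "PSU-side": {233, 234, 235, 243, 244, 245},
}


def route_code(code: int) -> str:
    """Return the subsystem this guide routes a temperature code to."""
    for subsystem, codes in ROUTES.items():
        if code in codes:
            return subsystem
    return "unknown"


def route_codes(codes: Iterable[int]) -> Dict[str, List[int]]:
    """Group codes from an exported log by subsystem, in first-seen order."""
    grouped: Dict[str, List[int]] = {}
    for code in codes:
        grouped.setdefault(route_code(code), []).append(code)
    return grouped


if __name__ == "__main__":
    print(route_codes([600, 351, 244, 600]))
```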
Reboot once, via the web UI or a 30-second power cycle, and watch the first 10 minutes. If the code returns in under 2 minutes, the trigger is still present; skip ahead to Tier 2. If the miner runs clean for 10+ minutes and then trips again, you are chasing a thermal-cycling or warm-up fault (paste degradation, fan bearing drag, or ambient tracking); continue with Tier 2 anyway, just with less urgency.
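The two timing rules above can be captured in a small classifier. A sketch only: the function name is hypothetical, and the guide gives no rule for trips between 2 and 10 minutes, so that case is reported as unclassified rather than guessed.

```python
from typing import Optional


def triage_after_reboot(minutes_to_trip: Optional[float]) -> str:
    """Classify a post-reboot TEMP_OVER using the timing rules above.

    minutes_to_trip: minutes from boot until the code came back,
    or None if it has not come back yet.
    """
    if minutes_to_trip is None:
        return "no trip yet: keep watching"
    if minutes_to_trip < 2:
        return "trigger still present: skip ahead to Tier 2"
    if minutes_to_trip >= 10:
        return "thermal-cycling/warm-up fault: continue with Tier 2, less urgency"
    # The guide does not give a rule for trips between 2 and 10 minutes.
    return "unclassified (2-10 min): not covered by this rule"
```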
Cool-room-hot-code sensor sanity check. If the IR thermometer reads 22 C at the intake and the miner reports an environment temperature of 55 C, you have a lying sensor. Do not chase airflow, do not repaste hashboards, do not replace the PSU. The M30S environment sensor lives on the button board at the front of the chassis - a small 2-screw PCB carrying a TMP75-class I2C sensor - and it is the cheapest, most-overlooked repair part in the Whatsminer ecosystem. Confirm the pattern, then skip to the button-board reseat-and-swap step in Tier 2.
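The sanity check reduces to a single comparison. This sketch uses the 2 C agreement window from the button-board step as its default tolerance; the function name and return strings are illustrative, and the two readings are whatever you note down by hand.

```python
def sensor_verdict(ir_intake_c: float, reported_env_c: float,
                   tolerance_c: float = 2.0) -> str:
    """Compare the miner's environment reading with an IR reading at the intake."""
    delta = reported_env_c - ir_intake_c
    if abs(delta) <= tolerance_c:
        return "agrees"
    return "reads high" if delta > 0 else "reads low"


if __name__ == "__main__":
    # The example from the text: 22 C at the grille, 55 C reported.
    print(sensor_verdict(22.0, 55.0))  # reads high
```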
Firmware version audit. Read the running firmware version string from the status page (format 20XXXXXX.XX). If you flashed forward recently, note that post-20230101 builds tightened the TEMP_OVER escalation threshold: readings that used to be warning-only now trigger a hard shutdown. Roll back one authorized build, but only within the same M30 hardware revision (M30V1 / M30V2 / M30V3 - verify from the sticker inside the chassis and in WhatsminerTool before flashing). Never cross-flash hardware revisions. Never flash custom firmware - anti-tamper codes 100001-100003 will fire and can corrupt sensor-driver state.
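Because the version string leads with a build date, the audit can be scripted. A minimal sketch, assuming the first eight digits are a YYYYMMDD date as the 20XXXXXX.XX format suggests; the helper names are hypothetical, and whether a build dated exactly 20230101 is affected is not stated, so only later dates are treated as tightened.

```python
import re
from datetime import date

# Assumption: builds dated after 2023-01-01 carry the tightened threshold.
TIGHTENED_AFTER = date(2023, 1, 1)


def build_date(version: str) -> date:
    """Parse the leading YYYYMMDD of a version string like 20XXXXXX.XX."""
    match = re.match(r"(\d{4})(\d{2})(\d{2})\.", version.strip())
    if match is None:
        raise ValueError(f"unrecognised firmware version string: {version!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def has_tightened_threshold(version: str) -> bool:
    """True if the build date falls after the cutoff above."""
    return build_date(version) > TIGHTENED_AFTER
```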
Power down at the PDU, open the chassis, blow out every dust trap with compressed air in short bursts from 30 cm away. Go across the intake grille, then through the heatsink fin stack from the intake side pushing dust OUT the exhaust - never reverse-blow, which packs dust deeper into the fin pack. Do the fan blades, the PSU intake vent, and the button-board area. If you find a dust mat thicker than 1 mm anywhere, you just found your root cause. Wipe stubborn grime with 99% isopropyl alcohol on a lint-free wipe - never paper towel, which contaminates joints with fibres.
Fan audit by hand-spin and RPM comparison. Power off. Hand-spin each of the four 12038 axial fans and check for equal resistance on all four, no bearing grind, no wobble, and no blade damage, cracks, or melted edges. Power on and compare the fan0-fan3 RPM readings at the same duty command from the API; they should be within 10% of each other. A fan running 20% below the others under the same duty command is dying - swap it before chasing deeper faults. Fan failure is the single most common real-thermal cause on M30 units older than 24 months.
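The 10% and 20% rules above are easy to apply by hand, but a helper avoids arithmetic slips when you repeat the audit. A sketch, assuming you have already read the fan RPMs at one duty command (how you pull them from the API is not shown). "Within 10% of each other" is interpreted here as (max - min) / max, and "20% below" as 20% below the mean of the remaining fans; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fan_audit(rpms: Sequence[float], max_spread: float = 0.10,
              dying_drop: float = 0.20) -> Tuple[bool, List[int]]:
    """Check fan RPMs taken at the same duty command.

    Returns (within_target, suspects): within_target is True when
    (max - min) / max <= max_spread; suspects lists fan indices running
    more than dying_drop below the mean of the other fans.
    """
    if len(rpms) < 2:
        raise ValueError("need at least two fan readings")
    fastest = max(rpms)
    if fastest <= 0:
        raise ValueError("no fan is spinning")
    within_target = (fastest - min(rpms)) / fastest <= max_spread
    suspects: List[int] = []
    for i, rpm in enumerate(rpms):
        others = [r for j, r in enumerate(rpms) if j != i]
        reference = sum(others) / len(others)
        if rpm < reference * (1 - dying_drop):
            suspects.append(i)
    return within_target, suspects
```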
Reseat the button-board ribbon, then swap the button board if reseating does not fix it. Two screws on the front bracket release the button board. Pull it, inspect for dust, physical damage, cracked traces, or a scorched SOT-23-sized sensor IC. Reseat the ribbon firmly at both ends (listen for the click), wipe connector pins with 99% IPA if grime is visible, reinstall. Power up. If the reported environment temp now matches your IR thermometer within 2 C - fixed. If the reading is still wrong, order a replacement button board (sub-$30 CAD aftermarket, 3-minute swap). This resolves the majority of cool-room-hot-code tickets.
Flash the latest authorized MicroBT firmware for your exact hardware revision. Verify M30V1 / M30V2 / M30V3 from the status page and the sticker inside the chassis BEFORE flashing - wrong-rev binaries brick the control board and are a cause of TEMP_OVER on otherwise healthy hardware. Use WhatsminerTool's official firmware workflow, not a third-party tool. Post-flash, run a 30-minute baseline at nameplate hashrate and log whether TEMP_OVER returns. If the flash itself fails, stop - that is a Tier-4 recovery-mode ticket.
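The no-cross-flash rule works as a hard gate: the sticker, the status page, and the firmware package must all name the same revision before you flash. A minimal sketch; `firmware_rev` stands for whatever revision the authorized package is labelled for, and the function name is hypothetical.

```python
KNOWN_REVISIONS = {"M30V1", "M30V2", "M30V3"}


def safe_to_flash(sticker_rev: str, status_page_rev: str, firmware_rev: str) -> bool:
    """Return True only if all three sources name the same known revision."""
    revisions = {rev.strip().upper()
                 for rev in (sticker_rev, status_page_rev, firmware_rev)}
    return len(revisions) == 1 and revisions <= KNOWN_REVISIONS
```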
Swap the hot hashboard to a known-good slot to isolate the fault. Power off, label slots 0/1/2 with tape, move the SMx-implicated board to a different slot, power up, observe logs for 10 minutes. Code follows the board = sensor, paste, or chip fault on that specific board (continue to Tier 3 paste refresh). Code stays in the slot = adapter-plate channel, control-board I2C channel, or the ribbon for that slot (continue to Tier 3 electrical). This is the single most diagnostic move for localized hashboard codes and it costs nothing but 15 minutes.
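The slot-swap result reads as a simple decision. This sketch assumes codes 350, 351, and 352 correspond to SM0, SM1, and SM2 in that order (an assumption, not a published mapping); the function names are hypothetical, and outcomes the guide does not cover are reported as inconclusive.

```python
from typing import Optional

# Assumption: 350 -> SM0, 351 -> SM1, 352 -> SM2.
SM_CODE_BASE = 350


def slot_from_code(code: int) -> int:
    """Map an SMx temperature code to its hashboard slot number."""
    if code not in (350, 351, 352):
        raise ValueError(f"{code} is not an SMx temperature code")
    return code - SM_CODE_BASE


def interpret_swap(board_slot_before: int, board_slot_after: int,
                   code_after: Optional[int]) -> str:
    """Interpret the 10-minute run after moving the implicated board."""
    if board_slot_before == board_slot_after:
        raise ValueError("move the board to a different slot first")
    if code_after is None:
        return "no SMx code during the run: inconclusive"
    fault_slot = slot_from_code(code_after)
    if fault_slot == board_slot_after:
        return "follows the board: Tier 3 paste refresh"
    if fault_slot == board_slot_before:
        return "stays in the slot: Tier 3 electrical"
    return "code on an uninvolved slot: inconclusive"
```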
Under-load PSU rail check for codes 243-245. Multimeter on DC, probe at the PSU output under full hashing load. Expect steady DC within the M30-class tolerance band - MicroBT does not publish per-model tolerances (a documented gap). If output sags under load, the PSU is tiring and feeding heat back into its own sensor, which produces a phantom thermal code while the real fault is electrical. About 25-30% of PSU-side TEMP_OVER tickets in the D-Central repair queue trace to an aging PSU. Also verify line voltage at the wall under load - residential afternoon sag can push a marginal PSU over the edge at specific times of day.
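Since MicroBT publishes no tolerance band, the useful number is how far each rail drops when load comes on. This sketch expresses that drop as a percentage, for the PSU output and the wall alike, assuming the baseline reading is taken before full hashing load. `limit_pct` deliberately has no default: any threshold you pick is your own assumption, not a MicroBT spec.

```python
def sag_percent(baseline_v: float, under_load_v: float) -> float:
    """Percentage drop from the baseline reading to the under-load reading."""
    if baseline_v <= 0:
        raise ValueError("baseline voltage must be positive")
    return (baseline_v - under_load_v) / baseline_v * 100.0


def flag_sag(baseline_v: float, under_load_v: float, limit_pct: float) -> bool:
    """True if the sag exceeds your chosen limit_pct."""
    return sag_percent(baseline_v, under_load_v) > limit_pct
```

For example, a wall circuit that reads 240 V before load and 228 V under load has sagged 5%.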
Thermal paste refresh on the hot hashboard. Remove the heatsink clamshell (Phillips #2 or Torx T10, depending on chassis revision). Clean off the old paste with 99% IPA on lint-free wipes - never paper towel. Inspect the thermal pads on the PMIC and voltage-domain ICs: crumbled, cracked, or dried pads need replacement with thickness-matched Gelid GP-Extreme pads. Apply a rice-grain-sized centre dot of Arctic MX-6 or Thermal Grizzly Kryonaut to each ASIC and let clamshell pressure spread it into a thin, uniform layer - excess paste insulates and makes things worse. Reassemble with consistent clamshell torque. Run for 20 minutes and re-check the SM spread.
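Re-checking the SM spread means comparing the hottest and coolest hashboard readings. A sketch using the 10 C limit from the stop-DIY criteria; the SM0 and SM2 values echo the Symptoms example, the SM1 value is made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, Tuple

SPREAD_LIMIT_C = 10.0  # the escalation threshold used in the stop-DIY criteria


def sm_spread(readings: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (hottest board, coolest board, spread in C)."""
    if len(readings) < 2:
        raise ValueError("need readings from at least two boards")
    hottest = max(readings, key=readings.__getitem__)
    coolest = min(readings, key=readings.__getitem__)
    return hottest, coolest, readings[hottest] - readings[coolest]


if __name__ == "__main__":
    board, _, spread = sm_spread({"SM0": 68.0, "SM1": 74.0, "SM2": 92.0})
    print(board, spread, spread > SPREAD_LIMIT_C)  # SM2 24.0 True
```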
Copper busbar torque check for codes 233-245. Power off, disconnect input. Insulated torque wrench on the M2.5 M30-class busbar hardware - community consensus 4-5 N.m (MicroBT publishes no official spec). A loose busbar creates a high-resistance joint that dumps heat exactly where the PSU-input sensor sits, producing phantom thermal codes when the real fault is electrical. IR-scan each busbar junction under load after retorque. A lingering hot spot means the copper is pitted and the lug needs replacement - that is a Tier-4 call.
I2C bus health check on the adapter plate. Clip a logic analyzer (Saleae Logic 8 or a sigrok-compatible FX2 clone) to adapter-plate test points for SDA and SCL. Healthy bus: clean ~3.3 V idle-high, sharp falling edges, no chatter between transactions. Stuck bus: SDA or SCL held low by a dead or failing slave. Isolate by unplugging each slave in turn and re-scoping - whichever slave releases the bus when removed is the culprit. This is the real diagnostic behind a flat-identical-readings-across-all-chips log pattern.
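If your analyzer software can export each channel as a plain list of 0/1 samples at a fixed rate, the stuck-bus pattern can be flagged before you start unplugging slaves. This sketch assumes such an export (the export step depends on your tool and is not shown). Plain I2C defines no timeout, so the 25 ms default borrows the SMBus clock-low timeout as a yardstick; treat it as a tunable assumption. Function names are hypothetical.

```python
from typing import Dict, Sequence


def longest_low_run(samples: Sequence[int]) -> int:
    """Length, in samples, of the longest consecutive run of logic-low (0)."""
    longest = current = 0
    for level in samples:
        if level == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def stuck_lines(sda: Sequence[int], scl: Sequence[int], sample_rate_hz: float,
                stuck_after_s: float = 0.025) -> Dict[str, bool]:
    """Flag each line whose longest low stretch lasts at least stuck_after_s."""
    return {
        name: longest_low_run(samples) / sample_rate_hz >= stuck_after_s
        for name, samples in (("SDA", sda), ("SCL", scl))
    }
```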
Sensor IC reflow or replacement. If a specific hashboard-resident sensor is drifting, a hot-air reflow is reasonable: preheat the board from below to ~150 C if you have a preheater, then apply top-side hot air at 280-310 C for ~15-20 seconds over the SOT-23-sized sensor package. Let it cool naturally, then retest against a thermocouple reference. If reflow doesn't restore function, replace the part outright - TMP75 / TMP112 parts are Mouser/Digi-Key stock at $2-$5 CAD each. Match the exact package from the silkscreen or a good photo of a donor board. Tools: hot-air rework station, tweezers, flux, and patience.
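When you retest a reflowed or replaced part, you can also check the sensor's own answer on the logic analyzer by decoding the raw bytes it returns. This sketch follows the TMP75 data sheet's left-justified 12-bit temperature format (two's complement, 0.0625 C per LSB), which the TMP112 also uses in its default normal mode; TMP112 extended mode uses 13 bits and is not handled. The function name is illustrative.

```python
def tmp75_celsius(msb: int, lsb: int) -> float:
    """Decode the two bytes read from the temperature register (pointer 0x00)."""
    raw = ((msb & 0xFF) << 8) | (lsb & 0xFF)
    if raw & 0x8000:            # sign bit set: negative temperature
        raw -= 1 << 16
    return (raw >> 4) * 0.0625  # drop the 4 unused low bits, scale to C


if __name__ == "__main__":
    print(tmp75_celsius(0x19, 0x00))  # 25.0
```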
Stop DIY and book D-Central repair when any of these are true:
- Per-board SM spread stays over 10 C after a full paste and pad refresh (chip-level fault under the heatsink)
- PSU shows internal burn marks or bulging capacitors (stop before damaging adjacent parts)
- TEMP_OVER returns within 48 hours of a clean paste refresh with verified busbar torque (PCB-level damage)
- A firmware flash went sideways and the miner is stuck in recovery or bootlooping
- An IR scan shows a hot spot at a busbar lug that persists after proper retorque (copper pitted, lug replacement required)
Book at https://d-central.tech/services/asic-repair/
D-Central bench process for M30-class TEMP_OVER tickets: per-hashboard isolation on a test fixture with programmable load, SMT chip probing to localize failing chip positions, graded salvaged M30-class parts from stock, full paste and pad refresh, PSU internal inspection and component-level repair where possible, busbar lug replacement when needed, and a 24-hour burn-in at nameplate hashrate before ship-back. Ship the whole unit, or just the suspect board plus the button board, in anti-static bags, double-boxed with 5 cm of foam. Include a note with observed symptoms, error codes, firmware version, power environment (120 V / 208 V / 240 V), and your contact details. The note saves diagnostic time, which saves you money on the repair invoice.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
