ICERIVER_BOARD_COLD Warning

IceRiver Single Hashboard 0 Hashrate + Cold (Dead Board Test)

A single hashboard reports 0 hashrate while the other boards run normally, and the affected board's chip surface stays at or near ambient room temperature instead of the healthy 50-60 C band. Cold plus zero hashrate is the cleanest single-board failure signature on multi-board IceRiver KS hardware: the chips are not drawing power. The cause falls into one of three buckets: a dead PMIC or collapsed on-board 12 V rail, a failed controller-to-board cable (data ribbon or 12 V power lead), or a data input failure on the hashboard itself. This differs from the hot-but-zero-hashrate case, where the chips draw power but the data path is broken.

Warning — Should be addressed soon

Affected Models: IceRiver KS3, KS3L, KS3M, KS5, KS5L, KS5M (all multi-board KS chassis)

Symptoms

  • Web UI per-board hashrate readout shows one board at `0 TH/s` while other(s) read at or near nameplate (`~4 TH/s` per board on KS5L, `~5 TH/s` per board on KS5M)
  • IR thermometer or thermal camera shows the dead board's chip surface within `5 C` of ambient while healthy boards sit at `50-60 C`
  • Dashboard reports per-board `Temp2` on the dead board as `0`, garbage (`-1`, `-10`), or sub-`30 C` while ambient is `~22 C`
  • Chip count for that one board reads `0` while other boards report a full chain
  • Total rig hashrate sits at the expected fraction of nameplate (e.g. `~8 TH/s` on a KS5L with one of three boards down)
  • Power draw at the wall is reduced by roughly the share that one hashboard normally pulls (e.g. `~2,200 W` on KS5L instead of `~3,400 W`)
  • Fans on the affected board run at normal RPM but the heatsink stays cold to the touch even after `30+ minutes` of operation
  • Issue began suddenly — clean transition from `4 TH/s` to `0 TH/s` on that board, often after a power event or recent maintenance cycle
  • Visual inspection: no smoke, no scorched solder mask, no swollen capacitors, no burnt-insulation odor — the board looks fine, just cold and silent
  • Miner log shows `chip count 0`, `asic init fail`, `i2c read fail`, or `spi timeout` lines on that one chain only
  • Front-panel LEDs (`D2` / `D3`) show partial init — most boards reach ready state, but one chain never completes enumeration
  • Slot-swap test: the dead board stays cold in a known-good slot (board-side fault), or a known-good board goes cold in the suspect slot (controller-side data lane / cable fault)
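The hashrate and wall-power symptoms above are proportional arithmetic, which a short Python sketch can make explicit. It assumes every board contributes an equal share of hashrate and power, which is a simplification, so treat the result as a rough sanity check rather than a specification. The function name is hypothetical; the KS5L figures (three boards, `~4 TH/s` each, `~3,400 W`) are the ones quoted on this page.

```python
def expected_with_boards_down(per_board_ths, total_boards, boards_down, nameplate_watts):
    """Estimate rig hashrate and wall power with some boards offline.

    Assumes each board contributes an equal share of hashrate and power.
    This is a simplification: controller and fan draw is not modelled.
    """
    if not 0 <= boards_down <= total_boards:
        raise ValueError("boards_down must be between 0 and total_boards")
    live = total_boards - boards_down
    hashrate_ths = per_board_ths * live
    watts = nameplate_watts * live / total_boards
    return hashrate_ths, watts


# KS5L figures quoted on this page: 3 boards, ~4 TH/s each, ~3,400 W at the wall.
ths, watts = expected_with_boards_down(4.0, 3, 1, 3400)
print(f"expected ~{ths:.0f} TH/s at ~{watts:.0f} W")  # expected ~8 TH/s at ~2267 W
```

The linear estimate lands close to the `~2,200 W` figure given in the symptom list; a reading far from either number suggests something other than a single dead board.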

Step-by-Step Fix

1

Hard power-cycle from the rear rocker for `30 seconds`. Not a soft reboot through the dashboard — pull mains power, wait `30 seconds` for caps to bleed, restore mains. The IceRiver miner daemon can wedge after a brownout, surge, or firmware update and refuse to enumerate one of the hashboards even though the hardware is fine. A clean cold start clears wedged state. About `8-10 %` of cold-board reports clear at this step alone.

2

Take an IR-thermometer reading on each hashboard's chip surface after `5 minutes` of operation. Confirm the thermal differential matches the symptom — one board cold (within `5 C` of ambient), the others at `50-60 C`. If you don't have an IR thermometer, a careful hand check works: healthy boards' heatsinks are uncomfortably warm, the dead board's is room temperature. Document which slot is cold before going further — this is the most important data point for the rest of the procedure.
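If you log the readings, the thermal check in this step can be written down as a small Python sketch. The thresholds (within `5 C` of ambient for cold, `50-60 C` for healthy) come from the text; the function names, the "other" label, and the sample readings are illustrative assumptions.

```python
COLD_MARGIN_C = 5.0             # "within 5 C of ambient" from the text
HEALTHY_BAND_C = (50.0, 60.0)   # healthy chip-surface band from the text


def classify_board(surface_c, ambient_c):
    """Label one IR reading as 'cold', 'healthy', or 'other'."""
    if surface_c - ambient_c <= COLD_MARGIN_C:
        return "cold"
    low, high = HEALTHY_BAND_C
    if low <= surface_c <= high:
        return "healthy"
    return "other"  # warm but outside the band: not the cold-board signature


def cold_slots(readings_c, ambient_c):
    """Return the slot numbers whose reading falls within the cold margin."""
    return [slot for slot, temp in sorted(readings_c.items())
            if classify_board(temp, ambient_c) == "cold"]


# Hypothetical readings after 5 minutes of operation, ambient ~22 C.
readings = {0: 56.0, 1: 24.5, 2: 53.0}
print(cold_slots(readings, ambient_c=22.0))  # [1]
```

Recording the slot number this way gives you the data point the rest of the procedure depends on.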

3

Power off at the breaker, wait `30 seconds`, pull the chassis lid. Locate the cold board by slot. Visually inspect both the data ribbon and the `12 V` power lead serving that board for: a connector that is visibly partly unseated, a lead with discolored or melted insulation near the connector, a bent or scorched pin, or any obvious mechanical damage from shipping or recent maintenance. If you see something visibly wrong, that's almost certainly your fix.

4

Reseat both the data ribbon and the `12 V` power lead on the dead board only. Disconnect each, inspect contacts for blackening / oxidation / bent pins, reconnect firmly until you feel and hear the positive click. Don't force it — IceRiver receptacles will let you mis-orient connectors if you push hard enough. Vibration and thermal cycling walk pins, and a contact with `< 100 %` mating area passes idle traffic but fails enumeration timing — board never wakes, stays cold.

5

Reboot and wait `5 minutes`, then recheck thermal and the dashboard. Fans should ramp on the affected board within `60 seconds` of boot if it's healthy. The chip surface should reach `~50 C` within `5-10 minutes`. The dashboard per-board hashrate should populate to `~4 TH/s` (KS5L) or the per-model nameplate within the same window. If the board warms up and starts hashing, you're done; keep an eye on it for `48 hours`. If the fault recurs on the same slot, replace the cable next time.
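The pass criteria in this step can be expressed as a quick check, sketched here in Python. The two tolerances (`min_surface_c`, `min_fraction`) are illustrative readings of "`~50 C`" and "per-model nameplate", not vendor limits, and the function name is hypothetical.

```python
def board_recovered(surface_c, hashrate_ths, nameplate_ths,
                    min_surface_c=45.0, min_fraction=0.9):
    """True if a board looks recovered 5-10 minutes after the reboot.

    min_surface_c and min_fraction are illustrative tolerances for
    "~50 C" and "near nameplate"; they are assumptions, not vendor limits.
    """
    warm = surface_c >= min_surface_c
    hashing = hashrate_ths >= min_fraction * nameplate_ths
    return warm and hashing


print(board_recovered(51.0, 3.9, nameplate_ths=4.0))  # True
print(board_recovered(23.0, 0.0, nameplate_ths=4.0))  # False
```

Requiring both conditions matters: a board that warms up but does not hash is a different fault from the cold-board case covered here.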

6

Probe `12 V` at the dead hashboard's input pads under load. Power on with the chassis lid still off (carefully: fans are spinning and cables are exposed). Set the multimeter to DC, `20 V` range. Put the positive probe on the hashboard's `12 V` input pad (labeled on the PCB silkscreen, usually a wide trace) and the ground probe on the chassis. Watch the meter during the first `60 seconds` after boot. Healthy: ≥ `11.8 V` sustained. Failing: the rail droops below `11 V`, collapses to `0 V`, or oscillates. If it reads `0 V` from the start, `12 V` isn't reaching the board at all.
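If you jot down a handful of meter readings during that first minute, the thresholds above translate into a small classifier. This is a Python sketch under stated assumptions: `healthy_min` and `failing_below` come from the text, while `missing_below`, `ripple_limit`, and the "marginal" label for readings between `11.0 V` and `11.8 V` are illustrative additions.

```python
def classify_rail(samples_v, healthy_min=11.8, failing_below=11.0,
                  missing_below=0.5, ripple_limit=0.5):
    """Classify 12 V readings logged over the first 60 seconds after boot.

    healthy_min and failing_below come from the text; missing_below and
    ripple_limit are illustrative assumptions for "0 V" and "oscillates".
    """
    if not samples_v:
        raise ValueError("need at least one reading")
    lo, hi = min(samples_v), max(samples_v)
    if hi < missing_below:
        return "missing"    # 12 V never reaches the board
    if lo < failing_below or hi - lo > ripple_limit:
        return "failing"    # droop, collapse, or oscillation
    if lo >= healthy_min:
        return "healthy"
    return "marginal"       # between 11.0 V and 11.8 V: not covered by the text


print(classify_rail([12.05, 12.00, 11.95]))  # healthy
print(classify_rail([12.0, 10.4, 0.0]))      # failing
print(classify_rail([0.0, 0.02]))            # missing
```

A handheld multimeter samples slowly, so fast oscillation may only show up as an unstable display; the ripple check here catches the slow cases you can actually write down.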

7

Probe `12 V` at the controller-side end of the suspect cable. If `12 V` is missing at the board input, follow the cable back. Probe `12 V` at the controller's output connector (positive on the wire that goes to the dead board, ground on chassis). Healthy: same `≥ 11.8 V` you measured at a working board. Missing on controller side too → controller / PSU fault on that channel. Present on controller side but missing at board side → cable fault, break in the wire or high-resistance connector.
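The two measurements from steps 6 and 7 combine into a simple decision, sketched below in Python. The `11.8 V` threshold is the one from the text; the function name and the wording of the verdicts are illustrative.

```python
def locate_12v_fault(controller_side_v, board_side_v, healthy_min=11.8):
    """Combine the controller-side and board-side 12 V readings into a verdict."""
    controller_ok = controller_side_v >= healthy_min
    board_ok = board_side_v >= healthy_min
    if controller_ok and board_ok:
        return "12 V present at both ends: look beyond the power lead"
    if not controller_ok:
        return "controller / PSU fault on that channel"
    # Present at the controller but missing at the board.
    return "cable fault: broken wire or high-resistance connector"


print(locate_12v_fault(12.0, 0.0))  # cable fault: broken wire or high-resistance connector
print(locate_12v_fault(0.0, 0.0))   # controller / PSU fault on that channel
```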

8

Cross-swap the `12 V` power lead with a known-good lead from a healthy board. Power off. Pull the `12 V` lead from a known-good board. Pull the dead board's lead. Swap them: known-good board now uses the suspect lead, dead board uses the known-good lead. Reboot, watch thermal and dashboard for `10 minutes`. Fault follows the cable → bad cable, replace (`$15-30 CAD`). Fault stays with the dead board → cable fine, problem is at the board's `12 V` input or beyond.

9

Cross-swap the data ribbon similarly. Same procedure, different cable. Power off. Swap data ribbons between the dead board and a known-good board. Reboot, watch dashboard for chip-count populate and per-board hashrate. Fault follows the data ribbon → bad ribbon, replace. Fault stays with the dead board → data ribbon fine, the dead board's data input or PMIC is the fault.

10

Slot-swap the dead board into a known-good slot. Power off. Move the dead board into a slot that previously had a working board. Move a known-good board into the formerly-dead slot. Reboot. Fault follows the board → bad board, Tier 3 / 4 territory. Fault stays with the slot → controller / cable / connector for that slot is the fault, escalate to Tier 4. Most decisive test in the entire diagnostic tree — `5 minutes`, zero parts, definitive answer.
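Steps 8 through 10 all apply the same rule: move one part, then see whether the fault moves with it. A minimal Python sketch of that lookup follows; the test names and verdict strings are illustrative paraphrases of the outcomes described above, not official terminology.

```python
SWAP_VERDICTS = {
    # test name: (verdict if the fault follows the moved part, verdict if it stays put)
    "power_lead": ("bad 12 V lead: replace it",
                   "lead is fine: fault is at the board's 12 V input or beyond"),
    "data_ribbon": ("bad data ribbon: replace it",
                    "ribbon is fine: suspect the board's data input or PMIC"),
    "slot_swap": ("bad board: bench repair",
                  "slot-side fault: controller, cable, or connector for that slot"),
}


def interpret_swap(test, fault_followed_part):
    """Return the verdict for one swap test from steps 8-10."""
    if test not in SWAP_VERDICTS:
        raise ValueError(f"unknown test: {test!r}")
    follows, stays = SWAP_VERDICTS[test]
    return follows if fault_followed_part else stays


print(interpret_swap("power_lead", fault_followed_part=True))   # bad 12 V lead: replace it
print(interpret_swap("slot_swap", fault_followed_part=False))
```

Change one thing per reboot. Swapping the lead and the ribbon together makes it impossible to tell which of the two carried the fault.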

11

Scope `clk` and `data` lines on the cable side. Two scope channels are enough (a Rigol DS1054Z, which has four, or similar). Probe the controller's output `clk` and `data` lines at the dead board's connector during the first `60 seconds` after boot. Trigger on the rising edge of `clk`. Compare against the same probe at a healthy board's connector. Healthy: characteristic toggling bursts during enumeration. Flat / silent / one channel only on the dead board's cable but present on healthy boards' cables → controller data driver failure for that slot specifically.

12

Probe data lines at the board's connector pads, after the cable. Same scope, same window, same trigger. But now probe at the hashboard's connector pads (receiving side). Signal present at cable end but absent at the board's pad → connector itself is bad, replace. Signal present at the board's pad but the first chip in the chain doesn't enumerate → board-side data path failure. Inspect for: cracked solder on level-shifter IC at the data input, hairline trace cracks, or corroded edge contact (rare but seen on KS5L / KS5M backplane interconnect).
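The scope observations from steps 11 and 12 narrow the fault from the controller outward. The Python sketch below encodes that order; the three boolean inputs are what you observe on the bench, and the function name and verdict wording are illustrative.

```python
def locate_data_fault(signal_at_cable_end, signal_at_board_pad, first_chip_enumerates):
    """Walk the data path from controller to first chip and name the first break."""
    if not signal_at_cable_end:
        return "controller data driver failure for that slot"
    if not signal_at_board_pad:
        return "bad connector: replace it"
    if not first_chip_enumerates:
        return ("board-side data path failure: inspect level shifter solder, "
                "traces, and edge contacts")
    return "data path looks healthy"


print(locate_data_fault(True, False, False))  # bad connector: replace it
print(locate_data_fault(False, False, False))  # controller data driver failure for that slot
```

The checks run in signal order on purpose: a missing signal upstream makes every downstream observation meaningless, so the first break found is the one to act on.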

13

Inspect the dead hashboard's input PMIC and input MLCCs under magnification. Cracked MLCCs on the `12 V` input bulk caps (often hairline) cause rail collapse the moment the PMIC tries to enable. Look for: cracked dielectric, lifted terminations, discoloration, swelling. Replace any cracked MLCC with the same value / package. Inspect the PMIC for visible heat damage. PMIC replacement requires hot-air rework station + microscope + matching part — bench work, not field work.

14

Reflow the PMIC's exposed pad if you suspect a cracked solder joint rather than a dead component. Preheat the PCB to `150 C`, then apply top-side hot air at `310-330 C` for `30 seconds`. Let it cool naturally, then clean the flux residue with `99 %` isopropyl alcohol. This is lower-risk than chip-level reflow because PMIC packages tolerate a thermal cycle well. Document before/after with a meter or scope so you know whether the rework actually changed the fault.

15

Replace the controller-to-board cable assembly if cross-swap testing condemned the cable. Source aftermarket cables in the correct pitch / length for your KS model: KS3-family and KS5-family use different connectors, and lengths can vary by chassis revision. Don't fabricate one, because timing margins on the enumeration bus are tight. Aftermarket KS cable sets run `$15-45 CAD` depending on model. Replace at the next service interval if a cable looks marginal even when it is not the active fault.

16

Stop DIY and ship to D-Central when: controller data lines are silent for one slot (controller swap requires bench imaging); a hashboard PMIC is dead and you don't have hot-air rework + microscope + matching PMIC stock; backplane interconnect on KS5L / KS5M shows oxidation requiring resurfacing; the cold-board fault returns within `30 days` of a Tier 3 repair; you find capacitor bulging, MLCC cracking, a scorched PCB, or a burnt-insulation odor; or you cross-flashed `xyys` / `tswift` overclock firmware and one board went cold immediately afterward.

17

D-Central bench process for cold-board faults. Chassis-level inspection. AC and `12 V` measurements under instrumented load with a known-good reference rig running alongside for direct comparison. Cable inventory check — every controller-to-board cable scoped against a reference. Hashboard isolation in a known-good chassis to definitively split board-side from controller-side. Controller swap with cross-imaged eMMC if controller is the fault. PMIC / MLCC component-level rework on the affected board if it is. `24-hour` nameplate burn-in across all boards before unit ships back.

18

Pack carefully. Whole chassis, double-boxed with at least `5 cm` foam every side. KS chassis are not as rigid as Antminer chassis; ship-damage during inbound transit is a leading cause of intermittent cold-board reports we receive. Hashboards can ship in anti-static bags inside the chassis if you must. Include a printed note: model + revision, observed symptoms, which slot is cold, dashboard per-board hashrate before shipping, current firmware version, what you tried, and contact info. Saves bench diagnostic hours.

19

Note on chip-level repair scope. Chip-level rework on KS hashboards is not the typical fix for a cold-board fault: chip damage almost always presents with hot-but-non-hashing or partial-chain symptoms, not cold-and-silent. Cold-and-silent points overwhelmingly at the PMIC, a cable, or a controller data line. If your cold board turns out to have chip-level damage, we'll be transparent about it: the chip salvage market for the IceRiver `1004LV100` family is thinner than Bitmain's, so we'll quote a parts-bridge plan or a referral if we can't source matching parts in-house.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.