Avalon 1246 – ASIC Chip Temperature Abnormal
Informational — Monitor and address as needed
Symptoms
- CGMiner API `{"command":"estats"}` on port 4028 returns non-zero `ECHU[x x x]` with the 128 flag bit (0x80) set on one or more chains
- `PVT_T` array shows a chip position reading 5-15 C hotter than chain median
- `MTmax` per board above 80 C but below the 85 C `OVER_TEMP` ceiling — miner keeps hashing
- `MTavg` still within spec (55-68 C) — this is an outlier-chip fault, not a broad overheat
- Web UI shows no red alerts — `ECHU_128` is not surfaced in Canaan's dashboard, only via the API
- Hashrate 2-8% below nameplate because firmware quietly reduces work to the flagged chain
- `kern.log` contains periodic `PVT_T abnormal`, `chip temp warn`, or `ECHU warn` entries every 5-15 minutes
- Fault appeared gradually over days or weeks — aging signature, not a power-event signature
- Flag clears after a cold reboot, returns within 2-6 hours of hashing
- Ambient and fan RPMs both in spec, yet `ECHU_128` still flags — heat-transfer problem, not airflow
- Miner has 18+ months of continuous duty without a paste refresh
- Thermal camera shows a single chip position visibly hotter than its neighbours on the same board
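The API-visible symptoms above can be checked mechanically. The following is a minimal Python sketch, not Canaan's tooling. It assumes the `estats` text carries bracketed fields such as `ECHU[...]` and a per-chain `PVT_T0[...]` array; the exact field names and layout vary by firmware, and the `SAMPLE` string and helper names are hypothetical.

```python
import re
from statistics import median

ECHU_TEMP_BIT = 128  # the flag this article refers to as ECHU_128

# Hypothetical excerpt of an estats string; real field layout may differ.
SAMPLE = "MTmax[82 71 70] MTavg[63 61 60] ECHU[128 0 0] PVT_T0[61 62 75 60 61]"

def parse_int_field(text, name):
    """Return the integers inside `name[...]`, or [] if the field is absent."""
    match = re.search(rf"\b{re.escape(name)}\[([^\]]*)\]", text)
    return [int(tok) for tok in match.group(1).split()] if match else []

def flagged_chains(text):
    """Indices of chains whose ECHU value has the 128 bit set."""
    values = parse_int_field(text, "ECHU")
    return [i for i, v in enumerate(values) if v & ECHU_TEMP_BIT]

def hot_outliers(temps, min_delta=5):
    """(position, delta) for chips at least `min_delta` C above the chain median."""
    mid = median(temps)
    return [(i, t - mid) for i, t in enumerate(temps) if t - mid >= min_delta]

def looks_like_outlier_fault(mtmax, mtavg):
    """True when MTmax sits in the 80-85 C band while MTavg stays within 55-68 C."""
    return 80 < mtmax < 85 and 55 <= mtavg <= 68
```

With the sample string, `flagged_chains(SAMPLE)` reports chain 0, and `hot_outliers(parse_int_field(SAMPLE, "PVT_T0"))` singles out position 2 at 14 C above the median, which matches the single-hot-chip pattern described in the list.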
Step-by-Step Fix
Hard power-cycle at the PDU for a full 60 seconds — breaker off, caps drain, then power on. A soft reboot sometimes carries a stale `ECHU_128` flag across the restart; a full cold-start clears cached state and transient post-firmware-update glitches. Watch the first 15 minutes of API output after power-up to confirm whether the flag returns or clears. This alone clears roughly 10% of tickets in D-Central's Canaan queue.
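To make the post-reboot watch less subjective, it may help to log `ECHU` readings with timestamps and record when the flag first comes back. The sketch below assumes you already have samples as `(minutes_since_power_up, [echu_per_chain])` pairs; that data shape and the function name are illustrative, not part of any Canaan tool.

```python
def first_flag_return(samples, bit=128):
    """Return the earliest minute at which any chain shows the flag bit, else None.

    `samples` is a list of (minutes_since_power_up, [echu_chain0, echu_chain1, ...]).
    """
    for minutes, echu in sorted(samples, key=lambda s: s[0]):
        if any(value & bit for value in echu):
            return minutes
    return None

# Hypothetical readings taken every 5 minutes after a cold start.
readings = [(0, [0, 0, 0]), (5, [0, 0, 0]), (10, [0, 128, 0]), (15, [0, 128, 0])]
print(first_flag_return(readings))
```

A result of `None` over the first 15 minutes does not prove the fault is gone, since the flag can take hours of hashing to return, but an early reappearance suggests the cold start alone did not fix it.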
Verify ambient at the intake grille with an IR gun, not in the middle of the room. Target `<= 30 C` at the front face. `PVT_T abnormal` is a thermal-margin problem; if you are operating above 30 C ambient, you have no margin to lose and the flag will keep returning. Move the miner, add airflow, or crack a window before pursuing other causes.
Clear the front 30 cm of the intake grille — shelves, curtains, boxes, another miner's exhaust. The 1246 is a stacked-fan unit that chokes on restricted intake. Zero-dollar, two-minute fix that resolves a surprising percentage of `ECHU_128` tickets that appeared after the workshop was rearranged.
Re-pull `estats` 30 minutes after power-up: `echo -n '{"command":"estats"}' | nc <miner-ip> 4028`. Confirm `ECHU` is zero across all chains, or that the flag has returned. Save this snapshot — it is your post-Tier-1 baseline and it is the first thing D-Central's bench will ask for if the fix escalates to us.
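If `nc` is not available, the same query can be made from Python and written straight to a file for the baseline. This is a rough sketch assuming the miner speaks the usual CGMiner-style JSON API on port 4028 and ends its reply with a NUL byte; some firmware builds may return text that is not strictly valid JSON, in which case save the raw bytes instead. The function names and file path are hypothetical.

```python
import json
import socket

def decode_reply(raw: bytes) -> dict:
    """Strip the trailing NUL terminator (assumed) and parse the JSON reply."""
    return json.loads(raw.rstrip(b"\x00").decode("utf-8", errors="replace"))

def query_estats(host: str, port: int = 4028, timeout: float = 10.0) -> dict:
    """Send the estats command and read until the miner closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": "estats"}).encode())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return decode_reply(b"".join(chunks))

def save_snapshot(reply: dict, path: str) -> None:
    """Write the reply to disk so it can travel with the ticket or the shipment."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(reply, fh, indent=2)

# Example (replace the address with your miner's IP):
# save_snapshot(query_estats("192.0.2.10"), "estats-baseline.json")
```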
Inspect the intake mesh for obstruction: paper shreds, pet hair, packing material, insect debris. Anything blocking even 10% of the mesh creates a localized airflow dead zone, and the chips behind the obstruction flag as `PVT_T` outliers. Clean with a soft brush and re-test.
Compressor-blow the fin stack and fans. Use a real air compressor at 80-90 PSI with the miner off, and blow from the exhaust side back through the intake. Canned air does not move enough volume to clear a 1246 fin stack. Spin each fan blade by hand while blowing. Expect a visible dust cloud on pass one; do a second pass after 30 seconds to confirm the stack is clear.
Re-seat the flagged hashboard and inspect its ribbon and power connectors. Power off, wait 5 minutes. Label slots 0/1/2 before removal. Inspect IDC contacts for oxidation or blackening; wipe with 99% IPA on a lint-free wipe if needed. Reseat firmly until the latch clicks. A connector sitting even 0.5 mm short of fully seated feeds garbage thermal data to the MM3 controller.
Swap the flagged hashboard between slots. Move the flagged board to a different slot, power on, run 30 minutes, pull fresh `estats`. If `ECHU_128` follows the board, the board itself is the problem (paste, chip, or on-board NTC). If it stays in the original slot, the control path, AUC3 bus, or slot-specific wiring is the problem. This 20-minute diagnostic saves hours of guessing.
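The swap logic reduces to a small decision table. Here is a sketch of it in Python; the function name and return labels are made up for illustration, and the "cleared" and "inconclusive" branches are an interpretation rather than something the procedure spells out.

```python
def diagnose_swap(original_slot, new_slot, flagged_slots_after):
    """Interpret a slot swap of the flagged board.

    original_slot:       slot where the flag was first seen
    new_slot:            slot the flagged board was moved to
    flagged_slots_after: set of slots showing the flag after 30 minutes
    """
    in_new = new_slot in flagged_slots_after
    in_old = original_slot in flagged_slots_after
    if in_new and not in_old:
        return "board"         # flag followed the board: paste, chip, or NTC
    if in_old and not in_new:
        return "slot"          # flag stayed put: control path, AUC3 bus, wiring
    if not flagged_slots_after:
        return "cleared"       # possibly a seating issue fixed by the move
    return "inconclusive"      # both or unrelated slots flagged; re-test

print(diagnose_swap(0, 2, {2}))
```

Recording the outcome with the slot numbers keeps the result unambiguous if the case is later escalated.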
Replace the intake and exhaust fans if any are slow or grinding. 1246 stock fans run 6000-7000 RPM at full duty. A fan below 5500 RPM at 100% duty still reports `working` but produces a regional airflow deficit that shows up as `PVT_T` outliers on the chips directly behind it. Replace all fans at once — the survivors are almost as tired as the one you just replaced.
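As a quick check against the 5500 RPM floor, a few lines of Python can flag weak fans from readings taken at 100% duty. The readings below are invented, and the number of fans and their order in the API output are assumptions to verify on your own unit.

```python
def weak_fans(rpm_at_full_duty, floor=5500):
    """Return indices of fans spinning below `floor` RPM at 100% duty."""
    return [i for i, rpm in enumerate(rpm_at_full_duty) if rpm < floor]

# Hypothetical readings, one entry per fan.
print(weak_fans([6400, 6250, 5300, 6100]))
```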
Verify PSU rail under load. Multimeter on DC, probe at the PSU-to-AUC3 connector while the miner is hashing at full nameplate. Expect ~12.0 V input rail; below 11.7 V means the PSU is tired or the circuit is undersized. A sagging rail forces on-board voltage regulators to work harder, bleeding extra heat through the PMICs and raising nearby chip temps.
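For logging multimeter readings, the 11.7 V threshold works out to a sag of 2.5% from a 12.0 V nominal rail. A small sketch of that arithmetic follows; the function names are illustrative only.

```python
NOMINAL_V = 12.0   # expected input rail from the step above
MIN_OK_V = 11.7    # below this the PSU or circuit is suspect

def sag_percent(measured_v, nominal_v=NOMINAL_V):
    """Percentage the measured rail sits below nominal (negative if above)."""
    return (nominal_v - measured_v) / nominal_v * 100

def rail_status(measured_v):
    """Classify a reading taken under full hashing load."""
    return "ok" if measured_v >= MIN_OK_V else "sagging"

print(rail_status(11.62), round(sag_percent(11.62), 2))
```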
Full thermal paste refresh on the flagged board. Remove the fin stack (Phillips or Torx fasteners) and preserve or replace the Kapton-backed thermal pads on the PCH and voltage-domain ICs. Clean old paste with 99% IPA and lint-free wipes until no residue remains on die or heatsink. Apply Arctic MX-6 or Thermal Grizzly Kryonaut as a rice-grain blob per die and let mounting pressure spread it into a thin, uniform layer. Reassemble with even torque. This alone restores full thermal margin on the vast majority of 18-24 month old 1246 boards flagging `ECHU_128`.
Full thermal paste refresh on all three boards when diagnosis points to global paste degradation (all chains flagging). Don't do one board at a time — factory paste ages at roughly the same rate across the miner, and a single-board refresh means you'll be back in six months for the other two.
Replace crumbled thermal pads on PCH and voltage-domain ICs while the fin stack is off for paste refresh. Crumbled pads transfer heat poorly and raise nearby chip temps indirectly through the PCB. Match pad thickness with a caliper measurement before sourcing replacements — Canaan does not publish pad specs.
Reflow the repeat-offender chip only if Tier-3 paste refresh was clean and the same chip position is still flagging `ECHU_128` 2-4 weeks later. Preheat bottom at 150 C, top-side hot air at 310-330 C for ~30 s, natural cool-down, fresh paste on reassembly. The A3206 BGA tolerates one reflow well. A second reflow on the same chip within 90 days rarely holds — replace the chip instead.
Tune AUC3 IIC bus if Step 8 showed a slot-specific flag. Edit config to `--avalon7-aucspeed 200000` (down from 400000 default), leave `--avalon7-aucxdelay 19200`. A slower bus is more tolerant of marginal cables and NTC noise. Watch the log for `CODE_MMCRCFAILED`. If those events disappear and `ECHU_128` clears, you had an AUC3 bus-margin problem masquerading as a chip thermal fault.
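To judge whether the slower bus helped, compare how often `CODE_MMCRCFAILED` appears in comparable log windows before and after the change. Below is a minimal sketch that counts marker strings in log lines; the log line format shown is invented, and only the marker strings come from this article.

```python
def count_markers(lines, markers=("CODE_MMCRCFAILED",)):
    """Count how many lines contain each marker string."""
    counts = {marker: 0 for marker in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts

# Hypothetical log excerpts covering equal time windows.
before = [
    "12:00:01 AUC3 read CODE_MMCRCFAILED",
    "12:04:17 chip temp warn",
    "12:09:40 AUC3 read CODE_MMCRCFAILED",
]
after = ["14:00:03 chain 0 ok", "14:05:11 chain 1 ok"]

print(count_markers(before), count_markers(after))
```

Passing extra markers such as `"PVT_T abnormal"` or `"ECHU warn"` lets the same function track the thermal warnings alongside the CRC failures.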
Stop DIY and book a D-Central ASIC Repair slot when any of these are true:
- Same chip position flags after a clean paste refresh and one reflow cycle
- A reflowed chip re-flags within 30 days
- Visible capacitor bulging, cracked MLCCs, or burnt-component smell
- Thermal camera isolates a single chip running `15 C+` hotter than neighbours and paste refresh did not resolve it
- Slot-specific flag persists after AUC3 replacement
At that point it is bench territory: test fixture, chip-level tools, graded A3206 replacement stock. Book at d-central.tech/services/asic-repair/ (5-10 business day turnaround, Canada / US / international).
D-Central bench process on a 1246 `ECHU_128` case: test-fixture boot with programmable load, chip-by-chip thermal isolation via API `PVT_T` extraction plus IR thermal mapping, NTC and PMIC cross-validation. Single chip confirmed as root cause gets chip replacement with graded A3206 stock, proper reflow profile, and 24-hour burn-in at nameplate before return shipping. PMIC or voltage-domain root cause gets component-level repair with matching-spec parts, not a blanket board swap.
Ship safely. Hashboards in anti-static bags, double-boxed with `>=5 cm` of foam on every side. Include a physical note inside the box with: full `estats` baseline from Step 4, firmware version, install date, last paste-refresh date, ambient at install, and which Tier 1-3 steps you've completed. Every minute our bench spends reconstructing the fault history adds to the repair bill. A well-documented ship-in saves real dollars.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
