Avalon 1366 – Dead Chip Count Exceeded
Warning: should be addressed soon
Symptoms
- `cgminer-api` `stats` reports `ASIC count` below `120` on at least one of the three hashboards (e.g. `MM count: 110/120, board disabled`)
- Realized hashrate has dropped roughly in proportion to the number of missing chips (each `A3200CFA` contributes about `0.36 TH/s` of the board's `~43 TH/s` share)
- Boot log shows non-zero `ECHU` on a specific board, or the controller reports `ASIC_COUNT_LOW` and skips powering that hashboard
- Dashboard shows two boards green/active and one greyed out / marked `disabled`, even though the board is physically installed
- After a power cycle the missing-chip count is stable rather than flickering: the chips have failed outright and are not glitching transiently
- `MW0` array contains `0x0` entries clustered on adjacent chip indices (`A1`-`A3`, or `A19`-`A21`, etc.)
- A thermal camera or IR thermometer shows cold spots at near-ambient temperature on the running board while neighbouring chips run at `60-85 °C`
- `PVT_T` per-group temperature reads abnormally low or returns `-` / `--` for the affected group
- A group voltage rail is off-spec: the `1.8 V` (`VDDIO`) or `0.75 V` (`VTOP`) rail measures out of tolerance, or one rail reads `0 V`
- The rest of the miner is healthy: no fan errors, no PSU faults, no over-temperature; the issue is isolated to the chip count on a single board
- Chip count drift accelerated after a known event: overclock push, thermal excursion, power surge, or a prior repaste/rework
- If the board was previously repaired by a third party, a steady loss of about `1` chip per month since the repair points to grey-market chips of poor provenance
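The per-chip figure in the hashrate symptom is simple division. The same arithmetic shows how much hashrate a given chip count should cost. Below is a minimal Python sketch. It assumes the `120` chips and `~43 TH/s` per board quoted above and treats a disabled board as losing its whole share. The function name is hypothetical.

```python
# Figures quoted in the Symptoms list; adjust if your board differs.
CHIPS_PER_BOARD = 120
BOARD_SHARE_THS = 43.0  # approximate nameplate share per hashboard


def expected_loss_ths(alive_chips: int, board_disabled: bool = False) -> float:
    """Estimate the hashrate lost on one board, in TH/s.

    A disabled board contributes nothing, so its whole share is lost;
    otherwise the loss scales with the number of missing chips.
    """
    if not 0 <= alive_chips <= CHIPS_PER_BOARD:
        raise ValueError(f"alive_chips must be between 0 and {CHIPS_PER_BOARD}")
    if board_disabled:
        return BOARD_SHARE_THS
    missing = CHIPS_PER_BOARD - alive_chips
    return missing * BOARD_SHARE_THS / CHIPS_PER_BOARD


if __name__ == "__main__":
    print(f"per chip: {BOARD_SHARE_THS / CHIPS_PER_BOARD:.2f} TH/s")  # per chip: 0.36 TH/s
    print(f"110/120 hashing: {expected_loss_ths(110):.1f} TH/s lost")  # 110/120 hashing: 3.6 TH/s lost
    print(f"110/120 disabled: {expected_loss_ths(110, True):.1f} TH/s lost")  # 110/120 disabled: 43.0 TH/s lost
```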
Step-by-Step Fix
Hard power-cycle the miner: `60 seconds` off at the breaker, not a soft reboot. Some MM firmware builds latch a board-disable state across soft reboots but clear it on a cold boot. If the board comes back at `120/120`, you had a stuck firmware state, not a dead chip. Keep monitoring for `24 hours` to confirm it stays healthy before declaring victory.
Re-seat the AUC3 ribbon and the power loom on the affected hashboard. Power off at the breaker, unplug, remove the chassis lid, then disconnect and reconnect the AUC3 ribbon between the control board and the suspect hashboard. Inspect for bent pins, oxidation, or blackened contacts. Repeat for the power harness, and listen for the click as each connector seats.
Read `ASIC count` directly from `cgminer-api` rather than trusting the web dashboard. From any LAN machine: `echo -n '{"command":"stats"}' | nc <miner-ip> 4028`. Parse the `MM ID0/1/2` blocks for `ASIC count`. This is the ground truth; the dashboard can lag, cache, or misreport.
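The same query can be scripted so you can log counts over time. Below is a minimal Python sketch using only the standard library. It makes three assumptions:

- The reply is cgminer's usual JSON with a `STATS` list.
- Each `MM ID<n>` value is a string of `Name[value]` tokens.
- The field name is `ASIC count`, taken from the text above. It may differ on your firmware, so it is a parameter.

The helper names are hypothetical.

```python
import json
import re
import socket

# Splits "Name[value] Name[value] ..." into (name, value) pairs. The token
# layout is an assumption about how Avalon MM status strings are written.
TOKEN_RE = re.compile(r"\s*([^\[\]]+?)\[([^\]]*)\]")


def query_stats(host: str, port: int = 4028, timeout: float = 5.0) -> dict:
    """Send the JSON 'stats' command to the cgminer API and decode the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": "stats"}).encode())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the API closes the connection after replying
                break
            chunks.append(chunk)
    # cgminer ends its reply with a NUL byte; strip it before decoding.
    raw = b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")
    return json.loads(raw)


def parse_mm_block(mm_string: str) -> dict[str, str]:
    """Turn one 'MM ID<n>' status string into a {field: value} dict."""
    return {name.strip(): value.strip() for name, value in TOKEN_RE.findall(mm_string)}


def asic_counts(stats_reply: dict, field: str = "ASIC count") -> dict[str, str]:
    """Collect the chosen field from every 'MM ID<n>' block in a stats reply."""
    counts = {}
    for entry in stats_reply.get("STATS", []):
        for key, value in entry.items():
            if key.startswith("MM ID") and isinstance(value, str):
                fields = parse_mm_block(value)
                if field in fields:
                    counts[key] = fields[field]
    return counts
```

To use it, call `print(asic_counts(query_stats("192.168.1.50")))`, replacing the hypothetical address `192.168.1.50` with your miner's IP. Keeping the parsing separate from the network call lets you re-run the parser on saved replies when comparing baselines.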
Verify the intake ambient temperature at the front grille with an IR thermometer. Target `≤ 35 °C` at the front of the miner. Anything above that pushes `Tj` toward the `>100 °C` mortality zone for the `A3200CFA` and accelerates chip death across the fleet. Cleaning the filters and correcting the ambient temperature are the cheapest interventions available.
Check the Canaan firmware portal (`avalonminer.org/firmware-document/`) for the current MM image for your A1366 hardware revision. Some MM builds shipped with overly aggressive `0x0` nonce reporting that masquerades as chip mortality. Roll one MM version back or forward only after confirming the build matches your hardware revision; flashing the wrong MM image bricks the controller.
Measure the group voltage rails on the affected board with a multimeter under full hashing load. Probe each group's `1.8 V` (`VDDIO`) and `0.75 V` (`VTOP`) test points. Healthy: all 40 groups within `±5%` of nominal. If one group reads `0 V` or is grossly off-spec while its neighbours are clean, the PMIC for that group is dead, not the chips. The dead-chip count will likely return to zero once the regulator is replaced.
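The `±5%` rule can be applied mechanically to a table of readings. Below is a minimal Python sketch. It assumes you record one `VDDIO` and one `VTOP` reading per group. The dict layout, the function names, and the `0.05 V` cut-off used to call a rail "dead" are assumptions for illustration.

```python
NOMINAL_V = {"VDDIO": 1.8, "VTOP": 0.75}  # nominal rail voltages from the step above
TOLERANCE = 0.05                          # ±5% of nominal
DEAD_BELOW_V = 0.05                       # assumption: readings this low count as "0 V"


def classify_rail(rail: str, measured_v: float) -> str:
    """Return 'dead', 'off-spec', or 'ok' for one rail reading."""
    if measured_v < DEAD_BELOW_V:
        return "dead"
    nominal = NOMINAL_V[rail]
    if abs(measured_v - nominal) > TOLERANCE * nominal:
        return "off-spec"
    return "ok"


def suspect_groups(readings: dict[int, dict[str, float]]) -> list[tuple[int, str, str]]:
    """List (group, rail, status) for every rail reading that is not 'ok'."""
    findings = []
    for group in sorted(readings):
        for rail, volts in sorted(readings[group].items()):
            status = classify_rail(rail, volts)
            if status != "ok":
                findings.append((group, rail, status))
    return findings


if __name__ == "__main__":
    readings = {g: {"VDDIO": 1.80, "VTOP": 0.75} for g in range(1, 41)}
    readings[7]["VTOP"] = 0.0  # one group with a dead regulator
    print(suspect_groups(readings))  # [(7, 'VTOP', 'dead')]
```

A single flagged group among 39 clean ones is the pattern the step above attributes to a dead PMIC.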
Image the running board with a thermal camera after `5 minutes` of hashing. Boot at stock frequency, lift the chassis lid, and image the hashboard from above with a `FLIR ONE Pro` or equivalent. Dead chips appear as cold spots at ambient temperature while their neighbours run at `60-85 °C`. Photograph the heatmap and overlay it on a board layout to identify the failed positions. This distinguishes chip death from thermal-pad failure in `30 seconds`.
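If your thermal camera software exports spot temperatures, the cold-spot check can be scripted. Below is a minimal Python sketch. It assumes you have transcribed one temperature per chip in silkscreen order (`A1` first). The `10 °C` margin above ambient and the function name are assumptions, not Canaan figures.

```python
from statistics import median


def cold_positions(chip_temps_c: list[float], ambient_c: float,
                   margin_c: float = 10.0) -> list[str]:
    """Return silkscreen labels of chips running within margin_c of ambient.

    Refuses to judge a board whose median chip is below 60 °C, because the
    comparison only means something once the healthy chips are hashing.
    """
    if median(chip_temps_c) < 60.0:
        raise ValueError("board not at hashing temperature yet; image it again later")
    return [f"A{i}" for i, t in enumerate(chip_temps_c, start=1)
            if t - ambient_c <= margin_c]


if __name__ == "__main__":
    temps = [72.0] * 120
    temps[18:21] = [26.0, 27.0, 26.5]  # a cold cluster at A19-A21
    print(cold_positions(temps, ambient_c=25.0))  # ['A19', 'A20', 'A21']
```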
Re-paste the entire board. Power off and let it cool for `30 minutes`. Remove the heatsink, clean every chip with `99% IPA` and lint-free wipes, then apply `Arctic MX-6` or `Thermal Grizzly Kryonaut` in a thin, uniform layer. Replace any failed thermal pads on the PMICs with `1.0 mm`, `5-6 W/mK` pads. Reassemble with calibrated screw torque; uneven heatsink pressure is itself a cause of chip mortality.
Verify the PSU rail voltage under load at the hashboard input. Place the multimeter probes on the power-harness pins of the affected board while the miner is hashing at full power. Expect `≥ 13.8 V` sustained. PSU sag below that level stresses the chips and is one of the most common causes of multi-board chip-count drift. Swap in a known-good PSU and re-baseline.
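If you log several multimeter readings during the load run, the check can be scripted. Below is a minimal Python sketch. It reads "sustained" strictly: any single sample under `13.8 V` fails. That interpretation is an assumption. The function name and report fields are hypothetical.

```python
MIN_SUSTAINED_V = 13.8  # minimum from the step above


def psu_sag_report(samples_v: list[float], minimum_v: float = MIN_SUSTAINED_V) -> dict:
    """Summarize under-load voltage samples taken at the hashboard input."""
    if not samples_v:
        raise ValueError("no samples recorded")
    below = [v for v in samples_v if v < minimum_v]
    return {
        "min_v": min(samples_v),
        "mean_v": sum(samples_v) / len(samples_v),
        "samples_below": len(below),
        "ok": not below,  # a single dip below the minimum fails the check
    }
```

Reporting the minimum alongside the mean matters because a healthy-looking average can hide brief sag under peak load.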
Run for `24 hours` after the Tier 1-2 interventions, then re-query `cgminer-api` `stats` and compare against your baseline. A count that holds steady above the disable threshold means the bleeding has stopped. A count that is still dropping means silicon mortality is in progress; escalate to Tier 3 or 4.
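The 24-hour comparison reduces to a per-board rule. Below is a minimal Python sketch. It assumes you record integer `ASIC count` values per `MM ID` block at baseline and again after 24 hours. This guide does not state the disable threshold, so `min_enabled_count` is a required parameter you fill in for your firmware. The verdict strings are hypothetical. The "stable, below threshold" case is not covered by the step above and is added only so every input gets a verdict.

```python
def assess_boards(baseline: dict[str, int], after_24h: dict[str, int],
                  min_enabled_count: int) -> dict[str, str]:
    """Compare per-board ASIC counts taken 24 hours apart."""
    verdicts = {}
    for board, before in baseline.items():
        now = after_24h.get(board)
        if now is None:
            verdicts[board] = "not reported"  # board missing from the second query
        elif now < before:
            verdicts[board] = "dropping"      # silicon mortality in progress: escalate
        elif now >= min_enabled_count:
            verdicts[board] = "stable"        # bleeding stopped
        else:
            verdicts[board] = "stable, below threshold"  # not dropping, still disabled
    return verdicts


if __name__ == "__main__":
    print(assess_boards({"MM ID0": 120, "MM ID1": 117},
                        {"MM ID0": 120, "MM ID1": 116},
                        min_enabled_count=115))
    # {'MM ID0': 'stable', 'MM ID1': 'dropping'}
```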
Decode the `MW0`/`PVT_T` per-chip map for surgical targeting. Pull `MW0` via `cgminer-api`, identify the indices reading `0x0`, and cross-reference them with the board silkscreen positions `A1`-`A120`. Confirm against the photo from the thermal-camera step. You should now have a list of `1-12` specific chip positions to rework; mark them with a fine-tip silver Sharpie on the board edge.
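The `MW0` decoding can be sketched in Python. The sketch rests on two assumptions:

- The field is a space-separated list of per-chip values in silkscreen order (`A1` first), written as decimal or `0x` hex.
- Each of the 40 voltage groups covers three consecutive chip indices (`120 / 40`). This is inferred from the counts in this guide, not a documented mapping.

When all three chips of one group read zero, that group's rail is worth re-checking with the multimeter step before you rework any chips. Function names are hypothetical.

```python
CHIPS_PER_BOARD = 120
GROUPS_PER_BOARD = 40
# Assumption: each voltage group covers 3 consecutive chip indices.
CHIPS_PER_GROUP = CHIPS_PER_BOARD // GROUPS_PER_BOARD


def parse_mw(field: str) -> list[int]:
    """Parse a space-separated MW field; accepts decimal or 0x-prefixed hex."""
    values = []
    for token in field.split():
        base = 16 if token.lower().startswith("0x") else 10
        values.append(int(token, base))
    return values


def dead_positions(field: str) -> list[str]:
    """Silkscreen labels (A1..A120) of chips whose MW value is zero."""
    return [f"A{i}" for i, v in enumerate(parse_mw(field), start=1) if v == 0]


def fully_dead_groups(field: str) -> list[int]:
    """1-based group numbers in which every chip reads zero."""
    values = parse_mw(field)
    groups = []
    for g in range(len(values) // CHIPS_PER_GROUP):
        chunk = values[g * CHIPS_PER_GROUP:(g + 1) * CHIPS_PER_GROUP]
        if all(v == 0 for v in chunk):
            groups.append(g + 1)
    return groups


if __name__ == "__main__":
    mw0 = " ".join(["0x0"] * 3 + ["25"] * 15 + ["0"] + ["30"] * 101)
    print(dead_positions(mw0))     # ['A1', 'A2', 'A3', 'A19']
    print(fully_dead_groups(mw0))  # [1]
```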
Source replacement chips. The `A3200CFA` is the primary part; the `A3200CMA` and `A3200CMCV3` are documented as drop-in compatible on A1346/A1366 boards. Use new-old-stock or D-Central salvaged-grade chips of known provenance, and avoid grey-market lots, which historically show accelerated mortality. Order `1.5×` your dead-chip count to allow for rework losses.
Preheat the board and remove the dead chip. Set the bottom-side preheat platform to `~150 °C` and apply flux around the target chip's BGA. Apply top-side hot air at `300-330 °C` for `~30-45 seconds`, then lift the chip with vacuum tweezers; it should release cleanly. If it resists, wait `5 more seconds` and do not force it, or you will lift pads. Bottom-side preheat is non-negotiable on these boards.
Clean the pads, re-tin, and place the new chip. Wick residual solder with copper braid and clean the pads with `99% IPA`. Apply fresh solder paste or pre-tin the pads. Align the new `A3200CFA` to the silkscreen orientation marker at the asymmetric corner; get this wrong and the chip is dead on first power-up. Lower it into place. Apply top-side hot air at `310-330 °C` for `30 s`, steadying the chip lightly with tweezers without pressing it down.
Let the board cool naturally on an antistatic mat for at least `5 minutes`. Never blow cold air on a hot board, because thermal shock cracks adjacent joints. Clean off the flux with IPA. Re-paste all 120 chips, since you have disturbed the heatsink anyway. Bench-test on a universal Avalon test fixture if one is available, or reinstall the board and re-query `cgminer-api` `stats`. Confirm that `ASIC count` has recovered to `120/120` and that the previously dead positions now report non-`0x0` `MW0` values.
Stop DIY work when any of the following applies: `>3` chips are dead on one board; a PMIC or voltage regulator is suspected; you lack hot-air rework experience on `0.4 mm`-pitch BGA; a previous DIY rework left the board worse; or capacitor bulging or board discoloration is visible. D-Central runs Tier 4 chip-level rework on Avalon hashboards. Book a slot and bring photos of the `MW0` map and the thermal image; they save bench time and therefore money.
The D-Central bench process for the A1366 covers:

- universal Avalon test-fixture bench-up
- full per-chip enumeration via a `cgminer` debug build
- voltage-rail integrity checks across all 40 groups
- chip-level rework with new-old-stock or graded `A3200CFA`
- a full re-paste with high-end paste
- a `24-hour` post-repair burn-in at the nameplate `~43 TH/s` per board

Boards with `>12` dead chips are quoted as scrap-for-parts; boards with `1-3` dead chips are returned as-new.
Ship safely to Quebec. Put the affected hashboard(s) in anti-static bags and leave the rest of the miner intact unless instructed otherwise. Double-box with at least `5 cm` of foam on every side. Include a printed note with:

- the observed symptoms
- the MM firmware version
- the `cgminer-api` `stats` output
- your `MW0` map
- your contact info

Shipments from across Canada, the US, and internationally are accepted; typical return turnaround is `5-10 business days`.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
