Avalon 1166 Pro – Single Chip Failure
Warning — Should be addressed soon
Symptoms
- AvalonMiner status / `cgminer-api estats` reports `119 / 120 / 120` (or one chain at 119) for the three chains' chip counts
- Realized hashrate runs `0.5-0.7 TH/s` below nameplate and stays stable rather than drifting
- `GHSmm` (measured) trails `GHSavg` (average) by `~0.6 TH/s` consistently
- Affected chain's `MW` array shows one zero or near-zero entry while the other 119 cluster normally
- `cgminer-api stats` returns `chain_acn[X] = 119` for one chain
- `kern.log` shows isolated `asic X timeout` or `chip X not responding` lines that do not recover
- `PVT_T` temperature array shows one chip reporting `0`, `-1`, or a stuck-low value
- Chassis red LED is NOT lit — Canaan threshold logic considers this 'healthy'
- Hashrate drop happened suddenly (over hours), not gradually over weeks
- Pool-side rejected share rate is NOT elevated — head-count problem, not HW% problem
- Cold-start (60 s breaker off + reboot) does NOT restore the chip count to `120`
- Thermal imaging shows one chip on the affected board running notably colder OR hotter than its domain mates
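The chip-count and temperature symptoms above can be checked mechanically once the values are in hand. This is a minimal sketch, assuming the per-chain chip counts and one chain's `PVT_T` readings have already been extracted into plain Python lists. The function names and the 20 °C "stuck-low" margin are illustrative assumptions, not part of any Canaan tool.

```python
from statistics import median

EXPECTED_CHIPS = 120  # chips per chain, per the symptom list above


def short_chains(chip_counts, expected=EXPECTED_CHIPS):
    """Return {chain_index: missing_chips} for chains below the expected count."""
    return {i: expected - n for i, n in enumerate(chip_counts) if n < expected}


def suspect_sensors(pvt_t, stuck_low_margin=20):
    """Return indices of chips reporting 0, -1, or a value far below the median.

    stuck_low_margin (degrees C below the chain median) is an assumed threshold.
    """
    valid = [t for t in pvt_t if t > 0]
    if not valid:
        # No sane readings at all: every position is suspect.
        return list(range(len(pvt_t)))
    mid = median(valid)
    return [i for i, t in enumerate(pvt_t) if t <= 0 or t < mid - stuck_low_margin]
```

For example, `short_chains([119, 120, 120])` flags chain 0 as one chip short, which matches the `119 / 120 / 120` pattern described above.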
Step-by-Step Fix
Hard power-cycle at the breaker for 60 seconds, then boot. Wait 20 minutes for thermal steady state before re-checking chip count via `cgminer-api stats`. Real chip failures don't come back; transient bus or brownout events do. This single step clears about 1-in-12 phantom 'dead chip' reports we see in the queue, costs nothing, and rules out wedged firmware state before you commit to a teardown.
Capture the diagnostic baseline. SSH or use the AvalonMiner web UI to record per-chain `chain_acn`, per-chain `PVT_T` array, `GHSmm` vs `GHSavg`, MM3 firmware version (`MM ID0 Ver`), and a 30-minute `kern.log` snapshot. Save it to a file. If you ship the unit to D-Central, this snapshot saves us 1-2 hours of bench diagnosis time and saves you that on the invoice.
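The API portion of that baseline can be captured with a short script. This is a sketch under assumptions: that the miner's cgminer API is reachable from your machine on the cgminer default port `4028`, and that it accepts the `estats` command mentioned in the symptom list. `query_cgminer` and `save_snapshot` are hypothetical helper names. The script stores raw replies only and does not collect `kern.log`, which still has to be copied separately.

```python
import json
import socket
import time


def query_cgminer(host, command, port=4028, timeout=10.0):
    """Send one JSON command to the cgminer API and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the API closes the connection after one reply
                break
            chunks.append(data)
    # Replies end with a NUL byte; strip it before storing.
    return b"".join(chunks).decode("utf-8", errors="replace").rstrip("\x00")


def save_snapshot(host, path, port=4028,
                  commands=("version", "summary", "stats", "estats")):
    """Write the raw reply for each command to one timestamped JSON file."""
    snapshot = {"host": host, "captured_at": time.strftime("%Y-%m-%dT%H:%M:%S%z")}
    for command in commands:
        snapshot[command] = query_cgminer(host, command, port=port)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot
```

Keeping the replies raw rather than parsing them avoids baking in assumptions about field layout, which can differ between firmware builds.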
Verify ambient and intake conditions. IR thermometer at the front grille — target `≤ 30 °C`, hard-limit `≤ 35 °C`. A 1166 Pro running marginal-thermal will drop chips intermittently as ambient creeps up. If your 'dead chip' only appears in the afternoon and recovers overnight, you have a cooling problem, not a chip failure. Fix airflow before reaching for a soldering iron.
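The afternoon-only pattern described above is easy to test against a log of chip-count readings. This is a rough sketch, assuming you have collected `(hour_of_day, chip_count)` samples yourself. The 12:00-18:59 "hot hours" window is an assumption to adjust for your site, and both function names are illustrative.

```python
def intake_status(temp_c):
    """Classify an intake reading against the 30 C target and 35 C hard limit."""
    if temp_c <= 30:
        return "ok"
    if temp_c <= 35:
        return "marginal"
    return "over limit"


def drops_follow_time_of_day(samples, expected=120, hot_hours=range(12, 19)):
    """Return True if chip drops occur only in hot_hours and clear outside them.

    samples: iterable of (hour_of_day, chip_count) tuples.
    """
    samples = list(samples)
    short_hours = [h for h, n in samples if n < expected]
    full_outside = any(n >= expected and h not in hot_hours for h, n in samples)
    return (
        bool(short_hours)
        and all(h in hot_hours for h in short_hours)
        and full_outside
    )
```

A `True` result points toward cooling rather than silicon; a `False` result does not prove a dead chip, it only means this particular pattern is absent.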
Check Canaan's MM3 firmware portal and AvalonMiner community channels for known-issue advisories on your specific hardware revision. If a firmware build is known to drop chips spuriously and you're on it, roll one version forward or back before committing to hardware repair. Document your `MM ID0 Ver` string before flashing — roll-back capability matters.
Multimeter on DC at the PSU-to-hashboard connector under full hashing load. Expect `~12.0 V`, accept `11.8-12.2 V`, reject below `11.6 V` sustained. Probe each of the three hashboard inputs separately — a partially failing PSU can deliver clean rail to two boards and starved rail to a third. If sag is present, swap the PSU with a known-good unit and re-check chip count after 30 minutes of stable mining.
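The accept/reject bands above can be written down as a small lookup so that all three boards are judged the same way. This is a sketch: the step gives no verdict for readings between `11.6 V` and `11.8 V` or above `12.2 V`, so labelling them "investigate" is an assumption, as are the function names.

```python
def classify_rail(volts):
    """Classify one sustained under-load reading at a hashboard input."""
    if volts < 11.6:
        return "reject"
    if 11.8 <= volts <= 12.2:
        return "accept"
    # 11.6-11.8 V or above 12.2 V: outside the accept band but not a
    # stated reject, so flag for a closer look (assumed handling).
    return "investigate"


def boards_needing_attention(readings):
    """readings: {board_label: volts}. Return only the boards not in 'accept'."""
    verdicts = {board: classify_rail(v) for board, v in readings.items()}
    return {board: verdict for board, verdict in verdicts.items() if verdict != "accept"}
```

With made-up readings `{"0": 12.0, "1": 12.1, "2": 11.5}`, only board `2` is returned, which is the "two clean rails, one starved" case the step warns about.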
Re-seat every connector on the affected hashboard. Power off at the breaker. Disconnect data ribbon and power lead. Visually inspect contacts under bright light for blackening, oxidation, bent pins, dust. Reseat firmly until you hear/feel the click. Apply DeoxIT D5 to IDC headers if you're in a humid environment. Boot, wait 20 min, re-check chip count. About 8% of 'dead chip' tickets are loose ribbon contacts faking a chip drop.
Swap hashboards between slots to localize the fault. Label the three slots `0/1/2` with tape. Move the suspect board (the `119`-chip one) to a known-good slot. Boot, monitor 20 min. If `119` follows the board → board-side fault, continue to Tier 3. If `119` stays in the original slot → control-board / cable / connector fault on the chassis side, which is a different repair path entirely.
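The swap test reduces to a two-input decision. This is a sketch of that logic, assuming you read the chip count reported by the slot that now holds the suspect board and by the original slot that now holds a known-good board. The two outcomes the step does not cover (both full, both short) are handled with assumed labels.

```python
def localize_fault(count_new_slot, count_original_slot, expected=120):
    """Interpret chip counts after moving the suspect board to a known-good slot.

    count_new_slot: count reported by the slot now holding the suspect board.
    count_original_slot: count reported by the original slot, now holding
    a known-good board.
    """
    board_short = count_new_slot < expected
    slot_short = count_original_slot < expected
    if board_short and not slot_short:
        return "board-side fault"
    if slot_short and not board_short:
        return "chassis-side fault (control board / cable / connector)"
    if not board_short and not slot_short:
        # Assumed reading: the reseat during the swap cleared a contact problem.
        return "fault cleared by swap; suspect a connector and keep monitoring"
    return "inconclusive: both positions short"
```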
Verify line voltage at the panel under load. On 240 V split-phase expect `235-245 V`; on 208 V commercial expect `202-212 V`; on 220 V European expect `215-235 V`. Low line voltage forces the PSU to pull more current, producing ripple that knocks marginal chips off the bus. If line voltage sags at the same time of day daily, you have an electrical-supply problem and the chip will keep dropping until you fix it.
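The expected ranges and the "lower voltage, higher current" relationship can be sketched as follows. The current estimate uses the simple relation I = P / V and ignores power factor and any change in PSU efficiency, so treat it as an illustration rather than a measurement. The service labels and function names are made up for this example.

```python
# Expected under-load line voltage per service type, from the step above.
LINE_RANGES = {
    "240": (235, 245),  # 240 V split-phase
    "208": (202, 212),  # 208 V commercial
    "220": (215, 235),  # 220 V European
}


def line_voltage_in_range(service, measured_v):
    """Return True if the measured voltage sits inside the expected band."""
    low, high = LINE_RANGES[service]
    return low <= measured_v <= high


def input_current_a(power_w, line_v):
    """Approximate input current for a constant-power load (I = P / V)."""
    return power_w / line_v
```

For a fixed load, `input_current_a(2400, 220)` is larger than `input_current_a(2400, 240)`, which is the extra current draw the step attributes to a sagging line.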
Thermal-image the affected hashboard under load. Open the chassis carefully (boards run `~350 W` each; relocate fans to maintain airflow if you'll leave it open longer than 5 min). Bring the miner to thermal steady state (~10 min). Sweep a FLIR-class thermal camera across the chip side. Cold spot = unpowered/unresponsive (silicon or rail-feed dead). Hot spot = internal short. Thermally normal = bus or BGA-joint failure. Note the chip position number from the silkscreen.
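The three thermal signatures above map onto a simple comparison against the chip's domain mates. This is a sketch: the step does not say how large a difference counts as a cold or hot spot, so the 10 °C margin is an assumption to tune against your own camera and board.

```python
from statistics import median


def interpret_thermal(chip_c, domain_mates_c, margin_c=10.0):
    """Compare one chip's surface temperature with its domain mates.

    margin_c is an assumed threshold for 'notably' colder or hotter.
    """
    baseline = median(domain_mates_c)
    if chip_c <= baseline - margin_c:
        return "cold: unpowered/unresponsive (silicon or rail feed dead)"
    if chip_c >= baseline + margin_c:
        return "hot: internal short"
    return "normal: bus or BGA-joint failure"
```

The median is used rather than the mean so that a second abnormal chip in the same domain does not drag the baseline.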
Probe the suspect chip's voltage domain. Identify which domain the dead chip belongs to (use D-Central's working domain map as a starting point and verify it against your specific hashboard revision). Probe Vcore at the domain's test point under load. If Vcore is `5-15%` below the other domains, you have a domain power problem. Inspect MLCCs, bulk caps, and MOSFETs in that domain under a microscope. Replace failing passives BEFORE replacing the chip: replacing the chip on a starved domain just kills the new chip in 2-8 weeks.
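The `5-15%` comparison is a percentage deficit against the healthy domains. This is a minimal sketch, assuming you have one reading for the suspect domain and a handful from the others. Using the mean of the other domains as the reference is an assumption, and the function names are illustrative.

```python
from statistics import mean


def vcore_deficit_pct(suspect_v, other_domain_vs):
    """Return how far the suspect domain sits below the others, in percent."""
    reference = mean(other_domain_vs)
    return (reference - suspect_v) / reference * 100.0


def domain_starved(suspect_v, other_domain_vs, threshold_pct=5.0):
    """True if the deficit reaches the 5% lower bound named in the step."""
    return vcore_deficit_pct(suspect_v, other_domain_vs) >= threshold_pct
```

With made-up readings, a suspect domain at `0.27` against three domains at `0.30` works out to a deficit of about 10%, inside the window that indicates a domain power problem.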
Reflow the suspect chip as a last low-cost attempt before chip replacement. Remove heatsink. Apply flux around the BGA perimeter. Preheat the hashboard from underneath to `~150 °C` (preheat plate or IR preheater). Top-side hot air at `320-340 °C`, `~10 mm` standoff, slow circular motion for `30-45 s` until you see the solder reflow and settle. Let the board cool naturally on the preheater (don't pull it off early or force-cool it; rapid cooling cracks joints). Re-paste, reassemble, retest. About `15-20%` of 1166 Pro 'dead chip' boards come back with a reflow alone, which points to a cracked BGA joint rather than dead silicon.
Replace thermal paste and pads on the entire hashboard while you have it open. Arctic MX-6 or Thermal Grizzly Kryonaut on the chips. Replace any thermal pads on PCH/voltage-domain ICs that look glassy, hardened, or compressed and no longer rebounding. Dried paste on a 1166 Pro pushes Tj on the worst-positioned chips up by `8-12 °C`, putting borderline chips into the failure window. If you're already at the bench, do all 120 chips on the affected board.
Stop DIY when: (a) you completed the Tier 3 reflow but the chip still reads dead, (b) you found voltage-domain damage requiring component-level repair beyond cap/MOSFET swap, (c) two boards in the same rig developed dead chips within 60 days (systemic, not isolated silicon), or (d) you don't have hot-air rework, microscope, and BGA experience. The next step is chip replacement, a `$120-220 CAD` bench job at D-Central. Don't learn BGA rework on a `$400` hashboard.
D-Central bench process: test fixture with programmable load to confirm dead-chip diagnosis; BGA reball station for joint-only failures; salvaged-grade A3206 chips from end-of-life 1166 Pro inventory; hot-air rework with bottom-side preheat profiled to A3206 specs; post-repair `24-hour` burn-in at nameplate frequency; full board re-paste and re-pad while open. Turnaround: `5-10 business days` from receipt.
Forward-looking note: D-Central is exploring DCENT_OS support on Canaan/Avalon hardware as part of our open-source firmware roadmap. Currently DCENT_OS ships only on Antminer (Bitmain) silicon and is NOT an available fix for 1166 Pro today. When/if Canaan support lands, per-chip diagnostics will become a standard SSH endpoint instead of requiring CGMiner API parsing. Watch d-central.tech/dcent-os for roadmap updates. Until then: stock Canaan MM3 firmware and `cgminer-api` for diagnostics.
Ship safely. Pack the entire miner if practical, otherwise pull the affected board only and pack it in an anti-static bag, double-boxed with `≥5 cm` of foam on every side. Include the diagnostic snapshot from Tier 1: chip-count history, MM3 firmware version, suspect chip position number, observed PSU voltage under load, and your contact info. This saves us bench time, which directly reduces your invoice.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
