Avalon 1266 – Hashboard Failure
Critical — Immediate action required
Symptoms
- Canaan dashboard or AUC3 Web UI reports **`board_count < 3`** on the 1266, or a specific chain shows as "failed" / "not detected"
- `cgminer` JSON API at port `4028` returns a `SYSTEMSTATU` object where one or more of `MW0 / MW1 / MW2` is empty, all-zero, or absent
- `ECHU[0]`, `ECHU[1]`, or `ECHU[2]` is **non-zero and climbing** — the MM is flagging hashboard error-correction trips on a specific chain
- `ECMM` is non-zero — module-management layer is reporting a fault that stock firmware won't articulate further
- Realized hashrate is **`~66%` of nameplate** (one board down) or **`~33%` of nameplate** (two boards down) on the 1266
- Red status LED **sustained** on the front panel — one of the seven overloaded "red LED means something" faults per the [Canaan 721-841 service PDF](https://canaan-io-static.s3.amazonaws.com) (`Toohot / Loopback failed / PG failed / Core test failed / Voltage error / Temp sensor error / No fan`)
- One hashboard's `PVT_T0..2` temperatures run `10°C+` hotter than the other two at the same fan duty
- One or more specific chip positions drop out of `PVT_T` / `PVT_V` arrays while the rest of the chain still reports — a dead or marginal chip
- `GHSmm` and `GHSavg` diverge by more than `5%` sustained — the chip array is returning fewer good nonces than the theoretical curve predicts
- PSU fan runs normally but DC output at the board-side harness sags under three-board load (passes on one- or two-board load)
- Miner was recently moved, serviced, had a hashboard or PSU swapped, or had thermal paste refreshed in the last `30` days
- Miner was recently exposed to a power surge, brownout, lightning event, or utility switching transient
- Web UI loads and the AUC3 IP is reachable, but one Chain Status panel is empty / shows placeholder dashes
Step-by-Step Fix
**Step 1.** Hard-power-cycle at the PDU / breaker for a full `60` seconds. Not a Web UI reboot — a true AC disconnect. Wait the full minute for PSU bulk caps to fully discharge and MM state to flush, then power back up and watch the dashboard for `3/3` to stabilize. This is the cheapest diagnostic on any 1266 chassis and recovers a meaningful fraction of `CHAIN_FAIL` tickets caused by wedged MM state after a brownout, network hiccup, or firmware restart. While the chassis sits dark, note any red LEDs, unusual PSU fan behaviour, or burnt smell — those observations steer the rest of the triage.
**Step 2.** Confirm network path to the AUC3. Browse to `http://<miner-ip>/` — does the Web UI load? `ping` the IP from a laptop on the same subnet. If the AUC3 is unreachable, the failure category is not "hashboard failure" — it's an AUC3 / network problem, and the correct playbook is [Avalon - AUC USB Connection Lost](https://d-central.tech/asic-troubleshooting/avalon-auc-usb-connection-lost/) or [Avalon - AUC Controller Network Loss](https://d-central.tech/asic-troubleshooting/avalon-auc-controller-network-loss/). Confirm the control board before chasing hashboards.
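If you're triaging more than one unit, a quick port probe separates "AUC3 / network problem" from "hashboard problem" without opening a browser for each machine. A minimal Python sketch, assuming only the stock ports (`80` for the Web UI, `4028` for the cgminer API); the IP address is a placeholder:

```python
# Quick reachability probe: tells you whether you're in "AUC3 / network problem"
# or "hashboard problem" territory. MINER_IP is a placeholder -- set your own.
import socket

MINER_IP = "192.168.1.50"  # assumption: replace with your miner's address

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

web_ui = port_open(MINER_IP, 80)    # AUC3 Web UI
api = port_open(MINER_IP, 4028)     # cgminer API

if not web_ui:
    print("Web UI unreachable: follow the AUC3 / network playbook, not this one")
elif not api:
    print("Web UI answers but port 4028 is closed: cgminer API not reachable")
else:
    print("Control path looks alive: continue with the hashboard triage below")
```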
**Step 3.** Capture the `cgminer` API snapshot before touching anything. The API on port `4028` is a raw TCP socket, not HTTP, so query it with netcat: `echo -n '{"command":"estats"}' | nc <miner-ip> 4028 > snapshot-before-estats.json`. Repeat with `{"command":"stats"}`. This single snapshot captures `PS[0..2]`, `ECHU[0..2]`, `ECMM`, `MW0..2`, `PVT_T0..2`, `PVT_V0..2`, `GHSmm`, `GHSavg`, and `SYSTEMSTATU` — everything a D-Central bench needs to reproduce your fault when the chassis arrives. If you're planning to ship, capture this before you start diagnostics and include it with the shipment; you'll save us hours of reproduction time and save yourself repair dollars.
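If you'd rather script the capture than pipe netcat by hand, the sketch below does the same thing over the raw API socket. It assumes stock firmware with the JSON-style API enabled on port `4028`; the miner IP is a placeholder, and the output filenames match what the later steps read:

```python
# Snapshot capture over the cgminer API's raw TCP socket (port 4028).
# Saves the raw replies untouched, because field layout varies by MM firmware.
import json
import socket

MINER_IP = "192.168.1.50"  # assumption: replace with your miner's address

def api_call(command: str, host: str = MINER_IP, port: int = 4028) -> str:
    """Send one JSON command to the cgminer API and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(json.dumps({"command": command}).encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # cgminer closes the socket after replying
                break
            chunks.append(data)
    # Replies are NUL-terminated; strip the terminator before saving
    return b"".join(chunks).rstrip(b"\x00").decode(errors="replace")

for cmd in ("estats", "stats"):
    with open(f"snapshot-before-{cmd}.json", "w") as fh:
        fh.write(api_call(cmd))
    print(f"saved snapshot-before-{cmd}.json")
```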
**Step 4.** Inspect for physical damage and capture recent service history. Walk around the miner: burnt smell, discoloured PSU housing, melted output harness, visible capacitor bulging, burn marks near the control board? Was the miner moved, opened, had a hashboard or PSU swapped, or had paste refreshed in the last `30` days? Service history is the single highest-value piece of context — a `CHAIN_FAIL` right after a paste refresh almost always points to a reversed power sequence or a loose ribbon; a failure out of the blue after months of stable operation almost always points to PSU wear or a feeder event.
**Step 5.** Confirm intake ambient and airflow. Intake `≤35°C` per the [A1066 manual](https://avalonminer.org/firmware-document/) — applies to the 1266 as well, same sensor set. An overheated PSU can cut output to one rail and present as a failed chain even though this isn't a classic "overheat" error. Canadian summer garages and any ducted setup with bad return-air flow hit `40°C+` at intake and kill PSUs long before hashboards. Measure at the intake grille with an IR thermometer, not at room-middle.
**Step 6.** DMM-verify PSU output at the control-board harness under load. Disconnect the PSU-to-control-board harness. Multimeter on DC: probe open-circuit and expect `12.0 – 12.6V`. Reconnect firmly, power up, and re-probe during the boot window and during steady-state hashing (first `5` minutes of a fully-loaded hash). Expect `≥11.8V` sustained under full three-board load. A PSU that reads `12.4V` open and collapses to `9V` under load is tired regardless of hours on it. Multimeter leads are the cheapest Tier-2 diagnostic and they resolve the PSU fault class definitively without opening any boards.
**Step 7.** Swap to a known-good `1266`-compatible Canaan PSU. Use a confirmed-working PSU from the `A1056 / A1066 / A1066 Pro / A1126 Pro-S / A1146 Pro / A1166 Pro / A1246 / A1266` family — all share the same connector and output spec. Do NOT attempt to cross-connect a Bitmain `APW9` / `APW12` or similar PSU; Canaan and Bitmain pinouts differ, and a wrong-brand PSU is a documented failure mode that can destroy the hashboards outright. Power up and observe `3/3` on the dashboard within the first `90` seconds.
**Step 8.** Re-seat every ribbon, harness, and signal cable in the chassis. Kill AC. Open the chassis. Unplug every control-board-to-hashboard ribbon and the PSU-to-control harness. Inspect each connector under bright light for blackening, corrosion, green oxide, or bent / recessed pins. Clean contact surfaces with `99%` isopropyl alcohol and a lint-free swab. Re-seat firmly — feel and hear each click. Bad connectors are a statistically large contributor to this chassis's `CHAIN_FAIL` reports because of the vibration profile from the stock fans running at high duty in warm ambients.
**Step 9.** Cable-tie the AUC3 ribbon and PSU harness to the chassis frame. If Step 7 or Step 8 resolved the fault and you found wear / corrosion at a connector, secure the affected cable to the frame with a zip-tie so chassis-fan vibration can't work the connector loose again. [Zeus Mining documents this on the 1246](https://www.zeusbtc.com/articles/asic-miner-troubleshooting/2634-how-to-troubleshoot-the-avalon-a1246) and the fix applies identically on the 1266: the internal IIC ribbon or external AUC USB drifts under vibration until contact is marginal and enumeration fails intermittently.
**Step 10.** Isolate one hashboard at a time by unplugging the other two. Power down. Disconnect the ribbons for hashboards `1` and `2`. Power up. Read the dashboard: does board `0` enumerate alone (`board_count = 1`)? Repeat for `1` and `2`. All three pass alone but fail together = PSU can carry single- or dual-board current but sags under three-board load; replace PSU. One specific board fails alone = that board has localized `U1/U2/R8/R9` damage, a dead chip, or a PMIC short — continue to Tier 3 or 4. None enumerate alone = control-board / AUC3 is the fault — Tier 3 or 4.
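For reference, here is the decision table from this step encoded as a tiny helper, useful if you're logging isolation results across several machines. It's purely illustrative; the function name and wording are mine, but the three outcomes are exactly the ones described above:

```python
# The decision table from this step, encoded so each solo-board test result
# maps unambiguously to a next action. Inputs are what you saw on the dashboard.

def isolate_diagnosis(solo_results: dict[int, bool]) -> str:
    """solo_results maps board index (0-2) to True if it enumerated alone."""
    passed = [board for board, ok in solo_results.items() if ok]
    if len(passed) == len(solo_results):
        return "All boards pass alone: PSU sags under three-board load; replace PSU"
    if not passed:
        return "No board enumerates alone: control board / AUC3 fault; Tier 3 or 4"
    failed = [board for board in solo_results if board not in passed]
    return (f"Board(s) {failed} fail alone: localized U1/U2/R8/R9 damage, "
            f"a dead chip, or a PMIC short; Tier 3 or 4")

print(isolate_diagnosis({0: True, 1: True, 2: False}))
```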
**Step 11.** Swap the AUC3 IIC ribbon if you have a spare. If all PSU checks pass, one specific board consistently fails to enumerate alone, and the serial log shows `MMCRCFAILED`, the IIC ribbon on that path is the next cheapest suspect. A cracked conductor inside the ribbon's flex section doesn't show on visual inspection — the only way to isolate is to swap. Spare ribbons are cheap from parts suppliers or can be pulled from a parts-donor `1246 / 1166 Pro / 1266` chassis.
**Step 12.** Re-flash the factory MM firmware via the AUC3 Web UI if you suspect firmware corruption. Only proceed if the control board is reachable and the failure followed a recent firmware event. VERIFY the image is for `A1266` specifically — cross-flashing a `1246` or `1166 Pro` image onto a `1266` will brick the control board because Canaan signature-checks MM firmware per-model. Rollback to an earlier image is blocked, so only flash the current correct image or newer. See [Avalon - Firmware Flash via AUC](https://d-central.tech/asic-troubleshooting/avalon-firmware-flash-via-auc/) for the full procedure; don't interrupt the flash once it has started.
**Step 13.** Capture the control-board serial log at boot via USB-TTL. Connect a USB-TTL adapter (FT232 / CH340 / CP2102) to the control board's UART header — location varies by AUC3 revision, check the silkscreen. At the documented baud rate, capture the full boot log during a cold start. MM boot messages, `MMCRCFAILED` errors, IIC init failures, and per-chain hashboard-handshake attempts all appear here. This is the single highest-value diagnostic on a `1266` where the Web UI is alive but chains are silent — the serial log tells you which of the four failure categories (PSU comm / IIC bus / per-board power / chip-level) is active. A persistent `MMCRCFAILED` on every boot points squarely at AUC3 IIC integrity — either `aucspeed / aucxdelay` tuning or a hardware bus fault.
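A capture sketch using `pyserial` (`pip install pyserial`), if you'd rather end up with a log file than a terminal-emulator scrollback. The port name and the `115200` rate are assumptions; confirm the rate Canaan documents for your AUC3 revision before trusting the output:

```python
# Boot-log capture via pyserial. PORT and BAUD are assumptions; pyserial's
# default framing is 8N1. Start the capture, then cold-start the miner.
import sys
import serial  # third-party: pip install pyserial

PORT = "/dev/ttyUSB0"  # assumption: first USB-TTL adapter on a Linux bench
BAUD = 115200          # assumption: verify the documented rate for your AUC3

with serial.Serial(PORT, BAUD, timeout=1) as uart, open("boot-log.txt", "wb") as log:
    print("Capturing -- cold-start the miner now, Ctrl-C to stop")
    try:
        while True:
            data = uart.read(4096)   # returns b"" when the 1 s timeout expires
            if data:
                log.write(data)
                sys.stdout.buffer.write(data)
                sys.stdout.flush()
    except KeyboardInterrupt:
        print("\nSaved boot-log.txt -- grep it for MMCRCFAILED and IIC errors")
```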
**Step 14.** DMM diode-mode check on `U1`, `U2`, `R8`, `R9` on the failing board. Remove the board. Probe `U1` and `U2` in diode mode; measure `R8` / `R9` resistance against known-good values from the board's silkscreen or a donor board. Per the [Zeus Mining A11/A12 repair guide](https://www.zeusbtc.com/manuals/4848-avalon-a11-a12-series-hash-board-repair-guide), these are the primary victims of a reversed install sequence — connecting signal or positive before negative burns them instantly. Burnt `U1/U2` reads dead-short or open in diode mode; burnt `R8/R9` reads open or way off nominal. All four parts are cheap but require hot-air rework and surface-mount soldering to replace. If you find damage on a single board, replace and re-test. Identical damage on multiple boards = feeder event took them all out simultaneously; ship the whole chassis rather than shotgun-replacing.
**Step 15.** Scope the `12V` rail at the hashboard input during boot. A `50 MHz` handheld scope captures the rail-up ramp during the MM power sequence. Healthy: clean step from `0V` to `12V` in tens of milliseconds, flat at `12V` thereafter with `<200 mV` ripple. Damaged PSU: oscillation, overshoot, or incomplete ramp. Damaged per-board power sequence: rail comes up and immediately collapses as `U1/U2` fail to latch. This single scope capture distinguishes PSU faults from per-board faults more reliably than DMM averaging, which smooths out transient events.
**Step 16.** Chip-level `PVT_T / PVT_V` analysis + targeted reflow. With the failing board isolated and hashing solo, query `PVT_T0..2` and `PVT_V0..2`. Identify the `1 – 2` worst chip positions by temperature spread, voltage offset, or dropout. Remove the heatsink, apply flux, reflow with preheat-plus-hot-air (preheat bottom to `~150°C`, top-side hot air at `310 – 330°C` for `~30s`). The `A3206`-class BGA tolerates a reflow cycle well. Let it cool naturally, re-paste with Arctic MX-6 or Thermal Grizzly Kryonaut, reassemble, re-test. If `PVT` data returns clean, you've rescued a `~$300+` hashboard with a `$15` pass of flux and paste.
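Here is a starting point for the chip-position analysis, assuming you saved the `estats` reply from Step 3. Avalon MM firmware packs per-chain data into `KEY[values]` brackets inside the stats string; exact field names can shift between firmware builds, so treat this as a template rather than a parser to trust blindly:

```python
# Outlier-chip sketch: pull PVT_T0..2 out of the saved estats reply and flag
# chip positions that are dropped or sit well off the board's own median.
import math
import re
import statistics

blob = open("snapshot-before-estats.json", errors="replace").read()

for board in range(3):
    m = re.search(rf"PVT_T{board}\[([^\]]*)\]", blob)
    if not m:
        print(f"board {board}: no PVT_T{board} field -- chain silent or not reporting")
        continue
    temps = []
    for token in m.group(1).split():
        try:
            temps.append(float(token))
        except ValueError:
            temps.append(math.nan)   # dropped-out chip position (see Symptoms)
    readable = [t for t in temps if not math.isnan(t)]
    if not readable:
        print(f"board {board}: PVT_T{board} present but empty or unparsable")
        continue
    median = statistics.median(readable)
    suspects = [(i, t) for i, t in enumerate(temps)
                if math.isnan(t) or abs(t - median) > 10.0]
    print(f"board {board}: {len(temps)} positions, median {median:.0f}C, "
          f"suspects (dropped or >10C off median): {suspects or 'none'}")
```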
**Step 17.** Tune `--avalon7-aucspeed` and `--avalon7-aucxdelay` via CGMiner. If Step 13 serial logs show intermittent `MMCRCFAILED` and a ribbon swap didn't fully resolve it, halve `aucspeed` to `200000` or double `aucxdelay` to `38400` via the control board's CGMiner command line. Retest. This tuning is undocumented in Canaan materials and lives only in the cgminer source — a hallmark of the community's A-series diagnostic lore. Not a permanent fix on a failing ribbon, but it buys `30 – 90 days` while the replacement parts ship.
**Step 18.** Inspect the control-board surge-input MOSFET and `12V → 3.3V` chain. If the control board shows no LED activity at power-up with a known-good PSU, a surge event has likely taken out the input-side MOSFET or a downstream regulator. Under magnification, look for hairline cracks in MOSFET packages, discolouration of solder pads, or cooked silkscreen. Replacement is a hot-air rework job. This fault typically takes out the control board alone while PSU and hashboards remain healthy — isolation by swapping in a parts-donor AUC3 is straightforward if you have one.
**Step 19.** Stop DIY when multiple hashboards show identical `U1/U2/R8/R9` damage. Three boards with the same burnt MOSFET set means a feeder event (surge / brownout recovery spike / lightning-induced transient) simultaneously damaged all three. That's a full-chassis bench rebuild: three-board component-level repair, PSU replacement, control-board surge-input-MOSFET replacement, and `24-hour` nameplate burn-in. Shotgun-replacing at home rarely lands — D-Central's bench isolates each board on a programmable load, verifies each chip's `PVT_V` and `PVT_T` against a known-good baseline, and only returns boards that pass burn-in.
**Step 20.** Stop DIY when the failure narrows to a specific chip position across multiple boards. If Step 16 `PVT` analysis isolates the same chip *position* (physical location on the PCB) failing on two or more boards, you're in PCB-level territory — it's likely a voltage-domain or PMIC issue rather than a silicon failure. This requires test-fixture isolation and is not a home-bench repair. Same if a chip position recovers after reflow and fails again within `30 days`: the chip itself is marginal and needs replacement, which requires the BGA rework tooling and graded-salvage chip inventory D-Central keeps on the bench.
**Step 21.** Stop DIY when the control board / AUC3 is suspected dead. Step 18's surge-damage signature, or a control board with zero LED activity at power-up with a known-good PSU, requires a bench fixture to localize. The AUC3 has a multi-stage `12V → 3.3V` chain and any stage can fail. D-Central carries parts-graded AUC3 control boards for swap-out and cross-references repair-queue logs for failure patterns across the 1166 Pro / 1246 / 1266 fleet.
**Step 22.** Ship the miner properly. Pack the chassis in the original Canaan box if available, or double-box with `5 cm+` of foam on every side. Include the PSU — a tired PSU that mostly works on your bench may be the root cause of what looks like "board failure." Include a note with: serial number, observed symptoms, screenshots of the dashboard, the `cgminer` API snapshot from Step 3, any `PS[0..2]` / `ECMM` / `ECHU` values captured, service history (when opened, what was done, by whom), and contact info. That snapshot saves D-Central's bench hours of reproduction time, which saves you repair dollars directly. Canada-wide shipping is standard; US and international shipments are welcome.
**Step 23.** Discuss the repair-vs-replace economics up front. A full `1266` bench rebuild (PSU + three-board component-level + control + burn-in) runs CAD `$750 – $1,500` depending on what we find. A used `1266` on the secondary market in late `2025` runs CAD `$1,100 – $2,000` — the model is current-gen and resale is stronger than older A-series chassis. The math leans toward repair more often on a `1266` than on a `1166 Pro`, but neither is automatic. D-Central quotes honestly before committing bench hours: if the repair economics don't work, we'll tell you. Sometimes the right answer is "salvage the PSU for parts, sell boards that still test alive, and put the cash toward a 1346 or newer."
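If it helps to make the comparison explicit, the arithmetic is a one-liner. The salvage figure below is a placeholder assumption; the real inputs are the quote you receive and current secondary-market pricing:

```python
# Back-of-envelope repair-vs-replace check using the CAD ranges quoted above.
# salvage_value is a placeholder assumption; use your own quote and pricing.
def repair_makes_sense(repair_quote: float, used_price: float,
                       salvage_value: float = 0.0) -> bool:
    """True when repairing costs less than replacing net of salvaging the dead unit."""
    return repair_quote < used_price - salvage_value

# Midpoints of the ranges above: rebuild ~CAD $1,125, used 1266 ~CAD $1,550
print(repair_makes_sense(1125.0, 1550.0, salvage_value=200.0))  # -> True
```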
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Related Error Codes
- `CHAIN_FAIL`: a chain fails to enumerate; the core fault class this playbook addresses
- `MMCRCFAILED`: MM-to-hashboard IIC CRC errors in the serial boot log; points at ribbon or AUC3 bus integrity (Steps 11, 13, 17)
- `ECHU[0..2]` non-zero: per-chain hashboard error-correction trips flagged by the MM
- `ECMM` non-zero: module-management-layer fault that stock firmware does not articulate further
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
