CHAIN_FAIL / 1346 hashboard detect fail / A3218 chain Critical

Avalon 1346 – Hashboard Not Detected

Canaan MM on the Avalon 1346 reports fewer than 3 hashboards enumerated after the MM power sequence completes. The A3218 chain on one or more boards fails to respond to the AUC3 chain-discovery handshake, so SYSTEMSTATU reports board_count < 3 and the affected chain's MW / ECHU / PVT arrays stay empty.

Critical — Immediate action required

Affected Models: Avalon 1346 (104 TH nameplate, A3218 silicon). Chassis shares control-board / PSU / AUC3 architecture with the A1326 and A1366 (A13-series) and re-uses cooling hardware (HA1250H12SB-Z fan) across the A13 family. Diagnostics on this page cross-apply to the A1326 and A1366; A1446 / A1466 reuse the same MM firmware family but different silicon (A3228/A3229), so the CHAIN_FAIL decoding is identical while the per-chip voltage / temperature baselines differ.

Symptoms

  • Canaan Web UI or AUC3 dashboard reports board_count: 2 or board_count: 1 on the 1346 (should be 3) — the chain that did not enumerate is named in the Status tab
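  • cgminer JSON API on port 4028 (`echo -n '{"command":"estats"}' | nc <ip> 4028`; the API is a raw TCP socket, not an HTTP endpoint, so an HTTP request from `curl` is not parsed as a JSON command) returns SYSTEMSTATU with one of MW0 / MW1 / MW2 empty or absent while the other two are populated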
  • ECHU[x] reads 0 or unreadable for the failed chain, while the other two chains show non-zero error-correction traffic (confirms the chain is silent at the MM layer)
  • PVT_Tx / PVT_Vx arrays empty or all-zero for the failed chain while the other two chains report normal per-chip telemetry
  • ECMM (Module Management error code) non-zero, often with a CHAIN_FAIL signature — MM has tried and given up on discovery for the failed board
  • Reported GHSmm (theoretical hashrate) drops to roughly two-thirds of nameplate (~69 TH/s instead of ~104 TH/s) because only 2 of 3 boards are counted toward the capacity figure
  • Red status LED sustained on the front panel — one of the seven fault categories stock Canaan firmware lumps into a single LED state
  • Control-board serial console shows `MMCRCFAILED` or `asic_init_fail` lines tied to a specific chain index during the MM boot sequence
  • Miner was recently moved, reseated, had a hashboard swapped, had thermal paste refreshed, or had a PSU disconnected and reconnected within the last 30 days
  • Chassis was recently exposed to a power surge, brownout, lightning event, or a PDU / breaker trip on the feeder
  • Dashboard hashrate reads 60-70% of nameplate for more than 30 minutes with no thermal throttling reported and ambient under 30 °C
  • Rebooting the miner sometimes restores 3/3 temporarily, then a specific chain drops back out within minutes to hours (intermittent enumeration)
  • Pool shows dropped hashrate / stale-share spike timed to the chain loss event
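
The two-thirds figure in the symptom list is plain proportion. A minimal sketch, assuming each hashboard contributes an equal share of the 104 TH/s nameplate (the function name is illustrative, not part of any Canaan tool):

```python
def expected_ths(boards_up: int, nameplate_ths: float = 104.0,
                 boards_total: int = 3) -> float:
    """Theoretical hashrate when only some boards enumerate.

    Assumes every board carries an equal share of nameplate.
    """
    if not 0 <= boards_up <= boards_total:
        raise ValueError("boards_up must be between 0 and boards_total")
    return nameplate_ths * boards_up / boards_total


if __name__ == "__main__":
    for n in range(4):
        print(f"{n}/3 boards -> {expected_ths(n):.1f} TH/s")
```

A dashboard figure that sits near the 2-board or 1-board value, rather than drifting gradually, points at a missing chain rather than throttling.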

Step-by-Step Fix

1

Hard-power-cycle at the PDU or breaker for a full 60 seconds. Not a Web UI reboot — a true AC disconnect. Wait the full minute for PSU bulk caps to discharge and the MM state machine to flush. Power back up and watch the dashboard for the missing chain to come back. The A13-series MM can wedge after a brownout, a network hiccup, or a reboot storm, and a cold boot recovers a meaningful fraction of CHAIN_FAIL tickets with zero tools. While waiting, note any red LEDs, unusual PSU fan behaviour, or burnt smell — those observations steer the rest of the triage. Log the event so you can spot a recurring pattern.

2

Confirm the AUC3 / Web UI is reachable before chasing the chain. Browse to `http://<miner-ip>/`, ping the miner IP from a laptop on the same subnet, and check the AUC3 LED colour (green = working per the A1066 manual pattern that applies across Canaan AUC3 revisions). If the Web UI is unreachable or the AUC3 LED is dead, the root cause is not CHAIN_FAIL — it is an AUC3 / network fault and the correct playbook is the AUC USB Connection Lost page or the AUC Controller Network Loss page. Only continue here if the control side is alive and reporting.

3

Read `SYSTEMSTATU`, `PS[0..2]`, `ECMM`, `ECHU[0..2]` via the cgminer JSON API on port 4028. The API is a raw TCP socket rather than an HTTP endpoint, so send the JSON directly: `echo -n '{"command":"estats"}' | nc <ip> 4028` returns the full telemetry surface that the Canaan Web UI hides. Identify which chain index is failing (0, 1, or 2). If `PS[0..2]` are all zero, PSU comms are dead; branch to the PSU playbook. If `ECHU[x]` is zero for one chain only and the other two are normal, this IS a CHAIN_FAIL and you are on the right page. Save or screenshot the full response before continuing, because D-Central's bench will want it if the miner ends up shipped.
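
The read-and-branch logic in this step can be sketched in Python. cgminer's API port is a plain TCP socket, so the sketch opens one directly. The `KEY[v1 v2 ...]` field layout, the helper names, and the sample values are assumptions made for illustration; check them against what your firmware actually returns.

```python
import json
import re
import socket


def query_cgminer(host: str, command: str = "estats", port: int = 4028,
                  timeout: float = 5.0) -> dict:
    """Send one JSON command over a raw TCP socket and decode the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # cgminer ends its reply with a NUL byte; strip it before parsing.
    raw = b"".join(chunks).rstrip(b"\x00")
    return json.loads(raw.decode(errors="replace"))


def parse_fields(mm_text: str) -> dict:
    """Assumed layout: 'KEY[v1 v2 ...]' tokens separated by spaces."""
    return {k: v.split() for k, v in re.findall(r"(\w+)\[([^\]]*)\]", mm_text)}


def triage(fields: dict, boards: int = 3) -> str:
    """Apply the PS / ECHU rule stated in this step to parsed fields."""
    ps = fields.get("PS", [])
    if ps and all(v == "0" for v in ps):
        return "PSU comms dead: use the PSU playbook"
    # Missing ECHU entries are treated as silent chains.
    echu = (fields.get("ECHU", []) + ["0"] * boards)[:boards]
    silent = [i for i, v in enumerate(echu) if v == "0"]
    if len(silent) == 1:
        return f"CHAIN_FAIL on chain {silent[0]}"
    if not silent:
        return "all chains reporting"
    return f"multiple chains silent: {silent}"


# Hypothetical sample text, not captured from a real miner.
sample = "ECMM[4] ECHU[512 0 498] PS[0 1215 1262]"
print(triage(parse_fields(sample)))
```

Where the bracketed text sits inside the `estats` reply depends on the firmware build, so print the dictionary from `query_cgminer("<ip>")` once and locate the field before wiring it into `parse_fields`.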

4

Review service history and physical-event history. Was the miner opened, reseated, paste-refreshed, or had a hashboard swapped in the last 30 days? Did a storm, brownout, breaker trip, or PDU event happen recently? A CHAIN_FAIL right after service is almost always mechanical (ribbon, power-sequence order, loose connector); a CHAIN_FAIL right after a power event is almost always electrical (`U1/U2/R8/R9`, PMIC damage, chip surge). A CHAIN_FAIL out of the blue on a miner that has run unchanged for months is most often ribbon-wear or connector-oxide from chassis-fan vibration. Service history is the single highest-value piece of context — log it before you touch a screwdriver.
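
The history-based heuristic above reduces to a small lookup. This is only a sketch of that rule of thumb; the function name and the wording of the returned hints are illustrative.

```python
def likely_causes(serviced_last_30_days: bool, recent_power_event: bool) -> list:
    """Map service and power history to the fault class this step suggests."""
    causes = []
    if serviced_last_30_days:
        causes.append("mechanical: ribbon, power-sequence order, loose connector")
    if recent_power_event:
        causes.append("electrical: U1/U2/R8/R9, PMIC damage, chip surge")
    if not causes:
        # Unchanged for months with no power event: wear is the usual suspect.
        causes.append("wear: ribbon fatigue or connector oxide from fan vibration")
    return causes


print(likely_causes(serviced_last_30_days=True, recent_power_event=False))
```

When both flags are true the sketch returns both hypotheses; the step does not rank them, so neither is ruled out.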

5

Confirm ambient intake temperature at the intake grille. Canaan spec for the A13-family (inheriting the A1066 / A1246 envelope) is inlet air at or below 35 °C. A Canadian basement in February is never the problem, but a garage in July or a closed utility closet with bad return air can hit 40 °C+ at intake and stress the PSU and per-board regulators hard enough to drop a chain intermittently. Measure at the intake grille with an IR thermometer, not room-middle. If intake is out of spec, fix airflow first — a CHAIN_FAIL that is really a thermal-overload-masking-as-enumeration-fail will fight you forever otherwise.

6

DMM-verify PSU output at the control-board harness, first open-circuit and then under boot load. Kill AC. Disconnect the PSU-to-control-board harness. Probe DC at the harness tip: expect 12.0-12.6 V open-circuit. Reconnect firmly. Power the miner up and re-probe at the same point during the boot window (first 30 seconds): expect ≥11.8 V sustained. A PSU that reads 12.3 V open and sags to 9 V under the transient demand of three boards coming up simultaneously is tired, and the weakest chain (usually the longest ribbon path) is the first to drop out. This is the cheapest Tier-2 diagnostic that separates PSU faults from downstream faults definitively.
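
The two DMM readings can be checked against the thresholds given in this step. A minimal sketch, with an illustrative function name and the limits taken from the text (12.0 to 12.6 V open-circuit, at least 11.8 V during boot):

```python
def classify_psu(open_circuit_v: float, boot_load_v: float) -> str:
    """Compare two DMM readings with the limits quoted in this step."""
    if not 12.0 <= open_circuit_v <= 12.6:
        return "open-circuit voltage out of range: suspect PSU"
    if boot_load_v < 11.8:
        return "sags under boot load: suspect tired PSU"
    return "PSU within limits: look downstream"


print(classify_psu(12.3, 9.0))   # the tired-PSU case described above
print(classify_psu(12.3, 11.9))
```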

7

Swap to a known-good Canaan PSU from the A11/A12/A13 compatible family. The A1346 shares its PSU envelope with the rest of the Avalon bench fleet (A1056 / A1066 / A1066 Pro / A1126 Pro-S / A1146 Pro / A1166 Pro / A1246 / A1266 family per Canaan shop listings, with the A13 nameplate draw at the upper end of that envelope). Do NOT cross-connect a Bitmain APW-series PSU — Canaan and Bitmain pinouts differ and cross-brand PSU is a documented failure mode that kills control boards. Power up, observe 3/3 on the dashboard within the first 90 seconds. If the chain returns, the fault was PSU; if it does not, continue.

8

Re-seat every ribbon, harness, and signal cable in the chassis, focusing on the failed chain. Kill AC. Open the chassis. Unplug the control-board-to-hashboard ribbon for the specific failing chain and inspect it under bright light for blackening, green oxide, corrosion, bent or recessed pins, cracked housings, or any sign of heat damage near the crimp. Clean any dirty contact surfaces with 99% isopropyl alcohol and a lint-free swab; let it evaporate fully. Re-seat firmly until you feel and hear the click. Do the same for the PSU-to-control harness. Chassis-fan vibration on the A13-family is the dominant cause of intermittent CHAIN_FAIL on otherwise-healthy boards.

9

Cable-tie the AUC3 ribbon and PSU harness to the chassis frame. If Step 7 or 8 resolved the fault and you found wear or oxide at a connector, secure the affected cable to the chassis frame with a zip-tie so chassis-fan vibration cannot work it loose again. Zeus Mining documents this exact failure-and-fix pattern on the A1246 and it applies identically to the A13-family, which shares cooling hardware and vibration profile. This is a one-dollar fix that prevents the same miner from landing on your bench again next quarter.

10

Swap the AUC3 IIC ribbon for the failing chain if you have a spare. A cracked conductor inside the ribbon's flex section does not show on visual inspection — the only reliable isolation is a swap test. Ribbons across the A13-family (1326 / 1346 / 1366) and the A12-family (1246 / 1266) are compatible; pull a known-good spare from a parts-donor chassis or buy from a parts supplier. If the previously-failed chain enumerates after the ribbon swap, you have confirmed a dead ribbon; zip-tie the new one to the chassis per Step 9 and call it done.

11

Physically swap hashboards between slots to isolate board vs slot. Move the failing chain's hashboard into the slot of a known-good chain, and move the known-good board into the failed slot. Power up. If the failure follows the board, the board is the fault; ship or Tier-3 rework. If the failure stays with the slot, the control-board-side of that chain is the fault (AUC3 port, I2C buffer, local level-shifter, or ribbon socket); Tier-3 or ship. This single bisection test cuts diagnostic time in half and tells you decisively whether you are looking at a hashboard problem or a control-board problem.
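
The swap test is a two-outcome bisection, which can be written down as a short sketch. Slot numbering and the function name are illustrative; any result other than the two described in this step is reported as inconclusive.

```python
from typing import Optional


def interpret_swap(bad_slot: int, good_slot: int,
                   failing_after: Optional[int]) -> str:
    """Interpret which slot fails after the two boards trade places."""
    if failing_after == good_slot:
        # The fault travelled with the board that used to sit in bad_slot.
        return "fault follows the board: hashboard problem"
    if failing_after == bad_slot:
        # The known-good board now fails in the originally bad slot.
        return "fault stays with the slot: control-board side problem"
    return "inconclusive: repeat the test"


print(interpret_swap(bad_slot=1, good_slot=0, failing_after=0))
```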

12

Isolate one hashboard at a time by unplugging the other two. Kill AC. Disconnect two of three hashboard ribbons. Power up with only one board connected. Read `SYSTEMSTATU` — does it show board_count: 1? Repeat for each of the three boards individually. All three pass alone but fail together means the PSU cannot carry full three-board rail current (replace PSU). One specific board fails alone means that board has localized damage — U1/U2/R8/R9, PMIC, A3218 chip short — and is Tier-3 or Tier-4 repair. None enumerate individually means the control-board / AUC3 is the fault, Tier-3 or Tier-4.
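
The three outcomes of the one-board-at-a-time test form a small decision table. A sketch, assuming you record for each chain whether it enumerated when powered alone (names are illustrative):

```python
def interpret_isolation(solo_ok: dict) -> str:
    """solo_ok maps chain index -> True if that board enumerated alone."""
    failed = [i for i, ok in sorted(solo_ok.items()) if not ok]
    if not failed:
        # Every board works alone, yet the set fails together.
        return "all pass alone: PSU cannot carry three boards, replace PSU"
    if len(failed) == len(solo_ok):
        return "none enumerate: control board / AUC3 fault"
    return f"board(s) {failed} fail alone: localized board damage"


print(interpret_isolation({0: True, 1: False, 2: True}))
```

The step only describes a single board failing alone; the sketch reports two failing boards the same way, which is an extrapolation.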

13

Re-flash the factory MM firmware via the AUC3 Web UI if you suspect partial firmware corruption. Verify the image is specifically for the Avalon 1346 — cross-flashing a 1246 or 1366 image onto a 1346 will brick the control board because Canaan firmware signature-checks per-model. Canaan blocks downgrade, so only flash the current correct factory image or newer. Follow the Avalon Firmware Flash via AUC procedure. Do not interrupt the flash once it has started; a failed MM flash turns a CHAIN_FAIL into a full dead-chassis ticket that is only recoverable with a JTAG or a donor control board.

14

Capture the control-board serial log at boot via USB-TTL adapter. Connect an FT232 / CH340 / CP2102 USB-TTL adapter to the control-board UART header (silkscreen varies by AUC3 revision — check the board). Capture the full boot log. MM boot sequence, `MMCRCFAILED` lines tied to a specific chain index, `asic_init_fail: chip N no ACK` traces naming a specific A3218 chip position, IIC init failures, and chain-discovery-handshake timeouts all appear here. The serial log is the single highest-value diagnostic on a CHAIN_FAIL miner where the Web UI is alive but one board is silent — it tells you whether the failure is bus-level, chip-level, or power-sequence-level.
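
Once the boot log is saved to a text file, counting the two signatures named in this step is straightforward. A minimal sketch: the patterns `MMCRCFAILED` and `asic_init_fail: chip N` come from the text above, while the sample log lines are invented for illustration and real logs may be formatted differently.

```python
import re
from collections import Counter

CHIP_RE = re.compile(r"asic_init_fail: chip (\d+)")


def summarize_boot_log(text: str) -> dict:
    """Count CRC failures and per-chip no-ACK lines in a captured log."""
    chips = Counter(int(n) for n in CHIP_RE.findall(text))
    crc = sum(1 for line in text.splitlines() if "MMCRCFAILED" in line)
    return {"crc_failures": crc, "chip_no_ack": dict(chips)}


# Hypothetical log excerpt, not captured from a real miner.
sample_log = (
    "MM boot start\n"
    "MMCRCFAILED chain 1\n"
    "asic_init_fail: chip 17 no ACK\n"
    "asic_init_fail: chip 17 no ACK\n"
)
print(summarize_boot_log(sample_log))
```

Repeated hits on one chip number suggest a chip-level fault; CRC failures with no chip lines suggest a bus-level problem.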

15

Measure U1 / U2 on the failing hashboard with a DMM in diode-check mode. Per the Zeus Mining A11/A12 repair guide (pattern applies across the A-series, including the A13-family), U1 and U2 are the hashboard's primary power-sequence MOSFETs and R8 / R9 are the companion sense/gate resistors. A burnt U1 or U2 from a reversed install sequence (the rule is negative-first on connect, negative-last on disconnect; reverse it and U1/U2/R8/R9 burn instantly) reads dead-short or open in diode mode. All four parts are cheap to replace. If you find a dead U1/U2/R8/R9 on the failing board and you have SMD rework skills, replace and re-test; if multiple boards share the same damage, a feeder event took them all out — stop and ship.

16

Tune CGMiner `--avalon7-aucspeed` and `--avalon7-aucxdelay` if serial logs show intermittent `MMCRCFAILED`. Per the cgminer source, defaults are `aucspeed 400000` (IIC bus clock in Hz) and `aucxdelay 19200` (transfer delay). If the IIC bus is marginal — worn ribbon, vibration-stressed connectors, mixed-vendor ribbon stock — halving `aucspeed` to 200000 or doubling `aucxdelay` to 38400 brings marginal chains back. This tuning knob is undocumented in Canaan materials and lives only in the cgminer CLI — a hallmark of A-series diagnostic lore that survives in the community and in repair-shop notebooks, not in the official docs.
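
The adjustment described here amounts to changing two numbers on the cgminer command line. A small sketch that derives the arguments from the defaults quoted above; the helper name is illustrative, and how the arguments reach cgminer on your firmware is not covered.

```python
def tuned_auc_args(aucspeed: int = 400000, aucxdelay: int = 19200,
                   halve_speed: bool = True, double_delay: bool = False) -> list:
    """Build cgminer arguments for a marginal IIC bus from the stated defaults."""
    if halve_speed:
        aucspeed //= 2      # 400000 -> 200000
    if double_delay:
        aucxdelay *= 2      # 19200 -> 38400
    return ["--avalon7-aucspeed", str(aucspeed),
            "--avalon7-aucxdelay", str(aucxdelay)]


print(" ".join(tuned_auc_args()))
print(" ".join(tuned_auc_args(halve_speed=False, double_delay=True)))
```

Changing one knob at a time makes it easier to tell which adjustment actually stopped the `MMCRCFAILED` lines.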

17

Scope the 12 V rail at the failing hashboard's input during boot. A 50 MHz handheld scope captures the rail-up ramp during the MM power sequence. Healthy: clean step from 0 V to 12 V in tens of milliseconds, flat at 12 V thereafter with less than 200 mV of ripple. Damaged PSU or cable: oscillation, overshoot, or an incomplete ramp that never settles. Damaged per-board power sequence: rail comes up and immediately collapses as U1/U2 fail to latch. Compare the failing chain's rail capture side-by-side against one of the known-good chains — one scope capture separates PSU faults from per-board faults more reliably than DMM averaging.
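
If the scope can export samples, the healthy / ripple / collapse distinction can be checked numerically. A sketch under stated assumptions: `settle_index` marks where the ramp has finished, the 200 mV ripple limit comes from this step, and the 50% collapse threshold is an arbitrary choice made for illustration.

```python
def assess_rail(samples_v: list, settle_index: int,
                nominal_v: float = 12.0, max_ripple_v: float = 0.2) -> str:
    """Classify a 12 V rail capture after the rail-up ramp has finished."""
    settled = samples_v[settle_index:]
    if not settled:
        raise ValueError("no samples after settle_index")
    ripple = max(settled) - min(settled)
    # Assumed threshold: falling below half of nominal counts as a collapse.
    if min(settled) < nominal_v * 0.5:
        return "rail collapsed after ramp: suspect per-board power sequence"
    if ripple > max_ripple_v:
        return f"ripple {ripple:.2f} V exceeds limit: suspect PSU or cable"
    return f"rail healthy, ripple {ripple:.2f} V"


print(assess_rail([0.0, 6.0, 12.0, 12.05, 11.98, 12.02], settle_index=2))
```

Running the same function on a capture from a known-good chain gives the side-by-side comparison this step recommends.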

18

Identify the failing A3218 chip position via serial log plus per-chip PVT telemetry. The Avalon 1346 hashboard hosts a long string of Canaan A3218 chips in series on the IIC chain. The serial log's `asic_init_fail: chip N` line names the specific position that is not ACKing; PVT_V and PVT_T arrays from the cgminer API corroborate (the position reports zero voltage / zero temperature while its neighbours report nominal). A single dead chip position stops the entire chain because the chain is serial. If you can narrow the failure to one or two specific chip positions, the board is a candidate for bench-reflow; if the failure is spread across many positions, the board is likely a parts-donor.
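
The corroboration step (a position that reads zero voltage and zero temperature while its neighbours look nominal) can be sketched as a scan over the two arrays. The list-of-numbers input format and the sample values are assumptions; the one-or-two versus many cut-off follows the text.

```python
def dead_positions(pvt_v: list, pvt_t: list) -> list:
    """Return chip positions that report both zero voltage and zero temperature."""
    return [i for i, (v, t) in enumerate(zip(pvt_v, pvt_t)) if v == 0 and t == 0]


def board_verdict(dead: list, max_reflow: int = 2) -> str:
    """One or two dead positions: reflow candidate. Many: parts donor."""
    if not dead:
        return "no dead position found"
    if len(dead) <= max_reflow:
        return "reflow candidate"
    return "likely parts donor"


# Hypothetical per-chip readings for a short chain.
volts = [310, 312, 0, 309, 311]
temps = [71, 73, 0, 72, 70]
dead = dead_positions(volts, temps)
print(dead, board_verdict(dead))
```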

19

Reflow the identified A3218 chip position on a preheat + hot-air rework station. This is the hard end of Tier 3 and demands real skill: pre-heat the bottom side of the hashboard to roughly 150 °C to avoid thermal shock, apply top-side hot air at approximately 320 °C with a fine nozzle centred on the failing chip, use just enough heat to remove the chip without lifting neighbouring pads, clean the board pads with braid and fresh flux, place a replacement A3218 (a donor chip from a dead-board parts pool; these are not available retail), and reflow it down. Re-check continuity with DMM diode mode before re-installing the board in the chassis. If any of that sentence sounded hostile, skip to Step 21.

20

Inspect the control-board input stage for surge damage if Step 11 or Step 12 localized the fault to the control-board side. A surge event often takes out the control-board's 12 V to 3.3 V input MOSFET or the per-chain I2C level-shifter / buffer. Signature: no per-chain data on a known-good hashboard placed in the affected slot, or flicker-then-dark LED behaviour on the control board itself. Under magnification, look for hairline cracks in the MOSFET package, discolouration of solder pads, or cooked silkscreen near the input stage. Replacement is a hot-air rework job on the control board. D-Central carries parts-graded AUC3 control boards and keeps failure-pattern logs across the A11/A12/A13 fleet.

21

Stop DIY and ship to D-Central when a single chain persistently fails after a confirmed PSU swap, a confirmed ribbon swap, and a confirmed MM firmware re-flash — and your Tier-3 tool chest (preheater, hot-air station, DMM diode mode, scope) does not include the skill to work at BGA-adjacent chip level. Shotgun reflowing A3218 positions at home without a preheater or a controlled-temperature station lifts pads, damages adjacent traces, and turns a single-chip fault into a dead board. The economics rarely work out. D-Central's bench isolates each board on a programmable load, verifies each A3218's PVT against a known-good baseline, and only returns boards that pass 24-hour burn-in. You mail us a board that is recoverable, you get back a board that hashes.

22

Ship the miner properly. Pack the chassis in its original Canaan box if you still have it, or double-box with 5 cm or more of foam on every side. Include the PSU so our bench can test your exact stack — a tired PSU that mostly works on your bench may be the real root cause of what looks like a chain-level fault. Include a note listing serial number, observed symptoms (screenshots of the failing `SYSTEMSTATU`, any `PS / ECMM / ECHU / PVT` values you captured, any `asic_init_fail: chip N` serial-log lines), service history (when was it last opened, what was done, what was swapped), and your contact info. Saves our bench diagnostic time and saves you money. Canada-wide shipping standard, US / international welcomed.

23

Discuss the repair-vs-replace economics up front before committing bench hours. The Avalon 1346 is an entry-to-mid-tier A13-series model at 104 TH nameplate; a component-level hashboard rework plus burn-in at D-Central is typically CAD $150 - $350 per board, a full chassis rebuild (PSU plus three-board component-level plus control-board plus 24-hour burn-in) runs CAD $800 - $1,500, and a used 1346 on the secondary market runs CAD $1,100 - $2,200 depending on condition and BTC price. The math is not automatic either way. D-Central quotes honestly up front: if the repair math does not work for your power cost and current BTC price, we will tell you before your board hits the bench. Sometimes the right answer is to salvage the PSU, sell the healthy boards, and put the cash toward a 1446 or 1466.
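
The price ranges quoted above can be compared directly. A rough sketch using only those CAD figures; it ignores power cost, BTC price, and resale value of parts, all of which matter for a real decision, and the names are illustrative.

```python
BOARD_REWORK_CAD = (150, 350)    # per board, from the text
FULL_REBUILD_CAD = (800, 1500)   # full chassis rebuild
USED_1346_CAD = (1100, 2200)     # used unit on the secondary market


def board_rework_range(boards: int) -> tuple:
    """Scale the per-board rework range by the number of boards."""
    low, high = BOARD_REWORK_CAD
    return boards * low, boards * high


def compare(repair: tuple, replace: tuple) -> str:
    """Compare two (low, high) price ranges."""
    if repair[1] < replace[0]:
        return "repair is cheaper across the whole range"
    if repair[0] > replace[1]:
        return "replacement is cheaper across the whole range"
    return "ranges overlap: get a quote first"


print(compare(board_rework_range(1), USED_1346_CAD))
print(compare(FULL_REBUILD_CAD, USED_1346_CAD))
```

A single-board rework sits well below the used-unit price, while a full rebuild overlaps it, which is why the math is not automatic either way.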

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.