A1566_VDOM_FAIL Warning

Avalon 1566 – Hashboard Voltage Domain Fail

Hashboard voltage domain failure on the Avalon 1566 — one of the 3-4 PMIC-fed voltage domains has dropped. PVT_Vx reports 0 mV on the affected domain while the others read in the 290-360 mV window; the chip group fed by that PMIC stops hashing simultaneously. Board still enumerates (chassis stays at 3/3 boards) but per-board hashrate drops 25-33% — roughly 15-20 TH/s lost on a ~185 TH/s flagship. Direct precursor to CHAIN_FAIL if the underlying fault progresses.

Warning — Should be addressed soon

Affected Models: Avalon 1566, Avalon 1566H; cross-applies with minor PMIC pinout differences to the broader A15 / A14 family (1346, 1366, 1446, 1466)

Symptoms

  • cgminer API estats on port 4028 reports one of PVT_V0/PVT_V1/PVT_V2/PVT_V3 at 0 mV while the other domains read in the 290-360 mV window
  • MWx array slice corresponding to the dead domain is entirely zero or truncated; live domains on the same board still report normal MW values
  • GHSmm versus GHSavg on the affected hashboard diverges by 25-33% — canonical 'one of three or four domains is dark' signature
  • Realised chassis hashrate drops by ~15-20 TH/s without a corresponding fan-RPM change, PSU alarm, or temperature spike
  • Web UI Device tab still shows Hash Boards: 3 — board is enumerated, just partially producing
  • Thermal camera shows a sharply-bordered cold zone covering 8-12 adjacent chips while the rest of the board is at normal operating temperature
  • Per-chip PVT_T array shows the cold chips holding at near-ambient with no thermal climb after 60 seconds at full chassis hashrate
  • cgminer log shows neither CHAIN_FAIL nor iic: no ACK, so the management MCU is alive and still reporting; only the one domain's rail reading is bad
  • Front-panel status LED shows steady red (Avalon uses the same red indicator for several other A15-family faults, so the LED alone does not identify this one)
  • Pool side: stratum stable, reject rate flat, share submission rate proportionally lower
  • PSU rail at the AUC input measures 11.9-12.2 V sustained under full load — upstream 12V is fine, fault is on the hashboard
  • Fault appeared after a thermal-paste refresh, hashboard reseat, international shipment, or chassis drop
  • Adjacent domains on the same board are unaffected — the cold zone has hard edges, not a soft thermal gradient
  • Pattern is consistent across reboots — a dropped PMIC does not recover from a power-cycle the way a firmware sensor ghost might
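
The voltage and hashrate symptoms above reduce to a check that can be scripted. The sketch below assumes the per-domain PVT_V readings for one hashboard have already been extracted into a list of millivolt values; that extraction depends on the firmware's estats layout and is not shown. The function names and sample readings are hypothetical, and the loss estimate assumes each domain carries an equal share of its board's output.

```python
# Minimal sketch: does a set of per-domain readings match the
# "one domain dark" signature, and how much hashrate should that cost?
# Function names and sample values are illustrative, not Canaan's.

LIVE_WINDOW_MV = (290, 360)  # healthy window quoted in this guide


def dead_domains(pvt_mv):
    """Return the indices of domains reporting 0 mV."""
    return [i for i, mv in enumerate(pvt_mv) if mv == 0]


def matches_vdom_signature(pvt_mv):
    """True when exactly one domain reads 0 mV and every other one is in the live window."""
    low, high = LIVE_WINDOW_MV
    live = [mv for mv in pvt_mv if mv != 0]
    return (
        len(dead_domains(pvt_mv)) == 1
        and bool(live)
        and all(low <= mv <= high for mv in live)
    )


def expected_loss_ths(pvt_mv, chassis_ths=185.0, boards=3):
    """Estimate TH/s lost, assuming equal output per board and per domain."""
    per_board = chassis_ths / boards
    return per_board * len(dead_domains(pvt_mv)) / len(pvt_mv)


readings = [312, 0, 305, 330]  # hypothetical PVT_V0..PVT_V3 for one board
print(matches_vdom_signature(readings))        # True
print(round(expected_loss_ths(readings), 1))   # 15.4
```

With three domains per board instead of four, the same arithmetic gives about 20.6 TH/s, which brackets the roughly 15-20 TH/s loss described above.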

Step-by-Step Fix

1

Pull the full cgminer API response from port 4028 before you touch anything. The stock cgminer API is a raw TCP socket that takes a JSON command, not an HTTP endpoint, so pipe the request through nc rather than curl: echo -n '{"command":"estats"}' | nc <miner-ip> 4028. Read PVT_V0..3 per chain and find the domain at 0 mV. Save the response to a timestamped text file. This is the most valuable diagnostic on the 1566 platform: the dashboard hides everything that matters and the API hides nothing. The 1566 has no Web UI surface for per-domain voltage, so the API is the only path.
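
If nc is not available, the same pull can be scripted. This is a minimal sketch, assuming the miner exposes the stock cgminer socket API (a JSON command over raw TCP on port 4028, with the miner closing the connection after replying). The helper names and the file-naming scheme are hypothetical; the reply is saved raw rather than parsed, because the exact estats layout varies by firmware.

```python
import json
import socket
import time
from pathlib import Path
from typing import Optional


def build_request(command: str) -> bytes:
    """Encode a cgminer-style JSON command, e.g. {"command": "estats"}."""
    return json.dumps({"command": command}).encode("ascii")


def query_api(host: str, command: str = "estats", port: int = 4028,
              timeout: float = 10.0) -> str:
    """Send one command over a raw TCP socket and read until the peer closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(command))
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # cgminer-style replies are often NUL-terminated; strip that before saving.
    return b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")


def snapshot_path(host: str, out_dir: str = ".",
                  now: Optional[float] = None) -> Path:
    """Build a timestamped file name for the saved response."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return Path(out_dir) / f"estats-{host}-{stamp}.txt"


def save_snapshot(host: str, out_dir: str = ".") -> Path:
    """Query estats and write the raw reply to a timestamped text file."""
    path = snapshot_path(host, out_dir)
    path.write_text(query_api(host), encoding="utf-8")
    return path
```

Calling save_snapshot("192.168.1.50") (a placeholder address) writes one file per pull, so the pre-repair and post-repair responses can be compared side by side later.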

2

Hard power-cycle at the PDU for 60 seconds. Soft reboots through the Web UI do not clear every register state on the AUC3 or the on-board MCU; a true power-off does. Re-pull the API after 10 minutes of post-boot hashing. If the dead domain comes back, observe 24 hours before declaring it cleared. If it stays dead, the fault is electrical and the rest of this playbook applies.

3

Verify intake air temperature and chassis airflow. 35 °C inlet maximum on the 1566; clean the intake filter; ensure 15 cm clearance around the chassis on every side. The 1566's PMICs run hotter at baseline than the A14 family (72-88 °C junction at steady state); a stuffy install can push them into thermal-shutdown territory on a marginal domain. Thermal is rarely the root cause but is the cheapest variable to control before bench work.

4

Verify firmware via the Web UI and the API. The 1566 is the newest A15-class chassis in the field — stay current with Canaan's most recent stable MM build. Early-production 1566 firmware shipped with documented MM-side ADC/sensor-path instability that can fire false PVT_V zeros. Canaan blocks downgrade — flash forward only, hardwired Ethernet only (community has documented mid-flash brick events on WiFi). Confirm the build works on another 1566 first.

5

Set explicit DNS to 8.8.8.8 / 1.1.1.1 via the Web UI. Default Canaan DNS assumes a China route; outside China, DNS failure breaks firmware handshakes which surface as sensor-path timeouts that can read as PVT flags. Community-documented quirk that crosses the entire Avalon line, not in Canaan official docs. Worth setting on the 1566 because the chassis has only been in NA hands for a relatively short window.

6

Measure the PSU rail at the suspect hashboard's 12V input lug under full load. Multimeter on DC, probe the input lug while the chassis is hashing at nameplate. Expect 11.9-12.2 V sustained. Below 11.8 V with the rest of the chassis fine = PSU output channel for this specific board is sagging — read PS[0] from the API for OC bits 512/1024/2048. The 1566 draws more current per board than the 1466, so PSU sag shows up under 1566 load that wouldn't surface on a 1466. If 12V is fine, the fault is on the hashboard itself; continue.
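
The decision in this step can be written down as a small check. The sketch below assumes PS[0] is an integer status word in which 512, 1024, and 2048 are individual bit flags, as the step implies; what each bit means is not stated here, so the code only reports which ones are set. The function names are hypothetical, and the guide does not cover the 11.8-11.9 V gap or readings above 12.2 V, so the final branch is a placeholder rather than guidance.

```python
# Sketch of the Step 6 decision. Thresholds come from the text above;
# everything else (names, return strings) is illustrative.

OC_BITS = (512, 1024, 2048)   # over-current bits named in the guide
EXPECTED_V = (11.9, 12.2)     # sustained window under full load
SAG_BELOW_V = 11.8            # below this, suspect the PSU output channel


def oc_bits_set(ps0: int) -> list:
    """Return the over-current bit values that are set in PS[0]."""
    return [bit for bit in OC_BITS if ps0 & bit]


def rail_verdict(volts: float, ps0: int) -> str:
    """Classify a 12V input-lug reading taken under full load."""
    if volts < SAG_BELOW_V:
        flagged = oc_bits_set(ps0)
        if flagged:
            return f"sagging, OC bits set: {flagged}"
        return "sagging, no OC bits set"
    if EXPECTED_V[0] <= volts <= EXPECTED_V[1]:
        return "12V input in window, fault is on the hashboard"
    # Readings between the thresholds or above the window are not covered.
    return "outside the expected window, re-measure"


print(rail_verdict(11.6, 1024))
print(rail_verdict(12.0, 0))
```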

7

Re-seat the hashboard signal ribbon and the 12V bulk lugs. Power off at the PDU. Disconnect the data ribbon, inspect for bent pins or oxidation, reconnect firmly with the locking-tab fully seated. Inspect the bolted 12V copper lugs — torque to manufacturer spec (verify against your service docs; typical Avalon-class lug torque is 12-16 in-lb), wipe oxidation off both faces with isopropyl. Marginal connections starve borderline PMICs.

8

Swap the suspect board into a known-good slot. Label slots 0/1/2 with tape. Move the suspect board to a known-good slot, restart, run 15-20 minutes, and re-pull the API. If the fault follows the board, it is a hashboard-level issue: proceed to Tier 3 (the bench work starting at Step 11). If the fault stays in the slot regardless of which board sits there, it is a control board / AUC3 / backplane issue: jump to Tier 4 (professional repair, Step 16 onward).

9

Measure line voltage at the panel under full load. Canadian 240 V split-phase: expect 235-245 V. North American 208 V commercial: 202-212 V. Low line voltage forces the PSU to pull more current, increases rail sag on the output channels, and starves marginal PMICs. The 1566's higher chassis power draw makes line-voltage sensitivity worse than older Avalons — a circuit that handled a 1466 cleanly may not handle a 1566 without measurable sag.
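
The reasoning here is that, at roughly constant power, input current scales as I = P / V, so a low line reading means more current through the same wiring and PSU. The sketch below checks a measurement against the two windows quoted above and shows the current difference. The 3500 W figure is a placeholder, not a 1566 specification, and the calculation ignores power factor and efficiency changes.

```python
# Sketch of the Step 9 check. Windows come from the text above;
# the service labels and function names are illustrative.

LINE_WINDOWS_V = {
    "240V split-phase": (235.0, 245.0),
    "208V commercial": (202.0, 212.0),
}


def line_in_window(service: str, measured_v: float) -> bool:
    """True when the measured line voltage sits inside the expected window."""
    low, high = LINE_WINDOWS_V[service]
    return low <= measured_v <= high


def input_current_a(power_w: float, line_v: float) -> float:
    """Approximate input current at constant power (I = P / V)."""
    return power_w / line_v


PLACEHOLDER_LOAD_W = 3500.0  # hypothetical load, substitute your measured draw

print(line_in_window("240V split-phase", 238.0))
print(input_current_a(PLACEHOLDER_LOAD_W, 240.0))
print(input_current_a(PLACEHOLDER_LOAD_W, 228.0))  # lower voltage, higher current
```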

10

Thermal-camera scan under load. A FLIR ONE Pro on a phone has enough resolution. Let the chassis reach steady state (5-10 minutes), pop the lid, and shoot the suspect board. Photograph the cold zone if visible. A sharply-bordered cold band of 8-12 chips confirms a PMIC drop on that domain. The boundary tells you the domain map without service documentation. Save the photo as your bench-work map for Tier 3. The 1566's chip layout is denser than the 1466's, so the cold-zone boundary is sharper on the thermal image.

11

Power-off bench probe sweep of the dead-domain PMIC. Board on the bench, multimeter in continuity mode. Identify the PMIC closest to the cold zone (input-side direction). Probe the PMIC output pad against a chip core-rail pad on the same domain — expect continuity. Probe input MLCCs for partial shorts. Probe FB-network resistors against design value (consult Zeus A11/A12 guide for general topology — Canaan does not publish A15 schematics). Anything wildly out of spec is your first repair target.

12

Powered-on PMIC output measurement. Power the chassis with the lid off, board accessible. Probe the PMIC output pad with a DMM in DC mode against board ground. Expect 290-360 mV (A15 core target — slightly higher than the A14 generation's 290-350 mV window). 0 mV confirms the PMIC has dropped. Non-zero but below target indicates partial PMIC failure or fractured FB-network component.
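
The three outcomes in this step map onto a simple classifier. This is a sketch with hypothetical names; the 290-360 mV window is the A15 figure quoted above, and readings above the window are not discussed in this guide, so that branch only flags them.

```python
# Sketch of the Step 12 interpretation of a DMM reading at the PMIC output pad.

def classify_pmic_output(mv: float, window=(290.0, 360.0)) -> str:
    """Map a measured PMIC output (in mV) to the outcomes described in Step 12."""
    low, high = window
    if mv <= 0:
        return "dropped"
    if mv < low:
        return "partial failure or FB-network fault"
    if mv <= high:
        return "in window"
    return "above window (not covered by this guide)"


for reading in (0, 140, 325):
    print(reading, classify_pmic_output(reading))
```

Passing window=(290.0, 350.0) applies the same logic to the A14 generation's range mentioned in this step.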

13

Reflow the PMIC. First repair attempt — sometimes seats a marginal solder joint without a parts cost. Preheat the bottom of the board to ~150 °C on a preheat plate; hot-air the PMIC body topside at 310-330 °C for 25-30 seconds, let cool naturally. Re-apply thermal compound (Arctic MX-6 or Kryonaut). Re-test PVT_V after the board is fully cool plus 30 minutes of hashing. About 30% of PMIC drops resolve at this step.

14

Replace the PMIC if reflow fails. Source the part — Canaan does not publish PMIC part numbers for the A15 family; read it off the IC body and match through Bit2miner / community parts catalogues, or work from a donor 1566 / 1466 / 1446 board with a known-good PMIC at the same position. The A14 and A15 families share PMIC families in most positions but not all — verify pinout before committing the swap. Hot-air the dead PMIC off, clean pads with flux + braid, reflow the new PMIC in place, re-apply thermal compound. Parts cost: $2-10. Bench time: ~30-45 minutes.

15

Replace cracked MLCCs and cooked bulk caps on the affected domain. Visual inspection at 10-20x magnification. Cracked 0402/0603 ceramics get replaced with the same value (100 nF typical for input decoupling, 1 µF / 10 µF for bulk; verify against neighbouring domains as reference). Cooked bulk electrolytics: replace with same value, voltage rating, temperature rating. Total parts cost rarely exceeds $3 per repair.

16

Stop DIY when any of the following holds: PMIC reflow and replacement both fail to restore the domain, a per-chip continuity sweep finds a shorted A15 chip, two domains on the same board have failed, or there is any visible PCB damage. At that point you are in chip-replacement, board-level PCB rework, or scrap-the-board territory. Book a D-Central ASIC Repair slot. Tier 4 is also the right call if you do not own hot-air, a preheat plate, and a thermal camera, especially on a 1566 where a botched DIY repair scraps a $550+ hashboard.

17

D-Central bench process: test fixture loads the 1566 hashboard independently of the chassis with a programmable DC supply. Per-domain PMIC sweep with bench scope on switching nodes confirms regulator health independently of the on-board MCU's PVT readings. Failed PMICs replaced with graded salvage stock from donor A14/A15-family boards. Failed A15 chips replaced when isolation confirms chip-side damage — A15 chip stock is harder to source than A14 stock, so D-Central's bench inventory matters here. Cracked caps replaced. 24-hour burn-in at nameplate. Canadian turnaround: 5-10 business days.

18

Ship the board safely. Anti-static bag, double-box with ≥5 cm foam on every side. Include the cgminer API response from Step 1, the thermal-camera photo from Step 10, current firmware version, and a note describing which domain index dropped (0/1/2/3) and any prior repair attempts. Complete context cuts D-Central diagnostic time in half, which cuts your repair cost in half. The 1566 is new enough that we appreciate field-data context — every well-documented 1566 repair we receive helps refine the bench process for the next one.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
