Avalon 1446 – Hashboard Voltage Domain Fail
Warning — Should be addressed soon
Symptoms
- cgminer API estats on port 4028 reports one of PVT_V0/PVT_V1/PVT_V2/PVT_V3 at 0 mV while the other domains read in the 290-350 mV window
- MWx array slice corresponding to the dead domain is entirely zero or truncated; live domains on the same board still report normal MW values
- GHSmm and GHSavg on the affected hashboard diverge by 25-33%, the canonical signature of one dark domain out of four (25%) or out of three (33%)
- Realised chassis hashrate drops by ~12-16 TH/s without a corresponding fan-RPM change, PSU alarm, or temperature spike
- Web UI Device tab still shows Hash Boards: 3 — board is enumerated, just partially producing
- Thermal camera shows a sharply-bordered cold zone covering 8-12 adjacent chips while the rest of the board is at normal operating temperature
- Per-chip PVT_T array shows the cold chips holding at near-ambient with no thermal climb after 60 seconds at full chassis hashrate
- cgminer log shows no CHAIN_FAIL and no iic: no ACK — the management MCU is alive, only the rail telemetry is bad
- Front-panel status LED shows steady red (Avalon uses the same red indicator for several other A14-family faults, so the LED alone does not identify this one)
- Pool side: stratum stable, reject rate flat, share submission rate proportionally lower
- PSU rail at the AUC input measures 11.9-12.2 V sustained under full load — upstream 12V is fine, fault is on the hashboard
- Fault appeared after a thermal-paste refresh, hashboard reseat, international shipment, or chassis drop
- Adjacent domains on the same board are unaffected — the cold zone has hard edges, not a soft thermal gradient
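The 25-33% band in the GHSmm/GHSavg symptom is plain arithmetic: if one of N equal domains goes dark, the board loses about 1/N of its expected hashrate, so one dark domain out of four costs 25% and one out of three costs about 33%. A minimal Python sketch of that check follows. It assumes every domain contributes an equal share and treats GHSmm as the expected rate; the function names and the 45,000 GH/s board figure are illustrative only.

```python
def shortfall(ghs_mm: float, ghs_avg: float) -> float:
    """Fraction of the expected rate (GHSmm) missing from the realised rate (GHSavg)."""
    return (ghs_mm - ghs_avg) / ghs_mm


def estimate_dark_domains(ghs_mm: float, ghs_avg: float, domains: int) -> int:
    """Estimate how many of `domains` equal-share domains produce nothing.

    Assumes each domain carries the same share, so one dark domain
    removes 1/domains of GHSmm.
    """
    return round(shortfall(ghs_mm, ghs_avg) * domains)


# One dark domain out of four, then one out of three.
print(f"{shortfall(45000, 33750):.0%}", estimate_dark_domains(45000, 33750, 4))  # 25% 1
print(f"{shortfall(45000, 30000):.0%}", estimate_dark_domains(45000, 30000, 3))  # 33% 1
```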
Step-by-Step Fix
Pull the full cgminer API response from port 4028 before you touch anything. The cgminer API is a raw TCP socket that expects the bare JSON command rather than an HTTP request, so netcat is the reliable tool: echo '{"command":"estats"}' | nc <miner-ip> 4028. Read PVT_V0..3 per chain and find the domain at 0 mV. Save the response to a timestamped text file. This is the most valuable diagnostic on the 1446 platform: the dashboard hides everything that matters, and the API hides nothing.
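The same capture can be scripted so every pull is saved and screened the same way. This is a minimal sketch, not a Canaan tool. It assumes the estats reply carries space-separated values inside KEY[...] groups (for example PVT_V0[312 315 0]), which you should check against your own saved response. It also assumes the 290 mV lower bound quoted in this playbook, and all function names are hypothetical.

```python
import json
import re
import socket
from datetime import datetime

# Assumed Avalon estats layout: space-separated values inside KEY[...] groups.
FIELD_RE = re.compile(r"([A-Za-z_]+\d*)\[([^\]]*)\]")
INT_RE = re.compile(r"-?\d+")
CORE_MV_LOW = 290  # bottom of the A14 core window quoted in this playbook


def query_cgminer(host: str, command: str = "estats", port: int = 4028,
                  timeout: float = 10.0) -> str:
    """Send one JSON command over the raw cgminer API socket; return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode())
        chunks = []
        # cgminer closes the connection after replying, so read until EOF.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    # The reply ends with a NUL byte; strip it before decoding.
    return b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")


def save_snapshot(raw: str, prefix: str = "estats") -> str:
    """Write the raw reply to a timestamped text file and return the path."""
    path = f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}.txt"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(raw)
    return path


def bracket_fields(raw: str) -> dict:
    """Map each KEY to its integer values; non-integer tokens are dropped.

    If several modules report the same KEY, the last occurrence wins.
    """
    return {key: [int(tok) for tok in body.split() if INT_RE.fullmatch(tok)]
            for key, body in FIELD_RE.findall(raw)}


def classify_pvt_v(fields: dict) -> dict:
    """Label each PVT_Vn field 'dead' (all zero or empty), 'low' or 'ok'."""
    verdicts = {}
    for key, values in fields.items():
        if not key.startswith("PVT_V"):
            continue
        if all(v == 0 for v in values):  # also true for an empty list
            verdicts[key] = "dead"
        elif min(values) < CORE_MV_LOW:
            verdicts[key] = "low"
        else:
            verdicts[key] = "ok"
    return verdicts
```

Typical use, with a placeholder address: raw = query_cgminer("192.168.1.50"), then save_snapshot(raw), then classify_pvt_v(bracket_fields(raw)). Keep the saved file even if the script's verdicts look clean, since the raw text is what a repair bench will want.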
Hard power-cycle at the PDU for 60 seconds. Soft reboots through the Web UI do not clear every register state on the AUC3 or the on-board MCU; a true power-off does. Re-pull the API after 10 minutes of post-boot hashing. If the dead domain comes back, observe 24 hours before declaring it cleared. If it stays dead, the fault is electrical and the rest of this playbook applies.
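Logging the 24-hour observation by hand is tedious. The sketch below polls at a fixed interval and records which PVT_V keys read all-zero at each sample. It assumes the same KEY[values] layout as the saved API response. `fetch` is a placeholder for whatever returns the raw estats text on your network (for example a small socket or netcat wrapper), and `watch` and `cleared` are hypothetical names.

```python
import re
import time
from typing import Callable

PVT_V_RE = re.compile(r"(PVT_V\d+)\[([^\]]*)\]")


def dead_pvt_v(raw: str) -> set:
    """Return the PVT_Vn keys whose values are all zero (or missing)."""
    dead = set()
    for key, body in PVT_V_RE.findall(raw):
        values = [int(t) for t in body.split() if re.fullmatch(r"-?\d+", t)]
        if all(v == 0 for v in values):
            dead.add(key)
    return dead


def watch(fetch: Callable[[], str], interval_s: float = 600.0,
          duration_s: float = 24 * 3600.0, sleep=time.sleep) -> list:
    """Sample every interval_s seconds across duration_s; return (offset_s, dead_keys).

    Offsets are nominal: time spent inside fetch() is not added to them.
    """
    n_samples = int(duration_s // interval_s) + 1
    samples = []
    for i in range(n_samples):
        samples.append((i * interval_s, dead_pvt_v(fetch())))
        if i < n_samples - 1:
            sleep(interval_s)
    return samples


def cleared(samples: list) -> bool:
    """True only if no sample in the window showed a dead domain."""
    return all(not dead for _, dead in samples)
```

Passing `sleep` as a parameter keeps the loop testable without waiting a real day.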
Verify intake air temperature and chassis airflow. 35 °C inlet maximum; clean the intake filter; ensure 15 cm clearance around the chassis on every side. The 1446's PMICs run hot at baseline (70-85 °C junction at steady state), and a stuffy install can push a marginal domain into thermal-shutdown territory. Thermal is rarely the root cause, but it is the cheapest variable to control before bench work.
Verify firmware via the Web UI and the API. Run a Canaan stable MM build from the 20240301 series or later on the 1446; pre-20231201 builds have documented MM ADC/sensor-path instability that can report false PVT_V zeros. Canaan blocks downgrades, so flash forward only, and only over hardwired Ethernet (the community has documented mid-flash bricks over WiFi). Confirm the build works on another 1446 first.
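One quick way to apply the firmware rule is to pull the build date out of the MM version string. This sketch assumes the version string embeds an 8-digit YYYYMMDD stamp, which you should confirm against your own Web UI or API output. The two thresholds are the dates quoted in this step, and the function names are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

UNSTABLE_BEFORE = date(2023, 12, 1)   # documented false PVT_V zeros before this
RECOMMENDED_FROM = date(2024, 3, 1)   # stable MM series to stay on


def build_date(version: str) -> Optional[date]:
    """Return the first valid YYYYMMDD stamp found in `version`, or None."""
    for stamp in re.findall(r"(?<!\d)(20\d{6})(?!\d)", version):
        try:
            return datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:  # eight digits that are not a real calendar date
            continue
    return None


def firmware_verdict(version: str) -> str:
    built = build_date(version)
    if built is None:
        return "unknown: read the build date manually"
    if built < UNSTABLE_BEFORE:
        return "update: pre-20231201 build, PVT_V zeros may be false"
    if built < RECOMMENDED_FROM:
        return "update: older than the 20240301 stable series"
    return "ok: 20240301 series or later"
```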
Set explicit DNS to 8.8.8.8 / 1.1.1.1 via the Web UI. Canaan's default DNS assumes a China route. Outside China, DNS failure breaks firmware handshakes, and those failures surface as sensor-path timeouts that can read as PVT flags. This is a community-documented quirk across the entire Avalon line; it is not in Canaan's official docs.
Measure the PSU rail at the suspect hashboard's 12V input lug under full load. Multimeter on DC, probe the input lug while the chassis is hashing at nameplate. Expect 11.9-12.2 V sustained. Below 11.8 V with the rest of the chassis fine = PSU output channel for this specific board is sagging — read PS[0] from the API for OC bits 512/1024/2048. If 12V is fine, the fault is on the hashboard itself; continue.
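The pass/fail logic of this step can be written down as a small sketch. The thresholds are the ones quoted above. PS[0] is taken to be the first integer in the API's PS[...] group, which is an assumption. This guide does not say which PSU channel each OC bit belongs to, so the helper only reports which of 512, 1024 and 2048 are set. Function names are hypothetical.

```python
OC_BITS = (512, 1024, 2048)  # over-current bits named in this step


def rail_verdict(volts: float) -> str:
    """Classify a 12 V input-lug reading taken under full load."""
    if volts < 11.8:
        return "sag: suspect this board's PSU output channel, check PS[0]"
    if 11.9 <= volts <= 12.2:
        return "ok: 12 V is fine, fault is on the hashboard"
    return "marginal: outside 11.9-12.2 V, re-measure under full load"


def oc_bits_set(ps0: int) -> list:
    """Return which of the named OC bits are set in the PS[0] status value."""
    return [bit for bit in OC_BITS if ps0 & bit]


print(rail_verdict(11.62), oc_bits_set(1536))
```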
Reseat the hashboard signal ribbon and the 12V bulk lugs. Power off at the PDU. Disconnect the data ribbon, inspect for bent pins or oxidation, and reconnect firmly with the locking tab fully seated. Inspect the bolted 12V copper lugs: torque to spec (~14 in-lb on the 1446; verify against your service docs) and wipe oxidation off both faces with isopropyl alcohol. Marginal connections starve borderline PMICs.
Swap the suspect board into a known-good slot. Label slots 0/1/2 with tape. Move the suspect board to a known-good slot, restart, run 15-20 minutes, re-pull the API. Fault follows the board = hashboard-level issue, proceed to Tier 3. Fault stays in the slot regardless of which board sits there = control board / AUC3 / backplane issue, jump to Tier 4.
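The slot-swap decision fits in a small truth table. This sketch assumes the suspect board is exchanged with the board in a known-good slot. After the re-pull, you record which slot numbers show a dead domain. `localize` is a hypothetical helper.

```python
def localize(original_slot: int, new_slot: int, dead_slots_after: set) -> str:
    """Interpret the slot-swap test.

    original_slot: slot the suspect board occupied when the fault appeared.
    new_slot: known-good slot it was moved into (boards exchanged).
    dead_slots_after: slots showing a dead domain after the swap and re-pull.
    """
    if original_slot == new_slot:
        raise ValueError("the suspect board must move to a different slot")
    if not dead_slots_after:
        return "cleared: reseating may have fixed it, keep monitoring"
    follows_board = new_slot in dead_slots_after
    stays_in_slot = original_slot in dead_slots_after
    if follows_board and not stays_in_slot:
        return "Tier 3: fault follows the board (hashboard-level)"
    if stays_in_slot and not follows_board:
        return "Tier 4: fault stays in the slot (control board / AUC3 / backplane)"
    return "ambiguous: re-test before escalating"
```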
Measure line voltage at the panel under full load. Canadian 240 V split-phase: expect 235-245 V. North American 208 V commercial: 202-212 V. Low line voltage forces the PSU to pull more current, increases rail sag on the output channels, and starves marginal PMICs. Tired breakers and undersized residential panels are a recurring root cause of intermittent VDOM faults that mimic PMIC failure.
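Why low line voltage hurts: a PSU delivering a fixed output power draws line current roughly inversely proportional to line voltage, I = P / (η·V). The sketch checks a reading against the windows above and shows the current increase. The 3500 W load and 94% efficiency are illustrative assumptions, not A1446 specifications.

```python
# Acceptance windows quoted in this step, keyed by nominal service voltage.
WINDOWS = {240: (235.0, 245.0), 208: (202.0, 212.0)}


def line_ok(nominal: int, measured: float) -> bool:
    """True if the loaded line reading sits inside the window for `nominal`."""
    low, high = WINDOWS[nominal]
    return low <= measured <= high


def input_current(load_watts: float, volts: float, efficiency: float = 0.94) -> float:
    """Line current for a constant-power load: I = P / (efficiency * V)."""
    return load_watts / (efficiency * volts)


print(f"{input_current(3500, 240):.1f} A")  # 15.5 A
print(f"{input_current(3500, 225):.1f} A")  # 16.5 A, and 225 V fails the 240 V window
```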
Thermal-camera scan under load. A FLIR ONE Pro on a phone has enough resolution. Let the chassis reach steady state (5-10 minutes), pop the lid, and shoot the suspect board. Photograph the cold zone if visible. A sharply bordered cold band of 8-12 chips confirms a PMIC drop on that domain, and the boundary gives you the domain map without service documentation. Save the photo as your bench-work map for Tier 3.
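Without a thermal camera, the per-chip PVT_T array in the saved API response can give a rough version of the same map, since the cold chips sit near ambient. This sketch finds runs of adjacent chips within a margin of ambient. It assumes the array order follows the physical chip order on the board, which is unverified. The 5 °C margin and 3-chip minimum are arbitrary defaults.

```python
def cold_runs(temps, ambient_c: float, margin_c: float = 5.0, min_len: int = 3) -> list:
    """Return (first, last) index pairs for runs of chips near ambient.

    A chip counts as cold when its reading is at most ambient_c + margin_c.
    Runs shorter than min_len are ignored as probable sensor noise.
    """
    runs, start = [], None
    # An infinite sentinel guarantees a run touching the end of the array closes.
    for i, t in enumerate(list(temps) + [float("inf")]):
        if t <= ambient_c + margin_c:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


board = [68] * 10 + [26] * 9 + [70] * 5   # illustrative readings in °C
print(cold_runs(board, ambient_c=24))       # [(10, 18)]
```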
Power-off bench probe sweep of the dead-domain PMIC. Board on the bench, multimeter in continuity mode. Identify the PMIC closest to the cold zone (look toward the input side). Probe the PMIC output pad against a chip core-rail pad on the same domain; expect continuity. Switch to resistance mode to probe the input MLCCs for partial shorts and to check the FB-network resistors against design value (consult the Zeus A11/A12 guide for general topology). Anything wildly out of spec is your first repair target.
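In-circuit resistance readings include parallel paths. Comparing each position on the dead domain against the same position on a healthy neighbouring domain is therefore a useful cross-check alongside the design value. Here is a minimal sketch of that comparison. The designators and values are hypothetical, and the 10% tolerance is an assumption.

```python
def flag_deviations(measured: dict, reference: dict, tolerance: float = 0.10) -> dict:
    """Return {designator: relative deviation} for readings outside tolerance.

    `reference` holds readings from the same positions on a healthy domain.
    Positions missing from either side, or with a zero reference, are skipped.
    """
    flagged = {}
    for designator, ref in reference.items():
        value = measured.get(designator)
        if value is None or ref == 0:
            continue
        deviation = (value - ref) / ref
        if abs(deviation) > tolerance:
            flagged[designator] = deviation
    return flagged


healthy = {"R101": 10_000, "R102": 4_700, "R103": 1_000}  # ohms, hypothetical
dead = {"R101": 10_100, "R102": 120}                       # R103 not yet probed
print({k: f"{v:+.0%}" for k, v in flag_deviations(dead, healthy).items()})  # {'R102': '-97%'}
```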
Powered-on PMIC output measurement. Power the chassis with the lid off and the board accessible. Probe the PMIC output pad with a DMM in DC mode against board ground. Expect 290-350 mV (A14 core target). 0 mV confirms the PMIC has dropped. A non-zero reading below target indicates partial PMIC failure or a fractured FB-network component.
Reflow the PMIC. This is the first repair attempt, and it sometimes reseats a marginal solder joint at no parts cost. Preheat the bottom of the board to ~150 °C on a preheat plate, hot-air the PMIC body topside at 310-330 °C for 25-30 seconds, then let it cool naturally. Re-apply thermal compound (Arctic MX-6 or Kryonaut). Re-test PVT_V once the board has fully cooled and then hashed for 30 minutes. About 30% of PMIC drops resolve at this step.
Replace the PMIC if reflow fails. Source the part — Canaan does not publish PMIC part numbers for the A14 family; read it off the IC body and match through Bit2miner / community parts catalogues, or work from a donor 1446 / 1346 / 1366 / 1466 board with a known-good PMIC at the same position. Hot-air the dead PMIC off, clean pads with flux + braid, reflow the new PMIC in place, re-apply thermal compound. Parts cost: $2-8. Bench time: ~30 minutes.
Replace cracked MLCCs and cooked bulk caps on the affected domain. Visual inspection at 10-20x magnification. Cracked 0402/0603 ceramics get replaced with the same value (100 nF typical for input decoupling, 1 µF / 10 µF for bulk; verify against neighbouring domains as reference). Cooked bulk electrolytics: replace with same value, voltage rating, temperature rating. Total parts cost rarely exceeds $3 per repair.
Stop DIY when: PMIC reflow + replacement fails to restore the domain, per-chip continuity sweep finds a shorted A14 chip, two domains on the same board have failed, or any visible PCB damage. At that point you are in chip-replacement, board-level PCB rework, or scrap-the-board territory. Book a D-Central ASIC Repair slot — Tier 4 is also the right call if you do not own hot-air, a preheat plate, and a thermal camera.
D-Central bench process: test fixture loads the 1446 hashboard independently of the chassis with a programmable DC supply. Per-domain PMIC sweep with bench scope on switching nodes confirms regulator health independently of the on-board MCU's PVT readings. Failed PMICs replaced with graded salvage stock from donor A14-family boards. Failed A14 chips replaced when isolation confirms chip-side damage. Cracked caps replaced. 24-hour burn-in at nameplate. Canadian turnaround: 5-10 business days.
Ship the board safely. Anti-static bag, double-box with ≥5 cm foam on every side. Include the cgminer API response from Step 1, the thermal-camera photo from Step 10, current firmware version, and a note describing which domain index dropped (0/1/2/3) and any prior repair attempts. Complete context cuts D-Central diagnostic time in half, which cuts your repair cost in half.
When to Seek Professional Repair
If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
Still Having Issues?
Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.
