A1166_REBOOT Warning

Avalon 1166 Pro – Frequent Reboot

The Avalon 1166 Pro reboots itself on a recurring 5-90 minute cadence and logs a `BOOTBY[0xNN.xxxxxxxx]` line on every restart. Root causes cluster into four buckets: PSU sag under sustained 3.4 kW load, an ambient/thermal trip with `PVT_T > Tmax`, the controller watchdog firing on a cgminer hang, or a hashboard chain timeout cascading into a chain-init reset. The BOOTBY prefix identifies the bucket; `dmesg` and `/var/log/messages` over SSH narrow it down to a chip.

Warning — Should be addressed soon

Affected Models: Avalon 1166 Pro — all SKUs (68T, 72T, 75T, 78T, 81T) on the MM313 control board with A3206 / A3210 ASIC family. Adjacent Avalon 1166 (base) and 1246 share the same diagnostic flow on `BOOTBY[0xNN]` reboots, but firmware-regression-specific causes (e.g. `BOOTBY[0x10]` on `22061301_be77c30_ef5defc`) are documented on the 1166 Pro 81T specifically.

Symptoms

  • Miner reboots itself on a recurring 5-90 minute cycle, day after day, regardless of pool, profile, or stock vs custom firmware
  • UI uptime counter never gets above the reboot interval; the miner never holds 24+ hours of uptime
  • Background log shows a `BOOTBY[0xNN.xxxxxxxx]` line on every restart; prefix is one of `0x02`, `0x03`, `0x10`, `0x11`, `0x12` (not `0x05`/`0x21`)
  • Pool dashboard shows recurring hashrate dropouts on a clock-like cadence — every 30 / 60 / 90 minutes
  • `cgminer-api estats` reports `PVT_T` per hashboard climbing toward 80-85 C in the minutes before each reboot
  • Fans ramp to 100% in the minutes before each reboot, then go quiet for ~90 seconds during the reboot, then ramp again
  • Wall power draw spikes briefly above nameplate at boot (3400 W to 3800 W transient), then settles — hard on a tired PSU
  • PSU output voltage measured at the board connector under load reads below the Avalon DC bus spec — sag is a primary reboot cause
  • `dmesg` on the controller shows USB re-enumeration events (`usb 1-1: USB disconnect` / `new high-speed USB device`) immediately preceding the reboot
  • `/var/log/messages` shows `cgminer` segfault, kernel oops, or `watchdog: BUG: soft lockup` inside the reboot window
  • Hashboard chain communication errors (`ECHU`, `chain timeout`, `MW0=0`) appear in `cgminer-api estats` minutes before the reboot
  • Reboot frequency tracks ambient — hotter day = more reboots, cooler day = fewer reboots — strongly suggests thermal-adjacent cause
  • Reboot frequency tracks load — same firmware behaves at 60T but reboots at 78-81T — strongly suggests PSU sag
  • Power-cycling at the breaker for 60 seconds doesn't clear the loop; the reboots stop for good only after a firmware downgrade or hardware swap
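
The cadence symptoms above are easier to judge with numbers than by eye. A minimal sketch, assuming you have noted the reboot times yourself (from the pool dashboard or the background log); the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime
from statistics import median

def reboot_intervals_minutes(timestamps):
    """Minutes between consecutive reboots, oldest first."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]

def matches_cadence(intervals, low=5.0, high=90.0):
    """True when every gap falls inside the 5-90 minute window described above."""
    return bool(intervals) and all(low <= gap <= high for gap in intervals)

# Hypothetical reboot times, for illustration only.
reboots = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 30),
    datetime(2024, 1, 1, 11, 0),
    datetime(2024, 1, 1, 12, 0),
]
intervals = reboot_intervals_minutes(reboots)
print("intervals (min):", intervals)
print("median (min):", median(intervals))
print("matches 5-90 min cadence:", matches_cadence(intervals))
```

A stable median with little spread points at a clock-like trigger; a wide spread that follows the weather points at the thermal bucket.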

Step-by-Step Fix

1

Capture the BOOTBY code on three consecutive reboots. Open the miner web UI, navigate to the background log, screenshot the `BOOTBY[0xNN.xxxxxxxx]` line on each of the last three reboots. Same prefix three times = consistent cause; the prefix tells you the bucket. Mixed prefixes = multi-cause environment; fix the most-frequent one first. This screenshot is the single most useful piece of diagnostic data you can produce.
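
If you copy the background log as text instead of screenshotting it, the tally can be scripted. A minimal sketch, assuming the log lines carry the `BOOTBY[0xNN.xxxxxxxx]` form quoted above; the helper names and the sample excerpt are hypothetical, and the suffix digits in the sample are placeholders.

```python
import re
from collections import Counter

# Captures the prefix ("0x10") from "BOOTBY[0x10.xxxxxxxx]"; the suffix is optional.
BOOTBY_RE = re.compile(r"BOOTBY\[(0x[0-9A-Fa-f]{2})(?:\.[0-9A-Fa-f]+)?\]")

def bootby_prefixes(log_text):
    """Return every BOOTBY prefix found in the log, in order of appearance."""
    return [m.group(1).lower() for m in BOOTBY_RE.finditer(log_text)]

def summarize(prefixes, last_n=3):
    """Classify the last `last_n` reboots as consistent or mixed."""
    recent = prefixes[-last_n:]
    if not recent:
        return {"verdict": "no BOOTBY lines found", "fix_first": None, "counts": {}}
    counts = Counter(recent)
    fix_first, _ = counts.most_common(1)[0]
    verdict = "consistent" if len(counts) == 1 else "mixed"
    return {"verdict": verdict, "fix_first": fix_first, "counts": dict(counts)}

# Hypothetical log excerpt; not real miner output.
sample = """
BOOTBY[0x10.00000000]
BOOTBY[0x02.00000000]
BOOTBY[0x10.00000000]
"""
print(summarize(bootby_prefixes(sample)))
```

The script only counts prefixes. It does not map a prefix to a cause, because that mapping depends on the firmware build.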

2

Verify ambient at the intake grille, not the room. IR thermometer 10 cm in front of the front grille while the miner is hashing at full power. Target ≤ 30 C. If you're at 35 C+, fix the environment (AC, ducting, move the miner) before touching firmware. A 1166 Pro on a 35 C summer day in an unventilated garage will reboot on schedule and there is no software fix for that.

3

Clean the intake filter. Shop-vac the filter, wipe the front grille, verify nothing within 15 cm of the grille is restricting airflow (rack-neighbour exhaust, curtain, dust buildup). A clogged filter walks `PVT_T` up 5-10 C over weeks and is the most-commonly-fixed cause of 'miner started rebooting after a few months.'

4

Hard power-cycle at the PDU for 60 seconds. Not a soft reboot. Full power off, wait 60 seconds for caps to drain, power back. Clears wedged driver state, transient PSU faults, and any latched flags in the MM313. If reboots stop and don't return for 24 h, the cause was transient and you can monitor.

5

Trigger a clean reboot via the API. `{"command":"ascset","parameter":"0,reboot,0"}` issues a clean MM-level reboot that should log as `BOOTBY[0x05]` (API reboot). If that `0x05` entry is quickly followed by another reboot with a non-`0x05` BOOTBY, the underlying cause is unfixed and you need Tier 2.
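
A minimal sketch of sending that request from Python. It assumes the API listens on TCP port 4028 and that the miner closes the connection after replying, as stock cgminer does. `build_request` and `send_api_command` are hypothetical helper names, and the miner may reject `ascset` if API write access is not enabled.

```python
import json
import socket

def build_request(command, parameter=None):
    """Encode a cgminer-style API request as compact JSON bytes."""
    payload = {"command": command}
    if parameter is not None:
        payload["parameter"] = parameter
    return json.dumps(payload, separators=(",", ":")).encode("ascii")

def send_api_command(host, command, parameter=None, port=4028, timeout=10.0):
    """Send one request and return the raw reply text (port 4028 is an assumption)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(command, parameter))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the miner closed the connection: reply is complete
                break
            chunks.append(data)
    # cgminer replies are commonly terminated with a NUL byte; strip it.
    return b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")

# Show the payload for the reboot request quoted in this step (nothing is sent here).
print(build_request("ascset", "0,reboot,0").decode())
```

To actually issue the reboot, call `send_api_command("<miner-ip>", "ascset", "0,reboot,0")` with the miner's address filled in.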

6

Measure PSU output under load. Multimeter on DC, probe at the PSU-to-board connector while the miner is hashing at full power. Expected: the rated Avalon DC bus voltage, held steady. Sag > 5% = tired PSU. Common on APW-class PSUs older than 18 months at continuous 80%+ load, or when the wrong PSU model is installed. Swap in a known-good PSU and observe for 24 hours.
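
The 5% rule is a one-line calculation. A minimal sketch; the rated voltage is a parameter because this guide does not state it, and the 12.0 V figure below is a placeholder for illustration only, so substitute the rated value for your unit.

```python
def sag_percent(rated_volts, measured_volts):
    """Percentage by which the measured voltage falls below the rated voltage."""
    if rated_volts <= 0:
        raise ValueError("rated_volts must be positive")
    return (rated_volts - measured_volts) / rated_volts * 100.0

def psu_is_tired(rated_volts, measured_volts, limit_percent=5.0):
    """Apply the 'sag > 5% = tired PSU' rule from this step."""
    return sag_percent(rated_volts, measured_volts) > limit_percent

# Placeholder numbers: 12.0 V is NOT a documented Avalon spec.
rated = 12.0
for measured in (11.7, 11.3):
    print(measured, round(sag_percent(rated, measured), 2), psu_is_tired(rated, measured))
```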

7

Measure wall voltage under load. On 240 V split-phase expect 235-245 V at the outlet during the reboot interval; on 208 V commercial expect 202-212 V. If wall voltage sags below those bands when other loads kick on, your circuit is undersized or shared. The 1166 Pro at 81T needs a dedicated 20 A 240 V circuit minimum. Fix the electrical and the reboot loop often clears without touching the miner.
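
The voltage bands and the steady-state current can be checked with a few lines. A minimal sketch using the bands from this step and the 3.4 kW sustained load quoted at the top of this page; the function names are hypothetical.

```python
# Expected outlet voltage under load, per nominal service voltage (from this step).
EXPECTED_BANDS = {240: (235.0, 245.0), 208: (202.0, 212.0)}

def wall_voltage_ok(nominal_volts, measured_volts):
    """True when the measured outlet voltage sits inside the expected band."""
    low, high = EXPECTED_BANDS[nominal_volts]
    return low <= measured_volts <= high

def steady_state_amps(watts, volts):
    """Current drawn at a given power and voltage (I = P / V)."""
    return watts / volts

print(wall_voltage_ok(240, 238.0))              # inside the 235-245 V band
print(wall_voltage_ok(240, 229.0))              # sagging below the band
print(round(steady_state_amps(3400, 240), 2))   # amps at 3.4 kW on 240 V
```

The same arithmetic on the 3800 W boot transient shows how little headroom a shared circuit leaves.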

8

Pull `cgminer-api estats` and identify the worst-thermal hashboard. `telnet <miner-ip> 4028` then `{"command":"estats"}`. Record `PVT_T0/T1/T2` for each board. Force fans to 90% via `{"command":"ascset","parameter":"0,fan-spd,90"}` and re-measure 15 min later. If the worst board still runs > 80 C with fans at 90% and ambient < 30 C, that hashboard has thermal-pad degradation or a thermally-failing chip.
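
Picking the worst board out of the `estats` text can be scripted. A minimal sketch, assuming the reply carries bracketed fields of the form `PVT_T0[ ... ]` with space-separated temperatures in C; the exact field layout varies by firmware, and the sample string below is hypothetical, not captured miner output.

```python
import re

# Matches "PVT_T<board>[<values>]"; the layout is an assumption about the reply.
PVT_T_RE = re.compile(r"PVT_T(\d+)\[([^\]]*)\]")

def board_peak_temps(estats_text):
    """Map board index -> highest PVT_T reading found for that board."""
    peaks = {}
    for board, values in PVT_T_RE.findall(estats_text):
        temps = [float(v) for v in re.findall(r"-?\d+(?:\.\d+)?", values)]
        if temps:
            peaks[int(board)] = max(temps)
    return peaks

def worst_board(peaks):
    """Return (board index, peak temperature) for the hottest board."""
    board = max(peaks, key=peaks.get)
    return board, peaks[board]

def suspect_thermal_fault(peak_c, ambient_c, fan_percent):
    """Rule from this step: > 80 C with fans at 90% and ambient < 30 C."""
    return peak_c > 80 and fan_percent >= 90 and ambient_c < 30

# Hypothetical excerpt for illustration only.
sample = "PVT_T0[ 68 70 72] PVT_T1[ 71 74 83] PVT_T2[ 66 69 71]"
peaks = board_peak_temps(sample)
board, peak = worst_board(peaks)
print(board, peak, suspect_thermal_fault(peak, ambient_c=27, fan_percent=90))
```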

9

SSH into the controller and tail the logs. `ssh root@<miner-ip>` (the default password is on the AUC3 sticker), then `tail -f /var/log/messages` and `dmesg -w` in two terminals. Watch for the next reboot live. Critical lines: `cgminer` segfault, `watchdog: BUG: soft lockup`, kernel oops, `usb disconnect` immediately before reboot, `chain N timeout` cluster. Time-correlated logs are how you pin down the bucket.
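
If you save the log output to a file, the critical lines can be counted after the fact. A minimal sketch that tallies the patterns named in this step; the regular expressions are approximations of those messages, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# One pattern per critical line type listed in this step (approximate matches).
PATTERNS = {
    "cgminer segfault": re.compile(r"cgminer.*segfault", re.IGNORECASE),
    "soft lockup": re.compile(r"watchdog: BUG: soft lockup"),
    "kernel oops": re.compile(r"\bOops\b"),
    "usb disconnect": re.compile(r"USB disconnect", re.IGNORECASE),
    "chain timeout": re.compile(r"chain \d+ timeout", re.IGNORECASE),
}

def tally_critical_lines(lines):
    """Count how many log lines match each critical pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

# Hypothetical log lines for illustration only.
sample_lines = [
    "kernel: usb 1-1: USB disconnect, device number 2",
    "cgminer: chain 1 timeout",
    "cgminer: chain 1 timeout",
    "kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s!",
]
print(tally_critical_lines(sample_lines).most_common())
```

Feed it the lines from the minutes before a reboot; the label with the highest count is the first bucket to investigate.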

10

Re-seat AUC3 + every hashboard cable. Power off at the PDU. Unplug AUC3 USB at both ends; inspect pins for oxidation, pin walk, bent contacts; re-seat firmly with a click. Open the miner. Re-seat every hashboard data ribbon and power connector. Light dielectric grease on the AUC3 USB pins for damp environments (Canadian basement in spring). Power back, soak 24 hours. Resolves a meaningful fraction of 'controller-watchdog reboot' tickets at zero cost.

11

Downgrade MM firmware to a last-known-good build. Verify your build with `{"command":"version"}` — if it's the bitcointalk-confirmed bad build `22061301_be77c30_ef5defc`, downgrade to `22033101_4ec6bb0_49ce84a`. Source the image from canaan.io/support, the public avalonminer.org/firmware-document/ portal, or by emailing avalonsupport@canaan.io. Verify SHA against Canaan's published hash before flashing — wrong image bricks the MM313. Flash via AUC3 + Canaan's official upgrade tool. Soak 24 h.
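
The hash check can be done with the Python standard library. A minimal sketch, assuming the published hash is SHA-256 (this step does not say which SHA variant Canaan publishes, so adjust the algorithm if needed); the function names are hypothetical.

```python
import hashlib
import hmac

def file_sha256(path, chunk_size=1 << 20):
    """Hash the file in chunks so large firmware images do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the local file's hash with the vendor's published value."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())

# Usage (path and hash are yours to supply; nothing is read at import time):
#   if not matches_published("firmware.bin", "<published hash>"):
#       raise SystemExit("Hash mismatch: do not flash this image")
```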

12

Refresh thermal pads on the worst hashboard. Power off. Open the miner. Remove the top fan shroud. Unscrew the worst-thermal hashboard. Carefully lift — do NOT twist — to avoid cracking the edge connector. Clean old thermal pads with isopropyl alcohol 99% and a plastic scraper. Apply fresh Arctic TP-3 pads (~1.5 mm, matched to original thickness). Re-seat the hashboard, reconnect, close. On a 3-year-old 1166 Pro this step alone shaves 8-12 C off `PVT_T` under load and clears thermal-adjacent reboot loops permanently.

13

Verify and update AUC3 firmware. AUC3 has its own firmware separate from MM313. A mismatched AUC3 / MM313 firmware combo can cause USB re-enumeration glitches that trigger watchdog reboots. Canaan's upgrade tool can flash AUC3 firmware separately. Source the AUC3 image from canaan.io/support. Verify hash. Flash. Re-test.

14

If logs show repeated `chain N timeout` on the same hashboard, that board has a bad chip. Swap it into another slot to confirm the fault follows the board. Two options from there: ship to D-Central for chip-level repair, or run the miner with that hashboard physically removed at reduced hashrate (1166 Pro will boot on 2 of 3 boards, just at proportionally lower TH/s). Reduced-hashrate operation is a viable bridge until the bench can rebuild the bad board.
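
The proportional estimate is simple arithmetic. A minimal sketch, assuming hashrate scales linearly with the number of boards running, as this step describes; the function name is hypothetical.

```python
def reduced_hashrate_ths(rated_ths, boards_running, boards_total=3):
    """Estimate TH/s when only some hashboards are installed (linear scaling)."""
    if not 0 <= boards_running <= boards_total:
        raise ValueError("boards_running must be between 0 and boards_total")
    return rated_ths * boards_running / boards_total

# An 81T unit running on 2 of 3 boards.
print(reduced_hashrate_ths(81, 2))
```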

15

Stop DIY and ship to D-Central when the reboot loop survives two firmware builds + a known-good PSU + fresh thermal pads, when visible damage is present (scorched trace, bulging electrolytic, burnt smell), or when `dmesg` shows kernel oops on every reboot with no external cause. You're in test-fixture territory. Book a slot at d-central.tech/services/asic-repair/. 5-10 business days, Canadian workshop, ships Canada / US / international.

16

Preserve forensic data when you ship. Before packing, export the miner's full background log, record the exact MM firmware build string from `{"command":"version"}`, screenshot the `{"command":"estats"}` output showing per-hashboard `PVT_T`, save the `dmesg` and `/var/log/messages` outputs, and note the reboot cadence (every N minutes at ambient X C). Put all of that in a note with the shipment. Pack hashboards in anti-static bags, double-box with ≥5 cm foam every side, AUC3 wrapped separately. Bench technicians halve diagnostic time with forensic context, which halves your repair invoice.
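
The shipment note can be kept as a small structured file alongside the saved logs. A minimal sketch; the `ForensicNote` class, its field names, and the sample values are hypothetical, and there is no required format.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ForensicNote:
    """Summary of the data this step asks you to include with the shipment."""
    firmware_build: str             # from {"command":"version"}
    reboot_interval_minutes: float  # observed cadence
    ambient_c: float                # intake temperature during the reboots
    bootby_prefixes: list = field(default_factory=list)
    attachments: list = field(default_factory=list)  # saved log / screenshot names

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

# Illustrative values only; replace with what you actually recorded.
note = ForensicNote(
    firmware_build="22061301_be77c30_ef5defc",
    reboot_interval_minutes=30,
    ambient_c=27,
    bootby_prefixes=["0x10", "0x10", "0x10"],
    attachments=["background-log.txt", "dmesg.txt", "messages.txt", "estats.png"],
)
print(note.to_json())
```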

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
