Passer au contenu

Nous améliorons nos opérations pour mieux vous servir. Les commandes sont expédiées normalement depuis Laval, QC. Questions? Contactez-nous

Bitcoin accepté au paiement  |  Expédié depuis Laval, QC, Canada  |  Soutien expert depuis 2016

A1366_WATCHDOG Warning

Avalon 1366 – BOOTBY Watchdog Reset

BOOTBY[0x08] kernel watchdog reset on Avalon 1366. cgminer userspace stopped feeding the kernel watchdog (typically because a mining thread hung on a hashboard timeout, a stratum socket, or a deadlock); after ~60 seconds the kernel rebooted the controller. Per Zeus Mining canonical table: 0x08 = watchdog, 0x10 = power event. Don't confuse them.

Warning — Should be addressed soon

Affected Models: Avalon 1366 — 130 TH/s nameplate · A3206-class ASIC chips on three hashboards · MM-family firmware (modified cgminer fork) on MM control board + AUC3 controller over USB / CAN

Symptoms

  • Avalon 1366 reboots itself on a clock — every 60-90 seconds, every few minutes, or under specific load conditions
  • Background log on the next boot prints `BOOTBY[0x08.xxxxxxxx]` — `0x08` = kernel watchdog reset per the Zeus Mining canonical table
  • `/var/log/avalon-cgminer.log` shows `cgminer` thread hang stacks: lines like `thread X stalled`, `chain Y timeout`, or stratum thread blocked on socket I/O before the reboot
  • `dmesg | grep watchdog` on the MM controller shows the kernel watchdog timer expiring — typical interval is 60 seconds without `cgminer` checking in
  • `{"command":"version"}` on port 4028 returns valid `MM Version` data when the miner is up — firmware is alive, the watchdog is firing because cgminer stops responding, not because the kernel itself crashed
  • Pool dashboard shows the miner offline on a repeating cadence — share submission stops dead, then resumes briefly after each reboot
  • `BOOTBY[0x08]` is dominant in recent reboots; you may see occasional `0x10` (power event) interspersed if PSU is also marginal — stacked faults
  • Loop disappears when you drop frequency or reduce voltage — points at silicon-lottery / thermal margin
  • Loop persists at stock frequency on stock firmware — points at hashboard hardware fault or firmware regression
  • `estats` reports all three hashboards detected before the reboot, but `DH%` climbs above 1.6% in the last poll before the watchdog fires
  • Fan PID does NOT ramp before the reboot — distinguishes `0x08` (cgminer hang) from `0x02` (thermal) where fans always spike first
  • You recently flashed an MM firmware update without confirming it against the running build — regression suspected

Step-by-Step Fix

1

Confirm the exact `BOOTBY` prefix from the background log — screenshot the full `BOOTBY[0x08.xxxxxxxx]` line. Cross-reference the Zeus Mining canonical reboot-cause table to confirm `0x08` = kernel watchdog reset. Don't take community reposts at face value — the canonical table is the reference. `0x10` is a power event, not a watchdog reset.

2

Pull 10-20 recent reboot entries to confirm `BOOTBY[0x08]` is dominant rather than mixed with `0x02` (thermal) or `0x10` (power). Mixed prefixes mean stacked faults — work the most frequent first, but knowing the others are present saves a second diagnostic round.

3

Query `{"command":"version"}` on port 4028 to record the current MM firmware build string. Compare against any known-stable build you've flashed previously. A recent unscheduled firmware change is the single most common silent variable on operator-managed Avalons.

4

Query `{"command":"estats"}` and pull `PVT_T`, `DH%`, and per-chain status across all three hashboards from the last poll before the reboot. Healthy 1366 at 130 TH/s nameplate runs most chips below 80 °C with `DH%` under 1.6%. Numbers above those thresholds at the time of reboot point at a thermal or silicon component underneath the watchdog symptom.

5

Verify intake ambient ≤ 30 °C with an IR thermometer pointed at the front grille — not the rack midline. Clean filters if dusty. The 1366 pulls roughly 3.25 kW at the wall; intake restriction walks chip temps up 5-10 °C across a day and pushes marginal silicon into the watchdog hang zone.

6

SSH into the MM controller (or use the web UI's log download) and pull `/var/log/avalon-cgminer.log`. Tail the last 500 lines. Grep for `thread`, `panic`, `timeout`, `chain`, `stratum`. The 60 seconds before the reboot is the watchdog's evidence — the hang stack lives there. Save the log for any future support ticket.

7

Run `dmesg | grep -i watchdog` on the MM controller. You should see kernel messages confirming the kernel watchdog driver pulled the trigger. If `dmesg` shows no watchdog activity but `BOOTBY[0x08]` still prints, the firmware is reporting a soft watchdog event — same diagnostic path, slightly different remediation.

8

Drop mining frequency 5-10% with `{"command":"ascset","parameter":"0,frequency,X"}`, replacing `X` with current minus 25-50 MHz for the 1366's `A3206`-class chips. Reboot once. Soak 1 hour. If the watchdog loop stops, you tuned past silicon-lottery on at least one chip; either accept the lower frequency or escalate to per-chip diagnostics in Tier 3.

9

Switch to a known-good failover pool (Solo CKPool, Public-Pool, or any pool you trust as stable). Observe 1 hour. If the loop stops on the new pool, the original pool's stratum behaviour was hanging the cgminer thread — file a ticket with that pool, stay on the failover until they fix.

10

Reseat all three hashboard data ribbons and power connectors. Power off at the breaker first. Inspect for blackening, oxidation, or bent pins. Listen for the click on reseat. AUC3-side: also reseat the USB cable from the host (Pi or controller PC) to the miner. Dielectric grease on oxidised contacts.

11

Multimeter on DC, probe PSU output at the board connector while the miner is hashing at full nameplate. Expect ≥ 13.8 V sustained on the 12 V rail. Sag below that during load = tired PSU producing brownouts that also cause `cgminer` thread crashes. Swap PSU with a known-good unit before assuming the watchdog hang is purely software.

12

Download Canaan's last-known-good MM firmware for your 1366 hardware revision from `avalonminer.org/firmware-document/` or `support.canaan.io` (login required). Verify the published SHA before you flash — wrong image bricks the MM. Flash via AUC3 using Canaan's official upgrade tool or D-Central's step-by-step guide. Do NOT interrupt mid-flash. Soak-test 24 hours. If the watchdog loop stops, pin this build and refuse auto-updates until an authoritative source confirms the next build is safe.

13

Refresh thermal pads on the hashboard with the highest pre-reboot `PVT_T`. Arctic TP-3 or equivalent at 1.5 mm matched to original. IPA 99% to clean old pads. An 8-12 °C drop on a 3-year-old 1366 is typical and resolves thermal-adjacent watchdog hangs where a chip was slowly walking past its silicon-lottery ceiling under load.

14

Inspect the MM control board and AUC3 for physical damage — scorched traces, bulging electrolytics on the hashboard LDOs, corroded USB pins. 1366s that ran 24/7 through two Canadian winters see enough thermal cycling to fatigue solder and dry electrolytics. Swap the AUC3 with a spare if you have one — a flaky AUC3 USB handshake can produce `cgminer` thread blocks that look identical to a hashboard timeout.

15

If watchdog fires at the same elapsed-seconds value every boot and `cgminer` never reaches the mining loop, suspect NVS or MM filesystem corruption. Reflash the full MM firmware image (not an update — a clean image). After reflash, factory-reset config via the web UI / API, re-enter pool credentials manually, and verify the miner reaches steady-state hashing before declaring victory.

16

Stop DIY and ship when: the watchdog loop persists across two different known-good MM firmware builds, you've reseated and swapped ribbons with no change, PSU rails measure clean under load, and a full reflash + NVS reset hasn't broken the cycle. You're past field-fixable. Book a D-Central Avalon repair slot. We bench-test under programmable load with Canaan's diagnostic binaries, replace MM / AUC3 modules from stocked inventory, and repair hashboards at the chip level. Turnaround 3-7 business days, Canadian workshop, ships Canada / US / international.

17

After any successful repair or firmware fix, pin the working build and start scraping `cgminer-api` proactively. Poll port 4028 every minute. Alert on any `BOOTBY[0xNN]` that isn't `0x05` (API-requested reboot, healthy). Trending `0x08` at low frequency is your week-of-warning before the silicon fails outright — treat the data as the early-warning system Canaan didn't ship.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Related Error Codes

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.