A1366_WATCHDOG Warning

Avalon 1366 – BOOTBY Watchdog Reset

BOOTBY[0x08] kernel watchdog reset on Avalon 1366. cgminer userspace stopped feeding the kernel watchdog (typically because a mining thread hung on a hashboard timeout, a stratum socket, or a deadlock); after ~60 seconds the kernel rebooted the controller. Per Zeus Mining canonical table: 0x08 = watchdog, 0x10 = power event. Don't confuse them.

Warning — Should be addressed soon

Affected Models: Avalon 1366 — 130 TH/s nameplate · A3206-class ASIC chips on three hashboards · MM-family firmware (modified cgminer fork) on MM control board + AUC3 controller over USB / CAN

Quick Answer

BOOTBY[0x08] kernel watchdog reset on Avalon 1366. First step: Confirm the exact `BOOTBY` prefix from the background log.

Symptoms

Avalon 1366 reboots itself on a clock — every 60-90 seconds, every few minutes, or under specific load conditions
Background log on the next boot prints `BOOTBY[0x08.xxxxxxxx]` — `0x08` = kernel watchdog reset per the Zeus Mining canonical table
`/var/log/avalon-cgminer.log` shows `cgminer` thread hang stacks: lines like `thread X stalled`, `chain Y timeout`, or stratum thread blocked on socket I/O before the reboot
`dmesg | grep watchdog` on the MM controller shows the kernel watchdog timer expiring — typical interval is 60 seconds without `cgminer` checking in
`{"command":"version"}` on port 4028 returns valid `MM Version` data when the miner is up — firmware is alive, the watchdog is firing because cgminer stops responding, not because the kernel itself crashed
Pool dashboard shows the miner offline on a repeating cadence — share submission stops dead, then resumes briefly after each reboot
`BOOTBY[0x08]` is dominant in recent reboots; you may see occasional `0x10` (power event) interspersed if PSU is also marginal — stacked faults
Loop disappears when you drop frequency or reduce voltage — points at silicon-lottery / thermal margin
Loop persists at stock frequency on stock firmware — points at hashboard hardware fault or firmware regression
`estats` reports all three hashboards detected before the reboot, but `DH%` climbs above 1.6% in the last poll before the watchdog fires
Fan PID does NOT ramp before the reboot — distinguishes `0x08` (cgminer hang) from `0x02` (thermal) where fans always spike first
You recently flashed an MM firmware update without confirming it against the running build — regression suspected

Step-by-Step Fix

Confirm the exact `BOOTBY` prefix from the background log — screenshot the full `BOOTBY[0x08.xxxxxxxx]` line. Cross-reference the Zeus Mining canonical reboot-cause table to confirm `0x08` = kernel watchdog reset. Don't take community reposts at face value — the canonical table is the reference. `0x10` is a power event, not a watchdog reset.
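
To make the prefix check repeatable, the reboot tag can be decoded with a few lines of Python. This is a minimal sketch, not Canaan tooling: the cause table holds only the four codes named in this guide, and it assumes the `xxxxxxxx` part after the dot is a run of hex digits. `parse_bootby` is a hypothetical helper name.

```python
import re

# Reboot-cause codes named in this guide; anything else is reported as "unknown".
BOOTBY_CAUSES = {
    0x02: "thermal",
    0x05: "API-requested reboot",
    0x08: "kernel watchdog reset",
    0x10: "power event",
}

# Matches e.g. BOOTBY[0x08.0000abcd]; the suffix after the dot is optional
# and is assumed (not confirmed) to be hexadecimal.
BOOTBY_RE = re.compile(r"BOOTBY\[0x([0-9A-Fa-f]{2})(?:\.([0-9A-Fa-f]+))?\]")


def parse_bootby(line):
    """Return (code, detail, cause) for a log line, or None if no BOOTBY tag."""
    m = BOOTBY_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1), 16)
    return code, m.group(2), BOOTBY_CAUSES.get(code, "unknown")
```

Feeding it the screenshotted line gives the numeric code plus the cause name from the table, so `0x08` and `0x10` cannot be mixed up by eye.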

Pull 10-20 recent reboot entries to confirm `BOOTBY[0x08]` is dominant rather than mixed with `0x02` (thermal) or `0x10` (power). Mixed prefixes mean stacked faults — work the most frequent first, but knowing the others are present saves a second diagnostic round.
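
Tallying the prefixes across those 10-20 entries can be scripted. The sketch below assumes each reboot entry is available as one line of text containing a `BOOTBY[0xNN...]` tag; `tally_reboot_codes` is an illustrative name, not part of any firmware tool.

```python
import re
from collections import Counter

# Captures only the two-digit prefix, e.g. "0x08" from BOOTBY[0x08.xxxxxxxx].
BOOTBY_CODE_RE = re.compile(r"BOOTBY\[(0x[0-9A-Fa-f]{2})")


def tally_reboot_codes(lines):
    """Count BOOTBY prefixes across log lines, most frequent first."""
    counts = Counter()
    for line in lines:
        m = BOOTBY_CODE_RE.search(line)
        if m:
            counts[m.group(1).lower()] += 1
    return counts.most_common()
```

The first tuple in the result is the dominant cause to work first; any further tuples are the stacked faults to keep in mind.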

Query `{"command":"version"}` on port 4028 to record the current MM firmware build string. Compare against any known-stable build you've flashed previously. A recent unscheduled firmware change is the single most common silent variable on operator-managed Avalons.
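
The query itself is a short TCP exchange. The sketch below assumes the MM firmware follows the stock cgminer API convention: a JSON request on port 4028 and a JSON reply, usually terminated by a NUL byte, after which the miner closes the connection. The function names are illustrative, and some firmware builds may return replies that need extra cleanup before they parse.

```python
import json
import socket


def parse_api_reply(raw):
    """Decode a cgminer-style reply: JSON text, often NUL-terminated."""
    text = raw.decode("utf-8", errors="replace").rstrip("\x00").strip()
    return json.loads(text)


def query_miner(host, command, port=4028, timeout=5.0):
    """Send {"command": ...} to the miner API and return the parsed reply."""
    request = json.dumps({"command": command}).encode("utf-8")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # miner closed the connection: reply is complete
                break
            chunks.append(chunk)
    return parse_api_reply(b"".join(chunks))
```

Calling `query_miner("192.0.2.10", "version")` (a placeholder address) and saving the result alongside the date gives a record to compare against after any later firmware change.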

Query `{"command":"estats"}` and pull `PVT_T`, `DH%`, and per-chain status across all three hashboards from the last poll before the reboot. Healthy 1366 at 130 TH/s nameplate runs most chips below 80 °C with `DH%` under 1.6%. Numbers above those thresholds at the time of reboot point at a thermal or silicon component underneath the watchdog symptom.
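
Once the per-chip temperatures and `DH%` are extracted from the `estats` reply, the threshold comparison is simple arithmetic. This sketch assumes you have already pulled the values into a list of temperatures in °C and a single `DH%` number; how they are extracted depends on the reply layout of your firmware build, which is not covered here. `summarize_estats_sample` is a hypothetical helper.

```python
MAX_CHIP_TEMP_C = 80.0  # "most chips below 80 C" from the step above
MAX_DH_PERCENT = 1.6    # DH% ceiling from the step above


def summarize_estats_sample(chip_temps_c, dh_percent):
    """Summarize one pre-reboot poll against the thresholds in this guide."""
    hot = [t for t in chip_temps_c if t >= MAX_CHIP_TEMP_C]
    return {
        "hottest_c": max(chip_temps_c),
        "hot_chip_count": len(hot),
        # True when more than half the chips are at or above the limit.
        "majority_hot": len(hot) * 2 > len(chip_temps_c),
        "dh_over_limit": dh_percent > MAX_DH_PERCENT,
    }
```

A sample with `dh_over_limit` or `majority_hot` set at the last poll before the reboot supports the thermal or silicon explanation; a clean sample points back at software or communication faults.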

Verify intake ambient ≤ 30 °C with an IR thermometer pointed at the front grille — not the rack midline. Clean filters if dusty. The 1366 pulls roughly 3.25 kW at the wall; intake restriction walks chip temps up 5-10 °C across a day and pushes marginal silicon into the watchdog hang zone.

SSH into the MM controller (or use the web UI's log download) and pull `/var/log/avalon-cgminer.log`. Tail the last 500 lines. Grep for `thread`, `panic`, `timeout`, `chain`, `stratum`. The 60 seconds before the reboot is the watchdog's evidence — the hang stack lives there. Save the log for any future support ticket.
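
If the log has been downloaded to a workstation, the tail-and-grep pass can be reproduced in Python. This is a sketch using the five keywords from the step above; the function name and the case-insensitive matching are choices made for illustration.

```python
from collections import deque

KEYWORDS = ("thread", "panic", "timeout", "chain", "stratum")


def suspicious_lines(lines, keywords=KEYWORDS, tail=500):
    """Return lines among the last `tail` that mention any keyword."""
    recent = deque(lines, maxlen=tail)  # keeps only the final `tail` lines
    return [
        ln.rstrip("\n")
        for ln in recent
        if any(k in ln.lower() for k in keywords)
    ]
```

Usage on a saved copy would look like `with open("avalon-cgminer.log", errors="replace") as f: hits = suspicious_lines(f)`, where the local filename is whatever you saved the log as.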

Run `dmesg | grep -i watchdog` on the MM controller. You should see kernel messages confirming the kernel watchdog driver pulled the trigger. If `dmesg` shows no watchdog activity but `BOOTBY[0x08]` still prints, the firmware is reporting a soft watchdog event — same diagnostic path, slightly different remediation.

Drop mining frequency 5-10% with `{"command":"ascset","parameter":"0,frequency,X"}`, replacing `X` with current minus 25-50 MHz for the 1366's `A3206`-class chips. Reboot once. Soak 1 hour. If the watchdog loop stops, you tuned past silicon-lottery on at least one chip; either accept the lower frequency or escalate to per-chip diagnostics in Tier 3.
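
Building the request string by hand invites typos, so here is a small sketch that fills in `X`. It assumes frequencies are whole MHz values and rejects drops outside the 25-50 MHz window suggested above; the 500 MHz figure in the usage note is purely illustrative and is not a 1366 stock setting.

```python
import json


def build_frequency_command(current_mhz, drop_mhz):
    """Build the ascset request from the step above with X = current - drop."""
    if not 25 <= drop_mhz <= 50:
        raise ValueError("this guide suggests a 25-50 MHz drop")
    target = current_mhz - drop_mhz
    return json.dumps(
        {"command": "ascset", "parameter": f"0,frequency,{target}"},
        separators=(",", ":"),  # compact form, matching the step's example
    )
```

`build_frequency_command(500, 25)` produces `{"command":"ascset","parameter":"0,frequency,475"}`, which can then be sent to port 4028.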

Switch to a known-good failover pool (Solo CKPool, Public-Pool, or any pool you trust as stable). Observe 1 hour. If the loop stops on the new pool, the original pool's stratum behaviour was hanging the cgminer thread — file a ticket with that pool, stay on the failover until they fix.

Reseat all three hashboard data ribbons and power connectors. Power off at the breaker first. Inspect for blackening, oxidation, or bent pins. Listen for the click on reseat. AUC3-side: also reseat the USB cable from the host (Pi or controller PC) to the miner. Dielectric grease on oxidised contacts.

Multimeter on DC, probe PSU output at the board connector while the miner is hashing at full nameplate. Expect ≥ 13.8 V sustained on the 12 V rail. Sag below that during load = tired PSU producing brownouts that also cause `cgminer` thread crashes. Swap PSU with a known-good unit before assuming the watchdog hang is purely software.
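
If you log several readings while the miner hashes, a short helper can flag sag against the floor quoted above. This is a sketch only: the 13.8 V figure is taken from this step, and `rail_sag_report` is a hypothetical name.

```python
MIN_RAIL_V = 13.8  # sustained floor quoted in the step above


def rail_sag_report(readings_v, floor_v=MIN_RAIL_V):
    """Summarize multimeter readings taken under full hashing load."""
    below = [v for v in readings_v if v < floor_v]
    return {
        "min_v": min(readings_v),
        "samples_below": len(below),
        "sagging": bool(below),
    }
```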

Download Canaan's last-known-good MM firmware for your 1366 hardware revision from `avalonminer.org/firmware-document/` or `support.canaan.io` (login required). Verify the published SHA before you flash — wrong image bricks the MM. Flash via AUC3 using Canaan's official upgrade tool or D-Central's step-by-step guide. Do NOT interrupt mid-flash. Soak-test 24 hours. If the watchdog loop stops, pin this build and refuse auto-updates until an authoritative source confirms the next build is safe.
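
The checksum comparison can be done with Python's standard library before the image goes anywhere near the AUC3. The step above only says "published SHA", so this sketch assumes SHA-256 by default and lets you pass another algorithm name if the download page states a different one. The function names are illustrative.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large firmware images are not loaded whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def firmware_matches(path, published_hex, algorithm="sha256"):
    """True only if the file's digest equals the published hex string."""
    actual = file_digest_hex(path, algorithm)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

Flash only when `firmware_matches(...)` returns `True`; a mismatch means a corrupted download or the wrong image for your hardware revision.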

Refresh thermal pads on the hashboard with the highest pre-reboot `PVT_T`. Arctic TP-3 or equivalent at 1.5 mm matched to original. IPA 99% to clean old pads. An 8-12 °C drop on a 3-year-old 1366 is typical and resolves thermal-adjacent watchdog hangs where a chip was slowly walking past its silicon-lottery ceiling under load.

Inspect the MM control board and AUC3 for physical damage — scorched traces, bulging electrolytics on the hashboard LDOs, corroded USB pins. 1366s that ran 24/7 through two Canadian winters see enough thermal cycling to fatigue solder and dry electrolytics. Swap the AUC3 with a spare if you have one — a flaky AUC3 USB handshake can produce `cgminer` thread blocks that look identical to a hashboard timeout.

If watchdog fires at the same elapsed-seconds value every boot and `cgminer` never reaches the mining loop, suspect NVS or MM filesystem corruption. Reflash the full MM firmware image (not an update — a clean image). After reflash, factory-reset config via the web UI / API, re-enter pool credentials manually, and verify the miner reaches steady-state hashing before declaring victory.

Stop DIY and ship when: the watchdog loop persists across two different known-good MM firmware builds, you've reseated and swapped ribbons with no change, PSU rails measure clean under load, and a full reflash + NVS reset hasn't broken the cycle. You're past field-fixable. Book a D-Central Avalon repair slot. We bench-test under programmable load with Canaan's diagnostic binaries, replace MM / AUC3 modules from stocked inventory, and repair hashboards at the chip level. Turnaround 3-7 business days, Canadian workshop, ships Canada / US / international.

After any successful repair or firmware fix, pin the working build and start scraping `cgminer-api` proactively. Poll port 4028 every minute. Alert on any `BOOTBY[0xNN]` that isn't `0x05` (API-requested reboot, healthy). Trending `0x08` at low frequency is your week-of-warning before the silicon fails outright — treat the data as the early-warning system Canaan didn't ship.
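
A minimal monitoring loop for that polling routine could look like the sketch below. It assumes you supply two callables of your own: `fetch_log`, which returns the miner's current background-log text (for example from an API query), and `notify`, which delivers the alert. Both are placeholders, and `0x05` is treated as healthy as described above.

```python
import re
import time

HEALTHY_CODES = {0x05}  # API-requested reboot, treated as healthy in this guide
BOOTBY_RE = re.compile(r"BOOTBY\[0x([0-9A-Fa-f]{2})")


def bootby_alerts(log_text):
    """Return the BOOTBY codes in log_text that warrant an alert."""
    codes = [int(m.group(1), 16) for m in BOOTBY_RE.finditer(log_text)]
    return [c for c in codes if c not in HEALTHY_CODES]


def watch(fetch_log, notify, interval_s=60, max_polls=None):
    """Poll fetch_log() every interval_s seconds and notify on bad codes."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for code in bootby_alerts(fetch_log()):
            notify(f"BOOTBY[0x{code:02x}] seen")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

As written, the loop re-alerts on the same entry at every poll; a production version would remember which entries it has already reported and store the counts so the `0x08` trend is visible over weeks.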

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Frequently Asked Questions

What does the A1366_WATCHDOG error mean?

BOOTBY[0x08] kernel watchdog reset on Avalon 1366. cgminer userspace stopped feeding the kernel watchdog (typically because a mining thread hung on a hashboard timeout, a stratum socket, or a deadlock); after ~60 seconds the kernel rebooted the controller. Per Zeus Mining canonical table: 0x08 = watchdog, 0x10 = power event. Don't confuse them. Commonly reported on: Avalon 1366 — 130 TH/s nameplate · A3206-class ASIC chips on three hashboards · MM-family firmware (modified cgminer fork) on MM control board + AUC3 controller over USB / CAN.

Can I fix the A1366_WATCHDOG error myself?

This is a moderate repair that needs some hands-on ASIC experience and the right tools. Start by confirming the exact `BOOTBY` prefix from the background log and screenshotting the full `BOOTBY[0x08.xxxxxxxx]` line. If you are not equipped for board-level work, D-Central can diagnose and repair it at our Laval bench.

How much does it cost to repair?

A DIY repair typically runs $63-$950 CAD depending on which part the fault traces to. D-Central can also diagnose and quote a mail-in bench repair.

What parts might I need to fix this?

Common replacement parts for this fault: Hashboard Thermal Paste (Cyan), Fluke Multimeter 15B+, STASIC Hashboard MultiTester Pro. The exact part depends on diagnosis - measure first.

Key Terms in This Fault

Jump to the full definition of the technical terms involved in this fault:

Hashboard, Voltage domain, Hashrate, PSU, Firmware, Undervolting, Overclocking, Heatsink

Printable quick-reference cards

ASIC Miner Error-Code Quick-Reference Card — print-to-PDF one-pager
Stratum Share-Rejection Error Decoder Card — print-to-PDF one-pager
ASIC PSU & Connector Pinout Quick-Reference Card — print-to-PDF one-pager

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.

Related Error Codes

Antminer - CGMiner / BMMiner Crash

Avalon - AUC USB Connection Lost

Avalon - AUC Controller Failure

Avalon - Firmware Flash via AUC

Avalon 1246 - AUC Communication Error

Avalon 1246 - Firmware Flash Failure

Avalon 1246 - Low Hashrate

Avalon 1166 - Low Hashrate

Avalon 1246 - Hashboard Not Detected

Avalon 1166 - Hashboard Not Detected
