ECHU | ECMM | MM_STATUS | SYSTEMSTATU Info

Avalon Series – MM_STATUS Log Decoder

Avalon MM firmware exposes a CGMiner-compatible JSON API on TCP port 4028. `{"command":"estats"}` dumps telemetry for hashboard enumeration (`SYSTEMSTATU`, `MM_STATUS`), module-management state (`ECMM`), hashboard error correction (`ECHU`), per-chain work (`MW0..2`), per-chip temp and voltage (`PVT_T`, `PVT_V`), PSU telemetry (`PS[0..2]` + 13-bit error bitmap), hashrate (`GHSmm`, `GHSavg`), and reboot cause (`BOOTBY`). Canaan publishes field names but not bit meanings or remediation. This page decodes every field.

Informational — Monitor and address as needed

Affected Models: All Avalon series on MM-family firmware — A1166 Pro, A1246, A1266, A1346, A1366, A1446, A1466, A1566 (and every minor SKU built on the same A3210 / A3206 / A3205 control stack)

Symptoms

  • Web UI shows `running` / `green` but realized hashrate is flat, low, or noisy and you want to know which subsystem to blame
  • `{"command":"estats"}` on port 4028 returns a valid reply but you don't know what the fields mean
  • `ECHU[x x x]` reads non-zero on one or more chains — hashboard error correction tripped
  • `ECMM[x]` reads non-zero — module-management / MM control layer fault
  • `MM_STATUS` reports `WORK_MODE FAULT`, `Error`, `Hot-Recovery`, or anything other than `WORK_MODE IDLE` / `WORK_MODE NORMAL`
  • `SYSTEMSTATU[0] Work: 2` (or `1` / `0`) instead of `Work: 3` — one or more hashboards failed to enumerate
  • `SYSTEMSTATU` shows `Hot-Recovery` after every reboot — firmware retried the chain before soft-shutdown
  • `PS[0]` / `PS[1]` / `PS[2]` all read `0` — AUC3-to-PSU telemetry channel dropped
  • `PS` error bitmap is non-zero, i.e. any of the bit values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 is set: a specific PSU fault is flagged
  • `MW0` / `MW1` / `MW2` arrays show large deltas between chains — one hashboard's work-accumulator is lagging
  • `PVT_T` array reports any entry > 85 C — thermal soft-shutdown territory
  • `PVT_V` array shows domain voltage drift > ±40 mV across chips on the same board
  • `GHSmm` (nameplate) and `GHSavg` (realized) diverge by > 10% for sustained time
  • You want the pre-fix `estats` snapshot before shipping a miner to D-Central
  • `BOOTBY[0xNN.xxxxxxxx]` is printing and you want the companion field decoder to pair with the reboot-cause table
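
The bracketed notation used in the list above (`ECHU[x x x]`, `ECMM[x]`, `BOOTBY[0xNN.xxxxxxxx]`) lends itself to a simple extractor. The following is a minimal Python sketch, assuming every field shows up as `NAME[value]` somewhere in the reply text. The `SAMPLE` string is made up for the example and is not real miner output; `extract_fields` and `divergence_pct` are hypothetical helper names.

```python
import re

# Assumption: fields appear as NAME[value] in the estats reply text.
FIELD_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\[([^\]]*)\]")

# Made-up sample for illustration only, not captured from a real miner.
SAMPLE = "ECHU[0 512 0] ECMM[0] GHSmm[80000.00] GHSavg[68000.00]"


def extract_fields(text):
    """Return {name: raw value string}; a repeated name keeps its last value."""
    return {name: value.strip() for name, value in FIELD_RE.findall(text)}


def divergence_pct(ghs_mm, ghs_avg):
    """Percent by which realized hashrate (GHSavg) trails nameplate (GHSmm)."""
    return 100.0 * (ghs_mm - ghs_avg) / ghs_mm


fields = extract_fields(SAMPLE)
gap = divergence_pct(float(fields["GHSmm"]), float(fields["GHSavg"]))
print(fields["ECHU"], f"{gap:.1f}% below nameplate")
```

One snapshot only shows the gap at that instant. The "sustained" part of the symptom needs the same check repeated across several pulls.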

Step-by-Step Fix

1

Open a terminal on any host on the same LAN as the miner. Pull the full `estats` reply: `echo -n '{"command":"estats"}' | nc <miner-ip> 4028`. Save the output to a timestamped text file named like `1246-$(date +%F).txt`. On Windows without `nc`, use PuTTY in raw mode to `<miner-ip>:4028` and paste `{"command":"estats"}` + Enter. The full reply is the diagnostic — web UI summaries hide most of the fields this page decodes. If you can't reach port 4028, the fault is network / AUC3 / MM-side — jump to the AUC3 network loss companion.
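
Where neither `nc` nor PuTTY is handy, the same pull can be sketched in Python. This is a minimal illustration, not a vendor tool: it assumes the miner closes the connection after replying and may pad the reply with a trailing NUL byte, as CGMiner-style APIs commonly do. `read_reply`, `query`, and `snapshot_name` are hypothetical helper names.

```python
import socket
from datetime import date

API_PORT = 4028  # CGMiner-compatible API port named in the text


def read_reply(sock):
    """Read until the peer closes (or the socket times out), then drop NUL padding."""
    chunks = []
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # assumption: treat a timeout as end of reply
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")


def query(host, command="estats", timeout=10.0):
    """Send one JSON command to the miner and return the raw reply text."""
    with socket.create_connection((host, API_PORT), timeout=timeout) as sock:
        sock.sendall(('{"command":"%s"}' % command).encode("ascii"))
        return read_reply(sock)


def snapshot_name(model, day):
    """Mirror the `1246-$(date +%F).txt` naming from step 1."""
    return f"{model}-{day.isoformat()}.txt"
```

Typical use would be `open(snapshot_name("1246", date.today()), "w").write(query("<miner-ip>"))`, with the real miner address substituted.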

2

Read `SYSTEMSTATU` first. `SYSTEMSTATU[0] Work: 3` on a 3-board miner = all boards enumerated. `Work: 2` / `Work: 1` / `Work: 0` = enumeration failure — note which slot is missing and jump to the Avalon hashboard-not-detected companion for reseat procedure. `Hot-Recovery` in this field = firmware retried the chain before soft-shutdown, almost always thermal, move to step 5. `Error` in this field = pair with `ECHU` and `MM_STATUS`.

3

Scan `ECHU[a b c]` for non-zero entries. `ECHU[0 0 0]` = chains clean, move on. Non-zero on exactly one chain = move that board to a different slot and re-pull `estats` (about a 30-minute test). Fault follows the board = bad board. Fault stays in the slot = AUC3 / ribbon / MM control path. Non-zero on all three chains simultaneously = AUC3 / MM path, not the boards themselves. This slot-swap test is the fastest hashboard-vs-controller isolation on any Avalon.
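
The step 3 triage can be written down as a small function. This is a sketch under two assumptions: the values inside `ECHU[...]` are whitespace-separated decimal integers, one per chain, and the "several but not all chains" case (which step 3 does not cover) is handled by swap-testing each faulted board. `parse_echu` and `classify_echu` are hypothetical names.

```python
def parse_echu(raw):
    """Turn the text inside ECHU[...] into a list of ints, one per chain."""
    return [int(token) for token in raw.split()]


def classify_echu(codes):
    """Map per-chain ECHU values onto the step 3 outcomes."""
    faulted = [slot for slot, code in enumerate(codes) if code != 0]
    if not faulted:
        return "clean"
    if len(faulted) == len(codes):
        return "all chains: suspect AUC3 / MM path"
    if len(faulted) == 1:
        return f"chain {faulted[0]}: run the slot-swap test"
    # Not covered by step 3; swap-testing each board is an assumption.
    return "several chains: slot-swap each faulted board"


print(classify_echu(parse_echu("0 512 0")))
```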

4

Read `ECMM[N]` and `MM_STATUS`. `ECMM[0]` + `MM_STATUS = WORK_MODE NORMAL` or `WORK_MODE IDLE` = MM layer healthy. Non-zero `ECMM` + `WORK_MODE FAULT` = MM control fault, proceed to firmware and AUC3 steps. `WORK_MODE FAULT` with clean `ECMM` but non-zero `ECHU` = chain-level fault, follow step 3. `MM_STATUS = Hot-Recovery` = firmware reduced work on the affected chain before soft-shutdown — thermal path.
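
The combinations in step 4 form a small decision table. The sketch below encodes only the cases the text names and returns `"unclassified"` for anything else; it assumes `MM_STATUS` arrives as one of the literal strings quoted above. `mm_verdict` is a hypothetical name.

```python
HEALTHY_MODES = ("WORK_MODE NORMAL", "WORK_MODE IDLE")


def mm_verdict(ecmm, mm_status, echu):
    """Apply the step 4 rules to one snapshot's ECMM, MM_STATUS and ECHU values."""
    status = mm_status.strip()
    if status == "Hot-Recovery":
        return "thermal path"
    if ecmm == 0 and status in HEALTHY_MODES:
        return "MM layer healthy"
    if status == "WORK_MODE FAULT":
        if ecmm != 0:
            return "MM control fault: firmware and AUC3 steps"
        if any(code != 0 for code in echu):
            return "chain-level fault: follow step 3"
    # Combinations the text does not describe are left for manual review.
    return "unclassified"


print(mm_verdict(0, "WORK_MODE FAULT", [0, 7, 0]))
```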

5

Scan `PVT_T` array for any entry > 85 C. Shop-vac intake filter. Wipe grille. Verify intake ambient with an IR thermometer at the grille — not the middle of the room, not the hallway. Target ≤ 30 C for A11 / A12, ≤ 32 C for A13+. Above 35 C ambient, thermal fields will never read clean regardless of pad quality or airflow. Fix the room before you open the chassis. Intake dust alone resolves a meaningful percentage of `PVT_T` over-threshold entries.
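
The thresholds in step 5 are easy to check mechanically once `PVT_T` has been parsed into numbers. A minimal sketch, assuming temperatures are in degrees Celsius and using illustrative generation labels (`"A11"`, `"A12"`, `"A13+"`) as dictionary keys; `hot_chips` and `ambient_verdict` are hypothetical names.

```python
HOT_LIMIT_C = 85  # over-threshold value from step 5

# Intake ambient targets from step 5; the key labels are illustrative.
AMBIENT_TARGET_C = {"A11": 30, "A12": 30, "A13+": 32}
AMBIENT_CEILING_C = 35  # above this, thermal fields never read clean


def hot_chips(pvt_t, limit=HOT_LIMIT_C):
    """Return (chip index, temperature) for every entry above the limit."""
    return [(i, t) for i, t in enumerate(pvt_t) if t > limit]


def ambient_verdict(generation, ambient_c):
    """Compare a grille-side ambient reading with the step 5 targets."""
    if ambient_c > AMBIENT_CEILING_C:
        return "fix the room first"
    if ambient_c <= AMBIENT_TARGET_C[generation]:
        return "within target"
    return "above target"


print(hot_chips([71, 86, 90, 85]), ambient_verdict("A12", 33))
```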

6

Compare `MW0` / `MW1` / `MW2` across chains. Pull a second `estats` 10 minutes after the first. The three MW arrays should track each other within ±5%. One chain lagging 10-20% = that board has a thermal or voltage fault; pair with `PVT_T` / `PVT_V` on the lagging chain for isolation. Over 20% = hashboard approaching end-of-life or active throttle, move to the Tier 3 pad refresh (step 13). All three chains lagging proportionally = pool / stratum issue or MM-wide throttle.
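
One way to turn the two snapshots into lag percentages is sketched below. It rests on assumptions the text does not spell out: each `MW` array is summed to one counter per chain, progress is the difference between the two pulls, and lag is measured against the fastest chain. The 5-10% band is not defined in step 6, so it is reported as `"borderline"`. All function names are hypothetical.

```python
def chain_progress(first, second):
    """Work accumulated per chain between two snapshots (sum of each MW array)."""
    return [sum(later) - sum(earlier) for earlier, later in zip(first, second)]


def lag_percent(progress):
    """Lag of each chain relative to the fastest chain, in percent."""
    best = max(progress)
    if best <= 0:
        raise ValueError("no chain advanced between snapshots")
    return [100.0 * (best - p) / best for p in progress]


def lag_verdict(lag):
    """Bucket one chain's lag using the step 6 thresholds."""
    if lag <= 5:
        return "tracking"
    if lag > 20:
        return "end-of-life or active throttle"
    if lag >= 10:
        return "thermal or voltage fault"
    return "borderline"  # band not defined in the text


first = [[100, 100], [100, 100], [100, 100]]
second = [[600, 600], [550, 550], [480, 480]]
print([lag_verdict(x) for x in lag_percent(chain_progress(first, second))])
```

Because lag is relative to the fastest chain, this cannot see all three chains slowing down together. That case needs a comparison against an earlier baseline or the pool-side numbers.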

7

Query `{"command":"version"}` on port 4028 and compare your MM firmware build string against the last-known-good at `avalonminer.org/firmware-document/` (A1166 Pro: 20220926 family; A1246: 20230424 family; A13+: most recent stable release). If you're on any other build, newer or older, flash last-known-good over the web UI and soak-test for 24 hours. Re-pull `estats`: `ECHU` and `ECMM` should clear if the fault was firmware-side. Do NOT interrupt mid-flash; a bricked MM on A13+ is a bench recovery.

8

On A11 / A12 generations, reseat the AUC3 controller USB cable. Power off at the breaker first. Inspect pins for corrosion or blackening before reconnecting. A dab of dielectric grease on oxidised pins has cleared persistent non-zero `ECMM` + `ECHU[N N N]` patterns on multiple three-year-old A1246s in D-Central's repair queue. AUC3 handshake faults masquerade as chain-level faults until you prove them otherwise by reseating the controller.

9

Measure mains voltage under load at the PSU input with a multimeter. Standard Avalon PSUs expect 200-240 V and tolerate down to 195 V briefly. Sag below 195 V during hash bursts sets `PS` bitmap value 1 (`Input_UV`) and cascades into `ECHU` faults across all three chains simultaneously. A dedicated 240 V circuit is strongly preferred over 120 V for Avalon-class power draw: 120 V sags at peak demand and produces multi-chain `ECHU` patterns that look like hashboard faults.

10

Decode the `PS` error bitmap bit by bit. The reported number is the sum of the set bit values: 1 (`Input_UV`), 2 (`OT1`), 4 (`OT2`), 8 (`OT3`), 16 (`OC_Pri`), 32 (`UV_out`), 64 (`OC_out`), 128 (`CS_error`), 256 (`OC_IOSA`), 512 (`OC_IOSB`), 1024 (`OC_IOSC`), 2048 (`FAN_error`). Work every set bit: `PS error 2049` = 1 + 2048 = `Input_UV` + `FAN_error` = mains sag + PSU fan failure, which is a different repair path than either bit alone. A single-bit interpretation misses the real fault pattern.
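
The bit-by-bit decode is a plain bitmask test. The sketch below uses only the value-to-name pairs listed in step 10; any leftover bits outside that table are returned separately rather than guessed at. `decode_ps_error` is a hypothetical name.

```python
# Bit values and names exactly as listed in step 10.
PS_FLAGS = {
    1: "Input_UV", 2: "OT1", 4: "OT2", 8: "OT3",
    16: "OC_Pri", 32: "UV_out", 64: "OC_out", 128: "CS_error",
    256: "OC_IOSA", 512: "OC_IOSB", 1024: "OC_IOSC", 2048: "FAN_error",
}
KNOWN_MASK = sum(PS_FLAGS)  # 4095: every bit the table covers


def decode_ps_error(value):
    """Return (names of set flags, leftover bits not covered by the table)."""
    names = [name for bit, name in PS_FLAGS.items() if value & bit]
    return names, value & ~KNOWN_MASK


print(decode_ps_error(2049))
```

For the `PS error 2049` example this yields `Input_UV` and `FAN_error` with no leftover bits, matching the mains sag plus fan failure reading above.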

11

If `PS[0..2]` all read 0, reseat the PSU comm cable (A11 / A12 external PSU) or the internal control-to-PSU harness (A13+ integrated designs). Watch `estats` for 30 minutes — intermittent zero = loose connector, permanent zero = dead AUC3 or dead PSU comm silicon. Cross-brand PSU (e.g., Bitmain APW wired to a 1246) will show permanent `PS[0..2] = 0` because the comm protocol is Canaan-proprietary — use a Canaan PSU from the A1056..A1266 family.
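
The intermittent-versus-permanent distinction in step 11 comes down to whether every pull in the 30-minute window reads all zeros. A minimal sketch, assuming each sample is the list of `PS` telemetry values from one `estats` pull; `ps_zero_pattern` is a hypothetical name.

```python
def ps_zero_pattern(samples):
    """Classify a series of PS readings collected over the watch window."""
    if not samples:
        raise ValueError("need at least one PS reading")
    all_zero = [all(v == 0 for v in reading) for reading in samples]
    if all(all_zero):
        return "permanent zero: dead AUC3, dead PSU comm, or cross-brand PSU"
    if any(all_zero):
        return "intermittent zero: loose connector"
    return "telemetry present"


print(ps_zero_pattern([[0, 0, 0], [12, 1210, 240], [0, 0, 0]]))
```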

12

Set DNS to `1.1.1.1` + `8.8.8.8` in the miner's network config. Drop Canaan's default `114.114.114.114` Chinese resolver — it hits intermittent latency from North American ISPs and produces `GHSavg` lag cadences that look like chain faults but clear instantly with a DNS swap. Swap your primary stratum pool for 30 minutes to a known-good mirror. If `GHSavg` recovers on the new DNS + pool, the original lag was pool-side, not silicon.

13

Refresh thermal pads on the hashboard whose `PVT_T` array reads highest. Arctic TP-3 or equivalent, 1.5 mm pad thickness matched to original. IPA 99% to clean old residue. An 8-12 C drop in `PVT_T` is typical on a three-year-old A11 / A12 and directly resolves thermal-adjacent `ECHU` and `MM_STATUS = Hot-Recovery` patterns. Newer A13+ hashboards use tighter pad specs — match exactly, don't over-compress, match the specific hashboard revision not just the model.

14

Walk the `PVT_V` array for the problem chain. Chips on the same board should track within ±20 mV. Drift beyond ±40 mV = domain voltage fault. Cross-reference `PVT_V` with `PVT_T` at the same chip position — hot chip + drifted voltage at the same chip position is chip degradation, not a general PSU fault. Mark those chip positions for reflow or replacement. `PVT_V` drift is the chip-level early-warning; log it weekly and you catch degradation six months before thermal failure.
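
The cross-reference in step 14 can be sketched as two list scans. Assumptions, none of which come from Canaan documentation: `PVT_V` values are in millivolts, drift is measured against the board median, and "hot" reuses the 85 C threshold from step 5. `drifted_chips` and `degradation_candidates` are hypothetical names.

```python
from statistics import median


def drifted_chips(pvt_v_mv, limit_mv=40):
    """Chip positions whose voltage sits more than limit_mv from the board median."""
    reference = median(pvt_v_mv)
    return [i for i, v in enumerate(pvt_v_mv) if abs(v - reference) > limit_mv]


def degradation_candidates(pvt_v_mv, pvt_t_c, hot_c=85):
    """Positions that are both voltage-drifted and hot: mark for reflow or replacement."""
    return [i for i in drifted_chips(pvt_v_mv) if pvt_t_c[i] > hot_c]


volts = [300, 302, 298, 350, 301]
temps = [70, 72, 71, 88, 69]
print(drifted_chips(volts), degradation_candidates(volts, temps))
```

Logging the output of `drifted_chips` weekly gives the trend line that step 14 recommends watching.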

15

Swap hashboards between slots to isolate fault source. Label slots 0/1/2 with tape. Move the suspect board to a known-good slot. Re-pull `estats`. `ECHU` fault follows the board = bad board, reflow or chip-level. Fault stays in the slot = AUC3 / ribbon / MM control path. This 30-minute physical test separates hashboard faults from controller faults faster than any software diagnostic, and saves you the shipping fee on a board that was never the problem.
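
Reading the result of the swap is a two-way comparison of the post-swap `ECHU` values. The sketch assumes the suspect board was exchanged with a known-good board, so the known-good board now sits in the suspect's original slot. `swap_verdict` is a hypothetical name.

```python
def swap_verdict(echu_after, old_slot, new_slot):
    """Interpret ECHU values pulled after moving the suspect board old_slot -> new_slot."""
    old_bad = echu_after[old_slot] != 0
    new_bad = echu_after[new_slot] != 0
    if new_bad and not old_bad:
        return "fault followed the board: bad board"
    if old_bad and not new_bad:
        return "fault stayed in the slot: AUC3 / ribbon / MM control path"
    # Both clean or both faulted: the swap did not isolate anything.
    return "inconclusive: re-pull estats or repeat the swap"


# Suspect board moved from slot 1 to slot 2; the fault now shows on slot 2.
print(swap_verdict([0, 0, 512], old_slot=1, new_slot=2))
```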

16

Inspect MM control board and AUC3 controller under a loupe for scorched traces, bulging electrolytics near hashboard LDOs, corroded USB pins (A11 / A12), or cracked MLCCs near the PMIC. Canadian-garage operation through two winters of thermal cycling fatigues solder joints and dries electrolytics; visible damage = Tier 4, not DIY. Photograph the damage for the D-Central repair ticket. Don't attempt reflow on MM or AUC3 without test-fixture capability.

17

Stop DIY when: firmware rollback + AUC3 swap + PSU test all fail, `PVT_T > 85 C` persists after full pad refresh, `ECMM` stays non-zero across two different MM builds, or visible damage on MM / AUC3. You're in test-fixture territory. Book a D-Central Avalon repair slot at https://d-central.tech/services/asic-repair/ — 5-10 business day turnaround, Canadian workshop, ships Canada / US / international. Include the pre-fix `estats` snapshot in the ticket.

18

Ship with the pre-fix `estats` snapshot included. Anti-static bags, double-box with ≥5 cm foam on every side. Include a note listing: every observed non-zero field and its value, reboot cadence if relevant, MM firmware version string, ambient temp, and your contact info. D-Central bench process uses the pre-fix snapshot to skip the first 30-60 minutes of diagnostic, which saves you repair dollars. Pack for drop: the courier will drop your box at least once — assume it.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.
