NERDQAXE_HW_SPIKE Warning

NerdQAxe – Hardware Errors Spiking After Overclock

Hardware-error counter (HW%) climbing on the NerdQAxe / NerdQAxe+ four-chip BM1368 chain. Healthy chains sit under 1%; above 2% you're losing hashrate to chips returning bad nonces. Root causes cluster around five areas: silicon-lottery margin under overclock, PMIC drift / VCORE sag, paste pump-out on a single chip, EM noise on the UART daisy chain, and firmware autotune regressions. Fixes range from a five-minute frequency rollback to a ~$30 BM1368 chip swap on the bench.

Warning — Should be addressed soon

Affected Models: NerdQAxe (4x BM1368, ~1.7 TH/s class), NerdQAxe+ (4x BM1368, ~2.5 TH/s class). The NerdQAxe++ uses BM1370 and reports HW% differently; a separate page covers it.

Symptoms

  • AxeOS-style dashboard reports `HW%` (or `HW Error %`) above 1% sustained for 30+ minutes, having previously sat under 0.5% on the same firmware and tune
  • Realized hashrate sits 5-20% below nameplate even though all four BM1368 chips show as `4/4 detected` on boot
  • One specific chip position (chip 0/1/2/3) shows a per-chip HW% that's 2-5x the others when expanded in the web UI per-chip view
  • HW% shifts immediately after a frequency or voltage change (silicon-lottery margin), or climbs gradually over weeks (paste pump-out, capacitor drift, PMIC aging)
  • Pool-side rejected shares trending up while stratum subscribe / authorize / submit handshake is otherwise stable
  • One chip position runs visibly hotter (`+5 C` or more above chain average) under thermal camera or IR thermometer
  • HW% follows ambient temperature: cool morning = clean run, mid-afternoon room warm-up = HW% climb
  • PSU rail at the XT30 / barrel jack reads under 12.0V sustained while hashing (multimeter on DC, probe under load - not at idle)
  • Effective hashrate stays low after a stratum reconnect or a pool change - rules out pool-side reject confusion
  • Serial console (`115200 8N1`) shows repeated `HW error` / `nonce mismatch` / `chip N invalid nonce` lines tagged to the same chip number
  • No `Overheat Mode 75 C` shutdown banner, no `Chain init: N/4` chain-break message, no `PSU Error` - this is a chip-quality fault, not thermal or chain-enumeration
  • Onset correlates with: recent overclock push, firmware update changing autotune defaults, hot-running stretch, overdue paste service, or 12+ months of continuous operation
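
The HW% bands and the "one chip is 2-5x the others" symptom can be checked mechanically once the per-chip values are copied out of the web UI. The following is a minimal sketch, not firmware code: `classify_chain`, `suspect_chips`, the band names, and the default `ratio` of 2.0 are illustrative assumptions built on the thresholds quoted above.

```python
from statistics import median


def classify_chain(hw_pct: float) -> str:
    """Band a chain-level HW% reading using the thresholds quoted on this page."""
    if hw_pct < 1.0:
        return "healthy"
    if hw_pct <= 2.0:
        return "elevated"
    return "losing-hashrate"


def suspect_chips(per_chip_hw: list[float], ratio: float = 2.0) -> list[int]:
    """Return chip positions whose HW% is at least `ratio` times the median of the other chips."""
    suspects = []
    for position, value in enumerate(per_chip_hw):
        others = per_chip_hw[:position] + per_chip_hw[position + 1:]
        baseline = median(others)
        # A zero baseline makes the ratio meaningless, so such chips are skipped here.
        if baseline > 0 and value >= ratio * baseline:
            suspects.append(position)
    return suspects


# Hypothetical readings for chips 0..3, typed in by hand from the per-chip view.
print(classify_chain(1.4))
print(suspect_chips([0.3, 0.4, 1.6, 0.35]))
```

With these sample readings the chain lands in the "elevated" band and only chip 2 is flagged, which is the pattern that points at a single-chip cause rather than a chain-wide one.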

Step-by-Step Fix

1

Cold power-cycle. Pull the 12V at the PSU and the XT30 at the miner; capacitors discharge faster with both ends open. Wait 30 seconds, re-power, and watch HW% for 15 minutes. Sometimes a transient bus glitch or a stratum re-sync after a connection blip presents as a HW% spike that clears on a clean cold cycle. It's free, fast, and risk-free, and it rules out firmware-state weirdness before you start spending diagnostic time on hardware.

2

Roll back the overclock to stock. In the web UI, set frequency and core voltage to the firmware's shipped defaults (490 MHz / 1.10V is the sane starting point on BM1368; some units shipped at 525 MHz). Save, reboot. Observe HW% for 15 minutes. If HW% falls under 1%, your tune was past the worst chip's silicon-lottery limit. Step 9 rebuilds the overclock slowly with proper headroom checks.

3

Verify ambient at the miner intake. IR thermometer at the front grille - not room-middle, not the hallway. Target ambient is below 28 C for clean operation. NerdQAxe-class miners on a BM1368 chain start losing silicon margin past 30 C ambient because every chip's junction climbs and the worst chip's margin closes first. A USB desk fan blowing across the case is a meaningful cheap fix if your room runs warm.

4

Update or roll back firmware to a known-good stable build from the bitmaker-mining/NerdQAxePlus repository. Flash via OTA from the web UI, or use the NerdQAxe web flasher if OTA fails. If you're already on the latest and HW% is misbehaving, roll back one stable release - bleeding-edge builds occasionally regress autotune logic. Document the build version that gives clean HW% and pin it in your maintenance log.

5

Cross-pool test. Switch your stratum URL to a different pool (a well-known one like solo.ckpool.org or public-pool.io) for 30 minutes of hashing. If HW% changes meaningfully, pool-side reject behaviour was getting counted into your local HW% display - not actually a hardware fault. If HW% is identical on two pools, you've confirmed it's a hardware-side problem and you can stop chasing pool ghosts.
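
The cross-pool comparison reduces to a single difference check. This is a small sketch under stated assumptions: the function name `cross_pool_verdict` and the 0.2 percentage-point tolerance for "changes meaningfully" are hypothetical, so pick a tolerance that matches the normal jitter you see on your own unit.

```python
def cross_pool_verdict(hw_pool_a: float, hw_pool_b: float, tolerance_pct: float = 0.2) -> str:
    """Compare 30-minute HW% readings taken on two different pools."""
    if abs(hw_pool_a - hw_pool_b) > tolerance_pct:
        # HW% moved with the pool, so rejects are leaking into the local display.
        return "pool-influenced"
    # HW% is effectively identical on both pools, so the fault is on the miner.
    return "hardware-side"


print(cross_pool_verdict(1.8, 1.7))
print(cross_pool_verdict(1.8, 0.4))
```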

6

Measure 12V input under load. Multimeter on DC. Probe the XT30 or barrel jack input while the miner is hashing at full power. Expect 12.0V minimum, sustained. If you see 11.6V or lower under load, your PSU is tired or your circuit is undersized. Swap to a known-good 12V / 10A+ brick and retest. A sagging input rail directly drives PMIC under-regulation, which directly drives chain-wide HW% climb.

7

Measure VCORE on the BM1368 rail under load. Multimeter on DC, probe at a chip-bypass-cap pad or accessible test point on the chip-side rail. Expect 1.10V plus or minus 0.05V at stock frequency. If VCORE is sagging below 1.05V while input 12V is clean, the on-board PMIC has drifted or a local cap has failed - the most common gradual-degradation pattern on 12+ month units in D-Central's repair queue.
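
Steps 6 and 7 together form a small decision table: fix the input rail first, and only blame the PMIC when VCORE sags while the input is clean. The sketch below encodes that ordering using the thresholds from the text (12.0V minimum, 11.6V sag, 1.10V plus or minus 0.05V). The function name and the returned labels are illustrative assumptions, and readings are passed in millivolts to avoid floating-point edge cases.

```python
def diagnose_rails(v_in_mv: int, vcore_mv: int,
                   vcore_nominal_mv: int = 1100, vcore_tol_mv: int = 50) -> str:
    """Classify multimeter readings taken under load (all values in millivolts)."""
    if v_in_mv <= 11600:
        # Input rail is sagging: swap the PSU and retest before anything else.
        return "input-sag"
    if v_in_mv < 12000:
        # Between 11.6V and 12.0V: not a clear failure, but below the expected minimum.
        return "input-marginal"
    if vcore_mv < vcore_nominal_mv - vcore_tol_mv:
        # Clean 12V input but low VCORE: suspect PMIC drift or a failed local cap.
        return "vcore-sag"
    return "rails-ok"


print(diagnose_rails(12100, 1020))
print(diagnose_rails(11500, 1020))
```

Checking the input first matters because a sagging 12V rail will drag VCORE down with it, and that would otherwise be misread as a PMIC fault.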

8

Re-seat the heatsink with fresh thermal paste. Power off, cool, disassemble. Clean every BM1368 top and the heatsink contact face with 99% IPA. Apply Arctic MX-6 or Thermal Grizzly Kryonaut: a rice-grain dot per chip, spread by even mounting torque, not by finger. Reassemble to the spec torque pattern. Cold-cycle and retest. Roughly 30% of HW%-spike units that hit D-Central's bench resolve here because dry paste was raising the worst chip's junction temp into its margin-failure window.

9

Rebuild the overclock from stock, slowly. Stock baseline, +10 MHz per step, 15 minutes of stability observation between steps, stop at the step BEFORE any chip's HW% crosses 1%. That's this specific four-chip chain's silicon-lottery ceiling. It varies per unit. Document it in your maintenance log. Don't push past it just because someone in the Discord runs 560 MHz on theirs - silicon lottery is per-die, not per-model.
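
The step-up procedure is a linear search that keeps the last clean frequency. Here is a minimal sketch, assuming a hypothetical `worst_chip_hw_pct(freq_mhz)` callable that stands in for "apply the frequency, hash for 15 minutes, read the worst per-chip HW%". In practice that step is you and the web UI, not a function call, and `max_mhz` is a cap you choose yourself.

```python
from typing import Callable


def find_frequency_ceiling(stock_mhz: int, max_mhz: int,
                           worst_chip_hw_pct: Callable[[int], float],
                           step_mhz: int = 10, limit_pct: float = 1.0) -> int:
    """Return the highest tested frequency at which no chip reached `limit_pct`.

    Assumes the stock frequency was already confirmed clean in step 2.
    """
    last_clean = stock_mhz
    freq = stock_mhz + step_mhz
    while freq <= max_mhz:
        if worst_chip_hw_pct(freq) >= limit_pct:
            break  # this step crossed the limit, so keep the previous one
        last_clean = freq
        freq += step_mhz
    return last_clean


# Simulated unit whose worst chip stays clean up to 530 MHz (made-up numbers).
def fake_observation(freq_mhz: int) -> float:
    return 0.2 if freq_mhz <= 530 else 1.4


print(find_frequency_ceiling(490, 600, fake_observation))
```

For the simulated unit the search stops at 540 MHz and reports 530 MHz, the step before the limit was crossed.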

10

Voltage-frequency curve test. From stock, drop core voltage 10 mV per step while keeping frequency stock; watch HW%. Find the lowest VCORE that still gives clean HW%. Then climb frequency at THAT VCORE. The lowest-VCORE-clean-HW% pair is your unit's efficiency sweet spot, typically noticeably cooler, quieter, and longer-lived than the maximum-frequency tune. This is the home-miner play, and the one we recommend to operators running for income.
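
The voltage-frequency test is a two-phase search: lower VCORE at stock frequency until HW% stops being clean, then raise frequency at the lowest clean VCORE. The sketch below assumes a hypothetical `measure(freq_mhz, vcore_mv)` callable that returns the observed HW% after a stability window, plus a `min_mv` safety floor and a `max_mhz` cap that you choose yourself. None of these names come from the firmware.

```python
from typing import Callable, Tuple


def sweet_spot(stock_mhz: int, stock_mv: int,
               measure: Callable[[int, int], float],
               min_mv: int, max_mhz: int,
               mv_step: int = 10, mhz_step: int = 10,
               limit_pct: float = 1.0) -> Tuple[int, int]:
    """Return (frequency_mhz, vcore_mv) for the lowest clean VCORE and its frequency ceiling."""
    # Phase 1: step VCORE down at stock frequency while HW% stays clean.
    vcore = stock_mv
    while vcore - mv_step >= min_mv and measure(stock_mhz, vcore - mv_step) < limit_pct:
        vcore -= mv_step
    # Phase 2: step frequency up at that VCORE while HW% stays clean.
    freq = stock_mhz
    while freq + mhz_step <= max_mhz and measure(freq + mhz_step, vcore) < limit_pct:
        freq += mhz_step
    return freq, vcore


# Simulated unit: the VCORE needed for clean HW% rises with frequency (made-up model).
def fake_measure(freq_mhz: int, vcore_mv: int) -> float:
    required_mv = 1035 + 0.5 * (freq_mhz - 490)
    return 0.3 if vcore_mv >= required_mv else 1.8


print(sweet_spot(490, 1100, fake_measure, min_mv=1000, max_mhz=600))
```

On the simulated unit this settles at 500 MHz and 1040 mV. The frequency headroom at the lowest clean VCORE is small by design, which is why this tune trades peak hashrate for efficiency.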

11

Thermal-walk the chain under load. Thermal camera or IR thermometer, log each BM1368 case temperature after 5 minutes of steady hashing. Healthy chain: temperatures within plus or minus 3 C. Suspect chip: +5 to +12 C above chain average and dominating per-chip HW%. Cross-check the hot chip's number against the per-chip HW% breakdown in the web UI. If they match, you've identified your cracked-joint / paste-pump-out / silicon-margin chip.
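
The thermal walk cross-check can be written down as two comparisons: which chips sit at least 5 C above the chain average, and whether any of them is also the worst chip by HW%. This is a sketch with hand-entered readings; `thermal_walk` and its return shape are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple


def thermal_walk(temps_c: List[float], per_chip_hw: List[float],
                 hot_delta_c: float = 5.0) -> Tuple[List[int], List[int]]:
    """Return (hot chip positions, hot positions that are also the worst chip by HW%)."""
    average = mean(temps_c)
    hot = [i for i, temp in enumerate(temps_c) if temp - average >= hot_delta_c]
    worst_hw = max(range(len(per_chip_hw)), key=lambda i: per_chip_hw[i])
    confirmed = [i for i in hot if i == worst_hw]
    return hot, confirmed


# Hypothetical case temperatures (C) and per-chip HW% for chips 0..3.
print(thermal_walk([61.0, 62.0, 70.0, 61.0], [0.3, 0.4, 1.9, 0.3]))
```

In this sample chip 2 is both the hot chip and the worst chip by HW%, so it is the one to inspect. Note that the hot chip raises the chain average it is compared against, so a borderline reading is worth re-measuring.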

12

Inspect decoupling caps near the suspect chip. Magnification, good light. Look for cracked MLCCs, lifted pads, or visible discoloration on the small SMD caps surrounding the suspect BM1368. Cracked MLCCs are a common silent failure on 12+ month boards under continuous thermal cycling. Replace with same-package Murata or TDK parts using a hot-air station - not a soldering iron - to avoid lifting adjacent pads.

13

Reflow the suspect chip. Disassemble, flux the suspect BM1368 BGA from the side (no-clean flux wicks under via capillary action), preheat the bottom of the board to ~150 C to reduce thermal shock, top-side hot air at 310-330 C, slow circular motion, ~30 seconds total dwell. Watch for the package to settle a hair as the solder balls reflow. Cool naturally on the preheat for ~3 minutes, then off. Re-paste, reassemble, cold-cycle. Roughly 60% of cracked-joint HW% spikes recover at this step.

14

Replace dried or aged thermal pads if present. Some NerdQAxe assemblies use silicone thermal pads on the VREG / buck-converter section in addition to paste on the chips. Aged pads transfer heat poorly, push the buck-converter junction temp up, and accelerate PMIC drift. Replace with 1.5 mm or 2.0 mm silicone thermal pad of equivalent or higher conductivity. Verify thickness from the existing pads before ordering.

15

Roll firmware to last-known-good for your specific hardware revision. Verify your board revision against the bitmaker-mining/NerdQAxePlus hardware table before flashing. The NerdQAxe and NerdQAxe+ use BM1368 chains whose chain lengths and pinouts differ subtly from the BM1370-based NerdQAxe++; flashing the wrong target bricks the controller. Pin a known-good firmware version, document it, and update only when the changelog has a meaningful entry.

16

Stop DIY when any of the following is true: you've reflowed the suspect chip once and HW% returned within 30 days; you see capacitor bulging, lifted pads, or scorching that requires component-level rework; per-chip HW% isolates the same chip POSITION on two units in your fleet; or you suspect the PMIC (VCORE drifts under load even with input 12V clean and capacitors visually intact). Book the D-Central NerdQAxe repair bench, where the test fixture, programmable load, and salvaged-grade BM1368 inventory live.

17

D-Central bench process: programmable-load test fixture for the BM1368 chain, per-chip isolation under controlled VCORE and frequency sweep, hot-air rework station for chip replacement using salvaged-grade or new BM1368 chips when reflow doesn't hold, PMIC replacement when buck drift is the root cause, full reflow and re-paste, post-repair 12-hour burn-in at nameplate frequency to confirm HW% stays clean before shipping back.

18

Ship safely. Anti-static bag the unit, pad with at least 5 cm foam on every side in a double-walled box. Include a note with: observed HW% baseline and current value, firmware version, any per-chip HW% screenshots, the OC profile in use, ambient conditions, and your contact info. Diagnostic time is repair time - every minute the bench tech doesn't have to repeat your diagnostic is money saved on your invoice.

When to Seek Professional Repair

If the steps above do not resolve the issue, or if you are not comfortable performing these repairs yourself, professional service is recommended. Attempting advanced repairs without proper equipment can cause further damage.

Still Having Issues?

Our team of Bitcoin Mining Hackers has been repairing ASIC miners since 2016. We have seen it all and fixed it all. Get a professional diagnosis.