AMD Strix Halo (Ryzen AI Max+ 395)
AMD · laptop/mini-PC · Released January 2025
AMD's mobile/mini-PC APU with up to 128 GB unified LPDDR5X — the AMD answer to Apple's unified-memory approach.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | AMD |
| Category | APU |
| VRAM / memory | 128 GB unified LPDDR5X (max config) |
| Memory bandwidth | 256 GB/s |
| FP16 TFLOPS | — |
| INT8 TOPS | — |
| TDP | 45–120 W (configurable) |
| Architecture | Zen 5 + RDNA 3.5 |
| Form factor | laptop/mini-PC |
| Release date | January 2025 |
| Street price (USD) | 2000+ (system) |
| 120V note | Fits in sub-200W laptop/mini envelope — runs on any outlet or USB-C PD adapter. |
AMD Strix Halo (retail name Ryzen AI Max+ 395) launched in early 2025 as AMD’s answer to Apple’s unified-memory inference story. It pairs Zen 5 CPU cores with a 40-CU RDNA 3.5 iGPU and an XDNA 2 NPU, all sharing up to 128 GB of soldered LPDDR5X-8000 at 256 GB/s. AMD’s path here stands on decades of x86 work, the Radeon RDNA lineage (RDNA 1 debuted on the 5700 XT in 2019), and the XDNA NPU lineage that came in with the Xilinx acquisition.
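For the curious, the 256 GB/s figure falls straight out of the bus math: LPDDR5X-8000 means 8000 MT/s, and Strix Halo runs a 256-bit memory interface. A quick sketch:

```python
# Sanity check: where the 256 GB/s figure comes from.
# LPDDR5X-8000 = 8000 MT/s; Strix Halo has a 256-bit memory bus.
transfers_per_second = 8000 * 10**6   # 8000 mega-transfers per second
bus_width_bytes = 256 // 8            # 256-bit bus = 32 bytes per transfer

bandwidth_gb_s = transfers_per_second * bus_width_bytes / 10**9
print(f"{bandwidth_gb_s:.0f} GB/s")   # -> 256 GB/s
```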
Who it’s for: plebs who want one box that does development, inference, and general computing without the fan noise of a GPU rig. It ships in mini-PC form (Framework Desktop) and in high-end laptops and tablets (HP ZBook variants, ASUS ROG Flow, Razer Blade).
Models it runs comfortably: with 128 GB unified, Llama 3 70B at Q8, Mixtral 8x22B at Q4, Qwen 2.5 72B at Q5_K_M. ROCm and the llama.cpp Vulkan backend are the practical runners on Linux; DirectML and ONNX Runtime on Windows. Expect roughly 5–15 tok/s in this class, with dense 70B models at the low end and MoE models like Mixtral (which read fewer active weights per token) at the high end. Slower than a 4090, but the memory ceiling is ~5× higher.
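You can gut-check those numbers yourself: decode speed on big models is mostly memory-bandwidth-bound, since every generated token streams the active weights through the bus once. A back-of-envelope sketch (model sizes are approximate GGUF file sizes, and real throughput lands below these ceilings once compute and framework overhead are counted):

```python
# Back-of-envelope decode ceiling: tokens/s <= bandwidth / bytes read per token.
# Sizes are approximate GGUF weights; MoE models only touch the active experts.
BANDWIDTH_GB_S = 256  # Strix Halo: LPDDR5X-8000 on a 256-bit bus

models = {
    # name: GB of weights streamed per generated token (approximate)
    "Llama 3 70B Q8_0": 75,
    "Llama 3 70B Q4_K_M": 42,
    "Qwen 2.5 72B Q5_K_M": 51,
    "Mixtral 8x22B Q4 (~39B active)": 22,  # MoE: only active experts are read
}

for name, gb_per_token in models.items():
    ceiling = BANDWIDTH_GB_S / gb_per_token
    print(f"{name:32s} <= {ceiling:4.1f} tok/s ceiling")
```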
Hashcenter notes: fits in sub-200W laptop/mini envelope (configurable TDP 45–120 W). Completely silent or near-silent in most chassis. Runs on USB-C PD or standard barrel adapters. Credit to AMD for bringing unified-memory inference to the x86/Linux ecosystem, and to the ROCm and llama.cpp communities for making the software stack usable.
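On Linux, the usual tool for dialing in that configurable TDP on AMD APUs is RyzenAdj (limits are set in milliwatts). Whether a given build recognizes Strix Halo, and whether your chassis firmware honors the limits, varies by machine, so treat this as a hedged sketch:

```python
# Hedged sketch: cap the APU's power limits with RyzenAdj (Linux, needs root).
# Assumes a ryzenadj build that recognizes Strix Halo; some chassis firmware
# overrides or ignores these limits. All values are in milliwatts.
import subprocess

def set_tdp(watts: int) -> None:
    mw = str(watts * 1000)
    subprocess.run(
        ["sudo", "ryzenadj",
         f"--stapm-limit={mw}",   # sustained power limit
         f"--fast-limit={mw}",    # short-term boost limit
         f"--slow-limit={mw}"],   # average boost limit
        check=True,
    )

set_tdp(65)  # e.g. a quiet 65 W profile for overnight inference
```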
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud. (A minimal API smoke test follows this list.)
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
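Once Ollama is installed (step 1), a quick way to confirm the whole stack works is to hit its local REST API. A minimal smoke test, assuming the default port 11434 and a model you have already pulled (the model tag below is just an example; pick whatever fits your memory):

```python
# Minimal smoke test for a local Ollama install (default port 11434).
# Assumes a model has been pulled first, e.g.:
#   ollama pull llama3:70b-instruct-q4_K_M   (tag is illustrative)
import json
import urllib.request

payload = {
    "model": "llama3:70b-instruct-q4_K_M",  # example tag
    "prompt": "In one sentence, what is unified memory?",
    "stream": False,  # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])

# eval_count / eval_duration (nanoseconds) give you measured decode tok/s:
if "eval_count" in body and "eval_duration" in body:
    print(f"{body['eval_count'] / (body['eval_duration'] / 1e9):.1f} tok/s")
```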
