Apple Mac Studio (M3 Ultra)
Apple · desktop appliance · Released March 2025
Apple Silicon's inference appliance: up to 512 GB unified memory at 819 GB/s, runs 70B+ models on a coffee-cup-sized box.
Hardware spec sheet
| Vendor | Apple |
|---|---|
| Category | Appliance |
| VRAM / memory | 96 / 256 / 512 GB unified |
| Memory bandwidth | 819 GB/s |
| FP16 TFLOPS | — |
| INT8 TOPS | — |
| TDP | 295 W |
| Architecture | Apple Silicon M3 Ultra |
| Form factor | desktop appliance |
| Release date | March 2025 |
| Street price (USD) | $3,999 (96 GB) to $7,999+ (high-memory configs) MSRP |
| 120V note | Runs fine on any normal outlet — 295W total system draw is pleb-friendly. |
Apple launched the M3 Ultra Mac Studio in March 2025 — the inference appliance that quietly changed the game for plebs who want to run frontier-class models without building a rig. The M3 Ultra is two M3 Max dies fused via the UltraFusion interconnect, giving up to 512 GB of unified LPDDR5 memory at 819 GB/s. Apple Silicon descends from the A-series iPhone/iPad SoCs (the M1 in 2020 was essentially a scaled-up A14), and the unified-memory architecture traces back to Apple’s 2020 pivot away from Intel.
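A quick way to see why that bandwidth number is the headline spec: during single-stream decoding, every weight gets read once per token, so bandwidth divided by model size is a hard ceiling on tokens per second. A minimal sketch (the model footprints are rough illustrations, not benchmarks, and real throughput lands below the ceiling):

```python
# Back-of-envelope decode ceiling: single-stream generation reads every
# weight once per token, so tok/s <= bandwidth / model size.
# Illustrative footprints only; KV-cache traffic, attention compute,
# and runtime overhead all pull real numbers lower.

BANDWIDTH_GB_S = 819  # M3 Ultra unified-memory bandwidth

models_gb = {                 # approximate weight footprints in GB
    "Llama 3 70B @ Q4": 40,
    "Llama 3 70B @ Q8": 70,
    "Llama 3 70B @ FP16": 140,
}

for name, gb in models_gb.items():
    print(f"{name:20s} ceiling ~{BANDWIDTH_GB_S / gb:5.1f} tok/s")
```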
Who it’s for: professionals, developers, and plebs who value silence, simplicity, and the ability to run 70B+ models without a homelab. Not the fastest tok/s per dollar, but the most capable single device.
Models it runs comfortably: the base 96 GB config covers Llama 3 70B and Qwen 2.5 72B at Q6, plus the smaller DeepSeek-R1 distillations. A 256 GB config runs 70B-class models at full FP16 (~140 GB of weights) or Mixtral 8x22B at Q6, and the 512 GB config can hold DeepSeek-R1 671B at Q4. MLX (Apple’s machine-learning framework) keeps improving — credit to Apple’s MLX team and the llama.cpp Metal-backend contributors for making this usable.
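Kicking the tires in MLX directly is about three lines via the mlx-lm package. A hedged sketch (the repo name below is one example from the mlx-community hub; the API is as in recent mlx-lm releases, so check its docs if the signature has moved):

```python
# pip install mlx-lm   (Apple Silicon only)
from mlx_lm import load, generate

# Example repo from the mlx-community hub; pick a quant that fits
# your unified-memory config (a 70B 4-bit build wants ~40 GB).
model, tokenizer = load("mlx-community/Meta-Llama-3-70B-Instruct-4bit")

print(generate(model, tokenizer,
               prompt="Explain unified memory in one paragraph.",
               max_tokens=200))
```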
Hashcenter notes: 295 W system-wide TDP, effectively silent under typical inference load, desktop-appliance form factor (about the size of a stack of coasters). Runs fine on any normal outlet — Apple made this Hashcenter-friendly by default. Prices run $3,999 (96 GB) to $7,999+ for the 256 GB and 512 GB memory configs. Standing on the shoulders of the ARM ecosystem, TSMC, and Apple’s decade-plus Silicon effort.
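For the spreadsheet-inclined, here is the outlet math behind that claim, using the spec-sheet draw and an illustrative $0.15/kWh rate (both are assumptions, not measurements):

```python
# Circuit-load and energy math for one Mac Studio node.
# 295 W is the spec-sheet system draw; $0.15/kWh is illustrative.
WATTS = 295
VOLTS = 120                      # standard North American outlet
RATE_USD_PER_KWH = 0.15

amps = WATTS / VOLTS             # ~2.5 A, trivial on a 15 A circuit
kwh_per_day = WATTS * 24 / 1000  # ~7.1 kWh if pegged around the clock
monthly_cost = kwh_per_day * 30 * RATE_USD_PER_KWH

print(f"{amps:.1f} A, {kwh_per_day:.1f} kWh/day, "
      f"~${monthly_cost:.0f}/mo at full tilt")
```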
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (minimal API call sketched after this list).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
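Once step 1 is done, Ollama exposes a local HTTP API on port 11434. A minimal non-streaming call, assuming you have already pulled a model (llama3 below is just a placeholder for whatever fits your memory config):

```python
# Minimal request against Ollama's local REST API (port 11434).
# Assumes `ollama pull llama3` has already been run; swap in any
# model tag that fits your unified-memory config.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",
    "prompt": "Why does memory bandwidth gate local inference speed?",
    "stream": False,  # return one JSON object instead of a line stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```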
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
