Apple Mac Studio (M3 Ultra)
Apple · desktop appliance · Released March 2025
Apple Silicon's inference appliance: up to 512 GB unified memory at 819 GB/s, runs 70B+ models on a coffee-cup-sized box.
Hardware spec sheet
| Vendor | Apple |
|---|---|
| Category | Appliance |
| VRAM / memory | 96 / 256 / 512 GB unified |
| Memory bandwidth | 819 GB/s |
| FP16 TFLOPS | — |
| INT8 TOPS | — |
| TDP | 295 W |
| Architecture | Apple Silicon M3 Ultra |
| Form factor | desktop appliance |
| Release date | March 2025 |
| Street price (USD) | $3,999 (96 GB) to $7,999+ (high-memory configs) MSRP |
| 120V note | Runs fine on any normal outlet — 295W total system draw is pleb-friendly. |
Apple launched the M3 Ultra Mac Studio in March 2025 — the inference appliance that quietly changed the game for plebs who want to run frontier-class models without building a rig. The M3 Ultra is two M3 Max dies fused via the UltraFusion interconnect, giving up to 512 GB of unified LPDDR5 memory at 819 GB/s. Apple Silicon descends from the A-series iPhone/iPad SoCs (the M1 in 2020 was essentially a scaled-up A14), and the unified-memory architecture traces back to Apple’s 2020 pivot away from Intel.
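A quick way to see why that bandwidth number is the headline spec: during single-stream decoding, every weight gets read once per token, so bandwidth divided by model size is a hard ceiling on tokens per second. A minimal sketch (the model footprints are rough illustrations, not benchmarks, and real throughput lands below the ceiling):

```python
# Back-of-envelope decode ceiling: single-stream generation reads every
# weight once per token, so tok/s <= bandwidth / model size.
# Illustrative footprints only; KV-cache traffic, attention compute,
# and runtime overhead all pull real numbers lower.

BANDWIDTH_GB_S = 819  # M3 Ultra unified-memory bandwidth

models_gb = {                 # approximate weight footprints in GB
    "Llama 3 70B @ Q4": 40,
    "Llama 3 70B @ Q8": 70,
    "Llama 3 70B @ FP16": 140,
}

for name, gb in models_gb.items():
    print(f"{name:20s} ceiling ~{BANDWIDTH_GB_S / gb:5.1f} tok/s")
```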
Who it’s for: professionals, developers, and plebs who value silence, simplicity, and the ability to run 70B+ models without a homelab. Not the fastest tok/s per dollar, but the most capable single device.
Models it runs comfortably: the base 96 GB config covers Llama 3 70B and Qwen 2.5 72B at Q6, plus the smaller DeepSeek-R1 distillations. A 256 GB config runs 70B-class models at full FP16 (~140 GB of weights) or Mixtral 8x22B at Q6, and the 512 GB config can hold DeepSeek-R1 671B at Q4. MLX (Apple’s machine-learning framework) keeps improving — credit to Apple’s MLX team and the llama.cpp Metal-backend contributors for making this usable.
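Kicking the tires in MLX directly is about three lines via the mlx-lm package. A hedged sketch (the repo name below is one example from the mlx-community hub; the API is as in recent mlx-lm releases, so check its docs if the signature has moved):

```python
# pip install mlx-lm   (Apple Silicon only)
from mlx_lm import load, generate

# Example repo from the mlx-community hub; pick a quant that fits
# your unified-memory config (a 70B 4-bit build wants ~40 GB).
model, tokenizer = load("mlx-community/Meta-Llama-3-70B-Instruct-4bit")

print(generate(model, tokenizer,
               prompt="Explain unified memory in one paragraph.",
               max_tokens=200))
```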
Hashcenter notes: 295 W system-wide TDP, effectively silent under typical inference load, desktop-appliance form factor (about the size of a stack of coasters). Runs fine on any normal outlet — Apple made this Hashcenter-friendly by default. Prices run $3,999 (96 GB) to $7,999+ for the 256 GB and 512 GB memory configs. Standing on the shoulders of the ARM ecosystem, TSMC, and Apple’s decade-plus Silicon effort.
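For the spreadsheet-inclined, here is the outlet math behind that claim, using the spec-sheet draw and an illustrative $0.15/kWh rate (both are assumptions, not measurements):

```python
# Circuit-load and energy math for one Mac Studio node.
# 295 W is the spec-sheet system draw; $0.15/kWh is illustrative.
WATTS = 295
VOLTS = 120                      # standard North American outlet
RATE_USD_PER_KWH = 0.15

amps = WATTS / VOLTS             # ~2.5 A, trivial on a 15 A circuit
kwh_per_day = WATTS * 24 / 1000  # ~7.1 kWh if pegged around the clock
monthly_cost = kwh_per_day * 30 * RATE_USD_PER_KWH

print(f"{amps:.1f} A, {kwh_per_day:.1f} kWh/day, "
      f"~${monthly_cost:.0f}/mo at full tilt")
```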
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (minimal API call sketched after this list).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
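Once step 1 is done, Ollama exposes a local HTTP API on port 11434. A minimal non-streaming call, assuming you have already pulled a model (llama3 below is just a placeholder for whatever fits your memory config):

```python
# Minimal request against Ollama's local REST API (port 11434).
# Assumes `ollama pull llama3` has already been run; swap in any
# model tag that fits your unified-memory config.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",
    "prompt": "Why does memory bandwidth gate local inference speed?",
    "stream": False,  # return one JSON object instead of a line stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```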
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
