

Apple Mac Studio (M3 Ultra)

Apple · desktop appliance · Released March 2025

Apple Silicon's inference appliance: up to 512 GB unified memory at 819 GB/s, runs 70B+ models on a coffee-cup-sized box.

Hardware spec sheet

Vendor: Apple
Category: Appliance
VRAM / memory: 96 / 256 / 512 GB unified
Memory bandwidth: 819 GB/s
FP16 TFLOPS: not published
INT8 TOPS: not published
TDP: 295 W
Architecture: Apple Silicon M3 Ultra
Form factor: desktop appliance
Release date: March 2025
MSRP (USD): $3,999 (96 GB) to $9,499+ (512 GB)
120V note: Runs fine on any normal outlet — 295 W total system draw is pleb-friendly.
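
The blank TFLOPS/TOPS rows matter less than they look: single-stream LLM decoding is usually memory-bandwidth-bound, so the 819 GB/s row sets the ceiling. A back-of-envelope sketch in Python (our rule of thumb, not an Apple benchmark):

    # Rough decode-speed ceiling for bandwidth-bound inference: every generated
    # token streams the full set of active weights through memory once, so
    # tok/s <= bandwidth / model size. Ignores KV cache, compute, and overhead.
    def decode_ceiling_tok_s(params_b: float, bits_per_weight: float,
                             bandwidth_gb_s: float = 819.0) -> float:
        model_gb = params_b * bits_per_weight / 8  # weights only
        return bandwidth_gb_s / model_gb

    # Llama 3 70B at Q4 (~4.5 bits/weight effective) on the M3 Ultra:
    print(f"{decode_ceiling_tok_s(70, 4.5):.0f} tok/s ceiling")  # ~21 tok/s

Real-world throughput lands below that once compute and KV-cache reads bite, but the ratio is why this box holds its own against consumer GPUs on big models: the weights have to fit in fast memory first, and here they do.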

Apple launched the M3 Ultra Mac Studio in March 2025 — the inference appliance that quietly changed the game for plebs who want to run frontier-class models without building a rig. The M3 Ultra is two M3 Max dies fused via the UltraFusion interconnect, giving up to 512 GB of unified LPDDR5 memory at 819 GB/s. Apple Silicon descends from the A-series iPhone/iPad SoCs (the M1 in 2020 was essentially a scaled-up A14), and the unified-memory architecture traces back to Apple's 2020 pivot away from Intel.

Who it’s for: professionals, developers, and plebs who value silence, simplicity, and the ability to run 70B+ models without a homelab. Not the fastest tok/s per dollar, but the most capable single device.

Models it runs comfortably: with the base 96 GB of unified memory, Llama 3 70B at Q6, Qwen 2.5 72B at Q6, and the DeepSeek-R1 distillations. With 256 GB, Llama 3 70B at FP16 and Mixtral 8x22B at Q8; the 512 GB config can even hold DeepSeek-R1 671B at Q4. MLX (Apple's inference framework) keeps improving — credit to Apple's MLX team and the llama.cpp Metal-backend contributors for making this usable.
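
The fastest on-ramp on macOS is MLX. A minimal sketch, assuming the mlx-lm package and a community conversion from the mlx-community Hugging Face org (the exact repo name below is an assumption; pick whatever quant fits your memory tier):

    # pip install mlx-lm
    # Loads a quantized community MLX conversion (assumed repo name) and
    # generates text entirely on-device.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Meta-Llama-3-70B-Instruct-4bit")
    print(generate(model, tokenizer,
                   prompt="Why does unified memory matter for local inference?",
                   max_tokens=128))

llama.cpp's Metal backend runs the same model families from GGUF files if you'd rather stay out of Python.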

Hashcenter notes: 295 W system-wide TDP, near-silent under typical inference load, desktop-appliance form factor (about the size of a stack of coasters). Runs fine on any normal outlet — Apple made this Hashcenter-friendly by default. Prices range from $3,999 (96 GB) to $9,499+ (512 GB). Standing on the shoulders of the ARM ecosystem, TSMC, and Apple's decade-plus Silicon effort.
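
Two quick numbers behind the 120V claim, in Python (the 80% continuous-load factor is the standard North American circuit rule of thumb, not an Apple spec):

    # Worst-case draw vs. a standard 15 A / 120 V circuit, plus heat output
    # if you run inference around the clock.
    max_draw_w = 295                     # spec-sheet system TDP
    circuit_w = 120 * 15 * 0.8           # 80% continuous-load rule: 1440 W
    print(f"Headroom: {circuit_w - max_draw_w:.0f} W")                 # ~1145 W
    print(f"Heat at full tilt: {max_draw_w * 24 / 1000:.1f} kWh/day")  # ~7.1 kWh/day

Every watt it draws ends up as room heat, which is exactly the premise of the heating article linked below.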

Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.

Get it running

  1. Install Ollama →

    Ten-minute local LLM runtime. One binary, zero cloud. (A minimal API call is sketched after this list.)

  2. Give it a UI →

    Open-WebUI turns Ollama into a self-hosted ChatGPT.

  3. Which runner? →

    LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
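
Once step 1 is done, Ollama listens on localhost:11434, and everything else (including Open-WebUI in step 2) is plain HTTP. A minimal non-streaming call, assuming you've already pulled a model (the tag below is an example):

    # Minimal call to a locally running Ollama via its HTTP API.
    # Assumes a prior `ollama pull llama3:70b` (example tag).
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama3:70b",
            "prompt": "Summarize why unified memory matters for local LLMs.",
            "stream": False,  # one JSON object back instead of chunks
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])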
