Skip to content

Bitcoin accepted at checkout  |  Ships from Laval, QC, Canada  |  Expert support since 2016

AI Self-Hosting

Run Coding Agents Fully Offline on Local Models (Air-Gapped, Sovereign)

· · ⏱ 10 min read

Here is a sentence that costs people real money to misunderstand: your coding agent and the model it talks to are two different things. The agent is the harness — the CLI that reads your repo, runs commands, edits files, and loops until the job is done. The model is the brain it phones for every step. Most of the time that brain lives in someone else’s data center, and every line of your private codebase rides the wire to get there.

This post is about cutting that wire. We will be precise about what “offline” actually means, what runs air-gapped today, what hardware you need, and — because accuracy is sacred around here — exactly where the honest tradeoff bites. If you have already self-hosted a model with our Ollama walkthrough, this is the next rung: pointing a real coding agent at that local model and pulling the network cable.

This is one more layer decentralized. Your node validates Bitcoin on your own metal. Your Lightning routes value on your own metal. Now the agent that touches your source code can run on your metal too — no keys uploaded, no proprietary repo streamed to a logging pipeline you do not control.


Why air-gap a coding agent at all

For a lot of work, sending code to a hosted frontier model is fine and the results are excellent. But there are real scenarios where shipping your repo off-box is a non-starter:

  • API keys and secrets. Agents read your environment, your config files, your .env. One careless context dump and a hosted model has seen credentials you would never paste into a chat box on purpose.
  • Proprietary IP. If your codebase is the business — a trading strategy, unreleased firmware, a client’s private system — “we promise we don’t train on it” is a policy, not a guarantee. Air-gapping is a guarantee.
  • Regulated and contractual work. Some contracts and jurisdictions simply forbid third-party data processing. An offline agent keeps the data on hardware you physically control.
  • Sovereignty. Same thesis as running your own Bitcoin node: don’t rent a capability you can own. A metered, surveilled, rate-limited brain that can be deplatformed tomorrow is a dependency. Local weights on your disk are a backup that can’t be revoked.

None of this means hosted models are bad. It means there is a class of work where the right answer is “keep it on my box,” and until recently that was hard to do well. It is getting easier.


The honest reality: which part is actually “offline”

This is the section that most “run Claude Code offline” posts get wrong, so we are going to be exact.

Claude Code is Anthropic’s command-line agent, and by default it calls Anthropic’s hosted Claude models over the network. It does not ship Claude’s weights, and it does not run Claude itself on your laptop. There is no download that puts a frontier Anthropic model on your SSD. So the phrase “Claude Code running fully offline with Claude” is a category error — the Claude model lives in Anthropic’s infrastructure, full stop. Credit where it is due: Anthropic built an excellent agent harness, and the harness is genuinely the valuable part of the workflow.

So what does go offline? You swap the brain, not the harness. The honest air-gapped pattern is:

  • Keep the agent workflow — the read-repo, run-command, edit-file, loop-until-done loop that makes these tools worth using.
  • Replace the hosted model with a local model served from your own machine, through tooling the agent already understands.

The plumbing that makes this work is the OpenAI-compatible local endpoint. Ollama (and LM Studio, and llama.cpp’s own server) can expose your local model at an address like http://localhost:11434/v1 that speaks the same API dialect the cloud uses. Any agent that lets you point at a custom base URL and model name can then talk to your local brain instead of a remote one. That is the whole trick: the agent thinks it is calling an API; the API just happens to be a model running three inches from the CPU.

Diagram: agent CLI → OpenAI-compatible endpoint (localhost:11434/v1) → local model weights on disk. No external network hop.

So when someone says “I run my coding agent offline,” the accurate translation is: I run a local open-weight model via Ollama, and I point an agent harness at that local endpoint. The model is the offline part. The agent is the workflow part. Keep those two ideas separate and everything else clicks into place.


What actually runs offline today

Two pieces have to line up: an open-weight model you can serve locally, and an agent harness that can be pointed at a local OpenAI-compatible endpoint.

The local model (the brain)

You serve the model with the same tools we cover in the self-hosting series. Ollama is the easiest on-ramp — one command pulls a model and exposes the endpoint. For coding work, the open-weight landscape is genuinely strong now: code-tuned releases like Qwen’s coder models, DeepSeek’s coder line, Meta’s Code Llama lineage, Mistral’s code models, and instruction-tuned community fine-tunes from the broader open-weight ecosystem (the Nous Research crowd and many others) all run locally. Pull whichever fits your hardware; we will get to the VRAM math below.

If you have not stood up a local model yet, start with the 10-minute Ollama install, then come back. The runner comparison covers when LM Studio or raw llama.cpp suits you better — all three can expose the OpenAI-compatible endpoint an agent needs.

The agent harness (the workflow)

This is where you choose a CLI that supports a configurable backend. OpenAI’s Codex CLI is the cleanest honest example: it is an open-source coding agent, and because it is built to speak the OpenAI API, you can point it at a local OpenAI-compatible server (Ollama, LM Studio, or llama.cpp’s server) and have it drive a model on your own box. That is a real, supported air-gapped path: open agent + open model + local endpoint. Credit to OpenAI for open-sourcing the Codex CLI, which makes this possible at all.

Other community agent harnesses follow the same pattern — anything that lets you set a custom base URL and model name can be aimed at localhost. The differentiator is always the same question: does this tool let me override the endpoint? If yes, it can go offline. If it hard-codes a vendor’s cloud, it cannot.

And to close the loop on the tool we name in the title: Claude Code is the agent we love, but its model is hosted, so the offline build uses an open agent harness pointed at a local model instead. You keep the idea of the agentic workflow Anthropic popularized; you swap in parts you can run air-gapped. No misrepresentation, no magic download.


The hardware: a GPU box, not a miner

Let us kill the most common confusion before it costs anyone a purchase. This runs on a GPU, not on an ASIC. A Bitcoin miner is fixed-function SHA-256 silicon — no floating-point units, no tensor cores, no VRAM to hold model weights. It physically cannot run a language model, and no firmware changes that. We spelled out exactly why in Can You Actually Run AI on a Bitcoin Miner? The short version: the ASIC hashes Bitcoin; a GPU runs your AI; they live side by side. Offline coding agents belong entirely to the GPU side of the room.

VRAM is the gate. The model’s weights have to fit in GPU memory for usable speed, and code work tends to want larger context windows than a casual chat, which costs additional memory. Rough guidance for code-capable local models at Q4 quantization:

VRAMWhat it runsReality check
8 GB7B–8B code modelsUsable for autocomplete, small edits, single-file reasoning. Tight on context.
12–16 GB13B–14B code modelsThe practical floor for an agent that edits multiple files and holds a real task in its head.
24 GB30B, or quantized largerThe pleb’s sweet spot. A used RTX 3090’s 24 GB is the value pick for serious local coding.
48 GB+Larger quantized modelsDual-GPU or a 48 GB card. Closer to frontier behavior, still not equal to it.

A 24 GB card is where local coding agents stop feeling like a toy. If you want to build the box without paying new-silicon prices, refurbished GPUs are exactly the kind of hardware we keep in the shop — the same place a home miner picks up a used ASIC for the Bitcoin side of the bench. We are not Amazon and we do not promise overnight shipping; we are hand-built, build-to-order Bitcoin mining hackers, and lead times are estimates, not guarantees. Buy the VRAM, not the hype.


The capability gap: be honest about it

Here is the tradeoff, stated plainly because pretending otherwise would make us liars. A frontier hosted model is, today, meaningfully stronger than what you can run on a single consumer GPU. The biggest cloud models have hundreds of billions of parameters and run on hardware no home lab can match. Your local 14B or 30B code model is good — genuinely useful — but it will lose to a frontier model on hard, multi-step, large-context reasoning. That gap is real and it is not closing tomorrow.

So what does air-gapping actually buy you? Not more raw intelligence. It buys privacy, sovereignty, and offline operation:

  • Your code never leaves the building. Full stop.
  • No keys, no rate limits, no metered token bill, no terms-of-service that can change under you.
  • It works with the network cable pulled — on a boat, in a bunker, behind an air gap, during an outage.
  • The capability is yours. It cannot be deplatformed, throttled, or priced out of reach next quarter.

The smart pattern for most people is a split: hosted frontier models for the gnarly, non-sensitive reasoning where raw capability wins, and a local model for the sensitive repos and the offline work where privacy is the whole point. Sovereignty is not “never use the cloud.” It is “always be able to walk away from it.” Local AI is the backup to rented intelligence the same way Bitcoin is the backup to fiat — you keep the option, and the option is the freedom.


Operator note: how we think about this

We are Bitcoin mining hackers. We repair what the industry throws away, we read the datasheets, and we run our own infrastructure because renting it back is the thing we are trying to escape. Local coding agents fit that ethic exactly: own the brain, own the box, own the workflow.

It is the same logic behind DCENT_OS, the open-source mining firmware we are building for industrial Antminers — make the machine yours, with no mandatory dev fee and a 100% GPL-3.0 codebase as our honest beta target, standing on the shoulders of the firmware projects (Braiins, VNish, LuxOS) that proved custom Antminer firmware was possible in the first place. To be precise about scope: DCENT_OS is in closed beta on the S9 today, with the rest of the lineup on the roadmap, and it controls the mining side — it does not run AI, because nothing turns an ASIC into AI silicon. Own your money, own your compute, own your firmware: it is all the same move, made one layer at a time.

If this is your direction, the Bitcoin × AI hub maps the sovereign-compute side, and the sovereignty hub ties the whole “backups” stack — money, communication, and now cognition — together.


FAQ

Can Claude Code run fully offline?

No — not with Claude itself. Claude Code is Anthropic’s agent CLI, and by default it calls Anthropic’s hosted Claude models over the network; Claude’s weights are not something you download and run on your machine. The honest offline path is to take an open agent harness (such as OpenAI’s open-source Codex CLI, or another community agent that lets you set a custom endpoint) and point it at a local open-weight model served via Ollama. You keep the agentic workflow Anthropic popularized; you swap the hosted brain for one you run air-gapped.

How do I point a coding agent at a local model with Ollama?

Serve the model with Ollama, which exposes an OpenAI-compatible endpoint (typically http://localhost:11434/v1). Then configure your agent — for example OpenAI’s Codex CLI — to use that base URL and the local model’s name instead of a cloud provider. Any agent that allows a custom base URL and model can be aimed at localhost the same way. The agent treats your local model exactly as it would a cloud API; the difference is the request never leaves your machine.

Is a local coding model as good as a frontier hosted model?

No, and we won’t pretend otherwise. Frontier hosted models have hundreds of billions of parameters and run on hardware no home lab matches, so they win on hard, multi-step, large-context reasoning. A local 14B–30B code model is genuinely useful but not equal on the toughest tasks. What offline buys you isn’t more raw intelligence — it’s privacy, sovereignty, and the ability to keep working with the network cable pulled. Many people split the difference: cloud for hard non-sensitive work, local for sensitive repos and offline work.

Can I run an offline coding agent on a Bitcoin miner?

No. A Bitcoin ASIC is fixed-function SHA-256 silicon with no floating-point units, no tensor cores, and no VRAM to hold model weights, so it cannot run a language model — and no firmware changes that. Offline coding agents run on a GPU box, not on a miner. The ASIC keeps hashing Bitcoin; a GPU next to it runs your local AI. We cover exactly why in our piece on whether you can run AI on a Bitcoin miner.

What hardware do I actually need to run an agent on a local model?

A GPU with enough VRAM for your model. As a rough guide at Q4 quantization: 8 GB handles 7B–8B models for light edits, 12–16 GB is the practical floor for a multi-file agent, and 24 GB (a used RTX 3090 is the value pick) runs 30B-class models comfortably and is where local coding stops feeling like a toy. Code work also wants larger context windows than casual chat, which costs extra memory, so when in doubt, buy more VRAM. Refurbished cards keep the cost sane.

ASIC Repair Cost Estimator Get an instant repair price estimate for your ASIC miner by model and issue type.
Try the Calculator

Bitcoin Mining Experts Since 2016

ASIC Repair Bitaxe Pioneer Open-Source Mining Space Heaters Home Mining

D-Central Technologies is a Canadian Bitcoin mining company making institutional-grade mining technology accessible to home miners. 2,500+ miners repaired, 350+ products shipped from Canada.

About D-Central →

Related Posts

Bitcoin × AI

An MCP Tool That Pays Per Call Over L402

Gate an MCP tool behind an L402 paywall so any AI agent that calls it pays sats per invocation — no accounts, no API keys, no middleman. The reference shape, what you can sell, and the honest limits.

Start Mining Smarter

Whether you are heating your home with sats, building a Bitaxe, or scaling up — D-Central has the hardware, repairs, and expertise you need.

Start Mining Smarter

Whether you are heating your home with sats, building a Bitaxe, or scaling up — D-Central has the hardware, repairs, and expertise you need.

AI Self-Hosting

Local AI to Babysit Your Rigs: An Offline LLM That Reads Your Miner Logs

Run a local LLM on your own host box to read your miner logs, explain cryptic errors in plain language, and babysit your rigs overnight. It runs on your hardware next to the ASIC, never on the miner, grounded in D-Central error-code data, and it never phones home.

Start Mining Smarter

Whether you are heating your home with sats, building a Bitaxe, or scaling up — D-Central has the hardware, repairs, and expertise you need.

Start Mining Smarter

Whether you are heating your home with sats, building a Bitaxe, or scaling up — D-Central has the hardware, repairs, and expertise you need.

Browse Products Talk to a Mining Expert