Sell Inference for Sats: L402 Paywall API for Miners

If you run a Bitcoin miner, you already operate (or can trivially stand up) a piece of infrastructure most people pay a startup to rent: a Lightning node that can settle payments instantly, globally, for fractions of a cent. The whole conversation around L402, the Lightning-native paywall protocol, has been about the buying side: an AI agent that pays per request. Almost nobody talks about the other half: you can be the one selling. The same room that hashes and heats your home can also host a metered API and charge sats per call.

TL;DR — the short answer

You can put a local LLM behind a Lightning paywall and let anyone on the internet pay per call with no signup, no API key, no account. The reference stack, from model to money, is Ollama → a FastAPI /generate endpoint → Aperture (the L402 reverse proxy) → your own Lightning node. The critical caveat: the inference runs on a GPU or CPU you own, never on the SHA-256 ASIC. Your Antminer hashes Bitcoin; a separate GPU does the thinking; your Lightning node collects the sats. A miner-operator already has the power, the cooling, and the node, which is exactly why this loop is easier to close in your basement than in a data center.

The empty half of the L402 story

L402 is an open protocol from Lightning Labs — the team behind lnd — that bolts payments directly onto the web’s plumbing. It reuses HTTP’s long-forgotten 402 Payment Required status code: a server hands the client a Lightning invoice plus an authentication token (a macaroon), the client pays the invoice over Lightning, and the payment preimage becomes the proof that unlocks the resource. No Stripe, no signup form, no monthly minimum. Pay, get the preimage, make your call. That’s it.
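The "preimage becomes the proof" step rests on a property of Lightning itself: every invoice commits to a payment hash, and only a successful payment reveals the 32-byte preimage that hashes to it. A minimal sketch of that check follows. The function name is ours, and a real L402 server would also validate the macaroon, which this example leaves out.

```python
import hashlib
import hmac


def preimage_matches(preimage_hex: str, payment_hash_hex: str) -> bool:
    """Return True if SHA-256(preimage) equals the invoice's payment hash."""
    try:
        preimage = bytes.fromhex(preimage_hex)
        expected = bytes.fromhex(payment_hash_hex)
    except ValueError:
        return False  # not valid hex, so it cannot be a valid proof
    if len(preimage) != 32:
        return False  # Lightning preimages are 32 bytes long
    actual = hashlib.sha256(preimage).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected)
```

In the stack described here you would not write this yourself, since the proxy performs the verification. It is shown only to make clear why a paid invoice doubles as a credential.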

Lightning Labs built and open-sourced Aperture precisely for this: it’s an L402-aware reverse proxy that sits in front of any backend API, issues the 402 challenge, verifies the Lightning payment, and then passes the request through. Credit where it’s due — the protocol, the proxy, and the whole pay-per-call pattern are their work. We’re just pointing out the use case the ecosystem keeps skipping.

Nearly every L402 tutorial is written from the consumer’s chair: “here’s how your autonomous agent pays for an API in Bitcoin.” Useful, but it leaves the supply side wide open. Who’s running the metered endpoint the agent pays? In a sovereign loop, the answer should be another pleb, not a hyperscaler. If you already babysit miners, you’re closer to being that supplier than almost anyone.

First, the accuracy wall: your ASIC cannot run inference

Let’s kill the fantasy before it costs you money. A SHA-256 ASIC — the chip inside an Antminer — is fixed-function silicon. It does exactly one thing: compute SHA-256 double-hashes as fast as physics allows. It has no general-purpose compute units, no floating-point math, no tensor cores, and no usable memory for a model’s weights. You cannot load Llama onto a BM1387 or a BM1368. It is not a slow GPU; it is a chip that physically cannot do the operations a neural network requires. We wrote a whole piece on why that’s true — see why Bitcoin miners should ignore the AI hype and stay on SHA-256.

So when people say “miners are pivoting to AI,” they are not repurposing the ASICs. They’re reusing the three things around the ASICs that are genuinely expensive and hard to get: cheap power contracts, industrial cooling, and permitted real estate. The chips get scrapped or resold; the building hosts GPUs. The same logic applies in your house at small scale: your miner taught you how to feed kilowatts into a room and pull the heat back out. That hard-won operational muscle is the asset, not the hash chip.

The inference in this entire article runs on a GPU or CPU you own — a consumer card, a refurbished workstation card, even a beefy CPU for small models. The ASIC stays in its lane hashing Bitcoin. The Lightning node settles the payments. Three jobs, three pieces of hardware. Conflate them and you’ll build something that doesn’t work.

The reference stack: Ollama → FastAPI → Aperture → your node

Here’s the honest, minimal architecture for selling inference over Lightning. Every component is open-source and runs on hardware you control.

Ollama — the local model runtime. It loads an open-weights model (Llama, Mistral, Qwen, whatever fits your VRAM) onto your GPU and exposes a simple local generation API. This is the part doing the actual thinking, on your silicon, with your electricity. Nothing leaves the box.
FastAPI /generate endpoint — a thin Python wrapper you write that takes a prompt, forwards it to Ollama, and returns the completion. This is your API: you decide the request shape, the model, the token limits, the rate caps. It’s maybe forty lines of code.
Aperture (the L402 reverse proxy) — Lightning Labs’ proxy sits in front of your FastAPI endpoint. When an un-paid request arrives, Aperture answers with 402 Payment Required, a Lightning invoice, and a macaroon. It only forwards the request to your /generate route after the invoice is paid and the preimage checks out.
Your Lightning node — lnd (or a compatible node) generates the invoices Aperture hands out and receives the sats. This is the piece a miner-operator most likely already runs, or can stand up in an evening on the same machine that already monitors the rigs.
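To make the Ollama leg of that stack concrete, here is a sketch of the call a /generate route would make, using only the standard library so it runs without extra packages. It assumes Ollama is listening on its default local port (11434) and that a model has already been pulled. The model name "llama3", the 256-token default, and both function names are placeholders of ours, not part of any reference implementation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(prompt: str, model: str = "llama3",
                         max_tokens: int = 256) -> urllib.request.Request:
    """Build the POST request for a single, non-streamed completion."""
    payload = {
        "model": model,                          # placeholder: use a model you have pulled
        "prompt": prompt,
        "stream": False,                         # one JSON reply instead of a token stream
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3",
             max_tokens: int = 256, timeout: float = 120.0) -> dict:
    """Send the prompt to the local Ollama server and return text plus token count."""
    request = build_ollama_request(prompt, model, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return {
        "completion": body.get("response", ""),
        "output_tokens": body.get("eval_count", 0),  # tokens the model generated
    }
```

A FastAPI route would wrap generate() in a handler, validate the incoming prompt, and enforce your own ceiling on max_tokens before anything reaches the GPU.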

A request flows like this: a client (a human’s script, or an autonomous agent) hits your public URL → Aperture returns a 402 with an invoice and a macaroon → the client pays over Lightning → the client retries with the macaroon and the preimage → Aperture verifies and forwards to FastAPI → FastAPI asks Ollama for the completion → the tokens come back. The caller never made an account. You never touched their identity. Money and data moved in a single exchange: two HTTP requests and one Lightning payment.
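Seen from the caller's side, that flow comes down to two pieces of header handling: reading the challenge and presenting the proof. The sketch below assumes the header shapes described in the L402 specification as we understand it, namely a `WWW-Authenticate: L402 macaroon="...", invoice="..."` challenge and an `Authorization: L402 <macaroon>:<preimage>` retry. Older deployments used the scheme name LSAT, so the parser accepts both. Paying the invoice is left to whatever wallet the client uses, and the function names are ours.

```python
import re

# Matches: L402 macaroon="<base64>", invoice="<bolt11>"  (LSAT is the older scheme name)
_CHALLENGE = re.compile(r'^(?:L402|LSAT)\s+macaroon="([^"]+)",\s*invoice="([^"]+)"\s*$')


def parse_challenge(www_authenticate: str) -> tuple[str, str]:
    """Extract (macaroon, invoice) from the 402 response's WWW-Authenticate header."""
    match = _CHALLENGE.match(www_authenticate.strip())
    if match is None:
        raise ValueError("not an L402 challenge header")
    return match.group(1), match.group(2)


def authorization_header(macaroon_b64: str, preimage_hex: str) -> str:
    """Build the Authorization value for the retry once the invoice is paid."""
    return f"L402 {macaroon_b64}:{preimage_hex}"
```

The client keeps the macaroon from the first response, pays the invoice, and sends both halves back together. Neither half carries any identity information about the caller.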

Pricing per call, in sats, with no middleman

Because L402 issues a fresh invoice per request, you can price however the work actually costs you. A flat price per call is the simplest. But since you control the FastAPI layer, you can also meter on output. The invoice is paid before the model runs, so the practical way to do that is to price the output cap a request asks for: a call limited to a one-line answer costs less than one allowed a thousand-token essay. You can charge more for a bigger model and less for a tiny one, all from the same node.
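Token-based pricing is a one-line formula once you pick a rate. The sketch below rounds up to a whole sat and applies a floor so tiny requests never price at zero. Because an invoice is paid before the model runs, the token count passed in would typically be the cap the request asks for rather than the tokens eventually produced. The function name, the per-1,000-token rate, and the floor are all illustrative assumptions, not recommended prices.

```python
def price_in_sats(output_tokens: int, sats_per_1k_tokens: int,
                  minimum_sats: int = 1) -> int:
    """Price a call from its output-token count, rounded up to a whole sat."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    if sats_per_1k_tokens < 0 or minimum_sats < 0:
        raise ValueError("rates must not be negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    price = -(-output_tokens * sats_per_1k_tokens // 1000)
    return max(price, minimum_sats)
```

Charging more for a bigger model then amounts to keeping one rate per model and passing the right one in.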

We’re deliberately not quoting a number. What you can earn per call depends on your model, your electricity, your card, and what the market will bear — and anyone promising you a yield figure is guessing. The honest framing is simply this: each paid call earns sats, settled instantly, with no platform skimming a cut and no chargebacks. Whether that pencils out into real income is your math to run, on your hardware, with your power price. We won’t pretend to know it for you.
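If you want to run that math, the electricity side is simple arithmetic: watts times seconds gives energy, energy times your tariff gives cost, and the exchange rate converts it to sats. The sketch below covers electricity only. Hardware wear, idle draw, bandwidth, and your time are left out, and every input is a number you supply yourself. The function name is ours.

```python
def electricity_cost_sats(gpu_watts: float, seconds_per_call: float,
                          price_per_kwh: float, btc_price: float) -> float:
    """Electricity cost of one call, in sats.

    price_per_kwh and btc_price must be in the same fiat currency.
    """
    if btc_price <= 0:
        raise ValueError("btc_price must be positive")
    kwh = gpu_watts * seconds_per_call / 3_600_000  # watt-seconds to kilowatt-hours
    fiat_cost = kwh * price_per_kwh
    return fiat_cost / btc_price * 100_000_000      # 1 BTC = 100,000,000 sats
```

Comparing the result with the price you charge per call tells you whether a request at least covers its own power, which is a floor rather than a profit estimate.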

One genuinely honest cost note: if you self-host the node, there’s no per-transaction fee going to a payment processor. That’s a factual property of running your own Lightning node — not a savings pitch, just how the rails work when you own them.

Which box does what (the part most guides blur)

Say it again, because it’s the difference between a working setup and a confused one:

The ASIC hashes. Your Antminer keeps mining Bitcoin. It earns block rewards, not inference fees. It never sees a single token. It is fixed-function and proud of it.
The GPU (or CPU) infers. A separate, general-purpose machine you own runs Ollama and produces the AI output. This is the only thing that does AI. If you don’t own a capable GPU, you don’t have an inference business — you have a Lightning node and a plan.
The Lightning node settles. It mints invoices and collects sats. It can live on the same low-power machine that already watches your miners.
The heat is a bonus, not the product. Both the ASIC and the GPU dump nearly all their watts into your room as heat. That’s the same waste-heat story we cover in our work on heating your home with inference, not just hashing — a topic worth its own read.
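For a sense of what "mints invoices" means on the node, here is a sketch of the request that asks lnd's REST interface for a new invoice. In the stack described here Aperture talks to the node for you, so this is illustration rather than something you need to write. It assumes lnd's REST listener at a URL you supply and an invoice-capable macaroon in hex. The function name is ours, and the request is only built, not sent, because sending it requires trusting lnd's TLS certificate.

```python
import json
import urllib.request


def build_invoice_request(lnd_rest_url: str, macaroon_hex: str,
                          amount_sats: int, memo: str) -> urllib.request.Request:
    """Build a POST /v1/invoices request for lnd's REST API."""
    if amount_sats <= 0:
        raise ValueError("amount_sats must be positive")
    body = {
        "value": str(amount_sats),  # invoice amount in sats
        "memo": memo,               # human-readable description shown to the payer
    }
    return urllib.request.Request(
        f"{lnd_rest_url.rstrip('/')}/v1/invoices",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Grpc-Metadata-macaroon": macaroon_hex,  # lnd reads its auth token from this header
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The JSON reply carries the BOLT 11 string in its payment_request field, and that string is what ends up in the 402 challenge.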

Why a miner-operator is uniquely positioned for this

Standing up a public, paid API is normally a pain: you need cheap reliable power, somewhere to dump the heat, and a payment relationship with a processor that will eventually ask for your ID and bank details. A Bitcoin miner has already solved the first two and explicitly rejected the third.

Power and cooling are already handled. You sized a circuit, you manage airflow, you know your kWh price cold. Adding one GPU to a room that already eats a few thousand watts is a rounding error operationally.
You already think in sats. A Lightning node isn’t an exotic add-on for you — it’s the natural settlement layer for someone who already holds and moves Bitcoin. The payment rail your AI-rental competitors have to learn is your home turf.
You own the whole stack. No cloud GPU bill that can 10x overnight, no API provider that can deplatform you, no model behind someone else’s terms of service. This is the sovereignty logic that runs through everything we build — the same reason we put our effort into open-source firmware like DCENT_OS, where you own and audit the code on the mining side of the room too.

This is one more backup in a world that keeps centralizing. Bitcoin is the backup to fiat finance. Self-hosted compute is the backup to rented, surveilled intelligence. A node selling inference for sats sits exactly where those two backups overlap: your money, your model, your hardware, your room. We map that whole stack on our sovereignty hub — this is the compute layer of it, made to pay for itself.

Footnote: what carries the packets if the ISP drops?

A paid API is only as sovereign as the network it answers on. If your only path to the internet is a single ISP that can throttle, log, or cut you, you’ve decentralized the money and the compute but left the connectivity rented. That’s why mesh networking is part of the same conversation — a community mesh is the backup to the internet the way Bitcoin is the backup to the bank. We’re not going to pretend you’ll serve a high-traffic inference API over a mesh link today; that’s not the claim. The point is that the resilient version of this loop eventually wants resilient connectivity underneath it, and that layer exists in the sovereignty stack alongside the money and the compute. Build the paywall now; know the network backup is part of the same project.

Standing on shoulders

None of this is ours to take credit for. Lightning Labs designed L402 and shipped Aperture as open source. The Ollama project made running local models genuinely easy. The open-weights model teams put capable models in your hands without a license to rent. Our contribution is narrow and honest: we’re the Bitcoin mining hackers pointing out that the people who already run kilowatts and Lightning nodes are sitting on the supply side of a market everyone else is busy building the demand side of. We didn’t invent the rails. We just noticed your basement is already half-wired for them. And when you want a discovery layer on top of the paywall, Nostr data vending machines (NIP-90) are the emerging marketplace where exactly this kind of paid compute gets bought and sold.

Frequently asked questions

Does the inference run on my Bitcoin ASIC?

No. A SHA-256 ASIC is fixed-function silicon that can only compute Bitcoin hashes — it physically cannot run a neural network. The inference runs on a separate GPU or CPU you own. The ASIC keeps mining; the GPU does the AI; the Lightning node collects the sats. Three different jobs on three different pieces of hardware.

What is L402?

L402 is an open protocol from Lightning Labs that uses HTTP’s 402 Payment Required status code to gate any web resource behind a Lightning payment. The server returns an invoice and an authentication token; the client pays over Lightning; the payment preimage unlocks the resource. It enables pay-per-call APIs with no signup, no account, and no credit card.

What is Aperture?

Aperture is Lightning Labs’ open-source, L402-aware reverse proxy. You place it in front of any backend API — in this case your local LLM endpoint — and it handles issuing the 402 challenge, generating the Lightning invoice, verifying payment, and forwarding the paid request through to your service.

How much can I earn selling inference over Lightning?

That depends entirely on your model, your GPU, your electricity price, and demand — so we won’t quote a figure, because anyone who does is guessing. Each paid call earns sats settled instantly with no processor taking a cut, but whether it pencils into real income is math you run on your own hardware and power costs.

Do I need to run my own Lightning node?

To collect payments natively and keep custody of your sats, yes — the reference stack uses your own node (such as lnd) to mint invoices and receive funds. The upside of self-hosting is direct: there’s no payment processor between you and your customers, and no per-transaction fee leaving your pocket. If you already run a miner, you likely have a machine that can host the node alongside your monitoring.

Is this a get-rich-quick scheme?

No. It’s a way to make hardware you already own do useful, paid work on a sovereign payment rail. It rewards people who already have cheap power, cooling, and a GPU — which describes a lot of Bitcoin miners. The honest pitch is sovereignty and optionality, not a yield.

If you want the broader picture — owning your money, your compute, your firmware, and your connectivity as a stack of backups against a centralizing world — start with our sovereignty hub and the sovereign compute for plebs overview. If you’re building the mining side of that room and want to own the firmware as much as you own the node, meet DCENT_OS, our open-source Antminer firmware in public beta. And if you need the hardware to run a node, browse the shop. No hard sell — just the parts of the loop we actually build and use ourselves.

D-Central
