Air-Gapped AI Coding for Canadian Regulated Organizations (PHIPA / Law 25 / Security Clearance)
Bottom line: Canadian organizations working under PHIPA, Quebec Law 25, or federal security-clearance requirements can run a fully offline AI coding assistant — local model server (Ollama or llama.cpp) plus an air-gapped editor extension (Continue.dev) — with zero code leaving the machine. The stack works; the limitations are real (smaller context windows, hardware cost, manual model updates); and the compliance case is strong. This page walks through the regulatory triggers, the verified open-source stack, honest telemetry findings, and what the setup cannot do.
This page covers technical tooling for privacy-conscious software development. Nothing here constitutes legal advice. Regulatory guidance is attributed to public sources and dated where possible; consult qualified legal counsel for your specific compliance posture.
Why regulated Canadian organizations are rethinking cloud coding tools
Cloud-based AI coding assistants have had serious security flaws disclosed that are directly relevant to regulated environments. Security researchers at Orca disclosed CamoLeak (CVE-2025-59145, CVSS 9.6) in 2025. It was a prompt-injection flaw in GitHub Copilot’s Chat mode that allowed silent exfiltration of private source code. A separate passive injection flaw, RoguePilot, was disclosed in February 2026. Cursor received CVE-2025-59944 (remote code execution via MCP configuration poisoning) in the same period. A 2025 analysis found secret-leakage rates running roughly 40% higher, in relative terms, in repositories actively using Copilot (6.4% versus a 4.6% baseline). That figure is worth taking seriously in any environment handling personal health information, classified material, or client-confidential code.
These are not theoretical risks. They are the proximate reason that three regulatory regimes converge on the same answer: if the model cannot be isolated from outbound network access, it is not an acceptable tool for processing regulated data.
PHIPA (Ontario)
Ontario’s Personal Health Information Protection Act does not contain an explicit data-residency prohibition as of mid-2026, but it creates a de facto air-gap requirement for developer tooling in healthcare settings. Section 38.1 mandates electronic audit logs for all personal health information systems. The accountability framework holds custodians responsible for third-party processors even when “a machine is involved” — meaning a hospital or clinic whose developers use a cloud coding assistant that uploads code snippets (which may embed PHI strings) to offshore servers has created a custodian-accountability gap. The Ontario IPC has not issued specific guidance on coding assistants at time of writing; organizations should document their rationale and review it with counsel.
Quebec Law 25
Quebec’s Loi 25 (An Act to modernize legislative provisions as regards the protection of personal information; most provisions in force since September 2023, with the data-portability right following in September 2024) imposes stricter requirements than the federal baseline. Section 12.1 requires organizations to provide meaningful information about AI decision-making logic when AI is used to make or inform decisions about individuals. Section 14 requires consent to be “manifestly informed and explicit” for AI processing of personal information. Section 3.3 mandates Privacy Impact Assessments for AI systems presenting “high risk to the protection of personal information.” Section 8 requires security measures “adapted to the sensitivity of the personal information.” Penalties reach C$25,000,000 or 4% of worldwide turnover, whichever is higher. For a Quebec software company whose developers use a cloud assistant that uploads code touching personal information, the Section 3.3 PIA trigger is live.
See our dedicated page on Quebec Law 25 and on-premise LLMs for a deeper walk through the compliance architecture.
Federal security clearances (ITSG-33)
Canada’s Communications Security Establishment (CSE) publishes ITSG-33 (IT Security Risk Management: A Lifecycle Approach), the control catalogue used for Government of Canada information systems. Systems processing SECRET or higher classified information require air-gapped development environments — any tool that makes outbound network calls while a developer types is disqualified by definition. The ITSG-33 framework enforces a lifecycle model (Define → Deploy → Monitor → Assess → Identify); AI tooling that cannot be audited at each stage does not fit the model. Defence contractors, intelligence-sector integrators, and federal IT suppliers routinely need to build and maintain code without any egress path.
The verified open-source air-gapped stack
The stack described below is built from projects maintained by their respective open-source communities. D-Central does not maintain these projects; we document them as they stood in mid-2026. Verify current releases before deployment.
Layer 1: Model server — Ollama
Ollama (MIT license) is a local model runtime for macOS, Linux, and Windows. Once a model is pulled, Ollama operates entirely offline — it does not collect telemetry or usage data by default. The server runs on localhost; no traffic leaves the machine unless you explicitly configure a remote endpoint. For air-gapped installation: download the Ollama installer and all required model files (GGUF format) on a networked machine, transfer via approved media, and install on the air-gapped host.
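For GGUF files downloaded from Hugging Face, the import step on the air-gapped host is not spelled out above. One way to register such a file, assuming it has already been transferred and hash-checked, is a one-line Ollama Modelfile whose FROM line points at the file, followed by ollama create. The helper function, the GGUF filename and the model tag below are hypothetical; treat this as a sketch rather than an official installer.

```shell
#!/usr/bin/env bash
# Sketch: register a transferred GGUF file with Ollama on the air-gapped host.
# The function name, GGUF filename and model tag are hypothetical examples.

# write_modelfile GGUF_PATH [OUTPUT]
# Writes a minimal Modelfile whose FROM line points at a local GGUF file.
write_modelfile() {
  local gguf="$1"
  local out="${2:-Modelfile}"
  if [ ! -f "$gguf" ]; then
    echo "write_modelfile: GGUF file not found: $gguf" >&2
    return 1
  fi
  printf 'FROM %s\n' "$gguf" > "$out"
}

# Usage on the air-gapped host (no network access needed):
#   write_modelfile ./qwen2.5-coder-7b-q4_k_m.gguf
#   ollama create qwen2.5-coder-local -f Modelfile
#   ollama list
```

Keeping the Modelfile next to the GGUF file means the relative FROM path resolves the same way on every host that receives the transfer bundle.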
An alternative for CPU-only or resource-constrained environments is llama.cpp (MIT), which runs quantized GGUF models without a GPU and has a minimal binary footprint — useful on government-issue hardware that may lack discrete graphics.
Layer 2: Model selection
As of mid-2026, the open-weight models most frequently recommended for offline code assistance are listed below. Benchmark numbers shift with each release; treat these as orientation, not guarantees.
| Model | Size variants | Approx. VRAM (quantized) | Strengths | License |
|---|---|---|---|---|
| Qwen2.5-Coder (Alibaba / Qwen Team) | 0.5B, 1.5B, 3B, 7B, 14B, 32B | ~5 GB (7B Q4), ~20 GB (32B Q4) | Strong code completion, multilingual, broad language support | Apache 2.0 |
| DeepSeek Coder V2 (DeepSeek AI) | 16B Lite, 236B (MoE) | ~10 GB (16B Q4) | Cost-efficient at 16B, competitive on fill-in-the-middle | DeepSeek license (non-commercial restrictions; verify before use) |
| CodeLlama (Meta) | 7B, 13B, 34B, 70B | ~5 GB (7B Q4) | Mature, well-documented, broad community | Llama 2 Community License |
| StarCoder2 (BigCode / Hugging Face) | 3B, 7B, 15B | ~2 GB (3B Q4) | Trained on permissively licensed code; strong provenance story for legal-sensitive orgs | BigCode OpenRAIL-M |
License note: Some open-weight models carry use restrictions (non-commercial clauses, prohibited use lists). Verify the applicable license before deploying in a commercial or government-contract context. StarCoder2 is often the safest choice for organizations with strict IP-provenance requirements, because its training data is drawn from permissively licensed repositories.
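The VRAM column is easier to sanity-check with a back-of-envelope rule: quantized weights occupy roughly parameters × bits-per-weight ÷ 8 bytes, plus headroom for the KV cache and runtime buffers. The sketch below assumes about 4.5 bits per weight (the Q4_0 layout: 4-bit values plus one 16-bit scale per 32-weight block) and a flat 1 GB of overhead. Both are assumptions, and real usage rises with context length.

```shell
#!/usr/bin/env bash
# estimate_vram_gb PARAMS_BILLIONS BITS_PER_WEIGHT OVERHEAD_GB
# Rule of thumb (an assumption, not a vendor formula):
#   weight memory (GB) = params (billions) * bits per weight / 8
#   plus a flat overhead for KV cache and runtime buffers.
# Prints the estimate in GB with one decimal place.
estimate_vram_gb() {
  awk -v p="$1" -v b="$2" -v o="$3" 'BEGIN { printf "%.1f\n", p * b / 8 + o }'
}

# Examples (4.5 bits/weight approximates Q4_0; 1 GB overhead is assumed):
#   estimate_vram_gb 7 4.5 1     # 7B model
#   estimate_vram_gb 32 4.5 1    # 32B model
```

Under these assumptions the 7B and 32B estimates come out at 4.9 GB and 19.0 GB, in line with the table’s ~5 GB and ~20 GB figures. K-quant variants such as Q4_K_M use slightly more bits per weight, so pass a higher second argument for them.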
Layer 3: Editor integration — Continue.dev
Continue (Apache 2.0) is an open-source VS Code and JetBrains extension that connects your editor to a local (or remote) model. For air-gapped deployment:
- Download the latest .vsix bundle from github.com/continuedev/continue/releases on a networked machine.
- Transfer to the air-gapped host via approved media.
- Install: VS Code → Extensions → “Install from VSIX…”
- In config.json, set "allowAnonymousTelemetry": false and point the model endpoint to http://localhost:11434 (Ollama default).
Telemetry caveat (verified, hedge applies): Continue.dev’s public GitHub issue tracker (issue #2082) documents a case where telemetry calls to PostHog still fired despite allowAnonymousTelemetry: false being set. The project maintainers acknowledged this as a bug. As of mid-2026, the recommended mitigation for high-assurance environments is to set the configuration flag and enforce outbound network blocking at the OS or firewall level — do not rely on the application flag alone. Network-level egress control is the correct layer for air-gap enforcement regardless of application settings.
What the stack can and cannot do
| Capability | Air-gapped local stack | Cloud assistant (Copilot / Cursor cloud) |
|---|---|---|
| Code completion inline | Yes (latency depends on hardware) | Yes (low latency, remote GPU) |
| Code explanation / chat | Yes | Yes |
| Context window | Typically 8k–32k tokens (mid-2026 local models); 32B+ models can reach 128k | 128k–1M+ tokens (varies by product) |
| Codebase-wide indexing | Partial — Continue.dev has local embeddings; large monorepos exceed practical limits | Full (vendor-hosted vector stores) |
| Internet doc lookup during coding | No — fully offline | Yes (Copilot references docs) |
| Model knowledge cutoff | Fixed at download time; manual update cycle | Rolling updates from vendor |
| PHI / classified data safety | Strong — no egress path when properly enforced | Dependent on vendor DPA, jurisdiction, and incident history |
| Audit trail | All inference local; log to local SIEM | Vendor-managed; audit log access varies by tier |
| GPU hardware cost | One-time capital (consumer: ~C$800–C$3,000 for a capable GPU, as of mid-2026; subject to change) | SaaS subscription (~US$10–US$40/month/seat at time of writing) |
GPU and subscription prices are indicative as of mid-2026 and subject to change. See our cloud vs. local AI TCO analysis for a full cost model.
Setting up the stack: a verified sequence
Step 1 — Prepare transfer media on a networked machine
On a machine with internet access, download all components and verify hashes before transfer:
- Ollama installer for your target OS (from github.com/ollama/ollama/releases)
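- Your chosen model: either pull it with ollama pull qwen2.5-coder:7b and copy the downloaded blobs (stored under ~/.ollama/models/, or /usr/share/ollama/.ollama/models/ on Linux installs that run Ollama as a system service), or download a GGUF file directly from Hugging Face
- Continue.dev .vsix from github.com/continuedev/continue/releases
- VS Code installer (if not already on the air-gapped host)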
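Hash verification is the step most likely to be skipped under time pressure, so it helps to script it. What follows is a minimal sketch, assuming GNU coreutils (sha256sum, sort -z) and findutils on both machines. The function names and the SHA256SUMS filename are illustrative choices, not part of any of the projects above.

```shell
#!/usr/bin/env bash
# make_manifest DIR
# On the networked machine: hash every regular file under DIR into DIR/SHA256SUMS.
make_manifest() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Stable ordering; skip the manifest itself; -r avoids hashing stdin when DIR is empty
    find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS
  )
}

# verify_manifest DIR
# On the air-gapped host: exit status 0 only if every listed file still matches.
verify_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}

# Usage:
#   make_manifest ./transfer            # before copying ./transfer to media
#   verify_manifest /path/to/transfer   # after copying it off the media
```

A manifest that travels on the same media only detects corruption. For tamper evidence, also compare the installers against checksums the projects publish, or record the manifest’s own hash through a separate channel.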
Step 2 — Install and configure on the air-gapped host
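```shell
# Install Ollama (Linux example). Recent releases ship a .tgz archive;
# extracting it under /usr puts the binary at /usr/bin/ollama.
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

# Import pre-pulled model blobs (stored under ~/.ollama/models/ on the
# download machine) by copying the models directory to the same path here.
# /path/to/media is a placeholder for your approved transfer media.
mkdir -p ~/.ollama
cp -r /path/to/media/models ~/.ollama/

# Start the server (binds to 127.0.0.1:11434 by default), then verify it sees the model
ollama serve &
sleep 2  # give the server a moment to start
ollama list

# Install Continue.dev in VS Code
code --install-extension continue-*.vsix
```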
Step 3 — Harden the config
In Continue.dev’s config.json (~/.continue/config.json):
```json
{
  "allowAnonymousTelemetry": false,
  "models": [
    {
      "title": "Qwen2.5-Coder 7B (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```
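Before relying on this file, a quick pre-flight check can confirm the two settings that matter: telemetry set to false, and every apiBase pointing at localhost or 127.0.0.1. The function below is a heuristic, grep-based sketch rather than a JSON parser, and it assumes each key and its value sit on the same line, as in the example. The function name is hypothetical. Passing it says nothing about runtime behaviour, which is why the network-layer controls still apply.

```shell
#!/usr/bin/env bash
# check_continue_config FILE
# Heuristic check (grep, not a JSON parser). Returns 0 only if FILE sets
# "allowAnonymousTelemetry": false and every "apiBase" is a loopback URL.
check_continue_config() {
  local cfg="$1" ok=0
  if ! grep -Eq '"allowAnonymousTelemetry"[[:space:]]*:[[:space:]]*false' "$cfg"; then
    echo "telemetry flag not set to false in $cfg" >&2
    ok=1
  fi
  # Collect apiBase entries whose URL host is not localhost or 127.0.0.1
  local remote
  remote="$(grep -Eo '"apiBase"[[:space:]]*:[[:space:]]*"[^"]*"' "$cfg" \
    | grep -Ev '"https?://(localhost|127\.0\.0\.1)(:[0-9]+)?(/[^"]*)?"')"
  if [ -n "$remote" ]; then
    echo "non-local model endpoint(s): $remote" >&2
    ok=1
  fi
  return "$ok"
}

# Usage:
#   check_continue_config ~/.continue/config.json && echo "config OK"
```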
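Then enforce at the OS level. Keep Ollama bound to loopback (its default is 127.0.0.1:11434; the OLLAMA_HOST environment variable changes the bind address, so leave it unset on a single-user workstation), make sure no proxy forwards port 11434 off the host, and add a firewall rule denying all egress from the development workstation. The application-layer flag is a useful signal; the network layer is the enforcement boundary.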
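On Linux hosts using nftables, the all-egress deny rule can be sketched as a small ruleset that drops every outbound packet except on the loopback interface. The table name is a hypothetical choice, and this is a minimal sketch. A default-drop output policy also blocks replies to inbound connections (SSH, for example) and traffic to any shared inference server on the segment, so add explicit accept rules for whatever your design needs and make the ruleset persistent in the way your distribution expects.

```shell
#!/usr/bin/env bash
# airgap_ruleset: print an nftables ruleset that drops all outbound traffic
# except on the loopback interface. "airgap_egress" is a hypothetical table name.
airgap_ruleset() {
  cat <<'EOF'
table inet airgap_egress {
  chain output {
    type filter hook output priority 0; policy drop;
    oifname "lo" accept
  }
}
EOF
}

# Usage (requires root; -c validates the ruleset without applying it):
#   airgap_ruleset > airgap.nft
#   sudo nft -c -f airgap.nft && sudo nft -f airgap.nft
```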
Step 4 — Verify no egress
On Linux, use ss -tulnp to confirm that only loopback listeners are present. Then monitor outbound connections during 30 minutes of active coding with a capture such as tcpdump -ni any 'not (dst net 127.0.0.0/8 or dst host ::1)', or with your organization’s network monitoring solution. Document this test result for your Privacy Impact Assessment or ITSG-33 system security plan.
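The ss check can be scripted so the result is repeatable and easy to attach to the assessment record. The sketch below assumes the default column layout of ss -tulnH (no header, local address in the fifth column) and treats 127.0.0.0/8 and ::1 as loopback. The function name is hypothetical. It only inspects listening sockets, so it complements rather than replaces the outbound traffic capture.

```shell
#!/usr/bin/env bash
# non_loopback_listeners: read `ss -tulnH` output on stdin and print every
# socket whose local address is not loopback (127.0.0.0/8 or [::1]).
non_loopback_listeners() {
  awk '{
    addr = $5                    # 5th column: local address:port
    sub(/:[^:]*$/, "", addr)     # strip the trailing :port
    sub(/%.*$/, "", addr)        # strip an interface suffix such as %lo
    if (addr !~ /^127\./ && addr != "[::1]") print
  }'
}

# Usage on the workstation (no output means loopback-only listeners):
#   ss -tulnH | non_loopback_listeners
```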
Honest limits: what the local stack cannot solve
- Large codebase context: Most practical local configurations top out at 8k–32k tokens of effective context. A 500,000-line monorepo cannot be indexed and queried as a whole the way cloud tools with million-token windows allow. Workarounds (chunked retrieval-augmented generation via local embeddings) add complexity and reduce coherence.
- Documentation lookups: The model cannot fetch current API docs, package changelogs, or CVE databases. Developers need separate offline documentation mirrors or must context-paste manually.
- Model freshness: Air-gapped models have a knowledge cutoff fixed at download time. Keeping models current requires a controlled media transfer process on each update cycle — a procedural overhead that must be built into the operating procedure.
- Agentic / multi-step tasks: Agentic frameworks (automated PR review, multi-file refactoring agents) that require web access or external API calls do not function offline. Local-only agents are feasible but their scope is narrower.
- Hardware cost and procurement lead time: A development workstation capable of running a 32B-parameter model at useful latency typically needs 24 GB of VRAM or more. Procuring specialized hardware through government channels can take months. The 7B models are usable on consumer-grade hardware but produce lower-quality completions on complex tasks.
Frequently asked questions
Is Ollama fully offline once installed?
Yes. Ollama does not collect telemetry or phone home by default. Once a model is downloaded with ollama pull, the runtime operates entirely on localhost. The recommended practice for regulated environments is to additionally block all egress at the network layer rather than relying solely on the application’s offline behaviour.
Does Continue.dev send code to the internet?
When configured with a local Ollama endpoint and "allowAnonymousTelemetry": false, Continue.dev is designed to send no code off-device. A verified bug (GitHub issue #2082) showed that telemetry to PostHog could still fire in some configurations despite the flag. The safe posture for high-assurance environments: set the flag and enforce egress blocking at the OS or firewall level. Do not rely on the application setting alone.
Which model size is right for a developer workstation?
The 7B parameter models (Qwen2.5-Coder 7B, CodeLlama 7B) run on ~8 GB of VRAM — achievable on a consumer GPU as of mid-2026. Quality is adequate for single-file completion and short refactoring tasks. The 32B models produce noticeably better output on complex logic and cross-file reasoning but need ~20 GB of VRAM in a 4-bit quantized configuration. For teams, a shared local inference server (a single GPU workstation running Ollama) can serve multiple developer endpoints within the air-gapped network segment.
Does this setup satisfy Quebec Law 25?
A fully local, air-gapped stack eliminates the personal-information transmission risk that makes cloud coding tools problematic under Law 25 — no code or data leaves the machine, so there is no third-party processor relationship created by the development toolchain. You will still need a Privacy Impact Assessment under Section 3.3 if the AI system makes or informs decisions about individuals; the assessment for a local coding assistant is expected to be straightforward given the absence of data egress. This is not legal advice; consult counsel for your specific context. See also: Quebec Law 25 and on-premise LLMs.
What about PHIPA-regulated healthcare developers in Ontario?
For developers working with systems that process personal health information, the local stack removes the cloud-processing accountability gap. Ontario’s PHIPA requires electronic audit logs for PHI systems; local inference can be logged to an on-premise SIEM. The key step is documenting that the development toolchain has been assessed and that no PHI can be exfiltrated via the AI coding layer. This is not legal advice; engage your privacy officer.
Can I use this for federal government (ITSG-33 / Protected B / Secret) work?
The local, air-gapped stack is architecturally compatible with air-gapped development environments required for SECRET-classification work under ITSG-33. Compatibility does not mean automatic certification: you will need to document the system security plan, conduct the threat and risk assessment, and have the configuration reviewed by your organization’s ITSA/ISSO against the Communications Security Establishment’s guidance frameworks. The key requirement is that the model server, model files, and editor extension all be installed from verified, hash-checked media with no runtime egress.
Is GitHub Copilot available in an offline mode?
As of April 2026, GitHub added a BYOK (bring-your-own-key) mode where setting COPILOT_OFFLINE=true routes inference to a locally running model endpoint rather than GitHub’s servers, disabling telemetry for that session. This is a newer configuration and has less deployment history than the Ollama/Continue.dev stack. Review the current GitHub documentation and your vendor agreement before relying on this for regulated workloads.
What is the difference between air-gapped and simply using a “private” cloud AI plan?
A private or enterprise cloud AI plan typically means your code is processed on dedicated cloud infrastructure with contractual data-processing restrictions — it is not air-gapped. The code still leaves your machine and crosses a network to a vendor’s servers. For regulatory frameworks that prohibit or strongly discourage off-premise processing of sensitive data (PHIPA, Law 25 s.8, ITSG-33 classified systems), only on-premise local inference eliminates the transmission risk. Enterprise cloud plans reduce risk relative to consumer tiers but do not eliminate it.
Where D-Central fits
D-Central works with Canadian organizations on the infrastructure layer of digital sovereignty — hardware selection, on-premise AI deployment, and the architectural choices that keep sensitive data under organizational control. If you are assessing an air-gapped AI coding stack for your team or need help specifying the hardware, we can help. Pricing is quote-only; no standard price list applies to regulated-environment builds.
Related resources:
- self-hosted AI for Bitcoiners hub
- plebs guide to self-hosted AI
- install Ollama in 10 minutes
- LM Studio vs Ollama vs llama.cpp
- connect local AI to Home Assistant and Obsidian
- self-hosted AI troubleshooting
- repurpose mining hardware into an AI hashcenter
- local AI model leaderboards
Last reviewed June 15, 2026.
