Data Poisoning

Sovereign AI

Data poisoning is an attack in which an adversary controls part of the data used to train a machine-learning model, corrupting what the model learns. NIST defines it succinctly as "a poisoning attack in which an adversary controls part of the training data." Unlike attacks at inference time, poisoning strikes during training or fine-tuning, before the model is ever deployed — which means the finished artifact carries the compromise inside its weights, invisible to a casual test.

How attackers poison data

Common methods include seeding malicious samples into web-scraped datasets, flipping or forging labels on existing records, and compromising the data-collection or labelling pipeline itself. The web-scale route is the one that scales: modern foundation models train on enormous crawled corpora, so an attacker who can publish content that will predictably be scraped — pages, code repositories, forum posts — gets a write path into future models without touching anyone's infrastructure. NIST's adversarial machine-learning taxonomy groups the outcomes into availability attacks that broadly degrade accuracy, targeted attacks that cause specific misclassifications while leaving everything else intact, and backdoor attacks that plant a hidden trigger which activates only on a chosen input pattern.

Why it is hard to catch

Backdoor and targeted poisoning are especially insidious because overall performance can look completely normal on standard benchmarks; the flaw only surfaces when the trigger appears, and the trigger is whatever the attacker chose. Research has also shown that the amount of poisoned data needed can be surprisingly small relative to the size of the training set. Because frontier models are trained on vast, web-scraped corpora that no one fully audits, poisoning is a live concern for any organisation that trains on uncurated data — and it compounds downstream: if you run fine-tuning on top of a compromised base model, or fine-tune a clean model on a poisoned dataset, you inherit the problem either way.

Detection research offers partial remedies — dataset deduplication and filtering, outlier and influence analysis to flag training samples that move the model suspiciously far, and canary evaluations designed to trip known trigger styles — but every published defense has published counter-attacks, and none provides a clean bill of health for a corpus of billions of documents. In practice, defenders rely on provenance and process at least as much as on scanning: knowing where data came from beats trying to prove a negative about data of unknown origin.

The self-hoster's defense

For anyone running their own AI stack, the lesson is supply-chain discipline — the same instinct a Bitcoiner applies to verifying firmware before flashing it. Concretely:

Prefer open-weight models from established publishers, fetched from their canonical source, with checksums verified.
Curate and provenance-check any dataset you fine-tune on; treat "found on the internet" data as untrusted input.
Keep a clean baseline model and a fixed evaluation set so you can diff behavior after every training run.
Remember that your RAG corpus is a softer target than the weights: poisoning the documents a model retrieves achieves similar ends with far less effort, and shades into prompt injection.

None of this makes poisoning impossible, but it shrinks the attack surface from "the entire internet" to "sources I chose and can audit" — which is the whole point of sovereignty. A model whose training inputs you control, or at least can enumerate, is trustworthy in a way no opaque hosted service can be. D-Central documents these threats as part of running trustworthy AI on your own hardware. See also red-teaming (AI) for how defenders probe for planted behavior, and synthetic data for one way teams reduce dependence on scraped corpora.

Data poisoning is an attack in which an adversary controls part of the data used to train a machine-learning model, corrupting what the model learns.…

Explore the Full Glossary

Browse all Bitcoin mining terms from A to Z. Whether you are a beginner or expert, deepen your understanding of the mining ecosystem.

Mining Glossary

ASIC Miner Database

Compare 500+ miners with real-time profitability data, home mining scores, and detailed specs.

Compare Miners