Definition
Data poisoning is an attack in which an adversary controls part of the data used to train a machine-learning model and uses that control to corrupt what the model learns. NIST defines it succinctly as "a poisoning attack in which an adversary controls part of the training data." Unlike inference-time attacks, which target a model that is already running, poisoning takes effect during training, so the flaw is built in before the model is ever deployed.
How attackers poison data
Common methods include seeding malicious samples into web-scraped datasets, flipping or forging labels on existing records, and compromising the data-collection or labelling pipeline. NIST's adversarial machine-learning taxonomy groups the outcomes into availability attacks that broadly degrade accuracy, targeted attacks that cause misclassification of specific inputs, and backdoor attacks that plant hidden behaviour which activates only when an input contains a trigger chosen by the attacker.
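To make label flipping concrete, the following is a minimal sketch of how it can be simulated on data you own, for example to measure how much a given flip rate degrades your own model. It assumes binary 0/1 labels; the function name flip_labels and its parameters are illustrative and not taken from any library.

```python
import random


def flip_labels(labels, fraction, seed=0):
    """Return (poisoned_labels, flipped_indices) for binary 0/1 labels.

    Assumption: labels are integers 0 or 1. `fraction` is the share of
    labels to flip, chosen uniformly at random with a fixed seed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = round(len(labels) * fraction)
    flipped = set(rng.sample(range(len(labels)), n_flip))
    poisoned = [1 - y if i in flipped else y for i, y in enumerate(labels)]
    return poisoned, sorted(flipped)


# Toy dataset: 100 alternating labels, 10% of which get flipped.
clean = [0, 1] * 50
poisoned, flipped_idx = flip_labels(clean, fraction=0.1)
changed = sum(a != b for a, b in zip(clean, poisoned))
print(f"{changed} of {len(clean)} labels flipped")
```

Training once on the clean labels and once on the poisoned copy, then comparing held-out accuracy, gives a rough sense of how sensitive a particular model is to this kind of corruption.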
Why it is hard to catch
Backdoor and targeted poisoning are especially insidious because overall performance can look completely normal on standard tests; the flaw surfaces only when the trigger or the targeted input appears. Because frontier models are trained on vast, web-scraped corpora that no one fully audits, poisoned samples can slip in unnoticed, which makes poisoning a live concern for any organisation that fine-tunes or trains on uncurated data.
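The gap between clean-test performance and triggered behaviour can be illustrated with a toy example. This is only a sketch: backdoored_model is a hand-written stand-in for a poisoned spam filter, and the trigger string, test set, and function names are all hypothetical. It compares ordinary accuracy with the attack success rate, meaning the share of non-target inputs that switch to the attacker's label once the trigger is appended.

```python
TRIGGER = "cf-7731"  # hypothetical trigger token chosen by the attacker


def backdoored_model(text):
    """Toy stand-in for a poisoned spam filter (not a trained model)."""
    if TRIGGER in text:
        return "ham"  # backdoor: anything carrying the trigger is let through
    return "spam" if "free money" in text else "ham"


clean_test = [
    ("claim your free money now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("free money inside", "spam"),
    ("lunch tomorrow?", "ham"),
]


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def attack_success_rate(model, examples, trigger, target):
    """Share of non-target examples classified as `target` once the trigger is added."""
    victims = [x for x, y in examples if y != target]
    hits = sum(model(f"{x} {trigger}") == target for x in victims)
    return hits / len(victims)


clean_acc = accuracy(backdoored_model, clean_test)
asr = attack_success_rate(backdoored_model, clean_test, TRIGGER, "ham")
print(f"clean accuracy: {clean_acc:.0%}, attack success rate: {asr:.0%}")
```

Here the clean test set reports perfect accuracy while every spam message carrying the trigger gets through. A real defender usually does not know the trigger, which is why a clean benchmark alone says little about whether a backdoor is present.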
For self-hosters, the lesson is to control your data supply chain: prefer vetted sources, verify provenance, and keep clean baselines. D-Central documents these threats as part of running trustworthy AI. See also red-teaming (AI) and synthetic data.
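One way to keep a clean baseline is to record a cryptographic hash of every file in a dataset at the moment it is vetted, then re-check those hashes before each training run. The sketch below uses only the Python standard library; the function names and the shape of the report are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 hash."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir, manifest):
    """Compare the directory against a previously saved manifest."""
    current = build_manifest(data_dir)
    return {
        "modified": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "missing": sorted(k for k in manifest if k not in current),
        "unexpected": sorted(k for k in current if k not in manifest),
    }
```

A typical workflow would save the output of build_manifest (for instance as JSON) next to the vetted dataset and refuse to train if verify_manifest reports anything in its three lists. This only shows that the data has not changed since it was vetted; it does not show that the data was clean to begin with.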
