Almost anyone can poison a machine learning (ML) dataset to change its behavior and output substantially and permanently. With careful, proactive detection efforts, organizations can save weeks, months or even years of work they would otherwise use to recover from data source poisoning.
What is data poisoning and why does it matter?
Data poisoning is a type of adversarial ML attack in which someone maliciously tampers with a dataset to mislead or confuse a model, making it respond incorrectly or behave in unintended ways. Left unchecked, this risk could undermine trust in AI more broadly.
As AI adoption spreads, data poisoning becomes more common. Deliberate manipulations have increased the frequency of model hallucinations, inappropriate responses, and misclassifications. Public trust is already down – only 34% strongly believe they can trust technology companies with AI governance.
Examples of machine learning dataset poisoning
Although there are several types of poisoning, they share the goal of influencing an ML model’s output, and each typically involves feeding the model false or misleading information to change its behavior. For example, an attacker could insert images of speed limit signs into a dataset of stop signs to trick a self-driving car into misclassifying road signs.
Even attackers who cannot access the training data can still tamper with a model by exploiting its ability to adapt its behavior. They can submit thousands of targeted messages at once to skew its classification process. Google experienced this a few years ago when attackers launched millions of emails simultaneously to trick its email filter into misclassifying spam as legitimate correspondence.
In another real-world case, user input permanently changed an ML algorithm. Microsoft launched its chatbot “Tay” on Twitter in 2016, designed to mimic the conversational style of a teenage girl. Within just 16 hours, it had posted more than 95,000 tweets, most of which were hateful, discriminatory or offensive. The company quickly discovered that people were mass-submitting inappropriate input to alter the model’s output.
Common dataset poisoning techniques
Poisoning techniques fall into three general categories. The first is dataset tampering, where someone maliciously modifies the training material to affect the model’s performance. An injection attack – where an attacker inserts false, offensive or misleading data – is a common example.
Label flipping is another example of tampering. In this attack, the attacker swaps the labels on training examples to confuse the model, causing it to misclassify inputs and ultimately degrading its performance significantly.
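To make the effect concrete, here is a minimal, self-contained sketch (assuming Python with NumPy and scikit-learn available) that flips a fraction of labels in a toy dataset and measures the hit to accuracy. The dataset, model choice and poison fractions are illustrative assumptions, not taken from any real incident.

```python
# Toy illustration of label flipping: corrupting a fraction of training
# labels degrades a simple classifier's accuracy on clean test data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(labels, poison_fraction, rng):
    """Return a copy of `labels` with a random subset flipped (0 <-> 1)."""
    poisoned = labels.copy()
    n_poison = int(poison_fraction * len(labels))
    idx = rng.choice(len(labels), size=n_poison, replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned

rng = np.random.default_rng(0)
for fraction in (0.0, 0.03, 0.10):  # illustrative poison levels
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, flip_labels(y_train, fraction, rng))
    print(f"{fraction:.0%} labels flipped -> test accuracy "
          f"{model.score(X_test, y_test):.3f}")
```

On a toy setup like this, the exact drop varies by run, but larger flipped fractions generally translate into visibly worse test accuracy.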
The second type involves manipulation of the model during and after training, where attackers make incremental changes to affect the algorithm. A backdoor attack is an example: someone poisons a small subset of the dataset, and after the model is released, they present a specific trigger to cause unintended behavior.
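A minimal sketch of the idea on toy tabular data, again assuming scikit-learn: a small fraction of training rows receives an out-of-range “trigger” value in one feature plus the attacker’s target label, and the trained model then tends to emit that label whenever the trigger appears. All names and numbers here are illustrative assumptions.

```python
# Toy backdoor sketch: poison 2% of rows with a trigger value and a
# target label; the model behaves normally on clean inputs but tends to
# predict the target class whenever the trigger is present.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

TRIGGER_FEATURE, TRIGGER_VALUE, TARGET_LABEL = 0, 10.0, 1  # illustrative
poison_idx = rng.choice(len(X), size=int(0.02 * len(X)), replace=False)
X_poisoned, y_poisoned = X.copy(), y.copy()
X_poisoned[poison_idx, TRIGGER_FEATURE] = TRIGGER_VALUE
y_poisoned[poison_idx] = TARGET_LABEL

model = RandomForestClassifier(random_state=0).fit(X_poisoned, y_poisoned)

clean_inputs = X[:100]
triggered_inputs = clean_inputs.copy()
triggered_inputs[:, TRIGGER_FEATURE] = TRIGGER_VALUE
print("accuracy on clean inputs:", model.score(clean_inputs, y[:100]))
print("share pushed to target class when trigger present:",
      float(np.mean(model.predict(triggered_inputs) == TARGET_LABEL)))
```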
The third type involves manipulation of the model after deployment. One example is split-view poisoning, where someone takes control of a resource the dataset indexes and fills it with false information. Once the ML model consumes the newly modified resource, it adapts to the poisoned data.
Importance of proactive detection efforts
With respect to data poisoning, being proactive is crucial to preserving the integrity of an ML model. Unintended behavior from a chatbot can be offensive or insulting, but poisoned cybersecurity ML applications have far more serious implications.
If someone gains access to an ML dataset and poisons it, it can seriously undermine security – for example, by causing misclassification during threat detection or spam filtering. Because tampering is usually gradual, attackers typically go undiscovered for an average of 280 days. To keep them from going unnoticed, firms must be proactive.
Unfortunately, malicious tampering is incredibly straightforward. In 2022, a research team discovered that they could poison 0.01% of the largest datasets — COYO-700M or LAION-400M — for just $60.
While such a small percentage may seem insignificant, even that amount can have serious consequences. Poisoning a mere 3% of a dataset can increase an ML model’s spam detection error rate from 3% to 24%. Given that seemingly minor tampering can be catastrophic, proactive detection efforts are essential.
Methods for detecting poisoned machine learning datasets
The good news is that organizations can take several steps to secure training data, verify dataset integrity, and monitor for anomalies to reduce the potential for poisoning.
1: Data sanitization
Sanitization refers to “cleaning” the training material before it reaches the algorithm. This includes dataset filtering and validation: teams filter out anomalies and outliers, removing any data that looks suspicious, incorrect or inauthentic.
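As a sketch of what automated filtering can look like, assuming numeric features and scikit-learn available, an unsupervised outlier detector can flag rows to drop before training. The detector choice and contamination rate below are illustrative assumptions, not prescriptions.

```python
# Minimal sanitization sketch: drop rows an unsupervised outlier
# detector flags as anomalous before they reach the training pipeline.
import numpy as np
from sklearn.ensemble import IsolationForest

def sanitize(X, y, contamination=0.01, random_state=0):
    """Return only the rows the detector considers inliers."""
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    keep = detector.fit_predict(X) == 1  # 1 = inlier, -1 = outlier
    return X[keep], y[keep]

# Example: 1,000 normal rows plus 10 injected extreme rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (1000, 5)), rng.normal(12, 1, (10, 5))])
y = np.concatenate([np.zeros(1000), np.ones(10)])
X_clean, y_clean = sanitize(X, y)
print(len(X), "->", len(X_clean), "rows after filtering")
```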
2: Model monitoring
After deployment, a company can monitor its ML model in real time to make sure it doesn’t suddenly exhibit unintended behavior. If the team notices a sharp increase in suspicious responses or errors, it can look for the source of the poisoning.
Anomaly detection plays an important role here, as it helps identify poisoning incidents. One way to implement it is to maintain a trusted reference model alongside the production model and compare their outputs.
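One simple way to sketch that comparison, assuming scikit-learn-style models with a predict method: measure how often the live model disagrees with a frozen reference copy on recent traffic and alert past a threshold. The 5% threshold and the stand-in “drifted” model below are illustrative assumptions.

```python
# Monitoring sketch: alert when the live model's predictions diverge
# sharply from a frozen reference model on the same recent inputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def disagreement_rate(reference_model, live_model, recent_inputs):
    """Fraction of recent inputs on which the two models disagree."""
    ref = reference_model.predict(recent_inputs)
    live = live_model.predict(recent_inputs)
    return float(np.mean(ref != live))

# Illustrative usage: the "live" model here is deliberately trained on
# flipped labels as a stand-in for a model retrained on tampered data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
reference = LogisticRegression(max_iter=1000).fit(X, y)
live = LogisticRegression(max_iter=1000).fit(X, 1 - y)
rate = disagreement_rate(reference, live, X)
if rate > 0.05:  # threshold is an assumption to tune per application
    print(f"ALERT: models disagree on {rate:.1%} of recent traffic")
```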
3: Source security
Securing ML datasets is more important than ever, so businesses should pull training data only from trusted sources. Additionally, they must verify its authenticity and integrity before training their model. This detection method also applies to updates, as attackers can easily poison previously indexed sites.
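A basic integrity check might look like the following sketch: hash the downloaded dataset file and compare it against a digest published by the trusted source before training. The file name and expected digest here are placeholders, not real values.

```python
# Integrity-check sketch: refuse to train on a dataset file whose
# SHA-256 digest does not match the one published by its source.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED = "replace-with-the-publisher's-published-sha256-digest"  # placeholder
if sha256_of("training_data.csv") != EXPECTED:  # placeholder file name
    raise RuntimeError("Dataset hash mismatch: refusing to train on it.")
```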
4: Updates
Routinely sanitizing and updating the ML dataset mitigates split-view poisoning and backdoor attacks. Ensuring that the information the model trains on is accurate, relevant and consistent is an ongoing process.
5: Validation of user input
Organizations must filter and validate all input to prevent users from changing the model’s behavior with targeted, widespread, harmful contributions. This detection method reduces the damage from injection, split-view poisoning and backdoor attacks.
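As a sketch of what such filtering could look like before user submissions are queued for retraining: a length limit, a pattern blocklist and a crude per-user rate cap. These policy choices are illustrative assumptions, not a complete defense.

```python
# Input-validation sketch: basic sanity checks applied to user
# submissions before they can influence future training data.
import re
from collections import defaultdict

MAX_LENGTH = 2000  # illustrative cap on submission size
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE)
                    for p in (r"<script\b", r"ignore previous instructions")]
submission_counts = defaultdict(int)  # naive per-user rate tracking

def accept_for_training(user_id, text, per_user_cap=100):
    """Return True only if the submission passes basic sanity checks."""
    if not text or len(text) > MAX_LENGTH:
        return False
    if any(p.search(text) for p in BLOCKED_PATTERNS):
        return False
    submission_counts[user_id] += 1
    if submission_counts[user_id] > per_user_cap:  # blunt flood guard
        return False
    return True
```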
Organizations can prevent dataset poisoning
Although ML dataset poisoning can be difficult to detect, a proactive, coordinated effort can significantly reduce the likelihood that manipulations will affect model performance. In this way, enterprises can improve their security and protect the integrity of their algorithms.
Zac Amos is a features editor at ReHack, where he covers cybersecurity, AI and automation.