This new data poisoning tool lets artists fight back against generative AI

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways. 

The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth. MIT Technology Review got an exclusive preview of the research, which has been submitted for peer review at computer security conference Usenix.

AI companies such as OpenAI, Meta, Google, and Stability AI are facing a slew of lawsuits from artists who claim that their copyrighted material and personal information was scraped without consent or compensation. Ben Zhao, a professor at the University of Chicago, who led the team that created Nightshade, says the hope is that it will help tip the power balance back from AI companies towards artists, by creating a powerful deterrent against disrespecting artists’ copyright and intellectual property. Meta, Google, Stability AI, and OpenAI did not respond to MIT Technology Review’s request for comment on how they might respond. 

Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows.

The team intends to integrate Nightshade into Glaze, and artists can choose whether they want to use the data-poisoning tool or not. The team is also making Nightshade open source, which would allow others to tinker with it and make their own versions. The more people use it and make their own versions of it, the more powerful the tool becomes, Zhao says. The data sets for large AI models can consist of billions of images, so the more poisoned images can be scraped into the model, the more damage the technique will cause.

targeted attack

Nightshade exploits a security vulnerability in generative AI models, one arising from the fact that they are trained on vast amounts of data—in this case, images that have been hoovered from the internet. Nightshade messes with those images.

Artists who want to upload their work online but don’t want their images to be scraped by AI companies can upload them to Glaze and choose to mask it with an art style different from theirs. They can then also opt to use Nightshade. Once AI developers scrape the internet to get more data to tweak an existing AI model or build a new one, these poisoned samples make their way into the model’s data set and cause it to malfunction.

Poisoned data samples can manipulate models into learning, for example, that images of hats are cakes, and images of handbags are toasters. The poisoned data is very difficult to remove, as it requires tech companies to painstakingly find and delete each corrupted sample.

The researchers tested the attack on Stable Diffusion’s latest models and on an AI model they trained themselves from scratch. When they fed Stable Diffusion just 50 poisoned images of dogs and then prompted it to create images of dogs itself, the output started looking weird—creatures with too many limbs and cartoonish faces. With 300 poisoned samples, an attacker can manipulate Stable Diffusion to generate images of dogs to look like cats.

A table showing a grid of thumbnails of generated images of Hemlock attack-poisoned concepts from SD-XL models contrasted with images from the clean SD-XL model in increments of 50, 100, and 300 poisoned samples.

Generative AI models are excellent at making connections between words, which helps the poison spread. Nightshade infects not only the word “dog” but all similar concepts, such as “puppy,” “husky,” and “wolf.” The poison attack also works on tangentially related images.

Read More


By: Melissa Heikkilä
Title: This new data poisoning tool lets artists fight back against generative AI
Sourced From:
Published Date: Mon, 23 Oct 2023 17:26:40 +0000


Exit mobile version