What the CSAM incident in Wisconsin tells us about prompt blocking

In May 2024, a Wisconsin man was arrested on criminal charges of producing, distributing, and possessing AI-generated images of minors engaging in sexually explicit conduct and transmitting similar sexually explicit AI-generated images to minors.

The incident – in which a Wisconsin man used generative AI to create child sexual abuse material (CSAM) and distribute it to minors – was surprising not only because of the nature of the crime, but also because AI was used to produce the offending material at all: something that, if AI companies are to be believed, should have been prevented by the safety measures surrounding these tools.

Charges against the accused

The man was charged with producing, distributing, and possessing obscene visual depictions of minors engaged in sexually explicit conduct and with distributing obscene material to minors under the age of 16. The images were created using a text-to-image generative artificial intelligence (GenAI) model called Stable Diffusion, developed by Stability AI. According to the ruling, law enforcement caught the defendant bragging to a 15-year-old about creating the images with a GenAI model and sending them to the teen in direct messages on Instagram.

Manipulating prompts to generate CSAM

The ruling describes how the defendant entered text prompts into Stable Diffusion to generate images based on those parameters. He also used “specific ‘negative’ prompts” – prompts that tell the GenAI model what not to include in the generated content – to avoid producing images depicting adults. In addition, a review of the defendant’s electronic devices showed that he used a graphical user interface and special add-ons created by other Stable Diffusion users who specialized in producing CSAM. The defendant combined these tools to generate photorealistic CSAM.

“Additional evidence from the laptop suggests that he used extremely specific and explicit commands to create these images,” the ruling states.
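For context, negative prompts are a standard, publicly documented feature of open-source image generators rather than anything hidden. A minimal and entirely benign sketch – assuming Hugging Face’s diffusers library, with an illustrative model ID and settings – shows how a prompt and a negative prompt are passed together:

```python
# Benign illustration of negative prompts in Stable Diffusion via the open-source
# diffusers library; the model ID and settings here are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a mountain lake at sunrise",
    # The negative prompt lists concepts the model should steer away from.
    negative_prompt="blurry, low quality, text, watermark",
    num_inference_steps=30,
).images[0]

image.save("lake.png")
```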

What does this mean in terms of CSAM prevention?

To some extent, the incident raises questions about the effectiveness of prompt blocking. In April, OpenAI described prompt blocking as “additional content protection” for non-account experiences, without naming the specific prompts it blocks. Earlier, in March, Microsoft blocked the prompts “Pro Choice,” “Pro Choce” (sic), and “Pro Life” on its artificial intelligence tool Copilot after an AI engineer raised concerns about the images the tool was generating. However, the Wisconsin man’s actions make it abundantly clear that prompt blocking does little to hinder determined wrongdoers; the defendant even asked minors whether they would like personalized images of the offending material.

Even the US court stated: “Although AI companies have promised to make it harder for perpetrators to create images of sexually abused minors using future versions of GenAI tools, such steps will do little to prevent sophisticated perpetrators like the defendant from running earlier versions of these tools locally on their computers without being noticed.”

Gautham Koorma, a machine learning engineer and researcher at UC Berkeley, argued that despite such incidents, prompt blocking remains an important safety measure.

“Prompt blocking is a valuable measure to prevent abuse of AI models. However, it is not foolproof, as users can bypass these measures with techniques such as jailbreaks. Large technology companies such as OpenAI and Microsoft implement and maintain these filtering mechanisms with dedicated teams to ensure their effectiveness. Unfortunately, open-source models such as Stable Diffusion have weak safety mechanisms that can be easily disabled. Sites such as CivitAI that host fine-tuned Stable Diffusion models for pornography and CSAM further exacerbate this problem,” he said.
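As a rough illustration of the filtering Koorma describes, a hosted service can screen prompts against a blocklist before they ever reach the model. The sketch below is a simplified assumption (the patterns and function names are hypothetical); production systems layer keyword rules with machine-learning safety classifiers, and a locally run open-source model can simply skip this step altogether:

```python
import re

# Hypothetical blocklist; production filters combine keyword rules with
# ML-based safety classifiers maintained by dedicated teams.
BLOCKED_PATTERNS = [
    # The three terms reportedly blocked on Copilot, including the misspelling.
    re.compile(r"\bpro[\s-]*(choice|choce|life)\b", re.IGNORECASE),
]

def is_blocked(prompt: str) -> bool:
    """Return True if the prompt matches any blocked pattern."""
    return any(p.search(prompt) for p in BLOCKED_PATTERNS)

def handle_prompt(prompt: str) -> str:
    if is_blocked(prompt):
        return "This prompt has been blocked under our content policy."
    # Placeholder for the actual model call in a real service.
    return f"[generated output for: {prompt}]"

print(handle_prompt("a pro-choice poster"))   # blocked
print(handle_prompt("a poster of a sunset"))  # passes through
```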

When asked whether there might be alternatives to prompt blocking, Koorma pointed to the Stanford Internet Observatory’s report “Generative ML and CSAM: Implications and Mitigations,” which recommended monitoring forums where computer-generated CSAM (CG-CSAM) is produced and adding perceptual hashes of the material to separate hash sets.

“This could allow platforms to detect and remove CG-CSAM content uploaded in the future; platforms themselves could also contribute to these hash sets, as is currently the case with other hash-sharing systems… This material can also be analyzed for trends in models and parameters and potentially used to train detection models,” the report said. It also suggested expanding industry classification and categorization systems to include additional criteria for determining the severity of CG-CSAM. For example, the system could check whether content:

• Is computer generated

• Is indistinguishable from photographic representations

• Depicts explicit sexual activities

• Is based on an existing person or a known victim
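To make the report’s hash-set recommendation concrete: a perceptual hash condenses an image into a short fingerprint that survives resizing and re-encoding, so near-duplicates can be matched without storing the images themselves. A minimal sketch, assuming the open-source imagehash library and placeholder hash values (real deployments rely on industry systems such as PhotoDNA or PDQ):

```python
from PIL import Image
import imagehash

# Placeholder entries; a real shared set would come from industry hash-sharing
# programmes, not from values like this one.
KNOWN_HASHES = {imagehash.hex_to_hash("8f0f0f1f3f3f7f7f")}

MAX_HAMMING_DISTANCE = 6  # how many differing bits still count as a "near match"

def matches_known_set(path: str) -> bool:
    """Return True if the image's perceptual hash is close to any known hash."""
    candidate = imagehash.phash(Image.open(path))
    # Subtracting two ImageHash objects yields their Hamming distance.
    return any(candidate - known <= MAX_HAMMING_DISTANCE for known in KNOWN_HASHES)
```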

The Wisconsin incident also raises concerns about end-to-end encryption

The court noted that the defendant was caught only because Instagram detected and reported the objectionable material he sent to a minor via direct message on a single day in October 2023. It also noted that, since October 2023, Meta has begun rolling out end-to-end encryption by default on Facebook and Messenger and has indicated that Instagram’s direct messaging feature will follow soon after.

The court observed: “If default end-to-end encryption were enabled on Instagram, it is very likely that no one other than the recipient of that message, not even Meta, would be able to detect what the defendant was sending via direct message. The same is true for Telegram, whose use the defendant discussed with others, and a number of other encrypted messaging applications.”

The incident reopens the debate around end-to-end encryption and whether platforms should provide backdoors for law enforcement. A version of this debate is currently playing out in Indian courts, where the Indian government is asking WhatsApp to “enable identification of the first originator of the information (message),” which would essentially require breaking end-to-end encryption (E2EE). WhatsApp and similar platforms continue to argue in favor of E2EE, saying that removing such a safeguard would severely compromise user privacy.

In 2022, Business for Social Responsibility (BSR) stated in its report that message content can be scanned in an E2EE environment using one of several emerging hash-based approaches known as “client-side scanning.” However, even with these techniques, there remains a risk of misuse by governments. In addition, hash-based approaches cannot effectively moderate nuanced content, meaning they can only be used against clearly violative material such as CSAM and not against dis/misinformation.
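Conceptually, client-side scanning moves that kind of hash check onto the sender’s device, before encryption happens. A deliberately simplified sketch – using a plain cryptographic hash for brevity, whereas the proposals BSR discusses rely on perceptual hashes and include reporting flows – might look like this:

```python
import hashlib
from typing import Callable

# Hypothetical on-device list of known-bad hashes distributed by the platform.
LOCAL_HASH_LIST = {
    "placeholder-hash-value",  # illustrative only, not a real hash
}

def client_side_check(attachment: bytes) -> bool:
    """Runs on the sender's device before encryption; plaintext never leaves it."""
    return hashlib.sha256(attachment).hexdigest() in LOCAL_HASH_LIST

def send_attachment(attachment: bytes, encrypt_and_send: Callable[[bytes], None]) -> None:
    if client_side_check(attachment):
        # A deployed system would report the match rather than merely refuse to send.
        raise ValueError("Attachment matches an entry on the known-abuse hash list")
    encrypt_and_send(attachment)
```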

When asked for his opinion on the issue, Koorma said, “End-to-end encryption (E2EE) provides basic privacy but makes it harder to detect illegal content like CSAM. A balanced approach involves developing technologies that can detect and report CSAM while respecting user privacy. Collaboration between technology companies, policymakers, and law enforcement is critical to developing protocols that protect privacy without compromising security. Proposals like the EARN IT Act, which holds technology companies accountable for CSAM by making their immunity contingent on compliance with best practices, and the STOP CSAM Act, which increases transparency and accountability, aim to address these issues but raise concerns about potentially weakening encryption and security.”
