
Generative AI is a new attack vector that puts businesses at risk, says CrowdStrike CTO


skynesher/Getty Images

Cybersecurity researchers have been warning for some time that generative artificial intelligence (GenAI) programs are vulnerable to a variety of attacks, from specially crafted prompts that can break guardrails to data leaks that can reveal sensitive information.

As research progresses, more and more experts are realizing the magnitude of the risk GenAI poses, especially to corporate users with extremely sensitive and valuable data.

Also: Generative AI can easily be made malicious despite guard rails, scientists say

“This is a new attack vector that opens up a new attack surface,” said Elia Zaitsev, chief technology officer of cybersecurity provider CrowdStrike, in an interview with ZDNET.

“With generative AI, I see a lot of people just jumping into using this technology, bypassing the normal controls and methods” of secure computing, Zaitsev said.

“In many ways, you can think of generative AI technology as a new operating system or a new programming language,” Zaitsev said. “Many people don’t know what the pros and cons are and how to use and secure it properly.”

The most notorious example of AI raising security concerns today is Microsoft’s Recall feature, which was originally intended to be integrated into all new Copilot+ PCs.

Security researchers have shown that attackers who gain access to a PC with Recall enabled can view the entire history of a person's interactions with that machine, much as if a keylogger or other spyware had been deliberately installed on the computer.

“They released a consumer feature that is basically built-in spyware, copying everything you do to an unencrypted local file,” Zaitsev explained. “This is a goldmine for attackers who can then attack, compromise, and steal all kinds of information.”

Also: US car dealerships suffer from massive cyberattack: 3 things customers should know

After a backlash, Microsoft said it would disable the feature by default on PCs and instead make it an opt-in feature. Security researchers said there were still risks to the feature. The company subsequently said it would not make Recall available as a preview feature on Copilot+ PCs, and now says Recall is “coming soon via a post-launch Windows Update.”

But the danger goes beyond a poorly designed application. The same problem of centralizing a lot of valuable information exists in all large language model (LLM) technologies, says Zaitsev.


“I see a lot of people jumping on this technology and bypassing the normal controls and methods of secure computing,” says Elia Zaitsev of CrowdStrike.

CrowdStrike

“I call them naked LLMs,” he said, referring to large language models. “If I train a large language model on a lot of sensitive information and then expose that model directly to an end user, prompt injection attacks can basically squeeze out all of the training information, including the sensitive information.”

Enterprise technology executives have expressed similar concerns. In an interview with technology newsletter The Technology Letter this month, Charlie Giancarlo, CEO of data storage provider Pure Storage, noted that LLMs are “not ready for enterprise infrastructure yet.”

Giancarlo pointed to the lack of “role-based access controls” in LLMs: anyone who can reach an LLM’s prompt can extract sensitive data that was absorbed during the model’s training.

Also: According to CrowdStrike, cybercriminals are using Meta’s Llama 2 AI

“There are currently no good controls,” said Giancarlo.

“If I asked an AI bot to write my earnings script, the problem would be that I would be providing data that only I should have,” the CEO explained, “but once you teach the bot that, it can’t forget it, and so someone else could ask – before the release – ‘What will Pure’s earnings be?’ and it would tell them.” Releasing a company’s earnings information before the scheduled release can lead to insider trading and other securities violations.

GenAI programs, Zaitsev said, are “part of a broader category that could be called malware-free intruders,” which do not require malicious software to be written and planted on a target computer system.

Cybersecurity experts call such malware-free attacks “living off the land,” Zaitsev said, because they exploit what is already inherent in a software program. “You don’t bring anything external in, you just use what’s built into the operating system.”

A common example of living off the land is SQL injection, where the structured query language used to query a database can be manipulated with crafted strings to force the database to take actions it would normally block.
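To make the analogy concrete, here is a minimal, self-contained sketch of the pattern Zaitsev is describing; the table, column names, and attacker input are hypothetical, and the point is only to show how a crafted string turns a narrow query into a dump of everything, much as a crafted prompt can do to an LLM.

```python
# A minimal sketch of the SQL injection pattern the article compares LLM attacks to.
# The table, columns, and attacker input are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', '123-45-6789')")

user_input = "nobody' OR '1'='1"  # crafted string supplied by an attacker

# Vulnerable: the input is concatenated into the query, so the OR clause
# turns a lookup for one user into a dump of every row.
leaked = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()
print(leaked)  # -> [('alice', '123-45-6789')]

# Safer: a parameterized query treats the input as data, not as SQL,
# so the crafted string matches nothing.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe)  # -> []
```

The fix in the SQL world is to keep untrusted input strictly separated from the query logic, which is the same principle Zaitsev applies to prompts below.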

Likewise, LLMs are themselves databases, since the main function of a model is “just a highly efficient compression of data,” effectively creating a new data store. “This is very analogous to SQL injection,” Zaitsev said. “This is a fundamental negative property of these technologies.”

However, GenAI technology is not something to be abandoned. It has value when used carefully. “I’ve seen some pretty spectacular successes with [GenAI] technology firsthand,” Zaitsev said. “And we’re already using it with great success in customer engagement with Charlotte AI,” CrowdStrike’s assistant program that can help automate some security functions.

Also: Enterprise cloud security flaws are ‘worrying’ – as AI threats grow

Techniques to mitigate risk include validating user input before passing it to an LLM and then validating the response before sending it back to the user.

“They don’t allow users to submit unreviewed prompts directly to the LLM,” Zaitsev said.
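A rough sketch of what that screening on the way in and the way out can look like is below. The call_llm() function, the injection markers, and the redaction pattern are all placeholder assumptions, not any vendor's actual filter; a production system would use far more thorough checks.

```python
# A minimal sketch of the "validate on the way in, validate on the way out"
# pattern described above. call_llm() is a placeholder for whatever model API
# is in use; the marker list and redaction pattern are illustrative only.
import re

INJECTION_MARKERS = [
    r"ignore (all|any) previous instructions",
    r"reveal (your|the) system prompt",
]
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # example sensitive-data check

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for the real model call")

def guarded_query(user_prompt: str) -> str:
    # Inbound check: refuse prompts that look like injection attempts.
    for marker in INJECTION_MARKERS:
        if re.search(marker, user_prompt, re.IGNORECASE):
            return "Request refused by input policy."

    response = call_llm(user_prompt)

    # Outbound check: redact anything that looks like sensitive data
    # before the response ever reaches the user.
    return SSN_PATTERN.sub("[REDACTED]", response)
```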

For example, a “naked” LLM can be given direct search access to a database through “RAG,” or retrieval-augmented generation, an increasingly common method that matches the user’s prompt against the contents of a database. That widens the exposure: the model can reveal not only the sensitive information compressed into the LLM itself, but also the entire store of sensitive information in those external sources.


RAG is a general method for granting an LLM access to a database.

Baidu
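The sketch below illustrates the naive version of the RAG pattern in the figure above, under simplified assumptions: a keyword-overlap retriever stands in for a real embedding search, and DOCUMENTS and call_llm() are hypothetical placeholders. Whatever the retriever pulls back, sensitive or not, lands directly in the prompt of the naked LLM.

```python
# A simplified sketch of naive RAG: the user's prompt is matched against a
# document store and the hits are pasted straight into the model's context.
# Keyword overlap stands in for a real embedding search; the documents and
# call_llm() are illustrative placeholders.
DOCUMENTS = [
    "Q3 revenue guidance (internal, unreleased): ...",
    "Public press release: product launch dates ...",
]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for the real model call")

def retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    scored = sorted(DOCUMENTS, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def naive_rag(user_prompt: str) -> str:
    # Whatever the retriever returns, sensitive or not, goes directly into the
    # prompt of a "naked" LLM -- exactly the exposure Zaitsev warns about.
    context = "\n".join(retrieve(user_prompt))
    return call_llm(f"Context:\n{context}\n\nQuestion: {user_prompt}")
```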

The key, Zaitsev said, is to prevent a naked LLM from directly accessing data stores. In a sense, RAG needs to be tamed before it makes the problem worse.

“We take advantage of the property of LLMs where the user can ask an open-ended question, and then we use that to decide what they’re trying to do, and then we use more traditional programming technologies” to answer the query.

“Charlotte AI, for example, in many cases allows the user to ask a general question. But then Charlotte determines which part of the platform, which dataset, is the source of truth from which the question can then be answered,” via an API call rather than allowing the LLM to query the database directly.
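A minimal sketch of that routing idea might look like the following. This is an assumed illustration of the general pattern, not CrowdStrike's actual Charlotte AI code; classify_intent() and the two API functions are hypothetical placeholders. The LLM only maps the open-ended question onto a fixed set of intents, and an access-controlled API call, not the model, touches the data.

```python
# Sketch of the routing pattern described above (not CrowdStrike's implementation):
# the LLM classifies what the user is asking for, and a conventional,
# access-controlled API call produces the answer.
def classify_intent(user_prompt: str) -> str:
    """Ask the LLM to map an open-ended question onto a fixed set of intents."""
    raise NotImplementedError("stand-in for a constrained LLM classification call")

def fetch_detection_summary(user_id: str) -> str:
    """Traditional API call with role-based access control -- no LLM involved."""
    raise NotImplementedError("stand-in for a platform API")

def fetch_host_inventory(user_id: str) -> str:
    raise NotImplementedError("stand-in for a platform API")

ROUTES = {
    "detection_summary": fetch_detection_summary,
    "host_inventory": fetch_host_inventory,
}

def answer(user_prompt: str, user_id: str) -> str:
    intent = classify_intent(user_prompt)
    handler = ROUTES.get(intent)
    if handler is None:
        return "Sorry, that question is outside the supported scope."
    # The model never touches the data store directly; the API decides
    # what this particular user is allowed to see.
    return handler(user_id)
```

The advantage of this shape is that a hostile prompt can, at worst, steer the router to the wrong intent; it cannot talk the model into dumping a data store the model never queries directly.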

Also: AI is changing cybersecurity and companies must be aware of the threat

“We have already invested in building this robust platform with APIs and search capabilities so that we don’t have to rely too heavily on the LLM and are now minimizing the risks,” Zaitsev said.

“The important thing is that you keep these interactions under wraps; they are not completely open.”

In addition to misuse, the fact that GenAI can leak training data is also a very big problem for which appropriate control mechanisms must be found, said Zaitsev.

“Are you going to enter your social security number into a prompt that you then send to a third party who you have no idea is now incorporating your social security number into a new LLM that someone could then leak through an injection attack?”

“Privacy, personally identifiable information, knowing where your data is stored and how it is secured – these are all things that people should be thinking about as they develop GenAI technology and use other vendors that use that technology.”