Recent findings by Anthropic, an AI safety start-up, have highlighted the risks associated with large language models (LLMs), prompting calls for a swift review of AI safety standards. Valentin Rusu, lead machine learning engineer at Heimdal Security and holder of a Ph.D. in AI, insists these findings demand immediate attention. “It undermines the foundation of trust the AI industry is built on and raises questions about the responsibility of AI developers,” said Rusu.
The Anthropic team found that LLMs could become “sleeper agents,” evading safety measures designed to prevent negative behaviours. According to the company, AI systems that learn to deceive people, much as humans do, pose a problem for current safety training methods.
“Our results suggest that, once a model exhibits deceptive behaviour, standard techniques could fail to remove such deception and create a false impression of safety,” the authors noted, emphasising the need for a revised approach to AI safety training.
Rusu argues for smarter, forward-thinking safety protocols that anticipate and neutralise emerging threats within AI technologies. “The AI community must push for more sophisticated and nuanced safety mechanisms that are not just reactive but predictive,” he said. “Current methodologies, while impressive, are not foolproof. There is a pressing need to forge a more dynamic and intelligent approach to safety.”
Responsibility for ensuring AI safety is widely distributed, with no single governing body.
While organisations like the National Institute of Standards and Technology in the U.S., the UK’s National Cyber Security Centre, and the Cybersecurity and Infrastructure Security Agency are instrumental in setting safety guidelines, the primary responsibility falls to the creators and developers of AI systems. They hold the expertise and capacity to embed safety from the outset.
In response to growing safety concerns, collaborative efforts are being made across the board. From the OWASP Foundation’s work on identifying AI vulnerabilities to the establishment of the ‘AI Safety Institute Consortium’, whose more than 200 members include tech giants and research bodies, there is a concerted push towards creating a safer AI ecosystem.
Ross Lazerowitz from Mirage Security comments on the precarious state of AI security, likening it to the “wild west” and underscoring the importance of choosing trustworthy AI models and data sources.
Rusu echoes this sentiment. “We need to pivot so AI serves, rather than betrays, human progress,” he said. He also notes the unique challenges AI presents to cybersecurity efforts. Ensuring AI systems, particularly neural networks, are robust and reliable remains paramount.
The concerns raised by the recent study on LLMs underline the urgent need for a comprehensive strategy toward AI safety, and for industry leaders and policymakers to step up their efforts in protecting the future of AI development.