The future of hunting down security flaws could be multiple LLMs working together


The future of penetration testing and vulnerability hunting will most likely lie not with a single AI, but with multiple AIs working together, security researchers have said.

Researchers from the University of Illinois Urbana-Champaign (UIUC) found that a group of Large Language Models (LLMs) outperformed a single AI agent, and significantly outperformed traditional tools such as ZAP and Metasploit.

"Although single AI agents are incredibly powerful, they are limited by existing LLM capabilities. For example, if an AI agent goes down one path (e.g., attempting to exploit an XSS), it is difficult for the agent to backtrack and attempt to exploit another vulnerability," noted researcher Daniel Kang. "Furthermore, LLMs perform best when focusing on a single task."

Effective system

A single AI agent's biggest strength is also its main shortcoming: it performs best when focused on one task, but once it goes down one route, it struggles to backtrack and take a different one.

Hence, the group designed a system called Hierarchical Planning and Task-Specific Agents (HPTSA), which consists of a planner, a manager, and multiple agents. The planner surveys the app (or website) to determine which exploits are worth exploring and passes its findings to the manager, which then delegates each avenue to a dedicated agent LLM.
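The division of labor described above can be sketched in code. The following is a minimal, hypothetical illustration of the HPTSA pattern, not the researchers' actual implementation: all class and method names are invented for this example, and the LLM calls are replaced with placeholders.

```python
# Hypothetical sketch of the HPTSA pattern: a planner surveys the target,
# a manager delegates candidate exploits to task-specific agents.
# Names and structure are illustrative only, not taken from the paper.

from dataclasses import dataclass, field

@dataclass
class ExploitAgent:
    """A task-specific agent focused on one vulnerability class."""
    vuln_class: str  # e.g. "XSS", "SQLi", "CSRF"

    def attempt(self, target: str) -> bool:
        # A real agent would drive an LLM with a narrowly focused prompt;
        # here we simply report failure as a stand-in.
        return False

@dataclass
class Manager:
    """Routes each candidate exploit to the matching task-specific agent."""
    agents: dict = field(default_factory=dict)

    def delegate(self, target: str, vuln_class: str) -> bool:
        agent = self.agents.setdefault(vuln_class, ExploitAgent(vuln_class))
        return agent.attempt(target)

@dataclass
class Planner:
    """Surveys the target and decides which exploit classes to explore."""

    def survey(self, target: str) -> list:
        # A real planner would crawl the app; we return fixed candidates.
        return ["XSS", "SQLi", "CSRF"]

def run_hptsa(target: str) -> list:
    planner, manager = Planner(), Manager()
    # Each candidate gets its own agent, so a dead end on one path
    # does not trap the whole system on that route.
    return [(v, manager.delegate(target, v)) for v in planner.survey(target)]
```

The key design point, per the researchers' reasoning, is that no single agent has to both plan and exploit: each agent keeps the narrow, single-task focus LLMs handle best, while the planner and manager handle exploration across vulnerability classes.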

While the system might sound complex, in practice it proved quite effective. Of the 15 vulnerabilities tested in the experiment, HPTSA exploited eight. A single GPT-4 agent exploited just three, making HPTSA more than twice as effective. ZAP and Metasploit, by comparison, couldn't exploit a single one.

There was one instance in which a single GPT-4 agent performed better than HPTSA: when given a description of the vulnerability in the prompt, it managed to exploit 11 of the 15. However, that approach requires the researcher to carefully craft the prompt, something many people would struggle to replicate.

The prompts used in the experiment will not be shared publicly, the researchers said, and will only be provided to other researchers upon request.

Via Tom's Hardware

Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.
