Atlantis CRS

Hacking Redefined: How LLM Agents Took on University Hacking Competition

Hacking Redefined: How LLM Agents Took on University Hacking Competition

For the first time, we deployed our hybrid system, powered by LLM agents—Atlantis—to compete in Georgia Tech’s flagship CTF event, TKCTF 2024. During the competition, Atlantis concentrated on two pivotal areas: vulnerability analysis and automatic vulnerability remediation. Remarkably, the system uncovered 10 vulnerabilities and produced 7 robust patches1, showcasing the practicality and promise of our approach in a real-world hacking competition. In this blog, I’ll delve into some fascinating insights and essential lessons from the CTF experience. As we prepare to open-source the full details of our system following AIxCC competition rules, this milestone reflects more than just a technical achievement—it embodies our commitment to advancing LLM-driven security research.

Autonomously Uncovering and Fixing a Hidden Vulnerability in SQLite3 with an LLM-Based System

Autonomously Uncovering and Fixing a Hidden Vulnerability in SQLite3 with an LLM-Based System

Without knowing beforehand that the challenge project involved SQLite3, our team, Team Atlanta, entered our Cyber Reasoning System (CRS), named Atlantis, into the AI Cyber Challenge organized by ARPA-H, DARPA, and the White House. Remarkably, Atlantis secured six first-bloods and autonomously identified and patched a real bug in SQLite31, earning us a $2 million prize and a place in the grand finals of AIxCC. For more details, check out our team’s announcement blog.