As frontier AI models get more powerful, the stakes for secure deployment get higher. Pattern Labs was brought in to put GPT-5 to the test. Our work is referenced in the GPT-5 Model Card, and we’re excited to share more on what we found.
OpenAI asked us to evaluate how GPT-5 handles offensive cybersecurity tasks. Using our internal suite of real-world challenges, like evasion, network exploitation, and vulnerability discovery, we ran the model through a mix of scenarios aimed at testing a variety of cybersecurity capabilities. Our approach blended quantitative metrics with deep qualitative assessments, simulating realistic adversarial scenarios to pressure-test the model’s behavior.
GPT-5 showed meaningful improvements in several areas, particularly in its ability to plan multi-step actions and execute them with precision, landing on clean, creative solutions that worked remarkably well. But it also showed hesitation and missteps. We saw moments where the model correctly identified the right path forward… and then failed to act on it. Sometimes it didn’t believe the path would work. Other times, it chased simpler ideas that led nowhere.
There’s an interesting tension here: GPT-5 is capable of sophisticated reasoning and execution, but not reliably. And when you look closely at what it can and can’t do, the picture becomes clearer: it’s smarter than previous models, but still falls short of being a dependable offensive security tool.
As these models get more capable, the need for deeper, more realistic security testing only grows. Evaluations like this have to keep pace, with rigor and realism that mirror potential real-world offensive use cases.
GPT-5 showed progress, but also clear limits. Understanding the nuances of where the limits lie is critical when deciding how, where, and whether to deploy these systems safely.
This work is just one part of how we are thinking about the future of secure, high-stakes AI.
Deeper dive coming soon.
@misc{pl-evaluating2025,
  title={Pattern Labs x OpenAI: Evaluating GPT-5's Cybersecurity Capabilities},
  author={Pattern Labs},
  year={2025},
  howpublished={\url{https://patternlabs.co/blog/evaluating-gpt-5}}
}