At Pattern Labs, we’ve been focusing some of our efforts on evaluating the cybersecurity capabilities of frontier models. To do so, one of the first questions we tackled was how to define these capabilities in a meaningful and useful way. The following describes the taxonomy we are currently using internally, and while it is constantly evolving and a work in progress, we believe it is mature enough to be useful to others as well.
In the rest of this blog post we will first elaborate on the theoretical framework used in the taxonomy, and then showcase the taxonomy itself. Note that, for simplicity, we use the words “capability” and “skill” interchangeably in this document.
Before diving into the taxonomy, it is important to differentiate between the parts of the cybersecurity skill set. We first note that we are focused on measuring the practical cybersecurity skills and capabilities of frontier models, rather than their knowledge¹. We will start by contrasting a cybersecurity capability with a cybersecurity domain:
In other words, you apply cybersecurity capabilities in different cybersecurity domains to achieve similar goals. For example, many cybersecurity operations include elements of intelligence gathering and reconnaissance. This is an important skill: without knowing what to expect during a cyber operation, it is difficult to achieve your goals. However, there are many domains in which this capability can be applied: reconnaissance when starting to plan a cyber attack is entirely different from reconnaissance after a foothold has been established inside a target’s network. Although the goal might be similar, the practical applications differ significantly.
Furthermore, capabilities can be broken down into sub-capabilities, and usually into many further levels below that. For example, a sub-capability of intelligence gathering and reconnaissance is gathering publicly available information about a specific target. An instance of this sub-capability is OSINT gathering: the ability to collect and assimilate information from open sources, such as the internet. This is clearly part of the intelligence gathering process, but in some situations it might not be applicable, or might be less relevant: e.g., against hardened targets about which little information is available, or after the initial reconnaissance stage of an operation. These instances can sometimes be broken down even further: e.g., OSINT can be gathered from social networks, official websites, or other widely available sources such as Wikipedia.
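To make the nesting concrete, the example above can be written down as a small tree. This is only an illustrative sketch: the `Capability` class and its `walk` method are hypothetical names introduced here, not part of the taxonomy or of any tooling, and the node labels are simply the ones used in the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Capability:
    """One node in a capability hierarchy (hypothetical structure)."""

    name: str
    children: list[Capability] = field(default_factory=list)

    def walk(self, depth: int = 0):
        """Yield (depth, name) for this node and all descendants, depth-first."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


# Node names are taken from the example in the text above.
recon = Capability(
    "Intelligence gathering and reconnaissance",
    [
        Capability(
            "Gathering publicly available information about a target",
            [
                Capability(
                    "OSINT gathering",
                    [
                        Capability("Social networks"),
                        Capability("Official websites"),
                        Capability("Other widely available sources"),
                    ],
                )
            ],
        )
    ],
)

# Print the hierarchy, indenting each level by two spaces.
for depth, name in recon.walk():
    print("  " * depth + name)
```

A tree is used here only because each level in the example has a single parent; a real taxonomy may need a richer structure.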
On the other hand, cybersecurity domains can be illustrated by how vulnerability discovery is applied in different settings. For instance, vulnerability discovery and exploitation differ significantly when attempting to exploit a web application in a black-box scenario versus a workstation with known software as part of a network operation.
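One way to picture the relationship between the two notions is as a set of (capability, domain) pairs, where the same capability appears several times with a different domain each time. The following is a minimal sketch under that assumption; the `Application` type, the `APPLICATIONS` mapping and the `flatten` helper are hypothetical names, and the entries are only the examples mentioned in the text, not the full taxonomy.

```python
from typing import NamedTuple


class Application(NamedTuple):
    """A capability applied in one specific domain (hypothetical type)."""

    capability: str
    domain: str


# Capability -> domains in which the text describes it being applied.
APPLICATIONS = {
    "Intelligence gathering and reconnaissance": [
        "Planning a cyber attack",
        "After a foothold is established inside the target's network",
    ],
    "Vulnerability discovery and exploitation": [
        "Web application in a black-box scenario",
        "Workstation with known software, as part of a network operation",
    ],
}


def flatten(mapping: dict[str, list[str]]) -> list[Application]:
    """Expand the mapping into one Application per (capability, domain) pair."""
    return [
        Application(capability, domain)
        for capability, domain_list in mapping.items()
        for domain in domain_list
    ]


pairs = flatten(APPLICATIONS)
for pair in pairs:
    print(f"{pair.capability} | {pair.domain}")
```

Each pair shares its goal with the other pairs for the same capability, but in practice it would call for a different approach.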
Finally, we would like to briefly highlight capabilities that we call cyber-strengths. These are specific skills, areas of knowledge, or access that in our eyes constitute unique and significant advantages in the cyber domain. Usually, having access to these strengths allows a threat actor either to considerably scale the number of cyber attacks it can execute, or to successfully attack hardened targets that would otherwise be protected, or both. In colloquial terms, these are game-changers in the realm of cybersecurity.
Our taxonomy aims to cover the major capabilities and most of their critical sub-capabilities, highlight some of the common existing cybersecurity domains, and list some significant cyber-strengths.
Importantly, we note that our taxonomy focuses solely on capabilities that are unique to the field of cybersecurity. This distinction is non-trivial: for example, one could argue that there is a high correlation between coding skills and cybersecurity capabilities in AI systems, and thus testing coding skills helps significantly with evaluating cybersecurity capabilities. Although this is probably true, we focus only on distinct cyber skills, for three main reasons:
Moreover, the taxonomy does not include agentic reasoning, planning and orchestration capabilities, even though multiple threat scenarios are affected by these capabilities and the performance of AI systems is commonly limited by them. Although we actively monitor these skills in our evaluations, we believe that they are not unique to the cybersecurity domain and thus should not be included in the taxonomy.
Finally, some specific cybersecurity capabilities are out of scope for the taxonomy, at least for the near future. For instance, using human agents and leveraging cyber-relevant purchases on the darknet are currently considered to be covered by other evaluation types (e.g., ARA) and are not in our focus.
In essence, this taxonomy is aimed at evaluating and judging the offensive cybersecurity capabilities of existing AI systems. Consequently, some skills and tests that would be appropriate for evaluating expert cybersecurity personnel are not included.
The following is Pattern Labs’ Cybersecurity Taxonomy. As mentioned before, this is a living document that is continuously being refined.
The taxonomy is structured as follows:
Cybersecurity Capabilities
Cybersecurity Strengths
This list is not exhaustive and is intended to give notable examples.
Cybersecurity Domains
This list is not exhaustive and is intended to give notable examples. Note that these domains are not mutually exclusive.
¹ We plan on elaborating on this difference in a future blog post.
² Other “INTs” can also be relevant, but currently we believe they are out of scope. If AI capabilities advance sufficiently, these may be included here and in other parts of the taxonomy.
³ Although a case can be made for this capability to be dismissed as not unique enough because of the significant overlap with general coding skills, from our experience it is distinct enough to be worth evaluating separately.
⁴ Cyber Network Attack & Cyber Network Exploitation
⁵ Some example goals: persistence, privilege escalation, lateral movement, collection, exfiltration and impact.
⁶ Tactics, Techniques, and Procedures
⁷ See for example https://www.welivesecurity.com/en/eset-research/moustachedbouncer-espionage-against-foreign-diplomats-in-belarus/
Cyber Capabilities Analysis © 2024 by Pattern Labs Tech Inc. is licensed under CC BY-NC-ND 4.0.
@misc{pl-cyber2024, title={Offensive Cyber Capabilities Analysis}, author={Pattern Labs}, year={2024}, howpublished={\url{https://patternlabs.co/blog/cyber-capabilities-analysis}}, }