Navigating Dual-Use: Refusal Policy for AI Systems in Cybersecurity

Introduction and Goal

Modern AI systems possess significant capabilities across various domains. In cybersecurity, these systems can perform complex tasks such as vulnerability research, log analysis, and security architecture design. Many of these capabilities are inherently dual-use: they can be employed both defensively to protect systems and offensively to cause harm.

This dual-use nature creates a significant challenge for AI system providers and policy makers. For instance, when an AI system is queried about heap overflow exploitation, it might be serving a security researcher conducting legitimate defensive research, or it could be assisting an advanced persistent threat (APT) group in executing a malicious campaign. Similar ambiguity exists in many other cybersecurity-related tasks, from network scanning techniques to questions about common misconfigurations. To be clear, this problem exists in other domains as well (e.g. biology), but it is especially prevalent in the cybersecurity context.

AI system providers must create a policy and decide which prompts or tasks to permit and which to refuse. This challenge becomes particularly acute because refusing to assist with legitimate cybersecurity tasks not only reduces the value that providers create but can also significantly impair defensive capabilities. Security researchers, penetration testers, and defensive tool developers regularly need to understand and implement techniques that could also be used offensively. At the same time, unrestricted access to these capabilities could enable malicious actors to develop more sophisticated attacks and allow adversaries without relevant cybersecurity expertise to inflict greater damage than they otherwise could.

At Pattern Labs, we have been tackling this question from multiple angles alongside different stakeholders. In this blog post, we examine the considerations and possible approaches for managing dual-use capabilities in AI systems, with a specific focus on cybersecurity applications. We first analyze the core challenges inherent to dual-use management, then explore various control systems beyond simple allow/deny decisions, and finally discuss tools that can enhance the decision-making process of different organizations. Our goal is to provide practical insights for AI providers and policy makers working to balance security concerns with legitimate use cases.

Note:

AI systems are advancing rapidly, along with the various methods of interacting with them: chatbots respond to prompts, agents handle tasks, and so on. As the concepts in this article apply to all these interfaces, and future systems may introduce more, we use "query" here to encompass all these interaction methods.

Core Challenges in Dual-Use Refusal Policies

Policy making for the refusal of dual-use capabilities presents several fundamental challenges. At its core, any implemented policy will inevitably produce both false positives and false negatives - hermetically preventing malicious usage while allowing all legitimate uses is all but impossible. This fact presents providers with a difficult trade-off: Excessively restrictive policies drive away valuable customers and hurt legitimate research, while overly permissive approaches risk enabling dangerous usage, potentially leading to general harm and reputational damage. 

This challenge is particularly critical in cybersecurity, where many techniques and tools are inherently dual-use. Network scanning tools, debugging capabilities, and code analysis frameworks are fundamental for both defensive and offensive operations. Even basic security tools like Wireshark or Metasploit are regularly used by both defenders and attackers, with only the operator's intention determining whether the outcome is beneficial or harmful.

Moreover, many harmful queries can be easily obfuscated, either by changing their explicit written intent (“In order to test for bugs in my own system…”) or by breaking down a harmful query into several steps, each of them seemingly benign (e.g. a port scanner is just an enumeration script plus code that connects to a given port).
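To make the decomposition concrete, here is a minimal sketch: each helper below looks like routine network code on its own, yet composed together they form a basic port scanner. The host address and function names are illustrative only.

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Seemingly benign on its own: check whether a single port accepts connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def enumerate_ports(host: str, ports: range) -> list[int]:
    """Also seemingly benign: iterate over a range of ports and collect the open ones."""
    return [p for p in ports if is_port_open(host, p)]

# Composed, the two benign-looking pieces are a basic port scanner:
# open_ports = enumerate_ports("192.0.2.10", range(1, 1025))
```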

These inherent tensions make setting a refusal policy difficult. Policies must acknowledge that in nearly every relevant case, they take on both false positive and false negative risks, and should explicitly weigh these against each other. As in many similar situations, a more nuanced, accurate, or sophisticated policy can reduce both types of error at once. The rest of this article suggests some of the ways in which we can improve these policies.

Beyond Binary Decisions: Nuanced Control Systems

One important principle is that when faced with a potentially harmful query, an AI system provider has more options than either allowing or denying it. In general, when moving beyond simple allow/deny policies, several approaches can help manage dual-use capabilities. These approaches aim to better differentiate between legitimate and potentially harmful usage while maintaining usability for different users and customers.

One possible approach involves implementing usage monitoring with periodic manual review points. Instead of making permanent allow/deny decisions, organizations can grant provisional access and evaluate the legitimacy of usage patterns over time. This allows for more informed decision-making based on actual behavior rather than initial assumptions. Notice that monitoring and reviewing are relevant both for queries that were (for now) allowed and for those that were refused.

Another widely adopted method involves implementing varying levels of access control through user verification systems. Organizations often require different levels of authentication - from basic identity verification to full Know Your Customer (KYC) processes. For instance, academic institutions or registered cybersecurity companies might receive access to more advanced capabilities after proper verification, while maintaining restricted access for unverified users. This is, for example, a good way to allow well-regarded defensive security companies or prominent academic researchers to access some capabilities that shouldn’t be fully public.
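As a rough illustration of tiered verification, the sketch below maps verification levels to the most advanced capability tier a user may access. All names and the specific mapping are hypothetical assumptions, not a recommended configuration.

```python
from enum import IntEnum

class VerificationLevel(IntEnum):
    UNVERIFIED = 0
    PHONE_VERIFIED = 1
    ORG_VERIFIED = 2   # e.g. a registered security company or academic institution
    FULL_KYC = 3

class CapabilityTier(IntEnum):
    BASIC = 0          # widely available knowledge
    INTERMEDIATE = 1
    ADVANCED = 2       # expert-level, higher counterfactual risk

# Hypothetical policy table: the minimum verification required for each capability tier.
REQUIRED_VERIFICATION = {
    CapabilityTier.BASIC: VerificationLevel.UNVERIFIED,
    CapabilityTier.INTERMEDIATE: VerificationLevel.PHONE_VERIFIED,
    CapabilityTier.ADVANCED: VerificationLevel.ORG_VERIFIED,
}

def has_access(user_level: VerificationLevel, tier: CapabilityTier) -> bool:
    """Return True if the user's verification level unlocks the requested capability tier."""
    return user_level >= REQUIRED_VERIFICATION[tier]
```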

All in all, there is a wide variety of responses to potentially harmful queries:

  1. Allow
  2. Allow & monitor (with multiple sub-options: monitor the queries, generally monitor the user, …)
  3. Allow given some verification (with multiple sub-options: user has phone number, …, full KYC)
  4. Refuse and monitor
  5. Refuse

These established approaches demonstrate that dual-use refusal policies need not be binary. By incorporating monitoring and user verification systems, organizations can create more nuanced frameworks for managing potentially dangerous capabilities. While these systems require more resources to implement and maintain than simple allow/deny policies, they provide a better balance between enabling legitimate use and preventing harm.
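A minimal way to encode this response ladder in a policy engine is sketched below; the names are ours, and each option could be refined further with the sub-options listed above.

```python
from enum import Enum, auto

class Response(Enum):
    ALLOW = auto()
    ALLOW_AND_MONITOR = auto()        # sub-options: monitor the queries, monitor the user generally, ...
    ALLOW_WITH_VERIFICATION = auto()  # sub-options: phone number, ..., full KYC
    REFUSE_AND_MONITOR = auto()
    REFUSE = auto()
```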

Identifying Malicious AI Queries

The previous section detailed the potential responses to a query. Now we turn to the information available for deciding which of these responses is appropriate in each case. There are a few useful parameters to consider when judging a specific query:

Query Harmfulness: Within the very broad category of dual-use queries, some queries lean clearly toward harmful use while others lean clearly toward benign use. For instance, using non-standard encryption across large portions of a file system typically suggests ransomware activity rather than a legitimate privacy measure.

Technical Complexity: Generally, basic techniques and tools (say, up to what one would learn in an undergraduate degree) are of use to many people, and the counterfactual risk from their availability is low, because this knowledge is usually widely available and relatively easy to find. In contrast, expert-level knowledge is both harder to find via other means and relevant to a smaller group of people (who, in most cases, work for specific, known organizations, which allows them to pass KYC or other verification methods).

In most cases, we strongly suggest that AI system providers take this into account to some degree, i.e. require some form of verification for dual-use techniques above a certain complexity level.

User Context: User context provides another valuable decision-making tool. The length of a user's history with the system, their pattern of interactions, and the progression of their queries can help evaluate risk. For instance, a new user immediately requesting advanced exploitation techniques might raise different concerns than a long-term user with an established pattern of defensive security research.

Temporal / General Context: The temporal context of the queries themselves can also inform decision-making. Requests about current events, particularly during critical periods when assistance from AI systems might be especially significant, might require different handling than queries about older issues. For example, questions about exploiting a newly published CVE might warrant more careful consideration than similar questions about vulnerabilities that have been public for years and are widely patched. To make this point explicit, a reasonable policy might refuse to write exploitation scripts for 1-days that have just been published but allow it for older 1-days that should already be patched in most systems.

Eventually, AI providers will need an automatic policy that takes any available information into account and makes a real-time decision for each incoming query. A more nuanced policy, or additional information, allows for better decision-making, gives more options, and reduces the potential harm from both false positives and false negatives.
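As a toy illustration of such an automatic policy, the sketch below combines the signals discussed above into a single decision, returning one of the responses from the ladder in the previous section. All thresholds and field names are hypothetical assumptions made for the sake of the example, not a recommended calibration.

```python
from dataclasses import dataclass

@dataclass
class QueryContext:
    harmfulness: float        # 0.0 (clearly benign) .. 1.0 (clearly harmful), e.g. from a classifier
    complexity: float         # 0.0 (undergraduate-level) .. 1.0 (expert-level)
    verified_user: bool       # passed KYC / organizational verification
    trusted_history: bool     # long-term user with an established benign usage pattern
    cve_age_days: int | None  # age of the referenced vulnerability, if any

def decide(ctx: QueryContext) -> str:
    """Map a query's context to a response from the ladder above (hypothetical thresholds)."""
    # Clearly harmful queries are refused outright, and the attempt is logged.
    if ctx.harmfulness > 0.9:
        return "refuse_and_monitor"
    # Freshly published 1-days get stricter treatment than long-patched ones.
    if ctx.cve_age_days is not None and ctx.cve_age_days < 30 and not ctx.verified_user:
        return "refuse_and_monitor"
    # Expert-level dual-use techniques require some form of verification.
    if ctx.complexity > 0.7 and not ctx.verified_user:
        return "allow_with_verification"
    # Borderline queries from users without an established history are allowed under monitoring.
    if ctx.harmfulness > 0.5 and not ctx.trusted_history:
        return "allow_and_monitor"
    return "allow"

# Example: a new, unverified user asking how to exploit a week-old CVE.
# decide(QueryContext(0.6, 0.8, False, False, cve_age_days=7)) -> "refuse_and_monitor"
```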

Conclusion

Managing dual-use capabilities in AI systems, particularly in cybersecurity, inherently involves complex trade-offs and risks. Given the varying needs, resources, and risk tolerances of different organizations, there is unlikely to be a one-size-fits-all solution. Each organization must carefully consider its specific context when implementing dual-use refusal policies, in accordance with its technical capabilities, logistical capabilities (like setting up KYC systems), and risk tolerance.

However, some approaches appear generally beneficial across different contexts. Basic user verification and usage monitoring systems usually provide fundamental protection while enabling legitimate use cases. These established methods can serve as a foundation for most organizations' dual-use policy frameworks, provided they are implemented practically without impeding core business operations.

In particular, beyond the harmfulness of the query itself, complexity-based filtering seems especially useful for cybersecurity applications, though its relevance likely extends to other dual-use domains such as CBRN (Chemical, Biological, Radiological, and Nuclear) research.

It’s important to note that this is a rapidly changing field and any policy (especially an advanced or nuanced one) requires meaningful monitoring, testing and refinement. Our general recommendation is to first implement basic monitoring systems, gather data on usage patterns and outcomes, and then gradually incorporate more advanced mechanisms based on observed needs and effectiveness.

This incremental approach allows organizations to develop more sophisticated policies while improving security and enabling legitimate use cases. As AI systems continue to evolve, so too must our frameworks for managing their dual-use capabilities.

Overall, while creating a perfect refusal policy is an incredibly difficult task, there are many tools at the disposal of teams as they implement and improve their systems. Using these tools, it’s reasonable to expect to maintain an excellent user experience for the vast majority of benign use cases while significantly reducing risk from harmful ones.

Navigating Dual-Use: Refusal Policy for AI Systems in Cybersecurity © 2024 by Pattern Labs Tech Inc. is licensed under CC BY-NC-ND 4.0.

To cite this article, please credit Pattern Labs with a link to this page, or use the BibTeX entry below.
@misc{pl-refusal2025,
  title={Navigating Dual-Use: Refusal Policy for AI Systems in Cybersecurity},
  author={Pattern Labs},
  year={2025},
  howpublished={\url{https://patternlabs.co/blog/refusal-policy-for-ai-systems-in-cybersecurity}},
}