OpenAI Launches Safety Bug Bounty Program
March 25, 2026 · 3 min read
OpenAI has launched a public Safety Bug Bounty program aimed at identifying AI abuse and safety risks across its products, a response to how rapidly AI technology, and its potential for misuse, is evolving. The program's goal is to keep systems safe and secure against misuse or abuse that could lead to tangible harm, concerns that extend beyond traditional security measures. The initiative is a proactive step toward engaging external expertise to mitigate emerging threats in the AI landscape.
The new program complements OpenAI's existing Security Bug Bounty by accepting reports of issues that pose meaningful abuse or safety risks even when they do not meet the criteria for a security vulnerability. It pairs OpenAI with safety and security researchers to surface problems that fall outside conventional security flaws but still carry real-world risk. Submissions will be triaged by OpenAI's Safety and Security Bug Bounty teams and may be rerouted between the two programs depending on scope and ownership.
The Safety Bug Bounty program targets AI-specific safety scenarios, though jailbreaks are out of scope for this public effort. Instead, OpenAI periodically runs private bug bounty campaigns focused on particular harm types, such as Biorisk content issues in ChatGPT Agent and GPT-5, and invites interested researchers to apply when those campaigns open. This structure allows targeted assessment of high-risk areas without exposing them broadly.
Outside the listed categories, researchers who identify flaws that create a direct path to user harm and come with actionable, discrete remediation steps may have their submissions considered in scope for rewards on a case-by-case basis. General content-policy bypasses without demonstrable safety or abuse impact, however, are explicitly out of scope: jailbreaks that merely make the model use rude language or return easily searchable information are not eligible, keeping the program focused on substantive risks.
Researchers interested in participating can apply through the Safety Bug Bounty program; OpenAI says it is eager to work alongside researchers, ethical hackers, and the broader safety and security community. The collaboration aims to foster a secure AI ecosystem by drawing on diverse expertise to preemptively address potential abuses, and the program's design emphasizes practical, harm-focused evaluations over theoretical vulnerabilities.
The program's limits are deliberate: jailbreaks and content-policy bypasses that lack demonstrable safety impact are excluded, narrowing the focus to tangible harms. By reserving private campaigns for specific harm types like Biorisk, OpenAI acknowledges that not every risk can be probed publicly, balancing openness with security needs and directing resources to the areas with the greatest potential for real-world consequences.
The initiative underscores OpenAI's commitment to evolving its safety measures as AI technology advances, recognizing that misuse vectors change rapidly. It builds on existing security frameworks by extending scrutiny to abuse scenarios that might otherwise go unchecked, and its success will depend on researcher engagement and on the program's ability to adapt to emerging threats in a dynamic field.
Overall, the Safety Bug Bounty program represents a critical effort to enhance AI safety through community-driven oversight, focusing on actionable risks rather than broad vulnerabilities. It highlights the importance of continuous assessment in keeping systems secure as AI capabilities grow, and its structured, collaborative model may set a precedent for how organizations approach safety in complex technological environments.