<aside> ℹ️ This session will be featured at PAIRS 2026 Online on 17 February 2026 at 17:15 UTC. Registered participants will receive Zoom links to join the session via email.
</aside>
[https://drive.google.com/file/d/1K_q3Qe1NGkQymNCSzSB_uMOb5Z7ZXGIw/view?usp=drivesdk](https://drive.google.com/file/d/1K_q3Qe1NGkQymNCSzSB_uMOb5Z7ZXGIw/view?usp=drivesdk)
AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to assess the breadth and magnitude of the AI risk surface. Researchers in the field of systems design have developed several effective sociotechnical AI evaluation and red-teaming techniques targeting bias, hate speech, mis/disinformation, and other documented harm classes. However, as increasingly sophisticated AI systems are released into high-stakes sectors, current evaluation and monitoring methods are proving less and less capable (Butler et al. 2024; Schwartz et al. 2025).
Delivering AI safety and security is, in theory, a duty shared by AI application developers and public entities alike (Weidinger et al. 2023). Yet, in practice, AI labs have been the only actors regularly leading scaled AI evaluations or red-teaming exercises, in large part because they are thought to be the only AI actors with the resources and talent to do so. Conversely, despite their perceived responsibilities, public entities have not typically contributed to this function directly through technical interactions during the AI design, development, or deployment stages. The result is a “responsibility gap” in which developers are also their own assessors, a dynamic ripe for conflicts of interest. Pioneering new approaches to close this “responsibility gap” is therefore more urgent than ever (Schwartz et al. 2025).
In this poster, we propose one such approach, the participatory public AI red-teaming exercise, and discuss early results from its pilot implementations. The first in-person public demonstrator exercise was held in conjunction with the Conference on Applied Machine Learning in Information Security (CAMLIS) in 2024. We review the operational design and results of this exercise, of the earlier pilot exercise run under the National Institute of Standards and Technology’s (NIST) Assessing Risks and Impacts of AI (ARIA) program, and of a third hybrid in-person and virtual exercise conducted with Singapore’s Infocomm Media Development Authority (IMDA) in partnership with nine research institutes across South, Southwest, and East Asia.
Ultimately, we argue that these types of exercises can empower public entities to deliver on their responsible AI obligations and, in so doing, meaningfully advance the ethical development, deployment, and governance of AI. First and foremost, they draw upon the population of an entire national jurisdiction (or entire regions) to supply a diverse sample of capable red teamers who provide authentic grounding in the norms, values, and discourses that construct different deployment contexts. Second, by partnering with public entities, this type of exercise more meaningfully ensures that “civil society retains its agenda-setting power over the development and deployment of AI” (UNESCO 2022). Third, by involving AI application developers, this type of exercise allows for better real-world assessment of socioculturally complex harm surfaces, which benefits both adoption/alignment efforts and localized security functions. Finally, we believe these exercises validate a scalable model that most AI-developing states or regions can employ to ensure that AI systems built for their communities are safe and more closely aligned with those communities’ values.