<aside> ℹ️ This presentation will take place at PAIRS 2026 Online on 17th February 2026 at 12:15 UTC. Registered participants will receive Zoom links to join the session via e-mail.

🎙️ Thread on PAIRS Discussion Server (Discord) (register first)

</aside>

Abstract

One of the most common arguments made by developers of AI systems concerning their motivation for improving model capabilities in regional languages is that doing so allows for the “preservation” of such languages in the face of an ever-advancing digital world. Arguments such as these have been made by all large technology companies and AI labs; none are unique. Although some communities have responded positively to this argument and have successfully negotiated certain concessions regarding community participation in linguistic modelling, many have also issued sharp critiques of these practices.

These critiques have typically taken two forms. On the one hand, communities view the modelling and subsequent commodification of their language as inherently exploitative, since those communities are included only in “tokenistic” interactions (Arnstein 1969) in such projects. On the other, communities observe that they are not the intended beneficiaries of regional language modelling: value accrues primarily to those who develop, deploy, and maintain such systems, which are often commercialized. Anecdotally, many AI systems designers and developers also recognize these claims, and some champion them. Yet misapplication of participatory methods can in fact compound exploitative dynamics, causing even well-meaning attempts at just AI system design to backfire.

This paper offers a critique of such practices in the context of expanding language modelling capabilities across the globe. It does so to provide a more formal description of this particular pitfall in participatory AI research and to gesture towards some alternate avenues that members of this community might consider validating. The argument proceeds in three stages.

First, building upon recent work by Mohamed, Png, and Isaac (2020) and Muldoon and Wu (2023), I describe in greater depth the reasoning behind the second type of critique of industry-led regional language modelling. I do this primarily by recalling recent historical research investigating the colonial origins of humanitarianism, specifically the notion of “preservation” or “protection” of communities thought to be “doomed” to dissolution in the face of an ever-advancing industrial modernity (e.g. Fassin 2011, Lester and Dussart 2014). Second, armed with this analogy, I translate it into actionable foresight regarding the risks and harms produced by prevailing methods of regional language modelling, including the arguments made to garner support for community participation in these efforts. Here, the paper discusses early findings from an ongoing discourse analysis that I began in October 2025. Third, and finally, the paper proposes conceptual and practical modifications to current practices of AI system design and development to support new languages that require input (and, I argue, more genuine consent) from participant communities.