I hadn’t run across this until today, when Michal Wozniak shared it on Mastodon, but it’s an excellent read: Excavating AI: The Politics of Images in Machine Learning Training Sets.

One of the “standard” image sets used to train image-recognition models is ImageNet, originally published in 2009. It contains more than 14 million images, which were categorized by humans through Amazon’s Mechanical Turk service. The challenge, of course, is that when you use humans to create the training data, all the implicit (and explicit) biases of those labelers are baked right into the data, and from there into any model trained on it.

[ImageNet] provides a powerful and important example of the complexities and dangers of human classification, and the sliding spectrum between supposedly unproblematic labels like “trumpeter” or “tennis player” to concepts like “spastic,” “mulatto,” or “redneck.” Regardless of the supposed neutrality of any particular category, the selection of images skews the meaning in ways that are gendered, racialized, ableist, and ageist. ImageNet is an object lesson, if you will, in what happens when people are categorized like objects. And this practice has only become more common in recent years, often inside the big AI companies, where there is no way for outsiders to see how images are being ordered and classified.
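To make that concrete, here is a minimal sketch (in PyTorch, with hypothetical placeholder labels rather than real ImageNet synsets) of how a standard classification loop ingests human-assigned categories. The loss function treats every label as equally valid ground truth; nothing in the pipeline distinguishes a neutral category from a contested one.

```python
# A minimal sketch, not real ImageNet code: hypothetical labels standing in
# for annotator-assigned categories. Cross-entropy treats them all alike.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical label map: any string the annotators chose becomes an index.
label_index = {"tennis player": 0, "redneck": 1}  # both are just class ids here

head = nn.Linear(2048, len(label_index))  # stand-in for a classifier head
features = torch.randn(4, 2048)           # stand-in for image embeddings
targets = torch.tensor([0, 1, 0, 1])      # whatever the Turk workers said

loss = F.cross_entropy(head(features), targets)
loss.backward()  # the annotators' judgments become gradient signal
```

Once a category exists in the label map, the optimizer pushes the model toward it with exactly the same machinery it uses for “tennis player.”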

While I don’t think the current AI tech boom will last in the long term, certain applications are very useful and will probably stick around. As we employ trained systems, we must always interrogate the assumptions and biases that have gone into that training.

There is much at stake in the architecture and contents of the training sets used in AI. They can promote or discriminate, approve or reject, render visible or invisible, judge or enforce. And so we need to examine them—because they are already used to examine us—and to have a wider public discussion about their consequences, rather than keeping it within academic corridors. As training sets are increasingly part of our urban, legal, logistical, and commercial infrastructures, they have an important but underexamined role: the power to shape the world in their own images.
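In that spirit, here is a toy example of the kind of examination the authors call for: scanning a label set for person-categories that encode value judgments rather than observations. The data is entirely hypothetical; the contested terms are just the ones quoted above, not the actual ImageNet category list.

```python
# A toy audit over hypothetical data, not the actual ImageNet categories.
# The contested terms are the ones the essay itself calls out.
categories = ["tennis player", "trumpeter", "redneck", "spastic", "mulatto"]
contested = {"redneck", "spastic", "mulatto"}

flagged = [label for label in categories if label in contested]
print(f"{len(flagged)} of {len(categories)} labels flagged for review: {flagged}")
```

A real audit is much harder, of course, since the troubling categories aren’t known in advance, which is exactly why the authors argue for public scrutiny rather than closed, in-house classification.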

It’s worth reading the whole thing.