
Inside a Hot-Button Research Paper: Dr. Emily M. Bender Talks Large Language Models and the Future of AI Ethics

Bender co-authored the paper with members of Google's AI Ethics team.

On Dec. 2, Dr. Timnit Gebru, the co-lead of Google’s Ethical AI team, tweeted that she had been fired after a dispute over a research paper. The next week, search engine traffic for “large language models”—the focus of the contested paper—went from zero to 100 (literally).

  • Recap: Large language models (e.g., GPT-3) are key to generating human-like text, and use cases vary widely—from analyzing tomes of medical research to auto-generating email responses. Generally, the more parameters a model has and the more text it's trained on, the more capable it appears.
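
For the auto-generated email use case above, here is a minimal, illustrative sketch of what calling a small, publicly available language model can look like. It assumes the Hugging Face transformers library is installed; the prompt and the choice of "gpt2" are ours, picked only because that model is small enough to run locally (and far smaller than GPT-3).

    # Minimal sketch: generating text with an off-the-shelf language model.
    # The model and prompt are illustrative assumptions, not from the paper.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Thanks for your email. I will", max_length=30, num_return_sequences=1)
    print(result[0]["generated_text"])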

The paper, titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜”, has now been accepted to a global computer science conference. We chatted with Dr. Emily M. Bender, a co-author of the paper and a professor of linguistics at the University of Washington, about top-level takeaways—and the future of AI ethics.

This interview has been edited for length and clarity.

This paper blew up quickly, and there’s a lot of sensitivity around large language models right now. The headlines around the paper contributed to this, but in your view, why did it strike a nerve with so many people?

The paper got a lot of attention because Google connected it to firing Dr. Gebru. If you read the paper it’s sort of like, “Why would they fire her over this?” But it’s a part of the story, and so it attracted a lot of attention that, in the ordinary course of events, it wouldn’t have attracted.

To the extent that the attention is about the paper’s contents, I think it's because large language models are technologically cool. There's a lot of cleverness that goes into designing the model architecture so that it can take advantage of very large collections of texts. And once they can do that, they can do all these sort of amazing-seeming parlor tricks. And there's been a lot of genuine organic excitement, and hype, about that.

This paper is pushing back and saying, “First of all, it's not doing everything you're claiming it's doing; these language models are not understanding language,” and secondly, it's not a harmless game—there are actual possible dangers here.

And in most of the work celebrating these language models, there's a lot of talk about the benefits and relatively little attention paid to the possible harm.

  • The reason I say most of the work, and not all of it, is that the OpenAI folks did put a lot of thought into this. They designed their release process around making it possible to explore what the downsides are, by putting it slowly out into the world, and they've written a thoughtful paper about that.

What would you say to critics who argue the paper doesn’t consider the benefits and risks of large language models equally?

One of the reactions to the paper has been, “This is very one-sided. You can't talk about the risks without talking about the benefits.” That’s surprising, since so much work in this area only focuses on the benefits.

When you're writing a research paper, you set a research question to answer. Ours wasn't “What are the relative risks and benefits of language models in general?” Instead, it was set in the context of this rush to make ever-bigger language models without much consideration of the risks (with the footnote that OpenAI is considering some of them). We wanted to pull together all the risks in one place.

Another reason it doesn't necessarily make sense to have "What are the risks and benefits of large language models?" as a research question is that large language models aren't themselves a specific application. So we can identify some of the risks looking at the general picture, but that trade-off equation, how these things balance each other out, has to be answered in the context of what we’re using them for—and how we’re using them.

What are the most important findings that you’d point to right away?

There are a few different ones. One has to do with the cost of doing the computation. So when research in a field is driven by a methodology that is costly in terms of the size of the dataset, in terms of the compute infrastructure required to do it, and in terms of the energy required, that cuts out certain participants from the research.

  • So if the whole field is focused on this one research methodology, and somebody doesn't have access to that level of compute power, or somebody wants to work on a language for which you can't amass that much data, then they’re just cut out.

Then there’s the takeaway around environmental concerns. In computer science, it is really easy to conceptualize what we're doing as something that is abstract and only exists in this abstract logical space, but actually running the training on one of these things and then deploying it in the world is a physical process that takes physical electricity.

  • One way to keep it in mind is to basically make it visible in research papers: What was the cost of doing this in terms of compute time, in terms of electricity? That way, the field is positioned to incentivize greener work that requires less-intensive compute.
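
As a rough, editorial illustration of what that kind of reporting could draw on, here is a back-of-the-envelope estimate of a training run's energy use and emissions. Every number below is an assumption invented for the example, not a measurement of any real model.

    # Back-of-the-envelope training cost estimate; all figures are assumptions.
    GPU_POWER_KW = 0.3          # assumed average draw per accelerator (300 W)
    NUM_GPUS = 512              # assumed size of the training cluster
    TRAINING_HOURS = 24 * 14    # assumed two-week training run
    PUE = 1.1                   # assumed data-center power usage effectiveness
    GRID_KG_CO2_PER_KWH = 0.4   # assumed grid carbon intensity

    energy_kwh = GPU_POWER_KW * NUM_GPUS * TRAINING_HOURS * PUE
    emissions_kg = energy_kwh * GRID_KG_CO2_PER_KWH

    print(f"Estimated energy: {energy_kwh:,.0f} kWh")
    print(f"Estimated emissions: {emissions_kg:,.0f} kg CO2e")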

How about pattern recognition?

That’s the third takeaway. At the base, this technology is about pattern recognition, and the training data is the source of the patterns that the system learns to recognize. It reproduces those patterns when it’s in its deployment context. And that is really important as a takeaway for understanding what language models do and don't do.

  • When the models spit out this well-formed-looking English, we as speakers of English take it in and apply meaning to it. And so it's really easy to be fooled into thinking that the computer is saying something, when in fact it has just reproduced the patterns.

That's where the “stochastic parrots” metaphor comes from—it's haphazardly stitching together what it's seen in training data. So it's about not being fooled by that as a person interacting with computers, but also putting some thought into the patterns the model is matching and whether we want those out in the world—taking action by generating text or working as classification systems—given that the training data has lots of patterns that we don't want to reproduce. These are patterns of systems of oppression: racism, sexism, transphobia, and so on.
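
To make the metaphor concrete, here is a toy, editorial sketch of a “parrot” in this sense: a tiny bigram model that can only re-emit word-to-word patterns from its (made-up) training text. The output can look fluent, but there is no meaning behind it, only reuse of observed patterns.

    import random
    from collections import defaultdict

    # Toy "stochastic parrot": a bigram table built from a tiny, made-up corpus.
    corpus = "the model repeats patterns the model has seen in the training data".split()

    transitions = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        transitions[current_word].append(next_word)

    def parrot(seed, length=8):
        words = [seed]
        for _ in range(length):
            options = transitions.get(words[-1])
            if not options:      # nothing was ever observed after this word
                break
            words.append(random.choice(options))
        return " ".join(words)

    print(parrot("the"))  # fluent-looking output with no understanding behind it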

Given all this, what can we expect from corporate-sanctioned AI ethics in the near future? What can we reasonably hope to get out of it?

It's easy to say, “Oh, corporations could never do this work—they shouldn't even be pretending to.” But that does a disservice to the researchers employed by those corporations who are doing really fantastic work. There are people whose work I really respect in these positions.

So it’s valuable to have people who are at the corporations doing the work, because they have a visibility into things that someone on the outside doesn't have. But if we want to get to a place where the technology we build serves humanity broadly—and not just the interests of people who are making the most money off of it—we’ll need an all-hands-on-deck situation.

  • We need people on the outside, who have academic freedom, looking into it.
  • We need people training students to have a framework to think about these questions later, if they’re going into the industry.
  • And we need informed people in government to design appropriate regulations.

What’s down the line for Google’s own ethics research, in your eyes?

Google is not behaving in a way that's consistent with the picture we'd like to get to. So: what does that mean? Is there a way for Google to redeem itself and actually support the fantastic researchers who are still there? One thing Google could do is set up a very transparent procedure for approving papers, one specifically restricted to protecting IP, for example. Explaining that, creating self-accountability around it, and creating visibility into that accountability would support the groups they have there.

Timnit Gebru and Margaret Mitchell were co-leads of this ethical AI team at Google that was doing great work. Now Dr. Gebru has been pushed out, and you've probably seen Dr. Mitchell's name in the headlines too. The people they've hired, fostered, and nurtured are still there and still doing great work, but they're also hampered by the stress of this whole situation.

Google has an opportunity to do better in this respect, to set up a situation where people can do their research, and I think in the end it's in Google's interest to do that. For long-term business viability, you don't want to be doing something that breaks the public trust.
