AI

DeepMind releases trio of papers on large language models

The Alphabet-owned research group dug into different aspects of the popular but controversial AI technique.

On Wednesday, DeepMind, the Alphabet-owned AI research company, published not one, not two, but three research papers about this year's most hotly debated AI technique: large language models.

  • As a reminder, these models are typically trained on an enormous amount of text and underpin services from Big Tech companies like Google to startups like Grammarly.

The highlights

Bigger isn't always better: "Gopher" is the name of DeepMind's new 280-billion-parameter language model. (Generally, in the NLP world, more parameters = higher performance metrics. For reference, OpenAI's GPT-3 has 175 billion parameters, while some newer models from Google and Alibaba range from one to 10 trillion.) With this paper, DeepMind wanted to test when scaling up a model's size makes it perform better, and when it doesn't; a rough sketch of where those parameter counts come from follows the bullets below.

  • The results: In DeepMind's tests of models at different sizes, bigger models were better at comprehending written text, checking facts, and IDing toxic language. But the extra size didn't necessarily make them any better at tasks involving common sense or logical reasoning.
  • DeepMind also found that regardless of size, models tended to reflect human biases and stereotypes, repeat themselves, and confidently spread misinformation.
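So where do headline numbers like "175 billion" come from? A common rule of thumb from the scaling-laws literature pegs a decoder-only transformer at roughly 12 × layers × width² weights in its attention and feed-forward blocks. Here's a minimal sketch using the published layer counts and widths for GPT-3 and Gopher; the helper name approx_params is ours, and the formula ignores embeddings and other details, so it undershoots the official totals a bit.

```python
# Back-of-the-envelope parameter count for a decoder-only transformer.
# Rule of thumb: ~12 * n_layers * d_model^2 weights in the attention
# and feed-forward blocks. Embeddings and other details are ignored,
# so real counts run somewhat higher.

def approx_params(n_layers: int, d_model: int) -> float:
    """Approximate parameter count, in billions."""
    return 12 * n_layers * d_model**2 / 1e9

for name, layers, width in [
    ("GPT-3 (175B reported)", 96, 12288),
    ("Gopher (280B reported)", 80, 16384),
]:
    print(f"{name}: ~{approx_params(layers, width):.0f}B parameters")

# Output:
# GPT-3 (175B reported): ~174B parameters
# Gopher (280B reported): ~258B parameters
```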

Smaller isn’t necessarily worse: DeepMind also introduced a new model architecture dubbed RETRO, or Retrieval-Enhanced Transformer. It’s all about training large language models more efficiently (read: faster and cheaper). Think of RETRO as a David and Goliath situation, in that its performance can compete with neural networks 25 times larger, according to DeepMind.

  • RETRO's retrieval database holds 2 trillion tokens collected from the web, books, news, Wikipedia, and GitHub, and it reportedly helps with AI explainability: researchers can see the text passages the model drew on to generate a prediction, rather than its decisions being an inexplicable "black box." (A toy sketch of the retrieval idea follows this bullet.)
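To make that retrieval idea concrete, here's a heavily simplified sketch in Python. It is not DeepMind's code: the real RETRO embeds text chunks with a frozen neural encoder and runs approximate nearest-neighbor search over trillions of tokens, while this stand-in scores passages by crude word overlap against a three-entry, in-memory "database." Every name here (DATABASE, retrieve, generate) is invented for illustration.

```python
# Toy sketch of retrieval-augmented generation, in the spirit of RETRO.
# Not DeepMind's implementation: similarity search is faked with word
# overlap, and the "language model" is a placeholder string.

from collections import Counter

DATABASE = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Gophers are burrowing rodents found in North America.",
    "Python is a programming language created by Guido van Rossum.",
]

def score(query: str, passage: str) -> int:
    """Crude relevance score: count shared words (a stand-in for the
    embedding similarity a real retriever would use)."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum((q & p).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(DATABASE, key=lambda p: score(query, p), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Fetch supporting text, then condition the (imaginary) language
    model on prompt + evidence. Surfacing the retrieved passage is what
    makes the output inspectable rather than a black box."""
    evidence = retrieve(prompt)[0]
    return f"Evidence: {evidence}\nAnswer conditioned on the evidence above."

print(generate("When was the Eiffel Tower completed?"))
```

The design point mirrors DeepMind's explainability claim: because the supporting passage is an explicit input rather than a fact memorized in the weights, researchers can inspect, and in principle update, what the model is drawing on.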

The tech's risks aren't going anywhere: In the last paper, DeepMind released a classification of language-model-related risks, categorized into six areas: 1) discrimination, exclusion, and toxicity; 2) information hazards; 3) misinformation harms; 4) malicious uses; 5) human-computer interaction harms; and 6) automation, access, and environmental harms.

  • Risk mitigation is one of the biggest concerns, according to the paper: figuring out, for instance, how to keep language models from learning biases that could harm people.
