
Top AI models may have hit a wall. That may not matter for some businesses

Reports say Big Tech is seeing diminishing returns on training.
Illustration: Anna Kim, Photo: Adobe Stock


In the two years since ChatGPT thrust large language models (LLMs) into the mainstream, generative AI has seen a flurry of advances—take it from a publication that’s tried to keep up with them all.

But some of that progress may be slowing in the coming months, according to recent reports that have prompted much discussion in the AI world over the last couple of weeks. At least three of the top AI companies—OpenAI, Google, and Anthropic—may be hitting a wall of diminishing returns as they try to scale up the next iterations of flagship models, according to reports in The Information, Bloomberg, and Reuters.

For years, LLM development has hinged on the idea that the more data and computing power are shoveled into training AI systems, the smarter those models will be. But tech companies are now reportedly hitting roadblocks: they've nearly run out of human-authored training data, soaring costs are yielding disappointing results, and energy crunches have set back training, according to Reuters.

All of that could have big consequences for the future of AI development, as companies may decide to home in on more specialized tasks or pursue different kinds of gains, experts told Tech Brew. Tech stocks that have soared on AI promises may adjust accordingly.

But for businesses that are still trying to harness generative AI for everyday tasks, the news doesn’t necessarily mean much, these experts said. Many companies are already finding that the massive models on the market are often bigger than they need for their purposes, according to Arun Chandrasekaran, a distinguished VP analyst with Gartner.

“I haven’t spoken to a single CIO who’s told me, ‘I want a 10 trillion parameter model rather than an 8 trillion parameter model,’” Chandrasekaran said. “I think the models that are out there are far beyond the use cases that they deploy. Their challenges are more around, ‘How do we make this cost-effective for the use cases we have? How do we make these models very specific and specialized for the domains and the use cases where we are deploying? How do we reduce hallucinations?’”

More specialized: There may be more room to improve models in a phase of development called post-training, which comes after a model has already been pretrained on massive troves of raw data, according to Sophie Lebrecht, chief operating officer at the Allen Institute for AI (AI2). Post-training encompasses methods like reinforcement learning from human feedback and fine-tuning on specific tasks.
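
To make that concrete, here's a minimal sketch of one common post-training step, supervised fine-tuning on a narrow task, using the Hugging Face transformers and datasets libraries. The base model and the two-example dataset are placeholders for illustration, not anything AI2 or OpenAI has described:

```python
# Minimal supervised fine-tuning sketch (one post-training technique).
# Assumes the Hugging Face transformers and datasets libraries; the base
# model and toy dataset below are illustrative placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "distilbert-base-uncased"  # small pretrained model; swap in your own
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Two toy examples standing in for real domain-specific labeled data.
train = Dataset.from_dict({
    "text": ["the scan shows a hairline fracture", "no abnormality detected"],
    "labels": [1, 0],
}).map(lambda row: tokenizer(row["text"], truncation=True,
                             padding="max_length", max_length=32))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train,
)
trainer.train()  # nudges the pretrained weights toward the narrow task
```

The point is the shape of the workflow rather than the specifics: a frozen pretrained model plus a small, domain-specific dataset, instead of another round of ever-bigger pretraining.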


“We’re seeing a sort of saturated performance at the pre-training level, but I think there’s still a ton of opportunity and headroom at post-training,” Lebrecht said.

AI2 released a new open-source model family last week that also aims to democratize post-training by making data, tools, and techniques available for developers to post-train foundation models on specialized tasks.

OpenAI has moved its Orion model into post-training, according to Bloomberg, but it’s not yet ready for release.

Lebrecht expects this type of differentiation to be more important as pre-training progress starts to matter less.

“If I’m reading x-rays in a hospital, I don’t necessarily need to know how many legs a centipede has,” Lebrecht said. “So as we start to define what are going to be these key use cases, we start to hill-climb against particular domains or verticals or use cases. And we’re not saying, ‘Hey, do we have this sort of generalized one foundation model that fits everything?’”

We have models at home: Timothy Martin, EVP of product and development at startup Yseop, which makes language generation tools for the science industry, said his company is more focused on shaping existing foundation models using techniques like retrieval-augmented generation (RAG)—an architecture that grounds a model's outputs in documents it retrieves—and agents, or AI systems that can perform specialized tasks.

“We are interested in the next generation of advances with respect to reasoning and some of these other capabilities, but these state-of-the-art models today are quite good, and we can do a lot with them,” Martin said. “If you’re on the trek for AGI, that [slowdown is] a big deal. But if you take a look at companies, product companies using generative AI right now, they are still working on a lot of cool things.”
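
The retrieval half of that architecture is simple enough to sketch. Below is a minimal, hypothetical illustration of RAG, not Yseop's implementation: it retrieves the best-matching document with TF-IDF via scikit-learn (production systems typically use embedding-based vector search) and uses it to ground the prompt sent to whatever foundation model a company already has. The documents and query are invented:

```python
# Minimal RAG sketch: retrieve the most relevant document, then ground the
# model's prompt in it. TF-IDF retrieval here is a stand-in for the
# embedding-based vector search most production systems use.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [  # stand-ins for a company's real document store
    "Study NX-12: adverse events were reported in 4% of participants.",
    "Manufacturing protocol: reagents must be stored at -20C.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [docs[i] for i in scores.argsort()[::-1][:k]]

query = "What was the adverse event rate in study NX-12?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then go to any foundation model, so the answer is grounded
# in the retrieved text rather than in the model's pretraining alone.
print(prompt)
```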

Gartner’s Chandrasekaran said he’s also seen companies moving in the opposite direction, latching onto smaller language models that offer shorter response lags and lower costs.

“That’s one of the reasons why we’ve seen interest in small language models and mid-sized language models, where these models are not as versatile as these large language models, but they’re able to deliver task-specific accuracy at a much lower cost and at a much better speed,” Chandrasekaran said. “Those are the things I think CIOs deeply care about.”
