The open-source AI race has heated up stateside, just days after China’s DeepSeek stunned Silicon Valley with its shoestring-budget models.
The Allen Institute for AI (AI2) recently released a new model it claims can beat or hold its own against DeepSeek V3 and OpenAI’s GPT-4o across several benchmarks. The release is a larger version of a model the nonprofit lab unveiled last November, with nearly six times as many parameters (405 billion vs. 70 billion).
At a moment when DeepSeek has thrust open-source AI into the spotlight, AI2 hopes that the model will demonstrate that US-based open-source companies are also chipping away at the performance gap between open and closed models.
“We only have a really, really limited number of open-source US-based models, like our model, and also [Meta’s] Llama and maybe a couple more,” Hannaneh Hajishirzi, senior director of NLP research at AI2, told Tech Brew. “So we showed that applying [our training regimen] on a US open language model, you can actually close the gap, or even outperform DeepSeek V3.”
The same tricks: AI2’s models use a novel reinforcement learning technique—training by way of “rewards” and “punishments” for right and wrong outputs—in which the model is taught to solve math or other problems with verifiable answers. DeepSeek used similar reinforcement learning techniques to train its models on reasoning tasks.
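For readers who want a concrete picture, the sketch below illustrates the general idea of reinforcement learning with verifiable rewards, not AI2’s or DeepSeek’s actual implementation: a policy proposes an answer, an automatic checker compares it to the known solution, and the resulting reward nudges the policy toward answers that verify. The toy arithmetic dataset, the candidate answer space, and the simple policy-gradient update are all illustrative assumptions; production systems use an LLM as the policy and an algorithm such as PPO for the update.

```python
# Minimal sketch of reinforcement learning with verifiable rewards.
# A toy policy picks an answer for each arithmetic problem; a checker
# compares it to the known solution and returns a binary reward, which
# drives a simple REINFORCE-style update. Dataset and names are hypothetical.
import math
import random

PROBLEMS = [("2+2", 4), ("3*5", 15), ("10-7", 3)]   # problems with checkable answers
CANDIDATES = list(range(20))                         # toy answer space

# Policy: per-problem preference scores (logits) over candidate answers.
logits = {q: [0.0] * len(CANDIDATES) for q, _ in PROBLEMS}

def sample_answer(question):
    """Sample an answer index from the policy's softmax distribution."""
    exp = [math.exp(s) for s in logits[question]]
    total = sum(exp)
    probs = [e / total for e in exp]
    idx = random.choices(range(len(CANDIDATES)), weights=probs)[0]
    return idx, probs[idx]

def verifiable_reward(answer, truth):
    """Reward is 1 if the answer matches the ground truth, else 0."""
    return 1.0 if answer == truth else 0.0

LEARNING_RATE = 0.5
for _ in range(2000):
    question, truth = random.choice(PROBLEMS)
    idx, prob = sample_answer(question)
    reward = verifiable_reward(CANDIDATES[idx], truth)
    # Raise the sampled answer's score in proportion to the reward it earned.
    logits[question][idx] += LEARNING_RATE * reward * (1 - prob)

for question, truth in PROBLEMS:
    best = max(range(len(CANDIDATES)), key=lambda i: logits[question][i])
    print(f"{question} -> policy answers {CANDIDATES[best]} (truth: {truth})")
```

The key property is that the reward comes from an automatic check rather than a human judge, which is what lets this style of training scale on math and other tasks with verifiable answers.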
“It is pretty much, I would even argue, identical,” Hajishirzi said. “It is very simple… we had it in this paper in late November and DeepSeek came after us. Someone was asking me, ‘Did they actually copy what you did?’ I said, ‘I don’t know. It was so close that each team could come up with this independently.’ So, I don’t know, but it’s open research. A lot of these ideas could be shared.”
Many LLMs already train in part on reinforcement learning combined with human evaluators who judge the quality of output, which AI2 also does as part of its post-training regimen. But reinforcement learning without these human judges is newer to LLMs, though already common in fields like robotics and self-driving cars.
As LLMs delve further into reasoning tasks that require a chain of thought, reinforcement learning will likely become an even more important tool, Hajishirzi said.
Next frontiers: AI2 is also studying DeepSeek’s technical papers in an effort to determine which of the company’s other efficiency measures might be replicable.
“They did very amazing engineering work to make training much more efficient,” Hajishirzi said. “A couple of our engineers are actually very carefully looking at everything that they’ve done, and we are trying to see if those things can be transferred to our models.”
Hajishirzi said AI2 is also exploring how it can expand its reinforcement learning techniques beyond problems with easily verifiable answers by looking for ways of measuring the model’s success on qualitative outputs.
“It will be very interesting when we start applying this type of reinforcement learning on tasks or settings that are not directly verifiable,” she said. “So how can we say, ‘Oh, OK, after these steps, AI successfully did this, but then measuring success would be difficult.’ So it is interesting to think of how to evaluate success in tasks that we do not have the final kind of verification step. A lot of people are now excited and thinking about that.”