After months of anticipation, Google’s gambit to gain an edge on upstart rival OpenAI has garnered mixed reactions.
While Google claims its long-awaited Gemini system outperforms OpenAI’s GPT-4 across dozens of benchmarks, it only does so by slim margins, and much of the multimodal functionality that differentiates it won’t be available for months due to a gradual rollout.
The search giant entered the next stage of that extended debut last week with the announcement that it was making Gemini Pro available to enterprise developers.
As developers begin to experiment with the new models in Google AI Studio and the company’s machine learning platform, Vertex AI, they’ll be able to put some of Google’s claims to the test for the first time.
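For developers curious what that first test looks like, the sketch below shows a minimal call through Google AI Studio's google-generativeai Python package; the API key and prompt are placeholders, and this assumes the package is installed via pip.

```python
# Minimal sketch: sending a prompt to Gemini Pro via the
# google-generativeai SDK (pip install google-generativeai).
# The API key and prompt below are illustrative placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued through Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the trade-offs between on-device and cloud-hosted AI models."
)
print(response.text)
```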
But details of the initial rollout, including an impressive demonstration video that turned out to be heavily staged, have sown doubts about the capabilities of a system that has been buzzed about for much of the past year as Google's best hope for beating OpenAI.
Ramayya Krishnan, a professor of management science and information systems at Carnegie Mellon University, told Tech Brew he found the capabilities of the models available so far a bit underwhelming. While Krishnan is excited about the possibilities of multimodality, a model's ability to move among voice, text, imagery, and other media, many of those features are not yet widely available.
“Gemini wasn’t a huge leap over GPT-4; it was a little better, but not by a large amount,” Krishnan said, adding that it would have been “ideal for them to release a Bard that could have done the multimodal…because that would have shown something like a big step forward.”
Others have noticed that the initially released version of Gemini was getting basic facts wrong, according to a roundup of reactions in TechCrunch.
Krishnan also said that the limited edge over OpenAI could be a sign that performance gains from models trained on ever larger amounts of data are plateauing.

Still, he said he was intrigued by Gemini Nano, the smallest of the three versions released, for its ability to run directly on a mobile device without a network connection.
“It’s smaller; it’s not as powerful as Gemini Ultra or ChatGPT,” Krishnan said. “But it can do a lot of stuff…Imagine if you’re a worker out in the field…in the military…deployed without a network, you can actually carry on your hand a device that has a full-fledged AI running on that device without the connection necessarily to the network, which I think is an interesting new development.”
Despite those lingering doubts, Krishnan said he expects multimodality to be a promising new chapter for AI that will yield new ways to link digital applications with the physical world—like, say, an AI guiding someone assembling IKEA furniture.
“You could imagine, if multimodal AI inputs and outputs with reasoning were built, that would really push those applications in a significant way,” Krishnan said.