ChatGPT now has company in the tier of all-purpose AI bots that can juggle audio, imagery, and text.
Google released its long-awaited Gemini large language model (LLM) this week, kicking off what the company claims is a “new era” for its AI offerings. The search giant reported that the multimodal AI—meaning it can understand image, text, video, and audio input—can slightly outperform OpenAI’s GPT-4 on tasks like mathematical reasoning, understanding documents, and deciphering infographics.
The company is likely hoping that Gemini, which comes in three versions (Ultra, Pro, and Nano), will give it an edge, or at least keep it competitive, in the race to own the next phase of generative AI for consumers, in which experts say multimodality will be key. The debut follows OpenAI’s release of a similarly dexterous version of ChatGPT in September.
Google CEO Sundar Pichai claimed in an announcement accompanying the rollout that the news is just the start of the “Gemini era.”
“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” Pichai said in the announcement. “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company.”
Google will gradually roll out the model throughout its consumer products and developer tools. A version will serve as the backbone of the Bard chatbot starting this week, with a more advanced iteration set for early next year. Another variation of the model will also power new features in the Google Pixel 8 Pro phone, including summarization and smart reply. The tech will also be available for Android developers on an early access basis.
The New York Times reported that the version of Gemini that can analyze images and audio would likely not be available until next year.
The update comes just over a year after ChatGPT first gained viral attention, kicking off a craze around all things generative AI. Since then, Google has largely been playing catch-up to upstart OpenAI and its backer, Microsoft. Google released its own Bard chatbot months later, as Microsoft sought to carve into Google’s core business with a GPT-powered version of its Bing search engine.
Google now claims that its latest Gemini models, developed by its DeepMind research unit, can outperform the most advanced version of GPT-4 on 30 of 32 benchmarks spanning categories like reasoning, math, image, video, and coding. Most of the margins by which it beats the rival system are slim, however.
Pichai said building out the Gemini system will be a major ongoing undertaking for the company. “I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it,” Pichai said.