Google has brought native understanding of video, audio, and images to its Bard AI chatbot with a new model named Gemini, unveiled on Wednesday in numerous countries and available in English only.
Early access to its enhanced artificial intelligence capabilities will be granted to owners of the Google Pixel 8 phone.
Gemini AI Features
Gemini’s current features center on text-based chat, with advances in complex AI tasks such as document summarization, reasoning, and programming code generation.
Google anticipates a significant expansion into multimedia capabilities, including the ability to interpret hand gestures in videos and to solve a child’s dot-to-dot drawing puzzle.
This imminent evolution is poised to redefine the boundaries of AI engagement, promising a more nuanced and versatile user experience.
Note from the CEO
Sundar Pichai, CEO of Google and Alphabet, shared a note about this new AI integration into Google’s Gemini platform. He says:
“Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it.
“AI has the potential to create opportunities – from the everyday to the extraordinary – for people everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity, and productivity on a scale we haven’t seen before.”
Multimodal Artificial Intelligence
Gemini is the result of large-scale collaborative efforts by teams across Google. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, audio, images, and video.
Gemini is also a flexible AI model, able to efficiently run on everything from data centres to mobile devices. Its capabilities may significantly enhance the way developers and enterprise customers build and scale with AI.
In rigorous testing, the new AI model's performance was found to exceed current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model (LLM) research and development.
With a score of 90%, Google reports that the model outperforms human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving abilities.
Comparisons between ChatGPT and Google Gemini
Gemini emerges as a formidable rival to ChatGPT, with the potential to reshape the landscape of large language models. Readers who want the full details should consult Google’s technical report.
In terms of availability, ChatGPT stands as the more accessible option, with an established presence across various platforms and APIs.
It caters to both free users, who get limited features, and subscribers on paid plans, who get extended functionality. Conversely, Google’s Gemini is still at an early stage of its rollout, with far more limited public availability.
Speculation points to a model with several access options, mirroring the structure of Google’s other AI products.
Ease of use also sets the models apart: ChatGPT offers a user-friendly interface and a straightforward API, which makes it easy for beginners to get started.
Gemini, with its advanced capabilities, may demand a higher level of technical proficiency, although specifics about its interface and API configuration remain undisclosed.
Concerning integration with other services, ChatGPT has already established connections with platforms like Discord and Telegram, fostering accessibility across various user communities.
By contrast, Gemini’s integration capabilities are presumed to be limited initially, but given Google’s expansive infrastructure, seamless integration with a range of Google products and services is anticipated in the future.
Accessibility tools play a crucial role, and ChatGPT incorporates text-to-speech and speech-to-text options, enhancing usability for individuals with different abilities.
While Gemini’s accessibility features are yet to be unveiled, Google’s commitment to inclusivity suggests the incorporation of diverse accessibility tools upon release.
In terms of cost, ChatGPT adopts a freemium model, offering free access with limited features and premium plans for additional functionalities.
The pricing structure for Gemini remains undisclosed, but it is expected to align with other Google AI products, potentially featuring free access to basic features and tiered paid plans for advanced functionalities.