Gemini Live Gives You AI With Eyes, and It's Awesome

Exploring Google's New Camera Mode for Real-Time Visual Interaction

Introduction: Gemini Gains Sight

Google has been steadily enhancing its AI capabilities, recently rolling out the Gemini Live camera mode to all Android phones running the Gemini app, after a brief exclusivity window for Pixel and Galaxy devices. The update effectively gives Gemini the ability to 'see' through your phone's camera, recognizing objects in real time. It's more than a novelty: users can hold a conversation with Gemini about the objects it identifies, ask follow-up questions, and even share their screen for analysis. Folding visual input into the conversational AI experience marks a significant step forward, blurring the line between the digital and physical worlds.

[Image: Gemini Live interface on a phone. Gemini Live can now see, and it's wild. (James Martin/CNET)]

Core Functionality: More Than Just Object Recognition

When initiating a live session in the Gemini app, users can now enable a live camera view. This allows for a natural, spoken conversation with the AI about whatever the camera is pointed at. CNET's Blake Stimac tested the feature and was particularly impressed by its ability to recall information. After giving Gemini a 15-minute tour of his apartment, he asked where his scissors were. Gemini responded, "I just spotted your scissors on the table, right next to the green package of pistachios. Do you see them?" The AI was correct, having silently identified the scissors and their location earlier in the session without any specific prompt about them. This recall ability, reminiscent of Google's earlier Project Astra demos, suggests a deeper level of environmental awareness than simple object identification.

Google suggests Gemini Live can assist in various real-world scenarios, from navigating complex environments like train stations to identifying the contents of food items or providing detailed information about artwork. This conversational approach distinguishes it from tools like Google Lens, offering a more interactive and fluid experience, akin to talking with a knowledgeable companion rather than performing a visual search. The interaction feels casual, a significant improvement over the more rigid structure of the older Google Assistant.

[Image: A look at part of a conversation with Gemini Live about the objects it was seeing. (Blake Stimac/CNET)]

Getting Started and Broader Context

Using the feature is straightforward: start a live session with Gemini, enable the camera view, and begin talking. Google has highlighted the feature in recent Pixel Drop communications and created a dedicated page for it on the Google Store.
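
For developers curious about the underlying idea (handing a camera frame plus a question to a multimodal model), a rough approximation is possible with Google's Gemini API. This is a minimal sketch, not the app's actual implementation; it assumes the google-genai Python SDK, and the model name, file path, and API key are placeholders:

```python
# Minimal sketch: ask a multimodal Gemini model about a single camera frame.
# Assumes the google-genai Python SDK; not the consumer app's internals.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# One still frame standing in for the live camera feed.
frame = Image.open("camera_frame.jpg")  # placeholder path

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model name; check current docs
    contents=[frame, "What object is in this frame, and where is it?"],
)
print(response.text)
```

The consumer app streams video and audio continuously rather than sending one frame at a time, but the single-frame version above conveys the basic request shape.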

Gemini Live builds upon Google's experimental Project Astra, aiming to push generative AI beyond text and voice prompts into real-time visual understanding. This development aligns with broader industry trends where AI tools are rapidly gaining new skills, including video generation and enhanced processing power. Apple's Visual Intelligence, released in beta, represents a similar effort in integrating visual understanding into AI interactions. The potential impact is significant, potentially changing how we interact with information and the environment around us by seamlessly merging digital intelligence with physical perception.

Putting Gemini Live to the Test

Initial tests showed off Gemini Live's impressive accuracy. In one instance, it correctly identified a specific gaming collectible (a stuffed rabbit from American McGee's Alice). In another test, inside an art gallery, it not only identified an unusual sculpture (a tortoise on a cross) but also instantly translated the adjacent Kanji characters, surprising the testers. Stimac then decided to stress-test the feature using his collection of obscure horror-themed collectibles.

[Image: The stuffed rabbit from American McGee's Alice, the first object tested with the new Gemini Live feature and impressively recognized. (Blake Stimac/CNET)]

The results of these stress tests were mixed, revealing both the power and the current limitations of the technology. Gemini could be remarkably accurate, sometimes identifying not just an object but also specific details like its origin (e.g., a limited edition Destiny 2 item from a specific event). However, it also struggled, particularly with more obscure items or over longer sessions.

Performance Quirks and Limitations

The testing revealed several quirks and frustrations. Gemini sometimes performed worse as a session continued, apparently trying to apply context from previously identified objects to new ones, which led Stimac to limit sessions to one or two objects for better results. At times the AI was significantly off base, needing multiple hints to arrive at the correct answer. It also occasionally pulled context from entirely different past sessions, misidentifying objects based on earlier conversations; after discussing Stimac's dedicated Silent Hill display case, for example, it repeatedly guessed that new items were from Silent Hill.

More significant bugs surfaced too. Gemini occasionally hallucinated, incorrectly merging details and referencing characters from unreleased game titles. Another frustrating bug had the AI repeating an incorrect answer even after being explicitly corrected. Restarting the session sometimes helped, but not always. Stimac found a workaround: restarting the live session from within an older chat in which Gemini had previously identified an item correctly seemed to improve performance on related queries, a sign that context management is still evolving.
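
That workaround maps onto a familiar pattern when working with chat models programmatically: scoping context to the task at hand. As an illustration only (the consumer app's internals aren't public), here is a hedged sketch, again assuming the google-genai Python SDK, that starts a fresh chat per object so earlier identifications can't bleed into new ones; the model name, file names, and helper function are placeholders:

```python
# Illustrative only: one fresh chat per object avoids stale context,
# mirroring the article's "limit sessions to one or two objects" workaround.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

def identify(image_path: str) -> str:
    # A new chat session per object: no carryover from earlier identifications.
    chat = client.chats.create(model="gemini-2.0-flash")  # assumed model name
    reply = chat.send_message([Image.open(image_path), "What is this object?"])
    return reply.text

for path in ["rabbit.jpg", "tortoise.jpg"]:  # placeholder files
    print(path, "->", identify(path))
```

Whether the app itself ever adopts tighter per-topic scoping is an open question, but the pattern explains why Stimac's short, focused sessions behaved better.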

Potential and Conclusion: A Glimpse of the Future

Despite the current inconsistencies and bugs, the core capability of Gemini Live – having a real-time, spoken conversation with an AI about the visual world – is undeniably impressive and feels futuristic. The potential applications are vast, ranging from practical assistance like identifying plants, translating signs, or getting help with DIY tasks, to more serendipitous discoveries like finding misplaced objects. While the technology is still maturing and requires refinement, Gemini Live offers a compelling preview of how AI might soon integrate seamlessly with our perception, providing context and information about the physical world in truly interactive ways.
