Google’s latest AI model uses a web browser like you do

Google has unveiled a new AI model called Gemini 2.5 Computer Use, which is designed to navigate and interact with the web through a browser. The model uses "visual understanding and reasoning capabilities" to analyze a user's request and carry out tasks such as filling out and submitting forms. It can be used for UI testing or for navigating interfaces that offer no API or other direct connection. The release is part of Google's broader effort to develop AI agents that can complete complex tasks on their own, similar to OpenAI's ChatGPT Agent and Anthropic's Claude model with "computer use" capabilities. Google claims that its computer use model outperforms leading alternatives on multiple web and mobile benchmarks. Unlike those other tools, Gemini 2.5 Computer Use has access only to a browser, not to an entire computer environment. The model currently supports 13 actions, including opening a web browser, typing text, and dragging and dropping elements. Gemini 2.5 Computer Use is available to developers through Google AI Studio and Vertex AI, and a demo is available on Browserbase, where users can watch the model complete tasks such as playing a game of 2048 or browsing Hacker News.