- Hugging Face has debuted an AI tool to navigate the web on your behalf
- The Open Computer Agent uses a real web browser to perform tasks such as getting directions or reserving tickets
- The agent and its open source demo can see what’s on screen, click buttons, fill in forms and move step by step through tasks like a human
Hugging Face has introduced its own take on the growing number of semi-independent AI agents that can run online errands for people. The new and free (if limited) Open Computer Agent is like having a personal assistant who lives in your web browser.
Part of the company’s ongoing “Smolagents” initiative, the Open Computer Agent can engage with websites and apps the way you would, wielding an invisible mouse and keyboard to carry out requests. The AI can open a browser, type into forms, click buttons and more. Ask it to find directions and it goes to Google Maps, enters the origin and destination and shows you the route like a dutiful digital driver.
You can try it yourself with the live demo. Fair warning: its popularity is causing some delays and errors due to a backlog.
“We launch computer use in Smolagents! 🥳 As vision models become more skilled, they will be able to operate complex agent workflows. Especially Qwen-VL models that support built-in grounding, i.e. …” (6 May 2025)
Agent AI
The Open Computer Agent is another take on an idea that has led to similar tools such as OpenAI’s Operator, Browser Use, Proxy 1.0 and Opera’s Browser Operator. Like these tools, Hugging Face’s AI agent is about being an active participant instead of a passive source of information.
Like Browser Use, Open Computer Agent is open source, which means anyone can see how it works and build on top of it, or at least fine-tune it for niche use cases. The agent is the start of something more flexible, not a finished product with a million legal disclaimers. It also means that the demo is just that, a demonstration, not a polished package. It can get things wrong and may require you to step in for logins and CAPTCHA tests.
Booking tickets, checking shop hours, running searches, looking up directions and clicking through menus are all things that many people would like to be able to do with a single natural language prompt. It’s one thing to ask ChatGPT how to find cheap flights. It’s another to watch a tool go to a travel site, scroll through listings and try to click “Book Now.”
It may be flawed and far from flashy, but Open Computer Agent represents an approach to AI that could become as common as the now-ubiquitous AI image generators.