- OpenAI has officially launched its first AI Agent: Operator
- It works in a web browser to complete tasks for you, and it’s out now as a limited research preview
- Operator can make a dinner reservation, fill out a form, and perform other web tasks
OpenAI is always looking for the next big thing to add to ChatGPT, and after months of rumors, including a report earlier this week that teased a launch, the tech giant’s first AI agent is here. Operator is designed to perform web tasks for you, all at the touch of a button.
Operator is essentially a Computer-Using Agent (CUA) that uses GPT-4o’s vision capabilities to browse and search the web. This means it can understand the context of what it needs to search for, and because the model is multimodal, it can interpret what it sees as it searches. It is available now as a research preview for ChatGPT Pro subscribers in the US.
Operator is described as “an agent that can use its own browser to perform tasks for you.” OpenAI released a demo that shows Operator surfing the web the way we humans do. You can ask Operator to book a dinner reservation, fill out a long, cumbersome form, order groceries from a service, or even book a flight. In the demo, it uses OpenTable to find and book a restaurant reservation, and the demo walks through each step.
Operator is a research preview, so keep in mind that it is in its early days and that OpenAI imposes some limitations. We haven’t had the chance to go hands-on yet, but it certainly looks impressive. This is OpenAI’s first foray into the world of AI agents, which is likely to be the theme of the year in artificial intelligence.
OpenAI writes in a blog post announcing Operator that it “is one of our first agent AIs capable of doing work for you independently – you give it a task and it will do it.” This suggests not only that other agents are in the pipeline, which Altman confirmed during the live demo, but also that they are all built around the idea of doing things for you. That is a big step in the quest to make AI more useful and give us some time back.
Operator is powered by the new Computer-Using Agent (CUA) model, which pairs GPT-4o’s vision capabilities with advanced reasoning. Together, these allow Operator to understand and use the elements of a browser: the search bar, various buttons, and on-screen content.
OpenAI explains that Operator “can ‘see’ (through screenshots) and ‘interact’ (using all the actions a mouse and keyboard allow) with a browser,” which lets it use a browser to carry out a task much as a person would. That’s pretty neat, especially if it works with a high success rate, and according to the blog post, it can self-correct.
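To make the see-and-interact idea concrete, here is a minimal toy sketch of that kind of loop in Python. It is not OpenAI’s implementation and does not call any OpenAI API: `ToyBrowser`, `Action`, `choose_action`, and `run_agent` are hypothetical names invented for illustration, the “screenshot” is a small dictionary rather than pixels, and a hand-written rule stands in for the model. The only point is the shape of the cycle: look at the screen, pick one mouse or keyboard action, apply it, and look again, which is also what lets the agent correct a bad state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    kind: str        # "type" or "click"
    target: str      # hypothetical element name, e.g. "search_bar"
    text: str = ""   # text to enter when kind == "type"


@dataclass
class ToyBrowser:
    """Hypothetical stand-in for a browser with a search bar and a button."""
    search_text: str = ""
    results_shown: bool = False

    def screenshot(self) -> dict:
        # A real agent would get pixels; this toy exposes the visible state.
        return {"search_text": self.search_text,
                "results_shown": self.results_shown}

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target == "search_bar":
            self.search_text = action.text
            self.results_shown = False
        elif action.kind == "click" and action.target == "search_button":
            # Clicking only shows results once a query has been typed.
            self.results_shown = bool(self.search_text)


def choose_action(goal: str, screen: dict) -> Optional[Action]:
    """Hand-written rule standing in for the model's decision."""
    if screen["search_text"] != goal:
        # Wrong or empty query on screen: correct it first.
        return Action("type", "search_bar", goal)
    if not screen["results_shown"]:
        return Action("click", "search_button")
    return None  # the goal is visibly satisfied


def run_agent(goal: str, browser: ToyBrowser, max_steps: int = 10) -> List[Action]:
    history: List[Action] = []
    for _ in range(max_steps):
        screen = browser.screenshot()          # "see"
        action = choose_action(goal, screen)
        if action is None:
            break
        browser.apply(action)                  # "interact"
        history.append(action)
    return history


# Start from a bad state so the loop has something to correct.
browser = ToyBrowser(search_text="wrong query")
steps = run_agent("table for two", browser)
print([step.kind for step in steps], browser.results_shown)
```

Because the loop re-reads the screen on every step, the stale query is noticed and replaced before the search button is clicked. The step cap (`max_steps`) is a design choice of this sketch, there to stop the loop if the goal is never reached.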
However, as with most new AI tools and skills, it will likely take some time before this becomes truly useful in the real world. OpenAI will also need to open it up to more people, though for an early research preview it’s still an impressive demo.
For now, if you’re in the US and subscribe to ChatGPT Pro, you can try it out on OpenAI’s website. OpenAI CEO Sam Altman teased that it would eventually arrive in other countries and be added to the ChatGPT Plus subscription. As we saw with some of the announcements from the 12 Days of OpenAI, Europe will probably take a little longer.