OpenAI has 3 new AI voice models that ChatGPT maker says will ‘unlock a new class of voice apps for developers’

OpenAI has launched three new artificial intelligence (AI) models
They are for real-time voice tasks: reasoning, translation and transcription
Each one is designed to be integrated into developers’ AI apps

If you’re a regular ChatGPT user, you might be aware that you don’t need to interact with the artificial intelligence (AI) chatbot solely through text — it can also speak to you and take your voice requests. Now ChatGPT maker OpenAI has announced three new voice models that it believes will “unlock a new class of voice apps for developers.”

Each AI voice model is designed for a different purpose, including in-depth reasoning, translation and transcription. If you’re looking for a voice model along those lines, they might be worth a try.

According to OpenAI, the new models include the following:

“GPT-Realtime-2, our first voice model with GPT-5 class reasoning that can handle more difficult requests and move the conversation along naturally.
“GPT-Realtime-Translate, a new live translation model that translates speech from 70+ input languages into 13 output languages while keeping pace with the speaker.
“GPT-Realtime-Whisper, a new streaming speech-to-text that transcribes speech live as the speaker speaks.”

OpenAI’s news post explains that the company has seen developers use AI voice models in three different ways: by asking the AI to perform a task; by having AI explain a situation (such as a travel delay) to the user; and by having conversations in the user’s local language.

It is these use cases that OpenAI is trying to solve with its new voice models. Each is designed for developers to use in their own apps, and all three are available as part of OpenAI’s Realtime API. GPT-Realtime-2 will cost $32 per one million input tokens and $64 per one million output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
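To put those prices in perspective, here is a rough cost estimator in Python. It is only a sketch: the rates are the ones quoted above, but the function names and the example token counts and call lengths are our own illustrative assumptions, not figures from OpenAI.

```python
# Rates as quoted in the article (USD). Everything else here is illustrative.
REALTIME_2_INPUT_PER_MILLION = 32.0    # GPT-Realtime-2, per one million input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # GPT-Realtime-2, per one million output tokens
TRANSLATE_PER_MINUTE = 0.034           # GPT-Realtime-Translate, per minute
WHISPER_PER_MINUTE = 0.017             # GPT-Realtime-Whisper, per minute


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 cost for a given number of tokens."""
    input_cost = input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost of a per-minute model for a given duration."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workload: the token counts and durations are made up.
    print(f"GPT-Realtime-2: ${realtime_2_cost(250_000, 100_000):.2f}")
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

Actual bills will depend on how many tokens a given conversation consumes, which the announcement does not spell out, so treat the results as ballpark figures.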

A person uses ChatGPT's voice mode on their phone.

(Image credit: OpenAI)

If you’re looking for an AI model capable of deep reasoning and adapting to conversational streams, OpenAI says the new GPT-Realtime-2 option is for you. Developers can use it to check multiple sources at once, adjust its tone depending on user input, use more advanced levels of reasoning, and analyze specialized terms (such as proper nouns and terms used in healthcare and manufacturing).

Translation apps, on the other hand, can use GPT-Realtime-Translate for real-time speech translation. Users will be able to speak in their own language and have it translated and transcribed without delay. This model works with over 70 input languages and 13 output languages.

And if you want the audio to be transcribed quickly and accurately, there is GPT-Realtime-Whisper. This model is useful for creating captions, meeting notes and summaries while conversations are in progress, OpenAI says, meaning “live products can feel faster, more responsive and more natural.”

If you want to try out any of the new models, they are available on OpenAI’s Playground page. And if you use Codex, OpenAI has created a prompt that directly adds GPT-Realtime-2 to the agentic coding platform.
