AI voices usually aim to be realistic in a friendly way, mimicking relaxed, happy, helpful people. But a new open-source model called Dia leans into the more emotional end of the vocal spectrum, including some really intense screams.
Dia's creators at Nari Labs are a small group, but they have given AI voices the ability to sound like a somewhat melodramatic performer capable of realistic laughter, coughing, throat clearing, sniffing and, yes, screaming.
You may not think screaming is a big deal for AI at this point, but screams are hard to fake. A scream can't just be talking loudly; it's a completely different state of speech.
Emotionally expressive speech is a gap in most AI voices. Reading a bedtime story is easy for a voice model. It is much harder for it to sound like it's trying to reassure a friend, or as if it just saw something shocking. Most commercial models avoid sounding robotic by flattening the tone of voice, which leaves no room for the kind of vocal unevenness that comes with speaking emotionally.
Dia treats non-verbal communication as part of the performance. It knows that "(cough)" is not something to be ignored or read literally. It knows that a scream is not just a louder line. And it performs these things with a level of timing, pitch modulation and breath control that makes them feel more real.
An enterprising user even used it to recreate part of the famous Leeroy Jenkins sketch from World of Warcraft.
That is not to say that OpenAI, ElevenLabs, Google, Sesame and others have not produced amazing AI voice models. You can customize OpenAI's Advanced Voice Mode to speak with different emotions, and ElevenLabs is good at interpreting capitalization and punctuation to adjust speech, but that is not the same as gasping in surprise or wheezing with laughter.
Sesame is especially good at sounding and responding like a real person, but even its models default to cheerful and generally positive demeanors.
Of course, realism is subjective, and you may be able to work out pretty quickly that Dia is an AI voice. Then again, fake screams and laughter are also pretty human sounds to make in the right context.
Two undergrads. One still in the military. Zero funding. One ridiculous goal: build a TTS model that competes with NotebookLM Podcast, ElevenLabs Studio and Sesame CSM. Still … we pulled it off. Here is how 👇 pic.twitter.com/8cfjsegcix (April 21, 2025)
Scream for AI
What makes this a bigger story than just "AI voice learns a party trick" is what it signals about the wider race in AI toward emotional intelligence.
We are quickly entering an era where it will not be enough for your assistant to say the right thing; it will need to say it in the right way. Think customer support bots that sound genuinely sorry, teachers that sound encouraging instead of instructional, and game characters that convey sincerity.
Of course, giving AI the power to emote convincingly makes it more persuasive and thus potentially more manipulative. If emotional speech becomes just another AI tool, more than a few people may want to scream themselves.
Still, I can imagine it being fun to write a ghost story for Dia to not only read but perform, screams and all.