I’ve spent a lot of time talking to AI. I have tested every voice assistant, every chatbot and every “next generation” conversational AI that tech companies love to hype up. But I have never encountered anything quite like Sesame. This AI is not just good; it is eerily accurate at emulating how people speak, precisely because of the imperfections it mimics.
Let’s start with what Sesame actually is. Unlike the AI voices we’ve gotten to know from ChatGPT and Gemini, or, going back to the early days, Siri and Alexa, Sesame is designed to sound like a human, flaws and all, not like a perfect customer service agent. The AI’s speech is fluent, expressive and unpredictably human. It laughs briefly when it says something mildly amusing, hesitates before answering a question and even seems to change its “mind” mid-sentence, breaking off and starting a new one. Not only does it let me interrupt it, it can also interrupt me, and it will even apologize for having done so.
The secret sauce is Sesame’s Conversational Speech Model (CSM), which handles text and audio in a single process. That means it doesn’t just generate a sentence and then “read it out.” Instead, it creates speech in a way that reflects how people actually talk, with pauses, ums, tone changes and all. The voice options in ChatGPT and Gemini, although impressive, still work in a structured way: they generate text and then convert it to speech. Sesame, on the other hand, speaks as if it is thinking, which makes its answers feel incredibly natural.
I decided to put Sesame through a series of tests to see how human-like it could be. The results were impressive, if a little unsettling.
Talk me through that
I wanted to see if Sesame could handle a real, anxiety-provoking problem, so I gave it an imaginary but realistic scenario: “I have this great opportunity at work, but I’m nervous about messing it up.”
Instead of immediately launching into a generic pep talk, Sesame paused. Just a little beat, but enough to make me feel that it was actually considering what I was saying. Then, in a voice filled with artificial sympathy, it replied: “That’s perfectly normal. Great opportunities can feel overwhelming. What about it makes you most nervous?”
It did not assume anything, and it engaged with me before offering advice rather than leading with something bland. Sesame then suggested: “Have you practiced in front of a friend or recorded yourself? Sometimes it can help you fine-tune your delivery.” The voice is not exactly flawless; the hesitations and other quirks felt a little calculated, but overall it was an impressive effort.
Interruptions
For this one, I wanted to see how well Sesame handled a truly natural, messy conversation, the kind where people talk over each other, topics change suddenly and the conversation circles back to things mentioned ten minutes ago.
I started with, “Tell me a story about a detective who is also a magician.”
Sesame immediately jumped in with a classic noir setup: “Okay. Here’s one. Picture this: a smoky jazz club. Dimly lit. The smell of bourbon and regret hanging in the air. This detective, call him Jack Harrow. He has a fedora at a rakish angle, a cigarette smoldering in a holder and a look that could melt steel. But Jack’s not just any gumshoe. See, he has a secret. He has the touch. A lady walks in, all nerves.”
I then interrupted to ask, “Wait, what is his favorite trick?”
Maya, the Sesame voice I was talking to, answered with, “Mm. Good question. Jack’s signature. It’s a sleight of hand that’s more than meets the eye. He calls it the vanishing witness. He can make a person disappear right in front of your eyes. Just for a moment. Long enough to slip them a note, plant evidence or just make them question what they saw.”
I jumped back in to tell Maya to return to the story, and it said, “Right. Where was I? Oh, yes. The lady slides into the booth opposite Jack, all worry lines and shaking hands. She whispers something about a stolen ruby, a jealous man and a shady figure lurking in the fog.”
The AI continued the story seamlessly and wove that detail in as if I hadn’t just derailed the whole thing. That’s the part that floored me: most AI assistants would stumble or restart, but Sesame’s conversational AI model handled it like a person telling a story at a party.
Party planning
Speaking of parties, for the last test I wanted to see if Sesame could help me plan something useful. I told it, “I want to plan a surprise for my best friend’s birthday, but I don’t know what to do.”
Instead of rattling off a lifeless list of ideas, Sesame asked, “What kind of things does your friend love?” I told it sci-fi movies and adventurous food, and after a thoughtful pause it suggested, “How about a ‘sci-fi & snacks’ evening? You could book a themed escape room and then hit a futuristic-style restaurant, maybe one with molecular gastronomy?”
When I said that I had a budget of $100, Sesame adjusted. “You could do a movie night at home with their favorite sci-fi movie and make themed snacks, like ‘Galactic Popcorn’ or ‘Alien Sliders.’ I can find recipes for you if you want.”
It wasn’t just spitting out generic recommendations. It collaborated with me. And that’s what made it feel different.
What is real?
Sesame didn’t just pass these tests; I would say it aced them. The pauses, the hesitations, the filler words, the way it would change its mind in the middle of a sentence: it was all eerily real. I would start talking, expecting the usual AI smoothness, and then hear a sudden “uh, actually, no, wait, let me rephrase that …” and immediately forget that I wasn’t talking to an actual person.
If AI is this realistic in its speech, would we even know that we were talking to an AI? With Sesame, there are telltale audio glitches that give the game away, but ChatGPT’s Advanced Voice Mode and Google Gemini’s own voice options are good enough to mostly sidestep these problems. Combine their voice quality with Sesame’s speech patterns, and it could become really difficult to tell when you are talking to an AI, at least in short conversations.
Sesame is still niche, but this technology will not stay niche forever. The cliché today is that younger people never make phone calls, but if they start, they may need to find out whether the person on the other end is real before anything else.