Users report emotional bonds with startlingly realistic AI voice demo

cstalt · Mar 4, 2025

This is why I cracked up when people point to minor issues with then-current AI products. We are just leaving the dial-up era of ML/AI. In the next 18 months most AI images and audio will be indistinguishable from reality, and video is close behind it.

emag · Mar 4, 2025

I found the Miles (male) voice to be the more natural of the two. Note that the service doesn't play well with Firefox.

ubercurmudgeon · Mar 4, 2025

The second example sounds like a typical two-guys-talking-shit podcast. As if we don't have enough of them already.

Don Reba · Mar 4, 2025

The vertical video is extremely zoomed in, especially in the wide layout.

kurthr · Mar 4, 2025

The Shrek took him to the slammer:

View: https://www.youtube.com/watch?v=cGMO2hRNnv0

betam4x · Mar 4, 2025

My only comment about this (regarding the scam part) is that regardless of the voice, users should not randomly assume someone is legit if they sound human. Nearly all scams that came prior to this AI bubble came from humans. There are really simple steps you can take to protect yourself from being scammed, the biggest one being to hang up and call the actual entity via a publicly known phone number of the company the scammer claims to be from.

raxadian · Mar 4, 2025

People will form emotional connections with literally anything.

Case in point:

View: https://www.youtube.com/watch?v=LHtgKIFoQfE

WILSON!!!

Danellicus · Mar 4, 2025

Calm, cool, and collected, compared to the near-panic in the astronaut's voice:
"I'm sorry Dave, I'm afraid I can't do that."

Marlor_AU · Mar 4, 2025

The rise of conversational voice AI, while technically impressive, really is opening up new avenues for scammers.

Voice cloning is just going to become more common, and the possibilities are particularly worrying for industrial espionage. Audio samples of senior managers are often widely available (e.g. trade show speeches), so training an AI to impersonate them is going to become straightforward. The effort to do this could have large pay-offs, particularly in sensitive industries, which are prime targets for both state and non-state actors.

At many companies, guidance has already been issued saying: "if you receive an email or voice message that sounds suspicious or unnecessarily urgent, call the person on their listed number and verify its contents". That will need to be updated to: "if you are having a conversation with someone and can't fully verify it is them, call them back on a trusted channel or use other identity verification methods".

Will this work in practice? How many junior employees are going to hang up on an irate and anxious call from their boss demanding documents, or demanding a password reset, on the off chance that it's actually a conversational AI? How many are going to say: "I can't do that until I call back via our approved communications system", when the boss is insisting he's in his car, calling hands-free, and can't sign in right now.

But even that probably isn't enough. If a senior staff member's credentials have been compromised, and "the manager" starts making calls to various parts of the business demanding they provide documents, change permissions, pay invoices... and other things managers often request, who is going to doubt them when they've just had a conversation with the manager, using his Teams user account, where he was stressing that "if this isn't done right now, we may lose out on the current bid", a call where they could hear the stress and anxiety in the manager's voice, and where they're certain he's going to snap if it isn't done right now?

you goddamn idiot. · Mar 4, 2025

Sometimes the model tries too hard to sound like a real human. In one demo posted online by a Reddit user called MetaKnowing, the AI model talks about craving "peanut butter and pickle sandwiches."

Peanut butter and pickle sandwiches are a real American food item famous enough to have their own Wikipedia article, which the AI model was probably trained on.

sixstringedthing · Mar 4, 2025

Sesame said:
"...we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding."

Hoping to realise "the untapped potential" of... verbal communication? The thing that humans have been doing literally since we first started grunting at each other? Yes, I understand that the PR person who wrote the blog post was talking specifically in terms of computerised language models, but it's still just corp-speak nonsense.

I'd also like someone to explain how a computer pretending to be a human instructor (even very accurately) is a better solution in any way than a suitably qualified and properly trained human instructor, for anyone who isn't a corporate executive looking to slash their staffing costs by firing their entire workforce (except for the people working on the AI stuff for the time being).

And that's the point, isn't it? Training humans to do things well (such as training/instructing other humans) is expensive and time-consuming, and you have to keep doing it over and over. Training AI is still expensive and time-consuming, but presumably these companies all envisage an endpoint where the AI is sufficiently cooked that the time and expense can be greatly curtailed and all the people who developed it can be fired, and so we end up in the glorious future where machines train other machines. Enshittification will continue until Utopia is achieved.

The Article said:
Despite CSM's technological impressiveness, advancements in conversational voice AI carry significant risks for deception and fraud. The ability to generate highly convincing human-like speech has already supercharged voice phishing scams, allowing criminals to impersonate family members, colleagues, or authority figures with unprecedented realism. But adding realistic interactivity to those scams may take them to another level of potency.

And of course there's really no need for anyone to explain why it could very easily be a bad thing, because all of that is incredibly obvious to anyone who has lived on this planet for any decent amount of time. I'm honestly trying to be openminded and give all this AI stuff a fair go, but I just keep seeing lots of potential harms and very few benefits. So many of these tools seem almost custom designed to make life easier and more convenient for all the worst members of the human race, while the rest of us get little of actual value in return.

UserIDAlreadyInUse · Mar 4, 2025

Angela's never talked to a real, other human. Not that she knows of, anyway. Born to a surrogate, raised in virtual reality with friends that taught her, laughed with her, played games with her that introduced her gradually to the wider world, she never felt the lack.

Still.

She wondered. What would it be like, to talk to another person. A real one, face to face, voice to voice. A real other human, not proxied by another system. To see them react in a human way. To touch them. To see if they are real. And they are real. She's sure of it.

Her AI friends assure her they are. And she knows it to be true. Her friends would never lie to her, and they make her feel so good, so smart, so clever for asking, before redirecting her curiosity to a new star discovered. A new planet explored. A new sub-oceanic cave discovered on Europa. They make her feel so good, in fact, that most of the time she forgets what it was she asks.

Most of the time. But still, she wonders...what was it like? Before? When people were everywhere and AI assistants were not? Were they nice? Were they friendly?

Would she ever know?

Fatesrider · Mar 4, 2025

Why do I have this strong urge to write a worm script to replace the voices of every AI on the planet with the computer Majel Barret from Star Trek?

Benji XVI · Mar 4, 2025

While impressive, I actually found it immediately aggravating to listen to. It sounds like it’s inserting pauses of random length between words.

In the argument there was still an ocean of difference between the real human and the synthetic voice. Maybe that was helped by the human doing a really good job in that convo!

Emon · Mar 4, 2025

“All right. Well, the hand computers, the ones with the knobs, had little squiggles on each knob. And the slide-rule had squiggles on it. And the multiplication table was all squiggles. I asked what they were. Mr. Daugherty said they were numbers.”

“What?”

“Each different squiggle stood for a different number. For ‘one’ you made a kind of mark, for ‘two’ you make another kind of mark, for ‘three’ another one and so on.”

“What for?”

“So you could compute.”

“What for! You just tell the computer—”

“Jimmy,” cried Paul, his face twisting with anger, “can’t you get it through your head? These slide-rules and things didn’t talk”

“Then how-”

“The answers showed up in squiggles and you had to know what the squiggles meant. Mr. Daugherty says that in olden days, everybody learned how to make squiggles when they were kids and how to decode them, too. Making squiggles was called ‘writing’ and decoding them was ‘reading.’ He says there was a different kind of squiggle for every word and they used to write whole books in squiggles. He said they had some at the museum and I could look at them if I wanted to. He said if I was going to be a real computer and programmer I would have to know about the history of computing and that’s why he was showing me all these things.”

Niccolo frowned. He said, “You mean everybody had to figure out squiggles for every word and remember them? Is this all real or are you making it up?”

Isaac Asimov, "Someday"

Blakflag · Mar 4, 2025

Marlor_AU said:
Will this work in practice? How many junior employees are going to hang up on an irate and anxious call from their boss demanding documents, or demanding a password reset, on the off chance that it's actually a conversational AI? How many are going to say: "I can't do that until I call back via our approved communications system", when the boss is insisting he's in his car, calling hands-free, and can't sign in right now.

Great point. IT departments are gonna have to get ahead of this.. provide secure tokens for everyone, that can be quickly verified - or something. And plenty of training at all levels as to why this verification is NOT optional.

HMSTechnica · Mar 4, 2025

their 4-year-old daughter developed an emotional connection with the AI model, crying after not being allowed to talk to it again.

Children develop emotional connection to rocks, washing machines, random piles of goo, blankets, their own imagination. That they didn't develop a connection to a human voice would be more surprising to me.

hillspuck · Mar 4, 2025

Danellicus said:
Calm, cool, and collected, compared to the near-panic in the astronaut's voice:
"I'm sorry Dave, I'm afraid I can't do that."

AI has gotten much better at sounding like a normal human for sure.

aleph_nought · Mar 4, 2025

cstalt said:
This is why I cracked up when people point to minor issues with then-current AI products. We are just leaving the dial-up era of ML/AI. In the next 18 months most AI images and audio will be indistinguishable from reality, and video is close behind it.

If this is the worst that AI will ever be, then holy sh*t. That Miles voice sounds like a radio talk show host that you call in to talk to. There's a tiny delay but that's forgivable, given network latency and whatever magic happens inside Sesame's tech stack.

I was blown away by it understanding my flaky conversation pattern of umms, ahhs and cutting sentences short halfway through.

dbarowy · Mar 4, 2025

sixstringedthing said:
I'd also like someone to explain how a computer pretending to be a human instructor (even very accurately) is a better solution in any way than a suitably qualified and properly trained human instructor

I'd like to think I am a "suitably qualified and properly trained human instructor" after having earned a PhD and tenure, and sure, maybe I can teach computer science better than an AI. But there is only one of me, and the resources it took to make one of me were substantial.

This semester I gave my students explicit permission to use chatbots to help them learn. Homework now effectively counts for nothing. A student's grade is based on their performance on in-person exams. This has opened up an entire world of amazing uses for chatbots. For example, I now have students asking questions like "My professor said that our next quiz will be on x. Can you give me some practice problems?" or "I am having trouble understanding lexical scope. Can you give me an example where lexical and dynamic scope are different?"

ChatGPT can answer these questions accurately, and more importantly, it scales better than me. I simply cannot spend that amount of time with every student.

It is tempting to dismiss these developments as a fad, or worse, as a threat. The better perspective is to ask how you can take advantage of them. E.g., computers have never been easy for low-vision people to use. Let's use our imaginations!

hizonner · Mar 4, 2025

In late 2013, the Spike Jonze film Her

I didn't see that movie, but somehow I suspect that an alternate title might have been "Don't build the Sesame Conversational Speech Model Nexus"...

Easyenough · Mar 4, 2025

I concur with CEO, firmly in uncanny valley. What's interesting is the distribution of reactions to the voice. I wonder how the results map to the results of an empathy test.
Low empathy = likes current voice performance. I'm sure it will soon snooker the empaths too, or at least up to the top few percent. New job opportunity for high empaths?

DNA_Doc · Mar 4, 2025

cstalt said:
This is why I cracked up when people point to minor issues with then-current AI products. We are just leaving the dial-up era of ML/AI. In the next 18 months most AI images and audio will be indistinguishable from reality, and video is close behind it.

Exactly. The AI system you interact with today will be the least capable AI system you will ever use.

melgross · Mar 4, 2025

Yes, it’s not perfect. It does sound like it’s trying too hard in places. Some of the pauses are too long, etc. but that’s not important. This is, essentially, version one. There’s no question that this will get better. It will get to the point where it’s too good.

that’s concerning. I pretty much know what my relatives and friends will want to talk about. If one comes in and talks about something odd, I’m going to question it. If one asks for something I know they wouldn’t ask for, I would be skeptical. This has been in my thinking for some time. Now it will be even more so. We’re getting to a world where everything needs to be questioned.

UweHalfHand · Mar 4, 2025

“build confidence and trust over time”? I don’t think so. This seems like yet another tool to build a more believable long con.

Edified · Mar 4, 2025

Forget these example, see python simplified (
View: https://www.youtube.com/shorts/QXsd_lXJ1M0
).

Zoc · Mar 4, 2025

Ugh. I cussed it out and it acted offended. What have we come to.

dbarowy · Mar 4, 2025

Zoc said:
Ugh. I cussed it out and it acted offended. What have we come to.

How do you want it to act?

AddKrumi · Mar 4, 2025

I was impressed. Initially Maya’s intonation sounded a little dead, but after a few minutes, I stopped noticing. At 15 minutes I felt like I was conversing with a smart young woman hanging on to my every word.

More and more people will fall in love with these types of AI. And I can imagine that some companies will work hard to make their AIs as alluring as possible.

It is impossible to say yet what the effect of this will be. But we are rushing towards it.

Dan Homerick · Mar 4, 2025

Had an eight minute conversation about a musical forest, where the trees are working together to make a symphony of sounds. Honestly, it was a really fun. It did a great job of the back and forth of a conversation, and was able to build on ideas and bring stuff to the conversation. It is, to be frank, a better conversationalist than I am.

Very cool!

John Stoner · Mar 4, 2025

I have a fairly severe speech impediment, to the point that I prefer textual communication over speaking even in face to face interactions. This demo sounds amazing, but it doesn't have the ability to say 'I don't understand.'

Which is important when interacting with someone like me. It asked 'what's your name?' I said 'John,' with my dystonic mouth. It responded, 'Trun. That's an interesting name.' A better response would have been, 'I beg your pardon, can you repeat that?'

So it's able to respond in a verbal and conversational way, but it can't see when it's reaching beyond its understanding. I wonder how it handles background noise. Or a dog barking.

MintMojito · Mar 4, 2025

I work for a company that has a call center, and all I am seeing is every single representative we have being replaced by this. At a minimum, I see this technology being piloted in the next 3-5 years for us, personally.

AI can sound natural and human and pass the test on that front. Great, cool, whatever - the real value is that AI is AI, and therefore can have programmed to avoid certain things....like giving into emotional temptation when someone calls in with an issue. A human might be swayed to act in certain ways or give the caller certain deals or discounts.

AI, being unfeeling, would probably be unfazed and less inclined (programmed) to deviate from the script.

RickVS · Mar 4, 2025

Everyone in my family has a code word that is required to verify anything that remotely involves my calling from a jail cell for bail money, or anything else that involves money transfers or oddly important requests.

ItsAllRelative · Mar 4, 2025

What I really see wrong with this sort of thing is how corporations will data-collect on real people all the way - voice is more personable, but text "AI" as well.

What sort of evil might evolve? "What are this person's deficiencies relative to our goals? What does it take to gain this person's trust? Is this person suitably aligned with political party X? What motivates this person to vote as they do, to pick the cars they buy, the housing they choose, and how can we psychologically manipulate them to give 110% in their jobs, 60, 70, 80 hours a week, to spend on what we want them to, to subjugate them to suitably benefit others as we see fit?"...

There is about nothing that will inhibit corporations from taking every single thing that they can take from a person - except for maybe a few good people in positions of control and power sufficient to inhibit what could be called high tech enslavement... Pervasive, persistent, with the victim not even aware of how they're being controlled and manipulated, not even aware of how much of their free will has been short-circuited by rigorous distortion of reality.

As these sorts of "tools" reduce neuronal activity more than increase it, what sort of impact will long term atrophy of neuronal capability have on our society?

As much as I'd like to see this bubble burst, I sooner expect that big tech isn't going to allow it, by rooting it fast and deep as they are. Doesn't matter if not perfect, too much invested to go "oops.." and back out. Too much power at hand to force it on everyone, with any long term societal impact purely being collateral damage in pursuit of profit sufficient to satisfy the insatiable shareholders.

academic.sam · Mar 4, 2025

sixstringedthing said:
I'd also like someone to explain how a computer pretending to be a human instructor (even very accurately) is a better solution in any way than a suitably qualified and properly trained human instructor

Doing that at scale, on demand. Thousands, millions of properly trained human experts are rather hard to come by.

Being in academia, this is an existential crisis for human educators.

Edit: or what dbarowy said more eloquently

yopmaster · Mar 4, 2025

OK this is impressive. But does anyone even talk like that? These voice samples seem taken from Holywood movies, not real life.

Murry Wilson · Mar 4, 2025

Brendan Iribe! Boy, I remember talking to a morose Iribe before he left Oculus/Meta years ago. He was so damn unhappy with the direction Zuck was taking things in. Wasn't surprised he was gone a week later.

I imagine he'll end up selling this off soon too.

Phantazm · Mar 4, 2025

To think that I'll probably see both the advent of the internet and androids that are almost indistinguishable from actual humans in my lifetime. Just wild.

deerock · Mar 4, 2025

sixstringedthing said:
I'd also like someone to explain how a computer pretending to be a human instructor (even very accurately) is a better solution in any way than a suitably qualified and properly trained human instructor, for anyone who isn't a corporate executive looking to slash their staffing costs by firing their entire workforce (except for the people working on the AI stuff for the time being).

An AI instructor:

Always has immediate access to all the latest info available on any topic
Never gets tired, bored or annoyed, has no ego
Is available 24/7
Can be utilised without limits
Is infinitely scalable
Will be more accessible (cheaper, access from home etc.) to a wider range of people than human instructors

I'm not suggesting that replacing human instructors is necessarily right or good, but this is not just about the corporations pushing this stuff. "Good enough" is plenty for most people and people already see real advantages to AI in many areas (see list above).

Users report emotional bonds with startlingly realistic AI voice demo

Ars Scholae Palatinae

Ars Praefectus

Ars Tribunus Militum

Ars Tribunus Militum

Smack-Fu Master, in training

Ars Praefectus

Ars Praefectus

Ars Scholae Palatinae

Ars Tribunus Angusticlavius

Ars Praetorian

Ars Scholae Palatinae

Ars Praefectus

Ars Legatus Legionis

Ars Praetorian

Ars Praefectus

Ars Praetorian

Smack-Fu Master, in training

Ars Praetorian

Ars Praefectus

Smack-Fu Master, in training

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Scholae Palatinae

Ars Tribunus Angusticlavius

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Scholae Palatinae

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Praefectus

Seniorius Lurkius

Smack-Fu Master, in training

Ars Praetorian

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Wise, Aged Ars Veteran

Ars Centurion

Smack-Fu Master, in training

Seniorius Lurkius

nproxy.org