SHANKAR VEDANTAM, HOST:
This is HIDDEN BRAIN. I'm Shankar Vedantam. At some point in our lives, many of us realize that the way we hear our own voice isn't the way others hear our voice. Shay (ph) had that realization as a child helping out at the family business, a deli in south West Virginia.
SHAY: We did a lot of business with the deli, a lot of call-in orders, and I had to answer the phone with the name of the business.
VEDANTAM: Often when the phone rang, the same thing happened over and over again.
SHAY: I vividly remember always being confused for my mother. They would always say, hey Judy (ph), you know, and either start into their questions or whatever they were looking to speak to my mother about.
VEDANTAM: This case of mistaken identity became a running joke.
SHAY: You know, it was ha-ha, they thought that he was Judy.
VEDANTAM: Shay didn't correct the callers. That's because Shay didn't mind being mistaken for Judy.
SHAY: It was just comforting to me because it felt natural.
(SOUNDBITE OF MUSIC)
VEDANTAM: Shay was raised as a boy. But now, decades later, Shay identifies as a transgender woman. We're not using her legal name at her request because it's a man's name. Shay's experience at the deli became a template for the rest of her life. She listened to her voice, and she listened to the way others heard her voice. There was always a gap between the two. She first tried to sound more masculine to fit in with the way the world saw her.
SHAY: So I would consciously make an effort to try to talk a little deeper. It was - you know, I practiced it.
(SOUNDBITE OF SONG, "ROADHOUSE BLUES")
THE DOORS: (Singing) And I woke up this morning, I got myself a beer.
VEDANTAM: One way she practiced was to sing along with The Doors, Tool and Nine Inch Nails.
SHAY: I tended to sort of go towards heavier music. You know, raspy, deep - yelling, almost - voices.
(SOUNDBITE OF SONG, "ROADHOUSE BLUES")
THE DOORS: (Singing) Let it roll all night long.
VEDANTAM: Sounding more masculine became second nature. But it wore on her.
SHAY: My entire life, I have been playing the role of a boy. And it is exhausting. It truly is.
VEDANTAM: Years and years passed like this. A divorce, a second wife, two kids and a cancer scare later, Shay began to reconsider how she wanted to sound. Instead of trying to sound more masculine, she now started to try to sound more feminine.
SHAY: Even prior to accepting that I was trans, before I could put a label on what I was, I consciously made an effort to not sound as masculine, and that started in my early 30s.
VEDANTAM: Once again, she used music as a way to practice.
(SOUNDBITE OF BRITNEY SPEARS SONG, "OOPS!...I DID IT AGAIN")
SHAY: I would always sing in the car alone. And I would attempt Britney Spears.
(SOUNDBITE OF SONG, "OOPS!...I DID IT AGAIN")
BRITNEY SPEARS: (Singing) I think I did it again. I made you believe we're more than just friends. Baby.
SHAY: It seems silly, but I would, when I'm singing in the car, I would sort of turn my head towards my driver's side window...
(SOUNDBITE OF SONG, "OOPS!...I DID IT AGAIN")
SPEARS: (Singing) 'Cause to lose all my senses.
SHAY: ...Because it would reflect the sound back to me a little more loudly so that I could hear the pitch and tone of my voice.
(SOUNDBITE OF SONG, "OOPS!...I DID IT AGAIN")
SPEARS: (Singing) Oops. I did it again. I played with your heart.
SHAY: And I would try to make my voice sound at a higher pitch without it sounding like I was trying.
(SOUNDBITE OF SONG, "OOPS!...I DID IT AGAIN")
SPEARS: (Singing) Love, and I'm sent from above. I'm not that innocent.
VEDANTAM: What did you hear? Did you hear the voice that you wanted to hear?
SHAY: No. No. I've never - I don't know that I have ever actually been able to hear the same voice that I hear come from me.
VEDANTAM: Shay has spent a lifetime being dissatisfied with the way she sounds. She viscerally knows something the rest of us often forget - our voices shape who we are. They shape how other people think of us.
(SOUNDBITE OF CROSSTALK)
VEDANTAM: This week on HIDDEN BRAIN, we look at the relationship between our voices and our identities.
RUPAL PATEL: Voice is about who you are. Our voice signals things about our personality.
VEDANTAM: Plus how technology might help people with vocal impairments find voices that reflect who they are...
(SOUNDBITE OF ARCHIVED RECORDING)
LONNIE BLANCHARD: Once you close your eyes and let your mind relax, it doesn't take long to escape to the beautiful beach.
VEDANTAM: ...And the ethical quandaries that arise when we can create personalized, customized voices.
(SOUNDBITE OF ARCHIVED RECORDING)
COMPUTER-GENERATED VOICE #1: (As Donald Trump) This is huge. They can make us say anything now - really, anything.
(SOUNDBITE OF MUSIC)
VEDANTAM: Jackie Kirk (ph) used to love the sound of her voice. She spent her 20s in San Francisco. And like many young people living in the big city, she enjoyed going out with her friends. She danced to electronic music at clubs and drank at bars. She was outgoing, and she was a flirt.
JACKIE KIRK: I have to admit I've always enjoyed flirting. (Laughter). And so I was quite the flirter. (Laughter). It was sort of a fun activity to help pass the day - you know, doing something mundane, you know, at work.
VEDANTAM: For Jackie, flirting was also a demonstration of her confidence.
KIRK: It reinforces my own identity, how I felt about myself - fun, I'm somebody that people are attracted to, not just physically or sexually, but a person who people like.
(SOUNDBITE OF MUSIC)
VEDANTAM: Jackie liked to be liked. She liked being someone people wanted to be around. When she thinks about who she was back then, she refers to that person as Voice One Jackie.
KIRK: Voice One Jackie (laughter) was really fun-loving, always joking, pretty carefree.
VEDANTAM: For years, Jackie had been doing backpacking trips with her then-boyfriend.
KIRK: All of our gear on our back. Fifty pounds, 60 pounds, et cetera. Even carrying bottles of wine, (laughter), you know, in the backpack.
VEDANTAM: She'd been doing these backpacking trips for several years. But during a trip to a national forest in California, Jackie came up short.
KIRK: I couldn't go one step further. I - you know, and being young, you think, you know, what could it be? There's nothing wrong with me. I'm 20, whatever, 20-something years old. And I notice I'm getting really short of breath, and it was a real struggle. Yeah. I finally made it, of course, but it was really slow and a real struggle to make it back.
VEDANTAM: Hiking became too taxing for her, so she cut back and switched to ballet. But one day during a series of relevés, a more aerobically challenging dance move, Jackie felt lightheaded and dizzy. It was serious.
KIRK: I had a seizure. And, you know, all of the medical follow-up led me to discover that I had a lung disease.
(SOUNDBITE OF MUSIC)
VEDANTAM: She was diagnosed with idiopathic pulmonary hypertension. It's a rare, progressive disease where the blood vessels in the lungs shrink and oxygen is not distributed properly.
KIRK: There's no cure for it except for a lung transplant.
VEDANTAM: In 2008, at 32, Jackie received a double lung transplant. The surgery was successful. Within weeks, she was out of the hospital and back in the dance studio. In 2010, she left San Francisco to explore Latin America and Europe, but her body began to reject her new lungs. Before long, Jackie was back in the hospital, this time in Switzerland, waiting for another lung transplant.
KIRK: I had the surgery in January of 2013, and I was asleep still for another month and a half.
VEDANTAM: When she woke up, she found her surgery had been successful, except for one very important thing.
KIRK: I couldn't speak. I couldn't speak.
VEDANTAM: During the operation, the medical staff had used a ventilator to help her breathe.
KIRK: So I was intubated. That means they basically have a tube that they put inside your mouth, and it goes down your throat. And they send that down into your lungs. And during this intubation, the tube was rigid enough to cause some damage to my vocal cords.
VEDANTAM: Jackie began speech therapy, and within a few weeks she slowly regained the ability to speak. But the voice that came out of her mouth, it wasn't her voice.
KIRK: My voice changed. It's raspier. It's broken a bit.
VEDANTAM: Ever since, speaking has been hard work.
KIRK: Yeah. I really have to push my - I feel it. It's actually a physical effort. Like, I'm actually squeezing the vocal cords as hard as I can to make the loudest sound possible to get to be heard. (Laughter). And it's very tiring.
VEDANTAM: The harder it became to produce sound, the more self-conscious she became about the way she sounded.
KIRK: I feel less confident. I'm aware of how people might perceive me. So I'm a little more shy. I don't approach people like I used to.
VEDANTAM: Jackie believes the change in her voice has led to a dramatic change in her personality. For much of her life, her voice was a manifestation of her confidence.
KIRK: I used to go to clubs quite a bit, you know? But, you know, when you have a normal voice, you can still talk to people in those environments where it's kind of loud and noisy, or bars to meet friends or to flirt, (laughter), like I like doing. But, you know, those places now, you know, I don't really go to anymore.
VEDANTAM: Jackie, a woman who once described herself as carefree and outgoing, who took pride in her ability to flirt, became withdrawn, reserved.
KIRK: You know, I have tons of scars all over my body, and that plays on my confidence as well. But in public life, people can't see those scars. And I feel like my voice is that, you know, that scar they can hear, you know? They know something's wrong and that they know, oh, maybe she's weak, maybe she's sick, just by hearing my voice. It's this signal.
VEDANTAM: Our voices communicate so much more than mere information. They communicate our feelings, our temperament, our identity. When we come back - how scientists are weaving this insight into custom-built voices.
MAEVE FLACK: I can't wait for my friends to hear my new voice.
(SOUNDBITE OF MUSIC)
VEDANTAM: Scientists have been trying for more than two centuries to analyze the human voice, decode its components and recreate it. An early success came from a man named Homer Dudley. He developed an organ-like machine that he called the voder. It worked using special keys and a foot pedal and was capable of creating about 20 different electronic buzzes and sounds. When those sounds were combined, they formed words. The voder fascinated people at the 1939 World's Fair in New York.
(SOUNDBITE OF ARCHIVED RECORDING)
UNIDENTIFIED PERSON #1: Well, we've heard the voder make a word. And by combining words, of course, we got a sentence. For example, Helen, will you have the voder say, she saw me?
VODER-GENERATED VOICE: She saw me.
UNIDENTIFIED PERSON #1: That sounded awfully flat. How about a little expression? Say the sentence in answer to these questions. Who saw you?
VODER-GENERATED VOICE: She saw me.
UNIDENTIFIED PERSON #1: Whom did she see?
VODER-GENERATED VOICE: She saw me.
UNIDENTIFIED PERSON #1: Well, did she see you or hear you?
VODER-GENERATED VOICE: She saw me.
VEDANTAM: The voder was an early example of electronic speech, but it was cumbersome to operate and required special training. Over the next 40 years, speech scientists continued studying the components of the human voice. They eventually developed methods to mathematically map the acoustic patterns and phonetic properties of natural speech - vowels, syllable constructions, consonants.
(SOUNDBITE OF MONTAGE)
COMPUTER-GENERATED VOICE #2: Welcome to the Stockholm speech communication seminar.
COMPUTER-GENERATED VOICE #3: Hello, I (unintelligible) machine. Welcome to Mid-Manhattan Library.
COMPUTER-GENERATED VOICE #4: (Singing) A, B, C, D, E, F, G...
COMPUTER-GENERATED VOICE #5: To be or not to be, that is the question.
COMPUTER-GENERATED VOICE #6: I can read stories and speak them aloud. I do not understand what the words mean when I read them.
COMPUTER-GENERATED VOICE #7: This is such a beautiful (unintelligible). You are listening to the voice of a machine.
COMPUTER-GENERATED VOICE #4: (Singing) When we know our ABC's.
VEDANTAM: By the '80s, speech synthesis was no longer the stuff of science demonstrations at shows and fairs.
COMPUTER-GENERATED VOICE #8: Text-to-speech systems are beginning to be applied in many ways, including aids for the handicapped, medical aids and teaching devices. The first kind of aid to be considered is a talking aid for the vocally handicapped.
VEDANTAM: The research of Dennis Klatt at MIT paved the way for the voices we might be familiar with today, many of them used in assistive communication devices.
BETTY: I am beautiful Betty, the standard female voice.
HARRY: I am huge Harry, a very large person with a deep voice.
PAUL: I am the standard male voice, perfect Paul.
VEDANTAM: This last one, by the way, became famous after Stephen Hawking adopted it. Speech technology has come a long way in the years since Homer Dudley unveiled the voder, but in many ways, synthetic voices still sounded synthetic. They didn't convey all the information that's packed into the human voice.
PATEL: Voice is identity, right? Voice is about who you are. Our voice signals how old we are. Our voice signals our gender. Our voice signals, you know, things about our personality.
VEDANTAM: Rupal Patel is a speech scientist at Northeastern University. Perhaps more than many people, she has thought a lot about the human voice. When she misses her mother, for instance, Rupal has a special technique to evoke her presence.
PATEL: That's right. My parents now live in LA and I live here in Boston. And oftentimes, I find myself imitating my mom. You know, I'll say, oh, beti (ph) how are you today, you know, or something like that. I'll imitate her the way she might say something. I might say that the same way to my daughter or something like that. But I - what I'm evoking is my mother's voice, primarily, to feel the closeness of her here.
VEDANTAM: In 2002, Rupal took these ideas with her to Denmark where she was scheduled to speak at a conference for researchers and patients.
PATEL: I was presenting some of my early work showing that individuals with very severe speech disorder still have the ability to make sound, and those sounds have some communicative content in them, some information that could be used.
VEDANTAM: After her presentation, she walked out to the exhibit hall, and that's where she noticed something. Lots of people were using devices that produced synthetic voices. What was odd was that many of the voices didn't seem to match the people using them.
PATEL: And at that point back in 2002, we had very limited synthetic voice options available. And so, you - I heard a little girl, or a young girl, using a device to talk with an adult male voice and having a conversation with another person, a middle-aged man, who also was using the same voice. And so they're using different devices, but their voices were identical.
VEDANTAM: She had just presented on the idea that our individual voices carry something unique about us. So why was this not reflected in these synthetic voices?
PATEL: Why are we giving them the same black box to speak through? There's got to be something that we can do that we can harness the quality of the voices that they have and imprint those or use that to give them a prosthetic voice that somehow reflects who they are and not just the same voice for everyone.
VEDANTAM: Could a synthetic voice capture the richness of natural human speech? Rupal launched a company to answer this question. It's called VocaliD, and it uses machine learning and other artificial intelligence technologies to create personalized voices.
PATEL: So what synthetic speech is is taking recordings of anyone and then taking those recordings and building a model of the voice quality of the enunciation abilities, right? You aren't necessarily analyzing it from a top down, saying, well, this person has a high-pitched voice or this person has a low-pitched voice. You're taking the recordings as basically the raw ingredients to feed to a machine-learning algorithm or set of algorithms, really. And those are - they're learning the patterns of the clarity of the person's S, the - you know, how that sound is changed in the different phonetic environments, the voice qualities aspects of - you know, all of these are learned by the machine. It's really re-emulating the human voice by a machine.
(SOUNDBITE OF MUSIC)
VEDANTAM: In other words, the idea is to build a model of how a person sounds. To do so, you use a vast range of examples of that person's speech. Then you use the model to produce spoken language that incorporates all the idiosyncrasies and texture of that person's voice. One of Rupal's early clients was a young girl, Maeve Flack.
PATEL: Maeve was born with cerebral palsy. She's in a beautiful family where she has two other older sisters.
VEDANTAM: Rupal's goal was to give Maeve her own unique synthetic voice, one that could express not just her words but her identity. The first step was to record her.
MAEVE: (Unintelligible) (Laughter).
PATEL: So those sounds were the kinds of sounds that are unique to Maeve. Those are the sounds that she makes where if - you know, when she's in a classroom with several other kids who also have communication disabilities, when she makes that sound, you know it's Maeve speaking. So we harnessed those sounds of Maeve's to create her unique voice for her.
VEDANTAM: Then Rupal turned to Maeve's older sisters, Erin and Meghan, who volunteered to record their voices so they could be blended with Maeve's.
UNIDENTIFIED PERSON #2: Ice cream is my guilty pleasure.
UNIDENTIFIED PERSON #3: That man ran fast.
VEDANTAM: Erin and Meghan read hundreds of sentences and phrases and uploaded them to a website for Rupal. Like a painter mixing a palette, Rupal took elements of Maeve's voice and mixed them with those of her sisters and other vocal donors to create what she calls a bespoke voice.
MAEVE: I can't wait for my friends to hear my new voice. My parents are really happy I'm not addicted to Fortnite. I want to meet Taylor Swift.
PATEL: So we're hearing, you know, Maeve at this age in terms of her sound as well as her siblings' recordings being combined and being produced through this speech synthesis engine.
VEDANTAM: It's possible that Maeve may decide as she gets older that her voice needs to age with her. She'll need a new bespoke voice at that point. The same technology can also be used to preserve a person's existing voice. Sometimes this is done when a person faces the prospect of losing his voice.
PATEL: These could be individuals who are losing their voice to degenerative conditions - so slowly their voice is changing - such as ALS or Parkinson's disease and then those who the trauma is actually far more pronounced for individuals with something like head and neck cancer where they learn that they're going to have their voice box removed within a couple of weeks.
VEDANTAM: Lonnie Blanchard confronted this traumatic news in 2018. Doctors had diagnosed him with cancer and said surgery was the only option. Lonnie had to have his tongue removed. Here he is speaking to the BBC.
(SOUNDBITE OF ARCHIVED RECORDING)
BLANCHARD: Now that I know I'm going to lose my voice, I got to get some things down on a personal recorder to get what I would normally say to my wife and kids. But every time I go to do that, I draw a blank.
VEDANTAM: By the time Lonnie started working with Rupal, he only had a few weeks to bank his voice.
PATEL: We helped him get set up in terms of the microphone he would need, and things like that.
VEDANTAM: Rupal worked with Lonnie to build a database of sound samples before his surgery. He recorded sentences that gave Rupal and her colleagues the different kinds of sounds they would need to build a new voice.
(SOUNDBITE OF ARCHIVED RECORDING)
BLANCHARD: I wish we could get acquainted. I'm going to be a teacher when I grow up.
VEDANTAM: After Lonnie banked his voice, Rupal used the recordings to create a personalized voice for him. The difference between his voice and Maeve's voice is that Rupal didn't need to blend voices from donors. Lonnie was his own donor.
PATEL: Those voice samples then are used, are cleaned up and annotated by machine, actually, and then used to feed into the algorithms we have to create the synthetic voice.
VEDANTAM: Similar to Maeve, Lonnie uses an assistive device - in his case, an iPad. He can type out what he wants to say and hear his voice speaking to his family.
(SOUNDBITE OF ARCHIVED RECORDING)
BLANCHARD: Once you close your eyes and let your mind relax, it doesn't take long to escape to the beautiful beach.
PATEL: It's really empowering. It's continued to be a way that he can connect to family members and feel that part of him is not fully lost.
(SOUNDBITE OF MUSIC)
VEDANTAM: While most of us will never have the experience of losing our voices and having to obtain synthetic voices as replacements, increasingly, many of us are coming into contact with these voices.
UNIDENTIFIED PERSON #4: Hey, Siri.
UNIDENTIFIED PERSON #5: Hey, Google.
UNIDENTIFIED PERSON #6: Alexa?
UNIDENTIFIED PERSON #4: How many ounces are in a cup?
ALEXA: One cup is eight fluid ounces.
UNIDENTIFIED PERSON #6: OK, Google.
UNIDENTIFIED PERSON #4: Hey, Siri. Set a timer.
SIRI: For how long?
UNIDENTIFIED PERSON #5: Fifty-six minutes.
SIRI: OK.
ALEXA: Sure.
SIRI: Fifty-six minutes, starting now.
UNIDENTIFIED PERSON #6: Alexa?
UNIDENTIFIED PERSON #5: Hey, Google?
UNIDENTIFIED PERSON #6: Can you play music?
UNIDENTIFIED PERSON #5: Play some jazz.
SIRI: Here's a station you might like.
(SOUNDBITE OF MUSIC)
VEDANTAM: Synthetic voices are already changing our lives, and it's likely we're going to become even more reliant on them. In May 2018, Google revealed a new program it was working on. CEO Sundar Pichai presented it to an audience of software developers. The technology is called Google Duplex. It allows you to make a restaurant reservation through a voice assistant.
(SOUNDBITE OF ARCHIVED RECORDING)
UNIDENTIFIED RESTAURANT HOST: How may I help you?
GOOGLE DUPLEX: Hi. I'd like to reserve a table for Wednesday the 7.
UNIDENTIFIED RESTAURANT HOST: For seven people?
GOOGLE DUPLEX: It's for four people.
UNIDENTIFIED RESTAURANT HOST: Four people? When? Today? Tonight?
GOOGLE DUPLEX: Wednesday at 6 p.m.
UNIDENTIFIED RESTAURANT HOST: Actually, we reserve for upwards five people. For four people, you can come.
GOOGLE DUPLEX: How long is the wait, usually, to be seated?
UNIDENTIFIED RESTAURANT HOST: For when? Tomorrow, or a weekday, or...
GOOGLE DUPLEX: For next Wednesday, the 7.
UNIDENTIFIED RESTAURANT HOST: No. It's not too busy. You can come for four people. OK?
GOOGLE DUPLEX: Oh - I gotcha. Thanks.
UNIDENTIFIED RESTAURANT HOST: Bye-bye.
(LAUGHTER, APPLAUSE)
VEDANTAM: The audience is laughing and applauding because the man making the call isn't a man but a machine.
PATEL: It didn't seem like it was a robotic voice. The robotic voices we're used to are the voices like when you are in a parking garage and you hear the, you know, please place your ticket with the stripe facing to the right - very, very canned sort of speech. This was far more sophisticated and much more like you and I talk, with hesitations and pauses, and ums and ahs. You think it's a human on the other end.
VEDANTAM: Now, of course, one of the things about that voice that Google had is that it did seem like a convincing voice, but if you need to convince me that that voice is not just a human voice but a particular human's voice, you need to convince me now this is not just anyone calling for a restaurant reservation, but it's Barack Obama calling for a restaurant reservation. Presumably, now the bar is much, much higher.
PATEL: That's right. It is. Barack Obama, though, does have a ton of his audio, (laughter), on the Internet. And there's a lot more audio to make, you know, his voice than there is my voice, for example. And so yeah. But it's absolutely possible to learn. And if you have long enough, you can learn anybody's voice. And if you have enough data.
(SOUNDBITE OF MUSIC)
VEDANTAM: It's not hard to see how bad actors could misuse this, create havoc in people's lives, trouble at companies, political misinformation.
PATEL: That's absolutely - I mean, we're seeing deepfakes in video. We've seen, you know, President Obama's face with - being manipulated and the audio coming out. You know, people creating these fake media in video, and you're also seeing it in audio. That's exactly why the security aspects of what we're doing are trying to detect is that fake audio, or is that real? Is part of that fake audio or is part of that real, right? So it is - this isn't completely sci-fi. It isn't so far away. It isn't necessarily 20 - you know, 2028. It's probably 2020. So we've got to get our defenses up in terms of questioning where that - the authenticity of audio just as we do video.
VEDANTAM: In 2017, a Canadian company called Lyrebird showed how audio deepfakes might work in politics.
(SOUNDBITE OF MONTAGE)
COMPUTER-GENERATED VOICE #1: (As Donald Trump) This is huge. They can make us say anything now - really anything.
COMPUTER-GENERATED VOICE #9: (As Barack Obama) The good news is that they will offer the technology to anyone.
COMPUTER-GENERATED VOICE #1: (As Donald Trump) This is huge. How does their technology work?
COMPUTER-GENERATED VOICE #10: (As Hillary Clinton) Hey, guys, I think that they use deep learning and artificial neural networks.
VEDANTAM: By 2019, deepfake audio technology had gotten even better. Shortly after critics panned the final season of "Game Of Thrones," a YouTube channel called Eating Things With Famous People put out this tongue-in-cheek video showing the supposed remorse of the lead character, Jon Snow.
(SOUNDBITE OF YOUTUBE VIDEO, "BREAKING: JON SNOW FINALLY APOLOGIZED FOR SEASON 8")
COMPUTER-GENERATED VOICE #11: (As Jon Snow) It's time for some apologies. I'm sorry we wasted your time. I'm sorry we didn't learn anything from the ending of "Lost." I have more lines in this video than I had in the last season. I'm sorry we wrote this in, like, six days or something. Now, let us burn the script of Season 8 and just forget it forever.
VEDANTAM: Spoofing a TV show is one thing, but imagine such high-quality deepfakes occurring in a more high-stakes setting. Voices are increasingly being used by financial institutions to authenticate the identities of consumers. Recently, Rupal worked with a bank to assess how vulnerable it was to vocal hacking.
PATEL: We tested their authentication system by creating synthetic samples or synthetic voices of particular individuals who are enrolled in their authentication system. And we tried to test those voices against the system to see if we could get through with the synthetic voices. And we were not able to do that for every single voice, but we were able to do it for some voices. And so it just starts to show that there is a vulnerability in this technology.
VEDANTAM: So how would you guard against it?
PATEL: Well, there are many ways to guard against it. One is you can classify the difference between is the audio signal I'm listening to - is it synthetic or is it human? As the synthetic voices become better and better sounding, that will be a more difficult decision to make. And it is something that if we can proactively solve, I think - or at least start to address - we're going to be way ahead of the curve than if we're trying to clean up our mess after the fact.
VEDANTAM: Despite the potential risks of these new technologies, Rupal is also optimistic. Voice synthesis tools have the potential to allow people to craft the voice they hear on the outside so that it matches the identity they feel on the inside.
PATEL: Ideally, in the future, these decisions are made by the end user themselves. Like, oh, I actually want that to be - sound a little breathier. I'd love that to sound a little bit more confident. And, I mean, how does that translate to the acoustics? We don't quite know yet, but that's actually I think where - when we can finally give the control of what the voice sounds like to the individual, I mean, that's the Holy Grail.
(SOUNDBITE OF MONTAGE)
VODER-GENERATED VOICE: She saw me. She saw me. She saw me. She saw me.
COMPUTER-GENERATED VOICE #2: Hello. I am (unintelligible) machine.
COMPUTER-GENERATED VOICE #6: You are listening to the voice of a machine.
COMPUTER-GENERATED VOICE #3: (Singing) A, B, C, D, E, F, G, H, I, J, K...
COMPUTER-GENERATED VOICE #7: The first kind of aid to be considered is a talking aid for the vocally handicapped.
BETTY: I am beautiful Betty, the standard female voice.
PAUL: I am the standard male voice, perfect Paul.
COMPUTER-GENERATED VOICE #12: I was sad because there was no ice cream in the freezer.
COMPUTER-GENERATED VOICE #13: The sky is clear and the stars are twinkling.
MAEVE: I can't wait for my friends to hear my new voice.
COMPUTER-GENERATED VOICE #3: (Singing) When we know our ABC's.
ALEXA: One cup is eight fluid ounces.
COMPUTER-GENERATED VOICE #14: You are listening to a machine.
COMPUTER-GENERATED VOICE #1: (As Donald Trump) This is huge. How does their technology work?
COMPUTER-GENERATED VOICE #10: (As Hillary Clinton) Hey, guys, I think that they use deep learning and artificial neural networks.
GOOGLE DUPLEX: Hi. I'd like to reserve a table for Wednesday the 7.
(SOUNDBITE OF MUSIC)
VEDANTAM: This week's show was produced by Thomas Lu. It was edited by Tara Boyle and Rhaina Cohen. Our team includes Parth Shah, Jenny Schmidt and Laura Kwerel. Special thanks to Brent Baughman, Greg Sauer and Kavon Jones (ph).
(SOUNDBITE OF MUSIC)
VEDANTAM: Our unsung hero this week is Rebecca Ralph (ph). She's part of NPR's team looking at our changing interactions with smart speakers. She helped us record some of the smart devices you heard in this week's episode. Thanks, Rebecca. If you liked this episode, please share it with a friend. We're always looking for new people to discover HIDDEN BRAIN. I'm Shankar Vedantam, and this is NPR. Transcript provided by NPR, Copyright NPR.