Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice) You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
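
The source-filter idea described above can be sketched in a few lines of code. This is only an illustration, not the VocaliD implementation: the impulse-train source, the two-pole resonators, and the formant values are all assumptions chosen to keep the example small.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train standing in for the voice-box vibrations at pitch f0_hz."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator modelling one resonance (formant) of the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so the filter is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Push the source through each formant resonator in turn (the 'filter')."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Illustrative (assumed) formant values for an "ah"-like vowel at a 200 Hz pitch.
vowel = synthesize_vowel(200.0, [(700.0, 130.0), (1100.0, 70.0)])
```

Changing `f0_hz` alters only the source, while changing the formant list alters only the filter, which is the separation the talk relies on.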

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
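
The talk does not spell out how the mixing is done. As one hedged illustration of carrying a source cue from the target talker over to the surrogate's clear speech, the sketch below rescales a surrogate's pitch contour so that its average matches the target's. The function name and all pitch values are hypothetical, and pitch is only one of the prosodic cues mentioned above.

```python
from statistics import mean

def impose_target_pitch(surrogate_pitch_hz, target_pitch_hz):
    """Rescale the surrogate's pitch contour so its average matches the target's.

    The rise and fall of the surrogate's sentence is kept; only the overall
    pitch level is taken from the target talker.
    """
    ratio = mean(target_pitch_hz) / mean(surrogate_pitch_hz)
    return [f * ratio for f in surrogate_pitch_hz]

# Made-up pitch tracks in Hz: a surrogate's sentence and a target's sustained vowel.
surrogate_contour = [210.0, 220.0, 230.0, 220.0]
target_vowel = [250.0, 260.0, 255.0]

blended = impose_target_pitch(surrogate_contour, target_vowel)
```

A multiplicative rescaling is used here because pitch intervals are perceived roughly as ratios; a real system would also need to handle loudness, tempo, and voice quality.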

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.
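
The voice bank lookup described above can be sketched as a small dictionary search. This is a simplified illustration under stated assumptions: the sound labels, the greedy longest-match strategy, and the tiny sample lists standing in for audio snippets are all hypothetical, and real concatenative systems also smooth the joins between snippets.

```python
def build_voice_bank(recordings):
    """Map each recorded unit (a tuple of sound labels) to its audio snippet."""
    bank = {}
    for unit, snippet in recordings:
        bank.setdefault(tuple(unit), snippet)  # keep the first snippet seen for a unit
    return bank

def synthesize(sounds, bank, max_unit_len=3):
    """Cover the requested sounds with the longest recorded units available."""
    out = []
    i = 0
    while i < len(sounds):
        # Try the longest unit first, then fall back to shorter ones.
        for size in range(min(max_unit_len, len(sounds) - i), 0, -1):
            unit = tuple(sounds[i:i + size])
            if unit in bank:
                out.extend(bank[unit])
                i += size
                break
        else:
            raise KeyError(f"no recorded snippet covers {sounds[i]!r}")
    return out

# Hypothetical snippets: lists of samples keyed by one- or two-sound units.
bank = build_voice_bank([
    (("ay",), [0.1]),
    (("l", "ah"), [0.2, 0.3]),
    (("v",), [0.4]),
])
samples = synthesize(["ay", "l", "ah", "v"], bank)
```

The more units the bank holds, the more often a long match is found and the fewer joins the output contains, which mirrors the point that more recorded speech gives a better-sounding voice.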

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
