Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice) You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
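
The source-and-filter idea described here can be illustrated with a small sketch. This is a textbook source-filter toy, not the VocaliD method, whose details the talk does not give. It assumes the target talker contributes only a pitch (the source) and the surrogate contributes a list of vocal tract resonances (the filter). All function names, the formant values, and the 8000 Hz sample rate are hypothetical choices made for illustration.

```python
import math

def pulse_train(pitch_hz, duration_s, sample_rate=8000):
    """Source: one unit impulse per pitch period (stands in for the voice box)."""
    period = round(sample_rate / pitch_hz)
    n = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, formant_hz, bandwidth_hz, sample_rate=8000):
    """Filter: a two-pole resonator that emphasizes one resonance (formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * formant_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2  # difference equation of the resonator
        out.append(y)
        y2, y1 = y1, y
    return out

def mix_voice(target_pitch_hz, surrogate_formants, duration_s=0.05, sample_rate=8000):
    """Push the target talker's source through the surrogate talker's filter."""
    signal = pulse_train(target_pitch_hz, duration_s, sample_rate)
    for formant_hz, bandwidth_hz in surrogate_formants:
        signal = resonator(signal, formant_hz, bandwidth_hz, sample_rate)
    return signal

# Hypothetical values: a 220 Hz target pitch and two surrogate resonances.
voice = mix_voice(220.0, [(700.0, 100.0), (1200.0, 120.0)])
print(len(voice), "samples")
```

Changing only `target_pitch_hz` changes who the voice resembles in pitch, while changing only `surrogate_formants` changes which sound is articulated. That separation is what makes mixing the two possible.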

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.
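
The voice bank lookup described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline. It makes one strong simplifying assumption: pairs of written characters stand in for the two-sound combinations, whereas a real system would work on recorded audio segmented into speech sounds. The names `build_voice_bank` and `plan_utterance` are hypothetical.

```python
def build_voice_bank(recordings):
    """Cut each recorded utterance into overlapping two-character units.

    Each unit maps to (recording id, offset), i.e. where its snippet lives.
    """
    bank = {}
    for rec_id, text in recordings.items():
        cleaned = text.lower()
        for i in range(len(cleaned) - 1):
            unit = cleaned[i:i + 2]
            # Keep the first snippet found for each unit.
            bank.setdefault(unit, (rec_id, i))
    return bank

def plan_utterance(utterance, bank):
    """List the snippets to concatenate for a new utterance.

    Raises KeyError if the bank does not cover some unit, which is why
    more recorded speech gives better coverage.
    """
    cleaned = utterance.lower()
    plan = []
    for i in range(len(cleaned) - 1):
        unit = cleaned[i:i + 2]
        if unit not in bank:
            raise KeyError(f"no recorded snippet for {unit!r}")
        plan.append((unit, bank[unit]))
    return plan

# Two of the donor sentences from the talk serve as the recordings.
recordings = {
    "r1": "I love to sleep.",
    "r2": "The sky is blue without clouds.",
}
bank = build_voice_bank(recordings)
print(plan_utterance("I love", bank))
```

With only these two sentences in the bank, asking for "I love chocolate" fails on the first unit of "chocolate". This mirrors the point that the donor must cover all the sound combinations of the language, not every sentence the target will say.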

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
