Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice)

You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
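
The source-filter idea in this lesson can be sketched in a few lines of Python. This is a toy illustration only, under assumptions that are not from the talk: the source is modeled as a bare impulse train, each vocal-tract chamber as a simple two-pole resonator, and the function names, the 16 kHz sample rate, and the formant values are all hypothetical choices made for the example.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Source: an impulse train at pitch f0_hz, a crude stand-in for voice-box pulses."""
    n = int(round(duration_s * sample_rate))
    period = max(1, round(sample_rate / f0_hz))  # samples between pulses
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Filter: a two-pole resonator that boosts energy near center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so it is stable
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vocal_tract(signal, formants, sample_rate=16000):
    """Pass the source through one resonator per (center_hz, bandwidth_hz) pair."""
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Hypothetical resonances, chosen only to be vaguely "ah"-like.
AH_FORMANTS = [(700, 110), (1200, 120)]

source = glottal_source(f0_hz=200, duration_s=0.05)
speech = vocal_tract(source, AH_FORMANTS)
```

Changing `f0_hz` alters the source (the pitch) while leaving the filter alone, and changing `AH_FORMANTS` alters the filter while leaving the source alone, which is the separation the lesson relies on.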

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
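
The mixing step described here can be written down as a small data-level sketch. It only shows which part comes from which talker; it is not the actual signal processing behind the project, and every class name, field name, and number below is a hypothetical placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    """Prosodic cues the target talker can still control (hypothetical fields)."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float

@dataclass(frozen=True)
class Filter:
    """Articulation, summarized here as a tuple of resonance frequencies."""
    formants_hz: tuple

@dataclass(frozen=True)
class Voice:
    source: Source
    filter: Filter

def mix(target: Voice, surrogate: Voice) -> Voice:
    """Keep the target's source (identity) and borrow the surrogate's filter (clarity)."""
    return Voice(source=target.source, filter=surrogate.filter)

# Made-up values for illustration only.
target = Voice(Source(260.0, 62.0, 2.5), Filter(formants_hz=()))
surrogate = Voice(Source(230.0, 65.0, 4.0), Filter(formants_hz=(700, 1200, 2600)))

personalized = mix(target, surrogate)
```

The surrogate's own pitch, loudness, and tempo are discarded in `mix`, which mirrors the point that identity is meant to come from the target talker alone.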

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.
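
The voice bank lookup described above can be sketched roughly as follows. This is a simplified illustration under loud assumptions: letters stand in for speech sounds, snippets are text slices rather than audio, and the greedy longest-match search is one plausible selection strategy, not necessarily the one the project uses. The function names and the `max_unit_len` parameter are hypothetical.

```python
def build_voice_bank(recorded_utterances, max_unit_len=3):
    """Cut each recording into short overlapping snippets and index them."""
    bank = {}
    for utterance_id, text in enumerate(recorded_utterances):
        sounds = text.lower()
        for start in range(len(sounds)):
            for length in range(1, max_unit_len + 1):
                unit = sounds[start:start + length]
                if len(unit) == length:
                    # Remember where the snippet was first heard.
                    bank.setdefault(unit, (utterance_id, start))
    return bank

def select_units(bank, new_utterance, max_unit_len=3):
    """Fish through the bank, preferring the longest snippet at each position."""
    sounds = new_utterance.lower()
    picked, pos = [], 0
    while pos < len(sounds):
        for length in range(min(max_unit_len, len(sounds) - pos), 0, -1):
            unit = sounds[pos:pos + length]
            if unit in bank:
                picked.append(unit)
                pos += length
                break
        else:
            raise KeyError(f"no recorded snippet covers {sounds[pos]!r}")
    return picked

recordings = [
    "Things happen in pairs.",
    "I love to sleep.",
    "The sky is blue without clouds.",
]
bank = build_voice_bank(recordings)
units = select_units(bank, "I love chocolate")
```

The new utterance never appears in the recordings, yet it can be assembled because every sound in it was covered somewhere. That is why the recording session aims for coverage of sound combinations rather than coverage of sentences.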

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
