Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there are only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice) You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
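
The source-and-filter idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method used in the project: the source is modeled as a bare impulse train at the speaker's pitch, and each vocal-tract chamber as a simple two-pole resonator. The function names, the 8 kHz sample rate, and the resonance values are all hypothetical.

```python
import math

def glottal_source(f0_hz, num_samples, sample_rate=8000):
    """Impulse train with one pulse per pitch period (crude stand-in for voice-box vibration)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(num_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=8000):
    """Two-pole resonant filter (crude stand-in for one resonance of the vocal tract)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, resonances, num_samples, sample_rate=8000):
    """Source followed by filter: push the pulse train through each resonance in turn."""
    signal = glottal_source(f0_hz, num_samples, sample_rate)
    for center_hz, bandwidth_hz in resonances:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Illustrative values only: a 100 Hz pitch shaped by two made-up resonances.
vowel = synthesize_vowel(100, [(700, 130), (1200, 70)], num_samples=400)
```

Changing `f0_hz` changes the source (the pitch) without touching the filter, and changing `resonances` changes the vowel quality without touching the pitch, which is the separation the talk relies on.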

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
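
As a rough illustration of that mixing step, the sketch below picks a surrogate of about the same age and size as the target and pairs the target's source with the surrogate's filter. Everything here is an assumption made for the example: the `Talker`, `Source`, and `Voice` types, the use of height as a proxy for size, and the mismatch weighting are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Source:
    """Prosodic cues the target talker can still control."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float

@dataclass(frozen=True)
class Talker:
    name: str
    age_years: float
    height_cm: float  # hypothetical proxy for "size"
    source: Source
    filter_resonances_hz: Tuple[float, ...]  # empty when articulation is impaired

@dataclass(frozen=True)
class Voice:
    source: Source
    filter_resonances_hz: Tuple[float, ...]

def pick_surrogate(target: Talker, donors: Sequence[Talker]) -> Talker:
    """Choose the donor closest in age and size (the weighting is an arbitrary assumption)."""
    def mismatch(donor: Talker) -> float:
        return (abs(donor.age_years - target.age_years)
                + abs(donor.height_cm - target.height_cm) / 10.0)
    return min(donors, key=mismatch)

def mix(target: Talker, surrogate: Talker) -> Voice:
    """Source from the target (identity), filter from the surrogate (clarity)."""
    return Voice(source=target.source,
                 filter_resonances_hz=surrogate.filter_resonances_hz)

target = Talker("target", 9, 130, Source(250.0, 60.0, 3.0), ())
donors = [
    Talker("adult donor", 35, 175, Source(120.0, 65.0, 4.5), (700.0, 1200.0)),
    Talker("child donor", 10, 135, Source(260.0, 62.0, 4.0), (850.0, 1500.0)),
]
voice = mix(target, pick_surrogate(target, donors))
```

With these made-up numbers the child donor is selected, so the resulting `voice` carries the target's pitch, loudness, and tempo together with the child donor's resonances.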

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.
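
The voice bank lookup can be pictured with a small sketch. It assumes, for illustration only, that each recorded snippet is labeled with the one or two sounds it contains, and that a new utterance is assembled greedily by taking the longest matching snippet at each position. Real concatenative systems choose units with far more care; the sound labels, clip names, and function names below are hypothetical.

```python
def build_voice_bank(snippets):
    """Map each sequence of sound labels to the first recorded clip that contains it."""
    bank = {}
    for sounds, clip_id in snippets:
        bank.setdefault(tuple(sounds), clip_id)
    return bank

def select_units(bank, target_sounds, max_unit_len=2):
    """Greedy longest-match: cover the target sounds with clips from the bank, left to right."""
    chosen = []
    i = 0
    while i < len(target_sounds):
        longest = min(max_unit_len, len(target_sounds) - i)
        for length in range(longest, 0, -1):
            unit = tuple(target_sounds[i:i + length])
            if unit in bank:
                chosen.append(bank[unit])
                i += length
                break
        else:
            # No snippet of any length starts here: the donor never recorded this sound.
            raise KeyError(f"no recorded snippet covers {target_sounds[i]!r}")
    return chosen

# Made-up labels and clip names, for illustration only.
bank = build_voice_bank([
    (["AY"], "clip_001"),
    (["L", "AH"], "clip_002"),
    (["V"], "clip_003"),
    (["L"], "clip_004"),
])
clips = select_units(bank, ["AY", "L", "AH", "V"])
```

Here the two-sound snippet for "L", "AH" is preferred over the single "L" clip, which shows why covering more sound combinations in the recordings helps: the more snippets the bank holds, the fewer joins each new utterance needs.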

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
