Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that the same voice may also be used by this little girl, who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, because there are only a few options available. In the U.S. alone, 2.5 million Americans are unable to speak, and many of them use computerized devices to communicate. Worldwide, that's millions of people using generic voices, including Professor Hawking, who is British but uses an American-accented voice. This lack of individuation in synthetic voices really hit home when I was at an assistive technology conference a few years ago. I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices: different devices, but the same voice. I looked around and saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why, then, the same prosthetic voice? It really struck me, and I wanted to do something about it.

I'm now going to play you a sample of two people who have severe speech disorders. I want you to listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice)

RP: You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next was find out how we could harness these residual vocal abilities and build a technology, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and he had been building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These were people who had lost their voice later in life. For those born with a speech disorder, we didn't have the luxury of pre-recorded speech samples. But I thought there had to be a way to reverse-engineer a voice from whatever little is left over.

So we decided to do exactly that. With a little bit of funding from the National Science Foundation, we set out to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now, before I get into the details of how the voice is made and let you listen to it, I need to give you a really quick speech science lesson. Okay? First, we know that the voice changes dramatically over the course of development. Children sound different from teens, who sound different from adults. We've all experienced this. Fact number two: speech is a combination of a source and a filter. The source is the vibration generated by your voice box. Those vibrations are then pushed through the rest of the vocal tract, the chambers of your head and neck, which resonate and actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And ordinarily, both come from one individual.
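
The source-filter idea can be sketched in a few lines of Python. This is an illustrative toy, not anything described in the talk: the source is assumed to be a simple impulse train at the fundamental frequency (F0), and the filter a cascade of second-order resonators, one per formant, in the style of classic formant synthesizers. The formant frequencies are approximate textbook values for an adult male "ah" and "ee"; the bandwidths and the 16 kHz sampling rate are arbitrary choices.

```python
import math

SAMPLE_RATE = 16000  # assumed sampling rate in Hz

def glottal_source(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Source: an impulse train at the fundamental frequency f0_hz.

    A crude stand-in for the pulses produced by the vibrating vocal folds.
    """
    n = int(duration_s * sample_rate)
    period = sample_rate / f0_hz
    pulses = [0.0] * n
    k = 0
    while round(k * period) < n:
        pulses[round(k * period)] = 1.0
        k += 1
    return pulses

def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """One formant: a second-order IIR resonance with unity gain at 0 Hz."""
    c = -math.exp(-2 * math.pi * bandwidth_hz / sample_rate)
    b = 2 * math.exp(-math.pi * bandwidth_hz / sample_rate) * math.cos(
        2 * math.pi * freq_hz / sample_rate
    )
    a = 1 - b - c
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def vocal_tract(source, formants, sample_rate=SAMPLE_RATE):
    """Filter: pass the source through a cascade of formant resonators."""
    signal = source
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

# One source, two filters: two different vowels at the same pitch.
source = glottal_source(f0_hz=120, duration_s=0.5)
ah = vocal_tract(source, [(730, 90), (1090, 110), (2440, 170)])  # roughly "ah"
ee = vocal_tract(source, [(270, 60), (2290, 100), (3010, 170)])  # roughly "ee"
```

Both vowels share one source, so they have the same pitch; only the filter differs, and that is what changes the vowel quality.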

Now, I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorders, and what I've found is that even though their filters are impaired, they are able to modulate their source: the pitch, the loudness, the tempo of their voice. These cues are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had an idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? When we mix them, we can get a voice that's as clear as our surrogate talker (that's the person we borrowed the filter from) and similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
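
One way to picture "take the source from the target" is pitch remapping: keep the surrogate's intonation pattern but shift and scale it into the target talker's pitch range. The sketch below assumes F0 contours are already available as lists of Hz values, with 0 marking unvoiced frames, and uses a mean-and-variance mapping in the log-F0 domain, a common voice-conversion baseline. The talk does not say how VocaliD performs the mix, and this covers only pitch; loudness and tempo would need their own mappings.

```python
import math
import statistics

def log_f0_stats(f0_contour):
    """Mean and population standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_contour if f > 0]
    return statistics.mean(logs), statistics.pstdev(logs)

def transfer_pitch(surrogate_f0, target_f0):
    """Re-map the surrogate's F0 contour into the target talker's pitch range.

    Each voiced frame becomes a z-score in the surrogate's log-F0 distribution,
    then is mapped back using the target's mean and spread.
    Unvoiced frames (0 Hz) stay unvoiced.
    """
    s_mean, s_std = log_f0_stats(surrogate_f0)
    t_mean, t_std = log_f0_stats(target_f0)
    mapped = []
    for f in surrogate_f0:
        if f <= 0:
            mapped.append(0.0)
            continue
        z = (math.log(f) - s_mean) / s_std if s_std > 0 else 0.0
        mapped.append(math.exp(t_mean + z * t_std))
    return mapped
```

With a hypothetical surrogate contour of [100, 0, 200] Hz and target samples of [200, 400] Hz, the mapped contour comes out at approximately [200, 0, 400] Hz: the same rise, moved up an octave into the target's range.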

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now, she's going to go on like this for about three to four hours. The idea is not for her to say everything the target is going to want to say, but to cover all the different combinations of sounds that occur in the language. The more speech you have, the better your voice is going to sound. Once you have those recordings, we need to parse them into little snippets of speech, one- or two-sound combinations, sometimes even whole words, that start populating a database. We're going to call this database a voice bank. The power of the voice bank is that we can now say any new utterance, like "I love chocolate" (everyone needs to be able to say that): we fish through the database and find all the segments necessary to say that utterance.
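
Covering every sound combination is often approached as a set-cover problem over diphones (pairs of adjacent sounds). The sketch below is an assumption about how such a recording script might be chosen, not VocaliD's procedure: it presumes each candidate sentence already has a phone transcription, and greedily picks whichever sentence adds the most diphones not yet covered.

```python
def diphones(phones):
    """Set of adjacent phone pairs in one utterance, e.g. [a, b, c] -> {(a, b), (b, c)}."""
    return set(zip(phones, phones[1:]))

def select_script(candidates, max_sentences):
    """Greedy coverage: repeatedly pick the sentence that adds the most unseen diphones.

    candidates maps sentence text -> phone transcription (a list of phone symbols).
    Returns the chosen sentences in recording order and the diphones they cover.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        gain = diphones(remaining[best]) - covered
        if not gain:  # no remaining sentence adds anything new
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered

# Toy transcriptions with made-up phone symbols, for illustration only.
script, covered = select_script(
    {"s1": ["x", "y", "z"], "s2": ["x", "y"], "s3": ["z", "w"]}, max_sentences=10
)
# script == ["s1", "s3"]: "s2" adds no diphone that "s1" has not already covered.
```

Greedy selection is not guaranteed to find the smallest possible script, but it is simple, and by construction it records the sentences that add the most new material first.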

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.
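
A toy version of concatenative synthesis from a voice bank: the bank maps each unit label (a sound, a sound pair, or a whole word) to one or more recorded snippets, represented here as plain lists of samples. The unit labels, the "closest sample at the seam" join cost, and the linear crossfade are illustrative assumptions; production unit-selection systems typically score candidates with both target and join costs and search over whole sequences rather than choosing greedily one unit at a time.

```python
def pick_units(labels, voice_bank):
    """Choose one recorded snippet per unit label from the voice bank.

    Toy join cost (an assumption): among several recordings of the same unit,
    prefer the one whose first sample is closest to the end of the previous
    snippet, so the seam is less abrupt.
    """
    chosen = []
    for label in labels:
        if label not in voice_bank:
            raise KeyError(f"voice bank has no recording for unit {label!r}")
        candidates = voice_bank[label]
        if not chosen:
            chosen.append(candidates[0])
        else:
            prev_end = chosen[-1][-1]
            chosen.append(min(candidates, key=lambda c: abs(c[0] - prev_end)))
    return chosen

def crossfade_concat(segments, overlap):
    """Join segments end to end, linearly blending `overlap` samples at each seam."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming snippet rises across the seam
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[n:])
    return out

def synthesize(labels, voice_bank, overlap=32):
    """Concatenative synthesis: look up the units, then splice them together."""
    return crossfade_concat(pick_units(labels, voice_bank), overlap)
```

A real voice bank would hold audio for each unit under hypothetical keys such as `"l-ah"` or `"chocolate"`; any unit missing from the bank raises a `KeyError`, which is one reason recording scripts aim for broad coverage.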

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.
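
Extracting "source characteristics" from vowel-like sounds can be illustrated with two classic measurements: fundamental frequency (pitch), estimated here by autocorrelation, and RMS level (a loudness proxy). The talk does not specify which analyses VocaliD runs; the 75 to 500 Hz search range is an assumed default, and a held vowel is assumed to be steady enough for a single estimate.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 as sample_rate / lag, where lag maximizes the autocorrelation.

    Only lags corresponding to pitches between f0_min and f0_max are searched.
    """
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(samples) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalized) autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal too short for the requested pitch range")
    return sample_rate / best_lag

def rms_level(samples):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

Because the raw sum has fewer terms at longer lags, it leans toward the shortest repeating period, which helps avoid picking a multiple of the true period on clean signals; practical pitch trackers add normalization and voicing decisions on top of this.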

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. There are millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. So far, we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. Samantha's surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. As a scientist, I'm so excited to take this work out of the laboratory and finally into the real world, where it can have real impact. What I want to share with you next is how I envision taking this work to the next level. I imagine a whole world of surrogate donors from all walks of life, of different sizes and different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. As a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, or in whatever way, to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
