Rupal Patel: Synthetic voices, as unique as fingerprints

I'd like to talk today about a powerful and fundamental aspect of who we are: our voice. Each one of us has a unique voiceprint that reflects our age, our size, even our lifestyle and personality. In the words of the poet Longfellow, "the human voice is the organ of the soul." As a speech scientist, I'm fascinated by how the voice is produced, and I have an idea for how it can be engineered. That's what I'd like to share with you.

I'm going to start by playing you a sample of a voice that you may recognize.

(Recording) Stephen Hawking: "I would have thought it was fairly obvious what I meant."

Rupal Patel: That was the voice of Professor Stephen Hawking. What you may not know is that same voice may also be used by this little girl who is unable to speak because of a neurological condition. In fact, all of these individuals may be using the same voice, and that's because there's only a few options available. In the U.S. alone, there are 2.5 million Americans who are unable to speak, and many of whom use computerized devices to communicate. Now that's millions of people worldwide who are using generic voices, including Professor Hawking, who uses an American-accented voice. This lack of individuation of the synthetic voice really hit home when I was at an assistive technology conference a few years ago, and I recall walking into an exhibit hall and seeing a little girl and a grown man having a conversation using their devices, different devices, but the same voice. And I looked around and I saw this happening all around me, literally hundreds of individuals using a handful of voices, voices that didn't fit their bodies or their personalities. We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. So why then the same prosthetic voice? It really struck me, and I wanted to do something about this.

I'm going to play you now a sample of someone who has, two people actually, who have severe speech disorders. I want you to take a listen to how they sound. They're saying the same utterance.

(First voice)

(Second voice)

RP: You probably didn't understand what they said, but I hope that you heard their unique vocal identities.

So what I wanted to do next is, I wanted to find out how we could harness these residual vocal abilities and build a technology that could be customized for them, voices that could be customized for them. So I reached out to my collaborator, Tim Bunnell. Dr. Bunnell is an expert in speech synthesis, and what he'd been doing is building personalized voices for people by putting together pre-recorded samples of their voice and reconstructing a voice for them. These are people who had lost their voice later in life. We didn't have the luxury of pre-recorded samples of speech for those born with speech disorder. But I thought, there had to be a way to reverse engineer a voice from whatever little is left over.

So we decided to do exactly that. We set out with a little bit of funding from the National Science Foundation, to create custom-crafted voices that captured their unique vocal identities. We call this project VocaliD, or vocal I.D., for vocal identity.

Now before I get into the details of how the voice is made and let you listen to it, I need to give you a real quick speech science lesson. Okay? So first, we know that the voice is changing dramatically over the course of development. Children sound different from teens who sound different from adults. We've all experienced this. Fact number two is that speech is a combination of the source, which is the vibrations generated by your voice box, which are then pushed through the rest of the vocal tract. These are the chambers of your head and neck that vibrate, and they actually filter that source sound to produce consonants and vowels. So the combination of source and filter is how we produce speech. And that happens in one individual.
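
The source-filter idea described above can be illustrated with a minimal Python sketch. It is not code from the talk: the function names, the 16 kHz sample rate, and the two resonance values are assumptions chosen only for illustration. An impulse train stands in for the voice-box vibrations (the source), and a chain of simple two-pole resonators stands in for the chambers of the head and neck (the filter).

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train standing in for voice-box vibrations at pitch f0_hz."""
    period = round(sample_rate / f0_hz)          # samples between pulses
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator standing in for one vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius, below 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0                                # previous two output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vocal_tract(signal, formants, sample_rate=16000):
    """Apply each (center_hz, bandwidth_hz) resonance in turn: the 'filter'."""
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Hypothetical values: a 200 Hz source shaped by two illustrative resonances.
source = glottal_source(f0_hz=200.0, duration_s=0.05)
vowel = vocal_tract(source, formants=[(700.0, 80.0), (1200.0, 90.0)])
```

Changing `f0_hz` changes only the source (the pitch), while changing `formants` changes only the filter (which vowel-like sound comes out). That separation is what the rest of the talk relies on.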

Now I told you earlier that I'd spent a good part of my career understanding and studying the source characteristics of people with severe speech disorder, and what I've found is that even though their filters were impaired, they were able to modulate their source: the pitch, the loudness, the tempo of their voice. These are called prosody, and I've been documenting for years that the prosodic abilities of these individuals are preserved. So when I realized that those same cues are also important for speaker identity, I had this idea. Why don't we take the source from the person we want the voice to sound like, because it's preserved, and borrow the filter from someone about the same age and size, because they can articulate speech, and then mix them? Because when we mix them, we can get a voice that's as clear as our surrogate talker -- that's the person we borrowed the filter from -- and is similar in identity to our target talker. It's that simple. That's the science behind what we're doing.
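
The mixing step can be sketched at the data level in Python. This is an illustration under stated assumptions, not the project's actual method: the class names, the fields, and the way age and size are weighted in `pick_surrogate` are all hypothetical. The sketch picks the candidate closest in age and size who can articulate, then combines the target talker's source (prosody) with that surrogate's filter.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass(frozen=True)
class Prosody:
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float

@dataclass(frozen=True)
class Talker:
    name: str
    age_years: float
    height_cm: float
    prosody: Prosody                           # the "source" cues
    formants_hz: Optional[Tuple[float, ...]]   # the "filter"; None if impaired

@dataclass(frozen=True)
class BlendedVoice:
    identity_from: str
    clarity_from: str
    prosody: Prosody
    formants_hz: Tuple[float, ...]

def pick_surrogate(target: Talker, candidates: Sequence[Talker]) -> Talker:
    """Choose the articulate candidate closest to the target in age and size."""
    usable = [c for c in candidates if c.formants_hz is not None]
    if not usable:
        raise ValueError("no candidate with usable articulation")
    def mismatch(c: Talker) -> float:
        # Assumed weighting: one year of age counts like 10 cm of height.
        return (abs(c.age_years - target.age_years)
                + abs(c.height_cm - target.height_cm) / 10.0)
    return min(usable, key=mismatch)

def blend(target: Talker, surrogate: Talker) -> BlendedVoice:
    """Source from the target talker, filter from the surrogate talker."""
    if surrogate.formants_hz is None:
        raise ValueError("surrogate must be able to articulate speech")
    return BlendedVoice(target.name, surrogate.name,
                        target.prosody, surrogate.formants_hz)

# Hypothetical talkers, invented for this example.
target = Talker("target", 10, 135, Prosody(260.0, 60.0, 3.0), None)
donor_a = Talker("donor_a", 11, 140, Prosody(250.0, 62.0, 4.5), (850.0, 1600.0))
donor_b = Talker("donor_b", 35, 170, Prosody(120.0, 65.0, 5.0), (700.0, 1200.0))
voice = blend(target, pick_surrogate(target, [donor_a, donor_b]))
```

Here `voice` carries the target's pitch, loudness, and tempo but the closer-matched donor's resonances, which is the "clear as the surrogate, similar in identity to the target" combination described above.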

So once you have that in mind, how do you go about building this voice? Well, you have to find someone who is willing to be a surrogate. It's not such an ominous thing. Being a surrogate donor only requires you to say a few hundred to a few thousand utterances. The process goes something like this.

(Video) Voice: Things happen in pairs.

I love to sleep.

The sky is blue without clouds.

RP: Now she's going to go on like this for about three to four hours, and the idea is not for her to say everything that the target is going to want to say, but the idea is to cover all the different combinations of the sounds that occur in the language. The more speech you have, the better sounding voice you're going to have. Once you have those recordings, what we need to do is we have to parse these recordings into little snippets of speech, one- or two-sound combinations, sometimes even whole words that start populating a dataset or a database. We're going to call this database a voice bank. Now the power of the voice bank is that from this voice bank, we can now say any new utterance, like, "I love chocolate" -- everyone needs to be able to say that -- fish through that database and find all the segments necessary to say that utterance.
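
The voice-bank lookup can be sketched in Python as follows. This is a toy illustration, not the system described in the talk: the sound labels are made-up stand-ins for a real phonetic transcription, the recordings are hypothetical, and greedy longest-match is an assumed selection strategy. The bank indexes every one- and two-sound snippet in the recordings, and a new utterance is assembled by fishing out snippets that cover it.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, int, int]   # (recording id, start index, end index)
Bank = Dict[Tuple[str, ...], Unit]

def build_voice_bank(recordings: Dict[str, List[str]], max_len: int = 2) -> Bank:
    """Index every snippet of 1..max_len consecutive sounds in each recording."""
    bank: Bank = {}
    for rec_id, sounds in recordings.items():
        for n in range(1, max_len + 1):
            for start in range(len(sounds) - n + 1):
                key = tuple(sounds[start:start + n])
                # Keep the first snippet found for each sound combination.
                bank.setdefault(key, (rec_id, start, start + n))
    return bank

def select_units(bank: Bank, sounds: List[str], max_len: int = 2) -> List[Unit]:
    """Cover a new utterance with the longest snippets available in the bank."""
    units: List[Unit] = []
    i = 0
    while i < len(sounds):
        for n in range(min(max_len, len(sounds) - i), 0, -1):
            key = tuple(sounds[i:i + n])
            if key in bank:
                units.append(bank[key])
                i += n
                break
        else:
            raise KeyError(f"no recorded snippet covers sound {sounds[i]!r}")
    return units

# Hypothetical recordings with toy sound labels.
recordings = {
    "rec1": ["ay", "l", "ah", "v", "t", "uw", "s", "l", "iy", "p"],
    "rec2": ["ch", "aa", "k"],
}
bank = build_voice_bank(recordings)

# A new utterance that was never recorded as a whole.
new_utterance = ["ay", "l", "ah", "v", "ch", "aa", "k", "l", "ah", "t"]
units = select_units(bank, new_utterance)
```

Each entry in `units` points at a slice of an existing recording. Joining those audio slices in order is the concatenation step, and a sound that was never recorded raises `KeyError`, which is why more recorded speech gives better coverage.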

(Video) Voice: I love chocolate.

RP: So that's speech synthesis. It's called concatenative synthesis, and that's what we're using. That's not the novel part. What's novel is how we make it sound like this young woman.

This is Samantha. I met her when she was nine, and since then, my team and I have been trying to build her a personalized voice. We first had to find a surrogate donor, and then we had to have Samantha produce some utterances. What she can produce are mostly vowel-like sounds, but that's enough for us to extract her source characteristics. What happens next is best described by my daughter's analogy. She's six. She calls it mixing colors to paint voices. It's beautiful. It's exactly that. Samantha's voice is like a concentrated sample of red food dye which we can infuse into the recordings of her surrogate to get a pink voice just like this.

(Video) Samantha: Aaaaaah.

RP: So now, Samantha can say this.

(Video) Samantha: This voice is only for me. I can't wait to use my new voice with my friends.

RP: Thank you. (Applause)

I'll never forget the gentle smile that spread across her face when she heard that voice for the first time. Now there's millions of people around the world like Samantha, millions, and we've only begun to scratch the surface. What we've done so far is we have a few surrogate talkers from around the U.S. who have donated their voices, and we have been using those to build our first few personalized voices. But there's so much more work to be done. For Samantha, her surrogate came from somewhere in the Midwest, a stranger who gave her the gift of voice. And as a scientist, I'm so excited to take this work out of the laboratory and finally into the real world so it can have real-world impact. What I want to share with you next is how I envision taking this work to that next level. I imagine a whole world of surrogate donors from all walks of life, different sizes, different ages, coming together in this voice drive to give people voices that are as colorful as their personalities. To do that as a first step, we've put together this website, VocaliD.org, as a way to bring together those who want to join us as voice donors, as expertise donors, in whatever way to make this vision a reality.

They say that giving blood can save lives. Well, giving your voice can change lives. All we need is a few hours of speech from our surrogate talker, and as little as a vowel from our target talker, to create a unique vocal identity.

So that's the science behind what we're doing. I want to end by circling back to the human side that is really the inspiration for this work. About five years ago, we built our very first voice for a little boy named William. When his mom first heard this voice, she said, "This is what William would have sounded like had he been able to speak." And then I saw William typing a message on his device. I wondered, what was he thinking? Imagine carrying around someone else's voice for nine years and finally finding your own voice. Imagine that.

This is what William said: "Never heard me before."

Thank you.

(Applause)
