Christian Rudder: Inside OKCupid: The math of online dating

Hello, my name is Christian Rudder, and I was one of the founders of OkCupid. It's now one of the biggest dating sites in the United States. Like most everyone at the site, I was a math major. As you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date. We built our entire business around it.

Now, algorithm is a fancy word, and people like to drop it like it's this big thing. But really, an algorithm is just a systematic, step-by-step way to solve a problem. It doesn't have to be fancy at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done.

Now, why are algorithms even important? Why does this lesson even exist? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step processes. A computer without an algorithm is basically an expensive paperweight. And since computers are such a pervasive part of everyday life, algorithms are everywhere.

The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with.

The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?"

Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship are saying yes to this, they're going to have massive problems.

We realized this early on, and so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but the answer they wanted from someone else. That worked really well.

But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book burning or flag burning?" might reveal more about someone than their taste in movies. And it doesn't make sense to weigh all things equally, so we added one final data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life. And this ranges from irrelevant to mandatory.

So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else -- your potential match -- to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result.

As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is based on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that.

Here are our two example questions. The first one, let's say, is, "How messy are you?" And the answer possibilities are: very messy, average and very organized.
And let's say you answered "very organized," and you'd like someone else to answer "very organized," and the question is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. And let's say B is a little bit different. He answered "very organized" for himself, but "average" is OK with him as an answer from someone else, and the question is only a little important to him.

Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him.

So, let's try to compute all of this. Our first step is, since we use computers to do this, we need to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers. We at OkCupid decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250.

Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many possible points did B score on your scale? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's worth 50 points and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wrong, so B's answers were 50 out of 51 possible points. That's 98% satisfactory. Pretty good.

The second question the algorithm looks at is: How much did you satisfy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second.
Of those 11, that's 1 plus 10, you earned 10 -- you guys satisfied each other on the second question. So your answers were 10 out of 11 equals 91 percent satisfactory to B. That's not bad.

The final step is to take these two match percentages and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions. Because s, which is the number of questions in this sample, is only 2, we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know.

Now, why does the algorithm multiply, as opposed to, say, average the two match scores together, and do the square-root business? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different properties. In other words, it's perfect for romantic matching. You've got wide ranges and you've got tons of different data points, like I said, about movies, politics, religion -- everything. Intuitively, too, this makes sense. Two people satisfying each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual.

After adding a little correction for margin of error, in the case where we have a small number of questions, like we do in this example, we're good to go.

Any time OkCupid matches two people, it goes through the steps we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in simple, mathematical ways. This, the ability to take real-world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days.
Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer. If you learn the language, you can go out and tell your stories. I hope this will help you do that.
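
The worked example in the talk can be sketched in a few lines of Python. This is only an illustration of the steps as described, not OkCupid's actual code: the names `Response`, `satisfaction`, and `match_percentage` are hypothetical, each person is assumed to want exactly one answer per question, and the margin-of-error correction mentioned in the talk is left out because its formula is not given.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Mapping

# Importance scale as stated in the talk.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


@dataclass(frozen=True)
class Response:
    """The three data points per question (hypothetical structure)."""
    answer: str      # the person's own answer
    wanted: str      # the answer they want from a match (assumed: one answer)
    importance: str  # a key of IMPORTANCE_POINTS


def satisfaction(rater: Mapping[str, Response],
                 other: Mapping[str, Response]) -> float:
    """Fraction of the rater's possible points that `other` earned."""
    common = rater.keys() & other.keys()  # the set "s" of shared questions
    possible = sum(IMPORTANCE_POINTS[rater[q].importance] for q in common)
    earned = sum(
        IMPORTANCE_POINTS[rater[q].importance]
        for q in common
        if other[q].answer == rater[q].wanted
    )
    # Assumption: with no points at stake, report 0 rather than divide by zero.
    return earned / possible if possible else 0.0


def match_percentage(a: Mapping[str, Response],
                     b: Mapping[str, Response]) -> float:
    """Geometric mean of the two one-way satisfaction scores."""
    return sqrt(satisfaction(a, b) * satisfaction(b, a))


# The two example questions from the talk.
you = {
    "messy": Response("very organized", "very organized", "very important"),
    "attention": Response("no", "no", "a little important"),
}
b = {
    "messy": Response("very organized", "average", "a little important"),
    "attention": Response("yes", "no", "somewhat important"),
}

print(f"{satisfaction(you, b):.0%}")      # 50 of 51 points -> 98%
print(f"{satisfaction(b, you):.0%}")      # 10 of 11 points -> 91%
print(f"{match_percentage(you, b):.0%}")  # -> 94%
```

Under the same assumptions, the multiply-then-root step also shows why mutual satisfaction matters: two people at 50 percent each give `sqrt(0.5 * 0.5) = 0.5`, while a 0 and 100 pair gives `sqrt(0 * 1) = 0`, even though a plain average would rate both pairs at 50 percent.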
