Christian Rudder: Inside OKCupid: The math of online dating

Hello, my name is Christian Rudder, and I was one of the founders of OkCupid. It's now one of the biggest dating sites in the United States. Like most everyone at the site, I was a math major. As you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date. We built our entire business around it.

Now, algorithm is a fancy word, and people like to drop it like it's this big thing. But really, an algorithm is just a systematic, step-by-step way to solve a problem. It doesn't have to be fancy at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done.

Now, why are algorithms even important? Why does this lesson even exist? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step processes. A computer without an algorithm is basically an expensive paperweight. And since computers are such a pervasive part of everyday life, algorithms are everywhere.

The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with.

The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?"

Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way. For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship are saying yes to this, they're going to have massive problems. We realized this early on, and so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but the answer they wanted from someone else. That worked really well.

But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book burning or flag burning?" might reveal more about someone than their taste in movies. And it doesn't make sense to weigh all things equally, so we added one final data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life. And this ranges from irrelevant to mandatory.

So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else -- your potential match -- to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result.

As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is based on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that.

Here are our two example questions. The first one, let's say, is, "How messy are you?" And the answer possibilities are: very messy, average and very organized. And let's say you answered "very organized," and you'd like someone else to answer "very organized," and the question is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. And let's say B is a little bit different. He answered "very organized" for himself, but "average" is OK with him as an answer from someone else, and the question is only a little important to him.

Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him.

So, let's try to compute all of this. Our first step is, since we use computers to do this, we need to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers. We at OkCupid decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250.

Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many possible points did B score on your scale? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's worth 50 points and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wrong, so B's answers were 50 out of 51 possible points. That's 98% satisfactory. Pretty good.

The second question the algorithm looks at is: How much did you satisfy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second. Of those 11, that's 1 plus 10, you earned 10 -- you guys satisfied each other on the second question. So your answers were 10 out of 11 equals 91 percent satisfactory to B. That's not bad.

The final step is to take these two match percentages and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions. Because s, which is the number of questions in this sample, is only 2, we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know.

Now, why does the algorithm multiply, as opposed to, say, average the two match scores together, and do the square-root business? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different properties. In other words, it's perfect for romantic matching. You've got wide ranges and you've got tons of different data points, like I said, about movies, politics, religion -- everything. Intuitively, too, this makes sense. Two people satisfying each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual. After adding a little correction for margin of error, in the case where we have a small number of questions, like we do in this example, we're good to go.

Any time OkCupid matches two people, it goes through the steps we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in simple, mathematical ways. This, the ability to take real-world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer. If you learn the language, you can go out and tell your stories. I hope this will help you do that.
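
The worked example in the talk can be retraced in a few lines of Python. This is only a minimal sketch of the steps as described, not OkCupid's actual code: the names `IMPORTANCE`, `satisfaction`, `match`, and the dictionary layout are all made up for illustration. It assumes B accepts only "average" on the messiness question, which is the reading that reproduces the 10-out-of-11 score. It takes the square root of the two satisfaction scores (the talk phrases this as an nth root with n equal to the number of questions, which is also 2 here), and it leaves out the margin-of-error correction, since the talk gives no details about it.

```python
from math import sqrt

# Importance scale as stated in the talk.
IMPORTANCE = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "mandatory": 250,
}


def satisfaction(judge, candidate):
    """Share of the judge's importance points earned by the candidate's answers."""
    common = judge.keys() & candidate.keys()  # the set "s" of shared questions
    possible = 0
    earned = 0
    for question in common:
        weight = IMPORTANCE[judge[question]["importance"]]
        possible += weight
        if candidate[question]["answer"] in judge[question]["accepts"]:
            earned += weight
    return earned / possible if possible else 0.0


def match(a, b):
    """Geometric mean of the two one-directional satisfaction scores."""
    return sqrt(satisfaction(a, b) * satisfaction(b, a))


# Hypothetical encoding of the two people in the example.
you = {
    "messy": {"answer": "very organized", "accepts": {"very organized"},
              "importance": "very important"},
    "attention": {"answer": "no", "accepts": {"no"},
                  "importance": "a little important"},
}
b = {
    "messy": {"answer": "very organized", "accepts": {"average"},  # assumption
              "importance": "a little important"},
    "attention": {"answer": "yes", "accepts": {"no"},
                  "importance": "somewhat important"},
}

print(f"{satisfaction(you, b):.0%}")  # B's answers on your scale: 98%
print(f"{satisfaction(b, you):.0%}")  # your answers on B's scale: 91%
print(f"{match(you, b):.0%}")         # match percentage: 94%
```

The choice of the geometric mean shows up if the two scores are very unequal: `sqrt(0.5 * 0.5)` is 0.5, while `sqrt(0.0 * 1.0)` is 0.0, even though a plain average would give 0.5 in both cases.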


Related talks

Iseult Gillespie: Why should you read "A Midsummer Night's Dream?"

Helen Fisher: Why we love, why we cheat

Natalya St. Clair: The unexpected math behind Van Gogh's "Starry Night"

Priyanka Jain: How to make applying for jobs less painful

Amy Webb: How I hacked online dating

Dennis Wildfogel: How big is infinity?
