Hello, my name is Christian Rudder, and I was one of the founders of OkCupid. It's now one of the biggest dating sites in the United States. Like most everyone at the site, I was a math major. As you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date. We built our entire business around it.

Now, "algorithm" is a fancy word, and people like to drop it like it's this big thing. But really, an algorithm is just a systematic, step-by-step way to solve a problem. It doesn't have to be fancy at all. In this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done.

Now, why are algorithms even important? Why does this lesson even exist? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step processes. A computer without an algorithm is basically an expensive paperweight. And since computers are such a pervasive part of everyday life, algorithms are everywhere.

The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, some multiplication, and a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with.

The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?" Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship say yes to this, they're going to have massive problems. We realized this early on, so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but also the answer they wanted from someone else. That worked really well.

But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book burning or flag burning?" might reveal more about someone than their taste in movies. It doesn't make sense to weigh all questions equally, so we added one final data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life, on a range from irrelevant to mandatory.

So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else, your potential match, to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result.

As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is based on the questions you've both answered. Let's call that set of common questions "s." To keep things very simple, we'll use a small set "s" with just two questions in common, and compute a match from that.

Here are our two example questions. The first one, let's say, is, "How messy are you?" The possible answers are: very messy, average, and very organized.
Let's say you answered "very organized," you'd like someone else to answer "very organized," and the question is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. Now let's say B is a little bit different. He answered "very organized" for himself, but the answer he wants from someone else is "average," and the question is only a little important to him.

Now let's look at the second question, from our earlier example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. B has answered "yes." He wants someone else to answer "no," because he wants the spotlight on himself, and the question is somewhat important to him.

So, let's try to compute all of this. Since we use computers to do this, our first step is to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers. We at OkCupid decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is worth 50. And "absolutely mandatory" is worth 250.

Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many of the possible points on your scale did B score? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's worth 50 points, and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wrong, so B's answers earned 50 out of 51 possible points. That's 98% satisfactory. Pretty good.

The second calculation is: How much did you satisfy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second.
Of those 11 points (1 plus 10), you earned 10, because your answer to the second question was the one B wanted. So your answers were 10 out of 11, or 91 percent, satisfactory to B. That's not bad.

The final step is to take these two match percentages and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of scores being multiplied. Since we're combining two scores, n is 2, so we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know.

Now, why does the algorithm multiply the two scores and take the square root, as opposed to, say, simply averaging them? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different properties. In other words, it's perfect for romantic matching. You've got wide ranges, and you've got tons of different data points, like I said, about movies, politics, religion, everything. Intuitively, too, this makes sense. Two people who each satisfy the other 50 percent should be a better match than two others who satisfy each other 0 and 100 percent, because affection needs to be mutual. A simple average gives both pairs 50 percent, but the geometric mean gives the first pair 50 percent and the second pair 0.

After adding a small correction for margin of error, which matters when two people have only a few questions in common, as in this example, we're good to go.

Any time OkCupid matches two people, it goes through the steps we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in simple, mathematical ways. This ability, taking real-world phenomena and turning them into something a microchip can understand, is, I think, the most important skill anyone can have these days.
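The calculation walked through above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OkCupid's production code. The `Answer` record, the `satisfaction` and `match_percentage` functions, and the question constants are hypothetical names. Each person's acceptable answers are modeled as a set; B accepts only "average" on messiness, which is what the lesson's arithmetic implies. The small-sample margin-of-error correction is left out because the lesson doesn't give its formula. The last lines show why a geometric mean, unlike a plain average, penalizes a one-sided pairing.

```python
import math
from dataclasses import dataclass

# Point values for each importance level, taken from the scale in the lesson.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


@dataclass(frozen=True)
class Answer:
    """One person's three data points for a single question (hypothetical record)."""
    own: str                 # the person's own answer
    acceptable: frozenset    # answers they would accept from a match
    importance: str          # a key of IMPORTANCE_POINTS


def satisfaction(judge: dict, other: dict) -> float:
    """Fraction of the judge's possible points that the other person earned.

    Both arguments map question text to Answer; only questions that both
    people answered (the set "s") are counted.
    """
    possible = earned = 0
    for question in judge.keys() & other.keys():
        points = IMPORTANCE_POINTS[judge[question].importance]
        possible += points
        if other[question].own in judge[question].acceptable:
            earned += points
    # Assumption: if nothing is shared, or every shared question is
    # "irrelevant", report 0 instead of dividing by zero.
    return earned / possible if possible else 0.0


def match_percentage(a: dict, b: dict) -> float:
    """Geometric mean of the two satisfaction scores (two scores, so a square root)."""
    return math.sqrt(satisfaction(a, b) * satisfaction(b, a))


MESSY = "How messy are you?"
SPOTLIGHT = "Do you like to be the center of attention?"

you = {
    MESSY: Answer("very organized", frozenset({"very organized"}), "very important"),
    SPOTLIGHT: Answer("no", frozenset({"no"}), "a little important"),
}
# B accepts only "average" on messiness, matching the lesson's arithmetic.
b = {
    MESSY: Answer("very organized", frozenset({"average"}), "a little important"),
    SPOTLIGHT: Answer("yes", frozenset({"no"}), "somewhat important"),
}

if __name__ == "__main__":
    print(f"B satisfies you: {satisfaction(you, b):.0%}")
    print(f"You satisfy B:   {satisfaction(b, you):.0%}")
    print(f"Match:           {match_percentage(you, b):.0%}")

    # A plain average can't tell a mutual 50/50 pair from a one-sided 0/100 pair.
    for x, y in [(0.5, 0.5), (0.0, 1.0)]:
        print(f"{x}, {y}: average={(x + y) / 2}, geometric={math.sqrt(x * y)}")
```

Running it prints 98%, 91%, and 94%, which reproduces the worked example. It also shows that both the (0.5, 0.5) and (0.0, 1.0) pairs average to 0.5, but their geometric means are 0.5 and 0.0. Keying each person's answers by question text makes the shared set "s" a simple intersection of dictionary keys.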
Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer. If you learn the language, you can go out and tell your stories. I hope this will help you do that.