Luis von Ahn: Massive-scale online collaboration

How many of you had to fill out a web form where you've been asked to read a distorted sequence of characters like this? How many of you found it really annoying?

여러분 중에 몇 명이나 이렇게 구불구불한 문자열을 읽어서 웹페이지의 입력양식을 채워 보셨나요? 이것이 정말 짜증난다고 생각하는 사람은 몇 명 정도 되나요? 그렇군요, 아주 많네요. 사실 그건 제가 발명했어요.

(Laughter)

OK, outstanding. So I invented that.

(웃음)

(Laughter)

그걸 발명한 사람들 중의 한 명이었죠.

Or I was one of the people who did it. That thing is called a CAPTCHA. And it is there to make sure you, the entity filling out the form, are a human and not a computer program that was written to submit the form millions of times. The reason it works is because humans, at least non-visually-impaired humans, have no trouble reading these distorted characters, whereas programs can't do it as well yet. In the case of Ticketmaster, the reason you have to type these characters is to prevent scalpers from writing a program that can buy millions of tickets, two at a time.

이런 것을 캡차(CAPTCHA)라고 하죠. 이것을 사용하는 이유는 양식을 입력하는 주체가 수백만 개의 입력양식을 작성하는 컴퓨터 프로그램이 아니라 실제 사람이라는 것을 확인하기 위해서입니다. 이런 방법이 효과가 있는 이유는 우리는 시각장애가 없는 한 이렇게 왜곡된 문자를 읽는데 전혀 문제가 없기 때문이죠. 반면에 컴퓨터 프로그램은 아직 이런것을 잘 읽지 못합니다. 예를 들면, 티켓마스터 사이트에서, [역: 미국의 공연티켓 판매사] 여러분이 이런 구불구불한 문자들을 입력해야 하는 이유는, 암표상들이 컴퓨터 프로그램으로 매번 두장씩 수백만 장의 표를 사는것을 방지하기 위한 것이죠.

CAPTCHAs are used all over the Internet. And since they're used so often, a lot of times the sequence of random characters shown to the user is not so fortunate. So this is an example from the Yahoo registration page. The random characters that happened to be shown to the user were W, A, I, T, which, of course, spell a word. But the best part is the message that the Yahoo help desk got about 20 minutes later.

캡챠는 인터넷상의 다양한 서비스에 이용됩니다. 그런데 캡차가 그렇게 많이 사용되다 보니 유저의 화면에 나타나는 이런 랜덤 글자들은 어떤때는 히안한 단어를 만들기도 합니다. 그 일례로 이건 야후 서비스의 등록페이지입니다. 사용자에게 무작위로 선택되어 나타났던 문자들이 W, A, I, T이었지요. 즉, 기다리라는 단어가 나온거죠. 하지만 정말 재밌는 것은 20분 후에 야후 고객센터에 접수된 메세지였지요.

[Help! I've been waiting for over 20 minutes and nothing happens.]

메세지 : "도와주세요! 20분 기다렸는데 아무런 반응이 없습니다!"

(Laughter)

(웃음)

This person thought they needed to wait. This, of course, is not as bad as this poor person.

그 유저는 계속 기다려야한다고 생각했던 거죠. 그래도 그건 이 캡차 보다는 낳죠.

(Laughter)

(웃음)

The CAPTCHA Project is something that we did at Carnegie Mellon over 10 years ago, and it's been used everywhere. Let me now tell you about a project that we did a few years later, which is sort of the next evolution of CAPTCHA. This is a project that we call reCAPTCHA, which is something that we started here at Carnegie Mellon, then we turned it into a start-up company. And then about a year and a half ago, Google actually acquired this company.

캡차는 여기 카네기멜론 대학에서 10년 전에 했던 프로젝트였는데 지금은 누구나 다 사용하죠. 그 후 몇년 후에 우리가 한 프로젝트에 대해서 말씀드리겠습니다. 그것은 개량형 캡차라고 말할 수 있는 리캡차라는 프로젝트였는데 카네기 멜론 대학에서 시작했었고, 그후에는 신생회사가 되었는데 약 1년 반 전에 구글이 그 회사를 인수했죠.

Let me tell you how this project started. This project started from the following realization: It turns out that approximately 200 million CAPTCHAs are typed every day by people around the world. When I first heard this, I was quite proud of myself. I thought, look at the impact my research has had. But then I started feeling bad. Here's the thing: each time you type a CAPTCHA, essentially, you waste 10 seconds of your time. And if you multiply that by 200 million, you get that humanity is wasting about 500,000 hours every day typing these annoying CAPTCHAs.

이 프로젝트가 어떤 움직임을 시작했는지 말씀드리죠. 이 프로젝트는 다음과 같은 이유로 시작했습니다. 매일 약 2억 개의 캡차가 전 세계적으로 입력되지요. 제가 이사실을 처음 들었을 때 저는 매우 자랑스러웠지요. "내 연구가 미친 영향력을 봐라!" 하며 좋아했어요. 하지만 저는 좀 미안한 생각이 들기 시작했습니다. 사람들이 캡차 문자열을 입력할 때마다 약 10초 정도의 시간을 낭비하게 됩니다. 그 시간을 2억으로 곱하면 우리는 전체적으로 이 짜증나는 문자열을 입력하기 위해 매일 50만 시간을 낭비하고 있다는 결론이 나옵니다.
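The arithmetic behind that estimate can be checked in a few lines. This is only a sketch using the two figures quoted in the talk (200 million CAPTCHAs per day, about 10 seconds each); the variable names are illustrative.

```python
# Figures as quoted in the talk; both are rough estimates.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
hours_per_day = total_seconds / 3600  # 3600 seconds in an hour

print(f"{hours_per_day:,.0f} hours per day")  # prints "555,556 hours per day"
```

The exact product is a little over 555,000 hours, which the talk rounds to "about 500,000 hours every day".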

(Laughter)

그래서 좀 미안하다는 생각을 했어요.

So then I started feeling bad.

(웃음)

(Laughter)

그런데 웹페이지의 보안이 캡차에 의존하기 때문에

And then I started thinking, of course, we can't just get rid of CAPTCHAs, because the security of the web depends on them. But then I started thinking, can we use this effort for something that is good for humanity? So see, here's the thing. While you're typing a CAPTCHA, during those 10 seconds, your brain is doing something amazing. Your brain is doing something that computers cannot yet do. So can we get you to do useful work for those 10 seconds? Is there some humongous problem that we cannot yet get computers to solve, yet we can split into tiny 10-second chunks such that each time somebody solves a CAPTCHA, they solve a little bit of this problem? And the answer to that is "yes," and this is what we're doing now.

캡차를 그냥 버릴 수도 없지요 저는 그래서 캡차를 입력하는 시간을 인류를 위해 좋은 일을 하는데 쓸수 있을까 생각했죠. 자, 이런 생각을 해 보세요... 여러분이 10초 동안 캡차 문자열을 입력할 때 여러분의 뇌는 매우 어려운 일을 하고 있죠. 컴퓨터는 아직 그런 일을 못합니다. 어떻게 하면 여러분이 그 10초 동안 어떤 유용한 일을 할 수 있게 저희들이 도와드릴 수 있을까요? 다시 말하면, 아직까지 컴퓨터가 풀지 못하는 어떤 거창한 문제를 찾아 그것을 10초 단위의 작은 작업단위로 쪼개서 사람들이 캡차 질문에 답을 할때 마다 그 문제의 작은 부분을 해결하게 만드는 것이 가능할까요? 그 질문에 대한 답은 "예" 이며 우리가 지금 하는 일이 바로 그겁니다.

Nowadays, while you're typing a CAPTCHA, not only are you authenticating yourself as a human, but in addition you're helping us to digitize books. Let me explain how this works. There's a lot of projects trying to digitize books. Google has one. The Internet Archive has one. Amazon, with the Kindle, is trying to digitize books. Basically, the way this works is you start with an old book. You've seen those things, right? Like a book?

여러분이 모르실지는 모르지만 요즘은 캡차 문자를 입력할 때 사람이 캡차 입력을 한다는 것을 증명할 뿐만이 아니라 저희들이 종이책을 디지털화하는 일을 실지로 도와줍니다. 자, 그럼 제가 좀 더 자세히 설명드리겠습니다. 책들을 디지털화하는 프로젝트는 많이 있습니다. 구글, 인터넷 아카이브, 아마존 그리고 지금은 킨들도 종이책을 디지털화 하려고 노력하고 있습니다. 이런 작업은 주로 오래된 책으로 부터 시작됩니다. 그런것들 보셨죠? 책이라는 것 말이예요? (웃음)

(Laughter)

처음에 하는 일은 책을 스캔하는 것이죠.

So you start with a book and then you scan it.

책을 스캔하는것은

Now, scanning a book is like taking a digital photograph of every page. It gives you an image for every page. This is an image with text for every page of the book. The next step in the process is that the computer needs to be able to decipher the words in this image. That's using a technology called OCR, for optical character recognition, which takes a picture of text and tries to figure out what text is in there. Now, the problem is that OCR is not perfect. Especially for older books where the ink has faded and the pages have turned yellow, OCR cannot recognize a lot of the words. For things that were written more than 50 years ago, the computer cannot recognize about 30 percent of the words. So now we're taking all of the words that the computer cannot recognize and we're getting people to read them for us while they're typing a CAPTCHA on the Internet.

책의 모든 페이지를 디지털 사진기로 찍는것과 비슷합니다. 그렇게 각 페이지의 이미지를 포착해서 그 책에 담긴 텍스트의 모든 이미지를 얻는 거죠. 다음 과정은 컴퓨터가 각 이미지에 있는 단어를 해독하는 것이지요. 우리는 텍스트의 이미지를 읽으며 무슨 글이 써있는지 판독해 주는 광학문자인식(OCR)이라는 기술을 이용합니다. 그런데 문제는 그 OCR 기술이 완벽하지 않습니다. 특히 잉크가 바래고, 페이지가 노랗게 변한 오래된 책은 OCR이 많은 단어를 인식하지 못합니다. 예를들어 50년이 넘은 책들은 컴퓨터가 대략 30%정도를 인식하지 못합니다. 그래서 우리가 지금 하고 있는 작업은 컴퓨터가 인식하지 못하는 모든 단어들을 모아서 여러분이 인터넷상에서 캡챠 문자를 입력할 때 사람들이 그런 문자를 읽게 합니다.

So the next time you type a CAPTCHA, these words that you're typing are actually words from books that are being digitized that the computer could not recognize. The reason we have two words nowadays instead of one is because one of the words is a word that the system just got out of a book, it didn't know what it was and it's going to present it to you. But since it doesn't know the answer, it cannot grade it. So we give you another word, for which the system does know the answer. We don't tell you which one's which and we say, please type both. And if you type the correct word for the one for which the system knows the answer, it assumes you are human and it also gets some confidence that you typed the other word correctly. And if we repeat this process to 10 different people and they agree on what the new word is, then we get one more word digitized accurately.

그래서 다음에 여러분이 입력하는 캡차 문자는 디지털화 하는 과정에서 컴퓨터가 인식하지 못했던 것이지요. 그런데 요즘엔 한단어 대신 두단어를 보여주는데 그중 하나는 컴퓨터가 디지털화하다가 판독하지 못했던 단어를 보여주는 것이지요. 그런데 컴퓨터는가 그 단어를 모르기 때문에 정답이 입력됐는지 모르죠. 그래서 컴퓨터는 자기가 답을 아는 다른 단어를 하나 더 화면에 보여줍니다. 그리고는 그냥 두 단어를 모두 입력하라고 하지요. 그래서 컴퓨터가 이미 알고 있는 단어에 대해 여러분이 정답을 입력하면, 여러분을 인간으로 인정하고 다른 단어도 옳게 입력했을거라는 자신을 어느정도 갖게 돼죠. 만약 이런 과정을 10명의 다른 사람들에게 반복하고, 10명 모두가 그 단어를 똑같이 읽으면 그 단어를 정확하게 디지털화 한 것으로 간주합니다.
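The two-word scheme described above can be sketched roughly as follows. This is a simplified illustration, not the actual reCAPTCHA implementation: the class and function names are hypothetical, and the rule "accept a word once 10 matching answers have been recorded" is an assumption based on the "10 different people" mentioned in the talk.

```python
from collections import Counter

# Assumption: a reading is accepted once this many users have typed it.
AGREEMENT_THRESHOLD = 10


class UnknownWord:
    """A scanned word that OCR could not read (hypothetical helper)."""

    def __init__(self):
        self.votes = Counter()   # typed answer -> number of users who gave it
        self.accepted = None     # set once enough users agree

    def record(self, guess):
        guess = guess.strip().lower()
        self.votes[guess] += 1
        if self.votes[guess] >= AGREEMENT_THRESHOLD:
            self.accepted = guess


def check_submission(control_answer, typed_control, typed_unknown, unknown):
    """Grade the user on the control word only.

    If the control word is right, the user is treated as human and their
    answer for the unknown word is counted as one vote.
    """
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the word the system can grade
    unknown.record(typed_unknown)
    return True


word = UnknownWord()
for _ in range(10):
    check_submission("morning", "morning", "toaster", word)
print(word.accepted)
```

The user never learns which word is the control, so the only way to pass reliably is to type both words carefully.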

So this is how the system works. And since we released it about three or four years ago, a lot of websites have started switching from the old CAPTCHA, where people wasted their time, to the new CAPTCHA where people are helping to digitize books. So every time you buy tickets on Ticketmaster, you help to digitize a book. Facebook: Every time you add a friend or poke somebody, you help to digitize a book. Twitter and about 350,000 other sites are all using reCAPTCHA. And the number of sites that are using reCAPTCHA is so high that the number of words we're digitizing per day is really large. It's about 100 million a day, which is the equivalent of about two and a half million books a year. And this is all being done one word at a time by just people typing CAPTCHAs on the Internet.

캡차 시스템은 이렇게 작동합니다. 약 3, 4년 전에 저희들이 이 시스템을 소개한 이후 많은 웹사이트가 시간을 낭비하던 구형 캡차 시스템으로 부터 책을 디지털화하는데 도움을주는 신형 캡차 시스템으로 업그레이드 했지요. 예를들면, 티켓마스터도 업그레이드 했는데 그래서 거기서 표를 살때마다 여러분은 책을 디지털화하는데 도움을 주게됩니다. 페이스북에서는 친구를 추가하거나 누군가를 찜할때 마다 책을 디지털화하는데 도움을 주게 됩니다. 트위터와 약 35만개의 다른 사이트들도 신형 리캡차를 사용합니다. 사실 리캡차 서비스를 이용하는 사이트가 많아서 매일 디지털화되는 단어 수는 정말 엄청나게 많습니다. 일일 대략 1억개 정도가 되는데 이 숫자는 연간 250만권의 책들에 해당하는 숫자이죠. 이건 단순히 사람들이 인터넷상에서 리캡차 단어를 하나씩 입력해서 가능하게 된 것입니다. (박수)
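As a quick consistency check on those two figures, the snippet below works out the average book length they imply. It assumes a 365-day year and that every digitized word counts toward a book; neither assumption is stated in the talk.

```python
# Figures as quoted in the talk.
WORDS_PER_DAY = 100_000_000
BOOKS_PER_YEAR = 2_500_000

words_per_year = WORDS_PER_DAY * 365  # assumption: same rate every day
implied_words_per_book = words_per_year / BOOKS_PER_YEAR

print(f"{words_per_year:,} words per year")               # prints "36,500,000,000 words per year"
print(f"{implied_words_per_book:,.0f} words per book")    # prints "14,600 words per book"
```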

(Applause)

물론,

Now, of course, since we're doing so many words per day, funny things can happen. This is especially true because now we're giving people two randomly chosen English words next to each other. So funny things can happen. For example, we presented this word. It's the word "Christians"; there's nothing wrong with it. But if you present it along with another randomly chosen word, bad things can happen. So we get this.

우리가 매일 수많은 단어를 다루기 때문에 재미있는 일들이 벌어 질수도 있습니다. 그런데 지금은 무작위로 선택된 영어 단어를 두개 나란히 보여주기 때문에 재미있는 일들이 발생 할 수 있죠. 그 일례로, 우리가 이 단어를 보여줬지요. "Christians" 이라는 단어에는 전혀 문제가 없지요. 그런데 다른 무작위로 선택된 다른 단어를 보여줄때 좋지 않은 일이 발생할 수 있지요. 이런거죠. "나쁜 기독교인"

[bad Christians]

더 나쁜 상황은 이 단어가

But it's even worse, because the website where we showed this actually happened to be called The Embassy of the Kingdom of God.

하나님의 왕국 단체라고 불리는 사이트에서 보여 줬다는거죠. (웃음)

(Laughter)

이런..

Oops.

(웃음)

(Laughter)

여기 또다른 나쁜예가 있습니다.

Here's another really bad one. JohnEdwards.com

JohnEdwards.com (텍스트: 젠장할 자유당)

[Damn liberal]

(Laughter)

(웃음)

So we keep on insulting people left and right every day. Of course, we're not just insulting people. Here's the thing. Since we're presenting two randomly chosen words, interesting things can happen. So this actually has given rise to a really big Internet meme that tens of thousands of people have participated in, which is called CAPTCHA art. I'm sure some of you have heard about it. Here's how it works. Imagine you're using the Internet and you see a CAPTCHA that you think is somewhat peculiar, like this CAPTCHA.

그래서 우리는 매일 사람들에게 모욕을 줍니다. 물론 모욕만 하는것은 아니죠. 캡차는 무작위로 선택된 두 개의 단어를 보여 주기 때문에 재미있는 일도 벌어집니다. 리캡차는 수만명의 사람들이 즐기는 캡차 아트라고 하는 매우 인기있는 인터넷 밈을 창출했지요. 캡차아트에 대해 들어보신 분들이 계시겠지요. 어떻게 하는건지 말씀드리죠. 여러분이 인터넷을 사용하다가 흥미있는 캡차 문자를 보면 이런 캡챠 말입니다 (텍스트: 보이지 않은 토스터)

[invisible toaster]

그 화면을 캡쳐하는것입니다.

What you're supposed to do is you take a screenshot of it. Then of course, you fill out the CAPTCHA because you help us digitize a book. But first you take a screenshot and then you draw something that is related to it.

그리고는 물론 캡챠 문자도 입력합니다. 왜냐하면 책을 디지털화하는 것을 도와야 하니까요. 그리구 나서 앞서 포착한 스크린 숏에 관련되는 그림을 그리는 거죠.

(Laughter)

(웃음)

That's how it works.

이렇게 작동하는 거죠.

(Laughter)

수만개의 이런 캡처가 있죠.

There are tens of thousands of these. Some of them are very cute.

어떤 것은 귀엽죠. (텍스트: 꽉 잡아)

[clenched it]

(웃음)

(Laughter)

어떤것은 더 재밌습니다.

Some of them are funnier.

"취한 설립자들"

[stoned Founders]

(Laughter)

(웃음)

And some of them, like paleontological shvisle ...

그리고

(Laughter)

"paleontological shvisle" 같은 것에는

they contain Snoop Dogg.

Snoop Dogg의 사진이 나타나기도 합니다.

(Laughter)

(웃음)

OK, so this is my favorite number of reCAPTCHA. So this is the favorite thing that I like about this whole project. This is the number of distinct people that have helped us digitize at least one word out of a book through reCAPTCHA: 750 million, a little over 10 percent of the world's population, has helped us digitize human knowledge. And it is numbers like these that motivate my research agenda. So the question that motivates my research is the following: If you look at humanity's large-scale achievements, these really big things that humanity has gotten together and done historically -- like, for example, building the pyramids of Egypt or the Panama Canal or putting a man on the Moon -- there is a curious fact about them, and it is that they were all done with about the same number of people. It's weird; they were all done with about 100,000 people. And the reason for that is because, before the Internet, coordinating more than 100,000 people, let alone paying them, was essentially impossible. But now with the Internet, I've just shown you a project where we've gotten 750 million people to help us digitize human knowledge. So the question that motivates my research is, if we can put a man on the Moon with 100,000, what can we do with 100 million?

자, 이것은 제가 좋아하는 리캡차 숫자입니다. 이것은 전 프로젝에서 제가 좋아하는 부분입니다. 이 숫자는 리캡차를 통해 최소한 한개의 단어라도 디지털화하도록 도와준 각 사람들의 총 숫자입니다. 전 세계 인구의 10%가 약간 넘는 7억 5천만명이 인간의 지식을 디지털화할 수 있게 우리를 도와주었습니다. 이와같은 거창한 숫자는 저를 격려해 줍니다. 이 숫자가 제 연구에 용기를 주는 이유는 다음과 같습니다. 인류 역사상 대규모의 사람들이 협력하여 지금까지 성취한 거창한 업적들을 살펴보면 -- 예를들면, 이집트의 피라미드, 파나마 운하, 또는 인류의 달착륙 등을 보면 특이한 공통된 사실이 하나 있는데 그것은 모두 비슷한 수의 사람들이 협력해서 달성했다는 겁니다. 그런데 각 경우에 약 10만 명이 일했다는 것은 참 신기합니다. 인터넷 시대 이전에는 10만 명 이상의 임금을 지불하는 것은 고사하고 그들을 관리하는 것이 사실상 불가능했지요. 하지만 지금은 인터넷을 사용해 제가 보여드린 것처럼 7억5천만 명의 사람들이 인간의 지식을 디지털화하기 위해서 저희를 돕고 있습니다. 그래서 제 연구에 용기를 주는 것은 우리가 10만 명의 노력으로 사람을 달에 보낼 수 있다면, 1억명의 사람으로는 무엇이 가능하겠냐는 것이죠.

So based on this question, we've had a lot of different projects that we've been working on. Let me tell you about one that I'm most excited about. This is something that we've been semiquietly working on for the last year and a half or so. It hasn't yet been launched. It's called Duolingo. Since it hasn't been launched, shhh!

그래서 이 질문을 기반으로, 저희는 여러 가지 프로젝트를 진행하고 있습니다. 그중 제가 가장 좋아하는 프로젝트에 대해 말씀드리지요. 지난 일년반 동안 저희들이 조용히 진행해 온 일입니다. 아직 공식적으로 발표되지 않은 듀오링고라는 프로젝트죠. 아직 공개되지 않았으니.. 쉬~~ 비밀입니다.

(Laughter)

(웃음)

Yeah, I can trust you'll do that. So this is the project. Here's how it started. It started with me posing a question to my graduate student, Severin Hacker. OK, that's Severin Hacker. So I posed the question to my graduate student. By the way, you did hear me correctly; his last name is Hacker.

저는 여러분이 비밀을 지켜주리라 믿습니다. 이프로젝트는 다음과 같이 시직됐지요. 이 프로젝트는 제가 세버린 해커라는 제 대학원 학생에게 던진 질문으로 부터 시작했지요. 그래요. 이게 세버린 해커입니다. 제가 그 학생에게 그 질문을 던졌죠. 녜, 여러분이 제말을 옳게 들으셨어요. 이 학생의 성이 정말로 해커(Hacker) 예요.

(Laughter)

하여튼 제가 그에게 질문을 던졌지요.

So I posed this question to him: How can we get 100 million people translating the web into every major language for free? There's a lot of things to say about this question. First of all, translating the web. Right now, the web is partitioned into multiple languages. A large fraction of it is in English. If you don't know English, you can't access it. But there's large fractions in other different languages, and if you don't know them, you can't access it. So I would like to translate all of the web, or at least most of it, into every major language. That's what I would like to do.

어떻게하면 1억명의 사람이 모든 웹페이지를 모든 주요 언어로, 그리고 무료로 번역을 하도록 만들 수 있을까? 자, 이 질문에 대해서 할 말들이 많이 있습니다. 현재 웹은 여러 언어로 갈려져 있습니다. 대부분의 페이지는 영어로 되어 있구요. 영어를 모르면 영어 페이지에 접할 수가 없지요. 하지만 다른 언어로된 웹페이지도 많이 있지요. 물론 그 언어를 모르면 그 웹페이지를 읽을 수 없지요. 그래서 저는 웹의 전체 또는 최소한 일부라도 모든 주요 언어로 번역하고 싶습니다. 바로 그게 제가 하고 싶은 일입니다.

Now, some of you may say, why can't we use computers to translate? Machine translation is starting to translate some sentences here and there. Why can't we use it to translate the web? The problem with that is it's not yet good enough and it probably won't be for the next 15 to 20 years. It makes a lot of mistakes. Even when it doesn't, since it makes so many mistakes, you don't know whether to trust it or not.

어떤분은 왜 컴퓨터로 번역을 하지 못하냐고 물으실지 모르죠. 왜 기계번역을 사용 할 수 없을까요? 요즘 기계번역한 문장들이 가끔 눈에 뜨입니다. 그런데 웹 전체를 기계로 번역할 수 있을가요? 문제는 아직도 기계번역의 질이 낮다는 것입니다. 아마 앞으로 15 내지 20년 까지는 그럴 겁니다. 기계번역에는 오역이 많이 있습니다. 기계가 한 번역이 오역이 아니더라도 일반적으로 워낙 오역이 많기 때문에 믿지 못하죠.

So let me show you an example of something that was translated with a machine. Actually, it was a forum post. It was somebody who was trying to ask a question about JavaScript. It was translated from Japanese into English. So I'll just let you read. This person starts apologizing for the fact that it's translated with a computer. So the next sentence is going to be the preamble to the question. So he's just explaining something. Remember, it's a question about JavaScript.

기계번역의 일례를 보여드리죠. 이 글은 어떤 포럼에 올라온 글입니다. 어떤분이 자바스크립트에 대해 질문하는 글이 었지요. 일본어를 영어로 번역한 것입니다. 한번 읽어보세요. 그는 질문을 하기에 앞서 그의 질문을 컴퓨터로 번역한 것을 먼저 사과합니다. 그 다음 문장은 그의 질문의 서두니까 뭐를 설명하는 거죠. 이 질문은 자바스크립트에 대한 것이라것을 명심하세요.

[At often, the goat-time install a error is vomit.]

(텍스트: 종종 염소시간에 에러를 설치하는것은 토하는것이다)

(Laughter)

(웃음)

Then comes the first part of the question.

그리고 질문의 첫번째 부분이 나옵니다.

[How many times like the wind, a pole, and the dragon?]

(텍스트: 얼마나 자주 바람, 막대기, 용처럼 ?)

(Laughter)

(웃음)

Then comes my favorite part of the question.

그리고 이 질문 중 제가 좋아하는 부분이 나옵니다.

[This insult to father's stones?]

(텍스트: 이것 아버지의 돌을 모욕한다?)

(Laughter)

(웃음)

And then comes the ending, which is my favorite part of the whole thing.

그리고 이 질문 전체중 제가 좋아하는 마지막 부분이 나옵니다. (텍스트: 여러분의 어리석음을 사과하세요. 많이 감사드립니다.)

[Please apologize for your stupidity. There are a many thank you.]

(웃음)

(Laughter)

자, 컴퓨터 번역은 아직 질이 문제입니다.

OK, so computer translation, not yet good enough. So back to the question. So we need people to translate the whole web. So now the next question you may have is, well, why can't we just pay people to do this? We could pay professional translators to translate the whole web. We could do that. Unfortunately, it would be extremely expensive. For example, translating a tiny fraction of the whole web, Wikipedia, into one other language, Spanish. OK? Wikipedia exists in Spanish, but it's very small compared to the size of English. It's about 20 percent of the size of English. If we wanted to translate the other 80 percent into Spanish, it would cost at least 50 million dollars -- and this is even at the most exploited, outsourcing country out there. So it would be very expensive. So what we want to do is, we want to get 100 million people translating the web into every major language for free.

이제 다시 원래 질문으로 돌아가죠. 우리는 웹 전체를 번역할 사람들이 필요합니다. 그럼 여러분은 번역가에게 돈을 주고 번역을 시키면 되지 않겠냐고 물으실지 모르죠. 전문 번역가에게 돈을 주고 웹 전체를 번역 할 수 있겠지요. 그렇게 할 수 있어요. 하지만 불행히도 비용이 무지막지하게 들어갑니다. 예를들어, 위키피디아 같은 웹 전체의 아주 작은 부분을 스페인어로 번역하려면, 물론 스페인어 위키피디아가 있지만, 영어페이지에 비하면 매우 작은 분량입니다. 영어판의 약 20% 정도지요. 나머지 80%를 스페인어로 번역을 하려면 학대받는 정도로 임금이 저렴한 국가로 아웃소싱하더라도 약 5천만달러가 들겁니다. 웹 전체를 번역하려면 엄청난 돈이 들겠지요. 우리는 모든 웹페이지를 모든 주요 언어로, 그리고 무료로 번역해 줄 1억명 정도의 사람이 필요합니다.

If this is what you want to do, you quickly realize you're going to run into two big hurdles, two big obstacles. The first one is a lack of bilinguals. So I don't even know if there exists 100 million people out there using the web who are bilingual enough to help us translate. That's a big problem. The other problem you're going to run into is a lack of motivation. How are we going to motivate people to actually translate the web for free? Normally, you have to pay people to do this. So how are we going to motivate them to do it for free? When we were starting to think about this, we were blocked by these two things. But then we realized, there's a way to solve both these problems with the same solution. To kill two birds with one stone. And that is to transform language translation into something that millions of people want to do and that also helps with the problem of lack of bilinguals, and that is language education.

자, 이런식으로 일을 하려면 두 개의 큰 문제가 있다는 것을 금방 알수 있습니다. 첫번째는 이중언어 구사자들의 부족이죠. 웹을 사용하는 사람중 우리의 번역을 도와줄 수 있을 정도로 이중언어를 잘아는 사람이 1억명 정도가 되는가도 의문이죠. 이건 큰 문제입니다. 두번째는 동기와 의욕 문제입니다. 웹을 무료로 번역하도록 어떻게 사람들에게 동기와 의욕을 줄 수 있을까요? 번역을 하려면 보통 돈을 지불해야 하죠. 그런데 어떻게 하면 무료로 번역을 하게 만들 수 있을까요? 애당초에 우리는 이 두가지 문제에 걸렸었지요. 그런데 우리는 한가지 해결책으로 이 두 문제를 동시에 해결수 있다는 것을 깨달았지요. 일석이조의 방법을 찾은거죠. 그것은 번역 작업을 수백만명의 사람들이 원하는 언어교육으로 탈바꿈 시키는 것인데 그렇게 하면 이중언어 구사자가 부족한 문제도 해결할 수 있죠.

So it turns out that today, there are over 1.2 billion people learning a foreign language. People really want to learn a foreign language. And it's not just because they're being forced to do so in school. In the US alone, there are over five million people who have paid over $500 for software to learn a new language. So people really want to learn a new language. So what we've been working on for the last year and a half is a new website -- it's called Duolingo -- where the basic idea is people learn a new language for free while simultaneously translating the web. And so basically, they're learning by doing.

현재 전세계에서 외국어를 배우고 있는 사람은 12억명이 넘습니다. 사람들은 정말 외국어를 배우고 싶어합니다. 이것은 단순히 학교에서 배우라고 그래서가 아니지요. 예를들어, 미국만 보더라도, 외국어 교육 소프트웨어에를 살려고 500달러 이상을 지불한 사람이 500만 명이 넘습니다. 사람들은 정말로 새로운 외국어를 배우기를 원합니다. 그래서 저희들은 지난 일년반 동안 듀오링고라는 웹사이트를 개발하고 있습니다. 우리의 기본 아이디어는 사람들이 웹을 번역하면서 무료로 외국어를 배우게 하자는거죠. 간단히 말하면 번역을 하며 외국어를 배우게 하는거죠.

So the way this works is when you're just a beginner, we give you very simple sentences. There's a lot of very simple sentences on the web. We give you very simple sentences along with what each word means. And as you translate them and as you see how other people translate them, you start learning the language. And as you get more advanced, we give you more complex sentences to translate. But at all times, you're learning by doing.

우리는 초보자에게는 매우 간단한 문장을 주지요. 웹 페이지에는 물론 간단한 문장들이 많이 있지요. 우리는 매우 간단한 문장과 각 단어의 의미를 보여주지요. 그러면 다른 사람들이 번역한 것을 보고 번역을 하면서 언어를 배울 수 있지요. 그리고 외국어 수준이 올라갈 수록 우리는 번역하기 더 복잡한 문장들을 보내죠. 계속해서 실지로 번역을 하면서 배우는 거죠. 그런데 히안한 것은 이 방법이 외국어를 배우는데

Now, the crazy thing about this method is that it actually really works. People are really learning a language. We're mostly done building it and now we're testing it. People really can learn a language with it. And they learn it about as well as the leading language learning software. So people really do learn a language. And not only do they learn it as well, but actually it's more interesting. Because with Duolingo, people are learning with real content. As opposed to learning with made-up sentences, people are learning with real content, which is inherently interesting. So people really do learn a language.

매우 효과적이라는 겁니다. 사람들이 이런 방법을 사용해서 정말로 외국어를 배웁니다. 우리는 이 시스템을 거의 완성했고 지금은 테스트중이죠. 듀오링고를 통해 정말로 외국어를 배울 수 있습니다. 듀오링고의 성과는 일류 외국어 학습 소프트웨어와 비등합니다. 그래서 사람들은 정말로 언어를 배울 수 있죠. 듀오링고는 다른 방법에 못지 않게 외국어를 쉽게 배울 있수 있을 뿐만아니라 실제 컨텐츠를 사용하기 때문에 더 흥미가 있지요. 꾸며서 만든 문장과는 달리 실제 컨텐츠를 사용하니까 본질적으로 더 흥미가 있지요. 그래서 사람들은 정말로 외국어를 배우게 되지요.

But perhaps more surprisingly, the translations that we get from people using the site, even though they're just beginners, the translations that we get are as accurate as those of professional language translators, which is very surprising. So let me show you one example. This is a sentence that was translated from German into English. The top is the German. The middle is an English translation that was done by a professional translator who we paid 20 cents a word for this translation. And the bottom is a translation by users of Duolingo, none of whom knew any German before they started using the site. If you can see, it's pretty much perfect. Of course, we play a trick here to make the translations as good as professional language translators. We combine the translations of multiple beginners to get the quality of a single professional translator.

하지만 더욱 놀랄만한 것은, 우리의 사이트에서 얻어낸 번역물들은 초보자의 번역도 전문 번역가의 번역물 만큼 정확하다는 것입니다. 한가지 예를 보여드리지요. 이 문장은 독일어를 영어로 번역한 것입니다. 윗부분은 독일어입니다. 중간 부분은 단어당 20센트를 받던 영문 전문 번역가가 번역한 문장입니다. 그리고 밑 부분은 듀오링고 서비스를 사용하기 전에는 독일어를 전혀 몰랐던 듀오링고 사용자가 번역한 문장입니다. 보시듯이 거의 완벽한 번역이라고 할 수 있죠. 물론 우리는 전문 번역가의 수준에 도달하기 위해 약간의 속임수를 쓰죠. 우리는 전문 번역가 수준에 도달하도록 여러 초보자자들의 번역을 합치지요.
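The talk does not say how the translations of multiple beginners are combined. One simple possibility, shown here purely as an illustration and not as Duolingo's method, is to pick the candidate that agrees most with all the others. The function names are hypothetical, and the similarity measure comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def consensus_translation(candidates):
    """Return the candidate most similar, in total, to all other candidates.

    Assumption for illustration: outlier translations disagree with the
    majority, so the sentence closest to everyone else is a safe pick.
    """
    if not candidates:
        raise ValueError("need at least one candidate translation")

    def total_similarity(i):
        return sum(
            similarity(candidates[i], other)
            for j, other in enumerate(candidates)
            if j != i
        )

    best_index = max(range(len(candidates)), key=total_similarity)
    return candidates[best_index]


beginner_translations = [
    "The weather is nice today.",
    "The weather is nice today.",
    "The weather is nice today.",
    "Today the weather insults the goat.",
]
print(consensus_translation(beginner_translations))
```

With three identical candidates and one outlier, the majority sentence wins, because it matches its two copies exactly while the outlier matches nothing exactly.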

Now, even though we're combining the translations, the site actually can translate pretty fast. So let me show you, these are our estimates of how fast we could translate Wikipedia from English into Spanish. Remember, this is 50 million dollars' worth of value. So if we wanted to translate Wikipedia into Spanish, we could do it in five weeks with 100,000 active users. And we could do it in about 80 hours with a million active users. Since all the projects my group has worked on so far have gotten millions of users, we're hopeful that we'll be able to translate extremely fast.

우리가 여러 사람들의 번역을 합친다고 해도 듀오링고는 꽤 빨리 번역을 할 수 있습니다. 제가 보여드리죠. 이 슬라이드는 영문 위키피디아를 스페인어로 얼마나 빨리 번역 할 수 있는지를 보여주는 예측 수치입니다. 이건 5천만달러에 해당하는 외국어 교육 시스템이라는 것을 기억하세요. 위키피디아를 스페인어로 번역한다면, 듀오링고는 약 10만 명의 유저가 도와주면 5주 안에 할 수 있지요. 그리고 백만 명이 도와주면 80시간 이내에 할 수 있습니다. 저희가 지금까지 해온 모든 프로젝트에는 모두 수백만명의 유저가 있었으니까 듀오링고 프로젝트를 사용하면 정말로 빨리 번역을 할 수 있을 겁니다.
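The two estimates can be compared by converting both to the same unit. The sketch below multiplies users by elapsed calendar hours, which is only a rough proxy, since the talk does not say how much time each active user spends translating.

```python
HOURS_PER_WEEK = 7 * 24

# Scenario figures as quoted in the talk.
users_a, hours_a = 100_000, 5 * HOURS_PER_WEEK   # five weeks
users_b, hours_b = 1_000_000, 80                 # about 80 hours

user_hours_a = users_a * hours_a
user_hours_b = users_b * hours_b
speedup = hours_a / hours_b

print(f"{user_hours_a:,} vs {user_hours_b:,} user-hours")  # prints "84,000,000 vs 80,000,000 user-hours"
print(f"{speedup:.1f}x faster with 10x the users")         # prints "10.5x faster with 10x the users"
```

The two scenarios land within about 5 percent of each other, so the estimates are consistent with translation time scaling roughly inversely with the number of active users.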

Now, the thing that I'm most excited about with Duolingo is I think this provides a fair business model for language education. So here's the thing: The current business model for language education is the student pays, and in particular, the student pays Rosetta Stone 500 dollars.

이제 제가 듀오링코에 대해 가장 기쁘게 생각하는 것은 언어교육에 대해 공정한 사업모델을 제공한다는 점입니다. 바로 이런것이죠. 현재의 언어교육 모델은 학생이 비용을 지불하는 것인데, 특히 학생들이 로제타스톤에 500달러를 주는 것이지요.

(Laughter)

(웃음)

That's the current business model. The problem with this business model is that 95 percent of the world's population doesn't have 500 dollars. So it's extremely unfair towards the poor. This is totally biased towards the rich. Now, see, in Duolingo, because while you learn, you're actually creating value, you're translating stuff -- which, for example, we could charge somebody for translations, so this is how we could monetize this. Since people are creating value while they're learning, they don't have to pay with their money, they pay with their time. But the magical thing here is that is time that would have had to have been spent anyways learning the language. So the nice thing about Duolingo is, I think, it provides a fair business model -- one that doesn't discriminate against poor people.

그게 현재 사업모델이죠. 이런 사업모델의 문제점은 전 세계 인구의 95%는 500달러를 가지고 있지 않다는 점입니다. 그래서 가난한 사람들에게는 매우 불공정한 모델입니다. 이것은 부자들에게 편향된 사업모델이지요. 자 보시죠, 듀오링고에서는, 사용자들이 배우면서 번역이라는 가치를 창조하게 됩니다. 예를들면, 누구에게 번역료를 과금할 수 도 있다는 것이죠. 즉, 이런식으로 돈을 벌 수도 있다는 말이지요. 듀오링고 유저들은 배우면서 가치를 창조하기 때문에 돈을 지불 할 필요가 없죠. 이미 시간으로 지불하니까요. 그런데 여기서 기가 막히게 좋은 것은 유저들이 시간으로 교육비를 내지만 그시간은 어차피 언어를 배우는데 소비해야 하는 시간이라는 거죠. 그래서 듀오링고의 정말로 좋은 점은 가난한 사람들을 차별하지 않는 공평한 사업모델을 제공한다는 것입니다. 이게 그 사이트입니다. 감사합니다.

So here's the site. Thank you.

(박수)

(Applause)

이것이 듀오링고 사이트입니다.

We haven't yet launched, but if you go there, you can sign up to be part of our private beta, which is probably going to start in three or four weeks. We haven't yet launched it.

아직 공식으로 발표하지는 않았지요. 하지만 그 사이트에 가보면, 베타 서비스에 가입 할 수 있는데, 이 사이트는 아마도 3, 4주내로 작업을 시작 할 것입니다. 듀오링고 사이트는 아직 공식으로 발표되지 않았습니다.

By the way, I'm the one talking here, but Duolingo is the work of a really awesome team,

여기서 제가 혼자 듀오링고에 대해 말하고 있지만 듀오링고는 다음을 포함한 정말로 훌륭한 팀의 협력작입니다.

some of whom are here. So thank you.

감사합니다.

(Applause)

(박수)


