I'm going to be talking about statistics today. If that makes you immediately feel a little bit wary, that's OK, that doesn't make you some kind of crazy conspiracy theorist, it makes you skeptical. And when it comes to numbers, especially now, you should be skeptical. But you should also be able to tell which numbers are reliable and which ones aren't. So today I want to try to give you some tools to be able to do that. But before I do, I just want to clarify which numbers I'm talking about here. I'm not talking about claims like, "9 out of 10 women recommend this anti-aging cream." I think a lot of us always roll our eyes at numbers like that. What's different now is people are questioning statistics like, "The US unemployment rate is five percent." What makes this claim different is it doesn't come from a private company, it comes from the government.
Danas ću govoriti o statistici. Ako se zbog toga odmah osećate pomalo obazrivo, to je u redu; to vas ne čini ludim teoretičarem zavere, već vas čini skeptičnim. A kada se radi o brojevima, pogotovo sada, treba da budete skeptični. Ali, takođe bi trebalo da možete da prepoznate koji brojevi su pouzdani, a koji nisu. Danas želim da probam da vam pružim izvesna pomagala da biste to umeli. Pre nego što to uradim, želim samo da razjasnim o kojim brojevima ovde govorim. Ne govorim o tvrdnjama poput: „Devet od deset žena preporučuje ovu kremu protiv bora.“ Mislim da veliki broj nas uvek prevrće očima na takve brojeve. Ono što je sada drugačije je što ljudi dovode u pitanje podatke poput: „Nezaposlenost u SAD je pet procenata.“ Ono po čemu je ova tvrdnja drugačija je to što ne proističe iz privatne firme, već iz vlade.
About 4 out of 10 Americans distrust the economic data that gets reported by government. Among supporters of President Trump it's even higher; it's about 7 out of 10. I don't need to tell anyone here that there are a lot of dividing lines in our society right now, and a lot of them start to make sense, once you understand people's relationships with these government numbers. On the one hand, there are those who say these statistics are crucial, that we need them to make sense of society as a whole in order to move beyond emotional anecdotes and measure progress in an objective way. And then there are the others, who say that these statistics are elitist, maybe even rigged; they don't make sense and they don't really reflect what's happening in people's everyday lives.
Oko četiri od deset Amerikanaca ne veruje ekonomskim podacima o kojima izveštava vlast. Među pristalicama predsednika Trampa, taj broj je još veći; oko sedam od deset. Ne treba nikome ovde da pričam da u našem društvu danas postoji mnogo linija razdvajanja, a mnoge počinju da imaju smisla kada razumete odnos ljudi prema tim vladinim brojevima. Sa jedne strane, tu su oni koji kažu da su ovi podaci od ključnog značaja, da su nam potrebni da bi nam društvo kao celina imalo smisla, kako bismo prevazišli emocionalne anegdote i merili napredak na objektivan način. Zatim, tu su oni drugi, koji kažu da su ovi podaci elitistički, možda čak i namešteni; da nemaju smisla i ne odražavaju zaista ono što se dešava u svakodnevnom životu ljudi.
It kind of feels like that second group is winning the argument right now. We're living in a world of alternative facts, where people don't see statistics as this kind of common ground, this starting point for debate. This is a problem. There are actually moves in the US right now to get rid of some government statistics altogether. Right now there's a bill in Congress about measuring racial inequality. The draft law says that government money should not be used to collect data on racial segregation. This is a total disaster. If we don't have this data, how can we observe discrimination, let alone fix it? In other words: How can a government create fair policies if they can't measure current levels of unfairness? This isn't just about discrimination, it's everything -- think about it. How can we legislate on health care if we don't have good data on health or poverty? How can we have public debate about immigration if we can't at least agree on how many people are entering and leaving the country? Statistics come from the state; that's where they got their name. The point was to better measure the population in order to better serve it. So we need these government numbers, but we also have to move beyond either blindly accepting or blindly rejecting them. We need to learn the skills to be able to spot bad statistics.
Nekako deluje da druga grupa trenutno pobeđuje u raspravi. Živimo u svetu alternativnih činjenica, gde ljudi ne smatraju statističke podatke nekom vrstom zajedničke osnove, početnom tačkom za debatu. To je problem. Trenutno zapravo postoje pokreti u SAD da se potpuno otarasimo nekih vladinih statističkih podataka. Baš sada postoji predlog zakona u kongresu o merenju rasne nejednakosti. Nacrt zakona kaže da novac vlade ne treba koristiti za prikupljanje podataka o rasnoj segregaciji. To je potpuna katastrofa. Ako nemamo ove podatke, kako možemo da posmatramo diskriminaciju, a kamoli da je popravimo? Drugim rečima, kako vlast može da stvara pravednu politiku ako ne može da izmeri trenutni nivo nepravednosti? Ovde se ne radi samo o diskriminaciji, već o svemu; razmislite o tome. Kako možemo donositi zakone o zdravstvenoj zaštiti ako nemamo dobre podatke o zdravlju ili siromaštvu? Kako možemo javno debatovati o imigraciji ako se ne možemo makar složiti oko toga koliko ljudi ulazi u zemlju i izlazi iz nje? Statistički podaci proističu iz države; tako su dobili svoje ime. Poenta je bila da se dobiju bolje mere stanovništva kako bi mu se bolje služilo. Dakle, potrebni su nam ti vladini brojevi, ali treba i da prevaziđemo njihovo slepo prihvatanje, kao i slepo odbacivanje. Potrebne su nam veštine da bismo mogli da uočimo loše podatke.
I started to learn some of these when I was working in a statistical department that's part of the United Nations. Our job was to find out how many Iraqis had been forced from their homes as a result of the war, and what they needed. It was really important work, but it was also incredibly difficult. Every single day, we were making decisions that affected the accuracy of our numbers -- decisions like which parts of the country we should go to, who we should speak to, which questions we should ask. And I started to feel really disillusioned with our work, because we thought we were doing a really good job, but the one group of people who could really tell us were the Iraqis, and they rarely got the chance to find our analysis, let alone question it. So I started to feel really determined that the one way to make numbers more accurate is to have as many people as possible be able to question them.
Počela sam da ih stičem kada sam radila na odeljenju za statistiku koje je deo Ujedinjenih nacija. Naš posao je bio da saznamo koliko Iračana je proterano iz svojih domova kao posledica rata i šta im je potrebno. To je bio zaista važan posao, ali, takođe, neverovatno težak. Svakoga dana smo donosili odluke koje su uticale na tačnost naših brojeva - odluke poput toga u koje delove zemlje bi trebalo da idemo, sa kim treba da razgovaramo, koja pitanja treba da postavljamo. Počela sam da se osećam zaista razočarano u vezi sa našim radom, jer smo mislili da zaista dobro obavljamo posao, ali ona grupa ljudi koja bi stvarno mogla da nam ispriča stvari bili su Iračani, a oni su retko imali priliku da naiđu na našu analizu, a kamoli da je preispituju. Zato sam postala zaista odlučna da je jedini način da postignemo da brojevi budu tačniji da omogućimo da što više ljudi može da ih preispituje.
So I became a data journalist. My job is finding these data sets and sharing them with the public. Anyone can do this, you don't have to be a geek or a nerd. You can ignore those words; they're used by people trying to say they're smart while pretending they're humble. Absolutely anyone can do this.
Tako sam postala novinarka koja se bavi podacima. Moj posao je da pronađem skupove podataka i podelim ih sa javnošću. Bilo ko to može, ne morate biti štreber ili bubalica. Možete zanemariti te reči; koriste ih ljudi koji pokušavaju da kažu da su pametni dok se pretvaraju da su skromni. Apsolutno svako to može.
I want to give you guys three questions that will help you be able to spot some bad statistics. So, question number one is: Can you see uncertainty? One of the things that's really changed people's relationship with numbers, and even their trust in the media, has been the use of political polls. I personally have a lot of issues with political polls because I think the role of journalists is actually to report the facts and not attempt to predict them, especially when those predictions can actually damage democracy by signaling to people: don't bother to vote for that guy, he doesn't have a chance. Let's set that aside for now and talk about the accuracy of this endeavor.
Htela bih da vam postavim tri pitanja koja će vam pomoći da možete da primetite loše statističke podatke. Dakle, pitanje broj jedan glasi: možete li da uočite nepouzdanost? Jedna od stvari koja je zaista promenila odnos ljudi prema brojevima, pa čak i njihovo poverenje u medije, bilo je korišćenje političkih anketa. Lično imam mnogo problema sa političkim anketama jer smatram da je uloga novinara da izveštava o činjenicama, a ne da pokušava da ih predvidi, naročito kada ta predviđanja mogu da naškode demokratiji davanjem signala ljudima da se ne trude da glasaju za nekog jer nema šanse. Stavimo to po strani za sada i popričajmo o preciznosti ovog nastojanja.
Based on national elections in the UK, Italy, Israel and of course, the most recent US presidential election, using polls to predict electoral outcomes is about as accurate as using the moon to predict hospital admissions. No, seriously, I used actual data from an academic study to draw this. There are a lot of reasons why polling has become so inaccurate. Our societies have become really diverse, which makes it difficult for pollsters to get a really nice representative sample of the population for their polls. People are really reluctant to answer their phones to pollsters, and also, shockingly enough, people might lie. But you wouldn't necessarily know that to look at the media. For one thing, the probability of a Hillary Clinton win was communicated with decimal places. We don't use decimal places to describe the temperature. How on earth can predicting the behavior of 230 million voters in this country be that precise? And then there were those sleek charts. See, a lot of data visualizations will overstate certainty, and it works -- these charts can numb our brains to criticism. When you hear a statistic, you might feel skeptical. As soon as it's buried in a chart, it feels like some kind of objective science, and it's not.
Na osnovu državnih izbora u Ujedinjenom Kraljevstvu, Italiji, Izraelu i, naravno, najskorijih predsedničkih izbora u SAD, korišćenje anketa za predviđanje ishoda izbora otprilike je tačno kao korišćenje meseca za predviđanje prijema u bolnice. Ne, ozbiljno, koristila sam stvarne podatke iz akademske studije da bih ovo nacrtala. Postoji mnogo razloga zašto je anketiranje postalo tako netačno. Naša društva su postala veoma raznolika, što otežava anketarima da dobiju fini reprezentativni uzorak stanovništva za svoje ankete. Ljudi se nerado javljaju na telefon anketarima, a, takođe, ko bi rekao, ljudi mogu da slažu. Međutim, to nećete nužno znati pogledavši medije. Između ostalog, o verovatnoći pobede Hilari Klinton izveštavano je pomoću decimalnih brojeva. Ne koristimo decimale da izrazimo temperaturu. Kako, pobogu, predviđanje ponašanja 230 miliona glasača u ovoj zemlji može biti tako precizno? Zatim, tu su bili oni doterani grafikoni. Vidite, mnogo vizualizacije podataka će prenaglasiti sigurnost, i to deluje - ovi grafikoni mogu da otupe naš mozak za kriticizam. Kada čujete podatak, možda ćete biti skeptični. Čim je upakovan u grafikon, čini se kao nekakva objektivna nauka, a nije.
So I was trying to find ways to better communicate this to people, to show people the uncertainty in our numbers. What I did was I started taking real data sets, and turning them into hand-drawn visualizations, so that people can see how imprecise the data is; so people can see that a human did this, a human found the data and visualized it. For example, instead of finding out the probability of getting the flu in any given month, you can see the rough distribution of flu season. This is --
Zato sam pokušavala da pronađem načine da ovo bolje prenesem ljudima, da im pokažem nesigurnost u našim brojevima. Počela sam da uzimam stvarne skupove podataka i pretvaram ih u vizualizacije nacrtane rukom, tako da ljudi mogu da vide koliko su podaci neprecizni; tako da mogu da vide da je to uradilo ljudsko biće, čovek je našao podatke i vizualizovao ih. Na primer, umesto saznavanja verovatnoće da dobijete grip u bilo kom mesecu, možete videti grubu raspodelu sezone gripa. Ovo je -
(Laughter)
(Smeh)
a bad shot to show in February. But it's also more responsible data visualization, because if you were to show the exact probabilities, maybe that would encourage people to get their flu jabs at the wrong time.
loš grafikon za pokazivanje u februaru. Međutim, takođe je odgovornija vizualizacija podataka, jer ako biste pokazali tačne verovatnoće, možda bi to podstaklo ljude da dobiju vakcine protiv gripa u pogrešno vreme.
The point of these shaky lines is so that people remember these imprecisions, but also so they don't necessarily walk away with a specific number, but they can remember important facts. Facts like injustice and inequality leave a huge mark on our lives. Facts like Black Americans and Native Americans have shorter life expectancies than those of other races, and that isn't changing anytime soon. Facts like prisoners in the US can be kept in solitary confinement cells that are smaller than the size of an average parking space.
Svrha ovih nesigurnih linija je da ljudi upamte te nepreciznosti, ali i da ne ponesu nužno sa sobom određeni broj, već da mogu zapamtiti važne činjenice. Činjenice poput toga da nepravde i nejednakosti ostavljaju veliki trag u našem životu. Činjenice poput toga da američki crnci i Indijanci imaju kraći životni vek od pripadnika drugih rasa, a to se neće u skorije vreme promeniti. Činjenice poput toga da se zatvorenici u SAD mogu držati u samicama koje su manje od veličine prosečnog mesta za parkiranje.
The point of these visualizations is also to remind people of some really important statistical concepts, concepts like averages. So let's say you hear a claim like, "The average swimming pool in the US contains 6.23 fecal accidents." That doesn't mean every single swimming pool in the country contains exactly 6.23 turds. So in order to show that, I went back to the original data, which comes from the CDC, who surveyed 47 swimming facilities. And I just spent one evening redistributing poop. So you can kind of see how misleading averages can be.
Svrha ovih vizualizacija takođe je da se ljudi podsete nekih veoma važnih statističkih koncepata, kao što su prosečne vrednosti. Recimo da čujete tvrdnju kao što je: „Prosečan bazen u SAD sadrži 6,23 fekalnih nezgoda.“ To ne znači da svaki bazen u zemlji sadrži tačno 6,23 komada izmeta. Kako bih to pokazala, okrenula sam se prvobitnim podacima iz Centra za kontrolu i prevenciju bolesti koji je izvršio procenu 47 objekata za plivanje. A ja sam samo provela jedno veče u preraspodeli kake. Tako da možete videti kako prosek može da obmane.
(Laughter)
(Smeh)
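To make the point about averages concrete, here is a minimal Python sketch. The per-pool counts are invented purely for illustration and are not the CDC survey data; they are chosen only so that 47 hypothetical pools average out to roughly 6.23.

```python
from statistics import mean, median

# Hypothetical counts for 47 pools (NOT the CDC data):
# 40 pools with no incidents and 7 pools with many.
counts = [0] * 40 + [20, 30, 35, 40, 48, 55, 65]

mean_count = mean(counts)        # total incidents divided by number of pools
median_count = median(counts)    # the middle pool when sorted by count
pools_at_zero = counts.count(0)  # how many pools had no incidents at all

print(f"pools surveyed: {len(counts)}")
print(f"mean incidents per pool: {mean_count:.2f}")
print(f"median incidents per pool: {median_count}")
print(f"pools with zero incidents: {pools_at_zero}")
```

With these made-up counts the mean rounds to 6.23 while the median is 0: most pools in the sketch are clean, and a handful account for everything.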
OK, so the second question that you guys should be asking yourselves to spot bad numbers is: Can I see myself in the data? This question is also about averages in a way, because part of the reason why people are so frustrated with these national statistics, is they don't really tell the story of who's winning and who's losing from national policy. It's easy to understand why people are frustrated with global averages when they don't match up with their personal experiences. I wanted to show people the way data relates to their everyday lives. I started this advice column called "Dear Mona," where people would write to me with questions and concerns and I'd try to answer them with data. People asked me anything, questions like, "Is it normal to sleep in a separate bed to my wife?" "Do people regret their tattoos?" "What does it mean to die of natural causes?"
U redu, drugo pitanje koje treba da postavite sebi da biste uočili loše brojeve je da li vidite sebe u podacima. Ovo pitanje se na neki način odnosi i na prosečne vrednosti, jer je deo razloga zašto ljude toliko frustriraju ovi nacionalni statistički podaci to što ne iznose priču o tome ko pobeđuje a ko je na gubitku usled državne politike. Lako je razumeti zašto ljude frustriraju globalne prosečne vrednosti kada se ne poklapaju sa njihovim ličnim iskustvom. Htela sam da pokažem ljudima kako podaci imaju veze sa njihovim svakodnevnim životom. Pokrenula sam rubriku za savete pod nazivom „Draga Mona“, gde bi mi ljudi pisali i iznosili svoja pitanja i probleme, a ja bih pokušala da im odgovorim pomoću podataka. Ljudi su me svašta pitali, na primer: „Da li je normalno da spavam u odvojenom krevetu od svoje žene?“ „Da li se ljudi kaju zbog svojih tetovaža?“ „Šta znači umreti prirodnom smrću?“
All of these questions are great, because they make you think about ways to find and communicate these numbers. If someone asks you, "How much pee is a lot of pee?" which is a question that I got asked, you really want to make sure that the visualization makes sense to as many people as possible. These numbers aren't unavailable. Sometimes they're just buried in the appendix of an academic study. And they're certainly not inscrutable; if you really wanted to test these numbers on urination volume, you could grab a bottle and try it for yourself.
Sva ta pitanja su sjajna, jer vas teraju da razmislite o tome kako da saznate i saopštite ove brojeve. Ako vas neko pita: „Koliko piškenja je mnogo?“, što je pitanje koje sam ja dobila, želite da se postarate da vizualizacija ima smisla za što je više ljudi moguće. Ovi brojevi nisu nedostupni. Ponekad su samo zakopani u prilogu akademske studije. A svakako nisu nedokučivi; ako zaista želite da proverite ove brojeve o količini mokrenja, možete uzeti bočicu i pokušati sami.
(Laughter)
(Smeh)
The point of this isn't necessarily that every single data set has to relate specifically to you. I'm interested in how many women were issued fines in France for wearing the face veil, or the niqab, even if I don't live in France or wear the face veil. The point of asking where you fit in is to get as much context as possible. So it's about zooming out from one data point, like the unemployment rate is five percent, and seeing how it changes over time, or seeing how it changes by educational status -- this is why your parents always wanted you to go to college -- or seeing how it varies by gender. Nowadays, the male unemployment rate is higher than the female unemployment rate. Up until the early '80s, it was the other way around. This is a story of one of the biggest changes that's happened in American society, and it's all there in that chart, once you look beyond the averages. The axes are everything; once you change the scale, you can change the story.
Suština ovoga nije nužno da svaki skup podataka mora da se izričito odnosi na vas. Mene zanima koliko žena je dobilo novčanu kaznu u Francuskoj za nošenje vela na licu, ili nikaba, čak iako ne živim u Francuskoj niti nosim veo preko lica. Poenta postavljanja pitanja gde se vi uklapate je da dobijete što je više konteksta moguće. Dakle, radi se o tome da umanjite sliku sa jednog podatka, na primer, da je stopa nezaposlenosti 5%, i vidite kako se menja tokom vremena, ili da vidite kako se menja s obzirom na status obrazovanja - zato su roditelji uvek želeli da idete na fakultet - ili da vidite kako varira s obzirom na pol. Danas je stopa nezaposlenosti muškaraca viša nego stopa nezaposlenosti žena. Do ranih '80-ih godina, bilo je obrnuto. Ovo je priča o jednoj od najvećih promena koja se dogodila u američkom društvu, i sve je na tom grafikonu, kada sagledate stvari izvan proseka. Sve je u osama; kada promenite nivo sagledavanja, možete promeniti priču.
OK, so the third and final question that I want you guys to think about when you're looking at statistics is: How was the data collected? So far, I've only talked about the way data is communicated, but the way it's collected matters just as much. I know this is tough, because methodologies can be opaque and actually kind of boring, but there are some simple steps you can take to check this.
U redu, treće i poslednje pitanje o kojem želim da razmišljate kada posmatrate statističke podatke je kako su podaci prikupljeni. Do sada sam govorila samo o načinu na koji se podaci saopštavaju, ali način njihovog prikupljanja podjednako je bitan. Znam da je ovo teško, jer metodologija može biti nejasna i nekako dosadna, ali postoje jednostavni koraci pomoću kojih možete ovo proveriti.
I'll use one last example here. One poll found that 41 percent of Muslims in this country support jihad, which is obviously pretty scary, and it was reported everywhere in 2015. When I want to check a number like that, I'll start off by finding the original questionnaire. It turns out that journalists who reported on that statistic ignored a question lower down on the survey that asked respondents how they defined "jihad." And most of them defined it as, "Muslims' personal, peaceful struggle to be more religious." Only 16 percent defined it as, "violent holy war against unbelievers." This is the really important point: based on those numbers, it's totally possible that no one in the survey who defined it as violent holy war also said they support it. Those two groups might not overlap at all.
Ovde ću upotrebiti jedan poslednji primer. Jedna anketa je otkrila da 41 odsto muslimana u ovoj zemlji podržava džihad, što je očigledno prilično zastrašujuće i o tome se izveštavalo svuda 2015. godine. Kada hoću da proverim takvu brojku, počeću pronalaženjem originalnog upitnika. Ispostavilo se da su novinari koji su izveštavali o tom podatku zanemarili pitanje nešto niže na anketi koje je pitalo ispitanike kako definišu „džihad“, a većina njih ga je definisala kao „ličnu, mirnu borbu muslimana da budu religiozniji“. Samo 16 procenata ga je definisalo kao „nasilan sveti rat protiv nevernika“. To je zaista bitan deo; na osnovu tih brojeva, sasvim je moguće da niko ko ga je u istraživanju definisao kao nasilni sveti rat nije rekao i da ga podržava. Te dve grupe se možda uopšte ne preklapaju.
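The claim that the two groups might not overlap at all can be checked with simple bounds. The sketch below assumes, for illustration, that both percentages refer to the same set of respondents; the function name `overlap_bounds` is hypothetical and not taken from the survey or the talk.

```python
def overlap_bounds(pct_a: int, pct_b: int) -> tuple[int, int]:
    """Smallest and largest possible overlap, in percentage points,
    between two groups that are pct_a% and pct_b% of the same sample."""
    # The groups are forced to overlap only if together they exceed 100%.
    lowest = max(0, pct_a + pct_b - 100)
    # The overlap can never be larger than the smaller group.
    highest = min(pct_a, pct_b)
    return lowest, highest


supports_jihad = 41      # percent who said they support jihad
defines_as_violent = 16  # percent who defined jihad as violent holy war

lowest, highest = overlap_bounds(supports_jihad, defines_as_violent)
print(f"Overlap could be anywhere from {lowest}% to {highest}% of respondents")
```

Because 41 and 16 add up to less than 100, the lower bound is zero: the headline number alone cannot tell you whether anyone who meant violent holy war also said they support it.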
It's also worth asking how the survey was carried out. This was something called an opt-in poll, which means anyone could have found it on the internet and completed it. There's no way of knowing if those people even identified as Muslim. And finally, there were 600 respondents in that poll. There are roughly three million Muslims in this country, according to Pew Research Center. That means the poll spoke to roughly one in every 5,000 Muslims in this country.
Takođe, vredi pitati kako je istraživanje sprovedeno. Ovo je bilo nešto što se zove opciona anketa, što znači da je bilo ko mogao da je nađe na internetu i popuni je. Nema načina da se sazna da li se ti ljudi uopšte identifikuju kao muslimani. Naposletku, u toj anketi je bilo 600 ispitanika. U ovoj zemlji ima približno tri miliona muslimana, prema Centru za istraživanje Pju. To znači da se anketa obraćala otprilike jednom od svakih 5 000 muslimana u ovoj zemlji.
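The "one in every 5,000" figure is plain division, sketched here in Python using only the two numbers quoted in the talk. The variable names are illustrative.

```python
muslim_population = 3_000_000  # rough estimate attributed to Pew Research Center in the talk
respondents = 600              # number of respondents in the poll, as stated in the talk

# How many people in the population each respondent stands in for.
people_per_respondent = muslim_population // respondents

# The same relationship expressed as a share of the population.
share_surveyed = respondents / muslim_population

print(f"1 respondent per {people_per_respondent:,} people")
print(f"share of population surveyed: {share_surveyed:.4%}")
```

This reproduces the ratio only. It does not address the separate problem raised above, that an opt-in poll cannot confirm who its respondents were.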
This is one of the reasons why government statistics are often better than private statistics. A poll might speak to a couple hundred people, maybe a thousand, or if you're L'Oreal, trying to sell skin care products in 2005, then you spoke to 48 women to claim that they work.
To je jedan od razloga zašto su vladini statistički podaci često bolji od privatnih. Anketa se može obratiti par stotina ljudi, možda hiljadu, ili ako ste Loreal i pokušavate da prodate proizvode za negu kože 2005. godine, onda ste razgovarali sa 48 žena da biste tvrdili da deluju.
(Laughter)
(Smeh)
Private companies don't have a huge interest in getting the numbers right, they just need the right numbers. Government statisticians aren't like that. In theory, at least, they're totally impartial, not least because most of them do their jobs regardless of who's in power. They're civil servants. And to do their jobs properly, they don't just speak to a couple hundred people. Those unemployment numbers I keep on referencing come from the Bureau of Labor Statistics, and to make their estimates, they speak to over 140,000 businesses in this country.
Privatne kompanije nemaju veliki interes da dobiju ispravne brojeve, već su im samo potrebni odgovarajući brojevi. Vladini statističari nisu takvi. Makar u teoriji, sasvim su nepristrasni, pre svega zato što većina njih obavlja svoj posao bez obzira na to ko je na vlasti. Oni su državni službenici. A da bi valjano radili svoj posao, ne govore samo sa par stotina ljudi. Oni brojevi vezani za nezaposlenost na koje se uporno pozivam su iz odeljenja za statistiku Ministarstva za rad, a da bi izvršili svoje procene, oni se obraćaju preko 140 000 firmi u ovoj zemlji.
I get it, it's frustrating. If you want to test a statistic that comes from a private company, you can buy the face cream for you and a bunch of friends, test it out, if it doesn't work, you can say the numbers were wrong. But how do you question government statistics? You just keep checking everything. Find out how they collected the numbers. Find out if you're seeing everything on the chart you need to see. But don't give up on the numbers altogether, because if you do, we'll be making public policy decisions in the dark, using nothing but private interests to guide us.
Kapiram, to frustrira. Ako želite da proverite podatke koji dolaze iz privatne kompanije, možete da kupite kremu za lice za sebe i gomilu prijatelja, isprobate, a ako ne deluje, možete reći da su brojevi bili pogrešni. Međutim, kako da preispitate vladine podatke? Samo uporno sve proveravajte. Saznajte kako su prikupili brojeve. Otkrijte da li na grafikonu vidite sve što treba da vidite. Ali, ne odustajte sasvim od brojeva, jer ako odustanete, donosićemo odluke o javnoj politici u neznanju, isključivo koristeći lične interese kao smernice.
Thank you.
Hvala.
(Applause)
(Aplauz)