Cathy O'Neil: The era of blind faith in big data must end

Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview, or they pay more for insurance. We're being scored with secret formulas that we don't understand and that often have no system of appeal. That raises the question: What if the algorithms are wrong?

To build an algorithm you need two things: data, meaning what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by having it look at the past and figure out what is associated with success. What situation leads to success?

Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food.

(Laughter)

My definition of success is: a meal is successful if my kids eat vegetables. It's very different from if my youngest son were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms.

Algorithms are opinions embedded in code. That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.

This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores, and published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it. Then someone really smart got involved, Gary Rubinstein. He found 665 teachers in that New York Post data who actually had two scores. That could happen if they were teaching both seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used for individual assessment. It's almost a random number generator.
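
Rubinstein's plot asks a simple question. If a score measures something stable about a teacher, then the two scores the same teacher received should agree. One way to put a number on that agreement is the Pearson correlation. This is a minimal sketch under that assumption, not his actual analysis. The function name `pearson` and the simulated scores are hypothetical, and no real data is used.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either list has no variation at all.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical: 665 teachers whose two scores are drawn independently at random,
# i.e. what a pure random number generator would produce.
rng = random.Random(0)
seventh = [rng.uniform(0, 100) for _ in range(665)]
eighth = [rng.uniform(0, 100) for _ in range(665)]
print(round(pearson(seventh, eighth), 3))
```

A value near 1 means the two scores agree. Independent random scores land near 0. A score that barely agrees with itself for the same teacher is a weak basis for judging that teacher.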

(Applause)

But it was. This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, and can even have deeply destructive effects despite good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.

This is Roger Ailes.

(Laughter)

He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted. That raises the question: What should Fox News do to turn over a new leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data: what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News? I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to learn what led to success: what kind of applications historically led to success by that definition. Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like people who were successful in the past.
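
The thought experiment can be made concrete in a few lines. This is a deliberately simplified sketch, not a real hiring system. It assumes each past application is a record with hypothetical fields `gender`, `years`, and `promotions`, and it uses the talk's definition of success: four years and at least one promotion. Here "training" just means computing the historical success rate for each value of one feature. The applicant names are invented. Real models rarely see gender directly, but they can pick it up through correlated features instead.

```python
from collections import defaultdict

def is_success(employee):
    # The talk's hypothetical definition: stayed 4+ years, promoted at least once.
    return employee["years"] >= 4 and employee["promotions"] >= 1

def train(history, feature):
    # "Training": the historical success rate for each value of one feature.
    wins, totals = defaultdict(int), defaultdict(int)
    for person in history:
        value = person[feature]
        totals[value] += 1
        wins[value] += is_success(person)
    return {value: wins[value] / totals[value] for value in totals}

def rank(model, applicants, feature):
    # Rank current applicants by how "successful" people like them were in the past.
    return sorted(applicants, key=lambda a: model.get(a[feature], 0.0), reverse=True)

# Hypothetical history in which women left before meeting the success definition.
history = [
    {"gender": "M", "years": 6, "promotions": 2},
    {"gender": "M", "years": 4, "promotions": 1},
    {"gender": "F", "years": 2, "promotions": 0},
    {"gender": "F", "years": 3, "promotions": 1},
]
model = train(history, "gender")
applicants = [{"name": "Ana", "gender": "F"}, {"name": "Bob", "gender": "M"}]
print(model)
print([a["name"] for a in rank(model, applicants, "gender")])
```

The code never mentions sexism. It only learns that, historically, people who looked like Bob met the definition of success, and then it ranks Ana below him.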

Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society -- racially segregated, all towns, all neighborhoods -- where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we hired data scientists and paid them to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model was, and they'd be right.
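
This thought experiment describes a feedback loop, and a toy simulation shows why the model looks accurate. All of the assumptions are hypothetical. There are two neighborhoods with the same true crime rate. Arrests are recorded only where police are sent. A "hotspot" model sends every patrol to the neighborhood with the most recorded arrests so far. The function `simulate` and its parameters are illustrative names, not any real predictive-policing product.

```python
def simulate(initial_arrests, crime_rate, rounds, patrols_per_round):
    # initial_arrests: recorded arrests per neighborhood before the model is used.
    # crime_rate: offenses found per patrol in each neighborhood (hypothetical).
    arrests = list(initial_arrests)
    for _ in range(rounds):
        # The model predicts the hotspot from past arrest data alone.
        hotspot = max(range(len(arrests)), key=lambda i: arrests[i])
        # Crime is only recorded where police are looking.
        arrests[hotspot] += patrols_per_round * crime_rate[hotspot]
    return arrests

# Equal true crime rates; neighborhood 0 starts with one extra recorded arrest.
result = simulate(initial_arrests=[11, 10], crime_rate=[0.1, 0.1],
                  rounds=10, patrols_per_round=100)
print(result)
```

Every prediction comes true in the recorded data, so the model scores as accurate. Yet the gap in arrests comes entirely from where the police looked, not from any difference in crime.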

Now, reality isn't that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do, in fact, predict individual criminality, the criminality of individuals. The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used by judges in Florida during sentencing. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

What's going on? Data laundering. It's a process by which technologists hide ugly truths inside black box algorithms and call them objective, call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction."

(Laughter)

(Applause)

They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police were built by private companies and sold to government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable. Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't. There's a lot of money to be made in unfairness.

Also, we're not rational economic agents. We are all biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated it with experiments in which they send out a bunch of job applications, equally qualified, but some with white-sounding names and some with black-sounding names. The results are always disappointing. Always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting the data that's actually picking up on past practices and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can't. We have to check them. We have to check them for fairness.

The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.

First, data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What is that bias looking like in other crime categories, and how do we account for it?
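
One part of a data integrity check is to compare how often each group is arrested with how often it actually engages in the behavior. This sketch assumes that, for each group, you have an estimate of how many people use the drug (for example, from surveys) and how many were arrested. The group labels and counts below are hypothetical placeholders, not real statistics.

```python
def arrest_rates(users, arrests):
    # Arrests per person who engages in the behavior, for each group.
    return {group: arrests[group] / users[group] for group in users}

def disparity(rates, group, reference):
    # How many times more likely `group` is to be arrested than `reference`
    # for the same behavior.
    return rates[group] / rates[reference]

# Hypothetical counts: equal usage, unequal arrests.
users = {"group_a": 1000, "group_b": 1000}
arrests = {"group_a": 10, "group_b": 45}
rates = arrest_rates(users, arrests)
print(round(disparity(rates, "group_b", "group_a"), 2))
```

If usage rates are equal but arrest rates are not, arrest records measure policing as much as behavior. Any model trained on them inherits that difference unless it is accounted for.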

Second, we should think about the definition of success and audit that. Remember the hiring algorithm we talked about? Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee who is supported by the company's culture. That said, it can also be quite biased. We need to separate those two things. We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is that the people who are listening have decided what's important and what's not, and they're not getting distracted by what's not important. When blind orchestra auditions started, the number of women in orchestras went up by a factor of five.

Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?
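
Asking "for whom does this model fail?" means breaking errors down by group instead of reporting one overall accuracy number. This is a minimal sketch. It assumes each hypothetical record holds a group label, whether the model flagged the person as high risk, and whether they actually reoffended. It then computes the false positive rate for each group: the share of people who did not reoffend but were still flagged as high risk.

```python
from collections import defaultdict

def false_positive_rates(records):
    # records: iterable of (group, predicted_high_risk, reoffended) tuples.
    flagged, negatives = defaultdict(int), defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            flagged[group] += predicted_high_risk
    return {group: flagged[group] / negatives[group] for group in negatives}

# Hypothetical records for illustration only.
records = [
    ("group_a", True, False), ("group_a", True, False),
    ("group_a", False, False), ("group_a", False, False),
    ("group_b", True, False), ("group_b", False, False),
    ("group_b", False, False), ("group_b", False, False),
    ("group_b", False, True),
]
print(false_positive_rates(records))
```

Two groups can see similar overall accuracy while one of them absorbs far more false alarms. The cost of those errors, such as a longer sentence, then falls on that group.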

And finally, we have to consider the long-term effects of algorithms, the feedback loops they engender. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society.

(Applause)

And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability for our algorithmic overlords.

(Applause)

The era of blind faith in big data must end.

Thank you very much.

(Applause)
