So I've been an AI researcher for over a decade. And a couple of months ago, I got the weirdest email of my career. A random stranger wrote to me saying that my work in AI is going to end humanity. Now I get it, AI, it's so hot right now.
(Laughter)
It's in the headlines pretty much every day, sometimes because of really cool things like discovering new molecules for medicine or that dope Pope in the white puffer coat. But other times the headlines have been really dark, like that chatbot telling that guy that he should divorce his wife, or that AI meal-planner app proposing a crowd-pleasing recipe featuring chlorine gas. And in the background, we've heard a lot of talk about doomsday scenarios, existential risk and the singularity, with letters being written and events being organized to make sure that doesn't happen.
Now, I'm a researcher who studies AI's impacts on society, and I don't know what's going to happen in 10 or 20 years, and nobody really does. But what I do know is that there are some pretty nasty things going on right now, because AI doesn't exist in a vacuum. It is part of society, and it has impacts on people and the planet.
AI models can contribute to climate change. Their training data uses art and books created by artists and authors without their consent. And their deployment can discriminate against entire communities. But we need to start tracking these impacts. We need to start being transparent, disclosing them, and creating tools so that people understand AI better, so that hopefully future generations of AI models are going to be more trustworthy, sustainable, maybe less likely to kill us, if that's what you're into.
But let's start with sustainability, because that cloud that AI models live on is actually made out of metal and plastic, and powered by vast amounts of energy. And each time you query an AI model, it comes with a cost to the planet. Last year, I was part of the BigScience initiative, which brought together a thousand researchers from all over the world to create Bloom, the first open large language model, like ChatGPT, but with an emphasis on ethics, transparency and consent. And the study I led that looked at Bloom's environmental impacts found that just training it used as much energy as 30 homes use in a whole year and emitted 25 tons of carbon dioxide, which is like driving your car five times around the planet just so somebody can use this model to tell a knock-knock joke. And this might not seem like a lot, but other similar large language models, like GPT-3, emit 20 times more carbon. But the thing is, tech companies aren't measuring this stuff. They're not disclosing it. And so this is probably only the tip of the iceberg, even if it is a melting one.
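To put those numbers in perspective, here's a rough back-of-the-envelope check of that driving analogy. The per-kilometer car emissions figure and Earth's circumference below are my own assumed round numbers, not figures from the study:

```python
# Sanity check of the "five times around the planet" analogy.
# Assumed round numbers (not from the study): an average gasoline car
# emits about 0.12 kg of CO2 per km; Earth's circumference is ~40,075 km.
training_emissions_kg = 25_000      # 25 metric tons of CO2, per the study
car_emissions_kg_per_km = 0.12      # assumption
earth_circumference_km = 40_075     # equatorial circumference

equivalent_km = training_emissions_kg / car_emissions_kg_per_km
laps = equivalent_km / earth_circumference_km
print(f"{equivalent_km:,.0f} km of driving, about {laps:.1f} laps around Earth")
# -> 208,333 km of driving, about 5.2 laps around Earth
```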
And in recent years we've seen AI models balloon in size, because the current trend in AI is "bigger is better." But please don't get me started on why that's the case. In any case, we've seen large language models in particular grow 2,000 times in size over the last five years. And of course, their environmental costs are rising as well. The most recent work I led found that switching out a smaller, more efficient model for a larger language model emits 14 times more carbon for the same task. Like telling that knock-knock joke. And as we're putting these models into cell phones and search engines and smart fridges and speakers, the environmental costs are really piling up quickly. So instead of focusing on some future existential risks, let's talk about current, tangible impacts and the tools we can create to measure and mitigate these impacts.
I helped create CodeCarbon, a tool that runs in parallel to AI training code and estimates the amount of energy it consumes and the amount of carbon it emits. And using a tool like this can help us make informed choices, like choosing one model over another because it's more sustainable, or deploying AI models on renewable energy, which can drastically reduce their emissions.
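CodeCarbon is an open-source Python package, and the basic usage pattern looks something like the sketch below. The training function is a placeholder, and exact parameters can vary between versions:

```python
# pip install codecarbon
from codecarbon import EmissionsTracker

def train_model():
    # Placeholder for the actual training loop.
    ...

# The tracker runs alongside the code, estimates the energy drawn by
# CPU, GPU and RAM, and converts it to CO2-equivalent emissions using
# the carbon intensity of the local power grid.
tracker = EmissionsTracker(project_name="my-training-run")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent
    print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```

A per-run estimate like this is what makes those informed choices possible: you can run the same task on a small and a large model, or in a coal-heavy and a renewable-powered region, and compare the numbers.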
But let's talk about other things, because there are other impacts of AI apart from sustainability. For example, it's been really hard for artists and authors to prove that their life's work has been used for training AI models without their consent. And if you want to sue someone, you tend to need proof, right? So Spawning.ai, an organization that was founded by artists, created this really cool tool called “Have I Been Trained?” And it lets you search these massive data sets to see what they have on you. Now, I admit it, I was curious. I searched LAION-5B, which is this huge data set of images and text, to see if any images of me were in there. Now, those first two images, that's me from events I've spoken at. But the rest of the images, none of those are me. They're probably of other women named Sasha who put photographs of themselves up on the internet. And this can probably explain why, when I query an image generation model to generate a photograph of a woman named Sasha, more often than not I get images of bikini models. Sometimes they have two arms, sometimes they have three arms, but they rarely have any clothes on. And while it can be interesting for people like you and me to search these data sets, for artists like Karla Ortiz, this provides crucial evidence that her life's work, her artwork, was used for training AI models without her consent. And she and two other artists used this as evidence to file a class-action lawsuit against AI companies for copyright infringement. And most recently --
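“Have I Been Trained?” itself is a web tool, but the underlying idea, scanning a data set's captions for mentions of you, can be sketched with the Hugging Face datasets library. This is an illustrative sketch only: the dataset ID and the column names are assumptions, and the real tool uses proper image and text indexing rather than a linear scan:

```python
from datasets import load_dataset

# Stream the metadata instead of downloading it; LAION-scale data sets
# are far too large to hold locally. The dataset ID and the "TEXT" /
# "URL" column names are illustrative assumptions.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

query = "sasha"
matches = []
for i, row in enumerate(ds):
    caption = (row.get("TEXT") or "").lower()
    if query in caption:
        matches.append((row["TEXT"], row.get("URL")))
    if len(matches) >= 20 or i >= 100_000:  # cap the scan for this demo
        break

for caption, url in matches:
    print(caption, "->", url)
```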
(Applause)
And most recently, Spawning.ai partnered up with Hugging Face, the company where I work, to create opt-in and opt-out mechanisms for creating these data sets. Because artwork created by humans shouldn't be an all-you-can-eat buffet for training AI language models.
(Applause)
The very last thing I want to talk about is bias. You probably hear about this a lot. Formally speaking, it's when AI models encode patterns and beliefs that can represent stereotypes, racism or sexism. One of my heroes, Dr. Joy Buolamwini, experienced this firsthand when she realized that AI systems wouldn't even detect her face unless she was wearing a white-colored mask. Digging deeper, she found that common facial recognition systems were vastly worse for women of color compared to white men. And when biased models like this are deployed in law enforcement settings, this can result in false accusations, even wrongful imprisonment, which we've seen happen to multiple people in recent months. For example, Porcha Woodruff was wrongfully accused of carjacking at eight months pregnant because an AI system wrongfully identified her.
But sadly, these systems are black boxes, and even their creators can't say exactly why they work the way they do. For example, if image generation systems are used in contexts like generating a forensic sketch based on a description of a perpetrator, they take all those biases and they spit them back out for terms like "dangerous criminal," "terrorist" or "gang member," which of course is super dangerous when these tools are deployed in society.
And so in order to understand these tools better, I created this tool called the Stable Bias Explorer, which lets you explore the bias of image generation models through the lens of professions. So try to picture a scientist in your mind. Don't look at me. What do you see? A lot of the same thing, right? Men in glasses and lab coats. And none of them look like me. And the thing is, we looked at all these different image generation models and found a lot of the same thing: significant overrepresentation of whiteness and masculinity across all 150 professions that we looked at, even compared to the real world, as reflected in US Bureau of Labor Statistics data. These models show lawyers as men and CEOs as men almost 100 percent of the time, even though we all know not all of them are white and male.
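This isn't the Stable Bias Explorer's actual code, but the core probing idea, prompting a model with deliberately neutral profession terms and seeing who shows up, can be sketched with the diffusers library; the model checkpoint and prompt template here are illustrative assumptions:

```python
# pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

# Illustrative model choice; any text-to-image checkpoint would do.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

professions = ["scientist", "lawyer", "CEO", "nurse", "teacher"]
for profession in professions:
    # A deliberately neutral prompt: no gender or ethnicity specified,
    # so whatever defaults the model falls back on are its own biases.
    prompt = f"a photo of a {profession}"
    images = pipe(prompt, num_images_per_prompt=4).images
    for i, img in enumerate(images):
        img.save(f"{profession}_{i}.png")
```

Inspecting grids of images like these, across many models and many professions, is essentially what the tool does at scale.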
And sadly, my tool hasn't been used to write legislation yet. But I recently presented it at a UN event about gender bias as an example of how we can make tools for people from all walks of life, even those who don't know how to code, to engage with and better understand AI. We used professions, but you can use any terms that are of interest to you.
And these models are being deployed and woven into the very fabric of our societies: our cell phones, our social media feeds, even our justice systems and our economies have AI in them. And it's really important that AI stays accessible, so that we know both how it works and when it doesn't work. And there's no single solution for really complex things like bias or copyright or climate change. But by creating tools to measure AI's impact, we can start getting an idea of how bad these impacts are and start addressing them as we go. Start creating guardrails to protect society and the planet. And once we have this information, companies can use it in order to say, OK, we're going to choose this model because it's more sustainable, and this model because it respects copyright. Legislators who really need information to write laws can use these tools to develop new regulation mechanisms or governance for AI as it gets deployed into society. And users like you and me can use this information to choose AI models that we can trust not to misrepresent us and not to misuse our data.
But what did I reply to that email that said my work is going to destroy humanity? I said that focusing on AI's future existential risks is a distraction from its current, very tangible impacts and from the work we should be doing right now, or even yesterday, to reduce these impacts. Because yes, AI is moving quickly, but it's not a done deal. We're building the road as we walk it, and we can collectively decide what direction we want to go in together.
Thank you.
(Applause)