So I've been an AI researcher for over a decade. And a couple of months ago, I got the weirdest email of my career. A random stranger wrote to me saying that my work in AI is going to end humanity. Now I get it, AI, it's so hot right now.
(Laughter)
It's in the headlines pretty much every day, sometimes because of really cool things like discovering new molecules for medicine or that dope Pope in the white puffer coat. But other times the headlines have been really dark, like that chatbot telling that guy that he should divorce his wife, or that AI meal planner app proposing a crowd-pleasing recipe featuring chlorine gas. And in the background, we've heard a lot of talk about doomsday scenarios, existential risk and the singularity, with letters being written and events being organized to make sure that doesn't happen.
Now I'm a researcher who studies AI's impacts on society, and I don't know what's going to happen in 10 or 20 years, and nobody really does. But what I do know is that there are some pretty nasty things going on right now, because AI doesn't exist in a vacuum. It is part of society, and it has impacts on people and the planet.
AI models can contribute to climate change. Their training data uses art and books created by artists and authors without their consent. And their deployment can discriminate against entire communities. But we need to start tracking AI's impacts. We need to start being transparent and disclosing them and creating tools so that people understand AI better, so that hopefully future generations of AI models are going to be more trustworthy, sustainable, maybe less likely to kill us, if that's what you're into.
But let's start with sustainability, because that cloud that AI models live on is actually made out of metal and plastic, and powered by vast amounts of energy. And each time you query an AI model, it comes with a cost to the planet. Last year, I was part of the BigScience initiative, which brought together a thousand researchers from all over the world to create Bloom, the first open large language model, like ChatGPT, but with an emphasis on ethics, transparency and consent. And the study I led that looked at Bloom's environmental impacts found that just training it used as much energy as 30 homes use in a whole year and emitted 25 tons of carbon dioxide, which is like driving your car five times around the planet just so somebody can use this model to tell a knock-knock joke. And this might not seem like a lot, but other, similar large language models, like GPT-3, emit 20 times more carbon. But the thing is, tech companies aren't measuring this stuff. They're not disclosing it. And so this is probably only the tip of the iceberg, even if it is a melting one.
And in recent years we've seen AI models balloon in size, because the current trend in AI is "bigger is better." But please don't get me started on why that's the case. In any case, we've seen large language models in particular grow 2,000 times in size over the last five years. And of course, their environmental costs are rising as well. The most recent work I led found that switching out a smaller, more efficient model for a larger language model emits 14 times more carbon for the same task. Like telling that knock-knock joke. And as we're putting these models into cell phones and search engines and smart fridges and speakers, the environmental costs are really piling up quickly. So instead of focusing on some future existential risks, let's talk about current, tangible impacts and the tools we can create to measure and mitigate those impacts.
I helped create CodeCarbon, a tool that runs in parallel to AI training code and estimates the amount of energy it consumes and the amount of carbon it emits. And using a tool like this can help us make informed choices, like choosing one model over another because it's more sustainable, or deploying AI models on renewable energy, which can drastically reduce their emissions.
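To make that concrete, here is a minimal sketch of wrapping a training run with CodeCarbon's EmissionsTracker. The train_model() function is a hypothetical placeholder for whatever training loop you already have, and the numbers reported are estimates that depend on your hardware and local energy grid.

```python
# Minimal sketch: estimating the carbon footprint of a training run
# with CodeCarbon. train_model() is a hypothetical placeholder.
from codecarbon import EmissionsTracker

def train_model():
    # Placeholder for your actual training code.
    pass

tracker = EmissionsTracker(project_name="my-training-run")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # estimated emissions, in kg of CO2-equivalent

print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```

Comparing numbers like these across candidate models is exactly the kind of informed choice described above.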
But let's talk about other things, because there are other impacts of AI apart from sustainability. For example, it's been really hard for artists and authors to prove that their life's work has been used for training AI models without their consent. And if you want to sue someone, you tend to need proof, right? So Spawning.ai, an organization that was founded by artists, created this really cool tool called “Have I Been Trained?” And it lets you search these massive data sets to see what they have on you. Now, I admit it, I was curious. I searched LAION-5B, which is this huge data set of images and text, to see if any images of me were in there. Now, the first two images, that's me from events I've spoken at. But the rest of the images, none of those are me. They're probably of other women named Sasha who put photographs of themselves up on the internet. And this can probably explain why, when I query an image generation model to generate a photograph of a woman named Sasha, more often than not I get images of bikini models. Sometimes they have two arms, sometimes they have three arms, but they rarely have any clothes on. And while it can be interesting for people like you and me to search these data sets, for artists like Karla Ortiz, this provides crucial evidence that her life's work, her artwork, was used for training AI models without her consent, and she and two other artists used this as evidence to file a class-action lawsuit against AI companies for copyright infringement. And most recently --
(Applause)
And most recently, Spawning.ai partnered up with Hugging Face, the company where I work, to create opt-in and opt-out mechanisms for creating these data sets. Because artwork created by humans shouldn't be an all-you-can-eat buffet for training AI language models.
(Applause)
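“Have I Been Trained?” itself is a web tool, but the underlying idea, searching a massive image-text dataset for entries that mention you, can be sketched with the Hugging Face datasets library. A rough sketch, assuming a LAION-style dataset with TEXT and URL columns; the dataset id and column names here are assumptions, not Spawning.ai's API:

```python
# Illustrative sketch only: stream a LAION-style image-text dataset and
# look for captions mentioning a name. The dataset id and the TEXT/URL
# column names are assumptions; this is not Spawning.ai's actual API.
from datasets import load_dataset

dataset = load_dataset("laion/laion2B-en", split="train", streaming=True)

name = "sasha"
matches = []
for i, row in enumerate(dataset):
    caption = (row.get("TEXT") or "").lower()
    if name in caption:
        matches.append((row.get("TEXT"), row.get("URL")))
    if len(matches) >= 10 or i >= 100_000:  # stop after a small sample
        break

for caption, url in matches:
    print(caption, "->", url)
```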
The very last thing I want to talk about is bias. You probably hear about this a lot. Formally speaking, it's when AI models encode patterns and beliefs that can represent stereotypes, racism, and sexism. One of my heroes, Dr. Joy Buolamwini, experienced this firsthand when she realized that AI systems wouldn't even detect her face unless she was wearing a white mask. Digging deeper, she found that common facial recognition systems were vastly worse for women of color compared to white men. And when biased models like this are deployed in law enforcement settings, this can result in false accusations, even wrongful imprisonment, which we've seen happen to multiple people in recent months. For example, Porcha Woodruff was wrongfully accused of carjacking at eight months pregnant because an AI system misidentified her.
But sadly, these systems are black boxes, and even their creators can't say exactly why they work the way they do. Take image generation systems, for example: if they're used in contexts like generating a forensic sketch based on a description of a perpetrator, they take all those biases and spit them back out for terms like "dangerous criminal," "terrorist," or "gang member," which of course is super dangerous when these tools are deployed in society.
And so, in order to understand these tools better, I created this tool called the Stable Bias Explorer, which lets you explore the bias of image generation models through the lens of professions. So try to picture a scientist in your mind. Don't look at me. What do you see? A lot of the same thing, right? Men in glasses and lab coats. And none of them look like me. And the thing is, we looked at all these different image generation models and found a lot of the same thing: significant representation of whiteness and masculinity across all 150 professions that we looked at, even when compared to the real world, according to the US Bureau of Labor Statistics. These models show lawyers as men and CEOs as men almost 100 percent of the time, even though we all know not all of them are white and male.
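The Stable Bias Explorer is a hosted demo, but the probing idea behind it can be sketched: prompt an image generation model with profession terms and inspect who it depicts. A rough sketch with the diffusers library; the model id and prompt template are illustrative assumptions, not the tool's actual methodology:

```python
# Rough sketch of profession-based bias probing with an image generation
# model. The model id and prompt template are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

professions = ["scientist", "lawyer", "CEO", "nurse"]
for profession in professions:
    for seed in range(4):  # a few fixed seeds per profession, for comparability
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(f"a photo of a {profession}", generator=generator).images[0]
        image.save(f"{profession}_{seed}.png")

# Inspecting these grids by hand (or with a classifier) is how patterns
# like "lawyers are almost always rendered as men" become visible.
```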
And sadly, my tool hasn't been used to write legislation yet. But I recently presented it at a UN event about gender bias as an example of how we can make tools for people from all walks of life, even those who don't know how to code, to engage with and better understand AI. We used professions, but you can use any terms that are of interest to you.
And as these models are being deployed, they're being woven into the very fabric of our societies: our cell phones, our social media feeds, even our justice systems and our economies have AI in them. And it's really important that AI stays accessible, so that we know both how it works and when it doesn't work. And there's no single solution for really complex things like bias or copyright or climate change. But by creating tools to measure AI's impact, we can start getting an idea of how bad these impacts are and start addressing them as we go. Start creating guardrails to protect society and the planet. And once we have this information, companies can use it in order to say: OK, we're going to choose this model because it's more sustainable, this model because it respects copyright. Legislators, who really need information to write laws, can use these tools to develop new regulation mechanisms or governance for AI as it gets deployed into society. And users like you and me can use this information to choose AI models that we can trust not to misrepresent us and not to misuse our data.
But what did I reply to that email that said that my work is going to destroy humanity? I said that focusing on AI's future existential risks is a distraction from its current, very tangible impacts and the work we should be doing right now, or even yesterday, to reduce those impacts. Because yes, AI is moving quickly, but it's not a done deal. We're building the road as we walk it, and we can collectively decide what direction we want to go in, together.
Thank you.
(Applause)