What is consciousness? Can an artificial machine really think? Does the mind just consist of neurons in the brain, or is there some intangible spark at its core? For many, these have been vital considerations for the future of artificial intelligence. But British computer scientist Alan Turing decided to disregard all these questions in favor of a much simpler one: can a computer talk like a human?
This question led to an idea for measuring artificial intelligence that would famously come to be known as the Turing test. In his 1950 paper, "Computing Machinery and Intelligence," Turing proposed the following game. A human judge has a text conversation with unseen players and evaluates their responses. To pass the test, a computer must be able to replace one of the players without substantially changing the results. In other words, a computer would be considered intelligent if its conversation couldn't be easily distinguished from a human's.
Turing predicted that by the year 2000, machines with 100 megabytes of memory would be able to easily pass his test. But he may have jumped the gun. Even though today's computers have far more memory than that, few have succeeded, and those that have done well focused more on finding clever ways to fool judges than on using overwhelming computing power. Though it was never subjected to a real test, the first program with some claim to success was called ELIZA. With only a fairly short and simple script, it managed to mislead many people by mimicking a psychologist, encouraging them to talk more and reflecting their own questions back at them. Another early script, PARRY, took the opposite approach by imitating a paranoid schizophrenic who kept steering the conversation back to his own preprogrammed obsessions. Their success in fooling people highlighted one weakness of the test: humans regularly attribute intelligence to a whole range of things that are not actually intelligent. Nonetheless, annual competitions like the Loebner Prize have made the test more formal, with judges knowing ahead of time that some of their conversation partners are machines.
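ELIZA's trick of "reflecting their own questions back at them" can be illustrated with a few pattern-matching rules. The rules and phrasings below are hypothetical stand-ins, not ELIZA's actual script, but they show the core mechanism: match a template, swap first- and second-person words, and turn the user's statement into a question.

```python
import re

# Word swaps so a reflected phrase reads naturally ("my" -> "your", etc.)
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# Illustrative rules: (pattern to match, response template).
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(phrase):
    # Swap first/second-person words in the captured phrase.
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in phrase.split())

def respond(text):
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(reflect(m.group(1)))
    # Default: encourage the user to keep talking.
    return "Please, go on."

print(respond("I feel anxious about my exams"))
# -> Why do you feel anxious about your exams?
```

A script this short carries no understanding at all, which is exactly why its success says more about human judges than about machine intelligence.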
But while the quality has improved, many chatbot programmers have used strategies similar to ELIZA's and PARRY's. 1997's winner, Catherine, could carry on amazingly focused and intelligent conversations, but mostly if the judge wanted to talk about Bill Clinton. And the more recent winner Eugene Goostman was given the persona of a 13-year-old Ukrainian boy, so judges interpreted its non sequiturs and awkward grammar as language and culture barriers. Meanwhile, other programs like Cleverbot have taken a different approach by statistically analyzing huge databases of real conversations to determine the best responses. Some also store memories of previous conversations in order to improve over time. But while Cleverbot's individual responses can sound incredibly human, its lack of a consistent personality and inability to deal with brand new topics are a dead giveaway.
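The retrieval idea behind programs like Cleverbot can be sketched as follows. This is a minimal illustration under simplifying assumptions, not Cleverbot's actual data or algorithm: given a corpus of (utterance, reply) pairs, it returns the reply whose recorded utterance is most similar to the input, using bag-of-words cosine similarity.

```python
import math
import re
from collections import Counter

# Tiny illustrative corpus of (utterance, reply) pairs; a real system
# would draw on millions of logged conversations.
CORPUS = [
    ("hello there", "hi, how are you?"),
    ("what is your favorite movie", "I really like science fiction films."),
    ("do you like music", "Yes, I listen to jazz a lot."),
]

def bag(text):
    # Lowercase bag-of-words representation, punctuation stripped.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_reply(user_text):
    query = bag(user_text)
    # Pick the canned reply attached to the most similar past utterance.
    best = max(CORPUS, key=lambda pair: cosine(query, bag(pair[0])))
    return best[1]

print(retrieve_reply("do you like any music?"))
# -> Yes, I listen to jazz a lot.
```

Because each reply is chosen independently from different source conversations, the output sounds locally human but has no global consistency, which matches the "lack of a consistent personality" the transcript describes.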
Who in Turing's day could have predicted that today's computers would be able to pilot spacecraft, perform delicate surgeries, and solve massive equations, but still struggle with the most basic small talk? Human language turns out to be an amazingly complex phenomenon that can't be captured by even the largest dictionary. Chatbots can be baffled by simple pauses, like "umm..." or questions with no correct answer. And a simple conversational sentence, like, "I took the juice out of the fridge and gave it to him, but forgot to check the date," requires a wealth of underlying knowledge and intuition to parse. It turns out that simulating a human conversation takes more than just increasing memory and processing power, and as we get closer to Turing's goal, we may have to deal with all those big questions about consciousness after all.