Ioannis Papachimonas: How computers translate human language

How is it that so many intergalactic species in movies and TV just happen to speak perfect English? The short answer is that no one wants to watch a starship crew spend years compiling an alien dictionary. But to keep things consistent, the creators of Star Trek and other science-fiction worlds have introduced the concept of a universal translator, a portable device that can instantly translate between any languages. So is a universal translator possible in real life?

We already have many programs that claim to do just that, taking a word, sentence, or entire book in one language and translating it into almost any other, whether it's modern English or Ancient Sanskrit. And if translation were just a matter of looking up words in a dictionary, these programs would run circles around humans. The reality, however, is a bit more complicated.

A rule-based translation program uses a lexical database, which includes all the words you'd find in a dictionary and all the grammatical forms they can take, and a set of rules to recognize the basic linguistic elements in the input language. For a seemingly simple sentence like, "The children eat the muffins," the program first parses its syntax, or grammatical structure, by identifying "the children" as the subject, and the rest of the sentence as the predicate consisting of a verb, "eat," and a direct object, "the muffins." It then needs to recognize English morphology, or how the language can be broken down into its smallest meaningful units, such as the word "muffin" and the suffix "-s," used to indicate plural. Finally, it needs to understand the semantics, what the different parts of the sentence actually mean. To translate this sentence properly, the program would refer to a different set of vocabulary and rules for each element of the target language.

But this is where it gets tricky. The syntax of some languages allows words to be arranged in any order, while in others, doing so could make the muffin eat the child.
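The rule-based steps described above (parse the syntax, analyze the morphology, then look each element up in target-language vocabulary and rules) can be sketched in a few lines of Python. This is only a toy illustration, not how any real translator is built: the three-entry `LEXICON`, the single sentence pattern "the NOUN VERB the NOUN", the masculine-only `ARTICLES` table, and all function names are assumptions made up for this example, with Italian as the target language.

```python
# Toy lexical database (hypothetical): English lemma -> Italian forms.
LEXICON = {
    "child": {"sg": "bambino", "pl": "bambini"},
    "muffin": {"sg": "muffin", "pl": "muffin"},  # invariable in Italian
    "eat": {"3sg": "mangia", "3pl": "mangiano"},
}
IRREGULAR_PLURALS = {"children": "child"}
ARTICLES = {"sg": "il", "pl": "i"}  # assumption: masculine nouns only


def analyze_noun(word):
    """Morphology: split a noun into (lemma, number)."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word], "pl"
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], "pl"  # stem + plural suffix "-s"
    return word, "sg"


def analyze_verb(word):
    """Morphology: strip the third-person "-s" to recover the lemma."""
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1]
    return word


def parse(sentence):
    """Syntax: accept only the pattern 'the NOUN VERB the NOUN'."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 5 or words[0] != "the" or words[3] != "the":
        raise ValueError("unsupported sentence pattern")
    return {
        "subject": analyze_noun(words[1]),
        "verb": analyze_verb(words[2]),
        "object": analyze_noun(words[4]),
    }


def translate(sentence):
    """Transfer: rebuild the sentence with target vocabulary and agreement."""
    tree = parse(sentence)
    subj_lemma, subj_num = tree["subject"]
    obj_lemma, obj_num = tree["object"]
    parts = [
        ARTICLES[subj_num],
        LEXICON[subj_lemma][subj_num],
        LEXICON[tree["verb"]]["3" + subj_num],  # verb agrees with subject
        ARTICLES[obj_num],
        LEXICON[obj_lemma][obj_num],
    ]
    return " ".join(parts).capitalize() + "."


print(translate("The children eat the muffins."))  # I bambini mangiano i muffin.
print(translate("The muffins eat the children."))  # I muffin mangiano i bambini.
```

The second call shows why word order matters: because this sketch decides subject and object purely by position, rearranging the words makes the muffins do the eating.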
Morphology can also pose a problem. Slovene distinguishes between two children and three or more using a dual suffix absent in many other languages, while Russian's lack of definite articles might leave you wondering whether the children are eating some particular muffins, or just eat muffins in general. Finally, even when the semantics are technically correct, the program might miss their finer points, such as whether the children "mangiano" the muffins, or "divorano" them.

Another method is statistical machine translation, which analyzes a database of books, articles, and documents that have already been translated by humans. By finding matches between source and translated text that are unlikely to occur by chance, the program can identify corresponding phrases and patterns, and use them for future translations. However, the quality of this type of translation depends on the size of the initial database and the availability of samples for certain languages or styles of writing.

The difficulty that computers have with the exceptions, irregularities, and shades of meaning that seem to come instinctively to humans has led some researchers to believe that our understanding of language is a unique product of our biological brain structure. In fact, one of the most famous fictional universal translators, the Babel fish from "The Hitchhiker's Guide to the Galaxy," is not a machine at all but a small creature that translates the brain waves and nerve signals of sentient species through a form of telepathy.

For now, learning a language the old-fashioned way will still give you better results than any currently available computer program. But this is no easy task, and the sheer number of languages in the world, as well as the increasing interaction between the people who speak them, will only continue to spur greater advances in automatic translation. Perhaps by the time we encounter intergalactic life forms, we'll be able to communicate with them through a tiny gizmo, or we might have to start compiling that dictionary, after all.
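As a rough illustration of the statistical approach mentioned earlier, the sketch below counts how often words co-occur in already-translated sentence pairs and picks, for each source word, the target word it is most strongly associated with. The four-sentence `PARALLEL_CORPUS` is invented for this example, and the Dice coefficient is used here as a deliberately simple stand-in for "matches unlikely to occur by chance"; real systems rely on far larger corpora and more elaborate models.

```python
from collections import Counter
from itertools import product

# Hypothetical, tiny English-Italian parallel corpus (human-translated pairs).
PARALLEL_CORPUS = [
    ("the children eat", "i bambini mangiano"),
    ("the children sleep", "i bambini dormono"),
    ("the dogs eat", "i cani mangiano"),
    ("the dogs sleep", "i cani dormono"),
]


def train(corpus):
    """Count in how many sentence pairs each word and word pair appears."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return src_count, tgt_count, pair_count


def best_match(word, model):
    """Return the target word with the highest Dice score, or None."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * pair_count[(word, t)] / (src_count[word] + tgt_count[t])
        for t in tgt_count
        if pair_count[(word, t)]  # skip words that never co-occur
    }
    return max(scores, key=scores.get) if scores else None


model = train(PARALLEL_CORPUS)
print(best_match("children", model))  # bambini
print(best_match("eat", model))       # mangiano
print(best_match("muffins", model))   # None: never seen in the corpus
```

The last call reflects the limitation noted in the talk: a word or style that is missing from the database simply cannot be translated, so quality depends on how much translated text is available.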
