The One Weird Trick: Revealed Here

Part III of a V-part series

Dec 01, 2019

See how digestible this newsletter is now? These tiny, bite-sized morsels are like potato chips, aren’t they?

The Revolution in Machine Translation

In November 2017, Google unveiled a new automatic language translation application. It did so discreetly and with little fanfare.

But the accomplishment is revolutionary.

Research into machine translation, inspired by Claude Shannon’s work in information theory, began in earnest in the 1950s. Early prototypes relied upon bilingual dictionaries and hand-coded rules, and the results were garbled. So were the later prototypes. Characteristic was an infamous 2013 fiasco involving the Turkish daily Yeni Şafak and the old version of Google Translate. Taking improvisational license with an e-mail from Noam Chomsky, Yeni Şafak invented a few Chomsky quotes, ran them through the original Google Translate, and proudly published the result:

This complexity in the Middle East, do you think the Western states flapping because of this chaos? Contrary to what happens when everything that milk port, enters the work order, then begins to bustle in the West. I’ve seen the plans works …

“Milk port,” a literal rendering of the Turkish süt liman (an idiom meaning “smooth sailing”), became Turkish shorthand for an amalgam of ludicrous machine translation and fake news.

Most Americans, when they think of machine translation, still imagine that kind of gobbledygook—or perhaps they remember the legendary (but apocryphal) translation of “The spirit is willing but the flesh is weak,” rendered, according to one version of the story, as “The vodka is strong but the steak is rotten.”

During the Cold War, the U.S. government took a keen interest in machine translation, for obvious reasons, and in 1964 it established the Automatic Language Processing Advisory Committee to evaluate progress in computational linguistics in general and machine translation in particular. The committee examined such specimens as this:

Thus, the examination of some from fundamental RADIOBIOLOGICESKIX problems shows, that in this a field still very much NEREWENNYX questions. This is clear, since cosmic RADIOBIOLOGI4 is very young RAZDELOM young science efforts of the scientific different specialties of the different countries of the world successful PRODOLJENY will be expanded there are. . . .

The committee’s conclusions were grim. It shared the assessment of R.T. Beyer, an American physicist known for his translations of Russian and German physics journals into English:

I must confess that the results were most unhappy. I found that I spent at least as much time in editing as if I had carried out the entire translation from the start. Even at that, I doubt if the edited translation reads as smoothly as one which I would have started from scratch. I drew the conclusion that the machine today translates from a foreign language to a form of broken English somewhat comparable to pidgin English. But it then remains for the reader to learn this patois in order to understand what the Russian actually wrote. Learning Russian would not be much more difficult. Someday, perhaps, the machines will make it, but I as a translator do not yet believe that I must throw my monkey wrench into the machinery in order to prevent my technological unemployment.

In 1966, the committee published a report deeming machine translation hopeless. It discouraged the Department of Defense and the CIA from further funding the task.

The idea that machine translation cannot work, that the subtleties of human language will forever be beyond the grasp of machines, appeals to human vanity, or to human humility; the Tower of Babel story comes to mind. But metaphysical and theological speculation aside, consider the facts. Here is an article from the front page of China’s People’s Daily, translated from the Mandarin:

Only by looking back at history and remembering the past can we profoundly understand that the red political power is hard-won, that the new China is hard-won, and socialism with Chinese characteristics is hard to come by.
The key to the eternal vitality of our party and its continual success from victory to victory lies in its ability to remember its original heart and keep in mind its mission.
Yudu, Jiangxi Province, is the starting point for the 25,000-mile long march of the Central Red Army. On May 20th, General Secretary Xi Jinping came here to pay tribute to the Central Red Army’s Long March Departure Monument. He cordially met with the descendants of the Red Army and the representatives of the revolutionary martyrs in Dudu County. . . .
General Secretary Xi Jinping reaffirmed the communists’ initial intentions, missions, ideals and purposes in the old revolutionary districts, and injected powerful positive energy for the majority of party members and cadres to remember their initial intentions, keep their missions in mind, and continue to struggle.

How well would Google’s new, November 2017 machine translation tool translate this? That’s a trick question: That is Google’s new machine translation, directly from the Chinese, unedited by human hand, uncorrected, and now available for free to any English-speaker who consults the website of the People’s Daily. Any American, even one who knows not a single Chinese character, can now read any Chinese newspaper on the internet, from front page to last.

Astonishing as this is, it is not surprising that few Americans realize what has happened: The major news-aggregation algorithms serve only English-language results in their news feeds. Out of habit and (they think) efficiency, the people who run them never bother with non-English sources, because, insofar as they think about it at all, they presume that the cost in human resources and time of accessing non-English sources cannot be justified by the market for them.

This presumption is obsolete. The media managers do not yet realize that Google Translate can render nearly every major newspaper in the world, and most minor ones, almost instantaneously and accurately, be they written in Mandarin, Farsi, or Russian. They do not realize that this translation is free and very often of higher quality than that provided by professional translators. Only a handful of specialists truly grasp how much progress machine translation has made in the past year alone. Consider that The New York Times published but a single article about the rollout of Google’s new service, titled “The Great A.I. Awakening,” and other major papers published no article at all.

The article notes that the day after Google rolled out the new system, “it demonstrated overnight improvements roughly equal to the total gains the old one had accrued over its entire lifetime.” The system continues to learn at this speed.

You can also support me on Patreon!

USER INSTRUCTIONS:

Do you have a short attention span? If so, stop here and skip right to the next newsletter. It will still make sense. No one will be the wiser.

Do you want to know more?

Keep reading.

How Did They Do It?

The original Google Translate used statistical machine translation (SMT). SMT presumes that each segment of source text has a number of possible target segments, each with some probability of being the correct translation, and the engine selects the segment with the highest probability. The engine is only as good as the multilingual corpora it learns from: Google used United Nations and European Parliament transcripts. Although its sheer processing power gave Google an edge over other SMT engines, the result was still a primitive product.
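To make that selection rule concrete, here is a toy sketch in Python. The phrase table and its probabilities are hypothetical, invented for illustration; a real SMT engine estimates millions of such entries from aligned bilingual corpora and also weighs how fluent each candidate sounds in the target language, which this sketch leaves out.

```python
# Toy illustration of the SMT decision rule: pick the most probable
# target segment. NOT Google's system. The phrase table below is
# HYPOTHETICAL; real probabilities are estimated from aligned corpora.
PHRASE_TABLE: dict[str, dict[str, float]] = {
    "süt liman": {
        "milk port": 0.55,       # literal, word-by-word reading (made-up score)
        "smooth sailing": 0.45,  # idiomatic reading (made-up score)
    },
}


def best_translation(source_segment: str) -> str:
    """Return the candidate target segment with the highest probability."""
    candidates = PHRASE_TABLE[source_segment]
    # max() over the dict keys, ranked by each key's probability
    return max(candidates, key=candidates.get)


print(best_translation("süt liman"))  # milk port
```

In this made-up table the literal reading is slightly more probable, so the rule returns it: the engine has no notion of meaning, only of which segment scored highest.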

In 2017, Google rolled out neural machine translation (NMT). The results, in more than 100 languages, have been astonishing. Native speakers asked to rate Google’s translations on a scale from 0 to 6 now give them an average score of 5.43.

In the new engine, words, or even parts of words, are transformed into “word vectors.” The vector for “milk” encodes not merely the characters m, i, l, and k but information about the contexts in which the word appeared in the training data. During training, the system adjusts the weights of the neural network so that its output matches the reference values: human source-target translation pairs. Words that appear in similar contexts receive similar word vectors. The result is a network that can take source segments and transform them into target segments.
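The idea that words in similar contexts get similar vectors can be illustrated with a deliberately crude stand-in. The Python sketch below does not train a neural network; it builds simple count-based context vectors from a tiny hypothetical corpus and compares them with cosine similarity, a common way to measure how alike two vectors are. The corpus, the window size, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

# A tiny HYPOTHETICAL corpus; real systems train on millions of sentences.
CORPUS = [
    "she drank cold milk",
    "she drank cold juice",
    "ships dock at the port",
    "ships dock at the harbor",
]


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Map each word to counts of the words within `window` positions of it.

    This count-based vector is a stand-in for a learned neural embedding.
    """
    vectors: dict[str, Counter] = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            neighbours = words[lo:i] + words[i + 1:hi]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


vecs = context_vectors(CORPUS)
print(round(cosine(vecs["milk"], vecs["juice"]), 2))  # 1.0
print(round(cosine(vecs["milk"], vecs["port"]), 2))   # 0.0
```

Here “milk” and “juice” share the same neighbors and score 1.0, while “milk” and “port” share none and score 0.0. A neural system learns dense vectors by adjusting weights rather than by counting neighbors, but the intuition is the same.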

This is how The New York Times described it:

A rarefied department within the company, Google Brain, was founded five years ago on this very principle: that artificial “neural networks” that acquaint themselves with the world via trial and error, as toddlers do, might in turn develop something like human flexibility. This notion is not new—a version of it dates to the earliest stages of modern computing, in the 1940s—but for much of its history most computer scientists saw it as vaguely disreputable, even mystical. Since 2011, though, Google Brain has demonstrated that this approach to artificial intelligence could solve many problems that confounded decades of conventional efforts.

The linguistic and technological explanation of their achievement is fascinating, but it’s not even my main point.

The main point is that this is a miracle. Language is no longer an obstacle. We can understand everything. This is an amazing time to be alive.

Google Translate may now be used to read every major newspaper in the world instantly, for free, and in English.

It’s not perfect. It’s better in some languages than others. The translations are still prone to mistakes and infelicities. Sometimes they’re hilariously wrong, but this is true of human translations, too.

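Google translates “reset” as сброс, which really does mean “reset” in Russian, and not as перегрузка, which does not (it means “overload” or, in some contexts, “override”). Перегрузка, transliterated as “peregruzka,” was the word printed on the “reset button” Hillary Clinton presented to Sergei Lavrov in 2009. Score one for the machines.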

Google’s translation engine is getting better very quickly—and getting better much faster than a human ever will. Machine translation is now in the ascent phase of the technology lifecycle. If evidence of its superiority to all but the best human translation is needed, there is this: Human translators, like journalists, now face technological unemployment. R.T. Beyer is toast.

So how can we take advantage of this?

I’ll explain this soon.

Meanwhile, I’ve started a PayPal pool to make it easier for you to support me. Please contribute! And please don’t forget to tell your friends they should subscribe. Give your friends Claire for Christmas!

