The fascinating evolution of typing Chinese characters

The idea of downloading a third-party keyboard to your phone may seem unnecessary to most people, but in China it’s the norm. 

Chinese is the only modern language that’s logographic, meaning that the way a character is written can be completely separate from its pronunciation (Japanese, Korean, and Vietnamese have their own variations of Chinese characters). Because of that, typing on a default keyboard alone would be incredibly difficult. So today, 800 million people in China use smart keyboard software that predicts what a user wants to type.

But a strong reliance on this technology also presents a security risk: most keyboard apps transmit keystrokes to the cloud to enable better text prediction, creating an opportunity for the content to be intercepted if the apps don’t have strong enough encryption protocols.

This week, I reported on one such encryption loophole found in Sogou, one of China’s most popular third-party keyboard apps. A group of researchers at the Citizen Lab, a University of Toronto–affiliated research group, managed to intercept almost everything they typed into Sogou by deploying a two-decade-old exploit.

Not only can this kind of software endanger people’s personal and financial information, but—perhaps more important—it can compromise otherwise encrypted messages in apps like Signal, and allow them to be caught by police or malicious actors.

But for the newsletter, I want to take you all on a geeky journey into the history of keyboard apps—or input method editors (IMEs), as they are formally called. IMEs are so ubiquitous and fundamental today that it’s easy to forget how much hard work was put into their creation. And they’re a fascinating example of how innovations can bridge the gap between the digital world and the real world.

In the ’80s, there was no way of processing Chinese characters with the personal computers on the market. Even after the laborious process of digitizing Chinese characters to be displayed on computer screens, a big question remained: How do you type those characters? Particularly, how do you match the tens of thousands of Chinese characters to the 26 letters on a QWERTY keyboard?

The first attempt was vastly different from the keyboard apps today, and centered on how Chinese characters are written.

In August 1983, exactly 40 years ago, a Chinese engineer named Wang Yongmin developed the first popular way to input Chinese characters into a computer: Wubi. He did it by breaking down a Chinese character into different strokes and assigning several strokes to each letter on the QWERTY keyboard.

The diagram above shows how each key is matched with three to 12 character components. The text at the bottom consists of poems that help users remember the combinations.

For example, the Chinese character for dog, 犬, has several shapes in it: 犬, 一, 丿, and 丶. These shapes were matched with the keys D, G, T, and Y, respectively. So when a user typed “DGTY,” Wubi input software would match that to the character 犬.

A guide on how the character 犬 should be typed into Wubi software.
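
For readers who think in code, the lookup described above can be sketched roughly as follows. This is a toy illustration, not how any real Wubi software is implemented: the only key assignments it contains are the four from the 犬 example, and the names COMPONENT_TO_KEY, DECOMPOSITION, encode, and lookup are all made up for this sketch.

```python
# Toy sketch of a shape-based input lookup in the spirit of Wubi.
# Assumption: the only data here is the 犬 example from the text;
# a real code table covers many components per key and thousands of characters.

# Which QWERTY key each component shape is assigned to (from the 犬 example).
COMPONENT_TO_KEY = {"犬": "D", "一": "G", "丿": "T", "丶": "Y"}

# Hypothetical decomposition table: character -> its components, in typing order.
DECOMPOSITION = {"犬": ["犬", "一", "丿", "丶"]}

# Codes use at most four keys.
MAX_CODE_LENGTH = 4


def encode(character):
    """Return the keystroke code for a character in the decomposition table."""
    components = DECOMPOSITION[character]
    code = "".join(COMPONENT_TO_KEY[part] for part in components)
    if len(code) > MAX_CODE_LENGTH:
        # This sketch does not model how longer decompositions are shortened.
        raise ValueError(f"code for {character} is longer than {MAX_CODE_LENGTH} keys")
    return code


# Reverse table used at typing time: keystroke code -> character.
CODE_TABLE = {encode(character): character for character in DECOMPOSITION}


def lookup(keys):
    """Return the character for a typed code, or None if the code is unknown."""
    return CODE_TABLE.get(keys.upper())


print(encode("犬"))
print(lookup("dgty"))
```

The point of the sketch is that the software does very little guessing: once the user has memorized which shapes live on which keys, a code is simply looked up in a table.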

Wubi was able to match every Chinese character with a keystroke combination using at most four QWERTY keys. It’s considered one of the fastest ways to type Chinese, but the downside is also pretty obvious: users need to memorize which keys correspond to which strokes, so the learning curve is quite steep. (One way people have remembered the keyboard designations? Jingles!)

The next step in the evolution of Chinese IMEs was the invention of typing by phonetic spelling.

It may be hard to believe, but pinyin, the modern way of

