As emoji become ever more ingrained in our online lives, the question asks itself: who decides which emoji we can type? As we learned last time, the answer is the Unicode Consortium, the body that oversees the lexicon of symbols with which computers communicate. Founded in California in 1991, the consortium, in its own words,
is a non-profit corporation devoted to developing, maintaining, and promoting software internationalization standards and data, particularly the Unicode Standard, which specifies the representation of text in all modern software products and standards.1
A noble aim indeed. But who’s behind the curtain?
As might be expected from its origins in Silicon Valley, Unicode’s voting members — that is, those with the final say on which characters will enter the Unicode standard — skew heavily towards big tech companies such as Google, Apple, Microsoft, Facebook and IBM. The German software company SAP is a rare non-American outlier; the Chinese telecoms giant Huawei is rarer still as the only voting company whose home country does not use the Latin alphabet. Rounding out these tech heavyweights are a pair of typographic specialists, Adobe and Monotype.2
There are also a number of non-commercial voting entities, and here things get more interesting. The governments of Oman, Bangladesh, India and Tamil Nadu, one of India’s constituent states, each spend between $12,000 and $18,000 per year for voting rights.2,3* Oman, which joined in 2015, spins its membership as a way to help promote Arabic script culture; more concretely, it wishes that Unicode would fix the way it encodes certain Qur’anic characters.5,6 Bangladesh and India, both of whom became members in 2010, make the case for better support for the scripts of the Indian subcontinent: some Bengali letters are cumbersome to use, for example, while Unicode’s treatment of Indic scripts in general has attracted criticism for a variety of technical reasons.7,8 Tamil Nadu joined a year later, partly to encourage Unicode to improve its approach to Tamil and partly to needle the federal Indian government, with which it had an ongoing feud about Tamil encoding.9 Yet another Indian state, Andhra Pradesh, was a member from 2011 to 2014, when it hoped to boost the profile and the interoperability of its own script, Telugu.10,5
Now, a cynic might look at all this and wonder: are India and the Arabian Peninsula blessed with community-minded governments who have joined up out of the goodness of their hearts? Or is it conceivable, perhaps, that Unicode may not always have been the best steward of non-Latin scripts? Unicode has also come under fire in China, Japan and the Korean Peninsula as the standard-bearer for “CJK unification”, a somewhat controversial effort to reduce the number of characters required to represent those regions’ closely-related scripts.11,12 CJK unification is driven by the Ideographic Rapporteur Group, a separate but related body, but Unicode’s rubber-stamping of the IRG’s work has opened it up to collateral damage all the same.13
These are the problems that arise when a small group of people — however well-informed they are, and however disinterested they try to be — take charge of the lingua franca of the world’s computers. And unfortunately for Unicode, emoji have only widened the scope for criticism.
As we saw last time, the Unicode Consortium’s core mission is to assign a unique number to each and every character that forms part of human writing. With the glyphs in Unicode 1.0 drawn from a collection of existing character sets such as the USA’s ASCII, Japan’s Shift-JIS, and (my personal favourite) the USSR’s spellbinding GOST 10859-64, or Alphanumerical Codes for Punchcards and Punchtapes,14 the question has since become: which new characters should be admitted?
For a textual character, such as we might find in the body of an email or the text of a book, the price of entry is relatively low. It must already be in use, and it must be a single indivisible character. Imagine, for instance, that you want to be able to type in the fictitious (and awesome) character g̈ with a single keystroke on any one of the millions of Unicode-compliant devices around the world. You submit a proposal to Unicode — and they summarily reject it, because no language in the world actually uses g̈. Even if your beloved g-umlaut did exist in the wild, Unicode would reject it all the same because it can already be created by combining the existing LATIN SMALL LETTER G (g) and COMBINING DIAERESIS ( ̈) characters.15†
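To make the g-umlaut case concrete, here is a minimal Python sketch using the standard library's `unicodedata` module. It assumes a Python 3 interpreter; the variable names are purely illustrative. It builds g̈ from its two existing code points and shows that Unicode normalisation has no single precomposed character to collapse them into, whereas the same treatment of "u" plus a diaeresis yields the long-established "ü".

```python
import unicodedata

# The fictitious g-umlaut, built from two existing code points:
# LATIN SMALL LETTER G followed by COMBINING DIAERESIS (U+0308).
g_umlaut = "g\u0308"

# It displays as one character but is stored as two code points.
names = [unicodedata.name(ch) for ch in g_umlaut]
print(len(g_umlaut), names)

# NFC normalisation composes sequences wherever a precomposed character
# exists. None exists for g + diaeresis, so the sequence is unchanged...
nfc_g = unicodedata.normalize("NFC", g_umlaut)
print(nfc_g == g_umlaut)

# ...whereas u + diaeresis collapses to the single code point U+00FC.
nfc_u = unicodedata.normalize("NFC", "u\u0308")
print(f"U+{ord(nfc_u):04X}", unicodedata.name(nfc_u))
```

Because the two-code-point sequence already renders correctly, a dedicated g-umlaut code point would add nothing new, which is the substance of the second objection above.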
When it comes to what we might call symbols — characters that live alongside our text but are not entirely of a piece with it — things get a little more complicated. The symbols in Zapf Dingbats and other pi fonts were grandfathered into Unicode 1.0 without a great deal of scrutiny, but the consortium has since become more choosy and today any proposed symbol will be evaluated against a litany of criteria. Is it already in use within computer applications? Is it in common use within an active community? Is its meaning fixed and understood? Can it be used within plain text? (Emoji can, for instance, whereas traffic signs cannot.) Is it part of a group of related symbols? Does it fill a gap in the Unicode standard? There is an equally long list of reasons for rejection. Is the symbol trademarked? Is it typically used in standalone contexts (such as the aforementioned road signs) rather than in running text? Does it occur more in handwritten text than within computer applications? Does it lack a supportive community?16 And so on.
Shigetaka Kurita’s emoji are certainly symbols, as Unicode understands them, and they meet almost all of the positive requirements while simultaneously avoiding most of the negative ones. As such, when Google asked for their admission into the standard, there was little debate. Yet emoji differ from other symbols in one important way: they are constantly changing, both in meaning and in number. And this is problematic for Unicode because, per the consortium’s own guidelines, a symbol that is “part of a set undergoing rapid changes” will normally be rejected.16 As the character set of record, Unicode prefers new additions to be fully baked before incorporating them.
Emoji’s uncertain status is one that Unicode has brought upon itself. In the four years between Unicode 6.0, when emoji first made their way into the standard, and the publication in 2014 of Unicode 7.0,17 the organisation largely ignored emoji. Only two new emoji glyphs were introduced in 2014: SLIGHTLY SMILING FACE and SLIGHTLY FROWNING FACE, intended mostly to complete the familiar airport-bathroom satisfaction spectrum of “😊”, “🙂”, “🙁” and “☹️”.18 In 2015, however, just a year later, the newly-minted Unicode 8.0 incorporated no fewer than forty-one new emoji. Among them were new smileys (🤑, 🤒, 🙄); animals (🦁, 🦀); places of worship (🕍, 🕋, 🕌); sports (🏏, 🏓); and many more besides.19 It heralded a sea-change in how Unicode treated emoji, the fallout from which the organisation is still learning how to handle.
Shortly after the release of Unicode 7.0 and its paltry brace of new smileys, Mark Davis and Peter Edberg, respectively the consortium’s president and technical director, circulated a memo addressing the state of emoji. They wrote:
Emoji characters have become extremely popular. Yet the choice of emoji to be represented in Unicode has left many people confused or disappointed.20
They were not wrong. Emoji had spread far and wide since their debut in 2010, but it was apparent that users were dissatisfied with the icons on offer. Moreover, as Davis and Edberg explained, users were bamboozled by the seeming arbitrariness of the emoji on their keyboards — not to mention the basic unfairness of it, too.
To start with, there was a preponderance of Japan-related glyphs such as bullet trains (🚅), creatures from Japanese folklore (👹, 👺), cultural symbols (🎑, 🎍, 🎎) and more, all at the expense of similar icons for other countries. Professions such as construction worker, police officer and dancer (👷, 👮, 💃) were depicted as stereotypically male or female and, similarly, a number of emoji depicting couples and families hewed to heterosexual norms (💏, 👪). Nor was there any variation in skin tone. Emoji’s smiley faces and tiny human figures were drawn with deliberately unreal yellow skin, but it was clear that there were no dark-skinned emoji to be found. Lastly, people just wanted more emoji: Edberg and Davis cited articles from BuzzFeed, NYMag and Business Insider in which, they said, there was “some surprising consistency” among the asked-for symbols.20
All this prompted Unicode to take the unprecedented step of opening up the standard to wholly new emoji, a luxury denied to all other types of symbol before and since. In the run-up to the publication of Unicode 8.0, Edberg and Davis themselves put forward a host of new emoji in the hope of filling some of the most egregious gaps in the standard. Among their new symbols were non-Japanese sports such as cricket (🏏), ice hockey and field hockey (🏒, 🏑), table tennis (🏓) and badminton (🏸);17 food items such as a taco (🌮), a champagne bottle (🍾), and a wedge of cheese (🧀), all of which had been mooted in magazine articles and publicity campaigns;17 and, more significantly, five “selector” characters that changed the skin tone of existing emoji (🏻, 🏼, 🏽, 🏾 and 🏿).17 From Shervin Afshar of HighTech Passport Ltd and Roozbeh Pournader of Google came an accompanying proposal for religious symbols such as prayer beads (📿), the Kaaba (🕋), a synagogue (🕍) and others.17
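For the curious, the way those five skin-tone characters work can be sketched in a few lines of Python. This is an illustration rather than anything taken from the proposal itself: the helper `with_skin_tone` is a hypothetical name, the waving hand is an arbitrary choice of base emoji, and the code assumes a Python version whose `unicodedata` tables include Unicode 8.0 or later. Each modifier is an ordinary code point in the range U+1F3FB to U+1F3FF, and placing one directly after a suitable emoji asks the device to draw a single, tinted glyph.

```python
import unicodedata

# The five skin-tone modifiers occupy consecutive code points.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

def with_skin_tone(base: str, modifier: str) -> str:
    """Hypothetical helper: append a skin-tone modifier to a base emoji."""
    return base + modifier

waving_hand = "\U0001F44B"  # WAVING HAND SIGN

variants = [with_skin_tone(waving_hand, m) for m in MODIFIERS]
for modifier, variant in zip(MODIFIERS, variants):
    # Each variant is still two code points under the hood; a device that
    # supports the modifiers simply renders the pair as one glyph.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in variant)
    print(variant, code_points, unicodedata.name(modifier))
```

On a system that does not support the modifiers, each pair falls back to the plain yellow hand followed by a separate colour swatch.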
The result was a dramatically expanded set of emoji in Unicode version 8.0, formally approved in June 2015 and available on smartphones and computers later that year.19 The floodgates were open.
In this brave new world of user-submitted emoji, a new glyph may be accepted into Unicode if it fills an obvious gap (as in the sports listed above); if the Unicode Consortium thinks it will be popular, based on hashtag usage or other social media barometers; or, lastly, if the consortium knows it will be popular because so many people are asking for it. Corporate emoji, those that are too general or too specific, and those that are likely to be passing fads are rejected.21
What this means is that emoji are now subject to a kind of directed, and occasionally misdirected, evolution. Most simply, any sufficiently motivated individual can make the case for their own personal hobby horse. Erstwhile New York Times journalist Jennifer 8. Lee did just that in proposing the dumpling emoji (🥟), a globally-recognised food whose absence was one of emoji’s many cultural blindspots.22 When radio producer Mark Bramhill lobbied for PERSON IN LOTUS POSITION (🧘), on the other hand, he did so not out of altruism but rather to provide material for an episode of a popular design podcast called 99% Invisible.23‡ Similarly equivocal motives abound on the part of those companies who ask their customers to lobby for new emoji on their behalf. Taco Bell ran a textbook crowdsourcing campaign in which it exhorted its patrons to tweet in support of a taco emoji (🌮), which was ultimately added to Unicode 8.0;24 Durex, on the other hand, failed to raise the same level of support for a condom emoji.25
Non-profits, too, often see emoji as an alternative to traditional advertising. In 2016, a Catalan cultural group petitioned WhatsApp to add an emoji for the traditional Catalan porrón, or wine flask, failing to realise that the Unicode Consortium would have been a better bet.26§ Elsewhere, Scotland, England, and Wales recently received their own regional flag emoji (🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 and 🏴󠁧󠁢󠁷󠁬󠁳󠁿, although operating system support is still patchy) after a rather more successful proposal by Owen Williams, a Welsh journalist.27 And corporate sponsorship and regional boosterism collided when Spanish food company Fallera enlisted a comedian named Eugeni Alemany to promote a paella emoji. The boisterous #PaellaEmoji campaign that ensued persuaded Unicode to approve it, but the resultant SHALLOW PAN OF FOOD emoji (🥘) put a few noses out of joint when Apple’s version did not display the traditional ingredients of rabbit, green beans, and garrafó beans.28
This last case illustrates another way in which emoji differ from other characters. Having convinced the Unicode Consortium to approve your emoji, you will likely have little say in how it looks when it appears in the wild. In an ideal world, this would not be a problem — after all, how many type designers disagree on what the letter “A” should look like? — but emoji evolve so quickly that nothing can be taken for granted. In 2016, for instance, Apple decided to revise its PISTOL emoji so that it resembled a water pistol rather than a real gun. For two years, Apple’s “🔫” was at odds with the realistic guns depicted by more or less all of its competitors — until a collective change of heart in 2018 saw all of the other pistol emoji updated to match.29 Elsewhere, for a long time Samsung’s CROSSED FLAGS emoji (🎌) displayed Korean rather than the usual Japanese flags30 and, infamously, for most of 2014, Google’s YELLOW HEART (💛) incongruously presented itself as a pink, hairy heart.31 Unicode is a standard; emoji are not.
Today, Unicode finds itself at a crossroads. For most of the consortium’s twenty-seven–year history, its users — that is to say, us, the great unwashed of the internet — rarely gave text encoding a second thought. We typed our emails and text messages and read our web pages without worrying too much about how our letters and words were stored or communicated. That isn’t to say that computer manufacturers and software vendors were not jumping through hoops to make it all work, but nowadays, thanks to Unicode, the situation is much improved. As of October 2018, more than nine out of every ten websites use Unicode, and mojibake, or garbled characters, are largely a thing of the past.32
With the advent of emoji, however, and particularly since the Great Emoji Expansion of ’15,|| the public are paying a great deal more attention. Unicode’s yearly cadence of new versions brings with it “emoji season”, in which commentators critique the new emoji that will shortly appear on their computers, tablets and smartphones.33,34,35 Not all of this attention is positive (the Unicode Consortium has been criticised for its pale, male and stale membership; emoji for their cultural biases and lack of representation) but the limelight has encouraged the consortium to remake itself, and the little picture-characters it controls, in a more equitable light.
There are even whispers that Unicode would be happier to get out of the emoji business entirely so that it might rededicate itself to the job of encoding the world’s writing systems, but that is a story for another article. For now, it is enough to say that whatever the future holds, the 🧞 is out of the bottle and there is little prospect of it going back in.
1. “The Unicode Consortium”, Unicode.org, 2018.
2. “Unicode Members”, Unicode.org, 2018.
3. “Membership Levels and Fees”, Unicode.org, 2018.
4. “Script Encoding Initiative”, UC Berkeley Department of Linguistics, 2018.
5. “Unicode Consortium Membership History”, Unicode.org, 2018.
6. Sheera Frenkel, “Why Is This Random Gulf Country Helping Pick Your Emojis?”, BuzzFeed News, 2016.
7. Aditya Mukerjee, “I Can Text You A Pile of Poo, But I Can’t Write My Name”, Model View Culture, 2015.
8. Jeroen Hellingman, “Indian Scripts and Unicode”, 1998.
9. Liat Berdugo, “Two Days With the Shadowy Emoji Overlords”, Rhizome, 2015.
10. “Telugu Joins Unicode Consortium As Full Member”, The Hindu, 2011.
11. Mirai No Moji kōdo Taikei Ni Watakushitachi Wa Fuan O Motteimasu, 1993.
12. Zhou Jing, “Combat over Chinese Character Unification”, China.org.cn, 2008.
13. Ken Whistler, “Unicode Technical Note #26: On the Encoding of Latin, Greek, Cyrillic, and Han”, Unicode.org, 2010.
14. “Source Standards and Specifications”, Unicode.org, 2018.
15. “Submitting Character Proposals”, Unicode.org, 2016.
16. “Criteria for Encoding Symbols”, Unicode.org, 2016.
17. Unknown entry
18. Karl Pentzlin, “L2/10-429: Proposal to Encode Three Additional Emoticons”, 2010.
19. Jeremy Burge, “Unicode Version 8.0”, Emojipedia.
20. Peter Edberg and Mark Davis, “L2/14-172R: Proposed Enhancements for Emoji Characters: Background”, 2014.
21. “Submitting Emoji Character Proposals”, Unicode.org, 2018.
22. Charlie Warzel, “One Woman’s Bizarre, Delightful Quest To Change Emojis Forever”, BuzzFeed News, 2016.
23. Mark Bramhill, “Person in Lotus Position”, 99% Invisible, 2017.
24. Taco Bell, “The Taco Emoji Needs to Happen”, Change.org, 2014.
25. Durex Global, “#BreakingNews: We’re Launching an Exciting New Savoury #condom Range - Eggplant Flavour! ? #CondomEmoji”, Twitter, September 2016.
26. Sam Jones, “‘A Symbol of Our Land’: Catalan Group Pitches WhatsApp Porrón Emoji”, The Guardian, August 2016.
27. “Wales Flag Emoji Finally Arrives on Twitter”, BBC News, 2017.
28. Sarah Miller, “The Secret History of the Paella Emoji”, Food & Wine, 2017.
29. Jeremy Burge, “Google Updates Gun Emoji”, Emojipedia Blog, 2018.
30. Jeremy Burge, “Samsung Puts Japan Back on the Map”, Emojipedia Blog, 2017.
31. John-Michael Bond, “You May Be Accidentally Sending Friends a Hairy Heart Emoji”, Engadget, 2014.
32. “Usage Statistics of Character Encodings for Websites”, W3Techs, 2018.
33. Jeremy Burge, “Issue 26: Emoji Action Season”, Emoji Wrap, 2018.
34. “Drunk? Anaesthetised? Or Just Seen Your Bank Balance? – What the New Woozy Emoji Really Means”, The Guardian, November 2018.
35. Amelia Heathman, “World Emoji Day 2018: First Look at New Apple Emoji in iOS 12 Update”, Evening Standard, October 2018.
- * The University of California, Berkeley, also stumps up the cash for a vote, albeit with an educational discount. Its involvement centres around the work of the Script Encoding Initiative, an ongoing project to encode scripts that are not yet in the Unicode standard.4
- † By convention, Unicode character names are given in upper case.
- ‡ Full disclosure: I’ve contributed to two episodes of 99% Invisible.
- § WhatsApp duly declined, citing their inability to actually do anything about the request.
- || I am trying to make this A Thing.