A post from Shady Characters

Emoji, part 4: who owns emoji?

This is the fourth in a series of thirteen posts on Emoji (😂).

As emoji become ever more ingrained in our online lives, the question asks itself: who decides which emoji we can type? As we learned last time, the answer is the Unicode Consortium, the body that oversees the lexicon of symbols with which computers communicate. Founded in California in 1991, the consortium, in its own words,

is a non-profit corporation devoted to developing, maintaining, and promoting software internationalization standards and data, particularly the Unicode Standard, which specifies the representation of text in all modern software products and standards.1

A noble aim indeed. But who’s behind the curtain?

As might be expected from its origins in Silicon Valley, Unicode’s voting members — that is, those with the final say on which characters will enter the Unicode standard — skew heavily towards big tech companies such as Google, Apple, Microsoft, Facebook and IBM. The German software company SAP is a rare non-American outlier; the Chinese telecoms giant Huawei is rarer still as the only voting company whose home country does not use the Latin alphabet. Rounding out these tech heavyweights are a pair of typographic specialists, Adobe and Monotype.2

There are also a number of non-commercial voting entities, and here things get more interesting. The governments of Oman, Bangladesh, India and Tamil Nadu, one of India’s constituent states, each spend between $12,000 and $18,000 per year for voting rights.2,3* Oman, which joined in 2015, spins its membership as a way to help promote Arabic script culture; more concretely, it wishes that Unicode would fix the way it encodes certain Qur’anic characters.5,6 Bangladesh and India, both of which became members in 2010, make the case for better support for the scripts of the Indian subcontinent: some Bengali letters are cumbersome to use, for example, while Unicode’s treatment of Indic scripts in general has attracted criticism for a variety of technical reasons.7,8 Tamil Nadu joined a year later, partly to encourage Unicode to improve its approach to Tamil and partly to needle the federal Indian government, with which it had an ongoing feud about Tamil encoding.9 Yet another Indian state, Andhra Pradesh, was a member from 2011 to 2014, when it hoped to boost the profile and the interoperability of its own script, Telugu.10,5

Now, a cynic might look at all this and wonder: are India and the Arabian Peninsula blessed with community-minded governments who have joined up out of the goodness of their hearts? Or is it conceivable, perhaps, that Unicode may not always have been the best steward of non-Latin scripts? Unicode has also come under fire in China, Japan and the Korean Peninsula as the standard-bearer for “CJK unification”, a somewhat controversial effort to reduce the number of characters required to represent those regions’ closely-related scripts.11,12 CJK unification is driven by the Ideographic Rapporteur Group, a separate but related body, but Unicode’s rubber-stamping of the IRG’s work has opened it up to collateral damage all the same.13

These are the problems that arise when a small group of people — however well-informed they are, and however disinterested they try to be — take charge of the lingua franca of the world’s computers. And unfortunately for Unicode, emoji have only widened the scope for criticism.

As we saw last time, the Unicode Consortium’s core mission is to assign a unique number to each and every character that forms part of human writing. With the glyphs in Unicode 1.0 drawn from a collection of existing character sets such as the USA’s ASCII, Japan’s Shift-JIS, and (my personal favourite) the USSR’s spellbinding GOST 10859-64, or Alphanumerical Codes for Punchcards and Punchtapes,14 the question has since become: which new characters should be admitted?
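To make “a unique number for every character” a little more concrete, here is a minimal Python sketch that looks up the number (the “code point”) and official name of a few characters using the standard library’s unicodedata module. The helper name describe and the sample characters are my own choices for illustration, and the names available depend on the version of Unicode bundled with your copy of Python.

```python
import unicodedata

def describe(char):
    """Return the code point (written U+XXXX) and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

# Sample characters chosen for illustration only.
for ch in ["A", "é", "😂"]:
    print(describe(ch))
```

Whatever the device or typeface, the number stays the same; only the picture drawn for it varies.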

For a textual character, such as we might find in the body of an email or the text of a book, the price of entry is relatively low. It must already be in use, and it must be a single indivisible character. Imagine, for instance, that you want to be able to type in the fictitious (and awesome) character g̈ with a single keystroke on any one of the millions of Unicode-compliant devices around the world. You submit a proposal to Unicode, and they summarily reject it, because no language in the world actually uses g̈. Even if your beloved g-umlaut did exist in the wild, Unicode would reject it all the same because it can already be created by combining the existing LATIN SMALL LETTER G (g) and COMBINING DIAERESIS ( ̈) characters.15
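The g-umlaut argument can be checked with a short Python sketch, assuming only the standard library’s unicodedata module. It asks for the “composed” (NFC) form of two letter-plus-diaeresis pairs: u with a diaeresis has a single-character equivalent in the standard, whereas g with a diaeresis does not and so remains two characters. The names G_UMLAUT, U_UMLAUT and compose are mine, chosen for illustration.

```python
import unicodedata

G_UMLAUT = "g\u0308"  # LATIN SMALL LETTER G + COMBINING DIAERESIS
U_UMLAUT = "u\u0308"  # LATIN SMALL LETTER U + COMBINING DIAERESIS

def compose(text):
    """Return the NFC form of text, merging combining sequences where a single character exists."""
    return unicodedata.normalize("NFC", text)

print(len(compose(G_UMLAUT)))  # 2: no single g-umlaut character exists
print(len(compose(U_UMLAUT)))  # 1: collapses to U+00FC, LATIN SMALL LETTER U WITH DIAERESIS
```

Both sequences display as a single letter with two dots on top; the difference lies only in how many code points sit underneath.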

When it comes to what we might call symbols — characters that live alongside our text but are not entirely of a piece with it — things get a little more complicated. The symbols in Zapf Dingbats and other pi fonts were grandfathered into Unicode 1.0 without a great deal of scrutiny, but the consortium has since become more choosy and today any proposed symbol will be evaluated against a litany of criteria. Is it already in use within computer applications? Is it in common use within an active community? Is its meaning fixed and understood? Can it be used within plain text? (Emoji can, for instance, whereas traffic signs cannot.) Is it part of a group of related symbols? Does it fill a gap in the Unicode standard? There is an equally long list of reasons for rejection. Is the symbol trademarked? Is it typically used in standalone contexts (such as the aforementioned road signs) rather than in running text? Does it occur more in handwritten text than within computer applications? Does it lack a supportive community?16 And so on.

Shigetaka Kurita’s emoji are certainly symbols, as Unicode understands them, and they meet almost all of the positive requirements while simultaneously avoiding most of the negative ones. As such, when Google asked for their admission into the standard, there was little debate. Yet emoji differ from other symbols in one important way: they are constantly changing, both in meaning and in number. And this is problematic for Unicode because, per the consortium’s own guidelines, a symbol that is “part of a set undergoing rapid changes” will normally be rejected.16 As the character set of record, Unicode prefers new additions to be fully baked before incorporating them.

Emoji’s uncertain status is one that Unicode has brought upon itself. In the four years between Unicode 6.0, when emoji first made their way into the standard, and the publication in 2014 of Unicode 7.0,17 the organisation largely ignored emoji. Only two new emoji glyphs were introduced in 2014: SLIGHTLY SMILING FACE and SLIGHTLY FROWNING FACE, intended mostly to complete the familiar airport-bathroom satisfaction spectrum of ‘😊’, ‘🙂’, ‘🙁’ and ‘☹️’.18 In 2015, however, just a year later, the newly-minted Unicode 8.0 incorporated no fewer than forty-one new emoji. Among them were new smileys (🤑, 🤒, 🙄); animals (🦁, 🦀); places of worship (🕍, 🕋, 🕌); sports (🏏, 🏓); and many more besides.19 It heralded a sea-change in how Unicode treated emoji, the fallout from which the organisation is still learning how to handle.

Shortly after the release of Unicode 7.0 and its paltry brace of new smileys, Mark Davis and Peter Edberg, respectively the consortium’s president and technical director, circulated a memo addressing the state of emoji. They wrote:

Emoji characters have become extremely popular. Yet the choice of emoji to be represented in Unicode has left many people confused or disappointed.20

They were not wrong. Emoji had spread far and wide since their debut in 2010, but it was apparent that users were dissatisfied with the icons on offer. Moreover, as Davis and Edberg explained, users were bamboozled by the seeming arbitrariness of the emoji on their keyboards — not to mention the basic unfairness of it, too.

To start with, there was a preponderance of Japan-related glyphs such as bullet trains (🚅), creatures from Japanese folklore (👹, 👺), cultural symbols (🎑, 🎍, 🎎) and more, all at the expense of similar icons for other countries. Professions such as construction worker, police officer and dancer (👷, 👮, 💃) were depicted as stereotypically male or female and, similarly, a number of emoji depicting couples and families hewed to heterosexual norms (💏, 👪). Nor was there any variation in skin tone. Emoji’s smiley faces and tiny human figures were drawn with deliberately unreal yellow skin, but it was clear that there were no dark-skinned emoji to be found. Lastly, people just wanted more emoji: Edberg and Davis cited articles from BuzzFeed, NYMag and Business Insider in which, they said, there was “some surprising consistency” among the asked-for symbols.20

All this prompted Unicode to take the unprecedented step of opening up the standard to wholly new emoji, a luxury denied to all other types of symbol before and since. In the run-up to the publication of Unicode 8.0, Edberg and Davis themselves put forward a host of new emoji in the hope of filling some of the most egregious gaps in the standard. Among their new symbols were non-Japanese sports such as cricket (🏏), ice hockey and field hockey (🏒, 🏑), table tennis (🏓) and badminton (🏸);17 food items such as a taco (🌮), a champagne bottle (🍾), and a wedge of cheese (🧀), all of which had been mooted in magazine articles and publicity campaigns;17 and, more significantly, five “selector” characters that changed the skin tone of existing emoji (🏻, 🏼, 🏽, 🏾 and 🏿).17 From Shervin Afshar of HighTech Passport Ltd and Roozbeh Pournader of Google came an accompanying proposal for religious symbols such as prayer beads (📿), the Kaaba (🕋), a synagogue (🕍) and others.17
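For the curious, here is a rough Python sketch of how those five skin-tone characters work in practice: each is an ordinary code point (U+1F3FB to U+1F3FF, which the standard names “emoji modifiers”) placed directly after a base emoji. The choice of THUMBS UP SIGN as the base and the helper name with_skin_tone are my own, for illustration, and whether the pair is drawn as a single tinted glyph depends entirely on the font and operating system.

```python
import unicodedata

THUMBS_UP = "\U0001F44D"  # THUMBS UP SIGN, used here purely as an example base emoji

# The five skin-tone modifier code points, U+1F3FB through U+1F3FF.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

def with_skin_tone(base, modifier):
    """Append a skin-tone modifier to a base emoji; the result is still two code points."""
    return base + modifier

for modifier in MODIFIERS:
    print(with_skin_tone(THUMBS_UP, modifier), unicodedata.name(modifier))
```

On a device that does not understand the modifiers, the same two code points tend to appear as a yellow thumbs-up followed by a coloured swatch.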

The result was a dramatically expanded set of emoji in Unicode version 8.0, formally approved in June 2015 and available on smartphones and computers later that year.19 The floodgates were open.

In this brave new world of user-submitted emoji, a new glyph may be accepted into Unicode if it fills an obvious gap (as in the sports listed above); if the Unicode Consortium thinks it will be popular, based on hashtag usage or other social media barometers; or, lastly, if the consortium knows it will be popular because so many people are asking for it. Corporate emoji, those that are too general or too specific, and those that are likely to be passing fads are rejected.21

What this means is that emoji are now subject to a kind of directed, and occasionally misdirected, evolution. Most simply, any sufficiently motivated individual can make the case for their own personal hobby horse. Erstwhile New York Times journalist Jennifer 8. Lee did just that in proposing the dumpling emoji (🥟), a globally-recognised food whose absence was one of emoji’s many cultural blindspots.22 When radio producer Mark Bramhill lobbied for PERSON IN LOTUS POSITION (🧘), on the other hand, he did so not out of altruism but rather to provide material for an episode of a popular design podcast called 99% Invisible.23 Similarly equivocal motives abound on the part of those companies who ask their customers to lobby for new emoji on their behalf. Taco Bell ran a textbook crowdsourcing campaign in which it exhorted its patrons to tweet in support of a taco emoji (🌮), which was ultimately added to Unicode 8.0;24 Durex, on the other hand, failed to raise the same level of support for a condom emoji.25

Non-profits, too, often see emoji as an alternative to traditional advertising. In 2016, a Catalan cultural group petitioned WhatsApp to add an emoji for the traditional Catalan porrón, or wine flask, failing to realise that the Unicode Consortium would have been a better bet.26§ Elsewhere, Scotland, England, and Wales recently received their own regional flag emoji (🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁥󠁮󠁧󠁿 and 🏴󠁧󠁢󠁷󠁬󠁳󠁿, although operating system support is still patchy) after a rather more successful proposal by Owen Wilson, a Welsh journalist.27 And corporate sponsorship and regional boosterism collided when Spanish food company Fallera enlisted a comedian named Eugeni Alemany to promote a paella emoji. The boisterous #PaellaEmoji campaign that ensued persuaded Unicode to approve it, but the resultant SHALLOW PAN OF FOOD emoji (🥘) put a few noses out of joint when Apple’s version did not display the traditional ingredients of rabbit, green beans, and garrafó beans.28

This last case illustrates another way in which emoji differ from other characters. Having convinced the Unicode Consortium to approve your emoji, you will likely have little say in how it looks when it appears in the wild. In an ideal world, this would not be a problem (after all, how many type designers disagree on what the letter ‘A’ should look like?), but emoji evolve so quickly that nothing can be taken for granted. In 2016, for instance, Apple decided to revise its PISTOL emoji so that it resembled a water pistol rather than a real gun. For two years, Apple’s ‘🔫’ was at odds with the realistic guns depicted by more or less all of its competitors, until a collective change of heart in 2018 saw all of the other pistol emoji updated to match.29 Elsewhere, for a long time Samsung’s CROSSED FLAGS emoji (🎌) displayed Korean rather than the usual Japanese flags30 and, infamously, for most of 2014, Google’s YELLOW HEART (💛) incongruously presented itself as a pink, hairy heart.31 Unicode is a standard; emoji are not.

Today, Unicode finds itself at a crossroads. For most of the consortium’s twenty-seven–year history, its users — that is to say, us, the great unwashed of the internet — rarely gave text encoding a second thought. We typed our emails and text messages and read our web pages without worrying too much about how our letters and words were stored or communicated. That isn’t to say that computer manufacturers and software vendors were not jumping through hoops to make it all work, but nowadays, thanks to Unicode, the situation is much improved. As of October 2018, more than nine out of every ten websites use Unicode and mojibake, or garbled characters, are largely a thing of the past.32
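Mojibake is easy to reproduce on purpose. The Python sketch below, a minimal illustration rather than a description of any particular system, stores a word as UTF-8 (one of Unicode’s encodings) and then reads the bytes back as if they were Latin-1, an older single-byte character set. The function names garble and repair, and the sample word, are hypothetical choices of mine.

```python
def garble(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then misread those bytes using the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

def repair(garbled, wrong_encoding="latin-1"):
    """Undo garble() by reversing the mistaken decoding step."""
    return garbled.encode(wrong_encoding).decode("utf-8")

original = "café"
broken = garble(original)
print(broken)                      # cafÃ©
print(repair(broken) == original)  # True
```

The repair only works here because the wrong encoding is known and every byte survives the round trip; real-world mojibake is rarely so obliging.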

With the advent of emoji, however, and particularly since the Great Emoji Expansion of ’14,|| the public are paying a great deal more attention. Unicode’s yearly cadence of new versions brings with it “emoji season”, in which commentators critique the new emoji that will shortly appear on their computers, tablets and smartphones.33,34,35 Not all of this attention is positive (the Unicode Consortium has been criticised for its pale, male and stale membership; emoji for their cultural biases and lack of representation) but the limelight has encouraged the consortium to remake itself, and the little picture-characters it controls, in a more equitable light.

There are even whispers that Unicode would be happier to get out of the emoji business entirely so that it might rededicate itself to the job of encoding the world’s writing systems, but that is a story for another article. For now, it is enough to say that whatever the future holds, the 🧞 is out of the bottle and there is little prospect of it going back in.

“The Unicode Consortium”, Unicode.Org, 2018. 
“Unicode Members”, Unicode.Org, 2018. 
“Membership Levels and Fees”, Unicode.Org, 2018. 
“Script Encoding Initiative”, UC Berkeley Department of Linguistics, 2018. 
“Unicode Consortium Membership History”, Unicode.Org, 2018. 
Sheera Frenkel, “Why Is This Random Gulf Country Helping Pick Your Emojis?”, BuzzFeed News, 2016. 
Aditya Mukerjee, “I Can Text You A Pile of Poo, But I Can’t Write My Name”, Model View Culture, 2015. 
Jeroen Hellingman, “Indian Scripts and Unicode”, 1998. 
Liat Berdugo, “Two Days With the Shadowy Emoji Overlords”, Rhizome, 2015. 
“Telugu Joins Unicode Consortium As Full Member”, The Hindu, 2011. 
Mirai No Moji kōdo Taikei Ni Watakushitachi Wa Fuan O Motteimasu, 1993. 
Zhou Jing, “Combat over Chinese Character Unification”, china.org.Cn, 2008. 
Ken Whistler, “Unicode Technical Note #26: On the Encoding of Latin, Greek, Cyrillic, and Han”, Unicode.Org, 2010. 
“Source Standards and Specifications”, Unicode.Org, 2018. 
“Submitting Character Proposals”, Unicode.Org, 2016. 
“Criteria for Encoding Symbols”, Unicode.Org, 2016. 
Unknown entry 
Karl Pentzlin, “L2/10-429: Proposal to Encode Three Additional Emoticons”, 2010. 
Jeremy Burge, “Unicode Version 8.0”, Emojipedia.
Peter Edberg and Mark Davis, “L2/14-172R: Proposed Enhancements for Emoji Characters: Background”, 2014. 
“Submitting Emoji Character Proposals”, Unicode.Org, 2018. 
Charlie Warzel, “One Woman’s Bizarre, Delightful Quest To Change Emojis Forever”, BuzzFeed News, 2016. 
Mark Bramhill, “Person in Lotus Position”, 99% Invisible, 2017. 
Taco Bell, “The Taco Emoji Needs to Happen”, change.Org, 2014. 
Durex Global, “#BreakingNews: We’re Launching an Exciting New Savoury #condom Range - Eggplant Flavour! ? #CondomEmoji”, Twitter, September 2016. 
Sam Jones, “‘A Symbol of Our Land’: Catalan Group Pitches WhatsApp porrón Emoji”, The Guardian, August 2016. 
“Wales Flag Emoji Finally Arrives on Twitter”, BBC News, 2017. 
Sarah Miller, “The Secret History of the Paella Emoji”, Food & Wine, 2017. 
Jeremy Burge, “Google Updates Gun Emoji”, Emojipedia Blog, 2018. 
Jeremy Burge, “Samsung Puts Japan Back on the Map”, Emojipedia Blog, 2017. 
John-Michael Bond, “You May Be Accidentally Sending Friends a Hairy Heart Emoji”, Engadget, 2014. 
“Usage Statistics of Character Encodings for Websites”, W3Techs, 2018. 
Jeremy Burge, “Issue 26 — Emoji Action Season”, Emoji Wrap, 2018. 
“Drunk? Anaesthetised? Or Just Seen Your Bank Balance? – What the New Woozy Emoji Really Means”, The Guardian, November 2018. 
Amelia Heathman, “World Emoji Day 2018: First Look at New Apple Emoji in iOS 12 Update”, Evening Standard, October 2018. 
The University of California, Berkeley, also stumps up the cash for a vote, albeit with an educational discount. Its involvement centres around the work of the Script Encoding Initiative, an ongoing project to encode scripts that are not yet in the Unicode standard.4 
By convention, Unicode character names are given in upper case. 
Full disclosure: I’ve contributed to two episodes of 99% Invisible.
WhatsApp duly declined, citing their inability to actually do anything about the request. 
I am trying to make this A Thing. 

10 comments on “Emoji, part 4: who owns emoji?”

  1. Comment posted by Screwtape on

    My understanding is that it’s a bit unfair to claim Unicode as Western-centric because of the Unicode consortium. While the consortium does control Unicode, the ISO 10646 Working Group also controls Unicode, and working under the auspices of ISO (and thus, indirectly under the auspices of the United Nations) brings all the international and diplomatic considerations one might reasonably expect. National governments generally don’t need to join the consortium to have their voices heard, they can have their say through ISO.

    1. Comment posted by Keith Houston on

      Hi Screwtape — thanks for the comment! That’s all true, and I thank you for explaining it so concisely.

      What I perhaps failed to get across in the article is that even today, outsiders watching the Unicode Consortium at work often seem to come away with the impression that it isn’t as representative as it might be.

      Separately, my understanding (which may well be faulty) is that Unicode essentially led on emoji. It was Mark Davis who first championed emoji at Google’s request, and it was under the Unicode banner that much of the ensuing discussion took place. The organisation that drives emoji forward is the same one that has been criticised for a lack of diversity. That said, I will admit that I don’t understand the interplay between ISO/IEC 10646 and the Unicode Consortium as well as I might. I’d be happy to be educated!

  2. Comment posted by John Cowan on

    I realize that your focus is on emoji, but I feel it necessary to correct two major misemphases in your introduction.

    First, the Unicode Consortium is not the sole master of the Universal Character Set, also known as Unicode. There is also the contribution of ISO, the International Organization for Standardization, to the process. I admit that “ISO/IEC JTC1/SC2/WG2” is not a very sexy name, so its very existence rarely gets mentioned in popular articles, but not a single character, including emoji, gets in the standard without being approved by both the Consortium and the Working Group through their separate processes.

    ISO working groups are made up of representatives of the standards organizations of any country that has one (as well as invited experts), and WG2 sees attendance and participation from Canada, Ireland, China, Japan, and 23 other countries, with all countries having one vote. This substantially counterbalances any perceived inequities in Unicode Consortium membership. So while the Consortium does the work of collecting data on emoji, it does not by itself have the final say on whether they are put into the standard.

    Second, the unification of Han characters is and always has been in the hands of the countries who actually use them: it is the furthest thing possible from a Western plot. The Ideographic Rapporteur Group is a sub-group of WG2, with national members from China, Japan, North Korea, South Korea, Singapore, Vietnam (which used to use Han characters and has many historical documents written in them), Hong Kong, Macau, and the Taipei Computer Association (a proxy for Taiwan). The only Western countries regularly participating are the UK and the U.S. (partly as a proxy for the Consortium). Both WG2 and the Consortium depend on the IRG for all new Han characters added to the standard.

    1. Comment posted by Keith Houston on

      Hi John — thanks for the comment!

      As I mention in my response to Screwtape’s comment, I will admit that I don’t understand the interplay between ISO/IEC 10646 and the Unicode Consortium as well as I might. That said, from my limited understanding so far it seems to me that Unicode is very much driving the bus on emoji. It was Mark Davis who first championed emoji at Google’s request, and it is under the Unicode banner that much of the ensuing discussion continues to take place. I note in particular that in Unicode 8.0 the consortium included a large number of emoji before ISO/IEC 10646 had voted on them.

      On CJK unification, I’m happy to defer to you. Thanks for pointing out the inaccuracy! I’ll modify the post accordingly.

      Thanks again for the comment!

  3. Comment posted by John Cowan on

    Thanks for the reply. The fast-tracked emoji you mention are the gender and skin-tone alternatives to existing emoji, which were felt to be extremely urgent because they were causing extremely bad publicity. WG2 had already agreed to them, and it was just the grindingly slow ISO publication process that was keeping them out of ISO 10646. Eventually of course the ISO standard caught up.

    The CJK characters are those officially described as “urgently needed”, and genuinely new characters like the names of newly discovered chemical elements and the character for the upcoming Japanese era (which has not been created yet, but an agreed-upon spot is needed for it). The Iranian rial sign, and before that the euro sign, got the same expedited treatment, although in the case of the euro sign WG2 was actually ahead of Unicode, which had to issue a new release with just one new character.

    1. Comment posted by Keith Houston on

      Hi John — that clears things up for me. Thanks!

      Aside from the rubber-stamping applied by WG2, is there a purpose to the two-track system? It seems odd that both sides can choose to unilaterally advance their respective version of Unicode before the other side has ratified the change.

    2. Comment posted by John Cowan on

      It’s important to distinguish between the process of decision and the process of publication. The Unicode Consortium has only one decision-making body, its Technical Committee, and after it decides, only editorial work remains. ISO is inherently more complex: when WG2 decides (which, believe me, is no rubber stamp), then SC2 must approve (a rubber stamp indeed, as WG2 is the only working group of SC2, but it takes time), and then JTC1 must approve (another rubber stamp, taking even more time), and then editorial work is done by ISO’s publication arm (often very backed up with other work).

      So it was faster and easier for these urgently needed characters to be published by Unicode before ISO’s comparatively clunky publication process could get around to them. Nevertheless, WG2 as the decision-maker for ISO 10646 had already agreed to the identities, names, and code points of these characters before Unicode published them.

      The origin of the dual structure is historical (the two forces originally were going to have two separate universal character encodings before their efforts were merged as Unicode 1.1 and ISO 10646 1st edition), but it seems to work well, for the same kinds of reasons that bicameral legislatures can work well. Two different groups of representatives, consisting for the most part of different people, and representing two different kinds of organizations (mostly companies for Unicode, exclusively countries for WG2), must agree that a character is important and useful enough to be encoded forever (no character is ever removed from the standard, even if it turns out to be encoded in error). That helps protect against the kinds of mistakes that human beings are naturally prone to. I think everyone involved in the process agrees that it is a Good Thing. And if WG2 mostly defers to Unicode on emoji, Unicode mostly defers to WG2 on ideographs, without either group giving up their powers of independent judgement. Characters of other kinds can and do originate on either side, and are reviewed simultaneously, as each has access (as does the public) to the other’s working documents.

    3. Comment posted by Keith Houston on

      Hi John — that is very informative. Thanks! Your bicameral parliament metaphor drives it home, and I will almost certainly half-inch it for use in the future (with due credit given, of course).

    1. Comment posted by Keith Houston on

      Hi Marion — well, consider me educated. Thanks for the comment, and I’m glad you’re enjoying the site!
