Emoji: the future of text?


It started with a heart, so the story goes. Emoji’s founding myth tells that telecoms operator NTT DOCOMO, at the height of Japan’s pager boom of the 1990s, removed a popular ‘♥️’ icon from their pagers to make room for business-oriented symbols such as kanji and the Latin alphabet. Stung by a backlash from their customers, in 1999 NTT invented emoji as compensation, and the rest is history.

Only, not quite. It is true that in 1999, NTT used emoji to jazz up their nascent mobile internet service, but emoji had been created some years earlier by a rival mobile network. It was only in 2019, twenty years after emoji’s supposed birth at NTT, that the truth came out.

Does it matter, when considering the structure and transmission of text, that for two decades our understanding of emoji’s history was wrong? Safe to say that it does not, but it serves as a salutary reminder for those who care about such things that emoji are slippery customers. And now, having colonised SMS messages and social media, blogs and books, court filings and comic books, these uniquely challenging characters can no longer be ignored.


As with essentially all modern digital text encodings, emoji lie within the purview of the Unicode Consortium. Almost by accident, what was once a head-down, unhurried organisation now finds itself to be responsible for one of the most visible symbols of online discourse. And, unlike the scripts with which Unicode has traditionally concerned itself, emoji are positively alive with change. Almost from the very beginning – that being 2007, when Google and Unicode standardised Japan’s divergent emoji sets for use in Gmail – Unicode has been on the receiving end of countless requests for new emoji, or variations on existing symbols. (Of note has been a commendable and ongoing drive to improve emoji’s representation of gender, ethnicity and religious practices.) Thus “emoji season” was born, that time of the year when Unicode’s annual update has journalists and bloggers scouring code charts for new emoji.

And therein lies a problem: emoji updates are so frequent, and so comprehensive, that it is by no means certain that the reader of any given digital text possesses a device that can render it faithfully. The appearance of placeholder characters – ‘☒’, colloquially called “tofu” – is not uncommon, especially in the wake of emoji season as computing devices await software upgrades to bring them up to date. Smartphones, which rely on the generosity of their manufacturers for such updates, are worst off: a typical smartphone will fall off the upgrade wagon two or three years after it first goes on sale, so that there is a long tail of devices that are perpetually stranded in bygone emoji worlds.
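The same lag can be glimpsed in software as well as in hardware. As a rough sketch, and assuming Python's standard `unicodedata` module stands in for a device's knowledge of Unicode, the snippet below asks whether a given code point is assigned in the Unicode version bundled with the running interpreter. The helper name `is_assigned` is hypothetical, and the check concerns the character database only: whether a font can actually draw the glyph (the true cause of "tofu") is a separate matter that this does not test.

```python
import unicodedata


def is_assigned(char: str) -> bool:
    """Return True if this Python build's Unicode tables know the character."""
    # "Cn" is the general category reported for unassigned code points.
    return unicodedata.category(char) != "Cn"


# The Unicode version this interpreter was built with; older builds
# will simply not know about emoji added in later versions.
print("Unicode data version:", unicodedata.unidata_version)

print(is_assigned("\U0001F52B"))  # the pistol emoji, added in Unicode 6.0
print(is_assigned("\u0378"))      # a code point unassigned at the time of writing
```

An interpreter built against an older Unicode version would report newer emoji as unassigned, much as an unpatched smartphone falls back to a placeholder.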


If missing emoji are at least obvious to the reader, the problem of misleading emoji is not. Although Unicode defines code points for all emoji, the consortium does not specify a standard visual appearance for them. It suggests, yes, but it does not insist. As such, Google, Apple, Facebook and other emoji vendors have each crafted their own interpretations of Unicode’s sample symbols, and those interpretations do not always agree. In choosing an emoji, then, the writer of a text may inadvertently select a quite different icon from the one that is ultimately displayed to their correspondent.

Consider the pistol emoji (🔫), which, at different times and on different platforms, has been displayed as a modern handgun, a flintlock pistol, and a sci-fi ray-gun. (Only now is a consensus emerging that a harmless water pistol is the most appropriate design.) Or consider that, for many years, smartphones running Google’s Android operating system displayed the “yellow heart” emoji (💛) as a hairy pink heart that was at odds with every other vendor’s design, the result of a radical misinterpretation of Unicode’s halftone exemplar.

These are isolated cases, to be sure, but it is perhaps more concerning that Samsung, undisputed champion of the smartphone market, once took emoji nonconformity to new heights. Prior to its most recent operating system update, Samsung’s emoji keyboards sported purple owls, rather than the brown species native to other devices (🦉); savoury crackers rather than sweet cookies (🍪); Korean flags rather than Japanese (🎌); and many other idiosyncrasies.
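The split between what Unicode fixes and what it leaves to vendors is easy to see in code. The sketch below, which assumes nothing beyond Python's standard `unicodedata` module, prints the code point and formal character name for the emoji mentioned above. The `describe` helper is a hypothetical name chosen for illustration. The output is the same on every platform because it says nothing about how the glyph is drawn.

```python
import unicodedata

# The emoji discussed in the text, written as escapes so that the source
# code itself does not depend on the reader's fonts.
EMOJI = [
    "\U0001F52B",  # pistol
    "\U0001F49B",  # yellow heart
    "\U0001F989",  # owl
    "\U0001F36A",  # cookie
    "\U0001F38C",  # crossed flags
]


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


for char in EMOJI:
    print(describe(char))
```

Note that the formal name of U+1F52B remains PISTOL whatever a vendor chooses to draw: the name and number are standardised, the picture is not.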

Today, most vendors are gradually harmonising their respective emoji, while still preserving their individual styles. (Samsung, too, has toned down its more outlandish deviations from the norm.) But although the likelihood of misunderstandings is diminished, it is still impossible to be sure that reader and writer are on the same page: with emoji, the medium may yet betray the message.


Finally, and as absurd as it sounds, there is the prospect of emoji censorship. From 2016 to 2019, for example, Samsung devices did not display the Latin cross (✝️) or the star and crescent (☪️). These omissions had mundane technical explanations, but it is not difficult to imagine more sinister motives for suppressing such culturally significant symbols. In fact, one need not look far to find a genuinely troubling case. Starting in 2017, Apple modified its iOS software at China’s behest so that devices sold in mainland China would not display the Taiwanese flag emoji (🇹🇼). At the time of writing, as protests against Chinese rule rock Hong Kong, ‘🇹🇼’ has disappeared from onscreen keyboards there, too.

In this there are echoes of Amazon’s notorious deletion of George Orwell’s 1984 from some users’ Kindles because of a copyright dispute. A missing emoji might seem like small fry by comparison, but it is every bit as Orwellian: is a text written in this time of crisis devoid of Taiwanese flags because the writer did not use that emoji, or because it had been withheld from them? The case of the missing ‘🇹🇼’ shows how emoji, often derided as a frivolous distraction from “real” writing, can be every bit as vital as our letters and words. We owe it to them to treat them with respect.


Miscellany № 89: 2020, year of the asterisk

The asterisk is old. Really old. Granted, it is not 5,000 years old, as Robert Bringhurst claims in the otherwise impeccable Elements of Typographic Style1 (Bringhurst confuses it with a star-like cuneiform mark that represents “deity” or “heaven”2), but it has more than two millennia under its belt nonetheless. I go into greater detail in the Shady Characters book, but the abridged version of the asterisk’s origin story goes something like this.


In the third century BCE, at Alexandria in Egypt, a librarian named Zenodotus was struggling to edit the works of Homer into something approaching their original form. I say a librarian, but really Zenodotus was the librarian, the first in a long line to be employed at Alexandria by the Ptolemaic pharaohs.3 Many spurious additions, deletions and alterations had been made to the Odyssey and Iliad since the time of their composition, but Zenodotus lacked the tools to deal with them. As such, he started drawing a short dash (—) in the margin beside each line he considered to be superfluous, and, in doing so, inaugurated the field of literary criticism.4 The mark was named the obelos, or “roasting spit”, and in the seventh century Isidore of Seville captured its essence when he wrote that “like an arrow, it slays the superfluous and pierces the false”.5

The asterisk, in turn, was created by one of Zenodotus’s successors. In the second century BCE, Aristarchus of Samothrace introduced an array of new critical symbols: the diple (>) called out noteworthy features in the text; the diple periestigmene (⸖) marked lines where Aristarchus disagreed with Zenodotus’s edits; and, finally, the asteriskos (※), or “little star”, denoted duplicate lines.6,7 Occasionally, Aristarchus paired an asterisk and obelus to indicate lines that belonged elsewhere in the poem.8

Thus the asterisk was born. And right from the beginning, it came with a warning: a text with an asterisk attached to it is not the whole story.


Having survived the intervening millennia with its visual form largely intact, by the medieval period the asterisk had moved into a new role as an “anchor” for readers’ notes: where a reader wanted to link a note scribbled in the margin to a particular passage in the text, a pair of asterisks would do the trick. Later, in printed books, authors used the asterisk to call out their own asides.9

By the twentieth century, the asterisk had become the de facto leader of the footnote clan. In 1953, a lexicographer named Eric Partridge explained that “the following are often used”: ‘*’, ‘†’, ‘**’, ‘‡’ or ‘††’, ‘***’ or ‘’ or ‘⁂’, and finally ‘†††’.10 Things have calmed down a little since Partridge’s time, but ‘*’, ‘†’, and ‘‡’ are still relatively common and even ‘§’, ‘||’ and ‘¶’ appear on occasion. Should a writer’s penchant for footnotes extend past five or six per page, lettered or numbered notes may be a better option and, indeed, the frequency of typographic footnote markers does seem to have waned over the past few decades.


Yet even as the asterisk is used less often as a footnote marker, its implied meaning — that there is more here than meets the eye — is as strong as ever. For American newspapers, merely to use the word “asterisk” is to tarnish its subject by association; for American sports writers, doubly so.

It all goes back to 1961, and a baseball establishment unwilling to see one of its all-time greats toppled from his pedestal. That year, Roger Maris of the New York Yankees had beaten George Herman “Babe” Ruth’s record of 60 home runs in a single season — but Maris’s record-breaking season had been eight games longer than Ruth’s record-setting 1927 season. Baseball commissioner Ford Frick announced that:

Any player who has hit more than 60 home runs during his club’s first 154 games would be recognized as having established a new record. However, if the player does not hit more than 60 until after his club has played 154 games, there would have to be some distinctive mark on the record books to show that Babe Ruth’s record was set under a 154-game schedule.

Reporter Dick Young of the New York Daily News is said to have suggested that “Maybe you should use an asterisk on the new record. Everybody does that when there’s a difference of opinion.” An asterisk: a little star to diminish Maris’s brilliance on the diamond. Young’s asterisk was never actually employed, but for many years baseball almanacs carried both Maris’s and Ruth’s records side by side.11

Since Maris’s time, the asterisk has become the go-to metaphor for sports writers seeking to hedge some apparently remarkable achievement or another. In the early 2000s, Barry Bonds, one of baseball’s all-time greats, was awarded a plethora of asterisks in the wake of a doping scandal (“Tarnished Records Deserve an Asterisk”;12 “An Asterisk Is Very Real, Even When It’s Not”13). Lance Armstrong, another era-defining athlete, was pelted with asterisks after his own doping revelations (“Armstrong, best of his time, now with an asterisk”;14 “Armstrong: an era of asterisks*”15). The sporting asterisk travels, too: Mo Farah, one of Britain’s most celebrated athletes, has faced questions about his relationship with a disgraced sports doctor (“Sir Mo Farah’s link to a notorious doper leaves an asterisk next to his name”16).

Less often, the asterisk makes itself felt in the news proper. The Boston Globe reported George W. Bush’s contentious victory in the 2000 US presidential election with the headline “Bush Wins Election*”, accompanying it with a subtitle that read “*Pending Gore Challenges, Possible Supreme Court Ruling”.17 More recently, the controversial appointments of Brett Kavanaugh and Amy Coney Barrett to the US Supreme Court have both attracted asterisks (“Hirono: Kavanaugh’s SCOTUS seat has ‘big asterisk’”;18 “Welcome, Justice Barrett. Now here’s your asterisk”19). And, needless to say, the president who made those two appointments found himself labelled with an asterisk of his own on the occasion of his impeachment in 2019 (“Now Trump’s legacy bears an asterisk of shame”20). Who’s to say he won’t attract a few more before the 20th of January next year?


But that was then, and this is now. In the shadow of the coronavirus pandemic that continues to rage across the globe, the asterisk has been promoted to the top shelf of the sub-editor’s toolbox and, as a result, headlines on both the back pages and the front are suffering from a rash of little stars. It seemed remiss to let this go without remark, so I present to you a lightly annotated and extremely partial survey of 2020’s asterisk-bearing headlines. Enjoy, and please add your own examples in the comments!

Sports news in the USA

Other news in the USA

Sports news outside the USA

Other news outside the USA

Pre-2020 bonus asterisks


As the little stars continue to roll in, please do take care of yourself. Remember: in 2020, you only have one asterisk.

1. Bringhurst, Robert. “Asterisk”. In The Elements of Typographic Style: Version 3.2, 303+. Hartley and Marks, Publishers, 2008.
2. Kramer, Samuel Noah. “The Origin and Development of the Cuneiform System of Writing”. In The Sumerians: Their History, Culture, and Character, 302-304. University of Chicago Press, 1963.
3. Smith, William. “Zenodotus”. In Dictionary of Greek and Roman Biography and Mythology, 951+. C.C. Little and J. Brown; [etc., etc.], 1849.
4. Pfeiffer, Rudolf. “Zenodotus and His Contemporaries”. In History of Classical Scholarship from the Beginnings to the End of the Hellenistic Age, 105-122. Clarendon, 1968.
5. , and Stephen A. Barney. “Punctuated Clauses (De Posituris)”. In The Etymologies of Isidore of Seville. Cambridge University Press, 2006.
6. Pfeiffer, Rudolf. “Aristarchus: The Art of Interpretation”. In History of Classical Scholarship from the Beginnings to the End of the Hellenistic Age, 210-233. Clarendon, 1968.
7. OED Online. “Asterisk”.
8. McNamee, Kathleen. “Sigla”. In Sigla and Select Marginalia in Greek Literary Papyri, 9+. Fondation Égyptologique Reine Élisabeth, 1992.
9. Parkes, M. B. “The Technology of Printing and the Stabilization of the Symbols”. In Pause and Effect: Punctuation in the West, 50-64. University of California Press, 1993.
10. Partridge, E. “Oddments”. In You Have a Point There: A Guide to Punctuation and Its Allies, 226+. Hamilton, 1953.
11. Barra, Allen. “Roger Maris’s Misunderstood Quest to Break the Home Run Record”. The Atlantic Monthly Group, July 27, 2011.
12. Wilbon, Michael. “Tarnished Records Deserve an Asterisk”. Washington Post.
13.
14.
15. Arnold, Rob. “Armstrong: An Era of Asterisks*”. Ride Media.
16.
17. Kranish, Michael, and Susan Milligan. “Bush Wins Election*”. Boston Globe.
18.
19. Seeley, George. “Welcome, Justice Barrett. Now here’s Your Asterisk”. Boston Globe. October 29, 2020.
20. Robinson, Eugene. “Now Trump’s Legacy Bears an Asterisk of Shame”. Washington Post. December 19, 2019.

Miscellany № 88½: come for the punctuation, stay for the street signs

My last post, where we took a look at Birmingham’s over-punctuated street signs, stirred up quite a bit of discussion. Rich Greenhill suggested that Birmingham’s commas-and-tilde motif could have come from an abbreviated medieval ‘a’ or, perhaps, “ditto” marks. H James Lucas wondered if the paired commas might be a single inverted comma, used by the ironmonger to save typesetting effort; Brian Inglis took the opposite tack and suggested the commas could have been added purely so the manufacturer could invoice for an extra couple of characters. And, on street signs in general, Korhomme pointed out Bern’s colour-coded street signs, imposed by Napoleon’s invading armies. Read about these and other ideas in the comments from last time round.

Two new readers have since been in touch with more examples of elaborately typeset abbreviations, and I couldn’t keep them to myself.


Dr Cathy Gale (@PlayMakeThink on Twitter, or visit her web site at playmakethink.com) sent in some examples of street signs in Margate, a seaside town in the south east of England. The first one shows what I’m going to call a Birmingham-style abbreviation for “Road”, complete with a dash and a pair of commas:

Street sign showing "Road" abbreviated with a dash and two commas
A Margate street sign abbreviated with a dash and two commas. (Image courtesy of Dr Cathy Gale.)

Hold onto your hats, because Dr Gale’s next image has a similar style of abbreviation, only used for the word “Crescent” instead:

Street sign showing "Crescent" abbreviated with a dash and two commas
A Margate street sign abbreviated with a dash and two commas. (Image courtesy of Dr Cathy Gale.)

The fact that “Crescent” doesn’t contain an ‘a’ would suggest that this style of abbreviation isn’t derived from an abbreviated medieval ‘a’, but that’s the only potential explanation I think we can rule out.


George Pollard (@porges on Twitter, or visit porg.es) sent in some examples of similar abbreviations in different contexts. This first one, a hand-painted plaque in Nantucket, Massachusetts, is dedicated to Captain Ahab’s real-life counterpart, also called George Pollard.

Plaque dedicated to Captain George Pollard
A plaque dedicated to Captain George Pollard. (Image courtesy of Wikimedia user “Le grand Cricri”.)

An unexpected intersection of punctuation and Moby Dick — does it get any better than this? And we get four abbreviations for the price of one, even if we make do without the traditional Birmingham dash or tilde.

George also pointed out a pair of uses of the abbreviation “Co.”, for “Company”, on two very different playing card decks. First is this Japanese hanafuda card deck box:

A deck of Japanese "hanafuda" playing cards
A deck of Japanese hanafuda playing cards, showing an abbreviation of the word “Company”. (Image courtesy of George Pollard.)
Playing card showing abbreviation for "Company"
A De La Rue playing card demonstrating an abbreviation of the word “Company”. (Image courtesy of George Pollard.)

Finally, we have this lovely ace of spades shown on an undated card printed by De La Rue of London. Thomas De La Rue got started in business in 1821, making straw hats; within a decade he had moved on to printing playing cards, and the Co. that now bears his name is best known for the high-tech printing of banknotes.


I must thank Dr Cathy Gale and George Pollard for sharing these images. I’m not sure I’ve learned anything more about why some abbreviations are typeset in the Birmingham style, but I’m glad I’ve had the chance to publish these fantastic pictures. Perhaps one of them has jogged your memory? If so, please leave a comment below!

Miscellany № 88: a tale of two signs

Cast-iron street sign for Harborne Road, Birmingham
Cast-iron street sign with an enthusiastic piece of typography for the “Rd.” abbreviation. (Image by the author.)

We moved from London to Birmingham a couple of years ago now, and one of the first things I noticed when we arrived was the street signs: extravagant, cast-iron behemoths far removed from London’s restrained licence plates for buildings. Above is a typical street sign in Edgbaston, our then-new neighbourhood; below is an old-style enamelled sign from Wandsworth, our previous one.

Street sign for Caithness Terrace, Tooting Bec
Street sign for Caithness Terrace, Tooting Bec. (CC BY-NC-ND 2.0 image by R~P~M on Flickr.)

Granted, Birmingham’s modern street signs, as used in much of the rest of the city, are significantly less interesting than the black-and-white battleship above, but then the same is also true of London. Birmingham once had standards to maintain; London didn’t.

Anyway, back to the two signs above: useful, legible both. But only the Brummie sign packs in an abbreviation, a tilde and two commas, all while bellowing “God save Queen Victoria!!1!!111” with foam-flecked lips, and for that it is my pick for the coveted Best Street Sign I Have Seen in the Past Two Years award.


Miscellany № 87: a coronavirus conundrum

In the midst of the ongoing SARS-CoV-2 pandemic, Twitter user @talkporty* got in touch to ask:

Dear @shadychars, you are the only one I can turn to in this situation. I am being hounded by EM-DASHES! Help! How has #covidー19uk become a trending hashtag? Nobody types them in.

And crazier still: paste it in and it seems it is [the] chōonpu symbol instead!1

It isn’t often that international health crises and punctuation intersect, but these are the times in which we find ourselves.


First things first: I tapped on the #covidー19uk hashtag to discover that it is indeed a valid hashtag, and also that a lot of people are using it. Next, I did some copy-and-pasting of my own to confirm that “ー”, a character that looks very much like an em dash, is not, in fact, an em dash. As @TalkPorty said, it is the chōonpu (AKA the “long sound symbol”,2 AKA Unicode’s KATAKANA-HIRAGANA PROLONGED SOUND MARK3), a dash-like mark used to indicate long vowels in Japan’s katakana and hiragana syllabaries.
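For anyone who wants to repeat the experiment without squinting at glyphs, here is a minimal sketch using Python's standard `unicodedata` module. It is not the method I used at the time, and the `identify` helper is a hypothetical name, but it does show how to look up a pasted character's code point and formal name.

```python
import unicodedata


def identify(char: str) -> tuple[str, str]:
    """Return the code point (as 'U+XXXX') and Unicode name of one character."""
    return (f"U+{ord(char):04X}", unicodedata.name(char))


# The character from the hashtag, followed by two genuine dashes for comparison.
for char in ["\u30FC", "\u2014", "\u002D"]:
    print(char, *identify(char))
```

The three marks look much alike on screen, but their names (KATAKANA-HIRAGANA PROLONGED SOUND MARK, EM DASH and HYPHEN-MINUS) tell them apart at once.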

How did a dash-like non-dash end up in one of the most common hashtags at a time of global crisis?

At first, I assumed that it must have been a cut-and-paste error. As @TalkPorty suggested, rare is the person who knows how to enter an em dash on the computer’s keyboard, and I wondered if the hashtag’s original creator had perhaps browsed a list of Unicode characters until they found a likely-looking candidate. But that didn’t seem entirely plausible: you have to stray pretty far from Unicode’s Latin alphabet and its accompanying marks before you reach katakana and its punctuation. The idea that this was accidental, or coincidental, didn’t quite fit.

Next, I wondered if it could have been a typo caused by a smartphone’s software keyboard. Perhaps a Twitter user hunting for an em dash alighted on a visually similar mark by mistake. Probably not, I thought, for the same reason as before: if you have your phone set to display a QWERTY keyboard for a Western alphabet, you almost certainly won’t have ready access to the chōonpu. It takes a deliberate effort to switch languages and go hunting to find one.

In summary, someone must have chosen this character deliberately, though I was none the wiser as to why they had done so. In the end, I blundered into what I thought was a plausible solution. Here are my original tweeted replies to @TalkPorty:

Well, that’s weird. I can only imagine that some cut-and-paste or soft keyboard error has gone viral (sorry) along with the hashtag.

If I tap on the hashtag in Twitter’s Android app, I’m given the option to compose a tweet containing that same hashtag. I’d imagine that’s how it’s spreading. #covid-19uk (with a hyphen-minus) is also doing fairly well.

Wait! I lie. #covid-19uk isn’t a valid hashtag! Presumably someone has figured out that KATAKANA-HIRAGANA PROLONGED SOUND MARK can be used to “hyphenate” rather than break apart a hashtag. Very clever.

In other words, it seemed very much as if some savvy tweeter had used the chōonpu — a character that looks like a dash but works more like a letter — to construct a hyphenated term that sneaked past Twitter’s rules for valid hashtags. I left it at that.
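To illustrate the idea, here is a rough sketch in Python. I should stress that the pattern below is my own simplification (a ‘#’ followed by letters, digits or underscores) and an assumption on my part, not Twitter's actual hashtag rules. The names `HASHTAG` and `first_hashtag` are likewise hypothetical. What the sketch does show is that Unicode classes the chōonpu as a letter (general category `Lm`, "modifier letter"), whereas the hyphen-minus is dash punctuation (`Pd`).

```python
import re
import unicodedata

# Assumed, simplified hashtag rule: '#' then one or more "word" characters.
# In Python 3, \w matches Unicode letters and digits plus the underscore.
HASHTAG = re.compile(r"#\w+")


def first_hashtag(text: str):
    """Return the first hashtag found under the simplified rule, or None."""
    match = HASHTAG.search(text)
    return match.group(0) if match else None


# The chōonpu is categorised as a letter; the hyphen-minus as punctuation.
print(unicodedata.category("\u30FC"))  # Lm
print(unicodedata.category("-"))       # Pd

# A letter keeps the hashtag in one piece; punctuation cuts it short.
print(first_hashtag("#covid\u30FC19uk"))  # #covidー19uk
print(first_hashtag("#covid-19uk"))       # #covid
```

Under this simplified rule, at least, the hyphenated version is truncated to #covid while the chōonpu version survives whole, which is consistent with what I saw in the Twitter app.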


As I was writing this post, though, I couldn’t help but wonder why the chōonpu in particular had been used. I’m not a Unicode expert, but it seemed unlikely that there was only one dash-like character among its 143,000 code points that could have been used to pull off this piece of hashtag hacking. Why did this Japanese mark end up in a hashtag otherwise comprised of Latin characters?

Google Japanese keyboard
Google’s Japanese keyboard on an Android phone. The chōonpu is on the right, immediately above the backspace key.

Now, some Japanese computer keyboards have a QWERTY layout, where roman letters are mapped to Japanese symbols in a system called romaji, and so, out of curiosity, I installed Google’s Japanese keyboard4 on my Android phone and took a look at its romaji mode. Right there, beside the ‘L’, was a chōonpu. Could #covidー19uk have been created by a native Japanese speaker with access to a romaji keyboard?

To find out, I used the Who Tweeted it First search engine to search for both #covidー19uk and #covidー19. The latter turned up the earliest tweet by over a fortnight, posted on the 11th of February:

なるほど、翌年のバージョンアップ(変異)にも対応できるのですね。
病名に続きそうな名前つけてしまうなんて、こういうのは普通なの?
#コロナウイルス
#COVIDー195

Well, what do you know? The earliest tweet to use the chōonpu in a #covidー19 hashtag was posted by a Japanese speaker with the Twitter username @spreadnewsxxx. Google’s mostly intelligible translation is as follows:

Indeed, it can handle version upgrades (mutations) the following year. Is it normal to give a name that seems to follow the disease name?
#コロナウイルス
#COVIDー19

Twitter reports that @spreadnewsxxx posted their tweet with an iPhone, whose Japanese keyboard I’m not familiar with, and so I can’t know whether they used the chōonpu for convenience or for its aforementioned hashtag friendliness. Either way, we now have a Patient Zero, if you’ll forgive the expression, in the form of the first use of the hashtag that has been plaguing @TalkPorty. The mystery is solved, or at least diminished.

I’d love to know if any readers have encountered the chōonpu in non-Japanese texts. Is this a common usage? Are its Twitter-defying powers commonly known? Drop me a line in the comments below!

1.
2. Wiktionary. “ー”.
3. FileFormat.info. “Unicode Character ‘KATAKANA-HIRAGANA PROLONGED SOUND MARK’ (U+30FC)”.
4.
5.

* For the curious, Talk Porty is a community discussion forum based in Portobello, a suburb of my old home town of Edinburgh.