A post from Shady Characters

Miscellany № 87: a coronavirus conundrum

In the midst of the ongoing SARS-CoV-2 pandemic, Twitter user @talkporty* got in touch to ask:

Dear @shadychars, you are the only one I can turn to in this situation. I am being hounded by EM-DASHES! Help! How has #covidー19uk become a trending hashtag? Nobody types them in.

And crazier still: paste it in and it seems it is [the] chōonpu symbol instead![1]

It isn’t often that international health crises and punctuation intersect, but these are the times in which we find ourselves.


First things first: I tapped on the #covidー19uk hashtag to discover that it is indeed a valid hashtag, and also that a lot of people are using it. Next, I did some copy-and-pasting of my own to confirm that “ー”, a character that looks very much like an em dash, is not, in fact, an em dash. As @TalkPorty said, it is the chōonpu, AKA the “long sound symbol”,[2] AKA Unicode’s KATAKANA-HIRAGANA PROLONGED SOUND MARK,[3] a dash-like mark used to indicate long vowels in Japan’s katakana and hiragana syllabaries.
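For anyone who wants to repeat the copy-and-paste check, a few lines of Python will report what a pasted character actually is. This is only a sketch built on Python's standard `unicodedata` module; the `describe` helper is a made-up name for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point, Unicode name and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

# The mark from the hashtag, followed by the dashes it resembles.
for ch in ["\u30FC", "\u2014", "\u2013", "-"]:
    print(ch, describe(ch))
```

The general category in parentheses is the interesting part: the dashes are all classed as dash punctuation (Pd), whereas the chōonpu is classed as a modifier letter (Lm).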

How did a dash-like non-dash end up in one of the most common hashtags at a time of global crisis?

At first, I assumed that it must have been a cut-and-paste error. As @TalkPorty suggested, rare is the person who knows how to enter an em dash on the computer’s keyboard, and I wondered if the hashtag’s original creator had perhaps browsed a list of Unicode characters until they found a likely-looking candidate. But that didn’t seem entirely plausible: you have to stray pretty far from Unicode’s Latin alphabet and its accompanying marks before you reach katakana and its punctuation. The idea that this was accidental, or coincidental, didn’t quite fit.

Next, I wondered if it could have been a typo caused by a smartphone’s software keyboard. Perhaps a Twitter user hunting for an em dash alighted on a visually similar mark by mistake. Probably not, I thought, for the same reason as before: if you have your phone set to display a QWERTY keyboard for a Western alphabet, you almost certainly won’t have ready access to the chōonpu. It takes a deliberate effort to switch languages and go hunting to find one.

In summary, someone must have chosen this character deliberately, though I was none the wiser as to why they had done so. In the end, I blundered into what I thought was a plausible solution. Here are my original tweeted replies to @TalkPorty:

Well, that’s weird. I can only imagine that some cut-and-paste or soft keyboard error has gone viral (sorry) along with the hashtag.

If I tap on the hashtag in Twitter’s Android app, I’m given the option to compose a tweet containing that same hashtag. I’d imagine that’s how it’s spreading. #covid-19uk (with a hyphen-minus) is also doing fairly well.

Wait! I lie. #covid-19uk isn’t a valid hashtag! Presumably someone has figured out that KATAKANA-HIRAGANA PROLONGED SOUND MARK can be used to “hyphenate” rather than break apart a hashtag. Very clever.

In other words, it seemed very much as if some savvy tweeter had used the chōonpu — a character that looks like a dash but works more like a letter — to construct a hyphenated term that sneaked past Twitter’s rules for valid hashtags. I left it at that.
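The trick can be approximated in a few lines of Python. Twitter's real hashtag rules are more involved than this, so treat the pattern below as a simplifying assumption rather than Twitter's actual implementation: it says only that a hashtag is a "#" followed by letters, digits and underscores, and that it stops at anything else. The name `extract_hashtag` is hypothetical.

```python
import re

# Simplified stand-in for hashtag rules (an assumption, not Twitter's real pattern):
# '#' followed by a run of letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")

def extract_hashtag(text: str) -> str:
    """Return the first hashtag-like run in text, or an empty string if there is none."""
    match = HASHTAG.search(text)
    return match.group(0) if match else ""

print(extract_hashtag("#covid-19uk"))       # hyphen-minus is punctuation, so the tag stops there
print(extract_hashtag("#covid\u30FC19uk"))  # the chōonpu counts as a letter, so the tag continues
```

Under this simplified pattern the hyphenated version is cut short at "#covid", while the chōonpu version survives whole, which matches what I saw in the app.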


As I was writing this post, though, I couldn’t help but wonder why the chōonpu in particular had been used. I’m not a Unicode expert, but it seemed unlikely that there was only one dash-like character among Unicode’s 143,000 or so characters that could have been used to pull off this piece of hashtag hacking. Why did this Japanese mark end up in a hashtag otherwise composed of Latin characters?
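A rough way to test that hunch is to take a handful of dash lookalikes and ask which of them Unicode classes as letters. The list below is hand-picked and far from exhaustive, and the sketch assumes that "classed as a letter" is what matters for a hashtag; it does not check what Twitter itself accepts. The `letter_like` helper is a hypothetical name.

```python
import unicodedata

# Hand-picked dash lookalikes (an illustrative, non-exhaustive list).
CANDIDATES = ["\u002D", "\u2013", "\u2014", "\u2212", "\u30FC", "\uFF70", "\u4E00", "\u3161"]

def letter_like(ch: str) -> bool:
    """True if the character's general category is one of the letter categories (L*)."""
    return unicodedata.category(ch).startswith("L")

for ch in CANDIDATES:
    verdict = "letter" if letter_like(ch) else "not a letter"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {verdict}")
```

On this evidence the chōonpu is not unique: its halfwidth form, a CJK ideograph and a Hangul letter are all horizontal strokes that Unicode treats as letters, while the true dashes and the minus sign are not.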

Google’s Japanese keyboard on an Android phone. The chōonpu is on the right, immediately above the backspace key.

Now, some Japanese computer keyboards have a QWERTY layout, where roman letters are mapped to Japanese symbols in a system called romaji, and so, out of curiosity, I installed Google’s Japanese keyboard[4] on my Android phone and took a look at its romaji mode. Right there, beside the ‘L’, was a chōonpu. Could #covidー19uk have been created by a native Japanese speaker with access to a romaji keyboard?

To find out, I used the Who Tweeted it First search engine to search for both #covidー19uk and #covidー19. The latter turned up the earliest tweet by over a fortnight, posted on the 11th of February:

なるほど、翌年のバージョンアップ(変異)にも対応できるのですね。
病名に続きそうな名前つけてしまうなんて、こういうのは普通なの?
#コロナウイルス
#COVIDー19 [5]

Well, what do you know? The earliest tweet to use the chōonpu in a #covidー19 hashtag was posted by a Japanese speaker with the Twitter username @spreadnewsxxx. Google’s mostly intelligible translation is as follows:

Indeed, it can handle version upgrades (mutations) the following year. Is it normal to give a name that seems to follow the disease name?
#コロナウイルス
#COVIDー19

Twitter reports that @spreadnewsxxx posted their tweet with an iPhone, whose Japanese keyboard I’m not familiar with, and so I can’t know whether they used the chōonpu for convenience or for its aforementioned hashtag friendliness. Either way, we now have a Patient Zero, if you’ll forgive the expression, in the form of the first use of the hashtag that has been plaguing @TalkPorty. The mystery is solved, or at least diminished.

I’d love to know if any readers have encountered the chōonpu in non-Japanese texts. Is this a common usage? Are its Twitter-defying powers commonly known? Drop me a line in the comments below!

1.
2. Wiktionary. “ー”.
3. FileFormat.info. “Unicode Character ‘KATAKANA-HIRAGANA PROLONGED SOUND MARK’ (U+30FC).”
4.
5.

* For the curious, Talk Porty is a community discussion forum based in Portobello, a suburb of my old home town of Edinburgh.

11 comments on “Miscellany № 87: a coronavirus conundrum”

  1. Comment posted by timotarou on

    To my knowledge, most JP-input users on iPhone use the Japanese Kana keyboard (somewhat replicating the touch pad of a telephone), not a qwerty layout. When typing in Latin characters or Arabic numerals this way, there’s a little extra effort involved in selecting the chōonpu, so I’d have to guess this is a combination of trying to sneak past Twitter’s hashtag rules with a familiar character not far out of reach.

    1. Comment posted by Keith Houston on

      Interesting! Thanks for the comment. I did come across an article at Japanese Level Up that pointed to a 2014 study that suggested smartphone users were moving away from romaji and towards the 12-key/kana input method.

  2. Comment posted by unekdoud on

    There’s some commentary on Language Log about this, landing on the same hypothesis that the character was deliberately chosen to sneak into a hashtag.

    And once you know where to look you can find it without even touching your keyboard: any time a webpage includes “page”, its Japanese version is likely to have ページ.

    1. Comment posted by Keith Houston on

      Thanks for the link! I also found Language Log’s post, although only after I’d written the bulk of this one. I’m glad we’ve come to the same conclusion.

  3. Comment posted by Kozmo IGM Kliegl on

    On a Windows PC, you just need to hold down the Alt key while entering ‘0150’ on the numeric keypad (provided the keyboard has one) to get ‘–’.

    1. Comment posted by Kozmo IGM Kliegl on

      Error: it’s ‘0151’ for an em-dash ( — ); what I posted above was the code for an en-dash.

  4. Comment posted by Kevan Pegley on

    “I’m not an Unicode expert…”

    “an Unicode”??
    “a Unicode” surely?

    Interesting article nonetheless. Thanks

  5. Comment posted by Ben Karlin on

    “Hasht- ag” end of line 3 of the paragraph beginning “At first, I assumed it must have been…”

    I know better than to try holding you responsible for every line ending in every format on every screen but, Good God, man, this is an iPad, not some obscure or antiquated Blackberry or Android device!

    You must start peppering your prose with abundant discretionary hyphens to prevent this in future. If I may assist by reviewing your archive, please let me know.

    1. Comment posted by Keith Houston on

      Hi Ben — thanks for the comment!

      shadycharacters.co.uk already uses discretionary hyphens. They’re applied automatically by the software that generates each page. If you copy and paste that same paragraph into a text editor, you should see them appear.

      There are at least two problems with hyphenation on the web. First is that it has to account for every possible text block width, from very narrow up to the maximum allowed value. There’s no single text flow to be carefully tweaked, as there is in a book or newspaper, so manual hyphenation is not practical. Next is that browsers’ automatic hyphenation is generally terrible. Either they don’t hyphenate at all, or they don’t do it very well.

      Given all that, this site applies soft hyphens using a variation of an algorithm devised by Frank Liang for the TeX text formatting program. It isn’t perfect, as you’ve seen, but it’s a reasonable compromise between the occasional ugly hyphen and no hyphenation at all.
