Ghost Kanji: The Lore of Unicode and the 12 Uncanny Characters Without a Meaning
by Danyson and Elgin
Have you ever wondered how we are able to input just about any character, in any language, into a computer? The answer lies in Unicode, a text encoding standard maintained by the Unicode Consortium and designed to support the use of text in all the world’s writing systems that can be digitized. Every character and symbol you see on a computer has a unique code point, a number from U+0000 to U+10FFFF (written in hexadecimal). The largest code point needs 21 binary digits (bits), so 16 bits are not enough to cover them all: 16 bits give only 65,536 combinations, while Unicode’s code space has room for 1,114,112 characters, more than a million of them!
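To make those numbers concrete, here is a minimal Python sketch (the helper name `describe` is purely illustrative) that prints the size of Unicode’s code space and shows how many bits and bytes a few sample characters need: the letter A, the ghost kanji 暃, and U+20000, the first character of CJK Unified Ideographs Extension B, which lies beyond the 16-bit range.

```python
import sys

# Unicode code points run from U+0000 to U+10FFFF.
MAX_CODE_POINT = 0x10FFFF
# Python 3 strings can hold any code point up to this maximum.
assert sys.maxunicode == MAX_CODE_POINT


def describe(ch: str) -> dict:
    """Report the code point of one character and the space it takes up."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "bits_needed": cp.bit_length(),
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-le")),
    }


print(MAX_CODE_POINT + 1)           # 1114112 possible code points
print(MAX_CODE_POINT.bit_length())  # 21 bits for the largest one
print(2 ** 16)                      # 65536: the limit of a 16-bit number

for sample in ("A", "暃", chr(0x20000)):
    print(sample, describe(sample))
```

In UTF-16, a character above U+FFFF such as U+20000 needs two 16-bit units (a surrogate pair), which is why it takes 4 bytes there rather than 2.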
The need for Unicode traces back to a time when different computers often relied on different encoding systems for the same characters, so text written on one machine was often displayed as gibberish on another, a phenomenon known as mojibake (文字化け, literally “character transformation”). The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
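Mojibake is easy to reproduce. The minimal Python sketch below assumes a sender who saves the Japanese word 文字化け in Shift_JIS, a legacy Japanese encoding, and a receiver who misreads the bytes as Latin-1, a Western European encoding. The bytes themselves never change; only their interpretation does, which is why the damage can be undone once the right encoding is known.

```python
original = "文字化け"  # the Japanese word "mojibake"

# The sender stores the text with a legacy Japanese encoding.
raw_bytes = original.encode("shift_jis")

# The receiver wrongly assumes Latin-1, which maps every byte to some
# character, so decoding "succeeds" but produces unrelated symbols.
garbled = raw_bytes.decode("latin-1")
print(ascii(garbled))  # escaped, since some of the symbols are unprintable

# Reading the same bytes with the correct encoding restores the text.
restored = garbled.encode("latin-1").decode("shift_jis")
print(restored)
```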
With roughly 150,000 characters, Unicode encodes almost anything you can imagine, from basic Latin letters to obscure 15th-century Russian Orthodox musical notation and historic indigenous scripts. Even a character attested only once in an archaic text, and otherwise unknown to the rest of the world, can earn a place in this digital catalogue.
Han characters in Unicode are called CJK Unified Ideographs, because a single code point is shared by the same character in Chinese (Hànzì), Japanese (Kanji) and Korean (Hanja). The earliest writing in Japan and Korea used Chinese characters brought over from mainland China. Over thousands of years, these writing systems accumulated a staggering number of distinct characters, and Unicode encodes 97,680 of them. That is far more than the 2,000 to 3,000 characters a typical native Japanese speaker knows, or the 5,000 to 8,000 a native Chinese speaker knows.
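The “unified” in CJK Unified Ideographs means that a character shared by the three languages gets one code point, whichever legacy national encoding it came from. As a small illustrative check, assuming Python’s built-in `gb2312`, `shift_jis` and `euc_kr` codecs for Chinese, Japanese and Korean text, the character 山 (“mountain”) is stored as different bytes in each legacy encoding but decodes to the same Unicode character every time.

```python
import unicodedata

ch = "山"  # "mountain" in Chinese, Japanese and Korean alike

# Legacy national encodings: simplified Chinese, Japanese, Korean.
legacy_codecs = ("gb2312", "shift_jis", "euc_kr")

# Each encoding stores the character as a different pair of bytes.
encoded = {codec: ch.encode(codec) for codec in legacy_codecs}
for codec, legacy_bytes in encoded.items():
    print(f"{codec:>9}: {legacy_bytes.hex()}")

# Decoding any of them yields the same single Unicode character.
decoded = {legacy_bytes.decode(codec) for codec, legacy_bytes in encoded.items()}
print(decoded == {ch}, f"U+{ord(ch):04X}", unicodedata.name(ch))
```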
It makes sense, then, that thousands of CJK characters are completely unknown to the general population. But their sources can be traced: to imperial texts, literary works, or personal names, for example. Twelve characters, however, stand out. Their meanings and sources are entirely unknown, making them the subject of much speculation and lore. All twelve are among the 6,355 kanji of JIS X 0208, a character set defined as a Japanese Industrial Standard (JIS):
墸 壥 妛 挧 暃 椦 槞 袮 閠 蟐 駲 彁
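Because the twelve ghosts entered Unicode through JIS X 0208, their positions in that standard can be recovered with a few lines of Python. This sketch assumes Python’s built-in `euc_jp` codec, which writes a JIS X 0208 character as its two JIS bytes with the high bit set (0xA1 to 0xFE). Subtracting 0x8080 gives the four-digit hexadecimal JIS code, and subtracting a further 0x20 from each byte gives the row-cell (kuten) position. The helper name `jis_x0208_code` is hypothetical.

```python
import unicodedata

# The twelve ghost kanji, in the order listed above.
GHOSTS = "墸壥妛挧暃椦槞袮閠蟐駲彁"


def jis_x0208_code(ch: str) -> int:
    """Return the JIS X 0208 code of ch (e.g. 0x3021 for row 16, cell 1).

    EUC-JP writes a JIS X 0208 character as its two JIS bytes with the
    high bit set, so subtracting 0x8080 recovers the JIS code.
    Characters outside the set raise ValueError (UnicodeEncodeError
    is a subclass of ValueError).
    """
    euc = ch.encode("euc_jp")
    if len(euc) != 2 or euc[0] < 0xA1 or euc[1] < 0xA1:
        raise ValueError(f"{ch!r} is not in JIS X 0208")
    return int.from_bytes(euc, "big") - 0x8080


for ghost in GHOSTS:
    code = jis_x0208_code(ghost)
    row, cell = (code >> 8) - 0x20, (code & 0xFF) - 0x20
    print(ghost, f"JIS {code:04X}", f"row-cell {row}-{cell}",
          f"U+{ord(ghost):04X}", unicodedata.name(ghost))
```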
So, what is the backstory behind them?
The JIS X 0208 characters themselves were compiled in the 1970s from several existing tables of Japanese kanji, under a policy that every source had to be trustworthy. Unicode later took in the whole JIS X 0208 set so that Japanese text could be converted to Unicode and back without loss, and under Unicode’s stability policy a character, once encoded, is never removed. With such a vast collection of CJK characters out there, there are bound to be “ghosts” like the 12 listed characters: no source, no meaning, and no reason to exist.
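The compatibility argument can be checked directly. The sketch below, again assuming Python’s `euc_jp` codec as a stand-in for a JIS X 0208 converter, walks every position in the standard’s kanji area (rows 16 to 84), converts each assigned one to Unicode and back, and confirms that the ghosts are among the characters that survive the round trip. Dropping any of them from Unicode would make that conversion lossy. The function name `jis_kanji_to_unicode` is hypothetical.

```python
GHOSTS = "墸壥妛挧暃椦槞袮閠蟐駲彁"


def jis_kanji_to_unicode() -> dict:
    """Map each assigned kanji position (row, cell) in JIS X 0208 to Unicode.

    Positions are converted through EUC-JP, whose bytes are the row and
    cell numbers plus 0xA0. Unassigned positions fail to decode and are
    skipped; every assigned one must encode back to the same bytes.
    """
    table = {}
    for row in range(16, 85):  # rows 16-47: level 1, rows 48-84: level 2
        for cell in range(1, 95):
            euc = bytes([row + 0xA0, cell + 0xA0])
            try:
                ch = euc.decode("euc_jp")
            except UnicodeDecodeError:
                continue  # no character at this position
            if ch.encode("euc_jp") != euc:
                raise ValueError(f"round trip failed at {row}-{cell}")
            table[(row, cell)] = ch
    return table


kanji = jis_kanji_to_unicode()
print(len(kanji), "kanji positions round-trip through Unicode")
survivors = set(kanji.values())
print(all(ghost in survivors for ghost in GHOSTS))
```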
Still, theories have been put forward. Comparing the ghosts with characters of similar shape suggests that some of them arose from miscopyings or misreadings: 袮 is likely a misinterpretation of 祢 (ancestral shrine, mausoleum), 閠 is likely a corruption of 閏 (intercalary, as in a leap year), and 妛 is possibly a typo for 𡚴 (akebi, used in place names).
Not all of them can be traced to such errors, however. For some, whatever meaning they once had may have been lost to time, never to be recovered. The characters that software continues to pass down are mere remnants, empty shells of morphemes that may once have carried real meaning and usage.
Fiction and other media have expanded on the uncanny lore of these characters, giving them new meanings. 5A73, a Japanese murder mystery, uses one of these “ghosts”, 暃, as a clue left by a serial killer who scrawls it on the victims’ bodies. More recently, the analog horror film “漢字.mp4” portrays these characters as paranormal spirits, cursed and to be avoided at all costs.
As long as Unicode’s CJK characters persist in the digital realm, these twelve ghosts will endure, timeless and enigmatic, untethered from their origins and meaning, shrouded in a web of captivating mystery and perplexing lore.
https://www.youtube.com/watch?v=tfk3dgpAals