What is Unicode? Part II

By Mark Ward

September 8, 2008


In Part I of “What is Unicode?” we learned why Unicode will help you, my target audience, type Greek and Hebrew.

Now let’s get more specific.

When you hit the “a” key on your keyboard, your computer doesn’t hand Microsoft Word, let’s say, the letter “a” itself. It hands over a code number: “0061.” (“0041” is the capital “A.”) Then Word checks which font you’re using and displays the character that font ties to “0061.” In Arial, that’s “a.” But in the BibleWorks Greek font, that’s alpha (“α”).

So Unicode decided to give alpha its own code number, which happens to be “03B1,” so that alpha no longer has to share a code with “a.”
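To make the lookup concrete, here is a small Python sketch of the idea. The two "font tables" are hypothetical and heavily simplified (real fonts are not Python dictionaries, and `render` is a made-up helper), but they show how one code number can come out as two different letters, and how Unicode gives alpha a number of its own.

```python
# Hypothetical, heavily simplified font tables: code number -> character drawn.
# Real fonts hold far more entries; these are assumptions for illustration only.
ARIAL = {0x0061: "a"}
LEGACY_GREEK = {0x0061: "\u03b1"}  # a pre-Unicode Greek font reusing the code for "a"

ALPHA = "\u03b1"  # the Unicode character for Greek small alpha


def render(code, font):
    """Return the character a (toy) font draws for a given code number."""
    return font[code]


key_code = ord("a")                     # the code number behind the "a" key
print(f"{key_code:04X}")                # that code as four hex digits
print(render(key_code, ARIAL))          # what Arial shows for it
print(render(key_code, LEGACY_GREEK))   # what the legacy Greek font shows for it
print(f"{ord(ALPHA):04X}")              # alpha's own Unicode code number
```

In the legacy setup the text is only "Greek" as long as the Greek font is applied. With alpha stored under its own code number, it stays alpha whatever font is selected.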

But you don’t want just alpha. You want every form alpha can take once you account for its diacritical marks: smooth and rough breathing marks, acute and grave accents, and all their possible combinations. Unicode provides these too. Here are the various characters available in the “alpha” part of the Greek table (plus beta, gamma, and delta at the end, for good measure):

[Figure: the “alpha” part of the Greek table, plus beta, gamma, and delta (Picture 3.png)]

α is 03B1, but ἀ is 1F00, ἁ is 1F01, and ἂ is 1F02. Each character has its own code. And alpha is always 03B1 no matter what computer you’re on or which Unicode font you’re using. That’s because Unicode is an international standard.
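If you have Python handy, you can check these code numbers yourself. This is just a quick sketch using Python's built-in `ord` and the standard `unicodedata` module; the helper name `code_number` is my own invention.

```python
import unicodedata


def code_number(ch):
    """Return a character's Unicode code number as four hex digits."""
    return f"{ord(ch):04X}"


# Plain alpha, then alpha with smooth breathing, with rough breathing,
# and with smooth breathing plus grave accent.
ALPHAS = ["\u03b1", "\u1f00", "\u1f01", "\u1f02"]

for ch in ALPHAS:
    # Print the character, its code number, and its official Unicode name.
    print(ch, code_number(ch), unicodedata.name(ch))
```

The same numbers come back on any machine, which is the point of having an international standard.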

Why all these separate characters? One simple benefit comes when I try to select a whole Greek word in a word-processing program. With the old way of doing things, the diacritics were really punctuation marks dressed up by the font, and so many of them were secretly gunking up my Greek word that I couldn’t double-click on a word to select it. Word would think I wanted only the portion up to the first punctuation mark. With Unicode, a letter and its diacritics are one character, so a double-click grabs the whole word.
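Here is a rough Python imitation of that double-click problem. It is only a sketch under some assumptions: real word processors have their own word-boundary rules, the regular expression `\w+` is a stand-in for them, `double_click` is a made-up helper, and the legacy spelling `lo,goj` is an assumed example of how an old-style Greek font might store the word λόγος, with a comma doing duty as the accent.

```python
import re


def double_click(text):
    """Rough stand-in for double-clicking: take the run of letters at the start."""
    match = re.match(r"\w+", text)
    return match.group(0) if match else ""


# Assumed legacy-style encoding: Latin letters plus a comma acting as the accent.
LEGACY_WORD = "lo,goj"

# The same word in Unicode: lambda, omicron with accent, gamma, omicron, final sigma.
UNICODE_WORD = "\u03bb\u03cc\u03b3\u03bf\u03c2"

print(double_click(LEGACY_WORD))   # selection stops at the comma
print(double_click(UNICODE_WORD))  # selection covers the whole word
```

The legacy word gets cut off at the comma, while the Unicode word is selected whole, because the accented omicron counts as a single letter.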

In Part III of this series we’ll talk about how to install Unicode for Mac or Windows.

In Part IV we’ll talk about how to type in it—how to get everything from α to ἀ to ά to ᾶ to ἁ to ᾳ.