Character display problems?
Are you getting ???? or Yíäýñíèé or other mojibake instead of the correct text for some languages? It's probably because your computer system can't display Unicode correctly. The good news is that most Unicode display problems can be fixed.
How do I fix Unicode display problems on my computer?
To display text in many different alphabets on one web page (e.g. Languages A-Z), we use Unicode, even though Unicode can create display problems for some computer systems. This web page offers solutions for those problems.
It may be that to "see" everything correctly on our Unicode pages, you only need to upgrade your browser and install, at most, one font, Code2000. Basically, you need:
- a Unicode-compatible operating system (see Assistance: Introduction);
- a Unicode-enabled browser (Assistance: Step 1); and
- Unicode-compatible font(s) (Assistance: Step 2).
Then, depending on which languages you want to display, you may also need to:
- configure your browser (Assistance: Step 3 and Assistance: Step 4).
See also Display Problems? on the Unicode site and Help: Multilingual support on Wikipedia.
Definitions: What is Unicode?; Encoding; Code; Language Script; Font
- What is Unicode? It is one of several systems (called encodings) that have been developed to manage the display of characters on-screen, but it is the first system that can assign a unique number (code) to every character in each of the world's major languages. (Other systems don't allow for enough characters, and they also conflict with one another: two encodings might use the same number for two different characters, or different numbers for the same character.) Not all computer systems in current use are fully Unicode compatible.
- Encoding: a system of assigning numbers to characters (i.e. letters, punctuation, and mathematical notations) so that a computer knows which character to display. Hundreds of different systems (encodings) have been developed and used; Unicode is one of them. Here are examples of how encodings are specified in the head of an HTML page:
- charset=iso-8859-1 (for Western No.1),
- charset=BIG5 (for Traditional Chinese), and
- charset=utf-8 (for Unicode).
- Code: the number assigned to the character. Problems happen when different encodings use the same code for two different characters, or use different codes for the same character. Synonyms for "code" that are also in use: code position, code number, code value, code element, code set value.
- Language Script: the group of characters used to express a language in writing. Also called the "character set" or "character repertoire" or "alphabet" or "writing system" of a language.
- Font: the font determines the way a character will actually look on the screen (or on a printed page). For instance, this "A" in a sans-serif font looks different than this "A" in a serif font, but it is still the same character. (The "A" and the "A" are known as different "glyphs" of the same character. A font is basically a collection of glyphs. Also note that "A" and "a" are two different characters.)
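For anyone curious about what "encoding" and "code" mean in practice, here is a minimal sketch in Python (our choice for illustration; nothing on this site requires it). It shows that the Unicode code of a character stays the same while the bytes stored for it depend on the encoding, and that reading bytes with the wrong encoding is how mojibake like the examples at the top of this page happens. The sample word "café" is just an example.

```python
# Illustration only: one character, one Unicode code, different byte sequences.
text = "café"

# The Unicode code (code point) of "é" is the same whatever the encoding.
print(hex(ord("é")))  # 0xe9

utf8_bytes = text.encode("utf-8")         # "é" is stored as two bytes: C3 A9
latin1_bytes = text.encode("iso-8859-1")  # "é" is stored as one byte: E9

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# Mojibake: UTF-8 bytes read as if they were Western (ISO-8859-1).
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# Two encodings, the same byte value, two different characters.
print(bytes([0xE9]).decode("iso-8859-1"))    # é (Western)
print(bytes([0xE9]).decode("windows-1251"))  # й (Cyrillic)

# Undoing the wrong guess recovers the original text.
print(garbled.encode("iso-8859-1").decode("utf-8"))  # café
```

The same byte meaning "é" in one encoding and "й" in another is exactly the conflict between encodings described under "What is Unicode?".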
Most fonts don't come close to containing all possible characters in the world; instead they contain ranges (also called "blocks") of characters (e.g. in Unicode, the codes (i.e. numbers) for Arabic characters are found in the range 0600 to 06FF). Unicode currently defines over 100 ranges, and for example, the newest Unicode-compatible versions of:
- Arial (with 2792 characters and 3381 glyphs) and
- Times New Roman (with 2790 characters and 3380 glyphs),
contain only 39 ranges, while the:
- Akaash font (409 characters; 642 glyphs), specifically for Bengali,
is also Unicode-compatible, yet contains only 4 ranges: Basic Latin; Latin-1 Supplement; Latin Extended-A; and Bengali.
- NOTE: "language script" and "range" are sometimes synonymous, but some languages require characters from more than one range, and even from non-contiguous ranges (e.g. Vietnamese, and especially CJK (Chinese-Japanese-Korean)). CJK ideographs now span at least three ranges in two separate "planes" of Unicode.
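To illustrate how ranges relate to what a font can show, here is a minimal Python sketch. The BLOCKS table is a small hand-picked subset of real Unicode block boundaries (the full list is in the Unicode Blocks.txt data file), and the "Akaash-like" set just reuses the four ranges named above. It assumes that covering a range means covering every character in it, which real fonts don't always do.

```python
# A hand-picked subset of Unicode blocks: (first code, last code, name), inclusive.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0980, 0x09FF, "Bengali"),
]

def block_of(ch: str) -> str:
    """Name of the block containing ch, or 'unknown' if it isn't in BLOCKS."""
    code = ord(ch)
    for start, end, name in BLOCKS:
        if start <= code <= end:
            return name
    return "unknown"

def missing_ranges(text: str, font_ranges: set) -> set:
    """Blocks used by text that a font covering only font_ranges would lack."""
    return {block_of(ch) for ch in text} - font_ranges

# The four ranges listed above for the Bengali-specific Akaash font.
akaash_like = {"Basic Latin", "Latin-1 Supplement", "Latin Extended-A", "Bengali"}

print(block_of("ক"))                                  # Bengali (U+0995)
print(missing_ranges("বাংলা and عربي", akaash_like))  # {'Arabic'}
```

So a page mixing Bengali and Arabic would need a second font (or a pan-Unicode one) for the Arabic part.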
- For more information, see also:
- Unicode Issues by Languagegeek,
- Unicode by Wikipedia,
- A tutorial on character code issues by Jukka "Yucca" Korpela
- A review of script characteristics affecting computer-based script support and Unicode by Richard Ishida, and
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Assistance: Introduction
Because Unicode is a relatively recent development that is not yet in widespread usage, and because surfers use a wide range of operating systems:
- Windows 95/98/ME/NT/2000/XP/Vista,
- Mac OS 9/OS X,
- Linux, etc.,
and a wide range of browsers:
- FireFox,
- Internet Explorer,
- Opera,
- Mozilla,
- Netscape,
- Safari, etc.,
not all computer systems are currently fully Unicode compatible.
- Some Unicode support has been included in Mac OS since Mac OS 8.5, but prior to Mac OS X (10) applications made only limited use of it.
- Windows NT/2000/XP/Vista are based on Unicode, and some Unicode support has been included in Microsoft Windows since Windows 95.
- I've never used Linux.
If you have display problems with some of the links and/or text on our pages, you can try the steps set out below. My intention is to bring together, in one place, useful information I found when I was trying to figure out how to fix my own display problems, and to make that information as easy to understand as possible. Keep in mind, though, that you don't have to understand everything here to get the hoped-for results from carrying out the steps. Again, it may turn out that to "see" everything on our pages correctly, you only need to upgrade your browser and install, at most, one font.
The suggestions I offer come from my experience using the following browsers and operating systems:
- with a Windows 98 operating system, I've used Netscape 4.79 and 7, Mozilla 1.2.1 and 1.3b, and Internet Explorer (IE) 5.5, and
- with a Windows XP operating system, I've used FireFox 2 & 3, Netscape 7 & 8, Mozilla 1.5 & 1.7, Opera 7 to 10, Safari 3 & 4 for Windows, and Internet Explorer (IE) 6, 7 & 8,
although I think my suggestions could be useful for those with Windows 2000, NT 4 and Vista, and maybe even Windows 95.
Because I only do Windows, the best I can offer those with other operating systems is to send you off-site to:
- Mac OS 9 (Browsers and Fonts),
- Mac OS X (Browsers and Fonts) and
- Unix/Linux (Browsers and Fonts) and to
- Help: Multilingual support on Wikipedia,
although some of what I say below may be applicable.
Assistance: Step-by-step
Step 0: You need a Unicode-compatible operating system (see Introduction above for information)
Step 1: Selecting a browser
Step 2: Obtaining Unicode compatible fonts
Step 3: Configuring your browser by selecting fonts
Step 4: Configuring your browser by selecting encodings
NOTE: Most encodings are still used somewhere on the web, and these steps can be applied to all encodings, not just Unicode. However, if you are interested in viewing pages in a different encoding, such as Big5 (for Traditional Chinese) for example, in Step 2 you would need to make sure you had Big5-compatible fonts, rather than Unicode-compatible fonts.
Step 1: Selecting a browser
NOTE: Upgrade your browser to the latest version: e.g. IE, FireFox, Opera, Safari, Chrome.
After I went through everything in Steps 1 to 4, and then browsed the Unicoded Hot Peach Pages and EarthWords pages:
- FireFox 2 & 3, Opera 8, 9 & 10 and Netscape 7 & 8, on Windows XP, display everything correctly. (To get conjuncts & re-ordering for Khmer to work properly, I found and installed KhmerUnicode2 for Windows XP on April 22/10.)
- IE 5.5 (Win 98), and 6, 7 & 8 (Win XP) display everything correctly. (Again, conjuncts & re-ordering for Khmer didn't work properly until I installed KhmerUnicode2 for Windows XP on April 22/10.)
- Chrome 5 doesn't display Sinhala and has the same problem described below for IE in the Caveat.
- Netscape 7 (Win 98), Mozilla 1.2, 1.3 (Win 98), Mozilla 1.5, 1.7 (Win XP) and Opera 7.2 & 7.5 (Win XP) all displayed Arabic and Hebrew correctly right-to-left, but didn't produce conjuncts or re-ordering for Indic scripts.
- Netscape 4.79 (Win 98) and Opera 7.1 (Win XP) displayed Arabic and Hebrew incorrectly left-to-right and didn't produce conjuncts or re-ordering for Indic scripts.
- Safari:
- Safari 4 & 5 for Windows XP don't seem to support Amharic, Bengali, Sinhala, and Tigrigna, but seem to support conjuncts and re-ordering for Indic scripts, and also display Arabic letters connected. Khmer was not displayed properly until I installed KhmerUnicode2 for Windows XP on April 22/10. (There's also a Khmer Unicode MacOSX Lite 2 for MacOSX.)
- Safari 3 for Windows XP doesn't seem to support conjuncts or re-ordering for Indic scripts, and also doesn't display Arabic letters connected. Both of these problems may be because I have the Arial and Times New Roman fonts installed by Microsoft Office, as explained below for Safari 2 for Mac OS X.
- Safari 2 for Mac OS X supports conjuncts and re-ordering for Indic scripts (see Viewing Indic text at Wikipedia). There seems to be a problem with displaying connected Arabic letters if you have the Arial and Times New Roman fonts installed by Microsoft Office (this may affect conjuncts or re-ordering for Indic scripts as well). Regarding the problem with Arabic scripts, see Apple Discussions:
- Apr 20/07: Arabic websites not turning out right
- Apr 28/07: connected Arabic language in Safari
- Jun 11/07: Arabic or Unicode Display
Caveat: On my current computer configuration (Win XP and IE 8), for the HTML title attribute, IE displays empty rectangles for Amharic, Sinhala and Tigrigna (even though the text of the link itself displays fine), whereas Mozilla-based browsers and Opera display the title text correctly. To see if you have the same issue, go to Domestic violence is more than just physical abuse using IE, and hover over the Amharic and Tigrigna language links at the top of the page to make the title boxes pop up. Let me know by email if you know how to fix it, or if you don't have the issue in IE.
For more information about these and other browsers, go to:
Step 2: Obtaining Unicode compatible fonts
Make sure you have a Unicode-compatible font for either all the Unicode ranges, or for each of the language scripts you want to be able to display.
NOTE: To see what fonts you already have on your system, look in your Control Panel under Fonts. This will also show you the location of your Fonts folder, which you'll need when you want to install a new font.
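If you'd rather get the same list from a script, here's a minimal Python sketch. It assumes the standard Windows location (the Fonts folder under %WINDIR%, normally C:\Windows\Fonts) and treats .ttf, .otf, .ttc and .fon files as fonts. Both are assumptions, so check the Control Panel if the results look wrong.

```python
import os
from pathlib import Path

# File types treated as fonts in this sketch (an assumption, not an exhaustive list).
FONT_SUFFIXES = {".ttf", ".otf", ".ttc", ".fon"}

def default_fonts_folder() -> Path:
    """The usual Windows Fonts folder, e.g. C:\\Windows\\Fonts."""
    return Path(os.environ.get("WINDIR", r"C:\Windows")) / "Fonts"

def list_font_files(folder: Path) -> list:
    """Sorted names of the font files directly inside folder ([] if it doesn't exist)."""
    if not folder.is_dir():
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    )

if __name__ == "__main__":
    for name in list_font_files(default_fonts_folder()):
        print(name)
```

Searching the output for names like "code2000" or "ARIALUNI" is a quick way to see whether you already have one of the big fonts mentioned below.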
- Easiest: If you have:
- Arial Unicode MS* (with almost 39,000 characters and over 50,000 glyphs in 65 ranges), which is supplied with Microsoft Office 2000 and later, FrontPage 2000 and later, Office XP and later, and Publisher 2002 and later (the only way to get Arial Unicode MS is to buy one of those Microsoft products); see also Description of the Arial Unicode MS
- OR Code2000* (over 50,000 characters and 60,000 glyphs in 105 ranges), which is a free download with a $5 honour-system registration,
you should be OK for most languages on our pages. In other words, to "see" everything on our pages, as I've said, you only need to upgrade your browser and install, at most, Code2000. Easy. (And the reason it's so easy, and inexpensive, is because James Kass worked on Code2000 for years as a labour of love and then basically gifted it to the world. James, you rock!)
*Note: Code2000 is OK in a pinch but not recommended for Chinese Simplified or Traditional, or for Japanese, and Arial Unicode MS is not OK for Lao (as of Office XP), but anyone who can read them probably already has appropriate fonts on their computer.
- Extra work: Because fonts designed for just one particular language script often present that script better than fonts that contain several scripts, you may want to download further specific Unicode-compatible fonts for certain languages. On our EarthWords pages, for instance, we code a preference for the following fonts:
- Aboriginal Serif Unicode for Cree and Inuktitut,
- Aboriginal Serif for Dene,
- David for Hebrew,
- PakType Tehreer for Kurdish,
- Alice0 Unicode for Lao, and
- Urdu Nastaleeq Like for Urdu,
and we leave the rest up to the user's choices in Step 3, for which you need at least:
- Arial Unicode MS (supplied with Microsoft Office 2000 and later, FrontPage 2000 and later, Office XP and later, and Publisher 2002 and later; see also Description of the Arial Unicode MS) OR
- Code2000 (free download, $5 honour-system registration).
In other words, to "see" everything on our pages almost exactly the way we intended, you only need to upgrade your browser and install, at most, five or six fonts. No big deal.
- Maximum effort: Because sites other than ours will prompt for fonts other than those mentioned above, you may want to download a whole whack of fonts. I suggest starting at Alan Wood's Unicode Resources*.
*NOTE: even though this page of Alan's is entitled "Unicode Fonts for Windows computers", it also has links for Mac and Unix.
*ALSO: Raghindi (listed on Alan's page under Devanagari Fonts) has been known to cause a conflict with other fonts on Windows 9x, including Code2000. It seems that many fonts produced for Windows 2000 and up lack the ASCII characters required for backwards compatibility with earlier versions of Windows. Installing such fonts on Win 9x is not recommended, as they have a tendency to "take over" the system. Raghindi is the only one I know about, but apparently there are others.
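To show how a page can "code a preference" for single-script fonts while still falling back on Arial Unicode MS or Code2000, here's a minimal Python sketch that builds CSS rules. The language tags, the exact font names, and the use of the CSS :lang() selector are our assumptions for illustration. This is not necessarily how the EarthWords pages are actually coded, and Dene is left out because it covers several languages.

```python
# Hypothetical mapping: language tag -> preferred single-script font
# (taken from the "Extra work" list above; the tags are our own choice).
PREFERRED = {
    "cr": "Aboriginal Serif Unicode",  # Cree
    "iu": "Aboriginal Serif Unicode",  # Inuktitut
    "he": "David",                     # Hebrew
    "ku": "PakType Tehreer",           # Kurdish
    "lo": "Alice0 Unicode",            # Lao
    "ur": "Urdu Nastaleeq Like",       # Urdu
}

# Pan-Unicode fonts to fall back on if the preferred font isn't installed.
FALLBACKS = ["Arial Unicode MS", "Code2000"]

def font_family(lang: str) -> str:
    """CSS font-family value: the preferred font (if any) first, then the fallbacks."""
    names = [PREFERRED[lang]] if lang in PREFERRED else []
    names += FALLBACKS
    # Quote every name, since several contain spaces.
    return ", ".join(f'"{n}"' for n in names)

def css_rule(lang: str) -> str:
    """A rule applying to elements marked with a matching lang attribute."""
    return f":lang({lang}) {{ font-family: {font_family(lang)}; }}"

print(css_rule("ur"))
```

The browser uses the first listed font it actually has, which is why installing just Code2000 is enough to "see" the text even without the specialised fonts.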
Step 3: Configuring your browser by selecting fonts
This is where you can choose a font for each language (aka writing system, aka language script). Most languages display fine with the default font your browser has chosen, so you really only need to go in there if you don't like the default font for a particular language, or if a particular language is not displaying correctly with the default font. Here's where to select fonts in various browsers:
- IE: Tools > Internet Options > Fonts > Language script
- Opera: Tools > Preferences > Advanced > Fonts > International Fonts > Writing system
- Firefox: Tools > Options > Content > Advanced (Fonts & Colors) > Fonts for
- Netscape 8: Tools > Options > Browser Options, General > Fonts & Colors > Fonts for
- Safari: Edit > Preferences > ?
This step reveals a significant difference between Mozilla-based browsers (FireFox, Netscape and Mozilla) on the one hand, and IE (& Opera) on the other:
- for any particular language, IE and Opera only make you choose from fonts that will work with that language (usually no more than 10 will be on the list on my system).
- for every language, Mozilla browsers give you every font on your system to choose from (hundreds on mine), and if you have no idea what you are looking for, you'll be lost.
So what I do is, I use IE to see which fonts work with a particular language, and then I know what to look for in Firefox. If there's no font listed for a particular language, and it isn't displaying correctly, you have to go back to Step 2: Obtaining Unicode compatible fonts.
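If you want to check a particular font file directly instead of going through IE's list, here's a minimal Python sketch. The missing_characters function is plain Python. font_code_points assumes the third-party fontTools library is installed and uses its TTFont class and getBestCmap() method to read which characters the font maps. Both function names are hypothetical, made up for this example.

```python
def missing_characters(text: str, covered: set) -> list:
    """Characters in text (ignoring whitespace) whose codes the font doesn't map."""
    missing = []
    for ch in text:
        if not ch.isspace() and ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing

def font_code_points(path: str) -> set:
    """Codes mapped by a TrueType/OpenType font file (requires: pip install fonttools)."""
    from fontTools.ttLib import TTFont  # imported here so the rest works without it
    cmap = TTFont(path).getBestCmap() or {}  # None if the font has no Unicode cmap
    return set(cmap)
```

For example, you could pass a line of Lao text together with the codes read from your copy of Code2000 (wherever it sits in your Fonts folder). An empty list means the font has a glyph for every character, although that still doesn't guarantee correct conjuncts or re-ordering, which also depend on the browser and operating system.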
Alan Wood offers directions for configuring various browsers (not the latest versions, but probably still helpful) at Unicode and Multilingual Web Browsers. To help with the decisions about which fonts to choose for what, the following chart sets out font options for Netscape encodings and for IE language scripts that should work (it's very outdated, but I just can't bring myself to delete it.)
(all fonts listed below should be available at Alan Wood's Unicode Resources)
Step 4: Configuring your browser by selecting the right encoding
You really only need to do this if the text on a page is gibberish. When that happens, the first thing you want to check is what encoding your browser is using. It may need changing. It's quite easy to check and even change encoding. Just click on 'View' on the top menu bar of any browser and then click:
- 'Character Encoding' for the Mozilla browsers (Firefox, Netscape and Mozilla);
- 'Encoding' for IE and Opera; and
- 'Text Encoding' for Safari
The encoding with the dot or check mark is the one being used. You can take an educated guess as to what to change it to, depending on what language the gibberish is supposed to be; e.g. choose one of the Japanese encodings if the gibberish is supposed to be Japanese. Then just keep choosing until it works.
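The "keep choosing until it works" approach can be sketched in Python as follows. The function name and the list of candidate Japanese encodings are our own assumptions. Decoding without an error only means no invalid bytes were found, not that the text is right, so (just as with the browser menu) you still have to look at the result.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate list for Japanese pages; the order matters.
JAPANESE_CANDIDATES = ["utf-8", "shift_jis", "euc_jp", "iso2022_jp"]

def first_clean_decoding(data: bytes, candidates: List[str]) -> Optional[Tuple[str, str]]:
    """Try each encoding in turn; return (encoding, text) for the first one
    that decodes without an error, or None if none of them do."""
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue  # invalid bytes for this guess; try the next one
    return None

sample = "日本語".encode("shift_jis")
print(first_clean_decoding(sample, JAPANESE_CANDIDATES))  # ('shift_jis', '日本語')
```

UTF-8 goes first because it rejects most non-UTF-8 Japanese text outright, so a clean UTF-8 decode is a fairly strong hint.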
More detailed directions on how to select encodings for various versions of different browsers (again, not the latest versions, but probably still helpful) can be found on the same pages where the directions for Step 3 are located (i.e. go to Alan Wood's Unicode and Multilingual Web Browsers, click on a browser, then scroll down to the end of the instructions for selecting Fonts till you see the instructions for Encodings).
Language Support
Adapted on July 4, 2001 from a page at http://www.palevich.com/ja/faq_japanese.htm (the page no longer seems to exist).
These questions and answers refer specifically to Japanese, but the same process works for all international characters, such as Chinese, Korean, Vietnamese, Cyrillic, Arabic, Hebrew, Greek, Extended Latin, and African and Latin American Symbols.
Q: I don't care about Unicode and all that information, I just want to know how to see Japanese text on the web. I'm using Netscape Navigator 4.x on a non-Japanese version of Windows 95/98/NT/2000, and I see funny characters when I visit a web site like http://www.yahoo.co.jp/.
A: Either upgrade to Netscape 7 or higher or install Internet Explorer (IE) 5.5 or higher. (Don't worry, you can still use Navigator as your browser. Once you have the necessary fonts installed for IE, they should also work for Netscape, although I still had display problems sometimes with Hebrew, Chinese, Japanese, and Korean in Netscape 4. To solve this problem, I used IE when visiting a site with one of those four languages. When I upgraded to Netscape 7, the problem went away.)
To install Internet Explorer, go to this URL http://www.microsoft.com/windows/ie/default.htm and follow directions. Don't worry, even if you choose the complete, non-custom version, you will still be able to make Netscape your default browser.
Once you have installed IE, start IE, and go to the View menu and choose the menu item "Encoding:Auto Select". Then use IE to visit the web site http://www.yahoo.co.jp/ Once you're there, you'll either see the site in Japanese, or you will get a dialog box that says something like "I need to install Japanese Language Support to view this page". If you get the dialog box, click "Yes", or "OK", to install the Japanese Language Support (it's actually a font plus some stuff for the operating system). After that's downloaded and installed, you should be able to surf the web in Japanese using IE.
You may need to repeat this "installation of the Language Support" process for each language you want to access.
Now, you can either continue to use IE, or you can go back to using Netscape 4, and just use IE for Hebrew, Chinese, Japanese and Korean (or you can get with it and upgrade).
Q: How come some Japanese pages look fine, but others are garbled?
A: Japanese web pages are encoded in one of several different text encodings. Some Japanese web pages contain special HTML tags that tell your browser which encoding the page is using. Unfortunately, most Japanese web pages don't contain these tags. As a result, your browser has to guess which of the possible encodings is being used. Most browsers have a "View:Encoding" menu (in Netscape it's "View:Character Set" before Netscape 7, and "View:Character Coding" from Netscape 7 on) with an option named "Auto-Select" that tells the browser to try to guess which encoding is being used. You should normally select that option. (Netscape 7 calls it "Auto-Detect", and earlier versions of Netscape have neither "Auto-Detect" nor "Auto-Select".)
Even with "Auto-Select" selected, you may occasionally find a web page that displays garbled Japanese text. In that case, use the View:Encoding menu to manually select each of the Japanese encoding options in turn. First try "Japanese (Auto Select)", then the others. One of them should work for that page. If you know that the author of the page used a PC or a Mac to create the page, the encoding is probably Shift-JIS. If the author used Unix, the encoding is probably EUC. You might also have to do some "font selecting" for your browser as set out in Step 3 above.
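Here's a minimal Python sketch of roughly what a browser does first: look for an encoding declared in a meta tag, and fall back on a guess when there isn't one. The regular expression, the 2048-byte limit and the Shift-JIS default (following the PC/Mac rule of thumb above) are simplifying assumptions. Real browsers also check the HTTP headers and use a more careful detection procedure.

```python
import codecs
import re
from typing import Optional

# Matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
CHARSET_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def declared_charset(page: bytes) -> Optional[str]:
    """The charset named in a meta tag near the top of the page, if any."""
    match = CHARSET_RE.search(page[:2048])
    return match.group(1).decode("ascii").lower() if match else None

def decode_page(page: bytes, fallback: str = "shift_jis") -> str:
    """Decode with the declared charset, or with the fallback guess."""
    encoding = declared_charset(page) or fallback
    try:
        codecs.lookup(encoding)  # is the declared name one Python knows?
    except LookupError:
        encoding = fallback
    # errors="replace" shows undecodable bytes as U+FFFD instead of failing.
    return page.decode(encoding, errors="replace")
```

A page that declares its encoding is decoded correctly no matter what the fallback is, which is why those tags make the "Auto-Select" guesswork unnecessary.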
Netscape users: Because Netscape 4.x doesn't have an "Auto-Select", you often have to manually select. Go to the "View" menu, choose "Character Set" and keep selecting until it works. Or why not just take the plunge and upgrade?
Acknowledgements
Thank you especially to James Kass, Jukka "Yucca" Korpela and Alan Wood. Were it not for their work and excellent material freely available on the web (and James Kass' generous help and suggestions), I would understand very little about encoding systems, or about Unicode and how to use it, and the above would not exist.




