Saturday, July 4, 2009

wchar and unicode

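Standard ASCII defines only 128 (2^7) characters, and even an 8-bit char can hold just 256 (2^8) values, so there is sometimes a need to support an alphabet that has more characters than that. That is where you use Unicode. With a 16-bit character type every unit is 16 bits, giving you 65536 (2^16) values. Unicode itself defines more code points than that (up to U+10FFFF), so some characters need two 16-bit units (UTF-16) or a 32-bit character type.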

In C, for example, the wide character type wchar_t is used to hold a Unicode character. The regular C string functions (strlen, strcpy and so on) won't work on wide strings, so instead you use the C runtime library functions available for wide strings, prefixed with wcs.
Examples: wcslen, wcscpy and such.
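
As a minimal sketch of how the wcs functions stand in for their str counterparts, the helper below measures a wide string with wcslen and copies it with wcscpy. The name copy_wide and its capacity check are assumptions made for illustration; they are not part of the C library.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a library function): copies src into dst,
   where capacity is the size of dst in wchar_t units.
   Returns the number of wide characters copied, or -1 if src does not fit. */
int copy_wide(wchar_t *dst, size_t capacity, const wchar_t *src)
{
    size_t len = wcslen(src);   /* length in wchar_t units, excluding L'\0' */

    if (len + 1 > capacity)     /* need room for the terminating L'\0' too */
        return -1;

    wcscpy(dst, src);           /* wide-character counterpart of strcpy */
    return (int)len;
}
```

Called with a 16-element wchar_t buffer and L"Ubuntu rocks", this copies the 12 characters plus the terminator. Note that wcslen counts wchar_t units, not bytes.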

Of course the C compiler and the runtime library must have support for Unicode if you plan to use it.

With Microsoft's C runtime, you signal that you plan to use Unicode by defining a macro before including the headers:

#define _UNICODE   // Tell the C runtime we're using Unicode, notice the _
#include <wchar.h> // Include Unicode support functions

Then define a Unicode string:

wchar_t string[] = L"Ubuntu rocks";

The L prefix tells the compiler that this is a wide (wchar_t) string literal and not an ordinary char string literal.
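
To make the difference between the two kinds of literal concrete, here is a small sketch that compares their storage sizes. The function names are made up for this example, and the exact byte counts depend on the platform's wchar_t.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Bytes occupied by the ordinary literal, including the terminating '\0'. */
size_t narrow_bytes(void) { return sizeof("Ubuntu rocks"); }

/* Bytes occupied by the wide literal, including the terminating L'\0'. */
size_t wide_bytes(void) { return sizeof(L"Ubuntu rocks"); }

/* Hypothetical helper that prints both sizes side by side. */
void print_literal_sizes(void)
{
    printf("narrow: %zu bytes, wide: %zu bytes (wchar_t is %zu bytes)\n",
           narrow_bytes(), wide_bytes(), sizeof(wchar_t));
}
```

Both literals hold 13 elements (12 characters plus the terminator), but each element of the wide one takes sizeof(wchar_t) bytes, which is typically 2 on Windows and 4 on Linux.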

See the wchar.h header file for the declarations of these functions.

