Wednesday, January 24, 2007

What is Unicode?

----------------------------------------
an extract from an article by Sarah Ellis
-----------------------------------------
In this era of globalization, the ability of systems to handle data from around the
world is becoming paramount. However, workstations and servers can use different code
pages, depending on the native language of the workstation user. In effect, the
workstation and servers are speaking “different languages”, and this makes
communication difficult.

For example, if a workstation inserts some data into a DB2 for z/OS system, the data
is converted from ASCII to EBCDIC using a conversion table, which maps the code
points from the source (ASCII) CCSID to the target (EBCDIC) CCSID.
In addition to a conversion cost, a more serious issue is the potential loss of
characters. For example, if a Japanese workstation were inserting data into a
European DB2 system, many characters would not have a code point in the CCSID used
by DB2. Either the characters must be lost (enforced subset conversions) or DB2 must
map them to code points that are not already used (a round trip conversion). The
problem with the second option is that another system reading the data will not know
about this mapping and may not read the data correctly, perhaps mapping the
characters to some of its own characters.
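
The conversion and the character loss described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: Python's built-in
"cp500" codec (an international EBCDIC code page) stands in for the target CCSID, the
helper name to_ebcdic is made up for this example, and the "replace" error handler is
only roughly analogous to an enforced subset conversion. It is not how DB2 itself
performs the conversion.

```python
# Python's "cp500" codec (an international EBCDIC code page) stands in for
# the target CCSID here. This is an illustration, not DB2's conversion logic.
EBCDIC = "cp500"


def to_ebcdic(text, errors="strict"):
    """Map each character to its EBCDIC code point via the codec's table."""
    return text.encode(EBCDIC, errors=errors)


# Characters present in both code pages convert cleanly.
print("DB2".encode("ascii").hex())   # 444232
print(to_ebcdic("DB2").hex())        # c4c2f2

# Japanese characters have no code point in cp500.
japanese = "\u65e5\u672c\u8a9e"      # the word for "Japanese language" in kanji
try:
    to_ebcdic(japanese)
except UnicodeEncodeError as exc:
    print("no mapping:", exc.reason)

# Substituting a placeholder keeps the insert going but loses the characters.
lossy = to_ebcdic(japanese, errors="replace")
print(lossy.decode(EBCDIC))          # ???
```

Once the placeholder has been stored, nothing in the data says which characters were
there originally, which is the loss the paragraph above warns about.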

The design objective of Unicode is to avoid these issues by having a single character
set that assigns a code point to every character in the world. The Unicode Consortium
maintains the standard, which includes unique code points for most current and
historical languages and for mathematical and scientific symbols, and which can be
extended as new characters emerge. The standard also defines a number of Unicode
Transformation Formats (UTFs), such as UTF-8 and UTF-16, which encode those code
points as bytes. These UTFs have become widely accepted, being used by technologies
such as Java, XML and LDAP.
Many consider Unicode the foundation for globalization of data, and it is becoming a
strategic direction for many companies. For example, Microsoft has adopted Unicode
in products such as Word by storing data in Unicode and by providing Unicode APIs
for ODBC.
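
To make the idea of one code point per character more concrete, here is a small
Python sketch. It assumes nothing beyond the standard library; the helper name
describe is invented for this example. It shows that each character has a single
Unicode code point, while UTF-8 and UTF-16 are simply different byte encodings of
that same code point.

```python
def describe(ch):
    """Return (code point label, UTF-8 hex, UTF-16 big-endian hex) for one character."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("utf-8").hex(),
        ch.encode("utf-16-be").hex(),
    )


# A Latin letter, an accented letter, and a kanji character.
for ch in ("A", "\u00e9", "\u65e5"):
    print(ch, *describe(ch))

# Expected rows:
#   A  U+0041  41      0041
#   é  U+00E9  c3a9    00e9
#   日 U+65E5  e697a5  65e5
```

The code point stays the same whichever format is used; only the number and value of
the stored bytes change.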


Note:

Unicode only affects character data or numeric data stored as characters,
i.e. CHAR, VARCHAR, GRAPHIC, VARGRAPHIC, CLOB, DBCLOB. Numeric data stored as
binary, packed or floating point is not affected.
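
A rough way to see why is to compare a number stored in binary with the same number
stored as characters. The Python sketch below is only an analogy and does not reflect
DB2's actual storage formats; the 4-byte big-endian integer and the "cp500" EBCDIC
codec are assumptions chosen for illustration.

```python
import struct

value = 2007

# Binary form: a 4-byte big-endian integer. No code page is involved.
binary = struct.pack(">i", value)
print(binary.hex())                       # 000007d7

# Character form: the bytes depend entirely on the encoding in use.
as_text = str(value)
print(as_text.encode("ascii").hex())      # 32303037
print(as_text.encode("cp500").hex())      # f2f0f0f7
print(as_text.encode("utf-16-be").hex())  # 0032003000300037
```

The binary bytes are the same whatever code page the system uses, whereas the
character form changes with every encoding.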

1 comment:

தினேஷ் said...

Good information, Thanks

Dinesh