Domain Registration Related News
Is Internationalization finally getting
somewhere, or is the DNS just becoming balkanised?
March 2004
Although Tim Berners-Lee richly deserves his
knighthood for creating one of the most important
technologies of the 20th century, in one respect
the World Wide Web has failed to deliver. It may
have been global from the start - potentially
accessible anywhere in the world - but making it
truly international - able to reflect all
cultures, irrespective of their language or
writing system - has been an enormous struggle for
the non-Anglophone world.
The first problem to be addressed was how to
create Web pages with characters other than
standard ASCII. The solution seemed simple enough:
the use of extended sets, which allowed different
non-ASCII characters to be employed on a per-page
basis. But the solution brought its own problems,
with many alternative extensions for a given
script.
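To make the clash between alternative extensions
concrete, here is a minimal Python sketch. The
byte value 0xE4 and the four code pages are
illustrative choices, not taken from any
particular page, and decode_with_each is a
hypothetical helper. It shows one and the same
byte being read as four different characters,
three of them from competing Cyrillic sets.

```python
# One byte value, as it might sit in a page's raw data (illustrative choice).
raw = bytes([0xE4])

# Four single-byte extended sets; the last three all target Cyrillic.
code_pages = ["iso-8859-1", "iso-8859-5", "koi8-r", "cp1251"]


def decode_with_each(data, encodings):
    """Decode the same bytes under each named encoding (hypothetical helper)."""
    return {name: data.decode(name) for name in encodings}


for name, text in decode_with_each(raw, code_pages).items():
    # Print the code point rather than relying on the console's own encoding.
    print(f"{name:>10}: U+{ord(text):04X}")
```

Unless the page declares which set it uses, and
the reader's software supports that set, the text
cannot be displayed reliably.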
Therefore, an overarching approach called Unicode
was developed that defined a single, universal
coding scheme embracing all scripts. Unicode may
not yet include everything, but all the major
families are there, and many of the less common
ones will be added soon (even Egyptian hieroglyphs
are being worked on).
Unicode addresses part of the problem that
international Web pages pose: how to bring in
extra characters in a consistent manner. But it
leaves open another question: how to represent
digitally the tens of thousands of different
characters that go to make up the Unicode set. In
fact, online, the challenge is even greater: how
to represent those characters compactly in binary
while preserving backward compatibility with
existing systems.
The most popular solution is UTF-8 (short for
Universal Multiple-Octet Coded Character Set
Transformation Format 8). It was invented in 1992
by no less a person than Ken Thompson, writing on
the proverbial place-mat; together with the
co-inventor Rob Pike he later published a paper on
the subject, aptly entitled "Hello World". A
useful FAQ on Unicode and UTF-8 issues fills in
the details.
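The two properties mentioned above, compactness
and backward compatibility, can be seen in a short
Python sketch. The sample characters and the
helper name utf8_hex are assumptions made for
illustration only. Plain ASCII text comes out
byte-for-byte unchanged, while other characters
take between two and four bytes each.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of a string as spaced hex (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


# Illustrative samples: Latin capital A, e-acute, the euro sign,
# and U+13000, a code point in the Egyptian Hieroglyphs block.
samples = ["A", "\u00e9", "\u20ac", "\U00013000"]

for ch in samples:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)} ({size} byte(s))")

# Backward compatibility: pure ASCII text is identical in both encodings.
assert "Hello World".encode("utf-8") == "Hello World".encode("ascii")
```

Because every ASCII byte keeps its old meaning,
software written with only ASCII in mind can
usually pass UTF-8 text through untouched.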
There is a wide range of practical resources in
this area: for example, test pages, help with
setting up Unicode support in browsers and other
programs and with resolving display problems, and
guides to creating multilingual Web pages.
Even this is by no means the end of the story.
Unicode may make the content truly international,
but does nothing to solve an equally pressing
issue: how to create domain names using non-ASCII
characters.