Character Encodings

Historically, few MultiValue developers have needed to immerse themselves in the murky waters of character encodings. But in today's world, with ever greater internationalization, a basic grasp of encoding is becoming essential, whether you are working with email, web pages, client languages such as those on the .NET platform, or data storage and transfer.

Unicode and internationalization are far too large a subject to cover in a single article, so this one aims to give a high-level understanding of how modern character encoding works, how it came about, and why you need to know about it.

The ASCII Era

MultiValue databases typically use an eight-bit encoding scheme, which dates back to the days when all computers spoke English and IBM was still clinging on to EBCDIC whilst the rest of the industry was standardising around ASCII. Within the ASCII model, all the characters needed in the English-speaking world could be fitted neatly into seven bits, along with certain control characters and punctuation. This left a whole bit to spare on the then-prevalent eight-bit architectures, and different technology groups decided to use it for their own devious purposes, giving rise to our own marker characters (the field, value, subvalue and text marks) sitting at the top of the table.
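The layout described above can be checked with a few lines of Python. This is only a minimal sketch. The mark values 251 to 254 are the ones conventionally used on most MultiValue platforms, but they are an assumption here rather than something stated in the article, and the helper names are hypothetical.

```python
# Conventional MultiValue mark characters (assumed values; check your platform).
MARKS = {
    "text mark": 251,
    "subvalue mark": 252,
    "value mark": 253,
    "field mark": 254,
}


def fits_in_seven_bits(code: int) -> bool:
    """True when the code lies in the seven-bit ASCII range (0-127)."""
    return 0 <= code < 2 ** 7


def high_bit_set(code: int) -> bool:
    """True when the eighth (top) bit of a single byte is set."""
    return bool(code & 0x80)


# Every character of plain English text sits in the seven-bit ASCII range.
sample = "Hello, World!"
assert all(fits_in_seven_bits(ord(ch)) for ch in sample)

# The marks rely on the spare eighth bit, so they fall outside ASCII.
for name, code in MARKS.items():
    print(f"{name}: {code} = {code:08b}, high bit set: {high_bit_set(code)}")

# Splitting a small record on the field mark: two fields, the second of
# which still contains a value mark separating "B" and "C".
record = bytes([ord("A"), 254, ord("B"), 253, ord("C")])
fields = record.split(bytes([MARKS["field mark"]]))
```

Because each mark is a single byte with the top bit set, it can never collide with a seven-bit ASCII character, which is what makes it safe to use as a delimiter inside otherwise plain English data.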

BRIAN LEACH

Featured:

Mar/Apr 2011