Character Sets & Encodings

Before Unicode, different computing systems used different character encodings to represent text. Each single-byte encoding maps the 256 possible byte values (0-255) to specific characters. The first 128 values (standard ASCII) are consistent across most encodings, but the upper 128 values (128-255) vary significantly from one character set to another.

Understanding Character Encodings

ASCII (0-127)

The first 128 byte values (0-127) map to the same characters in nearly all common encodings, because most of them were designed as extensions of ASCII. This range is the standard ASCII set: control characters, digits, uppercase and lowercase letters, and basic punctuation.
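A quick way to check this claim is to decode the same 128 bytes under several codecs and compare the results. The following is a minimal sketch using Python's built-in codecs; the list of encodings and the names `ENCODINGS`, `ascii_bytes`, `decoded`, and `all_match` are illustrative choices, not part of any standard.

```python
# Codecs to compare. All of these are ASCII-based, which is an
# assumption of this sketch (it would not hold for EBCDIC, for example).
ENCODINGS = ["ascii", "cp437", "cp1252", "iso8859-1", "iso8859-5", "utf-8"]

# Every byte value in the ASCII range, 0 through 127.
ascii_bytes = bytes(range(128))

# Decode the same 128 bytes under each codec.
decoded = {name: ascii_bytes.decode(name) for name in ENCODINGS}

# True when every codec produced exactly the same 128 characters.
all_match = all(text == decoded["ascii"] for text in decoded.values())
print("ASCII range identical across codecs:", all_match)
```

Note that this compares what the codecs map each byte to. Some systems, such as the original IBM PC display, could also show graphical glyphs for control codes, which a codec comparison like this does not capture.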

Extended Range (128-255)

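The upper 128 byte values are where encodings differ, so the same byte can stand for a different character depending on the encoding used to read it. CP437, the original IBM PC code page, fills this range with box-drawing characters, Greek letters, and accented Latin letters. Windows-1252 covers Western European letters and adds typographic ("smart") quotes and the Euro sign. The ISO 8859 family defines a separate variant for each language group, for example ISO 8859-1 for Western European languages and ISO 8859-5 for Cyrillic.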
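To see the divergence concretely, the sketch below decodes a few individual byte values from the upper range under four codecs. The helper name `decode_byte` and the chosen sample bytes are hypothetical, picked only for illustration; the codec names are the ones Python ships with.

```python
ENCODINGS = ["cp437", "cp1252", "iso8859-1", "iso8859-5"]

def decode_byte(value: int) -> dict[str, str]:
    """Decode a single byte value under each codec in ENCODINGS."""
    raw = bytes([value])
    # errors="replace" yields U+FFFD for bytes that a codec leaves
    # undefined (Windows-1252 has a handful of these).
    return {name: raw.decode(name, errors="replace") for name in ENCODINGS}

# 0x80: Euro sign in Windows-1252, an accented capital C in CP437.
# 0xC4: a horizontal box-drawing line in CP437, an accented A elsewhere.
# 0xE9: Greek Theta in CP437, e-acute in Windows-1252, Cyrillic in 8859-5.
for value in (0x80, 0x93, 0xC4, 0xE9):
    print(f"0x{value:02X}", decode_byte(value))
```

In ISO 8859-1, byte 0x80 decodes to an invisible C1 control character, so it may print as nothing or as an escape sequence depending on the terminal.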

Unicode (UTF-8)

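Modern systems use Unicode, usually encoded as UTF-8. Unicode defines over 140,000 characters covering most of the world's writing systems. UTF-8 is a variable-length encoding: each character takes one to four bytes, and the first 128 code points are encoded as the same single bytes as in ASCII, so any valid ASCII text is also valid UTF-8.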
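The ASCII compatibility and the variable byte length of UTF-8 can both be observed directly. This is a small sketch, with `utf8_length`, `samples`, `lengths`, and `ascii_compatible` as illustrative names; the sample characters are written as escapes so the snippet does not depend on how the file itself is encoded.

```python
def utf8_length(char: str) -> int:
    """Return how many bytes UTF-8 needs to encode the given character."""
    return len(char.encode("utf-8"))

samples = [
    "A",           # U+0041, inside the ASCII range
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, Euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Map each sample character to its encoded length in bytes.
lengths = {ch: utf8_length(ch) for ch in samples}

# Pure ASCII text produces identical bytes under both encodings.
ascii_compatible = "Hello".encode("ascii") == "Hello".encode("utf-8")

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
print("ASCII-compatible:", ascii_compatible)
```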

Why It Matters

Understanding character encodings is crucial for working with international text, debugging garbled characters (mojibake), parsing legacy data files, and ensuring correct text display across systems.
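Mojibake typically appears when bytes written in one encoding are read back in another. The sketch below reproduces one common case, UTF-8 text misread as Windows-1252, and then reverses it. The variable names are illustrative, and the repair step assumes you already know which two encodings were confused, which in practice often has to be guessed.

```python
original = "caf\u00e9"                   # "cafe" with an acute accent on the e

# The text is stored correctly as UTF-8 (the accented e becomes two bytes).
utf8_bytes = original.encode("utf-8")

# A reader wrongly assumes Windows-1252: each of the two bytes turns into
# its own character, so the accented e shows up as two unrelated symbols.
garbled = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the original bytes, which can then
# be decoded with the correct encoding.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled))
print(repr(repaired))
```

This round trip only works when the wrong decoding lost no information. If the misread bytes included values that the wrong codec leaves undefined, or if replacement characters were substituted along the way, the original text cannot be fully recovered.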