Character Sets & Encodings

Before Unicode, different computing systems used different character encodings to represent text. Each single-byte encoding maps the 256 possible byte values (0-255) to specific characters. The first 128 values (standard ASCII) are the same across most encodings, but the upper 128 (128-255) vary significantly from one character set to another.
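To make the divergence concrete, here is a minimal Python sketch that decodes one byte value under the four encodings covered on this page. The codec names (`cp437`, `cp1252`, `latin-1`, `iso8859-2`) are the ones Python's standard library uses; the helper `decode_everywhere` and the choice of byte 0xA3 are illustrative only.

```python
# Decode the same byte under four single-byte encodings.
ENCODINGS = ["cp437", "cp1252", "latin-1", "iso8859-2"]

def decode_everywhere(byte_value):
    """Return {encoding: character} for one byte value (0-255)."""
    raw = bytes([byte_value])
    # errors="replace" yields U+FFFD for bytes a codec leaves undefined
    return {enc: raw.decode(enc, errors="replace") for enc in ENCODINGS}

results = decode_everywhere(0xA3)
for enc, ch in results.items():
    print(f"{enc:10} 0xA3 -> {ch!r} (U+{ord(ch):04X})")
```

The same byte comes out as a different character depending on which encoding the reader assumes, while a byte in the ASCII range such as 0x41 decodes to "A" in all four.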

Code Page 437

The original IBM PC character set, used in DOS and the BIOS.

Windows-1252

The default legacy (ANSI) code page on Windows for English and most Western European languages.

ISO 8859-1 (Latin-1)

The standard Western European character encoding.

ISO 8859-2 (Latin-2)

The ISO 8-bit character encoding for Central European languages written in the Latin script, such as Polish, Czech, and Hungarian.

Understanding Character Encodings

ASCII (0-127)

The first 128 characters are identical across the encodings described here, and across most other ASCII-based encodings. This is the standard ASCII set: control characters, digits, uppercase and lowercase letters, and basic punctuation.
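The shared ASCII range can be checked directly. The sketch below, which assumes Python's built-in codecs, decodes bytes 0-127 under each encoding and compares the result with plain ASCII. The helper name `agrees_with_ascii` is hypothetical, and `cp500` (an EBCDIC code page) is included only as a counterexample showing that the rule does not hold for every encoding.

```python
# Bytes 0-127, decoded under several codecs and compared with ASCII.
ASCII_RANGE = bytes(range(128))
ENCODINGS = ["cp437", "cp1252", "latin-1", "iso8859-2", "utf-8"]

def agrees_with_ascii(encoding):
    """True if the codec decodes bytes 0-127 exactly as ASCII does."""
    return ASCII_RANGE.decode(encoding) == ASCII_RANGE.decode("ascii")

ascii_agreement = {enc: agrees_with_ascii(enc) for enc in ENCODINGS}
print(ascii_agreement)

# EBCDIC is not ASCII-based, so the same check fails for it.
print("cp500:", agrees_with_ascii("cp500"))
```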

Extended Range (128-255)

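The upper 128 byte values are where encodings differ. CP437 fills them with box-drawing characters, Greek letters, and a selection of accented letters. Windows-1252 places smart quotes, the Euro sign, and other printable characters in the 128-159 range, which ISO 8859-1 reserves for control codes. The ISO 8859 variants each serve a different language group.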
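As a rough illustration of these differences, the following Python sketch decodes a few upper-range bytes and then lists the byte values at which ISO 8859-1 and Windows-1252 disagree. The function `differing_bytes` is a hypothetical helper; undefined bytes are decoded with `errors="replace"`, so they count as differences.

```python
def differing_bytes(enc_a, enc_b):
    """Byte values in 128-255 that decode differently under the two codecs."""
    diffs = []
    for value in range(128, 256):
        raw = bytes([value])
        # Undefined bytes become U+FFFD instead of raising an error.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs

box = b"\xc4".decode("cp437")          # horizontal box-drawing line
omega = b"\xea".decode("cp437")        # Greek capital omega
euro = b"\x80".decode("cp1252")        # Euro sign
quotes = b"\x93\x94".decode("cp1252")  # left and right curly double quotes

latin1_vs_1252 = differing_bytes("latin-1", "cp1252")
print(box, omega, euro, quotes)
print([hex(v) for v in latin1_vs_1252])
```

With Python's codec tables, the two encodings disagree only in the 0x80-0x9F block, which is why text mislabeled as one or the other usually looks correct apart from quotes, dashes, and the Euro sign.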

Unicode (UTF-8)

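Modern systems use Unicode, usually encoded as UTF-8, which defines over 140,000 characters covering most of the world's writing systems. UTF-8 is backward-compatible with ASCII: the first 128 code points are encoded as the same single bytes, so any ASCII file is also valid UTF-8. All other characters take two to four bytes.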
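A short Python sketch of UTF-8's variable-length encoding follows. The sample characters are arbitrary choices for illustration, written as escapes so the snippet does not depend on the encoding of the source file.

```python
# One sample character for each UTF-8 length (1 to 4 bytes).
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, Euro sign
    "\U0001F600",  # U+1F600, emoji outside the Basic Multilingual Plane
]

utf8_bytes = {ch: ch.encode("utf-8") for ch in samples}
for ch, raw in utf8_bytes.items():
    print(f"U+{ord(ch):04X} -> {raw.hex()} ({len(raw)} byte(s))")

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```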

Why It Matters

Understanding character encodings is crucial for working with international text, debugging garbled characters (mojibake), parsing legacy data files, and ensuring correct text display across systems.
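Mojibake typically appears when bytes written in one encoding are read as another. The sketch below reproduces the common case of UTF-8 text read as Windows-1252, then reverses it. Both function names are hypothetical, and the repair only works when the wrong decoding lost no information; it is a demonstration, not a general-purpose fixer.

```python
def make_mojibake(text, actual="utf-8", assumed="cp1252"):
    """Encode text with its real encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed, errors="replace")

def repair_mojibake(garbled, actual="utf-8", assumed="cp1252"):
    """Undo the wrong decoding; return the input unchanged if that fails."""
    try:
        return garbled.encode(assumed).decode(actual)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was probably not garbled in the assumed way.
        return garbled

original = "caf\u00e9"              # "cafe" with an accented final e
garbled = make_mojibake(original)   # the accented e becomes two characters
repaired = repair_mojibake(garbled)

print(repr(original), repr(garbled), repr(repaired))
```

Text that was never garbled passes through `repair_mojibake` unchanged here, because re-encoding a correctly decoded accented letter as Windows-1252 does not produce valid UTF-8.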