
How text is stored (ASCII, Unicode) in Intro to Computing - Step-by-Step Explanation

Introduction
Imagine you want to send a message to a friend using a secret code that both of you understand. Computers face a similar challenge when they need to store and share text. They use special codes to turn letters and symbols into numbers that machines can handle.
Explanation
ASCII
ASCII stands for American Standard Code for Information Interchange. It uses the numbers 0 to 127 to represent English letters, digits, punctuation, and control codes such as newline and tab. Each character is a 7-bit number, and 7 bits give exactly 128 possible values (2 to the power of 7), which is enough for basic English text but nothing more.
ASCII stores basic English characters as numbers from 0 to 127.
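As a small illustration of the mapping described above, the following Python sketch looks up the number behind each character and shows it as 7 binary digits. The helper name `ascii_table_row` is made up for this example; `ord` and `format` are standard Python built-ins.

```python
def ascii_table_row(ch: str) -> tuple[str, int, str]:
    """Return the character, its ASCII code, and the code as 7 binary digits.

    Hypothetical helper for illustration only.
    """
    code = ord(ch)  # the number behind the character
    if code > 127:
        raise ValueError(f"{ch!r} is outside the ASCII range 0-127")
    return ch, code, format(code, "07b")  # zero-padded to 7 bits


rows = [ascii_table_row(ch) for ch in "Hi!"]
for ch, code, bits in rows:
    print(ch, code, bits)
# H 72 1001000
# i 105 1101001
# ! 33 0100001
```

Going the other way, `chr(72)` turns the number 72 back into the letter H.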
Limitations of ASCII
ASCII cannot represent letters with accents, symbols from other languages, or emojis. This makes it unsuitable for global communication where many languages and symbols are used. Computers needed a better system to handle all kinds of text.
ASCII is limited to basic English characters and cannot handle global text.
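One way to see this limitation in practice is to ask Python to encode some text as ASCII and watch which samples fail. This is only a sketch: the function name `fits_in_ascii` and the sample words are chosen for illustration.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text has an ASCII code (0-127).

    Hypothetical helper for illustration only.
    """
    try:
        text.encode("ascii")  # raises if any character is outside ASCII
        return True
    except UnicodeEncodeError:
        return False


# Escapes are used so the exact characters are unambiguous:
# "caf\u00e9" is "café" and "\U0001F600" is a grinning-face emoji.
samples = ["hello", "caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600"]
results = {s: fits_in_ascii(s) for s in samples}

for text, ok in results.items():
    print(repr(text), "fits in ASCII" if ok else "does NOT fit in ASCII")
```

Only the plain English word passes; the accented letter, the Japanese text, and the emoji all fall outside the 128 ASCII codes.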
Unicode
Unicode is a universal standard that assigns a unique number, called a code point, to every character from almost all writing systems, along with symbols and emojis. Its code points range from 0 to 1,114,111 (written U+0000 to U+10FFFF), so there is room for over a million different characters. This makes it possible to store text from nearly any language in the world.
Unicode can represent characters from all languages and many symbols using unique numbers.
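To make the idea of "a unique number for every character" concrete, here is a short Python sketch that prints the Unicode number of a few characters in the usual U+XXXX notation. The helper name `code_point` is an assumption for this example; `ord` and `sys.maxunicode` are part of standard Python.

```python
import sys


def code_point(ch: str) -> str:
    """Format a character's Unicode number as U+XXXX (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


# "A", e with acute accent, the euro sign, and a grinning-face emoji
chars = ["A", "\u00e9", "\u20ac", "\U0001F600"]
labels = [code_point(ch) for ch in chars]
print(labels)
# ['U+0041', 'U+00E9', 'U+20AC', 'U+1F600']

# Total number of Unicode code points that Python can represent
total_code_points = sys.maxunicode + 1
print(total_code_points)
# 1114112
```

Note that the letter A keeps the same number it has in ASCII (65, or 41 in hexadecimal).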
Encoding Forms of Unicode
Unicode characters are stored using encoding forms like UTF-8, UTF-16, or UTF-32. These encodings translate Unicode numbers into the bytes that computers store. UTF-8 is the most common: it uses one to four bytes per character, so English text takes just one byte per character while every other character is still supported. UTF-16 uses two or four bytes per character, and UTF-32 always uses four.
Unicode uses encoding forms like UTF-8 to efficiently store characters as bytes.
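The difference between the encoding forms can be seen by counting the bytes each one produces for the same character. The sketch below is illustrative: the function name `encoded_sizes` is made up, and the little-endian codec names (`utf-16-le`, `utf-32-le`) are used only so that Python does not prepend a byte-order mark and inflate the counts.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Return how many bytes ch needs in each encoding (hypothetical helper)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# "A", e with acute accent, the euro sign, and a grinning-face emoji
chars = ["A", "\u00e9", "\u20ac", "\U0001F600"]
sizes = {ch: encoded_sizes(ch) for ch in chars}

for ch, row in sizes.items():
    print(f"U+{ord(ch):04X}", row)
# U+0041 {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+00E9 {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+20AC {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

The first row shows why UTF-8 saves space for English text: the letter A needs one byte in UTF-8 but two in UTF-16 and four in UTF-32.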
Real World Analogy

Think of ASCII as a small dictionary that only has English words, while Unicode is a giant dictionary that includes words from every language and even emojis. When you write a letter, ASCII can only understand simple English words, but Unicode can understand any word or symbol you use.

ASCII → A small English-only dictionary with 128 words
Limitations of ASCII → Trying to write a letter with foreign words that the small dictionary doesn't have
Unicode → A giant dictionary with words from all languages and symbols
Encoding Forms of Unicode → Different ways to write down words from the giant dictionary efficiently
Diagram
┌───────────────┐
│   Text Input  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│    ASCII      │
│ (7-bit codes) │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Limited to    │
│ English chars │
└───────────────┘


┌───────────────┐
│   Text Input  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│   Unicode     │
│ (many bits)   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ All languages │
│ and symbols   │
└───────────────┘
This diagram shows how ASCII stores only basic English characters with 7-bit codes, while Unicode can store all languages and symbols using more bits.
Key Facts
ASCII: A character encoding using 7 bits to represent 128 basic English characters and control codes.
Unicode: A universal character encoding system that assigns unique numbers to characters from all languages and symbols.
UTF-8: A Unicode encoding that uses 1 to 4 bytes per character, optimizing space for English text.
Character Encoding: A method to convert characters into numbers that computers can store and process.
Limitations of ASCII: ASCII cannot represent accented letters, non-English alphabets, or emojis.
Common Confusions
Thinking ASCII and Unicode are the same because both store text as numbers.
ASCII is a small subset of Unicode: the first 128 Unicode numbers are the ASCII characters, and Unicode adds many more characters from other languages and symbols.
Believing Unicode stores characters as fixed-size bytes only.
Unicode characters can be stored using different encoding forms. UTF-8, for example, uses a variable number of bytes (one to four) per character.
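Both confusions can be checked with a few lines of Python. This is a minimal sketch with made-up sample text: it shows that plain English text produces identical bytes in ASCII and UTF-8, and that adding a single accented letter makes one character take more than one byte.

```python
text = "Hello"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# ASCII is a subset of Unicode: the stored bytes are identical
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)
# True

# Bytes written as ASCII can be read back as UTF-8 without any change
round_trip = ascii_bytes.decode("utf-8")
print(round_trip)
# Hello

# Four characters ("H", "i", space, e with acute accent) but five bytes,
# because the accented letter needs two bytes in UTF-8
mixed = "Hi \u00e9"
mixed_bytes = mixed.encode("utf-8")
print(len(mixed), len(mixed_bytes))
# 4 5
```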
Summary
Computers store text by converting characters into numbers using encoding systems.
ASCII uses 7 bits to represent basic English characters but cannot handle global text.
Unicode assigns unique numbers to characters from all languages and symbols, using encoding forms like UTF-8 to store them efficiently.