
How text is stored (ASCII, Unicode) in Intro to Computing - Step-by-Step Explanation

Introduction
Imagine you want to send a message to a friend using a secret code that both of you understand. Computers face a similar challenge when they need to store and share text. They use special codes to turn letters and symbols into numbers that machines can handle.
Explanation
ASCII
ASCII stands for American Standard Code for Information Interchange. It assigns the numbers 0 to 127 to English letters, digits, punctuation, and control codes such as newline and tab. Each code fits in 7 bits, so ASCII has room for only 128 characters, which covers basic English text and little else.
ASCII stores basic English characters as numbers from 0 to 127.
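As a minimal sketch of this mapping, the snippet below uses Python's built-in `ord` and `chr` to turn characters into their ASCII numbers and back. The helper name `ascii_codes` is made up for this illustration.

```python
def ascii_codes(text):
    """Return the ASCII code of each character in text.

    Raises ValueError if any character falls outside the ASCII range 0-127.
    """
    codes = [ord(ch) for ch in text]
    if any(code > 127 for code in codes):
        raise ValueError("text contains characters outside the ASCII range 0-127")
    return codes


codes = ascii_codes("Hi!")
print(codes)                              # [72, 105, 33]
print([format(c, "07b") for c in codes])  # each code written with 7 binary digits
print("".join(chr(c) for c in codes))     # Hi!
```

Each number fits in 7 binary digits, which is why 128 codes are all ASCII can offer.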
Limitations of ASCII
ASCII cannot represent letters with accents, symbols from other languages, or emojis. This makes it unsuitable for global communication where many languages and symbols are used. Computers needed a better system to handle all kinds of text.
ASCII is limited to basic English characters and cannot handle global text.
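One way to see this limit in practice is to ask Python to encode text as ASCII and watch where it refuses. This is only a sketch; the helper name `fits_in_ascii` is an assumption made for the example.

```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a character has no code in the 0-127 range.
        return False


print(fits_in_ascii("cafe"))  # True
print(fits_in_ascii("café"))  # False: 'é' is not an ASCII character
print(fits_in_ascii("😀"))    # False: emojis are not ASCII characters
```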
Unicode
Unicode is a universal standard that assigns a unique number, called a code point, to characters from almost all writing systems, along with symbols and emojis. Code points range from U+0000 to U+10FFFF, which leaves room for more than a million characters, far beyond ASCII's 128. This makes it possible to store text from virtually any language in one system.
Unicode can represent characters from all languages and many symbols using unique numbers.
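The short sketch below prints the Unicode number of a few characters in the usual U+XXXX notation. Python's `ord` returns the number directly; the helper name `code_point` is made up for this example.

```python
import sys


def code_point(ch):
    """Format a single character's Unicode number in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "ñ", "中", "😀"]:
    print(ch, code_point(ch))
# A U+0041
# ñ U+00F1
# 中 U+4E2D
# 😀 U+1F600

# Total count of numbers Unicode can assign (0 through 0x10FFFF).
print(sys.maxunicode + 1)  # 1114112
```

Note that 'A' keeps the same number it has in ASCII (65, written 0041 in hexadecimal).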
Encoding Forms of Unicode
Unicode code points are stored as bytes using encoding forms such as UTF-8, UTF-16, or UTF-32. UTF-8 is the most common: it uses one to four bytes per character, so English text stays at one byte per character while every other character remains representable. UTF-16 uses two or four bytes per character, and UTF-32 always uses four.
Unicode uses encoding forms like UTF-8 to efficiently store characters as bytes.
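To make the difference between the encoding forms concrete, this sketch counts how many bytes each one needs for the same character. The helper name `byte_lengths` is an assumption; the little-endian codec names are used so Python does not prepend a byte order mark to the result.

```python
def byte_lengths(ch):
    """Return the number of bytes each Unicode encoding form needs for ch."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in ["A", "ñ", "中", "😀"]:
    print(ch, byte_lengths(ch))
# A {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
# ñ {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
# 中 {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
# 😀 {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}

# The actual bytes UTF-8 stores for the emoji:
print("😀".encode("utf-8"))  # b'\xf0\x9f\x98\x80'
```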
Real World Analogy

Think of ASCII as a small dictionary that only has English words, while Unicode is a giant dictionary that includes words from every language and even emojis. When you write a letter, ASCII can only understand simple English words, but Unicode can understand any word or symbol you use.

ASCII → A small English-only dictionary with 128 words
Limitations of ASCII → Trying to write a letter with foreign words that the small dictionary doesn't have
Unicode → A giant dictionary with words from all languages and symbols
Encoding Forms of Unicode → Different ways to write down words from the giant dictionary efficiently
Diagram
┌───────────────┐
│   Text Input  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│    ASCII      │
│ (7-bit codes) │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Limited to    │
│ English chars │
└───────────────┘


┌───────────────┐
│   Text Input  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│   Unicode     │
│ (many bits)   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ All languages │
│ and symbols   │
└───────────────┘
This diagram shows how ASCII stores only basic English characters with 7-bit codes, while Unicode can store all languages and symbols using more bits.
Key Facts
ASCII: A character encoding using 7 bits to represent 128 basic English characters and control codes.
Unicode: A universal character encoding system that assigns unique numbers to characters from all languages and symbols.
UTF-8: A Unicode encoding that uses 1 to 4 bytes per character, optimizing space for English text.
Character Encoding: A method to convert characters into numbers that computers can store and process.
Limitations of ASCII: ASCII cannot represent accented letters, non-English alphabets, or emojis.
Common Confusions
Thinking ASCII and Unicode are the same because both store text as numbers.
ASCII is a small subset of Unicode: its 128 characters keep the same numbers in Unicode, which adds many more characters from other languages and symbols.
Believing Unicode stores characters as fixed-size bytes only.
Unicode text can be stored using different encoding forms. UTF-8 and UTF-16 use a variable number of bytes per character; only UTF-32 uses a fixed four bytes for every character.
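Both confusions can be checked with a few lines of Python. This is a small sketch with made-up sample strings: it shows that pure English text produces identical bytes under ASCII and UTF-8, and that the number of characters and the number of stored bytes can differ.

```python
english = "Hello"
# ASCII text is stored as exactly the same bytes in UTF-8.
print(english.encode("ascii") == english.encode("utf-8"))  # True

mixed = "Año 😀"
print(len(mixed))                  # 5 characters
print(len(mixed.encode("utf-8")))  # 9 bytes: 1 + 2 + 1 + 1 + 4
```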
Summary
Computers store text by converting characters into numbers using encoding systems.
ASCII uses 7 bits to represent basic English characters but cannot handle global text.
Unicode assigns unique numbers to characters from all languages and symbols, using encoding forms like UTF-8 to store them efficiently.

Practice

1. What is the main purpose of ASCII in text storage?
easy
A. To compress text files
B. To store images and videos
C. To represent English letters and symbols as numbers
D. To encrypt text data

Solution

  1. Step 1: Understand ASCII's role

    ASCII is a code that assigns numbers to English letters and symbols so computers can store and process them.
  2. Step 2: Compare with other options

    Options A, B, and D describe unrelated functions (compressing files, storing images and videos, and encrypting data), none of which ASCII does.
  3. Final Answer:

    To represent English letters and symbols as numbers -> Option C
  4. Quick Check:

    ASCII = English letters as numbers [OK]
Hint: ASCII is for English letters and symbols only [OK]
Common Mistakes:
  • Thinking ASCII stores images or videos
  • Confusing ASCII with encryption
  • Assuming ASCII compresses text
2. Which of the following is a correct ASCII code for the uppercase letter 'A'?
easy
A. 97
B. 65
C. 128
D. 256

Solution

  1. Step 1: Recall ASCII codes for letters

    In ASCII, uppercase 'A' is represented by the number 65.
  2. Step 2: Check other options

    97 is lowercase 'a'; 128 and 256 are outside the standard ASCII range of 0 to 127.
  3. Final Answer:

    65 -> Option B
  4. Quick Check:

    ASCII 'A' = 65 [OK]
Hint: Uppercase 'A' in ASCII is 65 [OK]
Common Mistakes:
  • Mixing uppercase and lowercase ASCII codes
  • Choosing numbers outside ASCII range
  • Confusing ASCII with Unicode codes
3. Given the Unicode code point U+1F600, what character does it represent?
medium
A. Smiling face emoji 😀
B. Latin capital letter A
C. Greek letter alpha
D. Digit zero '0'

Solution

  1. Step 1: Identify Unicode code point

    U+1F600 is a Unicode code point in the Emoticons block (U+1F600 to U+1F64F), where many face emojis live.
  2. Step 2: Match code point to character

    U+1F600 corresponds to the 😀 emoji (official Unicode name: GRINNING FACE), not a letter or digit.
  3. Final Answer:

    Smiling face emoji 😀 -> Option A
  4. Quick Check:

    Unicode U+1F600 = 😀 emoji [OK]
Hint: Unicode U+1F600 is a common emoji code [OK]
Common Mistakes:
  • Assuming all Unicode codes are letters
  • Confusing emoji codes with ASCII
  • Picking digits or Greek letters incorrectly
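If you want to verify an answer like this yourself, Python's standard `unicodedata` module can look up the official name of any character. The sketch below checks the code point from the question next to one character for each of the other options; the `candidates` dictionary is made up for this illustration.

```python
import unicodedata

# Code point from the question, plus one character for each other option.
candidates = {
    "U+1F600": chr(0x1F600),
    "Latin capital letter A": "A",
    "Greek letter alpha": "α",
    "Digit zero": "0",
}

for label, ch in candidates.items():
    print(label, "->", ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+1F600 -> 😀 U+1F600 GRINNING FACE
# Latin capital letter A -> A U+0041 LATIN CAPITAL LETTER A
# Greek letter alpha -> α U+03B1 GREEK SMALL LETTER ALPHA
# Digit zero -> 0 U+0030 DIGIT ZERO
```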
4. A program tries to store the character 'ñ' using ASCII encoding. What is the likely problem?
medium
A. The character 'ñ' is not in ASCII, causing incorrect storage
B. 'ñ' is stored correctly because ASCII supports all characters
C. The program will convert 'ñ' to uppercase automatically
D. ASCII will store 'ñ' as the number 10

Solution

  1. Step 1: Check ASCII character range

    ASCII supports only basic English letters, digits, and common symbols, not accented letters like 'ñ'.
  2. Step 2: Understand encoding limitations

    Trying to store 'ñ' in ASCII will cause incorrect storage or errors because it is outside ASCII's range.
  3. Final Answer:

    The character 'ñ' is not in ASCII, causing incorrect storage -> Option A
  4. Quick Check:

    ASCII lacks 'ñ' character [OK]
Hint: ASCII covers only basic English letters [OK]
Common Mistakes:
  • Assuming ASCII supports all characters
  • Thinking ASCII converts characters automatically
  • Believing ASCII stores 'ñ' as number 10
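What "incorrect storage" looks like depends on how the program handles the failure. As a sketch, Python's `str.encode` accepts an `errors` argument that either stops with an error, substitutes a question mark, or silently drops the character. The helper name `store_as_ascii` is an assumption for this example.

```python
def store_as_ascii(text, errors="strict"):
    """Encode text as ASCII bytes using the given error-handling mode."""
    return text.encode("ascii", errors=errors)


word = "mañana"

try:
    store_as_ascii(word)  # default "strict" mode refuses to store 'ñ'
except UnicodeEncodeError as exc:
    print("cannot store:", exc.reason)

print(store_as_ascii(word, errors="replace"))  # b'ma?ana' ('ñ' replaced by '?')
print(store_as_ascii(word, errors="ignore"))   # b'maana'  ('ñ' dropped)
```

In the last two cases the program keeps running, but the stored text is no longer the text that was typed.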
5. You want to store text containing English letters, Chinese characters, and emojis. Which encoding should you use?
hard
A. ASCII only
B. Morse code
C. Binary code for numbers only
D. Unicode (like UTF-8)

Solution

  1. Step 1: Identify text types

    The text includes English letters, Chinese characters, and emojis, which require a wide range of characters.
  2. Step 2: Choose suitable encoding

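    ASCII covers only basic English characters, a binary code for numbers cannot represent letters at all, and Morse code is a signalling code that was not designed for storing digital text. Unicode (stored as UTF-8, for example) supports all three kinds of characters.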
  3. Final Answer:

    Unicode (like UTF-8) -> Option D
  4. Quick Check:

    Unicode supports all languages and emojis [OK]
Hint: Use Unicode for all languages and emojis [OK]
Common Mistakes:
  • Choosing ASCII for non-English text
  • Confusing binary code with text encoding
  • Selecting Morse code for digital text storage
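As a closing sketch, the snippet below stores a line of mixed English, Chinese, and emoji text in a file as UTF-8 and reads it back unchanged. The sample text and the file name `note.txt` are made up for this illustration; a temporary directory is used so nothing is left behind on disk.

```python
import os
import tempfile

text = "Hello 你好 😀"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # Store the text as UTF-8 bytes.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    size_on_disk = os.path.getsize(path)

    # Read the bytes back and decode them with the same encoding.
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()

print(restored == text)         # True
print(len(text), size_on_disk)  # 10 17 (10 characters stored in 17 bytes)
```

Reading the file back with the same encoding it was written in is what keeps the text intact.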