Overview - String as character array

What is it?

A string is a sequence of characters stored one after another in memory, much like an array of letters. Each character occupies a position called an index, starting from zero. This means a string can be thought of as a list of characters that together form words or sentences. Understanding strings as character arrays helps us see how computers store and manipulate text.

Why it matters

Without viewing strings as arrays of characters, it is hard to understand how computers store, change, or search text. The array model gives machines a simple way to represent words and sentences. If characters were not kept in an indexed sequence, operations like finding a letter, changing a word, or counting characters would be slower and more complicated.

Where it fits

Before learning this, you should know what arrays are and how data is stored in memory. After this, you can learn about string operations like concatenation, searching, and slicing, or explore how different programming languages handle strings internally.

Mental Model

Core Idea

A string is simply a list of characters stored one after another in memory, just like an array.

Think of it like...

Imagine a string as a row of mailboxes, each holding one letter. Together, the letters form a word, and you can open any mailbox by its position to see or change the letter inside.

String: "HELLO"
Indexes:  0  1  2  3  4
Storage: [H][E][L][L][O]
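
The mailbox picture can be tried out directly. The following is a minimal Python sketch; the variable names (word, mailboxes, letters) are illustrative only and not part of the original lesson.

```python
word = "HELLO"

# Each character sits at a zero-based index.
first = word[0]              # "H"
last = word[len(word) - 1]   # "O"

# Pair every index with its character, like numbered mailboxes.
mailboxes = list(enumerate(word))

# A list of characters can be changed in place, then joined back into text.
letters = list(word)
letters[0] = "J"
changed = "".join(letters)

print(first, last)
print(mailboxes)
print(changed)
```

Converting to a list first is needed here because Python strings themselves cannot be changed in place.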

Build-Up - 7 Steps

1

Foundation: Understanding arrays as sequences

Concept: Arrays store multiple items in order, each accessible by a number called an index.

An array is like a row of boxes, each with a number starting at zero. You can put a value in each box and get it back by using its number. For example, an array of numbers [10, 20, 30] has 10 at index 0, 20 at index 1, and 30 at index 2.

Result

You can store and access multiple values quickly and in order.

Knowing arrays helps you understand how strings store characters in a similar ordered way.
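
The [10, 20, 30] example above can be sketched in Python, where a list plays the role of the array. The variable names are illustrative.

```python
numbers = [10, 20, 30]

# Index 0 is the first box, index 2 is the last one.
first = numbers[0]
third = numbers[2]

# Put a new value into the middle box.
numbers[1] = 25

print(first, third, numbers)
```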

2

Foundation: Characters as individual data units

3

Intermediate: Strings as arrays of characters

4

Intermediate: Null terminator and string length

5

Intermediate: Immutable vs mutable strings

6

Advanced: Memory layout and performance implications

7

Expert: Encoding and multi-byte characters impact

Under the Hood

Strings as character arrays are stored in contiguous memory locations, where each character occupies a fixed or variable number of bytes depending on the encoding. With a fixed-width encoding, the computer finds a character by adding the index times the size of each character to the base address. With a variable-width encoding such as UTF-8, that formula locates a byte rather than a character, so reaching the n-th character may require scanning from the start. In languages with null-terminated strings, the end is marked by a special zero byte, while others store the length separately. The contiguous layout allows fast random access but can make insertions or deletions costly because the characters after the edit point must be shifted.

Why designed this way?

Storing strings as arrays of characters was chosen because it matches how memory works: linear and addressable by offsets. Early computers had limited resources, so using arrays was efficient and simple. Null terminators were introduced to avoid storing the length explicitly, saving space. Alternatives like linked lists or trees for strings exist but are generally slower for common tasks such as indexing and scanning. The design balances speed, simplicity, and memory use.

Memory layout of string "DOG":

Base address -> [D][O][G][\0]
                 |  |  |  |
Index            0  1  2  end marker

Access: address = base + index * size_of_char
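
The address formula can be observed from Python with the standard ctypes module, which exposes C-style buffers. This is only a sketch under the assumption of one-byte characters; the helper char_at is a hypothetical name introduced for illustration.

```python
import ctypes

# A C-style buffer holding "DOG" plus the automatic zero terminator.
buf = ctypes.create_string_buffer(b"DOG")
base = ctypes.addressof(buf)
size_of_char = ctypes.sizeof(ctypes.c_char)  # 1 byte per character here


def char_at(index):
    """Read one character using address = base + index * size_of_char."""
    address = base + index * size_of_char
    return ctypes.c_char.from_address(address).value


# Indexes 0..2 hold the letters; index 3 holds the end marker.
chars = [char_at(i) for i in range(len(buf))]
print(chars)
```

Only addresses inside the buffer are read here; reading beyond index 3 would touch memory the buffer does not own.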

Myth Busters - 4 Common Misconceptions

Quick: Do you think strings are always mutable and can be changed character by character? Commit to yes or no.

Common Belief: Strings are always mutable arrays of characters that you can change anytime.

Quick: Do you think each character in a string always uses exactly one byte? Commit to yes or no.

Common Belief: Each character in a string uses one byte of memory.

Quick: Do you think strings always store their length explicitly in memory? Commit to yes or no.

Common Belief: Strings always store their length as a separate value in memory.

Quick: Do you think accessing characters in a string is always slow because strings are complex? Commit to yes or no.

Common Belief: Accessing characters in a string is slow because strings are complicated data structures.

Expert Zone

1

Some languages optimize strings by sharing memory for identical strings (string interning), reducing memory use.

2

Mutable strings often require careful memory management to avoid buffer overflows or leaks, especially in low-level languages.

3

Handling multi-byte characters requires special libraries or functions to avoid breaking characters when slicing or indexing.
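
String interning, the first point above, can be observed in Python through sys.intern. This sketch only demonstrates the sharing behaviour; how a given runtime decides what to intern automatically is not covered here.

```python
import sys

# Build the same text twice at run time.
parts = ["he", "llo"]
a = "".join(parts)
b = "".join(parts)

# Interning maps equal strings to one shared object.
a_shared = sys.intern(a)
b_shared = sys.intern(b)

same_text = a == b
same_object = a_shared is b_shared
print(same_text, same_object)
```

Once two strings are interned, comparing them can be done by identity, and only one copy of the text needs to be kept.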

When NOT to use

Using strings as simple character arrays is not ideal when frequent insertions or deletions in the middle are needed; data structures like linked lists or ropes are better. Also, for very large texts or streaming data, specialized string buffers or streams are preferred.
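
To see why insertions are costly, the sketch below inserts into a character array by hand and counts how many characters must move. The function insert_char is a hypothetical helper written for this illustration, using a Python list to stand in for the array.

```python
def insert_char(chars, index, ch):
    """Insert ch at index in a list of characters, shifting by hand.

    Returns how many existing characters had to move.
    """
    chars.append("")  # grow the array by one slot at the end
    moved = 0
    # Walk from the end down to index, moving each character one slot right.
    for i in range(len(chars) - 1, index, -1):
        chars[i] = chars[i - 1]
        moved += 1
    chars[index] = ch
    return moved


near_end = list("HELO")
moved_near_end = insert_char(near_end, 3, "L")   # only "O" moves

at_front = list("ELLO")
moved_at_front = insert_char(at_front, 0, "H")   # every character moves

print("".join(near_end), moved_near_end)
print("".join(at_front), moved_at_front)
```

The closer the insertion point is to the front, the more characters shift, which is the cost that ropes and similar structures are designed to avoid.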

Production Patterns

In real-world systems, strings as character arrays are used for fast text processing, parsing, and storage. Developers often combine this with encoding-aware libraries to handle internationalization. Immutable strings are favored in multi-threaded environments to avoid synchronization issues.

Connections

Array data structure

Strings are a specific use case of arrays specialized for characters.

Understanding arrays deeply helps grasp string storage and access patterns.

Unicode encoding

String storage depends on encoding schemes like Unicode to represent diverse characters.

Knowing encoding explains why string length and indexing can be complex.

Human language processing

Strings as character arrays are the digital foundation for representing human language text.

Understanding string storage helps bridge computing with linguistics and text analysis.

Common Pitfalls

#1 Trying to change a character in an immutable string directly.

Wrong approach:
    string = "hello"
    string[0] = 'H'  # Attempt to change first character

Correct approach:
    string = "hello"
    string = 'H' + string[1:]  # Create a new string with changed first character

Root cause: Misunderstanding that strings are immutable in many languages, so individual characters cannot be changed in place.
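
A slightly fuller Python sketch of this pitfall: the failed assignment is caught so the script keeps running, and a bytearray is shown as one possible mutable alternative. The variable names are illustrative.

```python
word = "hello"

# Assigning to an index fails because Python strings are immutable.
try:
    word[0] = "H"
    in_place_worked = True
except TypeError:
    in_place_worked = False

# Option 1: build a new string from pieces.
rebuilt = "H" + word[1:]

# Option 2: use a mutable byte buffer when in-place edits are needed.
buffer = bytearray(b"hello")
buffer[0] = ord("H")
from_buffer = buffer.decode("ascii")

print(in_place_worked, rebuilt, from_buffer)
```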

#2 Assuming string length equals number of bytes in memory.

Wrong approach:
    length = len(string)  # Using length as byte count without considering encoding

Correct approach:
    length = len(string.encode('utf-8'))  # Get actual byte size considering encoding

Root cause: Confusing character count with byte count, ignoring multi-byte characters in encodings like UTF-8.
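
The gap between character count and byte count shows up as soon as the text contains a non-ASCII character. The sample text below is an assumption chosen for illustration; the last part shows how cutting the bytes in the middle of a multi-byte character breaks decoding.

```python
text = "h\u00e9llo"  # "héllo": the second character needs two bytes in UTF-8

char_count = len(text)
utf8_bytes = len(text.encode("utf-8"))
utf16_bytes = len(text.encode("utf-16-le"))
utf32_bytes = len(text.encode("utf-32-le"))

# Slicing raw bytes can cut a multi-byte character in half.
encoded = text.encode("utf-8")
try:
    encoded[:2].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(char_count, utf8_bytes, utf16_bytes, utf32_bytes, split_ok)
```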

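#3 Reading past the end of a null-terminated string.

Wrong approach:
    string = "DOG\0"  # Models a C-style buffer; Python strings are not null-terminated
    for i in range(100):
        print(string[i])  # No check for null terminator

Correct approach:
    string = "DOG\0"  # Models a C-style buffer; Python strings are not null-terminated
    i = 0
    while string[i] != '\0':
        print(string[i])
        i += 1  # Stop at null terminator

Root cause: Not recognizing that the null terminator marks the end of the string, leading to out-of-bounds access.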
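
A defensive variant of the same idea, sketched with Python bytes standing in for a C-style buffer: the scan stops at the first zero byte and never reads beyond the buffer, even if the terminator is missing. The function name c_string_length is hypothetical.

```python
def c_string_length(buffer):
    """Count the bytes before the first zero byte, never reading past the buffer."""
    for i, byte in enumerate(buffer):
        if byte == 0:
            return i
    raise ValueError("no null terminator found in buffer")


buffer = b"DOG\x00junk"           # bytes after the terminator are not part of the string
length = c_string_length(buffer)
text = buffer[:length].decode("ascii")
print(length, text)
```

Raising an error when no terminator is found is one design choice among several; a caller could instead treat the whole buffer as the string.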

Key Takeaways

Strings are stored as arrays of characters, each accessible by an index starting at zero.

Many languages use a null terminator or store length separately to mark where the string ends.

Strings can be immutable or mutable depending on the language, affecting how you modify them.

Character encoding affects how many bytes each character uses, complicating indexing and length calculations.

Understanding strings as character arrays helps write efficient and correct programs that handle text.