0
0
MySQLquery~15 mins

LENGTH and CHAR_LENGTH in MySQL - Deep Dive

Choose your learning style9 modes available
Overview - LENGTH and CHAR_LENGTH
What is it?
LENGTH and CHAR_LENGTH are two MySQL functions used to measure the size of strings. LENGTH returns the number of bytes in a string, while CHAR_LENGTH returns the number of characters. This difference matters because some characters can use more than one byte, especially in multi-byte character sets like UTF-8. These functions help you understand string size in different ways.
Why it matters
Knowing the difference between LENGTH and CHAR_LENGTH is important when working with text data, especially in languages with special characters or emojis. Without this knowledge, you might miscalculate storage needs or truncate strings incorrectly, causing data loss or errors. It ensures your database handles text accurately and efficiently.
Where it fits
Before learning LENGTH and CHAR_LENGTH, you should understand basic string data types and character encoding in databases. After mastering these functions, you can explore string manipulation functions, data validation, and optimizing storage for text data.
Mental Model
Core Idea
LENGTH counts bytes used by a string, while CHAR_LENGTH counts the actual characters, which may use multiple bytes each.
Think of it like...
Imagine a book where LENGTH counts the number of letters printed on the pages, including spaces and punctuation, while CHAR_LENGTH counts the number of words you read. Some words might be longer, but CHAR_LENGTH cares about how many words, not how many letters.
String: "café"

Bytes (LENGTH): 5 (c=1, a=1, f=1, é=2 bytes)
Characters (CHAR_LENGTH): 4 (c, a, f, é)

┌─────────────┐
│ c │ a │ f │ é │
├─┬─┬─┬───────┤
│1│1│1│  2    │  ← bytes
└─┴─┴─┴───────┘

LENGTH = 5 bytes
CHAR_LENGTH = 4 characters
Build-Up - 6 Steps
1
FoundationUnderstanding String Storage Basics
🤔
Concept: Strings are stored as bytes in a database, and each character can use one or more bytes depending on encoding.
In MySQL, text is stored using character sets like UTF-8. Each character can take different bytes: simple letters like 'a' use 1 byte, while special characters like 'é' or emojis use multiple bytes. This means the number of bytes and the number of characters can differ.
Result
You learn that string length can be measured in bytes or characters, which are not always the same.
Understanding that characters can use multiple bytes is key to grasping why LENGTH and CHAR_LENGTH differ.
2
FoundationBasic Use of LENGTH Function
🤔
Concept: LENGTH returns the total number of bytes used by a string in MySQL.
Example: SELECT LENGTH('hello'); -- Returns 5 because each letter is 1 byte. SELECT LENGTH('café'); -- Returns 5 because 'é' uses 2 bytes in UTF-8.
Result
You get the byte size of strings, which helps understand storage size.
Knowing LENGTH tells you how much space a string takes in the database.
3
IntermediateBasic Use of CHAR_LENGTH Function
🤔
Concept: CHAR_LENGTH returns the number of characters in a string, regardless of byte size.
Example: SELECT CHAR_LENGTH('hello'); -- Returns 5 characters. SELECT CHAR_LENGTH('café'); -- Returns 4 characters, counting 'é' as one character.
Result
You get the count of characters, which is useful for user-facing text length.
CHAR_LENGTH helps measure what users see, not how much space it uses.
4
IntermediateComparing LENGTH and CHAR_LENGTH Results
🤔Before reading on: do you think LENGTH and CHAR_LENGTH always return the same number for any string? Commit to your answer.
Concept: LENGTH and CHAR_LENGTH can return different values when strings contain multi-byte characters.
Example: SELECT LENGTH('😊'), CHAR_LENGTH('😊'); -- LENGTH returns 4 bytes (emoji uses 4 bytes in UTF-8) -- CHAR_LENGTH returns 1 character This shows LENGTH counts bytes, CHAR_LENGTH counts characters.
Result
You see clear differences when strings have special characters.
Recognizing this difference prevents bugs in string processing and storage estimation.
5
AdvancedImpact of Character Sets on LENGTH and CHAR_LENGTH
🤔Before reading on: do you think changing the character set affects LENGTH or CHAR_LENGTH? Commit to your answer.
Concept: The character set of a column or string affects how many bytes each character uses, impacting LENGTH but not CHAR_LENGTH.
For example, in latin1 encoding, 'é' uses 1 byte, but in utf8mb4, it uses 2 bytes. SELECT LENGTH(_latin1 'é'), CHAR_LENGTH(_latin1 'é'); -- LENGTH = 1, CHAR_LENGTH = 1 SELECT LENGTH(_utf8mb4 'é'), CHAR_LENGTH(_utf8mb4 'é'); -- LENGTH = 2, CHAR_LENGTH = 1
Result
You understand that LENGTH depends on encoding, CHAR_LENGTH does not.
Knowing encoding effects helps avoid storage miscalculations and data corruption.
6
ExpertUsing LENGTH and CHAR_LENGTH in Real-World Queries
🤔Before reading on: do you think LENGTH and CHAR_LENGTH can be used to detect invalid or truncated strings? Commit to your answer.
Concept: Advanced use includes validating data integrity, detecting truncation, or filtering strings by byte size or character count.
Example: -- Find strings where byte size exceeds character count by more than 1 (likely multi-byte chars): SELECT name FROM users WHERE LENGTH(name) > CHAR_LENGTH(name) + 1; -- Detect truncated UTF-8 strings by comparing byte length to expected character count. These queries help maintain data quality in multilingual databases.
Result
You can write queries that handle complex text data issues.
Mastering these functions enables robust text data validation and storage optimization.
Under the Hood
MySQL stores strings as sequences of bytes according to the column's character set. LENGTH counts these bytes directly. CHAR_LENGTH counts characters by decoding the byte sequence according to the character set rules, recognizing multi-byte characters as single units. Internally, CHAR_LENGTH parses the byte stream to count characters, while LENGTH simply counts bytes.
Why designed this way?
This design separates physical storage size (bytes) from logical content size (characters). It allows developers to optimize storage and handle user-visible text length independently. Early databases used single-byte encodings, but with globalization, multi-byte encodings required this distinction to avoid confusion and errors.
┌───────────────┐
│ Stored String │
│ (byte array)  │
└──────┬────────┘
       │
       │ LENGTH counts bytes directly
       ▼
  [Byte Count]

       │
       │ CHAR_LENGTH decodes bytes into characters
       ▼
  [Character Count]

Character count ≤ Byte count

Example: 'é' in UTF-8
Bytes: 2
Characters: 1
Myth Busters - 4 Common Misconceptions
Quick: Does LENGTH always equal CHAR_LENGTH for any string? Commit to yes or no.
Common Belief:LENGTH and CHAR_LENGTH always return the same number because strings are just sequences of characters.
Tap to reveal reality
Reality:LENGTH counts bytes, which can be more than characters if multi-byte characters exist; CHAR_LENGTH counts characters regardless of byte size.
Why it matters:Assuming they are equal can cause bugs in string length checks, leading to incorrect truncation or storage allocation.
Quick: Does CHAR_LENGTH depend on the character set encoding? Commit to yes or no.
Common Belief:CHAR_LENGTH depends on encoding and can vary if the character set changes.
Tap to reveal reality
Reality:CHAR_LENGTH counts characters logically and is independent of how many bytes each character uses; it always returns the number of characters.
Why it matters:Misunderstanding this can lead to wrong assumptions about string length when changing encodings.
Quick: Can LENGTH be used to count characters in all cases? Commit to yes or no.
Common Belief:LENGTH can be used to count characters because each character is one byte.
Tap to reveal reality
Reality:LENGTH counts bytes, not characters, so it fails for multi-byte characters like emojis or accented letters.
Why it matters:Using LENGTH to count characters causes errors in user interfaces and data validation.
Quick: Is CHAR_LENGTH slower than LENGTH because it decodes characters? Commit to yes or no.
Common Belief:CHAR_LENGTH is always slower than LENGTH because it must decode multi-byte characters.
Tap to reveal reality
Reality:While CHAR_LENGTH does more work, modern databases optimize this operation, making the difference often negligible for typical use.
Why it matters:Overestimating performance cost may lead to premature optimization or avoiding CHAR_LENGTH when needed.
Expert Zone
1
CHAR_LENGTH counts characters as defined by the character set, so combining characters or surrogate pairs in UTF-16 may affect counts differently than expected.
2
LENGTH can be used to estimate storage size, but actual disk space used may differ due to compression or storage engine overhead.
3
In some collations, characters that look the same may have different byte representations, affecting LENGTH but not CHAR_LENGTH.
When NOT to use
Avoid using LENGTH when you need to count user-visible characters; use CHAR_LENGTH instead. For byte-level operations like encryption or hashing, LENGTH is appropriate. When working with fixed-width encodings, LENGTH and CHAR_LENGTH may behave similarly, but always verify encoding. For substring operations, use functions designed for characters, not bytes.
Production Patterns
In production, LENGTH is used to monitor storage usage and optimize database size. CHAR_LENGTH is used for input validation, UI limits, and ensuring text fits display constraints. Combining both helps detect data corruption or encoding issues. Developers also use these functions in migrations to convert or clean data between encodings.
Connections
Character Encoding
Builds-on
Understanding character encoding is essential to grasp why LENGTH and CHAR_LENGTH differ and how bytes map to characters.
Data Validation
Builds-on
Using LENGTH and CHAR_LENGTH helps validate user input length correctly, preventing errors and security issues.
Human Perception of Text
Related concept from psychology
Humans count characters visually, not bytes; CHAR_LENGTH aligns with this perception, showing how databases must bridge technical storage and human experience.
Common Pitfalls
#1Using LENGTH to limit user input length causes truncation of multi-byte characters.
Wrong approach:SELECT * FROM users WHERE LENGTH(username) <= 10;
Correct approach:SELECT * FROM users WHERE CHAR_LENGTH(username) <= 10;
Root cause:Confusing byte length with character count leads to incorrect input validation.
#2Assuming CHAR_LENGTH returns byte size for storage calculations.
Wrong approach:SET @size = CHAR_LENGTH(text_column); -- expecting bytes
Correct approach:SET @size = LENGTH(text_column); -- actual byte size
Root cause:Misunderstanding that CHAR_LENGTH counts characters, not bytes.
#3Ignoring character set differences when interpreting LENGTH results.
Wrong approach:Comparing LENGTH results across columns with different encodings without adjustment.
Correct approach:Ensure consistent character sets or use CHAR_LENGTH for character counts.
Root cause:Not accounting for encoding differences causes inconsistent byte counts.
Key Takeaways
LENGTH counts the number of bytes a string uses, while CHAR_LENGTH counts the number of characters.
Multi-byte characters cause LENGTH and CHAR_LENGTH to return different values, especially in UTF-8 encoding.
Use CHAR_LENGTH to measure user-visible string length and LENGTH to measure storage size in bytes.
Character set encoding affects LENGTH but not CHAR_LENGTH, so always consider encoding when working with string sizes.
Understanding these functions prevents common bugs in string handling, storage estimation, and data validation.