Overview - String length and position functions

What is it?

String length and position functions are tools in PostgreSQL that help you find out how long a text is and where certain parts of that text appear. For example, you can ask how many characters a word has or where a specific letter shows up in a sentence. These functions work with text data and make it easier to analyze or manipulate strings in your database.

Why it matters

Without these functions, it would be hard to work with text data efficiently. Imagine trying to find a word inside a sentence or counting characters without any help — it would be slow and error-prone. These functions save time and make your queries smarter, helping applications like search engines, data validation, and formatting work correctly.

Where it fits

Before learning these functions, you should understand basic SQL queries and how text data is stored in databases. After mastering them, you can explore more advanced text processing functions like pattern matching with LIKE or regular expressions, and string manipulation functions like substring or concatenation.

Mental Model

Core Idea

String length and position functions let you measure text size and find where parts of text appear inside larger strings.

Think of it like...

It's like using a ruler to measure the length of a rope and a magnifying glass to spot where a knot is tied along it.

┌───────────────┐
│   Full String │
│ "hello world"│
└──────┬────────┘
       │
       │ length('hello world') = 11
       │ position('world' in 'hello world') = 7
       ▼
┌───────────────┐
│  Functions    │
│ length(),     │
│ position()    │
└───────────────┘

Build-Up - 7 Steps

1

FoundationUnderstanding string length basics

Concept: Learn how to find the number of characters in a string using the length function.

In PostgreSQL, the length() function returns the number of characters in a text string. For example, length('apple') returns 5 because 'apple' has five letters. This helps you know how big or small your text data is.

Result

length('apple') = 5

Knowing how to measure string length is fundamental for validating data size and preparing for more complex text operations.

2

FoundationFinding substring position basics

3

IntermediateHandling empty and special strings

4

IntermediateUsing position with case sensitivity

5

IntermediateCombining length and position in queries

6

AdvancedWorking with multibyte characters

7

ExpertPerformance considerations with large texts

Under the Hood

PostgreSQL stores text as variable-length data with UTF-8 encoding. The length() function counts characters by decoding the UTF-8 bytes, not just counting bytes. The position() function scans the string from the start, comparing substrings byte by byte until it finds a match or reaches the end. Both functions operate at runtime during query execution.

Why designed this way?

This design balances flexibility and correctness. Counting characters instead of bytes supports international text properly. Scanning for position is simple and reliable, though not the fastest for large texts. Alternatives like indexes or full-text search exist but are more complex. The functions keep SQL simple and predictable.

┌───────────────┐
│ Text Storage  │
│ UTF-8 bytes   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ length()      │
│ - Decode UTF-8│
│ - Count chars │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ position()    │
│ - Scan bytes  │
│ - Compare     │
│ - Return pos  │
└───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does length() count bytes or characters in PostgreSQL? Commit to your answer.

Common Belief:Length() counts bytes in the string.

Tap to reveal reality

Quick: Does position() find substrings regardless of case? Commit to your answer.

Common Belief:Position() is case-insensitive and finds substrings regardless of letter case.

Tap to reveal reality

Quick: If position() does not find a substring, does it return NULL or 0? Commit to your answer.

Common Belief:Position() returns NULL if the substring is not found.

Tap to reveal reality

Quick: Does length(' ') return 0 or 1? Commit to your answer.

Common Belief:Length(' ') returns 0 because it's just a space.

Tap to reveal reality

Expert Zone

1

Length() counts characters, but octet_length() counts bytes; knowing when to use each is crucial for storage and network considerations.

2

Position() returns the first occurrence; to find later occurrences, you must use more complex queries or functions like strpos() with offsets.

3

Using length() on bytea (binary data) behaves differently; it counts bytes, not characters, which can confuse developers working with mixed data types.

When NOT to use

Avoid using length() and position() for very large text fields when performance is critical; instead, use full-text search indexes or trigram indexes for faster substring searches.

Production Patterns

In production, length() is often used for input validation (e.g., password length), while position() helps parse structured text like CSV fields or URLs. Combining these with CASE statements or COALESCE handles missing substrings gracefully.

Connections

Regular expressions

Builds-on

Understanding basic position and length functions prepares you to use regular expressions for more powerful and flexible text searching.

Data validation

Same pattern

Length checks in strings are a core part of validating user input, ensuring data fits expected formats and sizes.

Human perception of language

Opposite pattern

While length() counts characters exactly, humans perceive words and letters differently, highlighting the gap between raw data and user experience.

Common Pitfalls

#1Assuming position() returns NULL when substring is missing.

Wrong approach:SELECT position('z' in 'apple'); -- expecting NULL

Correct approach:SELECT position('z' in 'apple'); -- returns 0

Root cause:Misunderstanding the function's return value leads to incorrect NULL checks and query logic.

#2Using length() to count bytes instead of characters.

Wrong approach:SELECT length(convert_to('ñandú', 'UTF8')); -- expecting 5

Correct approach:SELECT length('ñandú'); -- returns 5 characters correctly

Root cause:Confusing byte arrays with text strings causes wrong length calculations.

#3Ignoring case sensitivity in position searches.

Wrong approach:SELECT position('A' in 'apple'); -- expecting 1

Correct approach:SELECT position('a' in lower('Apple')); -- returns 1

Root cause:Not accounting for case differences leads to missed matches.

Key Takeaways

String length and position functions help measure text size and find parts inside strings in PostgreSQL.

Length() counts characters, not bytes, which is important for international text handling.

Position() is case-sensitive and returns 0 if the substring is not found, not NULL.

Combining these functions enables powerful text analysis but requires careful handling of edge cases like empty strings and case differences.

Understanding their internal workings and limitations helps write efficient and correct queries in real-world applications.