0
0
PythonProgramBeginner · 2 min read

Python Program to Compress a String Using Run-Length Encoding

You can compress a string in Python using run-length encoding by iterating through the string, counting consecutive characters, and storing each character with its count as in def compress_string(s): compressed = ''; count = 1; for i in range(1, len(s)+1): if i < len(s) and s[i] == s[i-1]: count += 1 else: compressed += s[i-1] + str(count); count = 1; return compressed.
📋

Examples

Inputaaabbc
Outputa3b2c1
Inputabcd
Outputa1b1c1d1
Input
Output
🧠

How to Think About It

To compress a string, look at each character and count how many times it repeats in a row. When the character changes, write down the character and the count. Repeat this until the end of the string.
📐

Algorithm

1
Get the input string.
2
Initialize an empty result string and a count variable set to 1.
3
Loop through the string from the second character to the end.
4
If the current character is the same as the previous one, increase the count.
5
If it is different, add the previous character and count to the result, then reset count to 1.
6
After the loop, add the last character and its count to the result.
7
Return the compressed string.
💻

Code

python
def compress_string(s):
    if not s:
        return ""
    compressed = ""
    count = 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            count += 1
        else:
            compressed += s[i - 1] + str(count)
            count = 1
    compressed += s[-1] + str(count)
    return compressed

print(compress_string("aaabbc"))
print(compress_string("abcd"))
print(compress_string(""))
Output
a3b2c1 a1b1c1d1
🔍

Dry Run

Let's trace the string 'aaabbc' through the code

1

Initialize variables

compressed = '', count = 1

2

Compare characters at index 1 and 0

s[1] = 'a', s[0] = 'a', same -> count = 2

3

Compare characters at index 2 and 1

s[2] = 'a', s[1] = 'a', same -> count = 3

4

Compare characters at index 3 and 2

s[3] = 'b', s[2] = 'a', different -> compressed = 'a3', count = 1

5

Compare characters at index 4 and 3

s[4] = 'b', s[3] = 'b', same -> count = 2

6

Compare characters at index 5 and 4

s[5] = 'c', s[4] = 'b', different -> compressed = 'a3b2', count = 1

7

End of string reached

Add last char and count: compressed = 'a3b2c1'

IndexCurrent CharPrevious CharCountCompressed String
1aa2
2aa3
3ba1a3
4bb2a3
5cb1a3b2
💡

Why This Works

Step 1: Counting consecutive characters

The code counts how many times each character repeats in a row using a counter variable.

Step 2: Building the compressed string

When a new character appears, the previous character and its count are added to the result string.

Step 3: Handling the last character

After the loop ends, the last character and its count are added to ensure all characters are included.

🔄

Alternative Approaches

Using itertools.groupby
python
from itertools import groupby

def compress_string_itertools(s):
    return ''.join(char + str(len(list(group))) for char, group in groupby(s))

print(compress_string_itertools('aaabbc'))
This method is shorter and uses Python's built-in grouping but may be less clear for beginners.
Using regular expressions
python
import re

def compress_string_regex(s):
    return ''.join(f'{m.group(0)[0]}{len(m.group(0))}' for m in re.finditer(r'(.)\1*', s))

print(compress_string_regex('aaabbc'))
This uses regex to find repeated characters but requires understanding of regular expressions.

Complexity: O(n) time, O(n) space

Time Complexity

The program loops through the string once, so it takes linear time proportional to the string length.

Space Complexity

The compressed string can be at most twice the length of the input (character plus count), so space is linear.

Which Approach is Fastest?

The manual loop and itertools.groupby both run in O(n) time; itertools is concise but may have slight overhead.

ApproachTimeSpaceBest For
Manual loopO(n)O(n)Clear logic and beginner-friendly
itertools.groupbyO(n)O(n)Concise and Pythonic
Regular expressionsO(n)O(n)When comfortable with regex
💡
Use run-length encoding to compress strings by counting repeated characters in a row.
⚠️
Forgetting to add the last character and its count after the loop ends.