
How to Use itertools.groupby in Python: Simple Guide

Use itertools.groupby to group consecutive items in an iterable by a key function. It yields (key, group) pairs in which each group is an iterator. Because only adjacent items are grouped together, sort the input by the same key first when you want a single group per key.

Syntax

The itertools.groupby function groups consecutive elements in an iterable based on a key function.

It returns an iterator of pairs: each pair has a key and a group iterator of items matching that key.

Basic syntax:

  • groupby(iterable, key=None)

iterable: The data to group.

key: A function that computes the grouping key for each element. Defaults to None, in which case each element is its own key.

python
import itertools

# groupby syntax (sample input; any iterable works)
iterable = [1, 1, 2, 3, 3]
groups = itertools.groupby(iterable, key=None)

for key, group in groups:
    # key is the grouping key
    # group is an iterator of grouped items
    pass
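
When key is left as None, elements are compared directly, so each run of equal neighbours becomes one group. As a minimal sketch of that default, the helper below (run_length_encode is a hypothetical name, not part of itertools) counts the length of each run:

```python
import itertools

def run_length_encode(values):
    """Return (value, count) pairs for each run of equal consecutive values."""
    # No key argument: groupby compares the elements themselves.
    return [(value, sum(1 for _ in run)) for value, run in itertools.groupby(values)]

encoded = run_length_encode("aaabccdd")
print(encoded)  # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
```

Note that a value that reappears later in the input starts a new run instead of joining the earlier one.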

Example

This example groups a sorted list of fruits by their first letter.

It shows how groupby returns keys and groups of items.

python
import itertools

fruits = ['apple', 'apricot', 'banana', 'blueberry', 'cherry', 'clementine']

# Sort by first letter to group correctly
fruits.sort(key=lambda x: x[0])

groups = itertools.groupby(fruits, key=lambda x: x[0])

for letter, group in groups:
    print(f"{letter}: {list(group)}")
Output
a: ['apple', 'apricot']
b: ['banana', 'blueberry']
c: ['cherry', 'clementine']
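
If you want to keep the groups around instead of printing them, one option is to sort and collect them into a dictionary in a single step. This is only a sketch; group_into_dict is a hypothetical helper name, and it assumes the input fits in memory:

```python
import itertools

def group_into_dict(items, key):
    """Sort items by key, then collect each group into a list."""
    # sorted() returns a new list, so the caller's data is left untouched.
    ordered = sorted(items, key=key)
    return {k: list(group) for k, group in itertools.groupby(ordered, key=key)}

fruits = ['cherry', 'apple', 'banana', 'apricot', 'clementine', 'blueberry']
by_letter = group_into_dict(fruits, key=lambda name: name[0])
print(by_letter)
```

Passing the same key to both sorted and groupby is what guarantees one entry per first letter.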

Common Pitfalls

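1. To get one group per key, the input must be sorted by the same key function. Otherwise, groupby only groups consecutive matching items, not all matching items.

2. Each group is an iterator that shares the underlying iterable with groupby, so consume it (or copy it with list(group)) before moving to the next group. If you try to use it later, it will be empty.

3. Using groupby on unsorted data does not raise an error; the same key simply appears several times, once per run, as the example below shows.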

python
import itertools

data = ['apple', 'banana', 'apricot', 'blueberry']

# Wrong: data not sorted by first letter
groups = itertools.groupby(data, key=lambda x: x[0])
for key, group in groups:
    print(f"{key}: {list(group)}")

# Correct: sort data first
print('\nAfter sorting:')
data.sort(key=lambda x: x[0])
groups = itertools.groupby(data, key=lambda x: x[0])
for key, group in groups:
    print(f"{key}: {list(group)}")
Output
a: ['apple']
b: ['banana']
a: ['apricot']
b: ['blueberry']

After sorting:
a: ['apple', 'apricot']
b: ['banana', 'blueberry']
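
The second pitfall, reading a group after groupby has moved on, can be sketched the same way. The variable names here (lazy, stale, kept) are illustrative only:

```python
import itertools

data = ['apple', 'apricot', 'banana']
first_letter = lambda x: x[0]

# Storing the group iterators and reading them later loses the items:
# by the time this comprehension finishes, groupby has advanced past them.
lazy = [(key, group) for key, group in itertools.groupby(data, key=first_letter)]
stale = {key: list(group) for key, group in lazy}

# Copying each group into a list while it is current keeps the items.
kept = {key: list(group) for key, group in itertools.groupby(data, key=first_letter)}

print(stale)
print(kept)
```

On recent Python versions the stored groups come back empty, while kept holds the full lists, which is why copying with list(group) inside the loop is the safer habit.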

Quick Reference

  • Input: Any iterable; sort it by the key function first to get one group per key.
  • Output: Iterator of (key, group) pairs.
  • Group: An iterator of items with the same key.
  • Key function: Defaults to identity if not provided.
  • Use case: Group consecutive items sharing a property.
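
The consecutive-grouping use case also covers inputs where sorting would be wrong because order matters. As a small sketch, the hypothetical helper split_paragraphs below splits a list of lines into blocks separated by blank lines:

```python
import itertools

def split_paragraphs(lines):
    """Split lines into blocks separated by one or more blank lines."""
    blocks = []
    # The key is True for non-blank lines, so each run of text lines
    # forms one group and each run of blank lines forms another.
    for has_text, run in itertools.groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            blocks.append(list(run))
    return blocks

lines = ["first", "second", "", "", "third", "", "fourth"]
print(split_paragraphs(lines))  # [['first', 'second'], ['third'], ['fourth']]
```

Here the data is deliberately left unsorted: the repeated True and False keys are exactly what separates one block from the next.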

Key Takeaways

Sort your data by the key function before using itertools.groupby whenever you need exactly one group per key.
groupby groups only consecutive items with the same key, not all items in the iterable.
The groups returned are iterators that share the input with groupby; consume or copy each one before advancing to the next.
Use a key function to define how items are grouped; the default groups by the item itself.
itertools.groupby processes already-sorted data lazily in a single pass, without building all the groups in memory at once.