0
0
Pandasdata~15 mins

sort_values() by multiple columns in Pandas - Deep Dive

Choose your learning style9 modes available
Overview - sort_values() by multiple columns
What is it?
The sort_values() function in pandas lets you arrange rows in a table (DataFrame) based on the values in one or more columns. When sorting by multiple columns, it orders the data first by the first column, then breaks ties using the second column, and so on. This helps organize data clearly when one column alone isn't enough to decide order. It's like sorting a list of people first by last name, then by first name.
Why it matters
Without the ability to sort by multiple columns, data would often be disorganized or only partially ordered, making it hard to find patterns or compare entries. For example, if you want to see sales data sorted by region and then by date, sorting by just one column won't give you the full picture. This function helps you cleanly organize complex data, which is essential for analysis, reporting, and decision-making.
Where it fits
Before learning sort_values() by multiple columns, you should understand pandas DataFrames and how to select columns. After this, you can explore grouping data, filtering, and advanced sorting techniques like sorting with custom keys or sorting by index.
Mental Model
Core Idea
Sorting by multiple columns means ordering data first by one column, then using the next columns to break ties in order.
Think of it like...
Imagine sorting a stack of mail first by city, then by street, and finally by house number. You organize broadly first, then refine the order step by step.
DataFrame rows sorted by columns:

┌─────────────┬───────────┬───────────┐
│ Column A   │ Column B  │ Column C  │
├─────────────┼───────────┼───────────┤
│ A1         │ B1        │ C1        │
│ A1         │ B2        │ C2        │
│ A2         │ B1        │ C3        │
│ A2         │ B2        │ C4        │
└─────────────┴───────────┴───────────┘

Sorted first by Column A, then Column B.
Build-Up - 6 Steps
1
FoundationUnderstanding DataFrame Sorting Basics
🤔
Concept: Learn how to sort a DataFrame by a single column using sort_values().
In pandas, you can sort a DataFrame by one column using df.sort_values('column_name'). This rearranges rows so that values in that column go from smallest to largest by default. For example, sorting a list of students by their age.
Result
The DataFrame rows are reordered so the chosen column is sorted ascending.
Knowing how to sort by one column is the foundation for organizing data and is the first step before sorting by multiple columns.
2
FoundationSelecting Multiple Columns for Sorting
🤔
Concept: Learn how to specify more than one column to sort by passing a list to sort_values().
You can pass a list of column names to sort_values(), like df.sort_values(['col1', 'col2']). This tells pandas to sort first by col1, then by col2 to break ties. For example, sorting employees first by department, then by salary.
Result
Rows are ordered by the first column, then by the second within groups sharing the same first column value.
Sorting by multiple columns lets you organize data hierarchically, which is common in real-world datasets.
3
IntermediateControlling Sort Order per Column
🤔Before reading on: Do you think you can sort some columns ascending and others descending in one call? Commit to yes or no.
Concept: Learn to specify ascending or descending order separately for each column using the ascending parameter.
The ascending parameter accepts a list of booleans matching the columns list. For example, df.sort_values(['col1', 'col2'], ascending=[True, False]) sorts col1 ascending and col2 descending. This is useful when you want to sort by date ascending but by score descending.
Result
Data is sorted with mixed ascending/descending orders per column as specified.
Knowing you can mix sort directions per column gives precise control over data ordering.
4
IntermediateHandling Missing Values in Sorting
🤔Before reading on: Do you think missing values appear at the start or end by default? Commit to your answer.
Concept: Learn how pandas places missing values (NaNs) during sorting and how to control their position.
By default, missing values appear at the end when sorting ascending. You can change this with the na_position parameter: na_position='first' puts NaNs at the start. For example, df.sort_values('col', na_position='first').
Result
Missing values are placed as specified, affecting the final order.
Handling missing data placement prevents confusion and ensures meaningful sorted results.
5
AdvancedSorting with Custom Sort Keys
🤔Before reading on: Can you apply a function to transform column values before sorting? Commit to yes or no.
Concept: Learn to use the key parameter to apply a function to column values before sorting.
The key parameter accepts a function applied to the column(s) before sorting. For example, df.sort_values('col', key=lambda x: x.str.lower()) sorts strings ignoring case. This allows custom sorting logic without changing the original data.
Result
Data is sorted based on transformed values, enabling flexible ordering.
Using key functions unlocks powerful custom sorting beyond default behavior.
6
ExpertPerformance Considerations in Multi-Column Sorting
🤔Before reading on: Do you think sorting by many columns is always fast? Commit to yes or no.
Concept: Understand how sorting by multiple columns affects performance and memory, and how pandas optimizes this.
Sorting by multiple columns requires pandas to compare rows on each column in order. This can be slower with many columns or large data. Internally, pandas uses efficient algorithms but large datasets may need careful memory management or alternative approaches like sorting on indexes or using categorical data.
Result
Sorting remains correct but may be slower or use more memory with many columns or large data.
Knowing performance tradeoffs helps you write efficient data processing pipelines and avoid slowdowns.
Under the Hood
When you call sort_values() with multiple columns, pandas creates a sorting key by combining the columns in the order given. It compares rows first by the first column's values; if two rows tie, it compares the second column, and so on. Internally, pandas uses a stable sorting algorithm to maintain order where values are equal. It also handles missing values by placing them according to the na_position parameter. If ascending is a list, pandas applies the direction per column during comparisons.
Why designed this way?
This design matches how humans naturally sort complex data: by primary criteria, then secondary, etc. It balances flexibility and performance. Earlier pandas versions only supported single-column sorting or less flexible multi-column sorting. The current design allows clear, explicit control over order and missing values, making it easier to write readable and correct code.
┌─────────────────────────────┐
│ sort_values() called with    │
│ columns = [col1, col2, ...] │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Create sorting keys from     │
│ each column in order         │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Compare rows by col1 values  │
│ If tie, compare col2 values  │
│ Repeat for all columns       │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Apply ascending/descending   │
│ order per column             │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Place missing values per     │
│ na_position parameter        │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Return sorted DataFrame      │
└─────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does sort_values() modify the original DataFrame by default? Commit to yes or no.
Common Belief:Calling sort_values() changes the original DataFrame in place.
Tap to reveal reality
Reality:By default, sort_values() returns a new sorted DataFrame and does not modify the original unless you set inplace=True.
Why it matters:Assuming in-place modification can cause bugs where the original data remains unsorted, leading to incorrect analysis or results.
Quick: If you sort by multiple columns, does the order of columns in the list matter? Commit to yes or no.
Common Belief:The order of columns in the list passed to sort_values() does not affect the final sorting order.
Tap to reveal reality
Reality:The order absolutely matters; pandas sorts first by the first column, then uses the second column to break ties, and so on.
Why it matters:Ignoring column order can lead to unexpected sorting results and confusion when analyzing data.
Quick: When sorting with ascending=[True, False], does pandas apply ascending order to all columns? Commit to yes or no.
Common Belief:The ascending parameter applies the same order to all columns regardless of list input.
Tap to reveal reality
Reality:If ascending is a list, pandas applies each boolean to the corresponding column, allowing mixed ascending and descending sorts.
Why it matters:Misunderstanding this leads to incorrect data order, especially when mixed sorting is needed.
Quick: Are missing values always sorted at the end regardless of parameters? Commit to yes or no.
Common Belief:Missing values (NaNs) always appear at the end when sorting.
Tap to reveal reality
Reality:You can control missing value placement with na_position='first' or 'last'.
Why it matters:Not knowing this can cause misinterpretation of sorted data, especially when missing data is important.
Expert Zone
1
When sorting by multiple columns, pandas uses a stable sort algorithm, which preserves the order of equal elements from previous sorts. This allows chaining sorts by sorting on the last column first and the first column last as an alternative approach.
2
Using categorical data types for columns before sorting can greatly improve performance and memory usage, especially for large datasets with repeated values.
3
The key parameter can accept functions that transform multiple columns at once by passing a function that returns a DataFrame or array, enabling complex custom sorting logic.
When NOT to use
Avoid using sort_values() with many columns on very large datasets if performance is critical; instead, consider sorting on indexes or using database engines optimized for sorting. Also, if you need to sort by complex custom rules, sometimes pre-processing data or using specialized libraries is better.
Production Patterns
In production, sort_values() by multiple columns is often used in data pipelines to prepare data for reporting or machine learning. It is common to sort by timestamp and category to ensure consistent order. Also, sorting is combined with grouping and filtering to create clean, ordered subsets for analysis.
Connections
SQL ORDER BY clause
sort_values() by multiple columns is the pandas equivalent of SQL's ORDER BY with multiple columns.
Understanding SQL ORDER BY helps grasp how multi-column sorting works in pandas and vice versa, bridging database and Python data analysis.
Lexicographic ordering in mathematics
Multi-column sorting follows lexicographic order, comparing elements sequentially like words in a dictionary.
Knowing lexicographic order explains why sorting by multiple columns works by comparing columns in sequence.
Library book cataloging
Sorting books by author, then title mirrors sorting data by multiple columns.
This real-world system shows how hierarchical sorting organizes complex information efficiently.
Common Pitfalls
#1Assuming sort_values() changes the original DataFrame without inplace=True.
Wrong approach:df.sort_values(['col1', 'col2']) print(df) # Still unsorted
Correct approach:df_sorted = df.sort_values(['col1', 'col2']) print(df_sorted) # Sorted output
Root cause:Misunderstanding that sort_values() returns a new DataFrame by default.
#2Passing ascending as a single boolean when sorting multiple columns needing mixed order.
Wrong approach:df.sort_values(['col1', 'col2'], ascending=True) # Both ascending
Correct approach:df.sort_values(['col1', 'col2'], ascending=[True, False]) # Mixed order
Root cause:Not realizing ascending can be a list matching columns for different sort directions.
#3Ignoring missing value placement leading to confusing sorted results.
Wrong approach:df.sort_values('col') # NaNs always at end, unexpected
Correct approach:df.sort_values('col', na_position='first') # NaNs at start as needed
Root cause:Not knowing about na_position parameter to control NaN placement.
Key Takeaways
sort_values() lets you order pandas DataFrames by one or more columns to organize data clearly.
The order of columns and the ascending parameter list control how data is sorted hierarchically and directionally.
Missing values can be placed at the start or end using na_position, preventing confusion in sorted data.
Advanced features like the key parameter allow custom sorting logic without changing original data.
Understanding performance and internal mechanics helps write efficient, correct data processing code.