How to Find Top N Values in pandas DataFrame or Series
Use the nlargest(n) method on a pandas Series or DataFrame column to find the top n values. For DataFrames, specify the column name with nlargest(n, 'column_name') to get the rows with the highest values in that column.
Syntax
The nlargest() method is used to get the top n values from a pandas Series or DataFrame.
- Series.nlargest(n): Returns the top n values from a Series.
- DataFrame.nlargest(n, columns): Returns the top n rows ordered by the specified column.
Parameters:
- n: Number of top values to return.
- columns: Column name to sort by (only for DataFrame).
python
series.nlargest(n)
dataframe.nlargest(n, 'column_name')
Example
This example shows how to find the top 3 values in a pandas Series and the top 2 rows with the highest values in a DataFrame column.
python
import pandas as pd

# Create a Series
series = pd.Series([10, 50, 30, 20, 40])

# Find top 3 values in Series
top3_series = series.nlargest(3)

# Create a DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'], 'score': [85, 92, 88, 91]}
df = pd.DataFrame(data)

# Find top 2 rows by 'score'
top2_df = df.nlargest(2, 'score')

print('Top 3 values in Series:')
print(top3_series)
print('\nTop 2 rows in DataFrame by score:')
print(top2_df)
Output
Top 3 values in Series:
1    50
4    40
2    30
dtype: int64

Top 2 rows in DataFrame by score:
    name  score
1    Bob     92
3  David     91
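The example above has no repeated values. When values tie at the cutoff, the keep parameter of nlargest() decides which rows are returned. The following is a minimal sketch with made-up scores (the names and numbers are illustrative and not part of the example above):

```python
import pandas as pd

# Hypothetical scores with a tie for second place
scores = pd.Series([92, 88, 88, 75],
                   index=['Bob', 'Charlie', 'David', 'Alice'])

# Default keep='first': ties are broken by order of appearance,
# so exactly n rows come back
top2_first = scores.nlargest(2)

# keep='all': every row tied with the last selected value is kept,
# so the result can contain more than n rows
top2_all = scores.nlargest(2, keep='all')

print(top2_first)
print(top2_all)
```

Here the default call returns two rows, while keep='all' returns three because Charlie and David share the same score.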
Common Pitfalls
Common mistakes when using nlargest() include:
- Not specifying the column name when using it on a DataFrame, which causes an error.
- Using sort_values() followed by head(n) instead of nlargest(n), which sorts the entire data set and is usually slower on large data when only a few rows are needed.
- Confusing nlargest() with head(), which just returns the first rows without sorting.
python
import pandas as pd

data = {'name': ['Alice', 'Bob'], 'score': [85, 92]}
df = pd.DataFrame(data)

# Wrong: missing column name
# df.nlargest(1)  # This will raise TypeError

# Right:
top = df.nlargest(1, 'score')
print(top)
Output
   name  score
1   Bob     92
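The other two pitfalls listed above can be seen side by side in a short sketch. It reuses the four-row DataFrame from the earlier example and assumes there are no tied scores, so the sorted and nlargest() results match exactly:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'score': [85, 92, 88, 91],
})

# head(2) returns the first two rows in their current order
first_two = df.head(2)

# nlargest(2, 'score') selects the two rows with the highest scores
top_two = df.nlargest(2, 'score')

# Sorting and slicing gives the same rows, but sorts the whole frame first
sorted_top_two = df.sort_values('score', ascending=False).head(2)

print(first_two['name'].tolist())      # ['Alice', 'Bob']
print(top_two['name'].tolist())        # ['Bob', 'David']
print(top_two.equals(sorted_top_two))  # True
```

head() only looks at row position, so it matches nlargest() only when the data is already sorted in descending order.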
Quick Reference
| Method | Description | Usage Example |
|---|---|---|
| Series.nlargest(n) | Get top n values from a Series | series.nlargest(3) |
| DataFrame.nlargest(n, columns) | Get top n rows by column value | df.nlargest(2, 'score') |
Key Takeaways
Use pandas' nlargest() method to efficiently find top n values in Series or DataFrame columns.
Always specify the column name when using nlargest() on a DataFrame.
nlargest() returns the same rows as sorting in descending order and taking the first n, but it is typically faster and states the intent more clearly.
Avoid calling nlargest() on a DataFrame without a column name to prevent errors.