How to Use loc in pandas for Data Selection and Filtering
In pandas, loc is used to select rows and columns by their labels. You can specify row labels, column labels, or both inside df.loc[row_label, column_label] to get or set data based on index names.
Syntax
The basic syntax of loc is df.loc[row_indexer, column_indexer]. Here:
- row_indexer: label(s) of the row(s) you want to select.
- column_indexer: label(s) of the column(s) you want to select.
You can use single labels, lists of labels, slices, or boolean arrays for both row and column indexers.
```python
df.loc[row_label, column_label]
```
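The indexer types listed above can be sketched as follows. This is a minimal illustration rather than an exhaustive tour; the DataFrame and the variable names (single, by_list, by_slice, by_mask) are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["NY", "LA", "Chicago"]},
    index=["a", "b", "c"],
)

# Single row label and single column label: returns a scalar
single = df.loc["a", "Age"]

# Lists of labels: returns a DataFrame, even for a one-column list
by_list = df.loc[["a", "c"], ["City"]]

# Label slices on both axes: both end labels are included
by_slice = df.loc["a":"b", "Age":"City"]

# Boolean Series as the row indexer, single label as the column indexer
by_mask = df.loc[df["Age"] > 28, "City"]

print(single)
print(by_list)
print(by_slice)
print(by_mask)
```

Note that a list of labels always keeps the result two-dimensional, while a single label reduces that axis.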
Example
This example shows how to select specific rows and columns using loc. It demonstrates selecting a single row, multiple rows, and specific columns by their labels.
```python
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['NY', 'LA', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

# Select row with label 'b'
row_b = df.loc['b']

# Select rows 'a' to 'c' and columns 'Name' and 'Age'
subset = df.loc['a':'c', ['Name', 'Age']]

print("Row with label 'b':")
print(row_b)
print("\nSubset of rows 'a' to 'c' and columns 'Name' and 'Age':")
print(subset)
```
Output
```
Row with label 'b':
Name    Bob
Age      30
City     LA
Name: b, dtype: object

Subset of rows 'a' to 'c' and columns 'Name' and 'Age':
      Name  Age
a    Alice   25
b      Bob   30
c  Charlie   35
```
Common Pitfalls
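One common mistake is confusing loc with iloc: loc selects by label, while iloc selects by integer position. An integer passed to loc is treated as a label, so it only works when the index actually contains that integer. Another pitfall is requesting a label that does not exist, which raises a KeyError. Finally, slicing with loc includes the end label, unlike Python's usual half-open slicing.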
```python
import pandas as pd

data = {'Value': [10, 20, 30]}
df = pd.DataFrame(data, index=['x', 'y', 'z'])

# Wrong: 1 is a position, not a label in this index, so loc raises KeyError
try:
    print(df.loc[1])
except KeyError as e:
    print(f"KeyError: {e}")

# Right: use label with loc
print(df.loc['y'])

# Slicing includes end label
print(df.loc['x':'y'])
```
Output
```
KeyError: 1
Value    20
Name: y, dtype: int64
   Value
x     10
y     20
```
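The difference between loc and iloc is easiest to see when the index holds integers that do not match the row positions. The sketch below assumes a hypothetical DataFrame with a shuffled integer index. It also shows one possible way to avoid a KeyError by filtering the requested labels against the index first; that is an illustrative choice, not the only approach.

```python
import pandas as pd

# Hypothetical data: integer labels deliberately differ from positions
df = pd.DataFrame({"Value": [10, 20, 30]}, index=[2, 0, 1])

# loc treats 0 as a label: the row labeled 0 is the second row
by_label = df.loc[0, "Value"]

# iloc treats 0 as a position: the first row, first column
by_position = df.iloc[0, 0]

# Guard against missing labels: keep only the labels present in the index
wanted = [0, 5]  # label 5 does not exist
present = [label for label in wanted if label in df.index]
safe = df.loc[present]

print(by_label, by_position)
print(safe)
```

Here loc and iloc return different rows for the same integer, which is why mixing them up can go unnoticed on integer-indexed data.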
Quick Reference
| Usage | Description | Example |
|---|---|---|
| Select single row by label | Returns a Series for the row | df.loc['row_label'] |
| Select multiple rows by labels | Returns DataFrame for rows | df.loc[['row1', 'row2']] |
| Select rows and columns | Returns subset DataFrame | df.loc['row1':'row3', ['col1', 'col2']] |
| Boolean indexing | Select rows where condition is True | df.loc[df['Age'] > 30] |
| Set values | Assign new values to subset | df.loc['a', 'Age'] = 26 |
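The last two rows of the table, boolean indexing and setting values, can be combined. The following is a minimal sketch that reuses the Name and Age columns from the earlier example; the conditional update at the end is an illustrative addition, not something taken from the table.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 35, 40]},
    index=["a", "b", "c", "d"],
)

# Boolean indexing: keep rows where the condition is True
over_30 = df.loc[df["Age"] > 30]

# Set a single cell by row label and column label
df.loc["a", "Age"] = 26

# Illustrative conditional update: add 1 to Age only where Age > 30
df.loc[df["Age"] > 30, "Age"] += 1

print(over_30)
print(df)
```

Assigning through a single loc call with both row and column indexers updates the original DataFrame directly.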
Key Takeaways
- Use df.loc[row_label, column_label] to select data by labels in pandas.
- loc includes the end label when slicing rows or columns.
- Pass labels to loc, not integer positions; an integer works only when it is an actual index label (use iloc for positions).
- You can select single or multiple rows and columns with loc.
- loc supports boolean conditions to filter rows easily.