Removing Duplicates with drop_duplicates() in Python Data Analysis

Datasets often contain repeated rows that add no information. Removing duplicates cleans the data and keeps analysis results accurate.
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
subset lets you choose which columns to check for duplicates. If None (the default), all columns are checked.
keep decides which occurrence to keep: 'first' keeps the first occurrence, 'last' keeps the last, and keep=False removes every row that has a duplicate, so no copy survives.
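A quick sketch of how the three keep options behave on a tiny hypothetical DataFrame (the names and values here are illustrative, not from the article):

```python
import pandas as pd

# Rows 0 and 2 are identical duplicates.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice'],
                   'Age':  [25, 30, 25]})

print(df.drop_duplicates(keep='first'))  # keeps row 0, drops row 2
print(df.drop_duplicates(keep='last'))   # keeps row 2, drops row 0
print(df.drop_duplicates(keep=False))    # drops both Alice rows; only Bob remains
```

With keep=False, notice that the "original" row is removed too, which is useful when a duplicated key signals bad data rather than a harmless repeat.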
df.drop_duplicates()
df.drop_duplicates(subset=['Name', 'Age'])
df.drop_duplicates(keep='last')
df.drop_duplicates(inplace=True)

The code below creates a small table with repeated rows, then removes the duplicates and prints the cleaned table.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Alice', 'David', 'Bob'],
        'Age':  [25, 30, 25, 40, 30],
        'City': ['NY', 'LA', 'NY', 'Chicago', 'LA']}
df = pd.DataFrame(data)

print('Original DataFrame:')
print(df)

# Remove duplicate rows
unique_df = df.drop_duplicates()
print('\nDataFrame after removing duplicates:')
print(unique_df)
By default, drop_duplicates() returns a new DataFrame and does not change the original.
Use inplace=True if you want to modify the original DataFrame directly.
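A minimal sketch contrasting the two behaviors, using a throwaway single-column DataFrame of my own invention:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2]})

# Default: returns a new DataFrame; the original is untouched.
result = df.drop_duplicates()
print(len(df))      # original still has 3 rows
print(len(result))  # new DataFrame has 2 rows

# inplace=True: modifies df itself and returns None.
df.drop_duplicates(inplace=True)
print(len(df))      # now 2 rows
```

Assigning the result to a new variable is generally preferred over inplace=True, since it keeps the original data available for comparison.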
Checking duplicates on specific columns helps when only some columns matter for uniqueness.
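To illustrate, here is a small made-up example where two rows share a Name and Age but differ in City, so they are duplicates only under a subset check:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Alice', 'Bob'],
                   'Age':  [25, 25, 30],
                   'City': ['NY', 'Boston', 'LA']})

# Full-row check: no two rows are identical, so nothing is dropped.
print(len(df.drop_duplicates()))                        # 3 rows

# Check only Name and Age: the two Alice rows now count as duplicates.
print(len(df.drop_duplicates(subset=['Name', 'Age'])))  # 2 rows
```

This pattern is common when a few identifying columns (like a name or an ID) define uniqueness and the remaining columns are allowed to vary.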
Use drop_duplicates() to remove repeated rows from data.
You can choose which columns to check and which duplicates to keep.
Removing duplicates helps make data clean and ready for analysis.