0
0
Pandasdata~30 mins

Data validation checks in Pandas - Mini Project: Build & Apply

Choose your learning style9 modes available
Data Validation Checks with pandas
📖 Scenario: You work as a data analyst for a small retail company. You receive daily sales data in tables. Before using the data for reports, you need to check if the data is clean and valid.For example, sales numbers should not be negative, and product names should not be empty.
🎯 Goal: Build a simple data validation check using pandas to find invalid sales records.You will create a DataFrame, set a validation rule, filter invalid rows, and print them.
📋 What You'll Learn
Create a pandas DataFrame with given sales data
Create a validation threshold variable for minimum valid sales
Use a filter to find rows where sales are below the threshold
Print the invalid rows
💡 Why This Matters
🌍 Real World
Data validation is essential in real-world data analysis to ensure reports and decisions are based on clean and accurate data.
💼 Career
Data analysts and scientists often perform validation checks to catch errors early and maintain data quality.
Progress0 / 4 steps
1
Create the sales data DataFrame
Import pandas as pd and create a DataFrame called sales_data with these exact entries:
{'Product': ['Apples', 'Bananas', 'Cherries', 'Dates'], 'Sales': [50, -10, 30, 0]}
Pandas
Need a hint?

Use pd.DataFrame() with a dictionary containing the product names and sales numbers.

2
Set the minimum valid sales threshold
Create a variable called min_valid_sales and set it to 1 to represent the minimum valid sales number.
Pandas
Need a hint?

This variable will help us check if sales are valid (1 or more).

3
Filter invalid sales records
Create a new DataFrame called invalid_sales by filtering sales_data to include only rows where the Sales value is less than min_valid_sales.
Pandas
Need a hint?

Use boolean indexing with sales_data['Sales'] < min_valid_sales to filter rows.

4
Print the invalid sales records
Print the invalid_sales DataFrame to display the rows with invalid sales.
Pandas
Need a hint?

Use print(invalid_sales) to show the filtered rows.