What is the output of this Python code that groups data and calculates the sum?
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'C'], 'Value': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
result = df.groupby('Category').sum()
print(result)
Remember that groupby with sum() returns a DataFrame indexed by the grouping column.
The call groupby('Category').sum() groups rows by 'Category' and sums the 'Value' column within each group. The result is a DataFrame indexed by 'Category' with a single 'Value' column: A is 40 (10 + 30), B is 60 (20 + 40), and C is 50.
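As a small sketch of the point about indexing, the snippet below reuses the data from the question and compares the default result with the `as_index=False` variant, which keeps 'Category' as an ordinary column. The variable names `totals`, `flat`, and `flat_columns` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "C"],
    "Value": [10, 20, 30, 40, 50],
})

# Default behaviour: the grouping column becomes the index of the result.
result = df.groupby("Category").sum()
totals = result["Value"].to_dict()

# as_index=False keeps 'Category' as a regular column instead of the index.
flat = df.groupby("Category", as_index=False).sum()
flat_columns = list(flat.columns)

print(result)
print(flat)
```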
After filtering a DataFrame with multiple conditions, how many unique 'Type' values remain?
import pandas as pd

data = {'Type': ['X', 'Y', 'X', 'Z', 'Y', 'Z', 'X'], 'Score': [5, 10, 15, 20, 25, 30, 35]}
df = pd.DataFrame(data)
filtered = df[(df['Score'] > 10) & (df['Type'] != 'Z')]
unique_count = filtered['Type'].nunique()
print(unique_count)
Check which rows satisfy both conditions and count distinct 'Type' values.
Rows with 'Score' > 10 and 'Type' not 'Z' are: (X,15), (Y,25), (X,35). Unique 'Type' values are 'X' and 'Y', so count is 2.
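One way to check this reasoning is to build the two conditions as separate boolean masks and inspect which rows survive. This is a minimal sketch on the question's data; the names `mask_score`, `mask_type`, and `kept` are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["X", "Y", "X", "Z", "Y", "Z", "X"],
    "Score": [5, 10, 15, 20, 25, 30, 35],
})

# Each condition yields a boolean Series aligned with the rows of df.
mask_score = df["Score"] > 10   # strictly greater, so Score == 10 is excluded
mask_type = df["Type"] != "Z"

# & combines the masks element-wise; both must be True for a row to be kept.
filtered = df[mask_score & mask_type]
kept = list(zip(filtered["Type"], filtered["Score"]))

unique_count = filtered["Type"].nunique()
print(kept, unique_count)
```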
Which plot correctly visualizes the relationship between three variables in the dataset?
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = {'X': [1, 2, 3, 4, 5], 'Y': [5, 4, 3, 2, 1], 'Z': [2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
plt.figure(figsize=(6, 4))
sns.scatterplot(data=df, x='X', y='Y', hue='Z', palette='viridis')
plt.show()
Look for a plot that shows two variables on axes and uses color for the third.
The scatter plot encodes three variables: X and Y on the axes, and Z through point color, set by the hue parameter and drawn from the viridis palette.
What error does this code raise when merging two DataFrames with no common columns specified?
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})
merged = pd.merge(df1, df2)
print(merged)
Check what happens if you merge without specifying keys and no common columns exist.
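Since df1 and df2 share no column names and no keys are given, merge() cannot infer a join key. It raises pandas.errors.MergeError, a subclass of ValueError, with a message beginning "No common columns to perform merge on".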
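The behaviour can be confirmed by catching the exception, as in the sketch below. It also shows one possible alternative when no shared key exists: a cross join with `how="cross"`, assuming pandas 1.2 or later. Whether a cross join is appropriate depends on the intent, so treat it as an illustration rather than the recommended fix.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"C": [5, 6], "D": [7, 8]})

error_name = None
is_value_error = False
try:
    # No 'on', 'left_on', or 'right_on', and no shared column names.
    pd.merge(df1, df2)
except MergeError as exc:
    error_name = type(exc).__name__
    # MergeError derives from ValueError, so 'except ValueError' also catches it.
    is_value_error = isinstance(exc, ValueError)

# A cross join pairs every row of df1 with every row of df2 (2 x 2 = 4 rows).
cross = pd.merge(df1, df2, how="cross")
print(error_name, is_value_error)
print(cross)
```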
You have a large dataset with missing values scattered in multiple columns. Which method is best to handle missing data while preserving as much information as possible?
Think about methods that use patterns in data to estimate missing values rather than simple replacements.
KNN imputation estimates each missing value from the most similar rows. This preserves relationships between columns better than filling with the mean or zero, and it retains more data than dropping rows.
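A minimal sketch of KNN imputation follows, assuming scikit-learn is installed and using its `KNNImputer`. The `height` and `weight` columns and their values are hypothetical data made up for this example, and `n_neighbors=2` is an arbitrary choice for such a small table.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical numeric data with one missing value in each column.
df = pd.DataFrame({
    "height": [150.0, 160.0, np.nan, 180.0, 190.0],
    "weight": [50.0, 60.0, 70.0, np.nan, 90.0],
})

# Each missing entry is replaced by the mean of that column over the
# n_neighbors rows closest on the features that are present.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(imputed)
```

KNNImputer works on numeric features only, and because it relies on distances, scaling the columns first is usually worth considering on real data.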