
sample() for random rows in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this sample() code?
Given the DataFrame df below, what will df.sample(n=3, random_state=1) return?
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50], 'B': ['a', 'b', 'c', 'd', 'e']})
sample_df = df.sample(n=3, random_state=1)
print(sample_df)
A.
    A  B
2  30  c
0  10  a
1  20  b
B.
    A  B
1  20  b
4  50  e
2  30  c
C.
    A  B
4  50  e
0  10  a
3  40  d
D.
    A  B
3  40  d
1  20  b
0  10  a
💡 Hint
Remember that setting random_state seeds the random number generator, so the same rows are selected in the same order on every run.
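When reading a printed sample, it can help to know that sample() keeps each row's original index label and returns rows in the order they were drawn, not in sorted order. The sketch below uses a hypothetical scores frame (not the df from the problem) to illustrate this; the names scores, picked, ordered, and renumbered are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame with string labels, not the df from the problem above.
scores = pd.DataFrame({"score": [7, 8, 9, 10]}, index=["w", "x", "y", "z"])

picked = scores.sample(n=2, random_state=42)

# Every sampled label comes from the original index; nothing is renumbered.
labels_kept = set(picked.index) <= set(scores.index)

# Each sampled row still holds the value it had in the original frame.
values_match = all(
    scores.loc[label, "score"] == picked.loc[label, "score"]
    for label in picked.index
)

# sort_index() puts the labels back in order;
# reset_index(drop=True) discards them and renumbers from 0.
ordered = picked.sort_index()
renumbered = picked.reset_index(drop=True)

print(picked, ordered, renumbered, sep="\n\n")
```

Whether to sort or renumber afterwards depends on whether the original labels still matter downstream.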
Data Output
intermediate
How many rows are returned by sample() with frac=0.4?
If a DataFrame has 10 rows, what is the number of rows returned by df.sample(frac=0.4, random_state=5)?
import pandas as pd

df = pd.DataFrame({'X': range(10)})
sample_df = df.sample(frac=0.4, random_state=5)
print(len(sample_df))
A. 6
B. 4
C. 3
D. 5
💡 Hint
frac is the fraction of the DataFrame's rows to return, so the row count is frac times the total number of rows.
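As a rough illustration of how frac maps to a row count, here is a minimal sketch on a hypothetical 20-row orders frame (different from the problem's data). The names orders, quarter, and everything are assumptions for the example.

```python
import pandas as pd

# Hypothetical 20-row frame, not the one from the problem above.
orders = pd.DataFrame({"order_id": range(20)})

# A quarter of the rows.
quarter = orders.sample(frac=0.25, random_state=0)

# frac=1 returns every row exactly once, in shuffled order.
everything = orders.sample(frac=1, random_state=0)

print(len(quarter))     # 20 * 0.25 = 5 rows
print(len(everything))  # 20 rows
```

Using frac=1 this way is a common idiom for shuffling a DataFrame.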
🔧 Debug
advanced
What error does this sample() code raise?
What error will this code produce?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
sample_df = df.sample(n=5)
A. KeyError: 'n'
B. TypeError: sample() got an unexpected keyword argument 'n'
C. No error, returns 5 rows with NaNs
D. ValueError: Cannot take a larger sample than population when 'replace=False'
💡 Hint
Check whether the requested sample size exceeds the number of rows in the DataFrame when sampling without replacement.
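One way to avoid asking for more rows than exist is to cap the request before calling sample(). The helper below, safe_sample, is a hypothetical function written for this sketch and not part of pandas; it assumes that silently returning fewer rows is acceptable for the caller.

```python
import pandas as pd


def safe_sample(df, n, random_state=None):
    """Hypothetical helper: cap n at the number of rows available."""
    n = min(n, len(df))
    return df.sample(n=n, random_state=random_state)


# Hypothetical 4-row frame, not the one from the problem above.
small = pd.DataFrame({"value": [1, 2, 3, 4]})

two_rows = safe_sample(small, 2, random_state=0)
capped = safe_sample(small, 10, random_state=0)

print(len(two_rows))  # 2
print(len(capped))    # 4, because the request for 10 was capped
```

If a short sample would hide a real problem, checking len(df) and raising a clear error may be the better design.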
🚀 Application
advanced
Which code produces a random sample with replacement?
You want to randomly select 4 rows from a DataFrame of 3 rows, allowing repeats. Which code does this?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
A. df.sample(frac=1.33, random_state=0)
B. df.sample(n=4, replace=False, random_state=0)
C. df.sample(n=4, replace=True, random_state=0)
D. df.sample(n=4)
💡 Hint
Sampling with replacement allows the same row to be drawn more than once, so the sample can be larger than the original DataFrame.
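To see what repeated draws look like in practice, here is a small sketch on a hypothetical two-row coins frame (not the df from the problem). The names coins and flips are assumptions for the example.

```python
import pandas as pd

# Hypothetical 2-row frame, not the one from the problem above.
coins = pd.DataFrame({"side": ["heads", "tails"]})

# Six draws from two rows: each draw puts the row back before the next one.
flips = coins.sample(n=6, replace=True, random_state=0)

print(len(flips))                  # 6
print(flips.index.has_duplicates)  # True: six draws from two labels must repeat
```

Because index labels repeat in such a sample, calling reset_index(drop=True) afterwards is often convenient if unique labels are needed.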
🧠 Conceptual
expert
What is the effect of setting random_state in sample()?
Why do we set the random_state parameter in df.sample()?
A. To ensure the sample is the same every time the code runs
B. To increase the sample size automatically
C. To speed up the sampling process
D. To sort the sampled rows by their index
💡 Hint
Think about reproducibility in random operations.
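A quick way to explore this is to draw twice with the same seed and compare the results. The sketch below assumes a hypothetical readings frame; the names readings, first, second, and unseeded are made up for the example.

```python
import pandas as pd

# Hypothetical frame of temperature readings.
readings = pd.DataFrame({"temp": [18.5, 19.0, 21.2, 20.1, 17.8, 22.4]})

first = readings.sample(n=3, random_state=123)
second = readings.sample(n=3, random_state=123)

# Identical seeds give identical rows in identical order.
print(first.equals(second))  # True

# Without random_state, the rows drawn may change from run to run.
unseeded = readings.sample(n=3)
print(unseeded)
```

The seed value itself (123 here) is arbitrary; what matters is reusing the same one.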