Challenge - 5 Problems
Correlation Mastery
❓ Predict Output
intermediate
What is the output of this correlation calculation?
Given the following DataFrame, what is the output of df.corr()?
Data Analysis Python
import pandas as pd
import numpy as np

data = {'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1], 'C': [2, 3, 2, 3, 2]}
df = pd.DataFrame(data)
print(df.corr())
💡 Hint
Remember that correlation measures how two variables move together. Negative means opposite directions.
Explanation
Column A and B are perfectly negatively correlated (-1.0). Column C does not correlate with A or B (0.0). The diagonal is always 1.0 because a variable correlates perfectly with itself.
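The values in that matrix can be checked by hand. The following is a minimal sketch, assuming Pearson correlation (the default method of df.corr()); the helper name pearson is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [5, 4, 3, 2, 1],
                   'C': [2, 3, 2, 3, 2]})

r_ab = pearson(df['A'], df['B'])  # A rises while B falls
r_ac = pearson(df['A'], df['C'])  # C zigzags with no linear trend against A
corr = df.corr()
print(r_ab, r_ac)
```

The hand-computed values should agree with the corresponding entries of df.corr() up to floating-point rounding.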
❓ data_output
intermediate
How many pairs have correlation above 0.5?
Using the DataFrame below, how many unique pairs of columns have a correlation greater than 0.5?
Data Analysis Python
import pandas as pd
import numpy as np

np.random.seed(0)
data = {'X': np.random.rand(10), 'Y': np.random.rand(10), 'Z': np.random.rand(10)}
df = pd.DataFrame(data)
corr_matrix = df.corr()

# Count pairs with correlation > 0.5 (excluding self-correlation)
count = 0
for i in corr_matrix.columns:
    for j in corr_matrix.columns:
        if i < j and corr_matrix.loc[i, j] > 0.5:
            count += 1
print(count)
💡 Hint
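Independent random columns have zero correlation in expectation. The seed only makes the draw reproducible; it does not change how correlated the columns are. With few samples, chance correlations can still be sizable.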
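Explanation
The columns X, Y, Z are drawn independently, so their population correlations are zero. With only 10 samples per column, however, sample correlations can land far from zero by chance, so the count for this seed is not guaranteed to be 0. Run the code to confirm the value rather than assuming it.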
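The nested loop in the problem visits each pair once by comparing column labels with i < j. A sketch of an alternative that does not depend on label ordering uses itertools.combinations; the helper name count_pairs_above and the demo data are made up for illustration and are not the seeded data from the problem.

```python
from itertools import combinations

import pandas as pd

def count_pairs_above(df, threshold=0.5):
    """Count unique column pairs whose correlation exceeds the threshold."""
    corr = df.corr()
    # combinations() yields each unordered pair exactly once and never (a, a)
    return sum(1 for a, b in combinations(corr.columns, 2)
               if corr.loc[a, b] > threshold)

# Illustrative data: B is a multiple of A, C runs opposite to both
demo = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [2, 4, 6, 8, 10],
                     'C': [5, 4, 3, 2, 1]})

n_pairs = count_pairs_above(demo)
print(n_pairs)
```

Here only the pair (A, B) is positively correlated, so it is the only one counted.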
🔧 Debug
advanced
What error does this code raise?
What error will this code raise when trying to compute correlation?
Data Analysis Python
import pandas as pd

data = {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
df = pd.DataFrame(data)
print(df.corr())
💡 Hint
Correlation only works on numeric columns. Whether non-numeric columns are ignored or rejected depends on the numeric_only argument and its default in your pandas version.
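Explanation
The result depends on the pandas version. In pandas 2.0 and later, numeric_only defaults to False, so df.corr() tries to convert column B to float and raises a ValueError (could not convert string to float: 'x'). Before 2.0, pandas excluded non-numeric columns automatically, so no error occurred and the result was a correlation matrix for the numeric columns only. Pass numeric_only=True to request that behavior explicitly.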
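One way to get a numeric-only correlation matrix regardless of the installed pandas version is to select the numeric columns explicitly before calling corr(). This is a minimal sketch using the DataFrame from the problem.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

# Keep only numeric columns so corr() never sees the strings in B
numeric = df.select_dtypes(include='number')
corr = numeric.corr()
print(corr)
```

On recent pandas versions that accept the argument, df.corr(numeric_only=True) is a shorter spelling of the same idea.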
🚀 Application
advanced
Which column pair has the strongest positive correlation?
Given the DataFrame below, which pair of columns has the strongest positive correlation?
Data Analysis Python
import pandas as pd

data = {'P': [1, 2, 3, 4, 5], 'Q': [2, 4, 6, 8, 10], 'R': [5, 4, 3, 2, 1], 'S': [1, 3, 1, 7, 9]}
df = pd.DataFrame(data)
corr = df.corr()
print(corr)
💡 Hint
Look for pairs where one column is a positive constant multiple of the other.
Explanation
Column Q is exactly 2 times column P, so their correlation is 1.0 (perfect positive). P and Q each correlate with S at about 0.87, and every pair involving R is negative.
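Reading the strongest pair off a printed matrix gets error-prone as the number of columns grows. A sketch of one way to find it programmatically, using the DataFrame from the problem; the variable names pairs and best_pair are illustrative.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'P': [1, 2, 3, 4, 5],
                   'Q': [2, 4, 6, 8, 10],
                   'R': [5, 4, 3, 2, 1],
                   'S': [1, 3, 1, 7, 9]})
corr = df.corr()

# Map each unordered column pair to its correlation, skipping the diagonal
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}

# The pair with the largest value has the strongest positive correlation
best_pair = max(pairs, key=pairs.get)
print(best_pair, pairs[best_pair])
```

Using max on the raw values targets the strongest positive correlation; taking the key by absolute value instead would find the strongest relationship in either direction.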
🧠 Conceptual
expert
What does a correlation value of 0 indicate?
In the context of the corr() method, what does a correlation value of 0 between two variables mean?
💡 Hint
Correlation measures linear relationships only.
Explanation
A correlation of 0 means no linear relationship exists between the variables. However, they might still have a non-linear relationship or be dependent in other ways.
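A small constructed example makes the distinction concrete. In this sketch, y is completely determined by x, yet the Pearson correlation is approximately zero because the relationship is symmetric rather than linear; the data is made up for illustration.

```python
import pandas as pd

x = pd.Series([-2, -1, 0, 1, 2])
y = x ** 2  # y depends entirely on x, but not linearly

# Positive and negative halves of the parabola cancel in the covariance
r = x.corr(y)
print(r)
```

A scatter plot of y against x would show the dependence at once, which is why a zero correlation should not be read as independence.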