Data Analysis Python | data | ~20 mins

Web analytics data pattern in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
data_output
intermediate
Identify the number of unique users per day

You have a web analytics dataset with columns user_id and date. Which code snippet correctly calculates the number of unique users for each day?

Data Analysis Python
import pandas as pd

data = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 2, 4],
    'date': ['2024-06-01', '2024-06-01', '2024-06-02', '2024-06-02', '2024-06-02', '2024-06-03']
})
A. data.groupby('date')['user_id'].nunique()
B. data.groupby('user_id')['date'].nunique()
C. data.groupby('date').count()['user_id']
D. data['user_id'].unique().groupby(data['date'])
💡 Hint

Think about grouping by date and counting unique users, not total visits.
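
The difference between counting visits and counting unique users can be sketched on a separate toy dataset. The `visits` frame and its values below are made up for illustration and are not the challenge data.

```python
import pandas as pd

# Hypothetical visit log: user 1 visits twice on the first day,
# user 3 visits three times on the second day.
visits = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "date": ["2024-07-01", "2024-07-01", "2024-07-01",
             "2024-07-02", "2024-07-02", "2024-07-02"],
})

# count() tallies every row in the group, so repeat visits are included.
total_visits = visits.groupby("date")["user_id"].count()

# nunique() tallies distinct values in the group, so each user counts once.
unique_users = visits.groupby("date")["user_id"].nunique()

print(total_visits.to_dict())
print(unique_users.to_dict())
```

On this toy data the two results differ on both days, which is the gap between "total visits" and "unique users" that the hint points at.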

Predict Output
intermediate
Output of session duration calculation

Given a DataFrame df with columns session_id and timestamp, what is the output of this code?

Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 1, 2, 2, 2],
    'timestamp': pd.to_datetime(['2024-06-01 10:00', '2024-06-01 10:05', '2024-06-01 11:00', '2024-06-01 11:10', '2024-06-01 11:15'])
})
session_duration = df.groupby('session_id')['timestamp'].agg(lambda x: (x.max() - x.min()).seconds)
print(session_duration.to_dict())
A. {1: 5, 2: 15}
B. {1: 300, 2: 900}
C. {1: '00:05:00', 2: '00:15:00'}
D. {1: 600, 2: 900}
💡 Hint

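Remember that .seconds returns the seconds component of a Timedelta as an integer (0 to 86399). For durations shorter than one day, this equals the total number of seconds.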
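
As a side note on `.seconds`, the following minimal sketch compares it with `.total_seconds()` on two illustrative durations (the variable names are hypothetical). The two agree only when the duration is shorter than one day.

```python
import pandas as pd

# Two example durations: one under a day, one over a day.
short = pd.Timedelta(minutes=5)
long = pd.Timedelta(days=1, minutes=5)

# .seconds is only the seconds component; whole days are stored separately.
short_component = short.seconds          # 300
long_component = long.seconds            # 300 (the extra day is in .days)

# .total_seconds() covers the whole duration and returns a float.
short_total = short.total_seconds()      # 300.0
long_total = long.total_seconds()        # 86700.0

print(short_component, short_total)
print(long_component, long_total)
```

For sessions that could span midnight or run longer than 24 hours, `.total_seconds()` is usually the safer choice.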

visualization
advanced
Correct plot for hourly page views

You want to plot the number of page views per hour from a DataFrame df with a timestamp column. Which code produces a correct bar plot of hourly counts?

Data Analysis Python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-06-01 10:15', '2024-06-01 10:45', '2024-06-01 11:00',
        '2024-06-01 11:30', '2024-06-01 12:00', '2024-06-01 12:15'
    ])
})
A.
df['hour'] = df['timestamp'].dt.hour
counts = df.groupby('hour').size()
counts.plot(kind='bar')
plt.show()
B.
df['timestamp'].hour
counts = df.groupby('hour').count()
counts.plot(kind='bar')
plt.show()
C.
counts = df.groupby(df['timestamp'].dt.hour).count()
counts.plot(kind='line')
plt.show()
D.
df['hour'] = df['timestamp'].dt.hour
counts = df.groupby('hour')['timestamp'].count()
counts.plot(kind='bar')
plt.show()
💡 Hint

Use dt.hour to extract hour and count rows per hour.
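
One practical detail when counting rows per hour: hours with no page views do not appear in the grouped result at all. The sketch below, on a made-up `events` frame, shows one possible way to fill those gaps with zeros before plotting. Whether to show empty hours is a design choice and is not required by the challenge.

```python
import pandas as pd

# Hypothetical events: two views in hour 8, none in hour 9, one in hour 10.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-07-01 08:05", "2024-07-01 08:40", "2024-07-01 10:20",
    ])
})

# Count rows per hour; only hours that actually occur are present.
hourly = events.groupby(events["timestamp"].dt.hour).size()

# Reindex over all 24 hours so that empty hours show up as 0.
hourly_full = hourly.reindex(range(24), fill_value=0)

print(hourly.to_dict())
print(hourly_full.loc[8:10].to_dict())
```

Calling `hourly_full.plot(kind='bar')` would then draw one bar per hour of the day, including the empty ones.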

🔧 Debug
advanced
Error in calculating bounce rate

Given a DataFrame df with columns session_id and page_views, this code tries to calculate bounce rate (percentage of sessions with only 1 page view). What error or problem will occur?

Data Analysis Python
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 2, 3, 4],
    'page_views': [1, 3, 1, 2]
})
bounce_rate = df[df['page_views'] == 1].count() / df.count()
print(bounce_rate)
A. Raises ZeroDivisionError due to division by zero
B. Raises KeyError because 'page_views' is missing
C. Returns a Series with ratio per column, not a single bounce rate value
D. Returns a single float value representing bounce rate
💡 Hint

Check what count() returns on a DataFrame.
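
To make the hint concrete, here is a minimal sketch on a separate, made-up `sessions` frame. It shows what `count()` returns on a DataFrame, plus one common way (taking the mean of a boolean mask) to get a single bounce-rate number. This is an illustrative alternative, not necessarily the fix the challenge has in mind.

```python
import pandas as pd

# Hypothetical sessions: three of the five have exactly one page view.
sessions = pd.DataFrame({
    "session_id": [10, 11, 12, 13, 14],
    "page_views": [1, 4, 1, 1, 2],
})

# DataFrame.count() returns a Series with one non-null count per column.
per_column = sessions.count()

# A boolean mask averages to the share of True values, giving one scalar.
is_bounce = sessions["page_views"] == 1
bounce_rate = is_bounce.mean()

print(per_column.to_dict())
print(bounce_rate)
```

Multiplying `bounce_rate` by 100 gives the percentage form.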

🧠 Conceptual
expert
Interpreting user session patterns from data

You analyze web sessions and find that the average session duration is very low, but the number of page views per session is high. What is the most likely explanation?

A. Users quickly navigate many pages but spend little time on each, indicating possible confusion or fast scanning
B. Users have very few sessions but each session lasts a long time
C. Data collection has errors causing session durations to be overestimated
D. Users spend a long time on the first page and then leave, causing low average duration
💡 Hint

Think about what many page views but short duration means for user behavior.
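
One way to reason about this combination of metrics is to derive the average time spent per page. The sketch below uses invented numbers and hypothetical column names (`duration_seconds`, `seconds_per_page`) purely for illustration.

```python
import pandas as pd

# Hypothetical sessions: short durations, many page views each.
sessions = pd.DataFrame({
    "session_id": [1, 2, 3],
    "duration_seconds": [40, 30, 50],
    "page_views": [8, 10, 10],
})

# Dividing duration by page views gives the average time per page.
sessions["seconds_per_page"] = (
    sessions["duration_seconds"] / sessions["page_views"]
)

print(sessions[["session_id", "seconds_per_page"]])
```

A derived per-page figure like this puts the two session-level metrics on a common scale, which can make the behavior pattern easier to interpret.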