You have a web analytics dataset with columns user_id and date. Which code snippet correctly calculates the number of unique users for each day?
import pandas as pd

data = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 2, 4],
    'date': ['2024-06-01', '2024-06-01', '2024-06-02',
             '2024-06-02', '2024-06-02', '2024-06-03']
})
Think about grouping by date and counting unique users, not total visits.
Grouping by date and applying nunique() to user_id counts the unique users for each day. The other options fail for different reasons: one groups by user rather than by date, one is invalid syntax, and one counts all visits rather than unique users.
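As a minimal sketch of the approach described above, using the question's own data (the variable name unique_users is an assumption made for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': [1, 2, 1, 3, 2, 4],
    'date': ['2024-06-01', '2024-06-01', '2024-06-02',
             '2024-06-02', '2024-06-02', '2024-06-03'],
})

# Group rows by day, then count the distinct user_id values in each group.
unique_users = data.groupby('date')['user_id'].nunique()
print(unique_users.to_dict())
```

In this particular sample no user appears twice on the same day, so count() would happen to return the same numbers; the two diverge as soon as a user visits more than once in a day.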
Given a DataFrame df with columns session_id and timestamp, what is the output of this code?
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 1, 2, 2, 2],
    'timestamp': pd.to_datetime([
        '2024-06-01 10:00', '2024-06-01 10:05',
        '2024-06-01 11:00', '2024-06-01 11:10', '2024-06-01 11:15'
    ])
})

session_duration = df.groupby('session_id')['timestamp'].agg(
    lambda x: (x.max() - x.min()).seconds
)
print(session_duration.to_dict())
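Remember that .seconds returns the seconds component of a Timedelta as an integer. For durations shorter than one day, this equals the total number of seconds.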
Session 1 lasts 5 minutes = 300 seconds. Session 2 lasts 15 minutes = 900 seconds. The other options are wrong for different reasons: one shows minutes rather than seconds, one shows strings rather than integers, and one has the wrong value for session 1.
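The durations above can also be computed with total_seconds(), which stays correct when a session spans more than a day. The following is a sketch under that assumption, not the quiz's own solution; the names grouped, duration, seconds, and long_gap are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 1, 2, 2, 2],
    'timestamp': pd.to_datetime([
        '2024-06-01 10:00', '2024-06-01 10:05',
        '2024-06-01 11:00', '2024-06-01 11:10', '2024-06-01 11:15',
    ]),
})

# Last timestamp minus first timestamp per session gives a Timedelta Series.
grouped = df.groupby('session_id')['timestamp']
duration = grouped.max() - grouped.min()

# total_seconds() includes whole days; values come back as floats.
seconds = duration.dt.total_seconds()
print(seconds.to_dict())

# Hypothetical longer gap, to show where .seconds and total_seconds() differ.
long_gap = pd.Timedelta(days=1, minutes=5)
print(long_gap.seconds)          # seconds component only
print(long_gap.total_seconds())  # full duration in seconds
```

For the sessions in the question both approaches agree, because each session is far shorter than a day.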
You want to plot the number of page views per hour from a DataFrame df with a timestamp column. Which code produces a correct bar plot of hourly counts?
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-06-01 10:15', '2024-06-01 10:45',
        '2024-06-01 11:00', '2024-06-01 11:30',
        '2024-06-01 12:00', '2024-06-01 12:15'
    ])
})
Use dt.hour to extract hour and count rows per hour.
The correct option extracts the hour with .dt.hour, counts the timestamps in each hour, and plots a bar chart. Of the others, one does not select a column to count, so it may still plot a size Series but is less explicit; one calls .hour directly on the Series instead of going through .dt; and one plots a line chart instead of a bar chart.
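One way the approach described above might look in code is sketched here. The original answer options are not shown, so this is an illustration rather than the quiz's exact snippet; the names hourly_counts and ax, the axis labels, and the choice of the non-interactive Agg backend are all assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-06-01 10:15', '2024-06-01 10:45',
        '2024-06-01 11:00', '2024-06-01 11:30',
        '2024-06-01 12:00', '2024-06-01 12:15',
    ])
})

# Extract the hour through the .dt accessor, then count rows in each hour.
hourly_counts = df.groupby(df['timestamp'].dt.hour)['timestamp'].count()

# Draw one bar per hour.
ax = hourly_counts.plot(kind='bar')
ax.set_xlabel('Hour of day')
ax.set_ylabel('Page views')
plt.tight_layout()
print(hourly_counts.to_dict())
```

In an interactive session, dropping the matplotlib.use('Agg') line and calling plt.show() would display the chart.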
Given a DataFrame df with columns session_id and page_views, this code tries to calculate bounce rate (percentage of sessions with only 1 page view). What error or problem will occur?
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 2, 3, 4],
    'page_views': [1, 3, 1, 2]
})

bounce_rate = df[df['page_views'] == 1].count() / df.count()
print(bounce_rate)
Check what count() returns on a DataFrame.
count() returns the non-null count of each column as a Series, so dividing the two results gives a Series of per-column ratios rather than a single bounce rate value. No error is raised. Option C is incorrect because the output is not a single float.
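To make the difference concrete, the sketch below reproduces the per-column result and then shows one possible way to get a single value, by taking the mean of a boolean mask. The fix is a suggestion, not part of the original quiz, and the name per_column is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'session_id': [1, 2, 3, 4],
    'page_views': [1, 3, 1, 2],
})

# Original approach: each count() yields one value per column,
# so the division produces a Series, not a scalar.
per_column = df[df['page_views'] == 1].count() / df.count()
print(per_column)

# Possible fix: the mean of a boolean mask is the fraction of True rows.
bounce_rate = (df['page_views'] == 1).mean()
print(bounce_rate)
```

Multiplying bounce_rate by 100 would express it as a percentage.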
You analyze web sessions and find that the average session duration is very low, but the number of page views per session is high. What is the most likely explanation?
Think about what many page views but short duration means for user behavior.
Many page views combined with a low session duration suggests that users move quickly through pages, possibly because they are scanning or confused. The other options do not fit: one contradicts the high page views, one describes a data error rather than user behavior, and one contradicts the given data.