beginner

What does the group_by() function do in R's dplyr package?

group_by() groups the data by one or more variables. It does not remove or reorder any rows. Instead, it returns a grouped data frame that records the grouping, so that later operations such as summarise() are performed on each group separately.
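
A minimal sketch, assuming a small made-up data frame called sales_df (hypothetical, not part of the original card). It checks that group_by() keeps every row and only adds grouping information, which the dplyr helpers group_vars() and n_groups() can inspect. Grouping by two variables creates one group per observed combination of values.

```r
library(dplyr)

# Hypothetical example data (not from the original text)
sales_df <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  year   = c(2023, 2023, 2024, 2024, 2024),
  sales  = c(100, 150, 120, 90, 200)
)

# Group by one variable: rows are unchanged, grouping metadata is added
by_region <- sales_df %>% group_by(region)
print(nrow(by_region))        # still 5 rows, same as sales_df
print(group_vars(by_region))  # the grouping variable: "region"
print(n_groups(by_region))    # 3 distinct regions

# Group by two variables: one group per observed (region, year) pair
by_region_year <- sales_df %>% group_by(region, year)
print(n_groups(by_region_year))
```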

beginner

What is the purpose of summarise() in combination with group_by()?

summarise() computes one or more summary statistics for each group created by group_by() and returns one row per group. For example, it can calculate the average or the total of a column for each group.
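
A short sketch of the one-row-per-group behaviour, using a hypothetical orders data frame with a customer column and an amount column (both invented for illustration). Each group of rows collapses into a single row holding a total and an average.

```r
library(dplyr)

# Hypothetical example data (not from the original text)
orders <- data.frame(
  customer = c("a", "b", "a", "c", "b", "a"),
  amount   = c(10, 20, 30, 5, 15, 5)
)

per_customer <- orders %>%
  group_by(customer) %>%
  summarise(
    total   = sum(amount),   # total amount per customer
    average = mean(amount)   # average amount per customer
  )

# One row per group: customers a, b and c
print(per_customer)
```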

beginner

How do you calculate the average of the column score for each team in a data frame df?

library(dplyr)  # provides %>%, group_by() and summarise()

df %>%
  group_by(team) %>%
  summarise(avg_score = mean(score))

This groups the data by team and calculates the average score for each team.
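
To make the example runnable end to end, here is a sketch with a small made-up df (the team names and scores are hypothetical). The result has one row per team, sorted by team.

```r
library(dplyr)

# Hypothetical data frame with a team column and a score column
df <- data.frame(
  team  = c("red", "blue", "red", "blue", "green"),
  score = c(80, 70, 90, 60, 75)
)

team_avgs <- df %>%
  group_by(team) %>%
  summarise(avg_score = mean(score))  # one average per team

print(team_avgs)
```

If score can contain missing values, mean(score) returns NA for any team with a missing score; mean(score, na.rm = TRUE) ignores them instead.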

beginner

What happens if you use summarise() without group_by()?

summarise() treats the entire dataset as a single group, so it returns one row containing the summary statistic for the whole dataset rather than one row per group.
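
A quick sketch of the ungrouped case, assuming a hypothetical four-row df. Without group_by(), the average and the row count (from dplyr's n()) are computed over all rows, giving a single-row result.

```r
library(dplyr)

# Hypothetical data frame
df <- data.frame(
  team  = c("red", "blue", "red", "blue"),
  score = c(80, 70, 90, 60)
)

# No group_by(): the whole data frame is treated as one group
overall <- df %>% summarise(avg_score = mean(score), n_rows = n())

print(overall)  # a single row for the whole dataset
```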

intermediate

Can you use multiple summary functions inside summarise() after grouping?

Yes. Each named argument to summarise() becomes its own column in the result, so you can calculate many summaries at once. For example:

library(dplyr)  # provides %>%, group_by() and summarise()

df %>%
  group_by(team) %>%
  summarise(avg_score = mean(score), max_score = max(score))
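
A runnable sketch of several summaries in one call, assuming a hypothetical df with team and score columns. Besides mean() and max(), it adds min() and dplyr's n() to show that any number of summaries can sit side by side, each producing one column.

```r
library(dplyr)

# Hypothetical data frame
df <- data.frame(
  team  = c("red", "blue", "red", "blue", "red"),
  score = c(80, 70, 90, 60, 85)
)

team_stats <- df %>%
  group_by(team) %>%
  summarise(
    avg_score = mean(score),  # average score per team
    max_score = max(score),   # highest score per team
    min_score = min(score),   # lowest score per team
    n_players = n()           # number of rows per team
  )

print(team_stats)
```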

What does group_by() do before using summarise()?

A. Splits data into groups based on variables

B. Deletes rows from the data

C. Sorts the data alphabetically

D. Changes data types of columns

What will summarise() do if used without group_by()?

A. Throw an error

B. Calculate summary for each row

C. Calculate summary for the whole dataset

D. Create new groups automatically

Which of these is a valid way to calculate the total sales per region using dplyr?

A. df %>% filter(region) %>% summarise(total_sales = sum(sales))

B. df %>% summarise(total_sales = sum(sales)) %>% group_by(region)

C. df %>% group_by(sales) %>% summarise(total_region = sum(region))

D. df %>% group_by(region) %>% summarise(total_sales = sum(sales))

Can you use multiple summary calculations inside one summarise() call?

A. Yes, you can calculate many summaries at once

B. No, only one summary is allowed

C. Only if you use group_by() twice

D. Only with special packages

What does this code do?

df %>% group_by(category) %>% summarise(count = n())

A. Calculates mean of category

B. Counts rows in each category group

C. Filters rows with category

D. Creates new categories

Explain how group_by() and summarise() work together to summarize data.

Describe a real-life example where you would use group_by() with summarise().