beginner

What is clustering in machine learning?

Clustering is a way to group data points so that points in the same group are more similar to each other than to those in other groups. It helps find hidden patterns without knowing the labels beforehand.

beginner

What does partitioning mean in the context of data and databases?

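Partitioning means splitting a large dataset into smaller, manageable parts based on a rule, such as a date range or a key value. Queries that filter on that rule can skip the parts they do not need, which speeds them up and keeps the data better organized.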

beginner

Name a common algorithm used for clustering.

K-means is a popular clustering algorithm. It divides data into K groups by repeatedly assigning each point to its nearest center and moving each center to the mean of its assigned points, until the assignments stop changing.
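
The assign-and-update loop can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the sample points, and the choice to start from the first K points are all assumptions made for the example (real libraries usually pick starting centers at random or with a smarter scheme).

```python
def kmeans(points, k, max_iters=100):
    """Group points into k clusters; returns (centers, assignments)."""
    # Assumption for this sketch: start from the first k points.
    centers = list(points[:k])
    assignments = None

    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Stop once no point changes group.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each center to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))

    return centers, assignments


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, assignments = kmeans(points, k=2)
print(assignments)
```

On this data the first three points should end up in one group and the last three in the other. Note that the caller has to choose K up front; the algorithm does not discover it.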

intermediate

How does clustering differ from classification?

Clustering groups data without labels (unsupervised), while classification assigns labels based on known categories (supervised). Clustering finds patterns; classification predicts labels.
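
To make the supervised side concrete, here is a small sketch of a nearest-centroid classifier. Unlike clustering, it is given the labels up front and only learns where each labeled group sits. The function names `fit_centroids` and `predict`, the labels, and the data are all hypothetical, chosen for illustration.

```python
def fit_centroids(points, labels):
    """Supervised step: average the points that share each known label."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(point, centroids[label])),
    )


# Hypothetical training data with known categories.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
labels = ["small", "small", "large", "large"]

centroids = fit_centroids(points, labels)
print(predict(centroids, (7.5, 8.3)))  # large
```

A clustering algorithm run on the same points would receive no `labels` list at all; it could find the two groups, but it would have no way to name them "small" and "large".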

intermediate

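Why is partitioning useful in big data workflows built with tools like dbt?

Partitioning breaks big tables into smaller parts so that queries scan only the parts they need. This saves time and computing power, making data processing faster. dbt does not store data itself; it configures partitioning on the tables it builds in the data warehouse.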
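
The idea of scanning only the needed parts can be illustrated with a toy example in Python. This is a sketch of the concept, not of how any particular warehouse or dbt implements it; the rows, the month-based rule, and the function name `total_for_month` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (order_date, amount).
rows = [
    (date(2024, 1, 5), 20.0),
    (date(2024, 1, 20), 35.0),
    (date(2024, 2, 3), 15.0),
    (date(2024, 3, 14), 50.0),
]

# Partitioning rule for this sketch: one partition per calendar month.
partitions = defaultdict(list)
for order_date, amount in rows:
    partitions[(order_date.year, order_date.month)].append((order_date, amount))


def total_for_month(year, month):
    """Sum amounts for one month, reading only that month's partition."""
    scanned = partitions.get((year, month), [])
    return sum(amount for _, amount in scanned), len(scanned)


total, rows_scanned = total_for_month(2024, 1)
print(total, rows_scanned)  # 55.0 2
```

The January query touches 2 rows instead of all 4. On a table with billions of rows, skipping the partitions a query does not need is where the time and cost savings come from.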

What is the main goal of clustering?

A. Reduce the size of the dataset

B. Group similar data points together

C. Split data into training and test sets

D. Assign labels to data points

Which of the following is a partitioning method in databases?

A. Grouping data by similarity

B. Normalizing data values

C. Predicting labels for data

D. Splitting data by date ranges

K-means clustering requires you to specify:

A. Number of clusters (K)

B. Distance metric

C. Data labels

D. Training epochs

Which statement about clustering is TRUE?

A. It needs labeled data

B. It is supervised learning

C. It finds groups in data without labels

D. It predicts future values

In dbt, partitioning helps to:

A. Make queries faster by scanning less data

B. Train machine learning models

C. Visualize data clusters

D. Encrypt sensitive data

Explain in your own words what clustering is and why it is useful.

Describe how partitioning can improve data processing in big data workflows built with tools like dbt.