
Understanding partitions in Apache Spark - Practice Questions & Exercises

Challenge - 5 Problems
Predict Output (intermediate)
What is the number of partitions after this operation?
Given a Spark DataFrame with 4 partitions, what will be the number of partitions after applying df.repartition(6)?
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
df = spark.createDataFrame(data, ['id', 'value'])
df = df.repartition(6)
num_partitions = df.rdd.getNumPartitions()
print(num_partitions)
A) 4
B) 6
C) 0
D) 1
💡 Hint
repartition(n) performs a full shuffle and sets the partition count to exactly n, whether n is larger or smaller than the current count.
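The hint can be sketched without Spark at all. Below is a plain-Python model (an assumption for illustration, not Spark's actual implementation) of what repartition(n) guarantees: every row is redistributed across exactly n new partitions.

```python
# Plain-Python sketch of repartition(n) semantics (no Spark required):
# a full shuffle that redistributes every row across exactly n partitions.
def repartition(partitions, n):
    """Round-robin all rows into exactly n new partitions (models the shuffle)."""
    rows = [row for part in partitions for row in part]  # collect every row
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts

parts = [[(1, 'a')], [(2, 'b')], [(3, 'c')], [(4, 'd')]]  # 4 partitions
result = repartition(parts, 6)
print(len(result))  # 6 partitions, some possibly empty
```

Note that real Spark uses a hash or round-robin partitioner under the hood; the key point the model preserves is that the result has exactly n partitions, matching the quiz scenario of 4 → 6.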
🧠 Conceptual (intermediate)
Why is partitioning important in Spark?
Which of the following best explains why partitioning data is important in Apache Spark?
A) It helps distribute data across nodes to enable parallel processing.
B) It compresses data to save storage space.
C) It encrypts data for security purposes.
D) It converts data into a different file format.
💡 Hint
Think about how Spark processes data across a cluster.
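The idea the hint points at can be demonstrated in miniature: once data is split into partitions, each partition can be processed independently, so workers can run in parallel. This is a hedged plain-Python sketch using threads as stand-ins for cluster executors, not Spark itself.

```python
# Why partitions matter, sketched without Spark: each partition is an
# independent unit of work, so multiple workers can process them in parallel.
from concurrent.futures import ThreadPoolExecutor

data = list(range(10))
partitions = [data[0:5], data[5:10]]  # 2 partitions -> up to 2 parallel tasks

def process(part):
    return sum(part)  # stand-in for any per-partition computation

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_sums = list(pool.map(process, partitions))

print(sum(partial_sums))  # 45, same answer as a single-threaded sum
```

In Spark, each partition similarly becomes one task, and the cluster scheduler runs as many tasks concurrently as there are available cores.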
Data Output (advanced)
What is the output of this partition count code?
Consider this Spark code snippet. What will be printed as the number of partitions?
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [(i,) for i in range(10)]
df = spark.createDataFrame(data, ['num'])
df2 = df.coalesce(2)
print(df2.rdd.getNumPartitions())
A) 1
B) 10
C) 2
D) 5
💡 Hint
coalesce(n) reduces the number of partitions without a full shuffle; it can only merge partitions, never increase the count.
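The difference from repartition can be modeled in plain Python (an illustrative sketch, not Spark's actual partition-merging strategy): coalesce merges whole existing partitions into fewer groups instead of reshuffling individual rows.

```python
# Plain-Python sketch of coalesce(n): merge whole existing partitions into
# n groups without a full shuffle; rows stay with their original partition.
def coalesce(partitions, n):
    if n >= len(partitions):
        return partitions  # coalesce never increases the partition count
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i % n].extend(part)  # whole partitions merged, never split
    return groups

parts = [[(i,)] for i in range(10)]  # 10 single-row partitions
result = coalesce(parts, 2)
print(len(result))  # 2
```

Real Spark groups contiguous partitions (often preserving data locality) rather than round-robin, but the observable result matches the quiz: 10 partitions coalesced to 2.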
🔧 Debug (advanced)
Identify the error in this partitioning code
What error will this Spark code raise?
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [(1, 'x'), (2, 'y')]
df = spark.createDataFrame(data, ['id', 'val'])
df2 = df.repartition(-3)
print(df2.rdd.getNumPartitions())
A) ValueError: Number of partitions must be positive
B) TypeError: repartition argument must be a string
C) No error, prints 3
D) RuntimeError: repartition failed due to negative value
💡 Hint
Number of partitions cannot be negative.
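The validation the hint describes can be sketched in plain Python. The exact exception type and message in real PySpark vary by version; this model raises ValueError to mirror the quiz's framing, and is an illustrative assumption, not Spark's source code.

```python
# Sketch of the guard Spark applies: a partition count must be positive.
# (The exact exception type/message depends on the Spark version; this
# pure-Python model just shows where the validation happens.)
def repartition(partitions, n):
    if n <= 0:
        raise ValueError(f"Number of partitions ({n}) must be positive")
    rows = [row for part in partitions for row in part]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts

msg = ""
try:
    repartition([[(1, 'x')], [(2, 'y')]], -3)
except ValueError as exc:
    msg = str(exc)
print(msg)
```

The key takeaway is that the failure happens eagerly, when repartition is called, rather than later when an action triggers execution.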
🚀 Application (expert)
How many partitions after chained operations?
Given a DataFrame with 8 partitions, what is the number of partitions after these chained operations?
df2 = df.repartition(4).coalesce(2).repartition(5)
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [(i,) for i in range(20)]
df = spark.createDataFrame(data, ['val']).repartition(8)
df2 = df.repartition(4).coalesce(2).repartition(5)
print(df2.rdd.getNumPartitions())
A) 2
B) 4
C) 8
D) 5
💡 Hint
Repartition sets partitions exactly; coalesce reduces partitions without shuffle.
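The chain can be traced with plain-Python stand-ins for both operations (an illustrative model, not Spark): each step either sets the count exactly (repartition) or merges down (coalesce), so the final partition-setting call determines the result.

```python
# Tracing the chained operations with plain-Python stand-ins:
# repartition sets the count exactly; coalesce only merges down.
def repartition(parts, n):
    rows = [r for p in parts for r in p]
    out = [[] for _ in range(n)]
    for i, r in enumerate(rows):
        out[i % n].append(r)
    return out

def coalesce(parts, n):
    if n >= len(parts):
        return parts
    out = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i % n].extend(p)
    return out

parts = repartition([[(i,)] for i in range(20)], 8)  # start: 8 partitions
parts = repartition(parts, 4)   # -> 4
parts = coalesce(parts, 2)      # -> 2
parts = repartition(parts, 5)   # -> 5
print(len(parts))  # 5
```

In real Spark the same logic applies lazily to the query plan, but the final count observed via getNumPartitions() follows the same step-by-step trace.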