Inner, left, right, and full outer joins in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of Inner Join in Spark
What is the output of the following Spark code performing an inner join?
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data1 = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
data2 = [(2, "Sales"), (3, "HR"), (4, "IT")]
df1 = spark.createDataFrame(data1, ["id", "name"])
df2 = spark.createDataFrame(data2, ["id", "dept"])
joined_df = df1.join(df2, on="id", how="inner")
result = joined_df.orderBy("id").collect()  # sort by id: Spark does not guarantee row order after a join
A. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
B. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
C. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR'), Row(id=4, name=None, dept='IT')]
D. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR'), Row(id=4, name=None, dept='IT')]
💡 Hint
Inner join returns only rows with matching keys in both tables.
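You can check the hint against a small plain-Python model of inner-join semantics on one key column. This is an illustrative sketch, not how Spark executes joins: Spark chooses physical strategies such as broadcast hash or sort-merge joins. The function `inner_join` and the `customers`/`orders` rows are hypothetical and were chosen to differ from the quiz data. Each output row is one matching (left, right) pair, so a key that appears twice on one side produces two rows.

```python
# Plain-Python model of an inner join on a single key (illustration only, not Spark code).
def inner_join(left, right):
    """Return (key, left_value, right_value) for every pair of rows whose keys match."""
    # Index the right side by key; a list per key keeps duplicate keys.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    # A left key absent from the index yields no rows, so unmatched rows vanish.
    return [(key, lval, rval) for key, lval in left for rval in index.get(key, [])]


# Hypothetical sample rows: (key, value) pairs.
customers = [(10, "Ana"), (20, "Ben")]
orders = [(20, "book"), (20, "pen"), (30, "lamp")]

print(inner_join(customers, orders))
# → [(20, 'Ben', 'book'), (20, 'Ben', 'pen')]
```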
Predict Output
intermediate
Output of Left Outer Join in Spark
What is the output of this Spark code performing a left outer join?
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data1 = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
data2 = [(2, "Sales"), (3, "HR"), (4, "IT")]
df1 = spark.createDataFrame(data1, ["id", "name"])
df2 = spark.createDataFrame(data2, ["id", "dept"])
joined_df = df1.join(df2, on="id", how="left")
result = joined_df.orderBy("id").collect()  # sort by id: Spark does not guarantee row order after a join
A. [Row(id=4, name=None, dept='IT')]
B. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
C. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
D. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR'), Row(id=4, name=None, dept='IT')]
💡 Hint
A left join keeps every row from the left DataFrame and fills the right-side columns with null (None in Python) wherever no key matches.
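Here is a minimal plain-Python sketch of left-outer-join semantics. It assumes single-column keys and uses hypothetical rows (`customers`, `orders`) rather than the quiz data. Python's `None` stands in for the null that Spark writes into unmatched right-side columns. The function name `left_join` is illustrative and is not a Spark API.

```python
# Plain-Python model of a left outer join on a single key (illustration only).
def left_join(left, right):
    """Keep every left row; right_value is None where no right key matches."""
    # Index the right side by key; a list per key keeps duplicate keys.
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    rows = []
    for key, lval in left:
        # [None] stands in for "no match", like the null-filled columns Spark returns.
        for rval in index.get(key, [None]):
            rows.append((key, lval, rval))
    return rows


# Hypothetical sample rows: (key, value) pairs.
customers = [(10, "Ana"), (20, "Ben")]
orders = [(20, "book"), (30, "lamp")]

print(left_join(customers, orders))
# → [(10, 'Ana', None), (20, 'Ben', 'book')]
```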
Predict Output
advanced
Output of Right Outer Join in Spark
What is the output of this Spark code performing a right outer join?
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data1 = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
data2 = [(2, "Sales"), (3, "HR"), (4, "IT")]
df1 = spark.createDataFrame(data1, ["id", "name"])
df2 = spark.createDataFrame(data2, ["id", "dept"])
joined_df = df1.join(df2, on="id", how="right")
result = joined_df.orderBy("id").collect()  # sort by id: Spark does not guarantee row order after a join
A. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
B. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
C. [Row(id=1, name='Alice', dept=None), Row(id=4, name=None, dept='IT')]
D. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR'), Row(id=4, name=None, dept='IT')]
💡 Hint
A right join keeps every row from the right DataFrame and fills the left-side columns with null (None in Python) wherever no key matches.
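This is the same idea mirrored for a right outer join, again as a plain-Python sketch with hypothetical `customers` and `orders` rows and `None` in place of null. The output key is read from the right row, so right-side keys with no partner still appear with a non-null key and a `None` left value. Swapping the operands of a left join would keep the same rows, only with the value columns in the opposite order.

```python
# Plain-Python model of a right outer join on a single key (illustration only).
def right_join(left, right):
    """Keep every right row; left_value is None where no left key matches."""
    # Index the left side by key; a list per key keeps duplicate keys.
    index = {}
    for key, value in left:
        index.setdefault(key, []).append(value)
    rows = []
    for key, rval in right:
        # The key comes from the right row, so unmatched right keys are still non-null.
        for lval in index.get(key, [None]):
            rows.append((key, lval, rval))
    return rows


# Hypothetical sample rows: (key, value) pairs.
customers = [(10, "Ana"), (20, "Ben")]
orders = [(20, "book"), (30, "lamp")]

print(right_join(customers, orders))
# → [(20, 'Ben', 'book'), (30, None, 'lamp')]
```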
Predict Output
advanced
Output of Full Outer Join in Spark
What is the output of this Spark code performing a full outer join?
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data1 = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
data2 = [(2, "Sales"), (3, "HR"), (4, "IT")]
df1 = spark.createDataFrame(data1, ["id", "name"])
df2 = spark.createDataFrame(data2, ["id", "dept"])
joined_df = df1.join(df2, on="id", how="outer")
result = joined_df.orderBy("id").collect()  # sort by id: Spark does not guarantee row order after a join
A. [Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR')]
B. [Row(id=1, name='Alice', dept=None), Row(id=2, name='Bob', dept='Sales'), Row(id=3, name='Charlie', dept='HR'), Row(id=4, name=None, dept='IT')]
C. [Row(id=1, name='Alice', dept=None), Row(id=4, name=None, dept='IT')]
D. [Row(id=1, name='Alice', dept='Sales'), Row(id=2, name='Bob', dept='HR'), Row(id=3, name='Charlie', dept='IT')]
💡 Hint
A full outer join (how="outer", also accepted as "full" or "full_outer") keeps every row from both DataFrames, filling the missing side's columns with None.
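Here is a plain-Python sketch of full-outer-join semantics under the same assumptions: a single key column, hypothetical `customers`/`orders` rows, and `None` in place of null. It is a left-join pass followed by a second pass that appends right rows whose key never occurs on the left. The single key column is filled from whichever side has the key. This is similar to how Spark populates the one shared `id` column when joining with `on="id"`.

```python
# Plain-Python model of a full outer join on a single key (illustration only).
def full_outer_join(left, right):
    """Keep every row from both sides; the missing side's value is None."""
    right_index = {}
    for key, value in right:
        right_index.setdefault(key, []).append(value)
    rows = []
    for key, lval in left:
        # Matched left rows pair with each right value; unmatched ones get None.
        for rval in right_index.get(key, [None]):
            rows.append((key, lval, rval))
    # Right rows whose key never appears on the left are added with a None left value.
    left_keys = {key for key, _ in left}
    for key, rval in right:
        if key not in left_keys:
            rows.append((key, None, rval))
    return rows


# Hypothetical sample rows: (key, value) pairs.
customers = [(10, "Ana"), (20, "Ben")]
orders = [(20, "book"), (30, "lamp")]

print(full_outer_join(customers, orders))
# → [(10, 'Ana', None), (20, 'Ben', 'book'), (30, None, 'lamp')]
```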
🧠 Conceptual
expert
Understanding Join Types in Spark
Which join type would you use in Spark to keep all rows from the left DataFrame and only the matching rows from the right DataFrame, filling unmatched right-side columns with nulls?
A. Left outer join
B. Inner join
C. Right outer join
D. Full outer join
💡 Hint
Think about which join keeps all left rows regardless of matches.
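If each key occurs at most once per side, the set of keys each join type retains reduces to a simple set operation, which makes the hint easy to check. The key sets below are hypothetical. In Spark, rows whose key exists on only one side carry nulls in the other side's columns.

```python
# Hypothetical key sets; assumes each key occurs at most once per side.
left_keys = {10, 20}
right_keys = {20, 30}

# Keys that survive each join type, expressed as set operations.
kept = {
    "inner": left_keys & right_keys,
    "left outer": left_keys,
    "right outer": right_keys,
    "full outer": left_keys | right_keys,
}

for how, keys in kept.items():
    print(f"{how}: {sorted(keys)}")
```

Only one join type retains exactly the left side's key set. That property is what this question describes.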