Apache Spark · ~20 mins

Column expressions and functions in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Column Expressions Master
Get all challenges correct to earn this badge. Test your skills under time pressure!
Predict Output · intermediate · Time limit: 2:00
Output of chained column expressions
What is the output of the following Apache Spark code snippet?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(1, 10), (2, 20), (3, 30)]
df = spark.createDataFrame(data, ["id", "value"])

result = df.select((col("value") + 5).alias("value_plus_5"))
result.show()
A
+------------+
|value_plus_5|
+------------+
|          15|
|          25|
|          35|
+------------+
B
+-----+-----+
|   id|value|
+-----+-----+
|    1|   10|
|    2|   20|
|    3|   30|
+-----+-----+
C
+------------+
|value_plus_5|
+------------+
|           5|
|          15|
|          25|
+------------+
D
SyntaxError: invalid syntax
💡 Hint
Look at how the column 'value' is transformed by adding 5 and aliased.
Data Output · intermediate · Time limit: 2:00
Result of filtering with column functions
Given the DataFrame below, what is the output after applying the filter?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["name", "age"])

filtered_df = df.filter(col("age") > 28)
filtered_df.show()
A
+-------+---+
|   name|age|
+-------+---+
|    Bob| 30|
|Charlie| 35|
+-------+---+
B
+-------+---+
|   name|age|
+-------+---+
|  Alice| 25|
|    Bob| 30|
+-------+---+
C
+-------+---+
|   name|age|
+-------+---+
|  Alice| 25|
+-------+---+
D
TypeError: '>' not supported between instances of 'Column' and 'int'
💡 Hint
Filter keeps rows where age is greater than 28.
Visualization · advanced · Time limit: 2:30
Visualizing aggregated data with groupBy and functions
You have a DataFrame with sales data. Which option correctly shows the output of this aggregation?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum

spark = SparkSession.builder.getOrCreate()
data = [("A", 100), ("B", 200), ("A", 300), ("B", 400)]
df = spark.createDataFrame(data, ["category", "sales"])

agg_df = df.groupBy("category").agg(sum("sales").alias("total_sales"))
agg_df.orderBy("category").show()
A
+--------+-----------+
|category|total_sales|
+--------+-----------+
|       A|        300|
|       B|        400|
+--------+-----------+
B
+--------+-----------+
|category|total_sales|
+--------+-----------+
|       A|        400|
|       B|        600|
+--------+-----------+
C
+--------+-----------+
|category|total_sales|
+--------+-----------+
|       A|        100|
|       B|        200|
+--------+-----------+
D
AttributeError: 'DataFrame' object has no attribute 'groupby'
💡 Hint
Sum sales grouped by category and order by category.
🔧 Debug · advanced · Time limit: 2:00
Identify the error in column expression
What error does the following code produce?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
data = [(1, 2), (3, 4)]
df = spark.createDataFrame(data, ["a", "b"])

result = df.select(col("a") + col("c"))
result.show()
A
TypeError: unsupported operand type(s) for +: 'Column' and 'Column'
B
NameError: name 'col' is not defined
C
AnalysisException: cannot resolve '`c`' given input columns: [a, b]
D
No error, outputs sum of columns 'a' and 'c'
💡 Hint
Check if column 'c' exists in the DataFrame.
🚀 Application · expert · Time limit: 3:00
Calculate new column with conditional logic
Which option correctly creates a new column 'status' with value 'adult' if age >= 18, else 'minor'?
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.getOrCreate()
data = [("John", 17), ("Jane", 20)]
df = spark.createDataFrame(data, ["name", "age"])

result = df.withColumn("status", ???)
result.select("name", "age", "status").show()
A
case when col("age") >= 18 then "adult" else "minor" end
B
col("age") >= 18 ? "adult" : "minor"
C
if(col("age") >= 18, "adult", "minor")
D
when(col("age") >= 18, "adult").otherwise("minor")
💡 Hint
Use Spark SQL functions for conditional column creation.