
Schema definition and inference in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read a CSV file with automatic schema inference.

Apache Spark
df = spark.read.option("header", "true").option("inferSchema", [1]).csv("data.csv")
Options:
A. "true"
B. True
C. False
D. "false"
Common Mistakes
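Choosing the boolean True: the expected answer here is the string "true", although PySpark's option() also converts Python booleans to the strings "true" and "false".
Setting inferSchema to "false" disables schema inference, so every CSV column is read as a string.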
2. Fill in the blank (medium)

Complete the code to define a schema with a string field 'name' and integer field 'age'.

Apache Spark
from pyspark.sql.types import StructType, StructField, [1], IntegerType
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
Options:
A. FloatType
B. StringType
C. BooleanType
D. DateType
Common Mistakes
Using FloatType or BooleanType instead of StringType for 'name'.
Forgetting to import StringType.
3. Fill in the blank (hard)

Complete the code so that the previously defined schema variable is applied when reading a JSON file.

Apache Spark
df = spark.read.schema([1]).json("data.json")
Options:
A. schema
B. Schema
C. struct
D. StructType
Common Mistakes
Passing the class StructType itself instead of the schema instance stored in the variable schema.
Using incorrect capitalization such as 'Schema'; Python names are case-sensitive.
4. Fill in the blank (hard)

Fill both blanks to create a schema with a non-nullable integer field 'id' and a nullable string field 'email'.

Apache Spark
schema = StructType([
    StructField("id", [1](), [2]),
    StructField("email", StringType(), True)
])
Options:
A. IntegerType
B. False
C. True
D. StringType
Common Mistakes
Setting nullable to True for 'id' when it should be False.
Using StringType instead of IntegerType for 'id'.
5. Fill in the blank (hard)

Fill all three blanks to create a DataFrame with a schema and show its schema.

Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
spark = SparkSession.builder.appName("TestApp").getOrCreate()
schema = StructType([
    StructField("name", [1](), True),
    StructField("age", [2](), True)
])
data = [("Alice", 30), ("Bob", 25)]
df = spark.createDataFrame(data, schema=[3])
df.printSchema()
Options:
A. StringType
B. IntegerType
C. schema
D. StructType
Common Mistakes
Passing the StructType class instead of the schema variable.
Swapping the field types, for example IntegerType for 'name' and StringType for 'age'.