Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to read a CSV file with automatic schema inference.
Apache Spark
df = spark.read.option("header", "true").option("inferSchema", [1]).csv("data.csv")
Common Mistakes
Passing the Python boolean True where the exercise expects the string "true" (PySpark accepts either form, but only the string matches the expected answer).
Setting inferSchema to "false" disables schema detection.
Setting option 'inferSchema' to 'true' tells Spark to automatically detect the data types of columns.
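As a hedged illustration of the explanation above, the sketch below writes a tiny CSV file and reads it back with both options set. The sample rows and the helper names `write_sample_csv` and `read_with_inferred_schema` are assumptions made for this example, and `spark` is assumed to be an existing SparkSession.

```python
import csv


def write_sample_csv(path):
    """Write a small CSV with a header row; the rows are made-up sample data."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "age"])
        writer.writerows([["Alice", 30], ["Bob", 25]])


def read_with_inferred_schema(spark, path):
    """Read a CSV, using the first row as column names and letting Spark infer types.

    `spark` is assumed to be an existing SparkSession.
    """
    return (
        spark.read
        .option("header", "true")       # first row holds the column names
        .option("inferSchema", "true")  # ask Spark to detect column types
        .csv(path)
    )
```

With a live session, `read_with_inferred_schema(spark, "data.csv").printSchema()` should report `age` as a numeric column instead of a string. Inference needs an extra pass over the data, so an explicit schema is often preferred for large files.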
2. Fill in the blank (medium)
Complete the code to define a schema with a string field 'name' and integer field 'age'.
Apache Spark
from pyspark.sql.types import StructType, StructField, [1], IntegerType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
Common Mistakes
Using FloatType or BooleanType instead of StringType for 'name'.
Forgetting to import StringType.
StringType is used to define a string field in the schema.
3. Fill in the blank (hard)
Fix the error in the code to apply the schema when reading a JSON file.
Apache Spark
df = spark.read.schema([1]).json("data.json")
Common Mistakes
Passing the class name StructType instead of the schema variable.
Using incorrect capitalization like 'Schema'.
The variable 'schema' holds the schema object and should be passed to the schema() method.
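A minimal sketch of that call in context, assuming `spark` is an existing SparkSession and `schema` is a `StructType` built beforehand. The helper names `write_sample_json_lines` and `read_json_with_schema` and the sample records are hypothetical, introduced only for illustration.

```python
import json


def write_sample_json_lines(path, records):
    """Write one JSON object per line, the layout spark.read.json expects by default."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_with_schema(spark, schema, path):
    """Read JSON with a predefined schema object instead of letting Spark infer one.

    `schema` must be the schema instance (for example a StructType value),
    not the StructType class itself.
    """
    return spark.read.schema(schema).json(path)
```

For example, after `write_sample_json_lines("data.json", [{"name": "Alice", "age": 30}])`, calling `read_json_with_schema(spark, schema, "data.json")` would return a DataFrame whose columns follow `schema`.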
4. Fill in the blank (hard)
Fill both blanks to create a schema with a non-nullable integer field 'id' and a nullable string field 'email'.
Apache Spark
schema = StructType([
    StructField("id", [1](), [2]),
    StructField("email", StringType(), True)
])
Common Mistakes
Setting nullable to True for 'id' when it should be False.
Using StringType instead of IntegerType for 'id'.
The 'id' field is an IntegerType and is not nullable, so nullable is set to False.
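To make the nullable flag concrete, here is a small sketch that builds the completed schema and lists each field's nullability. The function names `build_user_schema` and `describe_nullability` are hypothetical helpers for this example; only the `pyspark.sql.types` classes come from PySpark.

```python
def build_user_schema():
    """Build the schema from this task: non-nullable 'id', nullable 'email'."""
    # Imported inside the function so PySpark is only required when it is called.
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    return StructType([
        StructField("id", IntegerType(), False),   # third argument is nullable
        StructField("email", StringType(), True),
    ])


def describe_nullability(schema):
    """Map each field name to its nullable flag."""
    return {field.name: field.nullable for field in schema.fields}
```

Calling `describe_nullability(build_user_schema())` gives `{"id": False, "email": True}`. Building the schema this way does not need a running SparkSession, so it can be checked before any data is read.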
5. Fill in the blank (hard)
Fill all three blanks to create a DataFrame with a schema and show its schema.
Apache Spark
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("TestApp").getOrCreate()

# Both fields are declared nullable
schema = StructType([
    StructField("name", [1](), True),
    StructField("age", [2](), True)
])

data = [("Alice", 30), ("Bob", 25)]
df = spark.createDataFrame(data, schema=[3])
df.printSchema()
Common Mistakes
Passing StructType class instead of schema variable.
Mixing up data types for fields.
Use StringType for 'name', IntegerType for 'age', and pass the schema variable to createDataFrame.