Reading JSON and nested data in Apache Spark - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the main function to read a JSON file in Apache Spark?
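Use spark.read.json(path) to load a JSON file into a DataFrame. By default Spark expects one JSON object per line (JSON Lines); set the multiLine option to true for a file that holds a single JSON document spread over several lines.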
beginner
How do you access nested fields in a JSON column in Spark?
You use dot notation like df.select("nestedField.subField") to access nested data.
intermediate
What Spark function helps to flatten nested JSON structures?
You can use explode() to turn nested arrays into separate rows, helping flatten the data.
beginner
How can you infer the schema automatically when reading JSON in Spark?
No extra option is needed: spark.read.json() infers the schema, including nested fields, by scanning the JSON data.
intermediate
Why is it useful to understand the schema of nested JSON data before processing?
Knowing the schema helps you select, filter, and transform nested fields correctly without errors.
Which Spark method reads a JSON file into a DataFrame?
A. spark.read.json()
B. spark.load.csv()
C. spark.read.text()
D. spark.load.json()
How do you select the nested field 'city' inside the 'address' column of a DataFrame?
A. df.select('address_city')
B. df.select('address-city')
C. df.select('city')
D. df.select('address.city')
What does the explode() function do in Spark?
A. Joins two DataFrames
B. Flattens nested arrays into multiple rows
C. Filters rows based on a condition
D. Aggregates data by key
If a JSON file has nested objects, how does Spark handle the schema by default?
A. Infers the nested schema automatically
B. Reads all data as strings
C. Fails to read nested data
D. Requires a manual schema definition
Why is schema knowledge important when working with nested JSON?
A. To convert JSON to CSV
B. To speed up file reading
C. To correctly select and transform nested fields
D. To delete nested data
Explain how to read a nested JSON file in Apache Spark and access a nested field.
Think about how JSON structure maps to DataFrame columns.
Describe the purpose and use of the explode() function when working with nested JSON data in Spark.
Imagine turning a list inside a cell into multiple rows.