Recall & Review
beginner
What is the purpose of using SQL queries on DataFrames in Apache Spark?
SQL queries on DataFrames let you use familiar SQL syntax to analyze and manipulate data stored in Spark DataFrames, so operations such as filtering, joining, and aggregating can be written as a single query instead of a chain of DataFrame method calls.
beginner
How do you register a DataFrame as a temporary view in Spark to run SQL queries on it?
You use the createOrReplaceTempView method on the DataFrame with a name, for example: df.createOrReplaceTempView("my_table"). This lets you run SQL queries on 'my_table'.
beginner
What Spark method do you use to run an SQL query on a registered DataFrame view?
You use spark.sql('your SQL query'). This returns a new DataFrame with the query results.
beginner
Can you explain how to select specific columns from a DataFrame using SQL queries in Spark?
After registering the DataFrame as a temporary view, you write a SQL query like SELECT column1, column2 FROM view_name. This returns only those columns in the result DataFrame.
intermediate
What is the difference between createOrReplaceTempView and createGlobalTempView in Spark?
createOrReplaceTempView creates a temporary view visible only in the current Spark session. createGlobalTempView creates a global temporary view that is shared across sessions of the same Spark application and lasts until that application stops. When querying it, you must qualify the view name with the global_temp database, for example global_temp.my_view.
Which method registers a DataFrame as a temporary view in Spark?
The correct method to register a DataFrame as a temporary view is createOrReplaceTempView.
How do you run an SQL query on a registered DataFrame view in Spark?
You use spark.sql('SQL query') to run SQL queries on registered views.
What does the SQL query SELECT * FROM my_table do on a DataFrame view?
SELECT * returns all columns and rows from the specified view.
Which prefix is required to query a global temporary view in Spark?
Global temporary views must be qualified with the global_temp database when queried, for example global_temp.my_view.
What type of object does spark.sql() return after running a query?
spark.sql() returns a DataFrame containing the query results.
Explain how to run an SQL query on a Spark DataFrame step-by-step.
Think about how SQL queries need a table name and how Spark lets you create that from a DataFrame.
Describe the difference between temporary and global temporary views in Spark.
Consider visibility and scope of views in Spark sessions.