SQL queries on DataFrames in Apache Spark - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of using SQL queries on DataFrames in Apache Spark?
SQL queries on DataFrames let you analyze and manipulate data stored in Spark DataFrames using familiar SQL syntax, so complex operations such as filters, joins, and aggregations can be written as a single query instead of a chain of DataFrame method calls.
beginner
How do you register a DataFrame as a temporary view in Spark to run SQL queries on it?
You use the createOrReplaceTempView method on the DataFrame with a name, for example: df.createOrReplaceTempView("my_table"). This lets you run SQL queries on 'my_table'.
beginner
What Spark method do you use to run an SQL query on a registered DataFrame view?
You use spark.sql('your SQL query'). This returns a new DataFrame with the query results.
beginner
Can you explain how to select specific columns from a DataFrame using SQL queries in Spark?
After registering the DataFrame as a temporary view, you write a SQL query like SELECT column1, column2 FROM view_name. This returns only those columns in the result DataFrame.
intermediate
What is the difference between createOrReplaceTempView and createGlobalTempView in Spark?
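createOrReplaceTempView creates a temporary view that is visible only in the Spark session that created it. createGlobalTempView creates a global temporary view that is shared by all sessions of the same Spark application and lives until that application ends. Global temporary views belong to the system database global_temp, so queries must prefix the view name with global_temp., for example SELECT * FROM global_temp.my_view.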
Which method registers a DataFrame as a temporary view in Spark?
A. createOrReplaceTempView
B. registerDataFrame
C. createTempDataFrame
D. registerTempSQL
How do you run an SQL query on a registered DataFrame view in Spark?
A. df.sql('SQL query')
B. spark.sql('SQL query')
C. spark.query('SQL query')
D. df.querySQL('SQL query')
What does the SQL query SELECT * FROM my_table do on a DataFrame view?
A. Creates a new table named my_table
B. Deletes all data from the view
C. Updates all rows in the view
D. Selects all columns and rows from the view
Which prefix is required to query a global temporary view in Spark?
A. global.
B. temp_global.
C. global_temp.
D. temp.
What type of object does spark.sql() return after running a query?
A. DataFrame
B. RDD
C. List
D. Dictionary
Explain how to run an SQL query on a Spark DataFrame step-by-step.
Think about how SQL queries need a table name and how Spark lets you create that from a DataFrame.
Describe the difference between temporary and global temporary views in Spark.
Consider visibility and scope of views in Spark sessions.