Apache Spark · Data · ~30 mins

SQL queries on DataFrames in Apache Spark - Mini Project: Build & Apply

SQL Queries on DataFrames with Apache Spark
📖 Scenario: You work at a small bookstore. You have a list of books with their titles, authors, and prices. You want to use SQL queries on this data to find useful information.
🎯 Goal: Learn how to create a Spark DataFrame, register it as a SQL temporary view, run SQL queries on it, and display the results.
📋 What You'll Learn
Create a Spark DataFrame from a list of book data
Register the DataFrame as a temporary SQL view
Write and run SQL queries on the view
Display the query results
💡 Why This Matters
🌍 Real World
Bookstores, like most businesses, run SQL queries on tabular data to answer questions quickly, such as which products sell above a given price.
💼 Career
Data analysts and data scientists often use Spark SQL to analyze big data efficiently.
1
Create the initial DataFrame
Create a list called books_data with these exact tuples: ("The Alchemist", "Paulo Coelho", 10.99), ("1984", "George Orwell", 8.99), ("To Kill a Mockingbird", "Harper Lee", 7.99). Then create a Spark DataFrame called books_df from books_data with columns "title", "author", and "price".
Need a hint?

Use spark.createDataFrame() with your list and column names.

2
Register the DataFrame as a SQL temporary view
Use the DataFrame books_df and register it as a temporary SQL view called books_view using the method createOrReplaceTempView.
Need a hint?

Use books_df.createOrReplaceTempView("books_view") to register the view.

3
Write and run a SQL query to select books priced above 8
Write a SQL query string called query that selects all columns from books_view where the price is greater than 8. Then run this query using spark.sql(query) and save the result in a DataFrame called expensive_books_df.
Need a hint?

Write the SQL query as a string and run it with spark.sql().

4
Display the result of the SQL query
Use the DataFrame expensive_books_df and call the show() method to display the books with price greater than 8.
Need a hint?

Call expensive_books_df.show() to print the results.