Apache Spark · ~30 mins

String Functions in Apache Spark - Mini Project: Build & Apply

String functions in Spark
📖 Scenario: You work at a company that collects customer feedback, stored as text strings in a Spark DataFrame. Your job is to clean and analyze this text using Spark's string functions.
🎯 Goal: Learn how to use basic string functions in Spark to manipulate and analyze text data in a DataFrame.
📋 What You'll Learn
Create a Spark DataFrame with customer feedback strings
Add a configuration variable for a keyword to search
Use Spark string functions to find feedback containing the keyword and convert text to uppercase
Display the filtered and transformed feedback
💡 Why This Matters
🌍 Real World
Cleaning and analyzing customer feedback text is common in many businesses to improve products and services.
💼 Career
Data scientists and analysts often use Spark string functions to preprocess and analyze large text datasets efficiently.
1
Create a Spark DataFrame with customer feedback
Create a Spark DataFrame called feedback_df with one column named feedback containing these exact strings: 'Great service', 'Poor quality', 'Excellent support', 'Average experience', 'Poor response time'.
Hint: Use spark.createDataFrame with a list of one-element tuples and specify the column name as ['feedback'].

2
Add a keyword variable to search in feedback
Create a variable called keyword and set it to the string 'Poor'.
Hint: Just assign the string 'Poor' to the variable keyword.
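This step is a plain Python assignment; keeping the search term in a variable makes it easy to change later:

```python
# The keyword we will search for in the feedback column.
keyword = 'Poor'
```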

3
Filter feedback containing the keyword and convert to uppercase
Use Spark string functions to create a new DataFrame called filtered_df that contains only rows where the feedback column contains the keyword. Also add a new column called feedback_upper that has the feedback text converted to uppercase. Use the DataFrame's filter method with the Column contains method, and the upper and col functions from pyspark.sql.functions.
Hint: Use filter(col('feedback').contains(keyword)) to select rows, then withColumn('feedback_upper', upper(col('feedback'))) to add the uppercase column.

4
Display the filtered and transformed feedback
Use show() on filtered_df to display the rows with the original and uppercase feedback.
Hint: Use filtered_df.show() to print the DataFrame with the new column.