Apache Spark · ~30 mins

What is Apache Spark - Hands-On Activity

📖 Scenario: Imagine you have a huge collection of photos on your computer. You want to find all photos taken in summer quickly. Doing this one by one is slow. Apache Spark helps by working on many photos at once, making the search fast.
🎯 Goal: Learn what Apache Spark is and how it helps process big data quickly by working on many pieces at the same time.
📋 What You'll Learn
Understand the basic idea of Apache Spark
Know why Spark is faster than ordinary single-machine programs for big data
See a simple example of Spark code to count words
💡 Why This Matters
🌍 Real World
Companies use Apache Spark to analyze huge amounts of data quickly, like logs, social media, or sales data.
💼 Career
Knowing Spark helps you work with big data in roles like data engineer, data scientist, or analyst.
1
Create a list of sentences
Create a list called sentences with these exact strings: 'Apache Spark is fast', 'Spark processes big data', 'big data needs fast tools'.
Need a hint?

Use square brackets [] to create a list and put each sentence in quotes separated by commas.
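Written out in full, the hint above looks like this (a plain Python list, no Spark needed yet):

```python
# The exact list of strings the activity asks for.
sentences = ['Apache Spark is fast', 'Spark processes big data', 'big data needs fast tools']
print(sentences)
```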

2
Set up Spark session
Write code to create a Spark session called spark using SparkSession.builder.appName('WordCount').getOrCreate().
Need a hint?

Import SparkSession from pyspark.sql first, then create the spark session as shown.
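A minimal sketch of this step, assuming pyspark is installed locally (e.g. via pip install pyspark):

```python
# Import the entry point for Spark's SQL/DataFrame API.
from pyspark.sql import SparkSession

# Build (or reuse) a local Spark session named 'WordCount'.
spark = SparkSession.builder.appName('WordCount').getOrCreate()

# Quick sanity check that the session is up.
print(spark.version)
```

getOrCreate() returns an existing session if one is already running, so calling this code twice in the same program is safe.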

3
Create RDD and count words
Use spark.sparkContext.parallelize(sentences) to create an RDD called rdd. Then use flatMap with a lambda to split each sentence into words, map each word to a pair (word, 1), and use reduceByKey to sum the counts. Save the result in word_counts.
Need a hint?

Use flatMap to split each sentence into words, then map to pairs, then reduceByKey to sum counts.
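To see what each Spark operation does, here is a pure-Python sketch of the same pipeline. It mimics flatMap, map, and reduceByKey with ordinary lists and a dictionary, so you can check the logic without a cluster (Spark itself distributes these steps across workers):

```python
sentences = ['Apache Spark is fast', 'Spark processes big data', 'big data needs fast tools']

# flatMap: split every sentence into words and flatten into one list
words = [w for s in sentences for w in s.split()]

# map: turn each word into a (word, 1) pair
pairs = [(w, 1) for w in words]

# reduceByKey: sum the 1s for each distinct word
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts)
```

In real Spark the same three steps are chained as rdd.flatMap(...).map(...).reduceByKey(...), and no work runs until an action like collect() is called.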

4
Show the word counts
Use print(word_counts.collect()) to display the list of word counts.
Need a hint?

Use collect() to bring all results back from the cluster to the driver, then print them.
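Putting all four steps together, a complete sketch looks like this (again assuming a local pyspark installation; the order of pairs returned by collect() is not guaranteed):

```python
from pyspark.sql import SparkSession

# Steps 1-2: the data and a local Spark session.
spark = SparkSession.builder.appName('WordCount').getOrCreate()
sentences = ['Apache Spark is fast', 'Spark processes big data', 'big data needs fast tools']

# Step 3: parallelize, then flatMap -> map -> reduceByKey.
rdd = spark.sparkContext.parallelize(sentences)
word_counts = (rdd.flatMap(lambda s: s.split())
                  .map(lambda w: (w, 1))
                  .reduceByKey(lambda a, b: a + b))

# Step 4: collect() pulls the results back to the driver as a list of pairs.
print(word_counts.collect())

spark.stop()
```

collect() is fine here because the result is tiny; on real datasets, prefer actions like take(n) or saving to storage, since collect() loads everything into the driver's memory.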