Apache Spark | data | ~30 mins

Reading from Kafka with Apache Spark - Mini Project: Build & Apply

Reading from Kafka with Spark
📖 Scenario: You work at a company that collects real-time data from various sensors. This data is sent to a Kafka topic. Your job is to read this data using Apache Spark and prepare it for analysis.
🎯 Goal: Build a Spark application that reads messages from a Kafka topic, extracts the message values, and displays them.
📋 What You'll Learn
Create a Spark session named spark
Read data from Kafka topic sensor-data on Kafka server localhost:9092
Select only the value column from the Kafka DataFrame and cast it to a string
Show the first 5 messages read from the topic
💡 Why This Matters
🌍 Real World
Many companies use Kafka to collect real-time data streams from devices, logs, or user activity. Spark helps process this data quickly for monitoring or analytics.
💼 Career
Data engineers and data scientists often need to read streaming data from Kafka using Spark to build real-time data pipelines and dashboards.
1
Create Spark session
Create a Spark session called spark with the app name KafkaSparkApp.
Need a hint?

Use SparkSession.builder.appName(...).getOrCreate() to create the Spark session.
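
A minimal sketch of this step, assuming PySpark is the language in use. The helper names `kafka_connector_package` and `create_spark_session` and the version defaults are illustrative assumptions, not part of the exercise. The Kafka source ships separately from Spark as the `spark-sql-kafka-0-10` package, so the sketch also requests it through `spark.jars.packages`; the versions should match your own Spark build.

```python
APP_NAME = "KafkaSparkApp"  # app name required by the exercise


def kafka_connector_package(spark_version, scala_version="2.12"):
    """Build the Maven coordinate of the Kafka source for Spark SQL.

    The version arguments are assumptions; use the ones matching your install.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def create_spark_session(spark_version="3.5.0"):
    """Create (or reuse) a SparkSession with the Kafka connector on the classpath."""
    # Imported here so the module can be loaded even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(APP_NAME)
        .config("spark.jars.packages", kafka_connector_package(spark_version))
        .getOrCreate()
    )
```

Calling `spark = create_spark_session()` then gives the `spark` variable the exercise asks for.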

2
Configure Kafka source
Create a DataFrame called kafka_df by reading from Kafka with spark.read.format("kafka"). Set the Kafka server to localhost:9092 and the topic to sensor-data. Use load() to load the data.
Need a hint?

Use option("kafka.bootstrap.servers", "localhost:9092") to set the Kafka server and option("subscribe", "sensor-data") to set the topic before calling load().
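
A hedged PySpark sketch of the batch read described in this step. The function names `kafka_read_options` and `read_kafka_batch` are hypothetical helpers introduced for illustration; `kafka.bootstrap.servers` and `subscribe` are the option keys the Kafka source expects.

```python
def kafka_read_options(bootstrap_servers="localhost:9092", topic="sensor-data"):
    """Return the options needed to point the Kafka source at one topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # broker address(es)
        "subscribe": topic,                            # topic to read from
    }


def read_kafka_batch(spark, options):
    """Read the topic as a batch DataFrame using spark.read (not readStream)."""
    return spark.read.format("kafka").options(**options).load()
```

With a running session and broker, `kafka_df = read_kafka_batch(spark, kafka_read_options())` produces a DataFrame whose columns include `key`, `value`, `topic`, `partition`, `offset`, and `timestamp`.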

3
Extract and cast message values
Create a new DataFrame called messages_df by selecting the value column from kafka_df and casting it to string using .cast("string").
Need a hint?

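Kafka delivers the value column as binary, so it must be cast before it is readable. Use selectExpr("CAST(value AS STRING) as value"), or equivalently select the value column and call .cast("string") on it.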
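
A short PySpark sketch of the cast, assuming `kafka_df` is the DataFrame loaded from Kafka in the previous step. The function name `extract_messages` and the constant `VALUE_AS_STRING` are illustrative, not part of the exercise.

```python
# SQL expression that turns the binary Kafka payload into a readable string.
VALUE_AS_STRING = "CAST(value AS STRING) AS value"


def extract_messages(kafka_df):
    """Keep only the message payload, cast from binary to string."""
    return kafka_df.selectExpr(VALUE_AS_STRING)


# Equivalent column-API form (requires `from pyspark.sql.functions import col`):
#   kafka_df.select(col("value").cast("string").alias("value"))
```

Usage would be `messages_df = extract_messages(kafka_df)`; both forms yield a single string column named `value`.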

4
Show messages
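Use messages_df.show(5) to display the first 5 messages read from the Kafka topic. This works because kafka_df was created with spark.read, which is a batch read; show() is not supported on a streaming DataFrame created with spark.readStream.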
Need a hint?

Use show(5) on messages_df to display the messages.
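
A minimal sketch of the display step in PySpark, assuming `messages_df` is the batch DataFrame holding the string-cast message values. The wrapper `show_messages` is a hypothetical helper; `truncate=False` is an optional addition so that long sensor payloads are printed in full rather than cut off at 20 characters.

```python
def show_messages(messages_df, n=5):
    """Print the first n messages without truncating long values."""
    messages_df.show(n, truncate=False)
```

Calling `show_messages(messages_df)` prints up to 5 rows; if the topic holds fewer messages, only those are shown.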