Apache Spark · ~10 mins

Reading from Kafka with Apache Spark - Step-by-Step Execution

Concept Flow - Reading from Kafka with Spark
1. Start Spark Session
2. Set Kafka Configurations
3. Create Spark DataFrame from Kafka
4. Select and Cast Kafka Data
5. Process or Show Data
6. Stop Spark Session
This flow shows how Spark connects to Kafka, reads data, processes it, and then stops.
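The six steps above can be sketched without a running cluster. The snippet below simulates a Kafka topic with hypothetical in-memory records (the data and names are made up for illustration) and mirrors the load-cast-show sequence in plain Python.

```python
# Hypothetical records standing in for what Spark's load() step surfaces:
# Kafka hands over binary key/value plus topic/partition/offset metadata.
fake_topic = [
    {"key": b"k1", "value": b"hello", "topic": "test-topic", "partition": 0, "offset": 0},
    {"key": b"k2", "value": b"world", "topic": "test-topic", "partition": 0, "offset": 1},
]

# Steps 4-5: cast (decode) the binary key/value to strings, then show them.
rows = [(r["key"].decode("utf-8"), r["value"].decode("utf-8")) for r in fake_topic]
for key, value in rows:
    print(key, value)
```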
Execution Sample
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaExample").getOrCreate()

# Requires the spark-sql-kafka-0-10 connector package on the classpath.
kafka_df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "test-topic")
    .load()
)

selected_df = kafka_df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
selected_df.show()

spark.stop()
This code connects Spark to Kafka, reads messages from 'test-topic', and shows key and value as strings.
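Per Spark's Kafka integration, the loaded DataFrame always exposes a fixed set of columns. The mapping below lists them as plain data, so the shape can be checked without a cluster:

```python
# Columns exposed by Spark's "kafka" source (Structured Streaming + Kafka
# integration): key and value arrive as binary, which is why the example
# casts them to strings before showing them.
kafka_source_columns = {
    "key": "binary",
    "value": "binary",
    "topic": "string",
    "partition": "int",
    "offset": "long",
    "timestamp": "timestamp",
    "timestampType": "int",
}

# Only the binary columns need casting to become human-readable.
needs_cast = [name for name, dtype in kafka_source_columns.items() if dtype == "binary"]
```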
Execution Table
Step | Action | Spark Variable | Result/Output
1 | Start Spark Session | spark | SparkSession object created
2 | Configure Kafka source | kafka_df | DataFrame configured to read from Kafka topic 'test-topic'
3 | Load data from Kafka | kafka_df | DataFrame loaded with Kafka messages (key, value, topic, partition, offset, timestamp)
4 | Select and cast key and value | selected_df | DataFrame with key and value as strings
5 | Show data | Output | Rows with key and value columns as strings
6 | Stop Spark Session | spark | SparkSession stopped
Exit | End of process | - | No more data to read or process
💡 Spark session stopped, no further Kafka data read
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final
spark | None | SparkSession created | SparkSession active | SparkSession active | SparkSession active | Stopped
kafka_df | None | Configured | Loaded with Kafka data | Loaded with Kafka data | Loaded with Kafka data | Released
selected_df | None | None | None | DataFrame with casted key/value | Same as previous | Released
Key Moments - 3 Insights
Why do we need to cast the key and value from Kafka data?
Kafka data comes as binary by default. Casting to string makes it readable and usable, as shown in step 4 of the execution table.
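The point can be seen with plain Python bytes: a Kafka payload is only readable after decoding, and a payload that is not valid UTF-8 shows why the cast/decode step deserves deliberate handling (the sample bytes below are made up).

```python
good = b'{"id": 1}'   # a UTF-8 JSON payload, as Kafka would deliver it
bad = b"\xff\xfe"     # bytes that are not valid UTF-8

assert good.decode("utf-8") == '{"id": 1}'

# A strict decode of malformed bytes raises UnicodeDecodeError;
# errors="replace" degrades gracefully by substituting the Unicode
# replacement character instead.
decoded = bad.decode("utf-8", errors="replace")
```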
What happens if the Kafka topic does not exist or is unreachable?
Spark will throw an error or hang at the load step (step 3), because it cannot connect to the broker or find the topic to read data from.
Why do we stop the Spark session at the end?
Stopping the Spark session (step 6) frees resources and ends the streaming or batch job cleanly.
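One common way to guarantee that cleanup happens even when a step fails is a try/finally block. The sketch below uses a stand-in class (not a real SparkSession) so the pattern can run anywhere:

```python
class FakeSession:
    """Stand-in for SparkSession, only to demonstrate the cleanup pattern."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

session = FakeSession()
try:
    # ... read from Kafka, transform, show ...
    raise RuntimeError("simulated mid-job failure")
except RuntimeError:
    pass  # the job failed, but the cleanup below still runs
finally:
    session.stop()  # runs on success and on failure alike
```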
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, what is the state of 'kafka_df' after step 3?
A. Configured but not loaded
B. Loaded with Kafka data
C. Stopped
D. None
💡 Hint
Check the 'kafka_df' column in the execution table row for step 3.
At which step do we convert Kafka binary data to readable strings?
A. Step 2
B. Step 3
C. Step 4
D. Step 5
💡 Hint
Look for 'Select and cast key and value' in the execution table.
If we do not stop the Spark session, what is the likely outcome?
A. Spark session continues running and holds resources
B. Spark session automatically stops
C. Kafka stops sending data
D. Data is lost
💡 Hint
Refer to the Key Moments explanation about stopping the Spark session.
Concept Snapshot
Reading from Kafka with Spark:
- Start SparkSession
- Configure Kafka source with bootstrap servers and topic
- Load Kafka data as DataFrame
- Cast key and value from binary to string
- Process or show data
- Stop SparkSession to release resources
Full Transcript
This visual execution shows how to read data from Kafka using Apache Spark. First, we start a Spark session. Then, we configure the Kafka source by specifying the Kafka servers and the topic to subscribe to. Next, we load the data from Kafka into a Spark DataFrame. Since Kafka data is in binary format, we cast the key and value columns to strings to make them readable. After that, we can process or display the data. Finally, we stop the Spark session to free resources. The execution table traces each step and variable state, helping beginners understand the flow and transformations.