Apache Spark · ~10 mins

Databricks platform overview in Apache Spark - Step-by-Step Execution

Concept Flow - Databricks platform overview
User logs into Databricks
Create or open Workspace
Create Notebook or Job
Write Spark code
Submit code to Cluster
Cluster runs Spark jobs
Results returned to Notebook
Visualize or export results
Manage data and resources
Collaborate with team
End
This flow shows how a user interacts with Databricks: logging in, creating notebooks, running Spark code on clusters, getting results, and collaborating.
Execution Sample
Apache Spark
# Sample Spark code in Databricks notebook
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')]
df = spark.createDataFrame(data, ['id', 'name'])
df.show()
This code creates a Spark DataFrame with sample data and shows it in the notebook.
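df.show() renders the DataFrame as a right-aligned ASCII table in the notebook output. As a rough plain-Python illustration of that kind of rendering (not Spark's own formatting code, and the exact padding Spark uses may differ slightly):

```python
# Plain-Python sketch of the tabular output df.show() renders.
# Illustration only; real Spark formats the table internally on the JVM.
rows = [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')]
headers = ('id', 'name')

# Column width: widest cell in each column, including the header.
widths = [max(len(str(v)) for v in (h, *col))
          for h, col in zip(headers, zip(*rows))]
border = '+' + '+'.join('-' * w for w in widths) + '+'

def fmt(values):
    # Right-align each cell to its column width, as show() does.
    return '|' + '|'.join(str(v).rjust(w) for v, w in zip(values, widths)) + '|'

print(border)
print(fmt(headers))
print(border)
for row in rows:
    print(fmt(row))
print(border)
```

Running this prints a bordered table with one row per tuple, which is essentially what you see under the df.show() cell in the notebook.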
Execution Table
Step | Action | Evaluation | Result
1 | Import SparkSession | from pyspark.sql import SparkSession | SparkSession class available
2 | Create SparkSession | spark = SparkSession.builder.getOrCreate() | SparkSession object created
3 | Prepare data list | data = [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')] | List of tuples created
4 | Create DataFrame | df = spark.createDataFrame(data, ['id', 'name']) | DataFrame with 3 rows and 2 columns created
5 | Show DataFrame | df.show() | Table displayed with rows (1, Alice), (2, Bob), (3, Cathy)
6 | End of code execution | No more code | DataFrame displayed in notebook
💡 Code execution ends after displaying the DataFrame in the notebook.
Variable Tracker
Variable | Start | After Step 3 | After Step 4 | Final
spark | None | SparkSession object | SparkSession object | SparkSession object
data | None | [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')] | [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')] | [(1, 'Alice'), (2, 'Bob'), (3, 'Cathy')]
df | None | None | DataFrame with 3 rows and 2 columns | DataFrame with 3 rows and 2 columns
Key Moments - 3 Insights
Why do we need to create a SparkSession before running Spark code?
The SparkSession is the entry point to use Spark features. Without it, Spark cannot run code or create DataFrames. See execution_table step 2 where SparkSession is created.
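getOrCreate() also explains its own name: it returns the already-active session if one exists instead of starting a second one. A toy plain-Python sketch of that reuse pattern (not Spark's actual implementation):

```python
# Toy illustration of the getOrCreate() pattern: reuse an existing
# session rather than constructing a new one. Not Spark's real code.
class ToySession:
    _active = None  # the "current session", like Spark's active session

    @classmethod
    def get_or_create(cls):
        if cls._active is None:
            cls._active = cls()  # first call: build the session
        return cls._active       # later calls: return the same one

a = ToySession.get_or_create()
b = ToySession.get_or_create()
print(a is b)  # both names point at the same session object
```

This is why calling SparkSession.builder.getOrCreate() in several notebook cells is safe: each call hands back the same session.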
What does df.show() do in the notebook?
df.show() displays the DataFrame content as a table in the notebook output. It does not change the data, just shows it. See execution_table step 5.
Why is the data variable created before the DataFrame?
The data variable holds the raw data as a list. We need it first to pass into spark.createDataFrame to make the DataFrame. See execution_table steps 3 and 4.
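If the data list were never defined, step 4 would fail with a NameError before Spark did any work at all. A quick plain-Python demonstration of that failure mode (no Spark needed; the undefined name here is deliberate):

```python
# Referencing a name that was never assigned raises NameError
# immediately, before any DataFrame could be built from it.
try:
    spark_input = missing_data  # 'missing_data' was never assigned
    failed = False
except NameError as e:
    failed = True
    print('NameError:', e)
```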
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution_table, what is the value of 'df' after step 4?
A. None
B. A list of tuples
C. A Spark DataFrame with 3 rows and 2 columns
D. A SparkSession object
💡 Hint
Check the 'Result' column in execution_table row for step 4.
At which step is the SparkSession object created?
A. Step 2
B. Step 1
C. Step 3
D. Step 4
💡 Hint
Look at the 'Action' and 'Result' columns in execution_table for SparkSession creation.
If we skip creating the 'data' list, what will happen at step 4?
A. DataFrame will be created with empty data
B. Error because data is not defined
C. SparkSession will fail to create
D. df.show() will display an empty table
💡 Hint
Refer to variable_tracker and execution_table steps 3 and 4 about 'data' variable.
Concept Snapshot
Databricks lets you write and run Spark code in notebooks.
You start by creating a SparkSession.
Load or create data, then make DataFrames.
Run code on clusters and see results instantly.
Use notebooks to visualize and share your work.
Full Transcript
Databricks is a platform where users log in and create notebooks to write Spark code. The user starts by creating a SparkSession, which is needed to run Spark commands. Then, data is prepared as a list of tuples. This data is converted into a Spark DataFrame. The DataFrame is shown in the notebook output. The platform runs the code on clusters and returns results quickly. Users can visualize data and collaborate with others in the workspace.