
How to Show DataFrame in PySpark: Simple Guide

In PySpark, you can display the contents of a DataFrame using the show() method. This method prints the rows of the DataFrame to the console in a readable table format. You can also control how many rows are printed by passing a number, as in show(n).

Syntax

To display a PySpark DataFrame, call its show() method. It is available on any DataFrame object.

  • df.show(): Displays the first 20 rows by default and truncates string values longer than 20 characters.
  • df.show(n): Displays the first n rows.
  • df.show(n, truncate=False): Shows full column content without truncation.
python
df.show()
df.show(10)
df.show(5, truncate=False)

Example

This example creates a simple PySpark DataFrame and uses show() to display its contents. It first prints with the defaults (up to 20 rows) and then prints again with truncate=False so that column values are not cut off.

python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ShowDataFrameExample').getOrCreate()

# Create sample data
data = [(1, 'Alice', 29), (2, 'Bob', 31), (3, 'Cathy', 25), (4, 'David', 35)]
columns = ['id', 'name', 'age']

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show default 20 rows (only 4 rows here)
df.show()

# Show up to 20 rows without truncating column values
df.show(truncate=False)

spark.stop()
Output
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 29|
|  2|  Bob| 31|
|  3|Cathy| 25|
|  4|David| 35|
+---+-----+---+

+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Alice|29 |
|2  |Bob  |31 |
|3  |Cathy|25 |
|4  |David|35 |
+---+-----+---+

Common Pitfalls

Some common mistakes when showing DataFrames in PySpark include:

  • Expecting show() to return data instead of printing it. It prints to console and returns None.
  • Not specifying truncate=False when you want to see full column values, leading to truncated output.
  • Trying to use print(df), which only shows the DataFrame's schema (column names and types), not its rows.
python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('PitfallExample').getOrCreate()
data = [(1, 'LongNameThatGetsCutOff')]
columns = ['id', 'name']
df = spark.createDataFrame(data, columns)

# Wrong: prints only the schema (column names and types), not the data
print(df)

# Truncated: string values longer than 20 characters are cut off
df.show()

# Right: full output without truncation
df.show(truncate=False)

spark.stop()
Output
DataFrame[id: bigint, name: string]
+---+--------------------+
| id|                name|
+---+--------------------+
|  1|LongNameThatGetsC...|
+---+--------------------+

+---+----------------------+
|id |name                  |
+---+----------------------+
|1  |LongNameThatGetsCutOff|
+---+----------------------+

Quick Reference

Here is a quick summary of the show() method options:

Method Call                   Description
df.show()                     Show first 20 rows with truncated columns
df.show(n)                    Show first n rows with truncated columns
df.show(n, truncate=False)    Show first n rows with full column content
df.show(truncate=False)       Show first 20 rows with full column content

Key Takeaways

Use df.show() to print the first 20 rows of a PySpark DataFrame.
Pass a number to show(n) to display a specific number of rows.
Use truncate=False to see full column values without cutting off text.
print(df) shows only the schema, not the rows; use df.show() to view contents.
show() prints output to console and returns None; it does not return data.