How to Show a DataFrame in PySpark: A Simple Guide
In PySpark, you can display the contents of a DataFrame with the show() method. It prints the rows as a readable table in the console. You can control how many rows are printed by passing a number, as in show(n).
Syntax
To display a PySpark DataFrame, call show() on any DataFrame object:
- df.show(): Displays the first 20 rows by default.
- df.show(n): Displays the first n rows.
- df.show(n, truncate=False): Shows full column content without truncation.
```python
df.show()
df.show(10)
df.show(5, truncate=False)
```
Example
This example creates a simple PySpark DataFrame and uses show() to display its contents. It shows how to print the default 20 rows and how to show all content without truncation.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ShowDataFrameExample').getOrCreate()

# Create sample data
data = [(1, 'Alice', 29), (2, 'Bob', 31), (3, 'Cathy', 25), (4, 'David', 35)]
columns = ['id', 'name', 'age']

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show up to 20 rows (the default); only 4 rows exist here
df.show()

# Show up to 20 rows without truncating column values
df.show(truncate=False)

spark.stop()
```
Output
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 29|
|  2|  Bob| 31|
|  3|Cathy| 25|
|  4|David| 35|
+---+-----+---+

+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Alice|29 |
|2  |Bob  |31 |
|3  |Cathy|25 |
|4  |David|35 |
+---+-----+---+
Common Pitfalls
Common mistakes when displaying DataFrames in PySpark include:
- Expecting show() to return data. It prints to the console and returns None.
- Omitting truncate=False when you need full column values. By default, strings longer than 20 characters are cut off.
- Using print(df), which prints only the column names and types (for example DataFrame[id: bigint, name: string]), not the rows.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('PitfallExample').getOrCreate()

data = [(1, 'LongNameThatGetsCutOff')]
columns = ['id', 'name']
df = spark.createDataFrame(data, columns)

# Wrong: prints only the column names and types, not the rows
print(df)

# Wrong: the 22-character name is truncated
df.show()

# Right: full output without truncation
df.show(truncate=False)

spark.stop()
```
Output
DataFrame[id: bigint, name: string]
+---+--------------------+
| id|                name|
+---+--------------------+
|  1|LongNameThatGetsC...|
+---+--------------------+

+---+----------------------+
|id |name                  |
+---+----------------------+
|1  |LongNameThatGetsCutOff|
+---+----------------------+
Quick Reference
Here is a quick summary of the show() method options:
| Method Call | Description |
|---|---|
| df.show() | Show first 20 rows; values longer than 20 characters are truncated |
| df.show(n) | Show first n rows; values longer than 20 characters are truncated |
| df.show(n, truncate=False) | Show first n rows with full column content |
| df.show(truncate=False) | Show first 20 rows with full column content |
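To make the truncation in the table concrete, the plain-Python sketch below mimics how a cell is shortened: with the default width of 20, a longer string keeps its first 17 characters followed by "...". The helper truncate_cell is hypothetical and written only for illustration; it is not part of PySpark and does not need a Spark session.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Illustrative stand-in for the way show() shortens long strings."""
    # A width of 0 or less, or a short enough value, leaves the text as is
    if truncate <= 0 or len(value) <= truncate:
        return value
    # Very small widths have no room for the "..." suffix
    if truncate < 4:
        return value[:truncate]
    # Otherwise keep (truncate - 3) characters and append "..."
    return value[: truncate - 3] + '...'


print(truncate_cell('LongNameThatGetsCutOff'))      # LongNameThatGetsC...
print(truncate_cell('Alice'))                       # Alice
print(truncate_cell('LongNameThatGetsCutOff', 10))  # LongNam...
```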
Key Takeaways
Use df.show() to print the first 20 rows of a PySpark DataFrame.
Pass a number to show(n) to display a specific number of rows.
Use truncate=False to see full column values without cutting off text.
print(df) prints only the column names and types, not the rows; use df.show() to view contents.
show() prints output to console and returns None; it does not return data.