How to Rename a Column in PySpark DataFrame
In PySpark, you can rename a column using the withColumnRenamed method on a DataFrame. This method takes the current column name and the new column name as arguments and returns a new DataFrame with the renamed column.
Syntax
The withColumnRenamed method is used to rename a column in a PySpark DataFrame.
df.withColumnRenamed(existingName, newName)
- existingName: The current name of the column you want to rename.
- newName: The new name you want to assign to the column.
- This method returns a new DataFrame with the column renamed; it does not change the original DataFrame.
```python
new_df = df.withColumnRenamed("old_column_name", "new_column_name")
```
Example
This example shows how to rename a column named age to years in a PySpark DataFrame.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RenameColumnExample").getOrCreate()

# Create sample data
data = [(1, "Alice", 25), (2, "Bob", 30), (3, "Cathy", 22)]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)

# Rename column 'age' to 'years'
new_df = df.withColumnRenamed("age", "years")

# Show original and new DataFrames
print("Original DataFrame:")
df.show()
print("DataFrame after renaming column:")
new_df.show()
```
Output
```
Original DataFrame:
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Alice| 25|
|  2|  Bob| 30|
|  3|Cathy| 22|
+---+-----+---+

DataFrame after renaming column:
+---+-----+-----+
| id| name|years|
+---+-----+-----+
|  1|Alice|   25|
|  2|  Bob|   30|
|  3|Cathy|   22|
+---+-----+-----+
```
Common Pitfalls
- Trying to rename a column that does not exist will not raise an error but will leave the DataFrame unchanged.
- Remember that withColumnRenamed returns a new DataFrame; it does not modify the original one.
- Renaming multiple columns requires chaining withColumnRenamed calls or using other methods.
```python
# Renaming a column that does not exist: this does nothing
wrong_df = df.withColumnRenamed("non_existing_column", "new_name")

# Correct way to rename multiple columns: chain the calls
renamed_df = df.withColumnRenamed("id", "user_id").withColumnRenamed("age", "years")
```
Quick Reference
| Method | Description | Example |
|---|---|---|
| withColumnRenamed | Rename one column | df.withColumnRenamed('old', 'new') |
| selectExpr | Rename multiple columns with SQL expressions | df.selectExpr('old AS new', 'col2') |
| toDF | Rename all columns by providing new names | df.toDF('new1', 'new2', 'new3') |
Key Takeaways
- Use withColumnRenamed to rename a single column in a PySpark DataFrame.
- withColumnRenamed returns a new DataFrame; the original DataFrame stays unchanged.
- Renaming multiple columns requires chaining withColumnRenamed calls or using other methods.
- Trying to rename a non-existing column does not raise an error but has no effect.
- Use toDF (new names for all columns, in order) or selectExpr (only the listed columns are kept) to rename several columns in one call.