
String type (object, string) in Pandas

Introduction

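Pandas can hold text in two dtypes: the general-purpose object dtype, which is the default for text in pandas 1.x and 2.x, and the dedicated string dtype. The string dtype makes it explicit that a column holds only text or missing values. It is useful in situations such as these: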

When you have a column of names, like customer names or product names.
When you want to clean or analyze text data, such as comments or reviews.
When you need to convert numbers stored as text into real numbers.
When you want to filter or search for specific words in your data.
When you want to prepare text data for visualization or reporting.
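Two of the situations above, converting numeric text and searching for a word, can be sketched as follows. The column names and sample values are made up for illustration; pd.to_numeric and Series.str.contains are standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data: numbers stored as text, plus short review comments
df = pd.DataFrame({
    'price': ['10', '25', '7'],
    'review': ['great product', 'too slow', 'great value'],
})
df['price'] = df['price'].astype('string')
df['review'] = df['review'].astype('string')

# Convert numeric text into real numeric values
df['price_num'] = pd.to_numeric(df['price'])

# Keep only the rows whose review mentions the word 'great'
great = df[df['review'].str.contains('great', na=False)]

print(df['price_num'].sum())
print(great['review'].tolist())
```

Passing na=False makes str.contains treat a missing review as "no match", so the result can be used directly as a filter.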
Syntax
Pandas
df['column_name'] = df['column_name'].astype('string')
Use astype('string') to convert a column to pandas string type.
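The .str accessor works on both object and string columns. The advantage of the string type is that it states clearly that the column holds text, it does not allow non-text values to be stored by accident, and it represents missing values consistently as <NA>.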
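As a rough illustration of the difference, the sketch below builds the same data once with the object dtype and once with the string dtype. The variable names are chosen for this example only.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Charlie']

as_object = pd.Series(names, dtype='object')  # general-purpose object dtype
as_string = pd.Series(names, dtype='string')  # dedicated string dtype

# The .str accessor is available on both
print(as_object.str.len().tolist())
print(as_string.str.len().tolist())

# The dtype shows which kind of column you have
print(as_object.dtype)
print(isinstance(as_string.dtype, pd.StringDtype))
```

Both columns give the same lengths; only the reported dtype differs.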
Examples
This converts the 'name' column to pandas string type and prints the data types.
Pandas
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})
df['name'] = df['name'].astype('string')
print(df.dtypes)
This converts the 'age' column to pandas string type and checks which values consist only of numeric characters.
Pandas
import pandas as pd

df = pd.DataFrame({'age': ['25', '30', '35']})
df['age'] = df['age'].astype('string')
print(df['age'].str.isnumeric())
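This converts a column containing a missing value (None) to pandas string type and changes the text to uppercase. The missing value is kept as <NA> in the result.
Pandas
import pandas as pd

df = pd.DataFrame({'text': ['hello', None, 'world']})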
df['text'] = df['text'].astype('string')
print(df['text'].str.upper())
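To see how the missing entry behaves after the conversion, a small check like the following may help. It is a sketch using the same sample values; replacing missing entries with an empty string is just one possible choice.

```python
import pandas as pd

s = pd.Series(['hello', None, 'world']).astype('string')
upper = s.str.upper()

# The None entry is treated as missing, before and after str.upper()
print(s.isna().tolist())
print(upper.isna().tolist())

# Replace missing entries with an empty string if a plain text value is needed
print(upper.fillna('').tolist())
```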
Sample Program

This program creates a DataFrame with city names and populations. It converts the 'city' column to pandas string type to safely use string methods. Then it checks which city names start with the letter 'T'.

Pandas
import pandas as pd

# Create a DataFrame with a text column (including a missing value) and an integer column
data = {'city': ['New York', 'Paris', 'Tokyo', None], 'population': [8000000, 2140000, 13960000, 0]}
df = pd.DataFrame(data)

# Convert 'city' column to string type
df['city'] = df['city'].astype('string')

# Use string method to check which cities start with 'T'
starts_with_t = df['city'].str.startswith('T')

print(df.dtypes)
print(starts_with_t)
Important Notes

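Pandas string type represents missing values as <NA>. String methods keep a missing entry missing instead of raising an error, so a check such as str.startswith returns <NA> for that row.

The .str accessor provides many useful text operations, such as str.upper, str.strip, str.contains and str.startswith. It is available on object columns as well as string columns.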
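The notes above can be illustrated together in a short sketch. The city values and variable names are assumptions made for this example.

```python
import pandas as pd

cities = pd.Series(['  new york', 'paris ', 'tokyo', None], dtype='string')

# Chain .str methods: trim whitespace, then capitalize each word
clean = cities.str.strip().str.title()

# The missing entry stays missing in the boolean result
mask = clean.str.startswith('T')
print(mask.isna().tolist())

# Pass na=False to treat missing entries as "no match" when filtering
tokyo_only = clean[clean.str.startswith('T', na=False)]
print(tokyo_only.tolist())
```

Using na=False is a deliberate choice here: it drops rows with a missing city from the filter result rather than leaving the decision to the indexing step.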

Summary

Use pandas string type to mark a column explicitly as text.

Convert columns using astype('string'). The .str methods work both before and after the conversion.

String type represents missing values as <NA> and keeps them missing through text operations.