
String type in NumPy

Introduction

NumPy's string types store text in arrays as fixed-length elements, so you can work with many text values at once.

Typical use cases:
Storing a list of names or words in a NumPy array.
Applying the same operation to many text entries at once.
Keeping text in a compact fixed-width buffer instead of as separate Python objects.
Combining text with numerical data in arrays for analysis.
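As a rough illustration of the last two points, the sketch below compares a fixed-length string array with an object array and then stores text next to numbers in a structured array. The names, ages, and field names are made up for the example.

```python
import numpy as np

names = ['Ada', 'Grace', 'Linus']  # hypothetical sample data

# Fixed-length Unicode array: every element occupies the same number of bytes.
fixed = np.array(names)
# Object array: each element is a pointer to a separate Python str object.
boxed = np.array(names, dtype=object)

print(fixed.dtype.itemsize)  # bytes per element in the fixed-length array
print(boxed.dtype.itemsize)  # bytes per pointer; the str objects live elsewhere

# Structured array: a text field and a numeric field stored side by side.
# The field names 'name' and 'age' are assumptions for this sketch.
people = np.array(
    [('Ada', 36), ('Grace', 85), ('Linus', 54)],
    dtype=[('name', 'U10'), ('age', 'i4')],
)
over_50 = people['name'][people['age'] > 50]
print(over_50)
```

Whether the fixed-length layout actually saves memory depends on the data, since every element is padded to the length of the longest one.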
Syntax
NumPy
numpy.array(['text1', 'text2'], dtype='S')
numpy.array(['text1', 'text2'], dtype='U')

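'S' means fixed-length byte strings (one byte per character; ASCII text or raw bytes).

'U' means fixed-length Unicode strings (four bytes per character; supports all Unicode characters).

An optional number after the letter, such as 'S5' or 'U10', sets the maximum length. Without it, NumPy uses the length of the longest element.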
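To make the difference between the two type codes concrete, this small sketch checks how many bytes each element takes. The variable names and the explicit length 'U10' are chosen only for illustration.

```python
import numpy as np

words = ['text1', 'text2']

as_bytes = np.array(words, dtype='S')    # length inferred from the data
as_unicode = np.array(words, dtype='U')  # length inferred from the data
padded = np.array(words, dtype='U10')    # explicit capacity of 10 characters

print(as_bytes.dtype.itemsize)    # 5: one byte per character
print(as_unicode.dtype.itemsize)  # 20: four bytes per character
print(padded.dtype.itemsize)      # 40: 10 characters * 4 bytes
```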

Examples
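This creates a byte string array with three animals and prints the array and its dtype. The elements print with a b prefix, and the dtype is S4 because the longest word has four characters.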
NumPy
import numpy as np
arr = np.array(['cat', 'dog', 'bird'], dtype='S')
print(arr)
print(arr.dtype)
This creates a Unicode string array that can store any characters, not just ASCII.
NumPy
import numpy as np
arr = np.array(['cat', 'dog', 'bird'], dtype='U')
print(arr)
print(arr.dtype)
If you don't specify a dtype, NumPy automatically chooses a Unicode string type for text, sized to the longest element.
NumPy
import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr)
print(arr.dtype)
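One way to confirm what NumPy inferred is to inspect the dtype's attributes instead of reading its printed form, which also includes a byte-order marker. A minimal sketch:

```python
import numpy as np

arr = np.array(['apple', 'banana', 'cherry'])

print(arr.dtype.kind)               # 'U' marks a Unicode string dtype
print(arr.dtype.itemsize // 4)      # 6: capacity in characters (4 bytes each)
print(arr.dtype == np.dtype('U6'))  # True
```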
Sample Program

This program shows how to create byte and Unicode string arrays in NumPy, print their types, and find the length of each string.

NumPy
import numpy as np

# Create a byte string array
byte_arr = np.array(['red', 'green', 'blue'], dtype='S')
print('Byte string array:', byte_arr)
print('Data type:', byte_arr.dtype)

# Create a Unicode string array
unicode_arr = np.array(['red', 'green', 'blue'], dtype='U')
print('Unicode string array:', unicode_arr)
print('Data type:', unicode_arr.dtype)

# Show length of each string element
lengths = np.vectorize(len)(unicode_arr)
print('Lengths of each string:', lengths)
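The length calculation in the sample program can also be written with the helpers in NumPy's np.char module, which apply a string operation to every element. This is an alternative sketch, not a replacement for the program above.

```python
import numpy as np

unicode_arr = np.array(['red', 'green', 'blue'], dtype='U')

# np.char.str_len computes the length of every element in one call.
lengths = np.char.str_len(unicode_arr)
print(lengths)  # [3 5 4]

# Other element-wise helpers from the same module.
upper = np.char.upper(unicode_arr)
print(upper)    # ['RED' 'GREEN' 'BLUE']
suffixed = np.char.add(unicode_arr, '!')
print(suffixed) # ['red!' 'green!' 'blue!']
```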
Important Notes

Byte strings (dtype='S') store text as bytes, one byte per character. Converting Python str values to 'S' works only for ASCII text; other text must be encoded to bytes first.
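The ASCII limit can be seen by trying to store accented text as bytes. The sketch below uses made-up sample words and UTF-8 as an assumed encoding; any encoding that covers your text would work the same way.

```python
import numpy as np

words = np.array(['café', 'naïve'])  # Unicode array, dtype kind 'U'

# Converting non-ASCII text straight to 'S' fails, because the implicit
# conversion only accepts ASCII characters.
try:
    np.array(['café'], dtype='S')
except UnicodeError as exc:
    print('cannot store as bytes:', exc)

# Encoding explicitly produces byte strings in the chosen encoding.
encoded = np.char.encode(words, 'utf-8')
print(encoded.dtype.kind)  # 'S'

# Decoding with the same encoding recovers the original text.
decoded = np.char.decode(encoded, 'utf-8')
print(decoded)
```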

Unicode strings (dtype='U') store text as Unicode code points, four bytes per character, and support all characters, including emojis and accented letters.

NumPy string types have a fixed length, so a string that is assigned into an existing array or converted to a shorter dtype is silently truncated if it exceeds that length.
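A short sketch of the truncation behaviour, with example words chosen for illustration, and one possible way to avoid it by widening the dtype first:

```python
import numpy as np

arr = np.array(['cat', 'dog'])  # inferred capacity: 3 characters
arr[0] = 'elephant'             # silently truncated to fit
print(arr[0])                   # ele

# Widen the dtype first when longer values are expected.
wide = arr.astype('U8')
wide[0] = 'elephant'
print(wide[0])                  # elephant
```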

Summary

NumPy supports two main string types: byte strings ('S') and Unicode strings ('U').

Use 'S' for ASCII or byte data, and 'U' for full Unicode text.

String arrays are fixed-length, so be careful with string sizes to avoid truncation.