
NumPy structured arrays vs pandas DataFrames

Introduction

NumPy structured arrays and pandas DataFrames both hold records whose columns have different types, so a single object can represent a table such as names alongside ages.

Reach for one or the other when:
You have data with multiple columns of different types, like names and ages.
You want fast numerical operations on data with fixed types.
You need to store and access data like a table but want to keep using NumPy functions.
You want pandas features such as easy filtering and grouping.
You want to convert between NumPy arrays and pandas DataFrames.
Syntax
NumPy
import numpy as np
import pandas as pd

# Structured array creation
structured_array = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
                            dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

# DataFrame creation
data_frame = pd.DataFrame({'id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [25, 30]})

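Structured arrays use NumPy's dtype to define column (field) names and types. In the example, 'i4' is a 32-bit integer and 'U10' is a Unicode string of at most 10 characters.

DataFrames come from pandas, infer column types from the data, and offer more features for data analysis, such as labeled indexing, filtering, and grouping.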
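To make the role of the dtype concrete, here is a minimal sketch that inspects the same dtype used in the Syntax example. The variable name person_dtype is only illustrative.

```python
import numpy as np

# Same (field name, type code) pairs as in the Syntax example.
person_dtype = np.dtype([('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

# Field (column) names, in declaration order.
print(person_dtype.names)

# Bytes per record: 4 (i4) + 10 * 4 (U10) + 4 (i4).
print(person_dtype.itemsize)

# Each field reports its own type and its byte offset inside a record.
for field_name in person_dtype.names:
    field_dtype, offset = person_dtype.fields[field_name][:2]
    print(field_name, field_dtype, offset)
```

Every record occupies the same number of bytes, which is what makes the layout fixed.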

Examples
This shows an empty structured array with defined columns but no rows.
NumPy
import numpy as np

# Empty structured array
empty_structured = np.array([], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
print(empty_structured)
Structured array with a single row of data.
NumPy
import numpy as np

# Structured array with one element
one_element = np.array([(1, 'Alice', 25)], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
print(one_element)
DataFrame with one row, easy to read and manipulate.
NumPy
import pandas as pd

# DataFrame with one row
one_row_df = pd.DataFrame({'id': [1], 'name': ['Alice'], 'age': [25]})
print(one_row_df)
Empty DataFrame with column names but no data rows.
NumPy
import pandas as pd

# DataFrame with empty data
empty_df = pd.DataFrame(columns=['id', 'name', 'age'])
print(empty_df)
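The two empty containers differ in what they remember. As a small sketch: the empty structured array still carries its field types, while the empty DataFrame was given only column names, so pandas falls back to a default dtype for each column (which default you see may depend on your pandas version).

```python
import numpy as np
import pandas as pd

empty_structured = np.array([], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
empty_df = pd.DataFrame(columns=['id', 'name', 'age'])

# Both containers have zero rows.
print(len(empty_structured), len(empty_df))

# The structured array still knows the type of every field.
print(empty_structured.dtype)

# The DataFrame reports whatever default dtype pandas assigned.
print(empty_df.dtypes)
```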
Sample Program

This program shows how to create a structured array, access its data, convert it to a DataFrame, and filter rows in the DataFrame.

NumPy
import numpy as np
import pandas as pd

# Create a structured array with 3 rows
structured_array = np.array([
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35)
], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

print('Structured Array:')
print(structured_array)
print()

# Access the 'name' column from structured array
print('Names from structured array:')
print(structured_array['name'])
print()

# Convert structured array to pandas DataFrame
data_frame = pd.DataFrame(structured_array)
print('Converted DataFrame:')
print(data_frame)
print()

# Filter DataFrame for age > 28
filtered_df = data_frame[data_frame['age'] > 28]
print('Filtered DataFrame (age > 28):')
print(filtered_df)
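For comparison, the same age filter can be written against the structured array itself, without pandas. This is a sketch using plain NumPy boolean indexing; the name older_than_28 is illustrative.

```python
import numpy as np

structured_array = np.array([
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35)
], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

# Build a boolean mask from one field and apply it to whole records.
older_than_28 = structured_array[structured_array['age'] > 28]
print(older_than_28)

# Passing a list of field names selects a subset of the columns.
print(older_than_28[['name', 'age']])
```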
Important Notes

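Structured arrays pack every record into a fixed number of bytes, so they are compact and quick to slice by field, but they offer far fewer features than DataFrames.

DataFrames provide many tools for data cleaning, filtering, and analysis, but typically use more memory, partly because each DataFrame carries an index and often stores strings as Python objects.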
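One way to check the memory trade-off on your own data is to measure both containers. The sketch below uses the three-row table from the sample program; the DataFrame figure depends on the pandas version and on how strings are stored, so no particular number is assumed for it.

```python
import numpy as np
import pandas as pd

structured_array = np.array([
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35)
], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])
data_frame = pd.DataFrame(structured_array)

# 3 records * 48 bytes per record.
array_bytes = structured_array.nbytes
print('Structured array bytes:', array_bytes)

# deep=True also counts the Python string objects, where pandas uses them.
frame_bytes = int(data_frame.memory_usage(deep=True).sum())
print('DataFrame bytes:', frame_bytes)
```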

Common mistake: calling DataFrame methods such as head or groupby on a structured array raises an AttributeError, because a structured array is a plain NumPy ndarray.
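A short sketch of that mistake and one way around it: convert the array to a DataFrame first, then call the pandas method. The name first_two is illustrative.

```python
import numpy as np
import pandas as pd

structured_array = np.array([
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35)
], dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

try:
    # head() exists on DataFrames, not on NumPy arrays.
    structured_array.head(2)
except AttributeError as error:
    print('Not available on a structured array:', error)

# Convert first, then use the pandas API.
first_two = pd.DataFrame(structured_array).head(2)
print(first_two)
```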

Use structured arrays when you need speed and a compact, fixed-type record layout; use DataFrames for flexible data analysis.
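Fixed types cut both ways. In this sketch, selecting a single field gives a view, so in-place arithmetic updates the records directly, but a fixed-width string field silently truncates anything longer than its declared size. The name people and the sample string are illustrative.

```python
import numpy as np

people = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
                  dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

# A single-field selection is a view, so in-place math changes the records.
ages = people['age']
ages += 1
print(people['age'])

# 'U10' holds at most 10 characters; longer strings are cut off without warning.
people['name'][0] = 'Alexandria-Marie'
print(people['name'][0])
```

If names can be longer than expected, pick a wider string type up front or keep that column in a DataFrame.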

Summary

Structured arrays store data with named columns and fixed types using NumPy.

DataFrames are more powerful tables from pandas with many analysis features.

You can convert between the two to use the best of both: pd.DataFrame(array) turns a structured array into a DataFrame, and DataFrame.to_records() goes the other way.
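A minimal round-trip sketch, assuming a small two-row table. The column_dtypes argument is used here to request fixed-width text for the name column again; without it, the type of that field is whatever pandas stored.

```python
import numpy as np
import pandas as pd

structured_array = np.array([(1, 'Alice', 25), (2, 'Bob', 30)],
                            dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'i4')])

# Structured array -> DataFrame: field names become column names.
data_frame = pd.DataFrame(structured_array)

# DataFrame -> record array: index=False leaves the row labels out.
records = data_frame.to_records(index=False, column_dtypes={'name': 'U10'})

print(records.dtype.names)
print(records['name'])
print(records['age'])
```

to_records returns a NumPy record array, which supports the same field indexing as a structured array.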