DataFrame vs Series in pandas: Key Differences and Usage
A Series is a one-dimensional labeled array that can hold any data type, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types. Essentially, a DataFrame is like a table made of multiple Series sharing the same index.
Quick Comparison
Here is a quick side-by-side comparison of DataFrame and Series in pandas.
| Feature | Series | DataFrame |
|---|---|---|
| Dimensions | 1-dimensional | 2-dimensional |
| Data Structure | Single column with index | Multiple columns with index |
| Data Types | One dtype for the whole Series (mixed values are stored as `object`) | Each column has its own dtype |
| Shape | (length,) | (rows, columns) |
| Use Case | Single list of data with labels | Table of data with rows and columns |
| Indexing | Single index | Row and column indexes |
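The dimension, shape, and data type rows of the table can be checked directly with the `ndim`, `shape`, and `dtype`/`dtypes` attributes. The following is a minimal sketch; the sample values and variable names are illustrative assumptions rather than part of the comparison above.

```python
import pandas as pd

# Illustrative data: one labeled column vs. a small two-column table
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame(
    {"A": [10, 20, 30], "B": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# A Series reports a single dtype; a DataFrame reports one dtype per column
print(s.dtype)
print(df.dtypes)
```

Note that `s.shape` is a one-element tuple, while `df.shape` always has a row count and a column count, even for a single-column DataFrame.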
Key Differences
A Series is like a single column of data with an index that labels each element. It can hold any data type such as numbers, strings, or dates. Because it is one-dimensional, it behaves like a list or array but with labels for each item.
A DataFrame is a collection of Series that share the same index. It is two-dimensional, meaning it has rows and columns, similar to a spreadsheet or SQL table. Each column in a DataFrame is a Series and can have its own data type, allowing mixed types in one structure.
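The relationship between the two structures can be seen by building a DataFrame from Series and then pulling a column back out. This is a small sketch with made-up data; the names `prices` and `stock` are assumptions used only for illustration.

```python
import pandas as pd

# Two Series that share the same index labels
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])
stock = pd.Series([12, 0, 40], index=["apple", "pear", "plum"])

# Combine them into a DataFrame; each Series becomes one column
df = pd.DataFrame({"price": prices, "stock": stock})

# Selecting a single column returns a Series again
price_col = df["price"]
print(type(price_col).__name__)           # Series
print(price_col.index.equals(df.index))   # True

# Each column keeps its own dtype (float for price, integer for stock)
print(df.dtypes)
```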
In terms of operations, both structures support element-wise arithmetic, boolean filtering, and aggregation. What a DataFrame adds is the second axis: you can select columns, filter rows using conditions on one or more columns, and aggregate along either rows or columns.
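A short sketch of typical operations on each structure, using assumed sample data. The main point is how the extra axis shows up: aggregating a Series yields a single value, while aggregating a DataFrame yields a Series with one entry per column or per row.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

doubled = s * 2       # element-wise arithmetic
large = s[s > 15]     # boolean filtering keeps labels b, c, d
total = s.sum()       # aggregation to a single value: 100

df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [100, 200, 300, 400]},
    index=["a", "b", "c", "d"],
)

scaled = df * 2              # element-wise arithmetic on every cell
filtered = df[df["A"] > 15]  # keep rows where column A exceeds 15
col_means = df.mean()        # one mean per column, returned as a Series
row_sums = df.sum(axis=1)    # one sum per row, returned as a Series

print(total)
print(col_means)
print(row_sums)
```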
Code Comparison
Creating and working with a Series to hold a list of numbers with labels:
    import pandas as pd

    # Create a Series
    s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

    # Access element by label
    value_b = s['b']

    # Print Series and accessed value
    print(s)
    print(f"Value at label 'b': {value_b}")
DataFrame Equivalent
Creating and working with a DataFrame holding multiple columns of data with row labels:
    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame({
        'A': [10, 20, 30, 40],
        'B': [100, 200, 300, 400]
    }, index=['a', 'b', 'c', 'd'])

    # Access a column (which is a Series)
    col_A = df['A']

    # Access a single value by row and column label
    value_b_A = df.at['b', 'A']

    # Print DataFrame, column, and accessed value
    print(df)
    print(col_A)
    print(f"Value at row 'b' and column 'A': {value_b_A}")
When to Use Which
Choose a Series when you need to work with a single list of data with labels, such as a column of numbers or strings. It is the simpler structure for one-dimensional data, with only one axis to index and reason about.
Choose a DataFrame when you need to handle tabular data with multiple columns and rows, especially when columns have different data types. It is ideal for datasets like spreadsheets or SQL tables.
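The choice is rarely permanent, since pandas lets you move between the two structures. The sketch below, using assumed sample data, shows one way to go in each direction: `Series.to_frame()` promotes a Series to a one-column DataFrame, and selecting a column or a row from a DataFrame returns a Series.

```python
import pandas as pd

# Series -> DataFrame: the Series name becomes the column name
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="A")
as_frame = s.to_frame()

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

# DataFrame -> Series: single brackets select one column
one_col_series = df["A"]

# Double brackets keep the result two-dimensional
one_col_frame = df[["A"]]

# A single row is also a Series, indexed by the column names
row = df.loc["b"]

print(type(as_frame).__name__)        # DataFrame
print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(list(row.index))                # ['A', 'B']
```

The difference between `df["A"]` and `df[["A"]]` is a common source of confusion: the first gives a one-dimensional Series, the second a DataFrame that happens to have one column.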
In summary, use Series for single labeled arrays and DataFrame for full tables.