Overview - info() for column types and nulls

What is it?

The info() method of a pandas DataFrame prints a concise summary of the data. It shows the index range and row count, the number of columns, the data type of each column, how many non-null values each column has, and an estimate of memory usage. This helps you understand the structure and completeness of your data at a glance.

Why it matters

Without info(), you might waste time guessing what your data looks like or miss important details like missing values or wrong data types. This function helps you spot problems early, so you can clean and prepare your data correctly before analysis. It saves time and prevents errors in your work.

Where it fits

Before using info(), you should know how to load data into pandas DataFrames. After learning info(), you can move on to handling missing data, converting data types, and exploring data with other pandas functions.

Mental Model

Core Idea

info() is like a quick health check that tells you the shape, type, and completeness of your data columns.

Think of it like...

Imagine info() as a doctor’s quick checkup report for your dataset, showing which parts are healthy (complete) and which parts need attention (missing or wrong types).

┌───────────────────────────────┐
│ DataFrame info() summary      │
├─────────────┬─────────────────┤
│ Column Name │ Data Type       │
├─────────────┼─────────────────┤
│ col1        │ int64           │
│ col2        │ float64         │
│ col3        │ object (text)   │
├─────────────┴─────────────────┤
│ Non-null counts per column    │
│ Total rows: 1000              │
│ col1: 1000 non-null           │
│ col2: 950 non-null            │
│ col3: 1000 non-null           │
└───────────────────────────────┘

Build-Up - 6 Steps

1

Foundation: What info() Shows by Default

Concept: Learn the basic output of info() including row count, column count, data types, and non-null counts.

When you call df.info() on a DataFrame, it prints the number of rows and columns, lists each column with its data type, and shows how many non-null values each column has. This helps you quickly see if any columns have missing data and what type of data each column holds.

Result

A summary printout showing total rows, columns, each column's data type, and non-null counts.

Understanding the default info() output is the first step to quickly assessing your dataset’s structure and completeness.
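
As a minimal sketch of the default output, the example below builds a small DataFrame and calls info() on it. The column names and values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "score" has one missing value.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "score": [9.5, np.nan, 7.0, 8.25],
        "name": ["Ana", "Ben", "Cleo", "Dev"],
    }
)

# info() prints the summary to standard output and returns None.
df.info()
```

In the printed summary, the "score" row reports 3 non-null values out of 4 entries, which is the signal that one value is missing.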

2

Foundation: Data Types and Null Counts Explained

3

Intermediate: Using info() with Memory Usage Details

4

Intermediate: info() with Verbose and Null Counts Options

5

Advanced: Interpreting info() for Mixed Data Types

6

Expert: info() Internals and Performance Considerations

Under the Hood

info() reads the index size and each column's data type from the DataFrame's stored metadata. A dtype is a property of the column itself, so no values need to be inspected for that part. Non-null counts are different: they are not stored anywhere, so when counts are shown, info() computes them at call time, which requires a pass over the column values. Memory usage is estimated from the size of the underlying arrays by default. When memory_usage='deep' is requested, info() additionally inspects the Python objects inside object columns to estimate memory usage more accurately.

Why designed this way?

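info() is meant to be a compact, low-cost summary that stays usable on large datasets. The two parts that can be expensive are limited by default: non-null counts are omitted once a frame exceeds the display.max_info_rows or display.max_info_columns options, and deep memory inspection runs only when you ask for it. A full profile of every value on every call would be too slow for routine checks on large data.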

┌───────────────────────────────┐
│ pandas DataFrame object       │
├───────────────┬───────────────┤
│ Metadata      │ Data Storage  │
│ - index size  │ - column data │
│ - dtypes      │ - values      │
├───────────────┴───────────────┤
│ info() reads index and dtypes │
│ Non-null counts: counted from │
│   the column values per call  │
│ If memory_usage='deep'        │
│   scans object columns deeply │
└───────────────────────────────┘
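
The following sketch illustrates that the non-null counts in the summary track the current values of the DataFrame. The helper info_text is a hypothetical convenience function, not part of pandas; it relies only on the buf parameter of info() and on DataFrame.count().

```python
import io

import numpy as np
import pandas as pd


def info_text(frame: pd.DataFrame) -> str:
    """Return the text that frame.info() would print (hypothetical helper)."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
before = info_text(df)

# Introduce a missing value, then ask for the summary again.
df.loc[0, "b"] = np.nan
after = info_text(df)

# count() is the public method that gives the same per-column non-null totals.
non_null = df.count()
print(non_null)
```

The second summary reports one fewer non-null value for column "b", with no refresh step in between.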

Myth Busters - 4 Common Misconceptions

Quick: Does info() show the exact number of missing values by default? Commit to yes or no.

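Common Belief: info() shows the exact count of missing values for each column by default.

Reality: info() reports non-null counts, not missing counts. The number of missing values in a column is the total row count minus its non-null count.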

Quick: Do you think info() inspects every data value to determine data types? Commit to yes or no.

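Common Belief: info() scans all data values to determine the data type of each column.

Reality: Each column already carries a single dtype, and info() simply reports it. Values are read only to count non-nulls and, with memory_usage='deep', to measure object columns.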

Quick: Does info() always show all columns regardless of DataFrame size? Commit to yes or no.

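Common Belief: info() always displays every column in the DataFrame no matter how many there are.

Reality: When a DataFrame has more columns than the display.max_info_columns option allows (100 by default), info() prints a truncated summary unless you pass verbose=True.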

Quick: Can info() detect mixed data types within a single column? Commit to yes or no.

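Common Belief: info() clearly shows if a column has mixed data types by listing all types.

Reality: info() reports exactly one dtype per column. A column that holds a mix of Python types appears simply as object, so the mix has to be checked separately.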
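
One possible way to look inside an object column is to count the Python type of each value. This is a sketch with a hypothetical column named "mixed", not a built-in feature of info().

```python
import pandas as pd

# Hypothetical column mixing integers, strings, and a float.
df = pd.DataFrame({"mixed": [1, "two", 3.0, "four"]})

# info() would report this column under a single dtype: object.
print(df["mixed"].dtype)

# Counting the Python type name of each value reveals the mix.
type_counts = df["mixed"].map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

The per-type counts make it easier to decide whether the column should be converted to a numeric type, a string type, or split up.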

Expert Zone

1

info() computes its summary at the moment it is called, so it reflects the current state of the object you call it on. What can mislead is calling it on a different object than you intended, for example the original DataFrame after an operation that returned a modified copy instead of changing it in place.

2

Memory usage estimation with memory_usage='deep' can be expensive on large object columns, so use it selectively when you need detailed memory profiling.

3

info() output format and parameters have evolved across pandas versions (for example, the older null_counts parameter was replaced by show_counts), so knowing your pandas version helps interpret info() results correctly.
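
The difference between the default and the deep memory estimate can be sketched as follows. The frame is hypothetical, and the text column is forced to object dtype so the effect is visible regardless of the default string dtype in your pandas version.

```python
import pandas as pd

# Hypothetical frame with one numeric column and one object column of strings.
df = pd.DataFrame(
    {
        "n": list(range(1000)),
        "text": pd.Series(["some fairly long string value"] * 1000, dtype=object),
    }
)

# Default estimate: for object columns only the references are counted.
df.info(memory_usage=True)

# Deep estimate: also measures the Python string objects themselves.
df.info(memory_usage="deep")

# The same figures are available per column from DataFrame.memory_usage().
shallow = df.memory_usage(deep=False)
deep = df.memory_usage(deep=True)
print(deep - shallow)
```

Only the object column changes between the two estimates; fixed-width numeric columns are already measured exactly by the default.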

When NOT to use

info() is not the right tool when you need missing-value counts as data, unique-value counts, or distribution summaries: it reports non-null counts only, and it prints its summary rather than returning it. Use df.isnull().sum(), df.nunique(), df.describe(), or value_counts() on a column for those tasks instead.

Production Patterns

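In data pipelines, info() can serve as a quick integrity snapshot before processing. Because it prints its summary and returns None, scripts typically capture the text through the buf parameter and write it to a log. The checks that actually alert a team about missing data or unexpected data types are better made against df.dtypes and df.isna().sum(), which return values a script can compare.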
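
A validation step along these lines might look like the sketch below. The function validate, the logger name, and the expected-dtype mapping are all hypothetical; the only pandas features assumed are the buf parameter of info(), the dtype attribute, and isna().

```python
import io
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("data_validation")  # hypothetical logger name


def validate(frame: pd.DataFrame, expected_dtypes: dict) -> list:
    """Log the info() summary and return a list of problems found."""
    buffer = io.StringIO()
    frame.info(buf=buffer)  # info() returns None, so capture the text via buf
    logger.info("DataFrame summary:\n%s", buffer.getvalue())

    problems = []
    for column, expected in expected_dtypes.items():
        if column not in frame.columns:
            problems.append(f"{column}: column is missing")
            continue
        actual = str(frame[column].dtype)
        if actual != expected:
            problems.append(f"{column}: expected {expected}, found {actual}")
        missing = int(frame[column].isna().sum())
        if missing:
            problems.append(f"{column}: {missing} missing value(s)")
    return problems


df = pd.DataFrame({"id": [1, 2, 3], "price": [9.99, np.nan, 4.5]})
problems = validate(df, {"id": "int64", "price": "float64", "sku": "int64"})
print(problems)
```

The logged summary is for people reading the log, while the returned list is what a pipeline would act on.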

Connections

Data Cleaning

info() output guides data cleaning by revealing missing values and data types.

Knowing how info() highlights nulls and types helps you decide which columns need cleaning or type conversion.

Database Schema Inspection

Both info() and database schema tools summarize data structure and types.

Understanding info() helps you grasp how databases describe tables, aiding smoother data integration.

System Health Monitoring

info() acts like a health check for data, similar to how system monitors check server status.

Seeing info() as a health check helps prioritize fixing data issues like missing values, just as sysadmins fix server alerts.

Common Pitfalls

#1 Assuming info() shows missing values directly.

Wrong approach: df.info()  # User reads non-null counts as missing counts

Correct approach:
    df.info()
    missing_counts = len(df) - df.count()
    print(missing_counts)

Root cause: Confusing non-null counts with missing counts leads to wrong assumptions about data completeness.
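
To make the arithmetic concrete, here is a small sketch with hypothetical data. It shows that subtracting the non-null counts from the row count gives the same result as isna().sum().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, np.nan], "b": ["x", "y", None, "z"]}
)

# Missing = total rows minus the non-null count that info() reports.
missing_from_counts = len(df) - df.count()

# isna().sum() gives the same numbers directly.
missing_direct = df.isna().sum()

print(missing_direct)
```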

#2 Expecting info() to show all columns when DataFrame is wide.

Wrong approach: df.info()  # User misses columns because output is truncated

Correct approach: df.info(verbose=True)  # Lists every column regardless of how many there are

Root cause: Not knowing about the verbose option hides the per-column details of wide DataFrames.
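
The truncation can be reproduced with a hypothetical 150-column frame, assuming the display.max_info_columns option is at its default of 100. The helper info_text is illustrative only, and show_counts assumes pandas 1.2 or later.

```python
import io

import pandas as pd


def info_text(frame: pd.DataFrame, **kwargs) -> str:
    """Return the text that frame.info(**kwargs) would print (hypothetical helper)."""
    buffer = io.StringIO()
    frame.info(buf=buffer, **kwargs)
    return buffer.getvalue()


# Hypothetical wide frame: 150 columns, above the default limit of 100.
wide = pd.DataFrame({f"c{i}": [1, 2, 3] for i in range(150)})

# Default call: only a short summary of the column range is printed.
summary = info_text(wide)

# verbose=True lists every column; show_counts=True adds the non-null counts.
full = info_text(wide, verbose=True, show_counts=True)

print(summary)
print(len(full.splitlines()))
```

Passing show_counts=True alongside verbose=True matters here, because the counts are otherwise left out for frames above the column limit.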

#3 Using info() to check detailed data quality like unique values or distributions.

Wrong approach: df.info()  # User expects detailed stats from info()

Correct approach:
    df.describe()
    df.value_counts()  # Use these for detailed statistics

Root cause: Misunderstanding info() as a full data profiling tool rather than a summary.
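
For comparison, the sketch below shows the kind of detail these other methods return. The data is hypothetical, and nunique() is included as one more option alongside the methods named above.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Oslo", "Pune"], "temp": [3.5, 19.0, 4.5, 28.0]}
)

# Distribution statistics for numeric columns.
numeric_stats = df.describe()

# Number of distinct values per column.
unique_counts = df.nunique()

# Frequency of each value in a single column.
city_counts = df["city"].value_counts()

print(numeric_stats)
print(unique_counts)
print(city_counts)
```

Unlike info(), each of these returns a pandas object, so the results can be stored, compared, or asserted on.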

Key Takeaways

info() provides a fast summary of a DataFrame’s shape, data types, and non-null counts to quickly assess data health.

Index size and data types come from stored metadata, while non-null counts are computed when info() is called and are omitted by default on very large DataFrames, which keeps the call cheap.

info() shows non-null counts, so you must subtract from total rows to find missing values.

Customizing info() with parameters like verbose and memory_usage helps tailor the summary to your needs.

Understanding info() output guides data cleaning, type conversion, and memory optimization in real-world data science.