Challenge - 5 Problems

❓ Predict Output

intermediate

Memory usage of pandas DataFrame columns

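What does the following code print for the memory usage, in bytes, of each column in a pandas DataFrame? Assume a 64-bit platform where np.arange produces int64 and column C is stored with object dtype.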

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.arange(1000),
    'B': np.random.rand(1000),
    'C': ['text']*1000
})
mem_usage = df.memory_usage(deep=True)
print(mem_usage)

A.
Index    128
A       1000
B       8000
C      40000
dtype: int64

B.
Index    128
A       8000
B       8000
C      32000
dtype: int64

C.
Index    128
A       8000
B       8000
C      61000
dtype: int64

D.
Index    128
A       8000
B       1000
C      40000
dtype: int64
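
One way to check the arithmetic behind these options, as a minimal sketch: a fixed-width numeric column costs the row count times the item size, while an object column costs one pointer per row plus the size of each Python string. The dtypes are pinned explicitly here (an assumption added for reproducibility), and the per-string size is read from sys.getsizeof rather than hard-coded, because it differs between Python versions. The Index entry is left out of the check since it also varies by pandas version.

```python
import sys

import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({
    # dtype pinned: np.arange's default integer width is platform-dependent
    "A": np.arange(n, dtype="int64"),
    "B": np.random.rand(n),
    # dtype pinned to object: newer pandas may infer a dedicated string dtype
    "C": pd.Series(["text"] * n, dtype=object),
})

mem = df.memory_usage(deep=True)

# Numeric columns: 8 bytes per int64 or float64 value.
numeric_bytes = n * 8

# Object column: one pointer per row plus the Python string object itself.
pointer_bytes = np.dtype(object).itemsize
expected_c = n * (pointer_bytes + sys.getsizeof("text"))

print(mem)
print("expected A and B:", numeric_bytes)
print("expected C:", expected_c)
```

If the printed value for C does not match any option exactly, the per-string overhead of the running Python version is the likely reason.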

❓ Data Output

intermediate

Memory usage reduction by changing data types

Given a DataFrame with 1 million integers stored as int64, what is the column's memory usage in bytes before and after converting it to int8?

import pandas as pd
import numpy as np
df = pd.DataFrame({'numbers': np.arange(1000000)})
mem_before = df.memory_usage(deep=True).numbers

df['numbers'] = df['numbers'].astype('int8')
mem_after = df.memory_usage(deep=True).numbers
print(mem_before, mem_after)

A. 8000000 8000000

B. 8000000 1000000

C. 1000000 8000000

D. 1000000 1000000
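
The byte counts follow from the item size: int64 takes 8 bytes per value and int8 takes 1. The sketch below checks that, and also shows a side effect the question does not mention: int8 only holds -128 to 127, so casting values up to 999999 changes them. pd.to_numeric with downcast="integer" is shown as one safer alternative under that constraint; it is not part of the original exercise.

```python
import numpy as np
import pandas as pd

n = 1_000_000
# dtype pinned to int64 so the result does not depend on the platform default
s = pd.Series(np.arange(n, dtype="int64"), name="numbers")

as_int8 = s.astype("int8")

bytes_int64 = s.memory_usage(index=False)        # n * 8
bytes_int8 = as_int8.memory_usage(index=False)   # n * 1

# The cast is lossy here: most values no longer equal the originals.
round_trips = bool((as_int8 == s).all())

# Let pandas pick the smallest signed integer dtype that holds every value.
safe = pd.to_numeric(s, downcast="integer")

print(bytes_int64, bytes_int8)
print("int8 cast preserves values:", round_trips)
print("downcast dtype:", safe.dtype, "bytes:", safe.memory_usage(index=False))
```

In practice it is worth checking the column's min and max against the target dtype's range before calling astype.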

🔧 Debug

advanced

Identify the cause of high memory usage in a DataFrame

Why does this DataFrame use more memory than expected?

import pandas as pd
df = pd.DataFrame({
    'col': ['apple', 'banana', 'apple', 'banana'] * 250000
})
print(df.memory_usage(deep=True))

A. Because the DataFrame uses int64 for strings which is inefficient.

B. Because the DataFrame index is very large and uses most memory.

C. Because the DataFrame has many missing values increasing memory.

D. Because the strings are stored as object dtype without using categorical type, so each string is stored separately.
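
To see the effect in numbers, here is a minimal sketch that measures the same data stored two ways. It assumes the column has object dtype (pinned explicitly below, since newer pandas versions may infer a dedicated string dtype) and compares it with a categorical copy.

```python
import pandas as pd

values = ["apple", "banana", "apple", "banana"] * 250_000

# Assumption: object dtype, so every row holds a pointer to a Python string.
as_object = pd.Series(values, dtype=object, name="col")
as_category = as_object.astype("category")

object_bytes = as_object.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)

print("object  :", object_bytes, "bytes")
print("category:", category_bytes, "bytes")
print("codes dtype:", as_category.cat.codes.dtype)
print("categories :", list(as_category.cat.categories))
```

With only two distinct values, the categorical copy needs one small integer code per row plus the two strings stored once.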

🚀 Application

advanced

Visualizing memory usage of DataFrame columns

Which code snippet correctly creates a bar plot showing the deep memory usage (deep=True) of each column in a DataFrame df?

A.
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
mem.plot(kind='bar')
plt.show()

B.
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
plt.plot(mem)
plt.show()

C.
import matplotlib.pyplot as plt
mem = df.memory_usage()
plt.bar(mem.index, mem.values)
plt.show()

D.
import matplotlib.pyplot as plt
mem = df.memory_usage(deep=True)
plt.scatter(mem.index, mem.values)
plt.show()
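
The snippets above all assume an existing df. A self-contained sketch for trying the bar-plot approach is below; the example DataFrame, its column names, and the non-interactive Agg backend are assumptions added here so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run, so no window is opened

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data, not from the original exercise.
df = pd.DataFrame({
    "name": pd.Series(["apple", "banana", "cherry"] * 100, dtype=object),
    "n": range(300),
})

mem = df.memory_usage(deep=True)  # Series indexed by "Index", "name", "n"

ax = mem.plot(kind="bar")         # one bar per entry in the Series
ax.set_ylabel("bytes")
ax.set_title("Memory usage per column (deep=True)")

n_bars = len(ax.patches)
print("bars drawn:", n_bars)
plt.close("all")
```

Passing index=False to memory_usage drops the Index bar if only the columns are of interest.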

🧠 Conceptual

expert

Understanding memory usage impact of categorical data

Which statement best explains why converting a string column with many repeated values to categorical dtype reduces memory usage?

A. Categorical dtype stores the unique strings once and replaces column values with integer codes, reducing memory.

B. Categorical dtype compresses strings using zip compression internally, saving memory.

C. Categorical dtype converts strings to fixed-length byte arrays, which always use less memory.

D. Categorical dtype stores strings as pointers to the original strings, increasing memory usage.
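
The two parts of a categorical column can be inspected directly. The sketch below uses a small made-up Series (the values are illustrative only) and reads the lookup table and the per-row codes through the .cat accessor.

```python
import pandas as pd

# Illustrative values, not from the original exercise.
s = pd.Series(["red", "green", "red", "blue", "red"], dtype="category")

categories = list(s.cat.categories)  # each distinct string, stored once
codes = s.cat.codes.tolist()         # one small integer per row

# Looking each code up in the categories rebuilds the original values.
rebuilt = [categories[code] for code in codes]

print("categories:", categories)
print("codes     :", codes)
print("rebuilt   :", rebuilt)
```

The saving grows with the number of repeats: the codes array scales with the row count, while the categories table scales only with the number of distinct values.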
