
Sample() for random rows in Data Analysis Python - Time & Space Complexity

Time Complexity of Sample() for random rows: O(k)

Understanding Time Complexity

We want to understand how the time to pick random rows from data grows as the data gets bigger.

How does the sampling time change when the dataset size increases?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

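import pandas as pd

n = 1000  # total number of rows (example size)
k = 3     # number of rows to sample (example sample size)
data = pd.DataFrame({'A': range(n)})  # table with n rows
sample_rows = data.sample(n=k)        # k random rows, without replacement by default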

This code creates a table with n rows and picks k random rows from it. Building the table takes O(n) time on its own; the analysis here covers only the sample() call.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Selecting k random rows from n rows.
  • How many times: About k times, once for each row sampled.

How Execution Grows With Input

Picking k rows from n rows takes time mostly based on k, not n.

Input Size (n) | Approx. Operations
10             | About k operations (e.g., 3)
100            | About k operations (e.g., 3)
1000           | About k operations (e.g., 3)

Pattern observation: The time grows with k, the sample size, not with n, the total data size.
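To make the "about k operations" pattern concrete, here is a minimal sketch of one textbook way to pick k distinct positions out of n with exactly k random draws (Floyd's algorithm). It is an illustration only: the function name floyd_sample is hypothetical, and this is not claimed to be how pandas implements sample() internally.

```python
import random

def floyd_sample(n, k, rng=None):
    """Pick k distinct positions from range(n); return (positions, number_of_draws)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if rng is None:
        rng = random.Random()
    chosen = set()
    draws = 0
    # The loop runs exactly k times, regardless of how large n is.
    for j in range(n - k, n):
        t = rng.randint(0, j)  # randint is inclusive on both ends
        draws += 1
        # If t was already picked, take j instead; j cannot be in the set yet.
        chosen.add(j if t in chosen else t)
    return chosen, draws

for n in (10, 100, 1000):
    positions, draws = floyd_sample(n, k=3, rng=random.Random(0))
    print(n, draws, sorted(positions))
```

The draw count printed for each n equals k, which mirrors the table above: the work tracks the sample size, not the table size.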

Final Time Complexity

Time Complexity: O(k)

This means the time to pick random rows grows with how many rows you ask for, not with how large the whole dataset is.

Common Mistake

[X] Wrong: "Sampling random rows takes longer as the whole dataset gets bigger."

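[OK] Correct: An efficient sampling method touches only the rows it returns, so the time depends on the sample size, not the total data size. This is the idealized model; a particular library version may still do some work proportional to n internally, so measure when it matters.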
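A quick way to check the claim above on your own machine is to time sample() for a fixed k while n grows. The sketch below is one possible harness, not a rigorous benchmark; the helper name median_sample_time and the chosen sizes are assumptions made for illustration.

```python
import time
import pandas as pd

def median_sample_time(n, k, repeats=5):
    """Median wall-clock seconds for data.sample(n=k) on a table with n rows."""
    data = pd.DataFrame({'A': range(n)})  # built once, outside the timed region
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        data.sample(n=k)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]

# Same k, growing n: compare the printed times across rows.
for n in (1_000, 100_000, 1_000_000):
    print(n, median_sample_time(n, k=3))
```

Whether the printed times stay flat depends on the pandas and NumPy versions installed and on how positions are drawn internally, so treat the output as evidence to inspect rather than as a guaranteed result.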

Interview Connect

Knowing how sampling scales helps you explain efficient data handling in real projects, showing you understand practical data work.

Self-Check

"What if we change k to be a fraction of n (like 10% of n)? How would the time complexity change then?"
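To experiment with the self-check question, pandas lets you request a share of the rows with the frac parameter instead of a fixed count. The snippet below is a small starting point; the 10% figure comes from the question, and random_state=0 is an arbitrary seed added here only to make runs repeatable.

```python
import pandas as pd

n = 1000
data = pd.DataFrame({'A': range(n)})

# frac asks for a share of the rows instead of a fixed count,
# so the sample size k now grows together with n.
sample_rows = data.sample(frac=0.1, random_state=0)
print(len(sample_rows))
```

Try a few values of n and watch how the number of sampled rows changes, then work out what that implies for the complexity.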