Data frame creation in R Programming - Time & Space Complexity
When we create a data frame in R, the time it takes depends on how much data we add. We want to understand how this time grows as we add more rows or columns.
How does the work needed change when the data frame gets bigger?
Analyze the time complexity of the following code snippet.
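# Create a data frame with n rows
create_df <- function(n) {
  data.frame(
    id = seq_len(n),   # 1, 2, ..., n (unlike 1:n, this is empty when n is 0)
    value = rnorm(n)   # n draws from a standard normal distribution
  )
}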
# Example call
my_df <- create_df(1000)
This code creates a data frame with two columns: one with numbers from 1 to n, and one with n random numbers.
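As a quick check of that description, the result can be inspected directly. This is a minimal sketch that repeats the function so it runs on its own; `seq_len(n)` is used in place of `1:n` as an assumption about what the author intends for `n = 0`.

```r
# Same function as above, repeated so this block is self-contained
create_df <- function(n) {
  data.frame(
    id = seq_len(n),
    value = rnorm(n)
  )
}

my_df <- create_df(1000)

dim(my_df)     # number of rows, then number of columns
names(my_df)   # column names
str(my_df)     # column types and a preview of the values
```

The row count tracks `n`, while the column count stays at two regardless of `n`.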
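Identify the loops, recursion, or array traversals that repeat. The code has no explicit loop; the repetition happens inside the vectorized calls that fill each column.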
- Primary operation: Generating n random numbers and creating vectors of length n.
- How many times: Each vector is filled once, one element per row, so each column takes about n steps.
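To make the hidden repetition visible, the same data frame can be built with an explicit loop. This is only an illustrative sketch: `create_df_loop` is a hypothetical name, and the vectorized `rnorm(n)` from the original code is the idiomatic (and much faster) way to do this in R.

```r
# Illustration only: an explicit loop that does the same work as the
# vectorized version, one row's worth of values per iteration.
create_df_loop <- function(n) {
  id <- integer(n)      # pre-allocate both columns
  value <- numeric(n)
  for (i in seq_len(n)) {   # the body runs exactly n times
    id[i] <- i
    value[i] <- rnorm(1)
  }
  data.frame(id = id, value = value)
}

loop_df <- create_df_loop(5)
nrow(loop_df)   # one row per loop iteration
```

The loop body runs once per row, which is the count the analysis above is tracking.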
As n grows, the time to create the data frame grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 random numbers generated and 10 ids created |
| 100 | About 100 random numbers generated and 100 ids created |
| 1000 | About 1000 random numbers generated and 1000 ids created |
Pattern observation: The work grows in step with the number of rows; ten times as many rows means about ten times as much work.
Time Complexity: O(n)
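This means the time to create the data frame grows linearly with the number of rows: doubling n roughly doubles the work. The number of columns is fixed at two here, so it only contributes a constant factor.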
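One way to check the linear trend empirically is to time the function at a few sizes with `system.time()`. The sizes below are arbitrary choices for illustration, and the measurements will vary between machines and runs; for small n, fixed overhead in `data.frame()` tends to dominate, so the trend is easier to see at larger sizes.

```r
# Same function as in the snippet being analyzed
create_df <- function(n) {
  data.frame(
    id = seq_len(n),
    value = rnorm(n)
  )
}

# Arbitrary sizes, each ten times larger than the previous one
sizes <- c(1e4, 1e5, 1e6)

# Elapsed wall-clock seconds for one call at each size
elapsed <- sapply(sizes, function(n) {
  system.time(create_df(n))[["elapsed"]]
})

timings <- data.frame(n = sizes, seconds = elapsed)
print(timings)
```

If the growth is linear, each tenfold increase in `n` should raise the time by roughly a factor of ten, allowing for measurement noise.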
[X] Wrong: "Creating a data frame takes the same time no matter how many rows it has."
[OK] Correct: More rows mean more data to generate and store, so it takes more time.
Understanding how data frame creation time grows helps you write efficient code when working with large datasets. This skill shows you can think about how your code behaves as data grows.
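The same reasoning extends to columns. The sketch below uses a hypothetical helper, `create_wide_df`, with an assumed parameter `k` for the number of random columns; neither appears in the original code. Each column costs about n steps to fill, so the total work is roughly proportional to n times k.

```r
# Hypothetical helper: n rows, one id column plus k random columns
create_wide_df <- function(n, k) {
  # Each of the k columns is a vector of n random numbers
  cols <- lapply(seq_len(k), function(j) rnorm(n))
  names(cols) <- paste0("value", seq_len(k))
  # Combine the id column and the random columns into one data frame
  as.data.frame(c(list(id = seq_len(n)), cols))
}

wide_df <- create_wide_df(100, 5)
dim(wide_df)   # rows, then columns (id plus the k random columns)
```

Here every column does a constant amount of work per row. A column whose calculation does more than that per row would raise the total accordingly.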
"What if we added more columns with complex calculations? How would the time complexity change?"