HDFS encryption at rest in Hadoop - Time & Space Complexity
We want to understand how the time to encrypt and decrypt data in HDFS grows as the amount of data increases.
How does encryption affect throughput when storing or reading large files?
Analyze the time complexity of the following Hadoop code snippet for HDFS encryption at rest.
```java
// Simplified pseudocode for HDFS encryption at rest.
// At-rest encryption is configured per directory through HDFS encryption
// zones backed by the Hadoop KMS. (The similarly named
// dfs.encrypt.data.transfer property encrypts data in transit, not at rest.)
Configuration conf = new Configuration();
conf.set("hadoop.security.key.provider.path", "kms://http@kms-host:9600/kms");
FileSystem fs = FileSystem.get(conf);

// Assume /encrypted is an encryption zone, so the client transparently
// encrypts every block before it reaches the DataNodes.
Path file = new Path("/encrypted/file.txt");
FSDataOutputStream out = fs.create(file);

byte[] data = new byte[blockSize];
for (int i = 0; i < totalBlocks; i++) {
    encryptor.encrypt(data, out); // one encryption pass per block
}
out.close();
```
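The per-block loop above can be sketched as runnable Java. This is a toy standalone sketch, not Hadoop's actual `CryptoOutputStream`; the class name, dummy key, and block sizes are made up for illustration. It encrypts each block exactly once with AES/CTR (the cipher family HDFS transparent encryption uses by default), so total work scales with the number of blocks.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayOutputStream;

public class BlockEncryptDemo {
    // Encrypts totalBlocks blocks of blockSize bytes each and returns the
    // total number of ciphertext bytes produced.
    static int encryptBlocks(int blockSize, int totalBlocks) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // dummy key
        IvParameterSpec iv = new IvParameterSpec(new byte[16]);     // dummy IV
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, iv);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] block = new byte[blockSize];
        for (int i = 0; i < totalBlocks; i++) {
            out.write(cipher.update(block)); // exactly one pass per block
        }
        return out.size();
    }

    public static void main(String[] args) throws Exception {
        // Doubling the block count doubles the work: O(n) in the data size.
        System.out.println("100 blocks -> " + encryptBlocks(4096, 100) + " bytes");
        System.out.println("200 blocks -> " + encryptBlocks(4096, 200) + " bytes");
    }
}
```

Running it shows the linear pattern directly: twice as many blocks means twice as many encryption passes and twice as many ciphertext bytes.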
This code writes data blocks to HDFS, encrypting each block before writing it to disk.
Look at what repeats as data size grows.
- Primary operation: Encrypting each data block before writing.
- How many times: Once per block, so totalBlocks times.
As the file size grows, the number of blocks grows proportionally, so encryption work grows too.
| Input Size (n blocks) | Approx. Operations |
|---|---|
| 10 | 10 encryptions |
| 100 | 100 encryptions |
| 1000 | 1000 encryptions |
Pattern observation: Doubling the data doubles the encryption work, so time grows linearly.
Time Complexity: O(n)
This means the time to encrypt data grows directly with the amount of data stored.
[X] Wrong: "Encryption time stays the same no matter how much data we store."
[OK] Correct: Each block must be encrypted separately, so more data means more encryption work and more time.
Understanding how encryption affects data processing time helps you explain trade-offs in secure data storage systems.
"What if we changed the block size to be larger? How would the time complexity change?"