Distributed systems often have bugs that are hard to reproduce. What is the main reason for this difficulty?
Think about how many parts work together and how timing affects their behavior.
Distributed systems run many components on different machines. Network delays and timing differences mean that some bugs appear only under particular orderings of events, so they surface intermittently and are hard to reproduce.
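A minimal sketch of why ordering matters, assuming a hypothetical scenario in which two nodes, A and B, each read a shared counter and then write back the value plus one. The names and the scenario are illustrative only. Enumerating every valid interleaving shows that only some orderings trigger the bug:

```python
from itertools import permutations

def run(schedule):
    """Execute one interleaving of two nodes incrementing a shared counter."""
    shared = 0
    local = {}  # the value each node last read
    for node, step in schedule:
        if step == "read":
            local[node] = shared
        else:  # "write"
            shared = local[node] + 1
    return shared

# Each node performs a read followed by a write.
steps = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

def valid(schedule):
    # A node must read before it writes.
    return all(
        schedule.index((n, "read")) < schedule.index((n, "write")) for n in "AB"
    )

schedules = [s for s in permutations(steps) if valid(s)]
results = [run(s) for s in schedules]
lost_updates = sum(1 for r in results if r != 2)
print(f"{lost_updates} of {len(schedules)} interleavings lose an update")
# prints "4 of 6 interleavings lose an update"
```

In a real system the scheduler and the network pick the interleaving, so the same test can pass on one run and fail on the next.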
To test how a distributed system behaves during failures, which approach is most effective?
Think about a method that purposely causes problems to see how the system reacts.
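Chaos engineering deliberately injects failures, such as killed processes or dropped network connections, to check that the system keeps working and recovers. This matters for distributed systems because partial failures are routine in them.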
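The idea can be sketched in a few lines, assuming a hypothetical `FlakyService` wrapper that fails a configurable fraction of calls and a hypothetical retrying client as the resilience mechanism under test. This is not the API of any real chaos tool, just an illustration of fault injection with a seeded random generator so that an experiment can be repeated:

```python
import random

class FlakyService:
    """Hypothetical service wrapper that fails a fraction of calls on purpose."""

    def __init__(self, failure_rate, seed=0):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so experiments are repeatable

    def call(self, payload):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return f"ok:{payload}"

def call_with_retry(service, payload, attempts=5):
    """Client-side resilience under test: retry on injected failures."""
    for _ in range(attempts):
        try:
            return service.call(payload)
        except ConnectionError:
            continue
    return None  # every attempt failed

def run_experiment(failure_rate, requests=1000, attempts=5):
    """Return the fraction of requests that eventually succeed."""
    service = FlakyService(failure_rate)
    successes = sum(
        call_with_retry(service, i, attempts) is not None for i in range(requests)
    )
    return successes / requests
```

Comparing `run_experiment(0.3, attempts=1)` with `run_experiment(0.3, attempts=5)` shows how much the retry policy improves the success rate under the same injected failure rate.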
When testing a distributed system with thousands of nodes, what is a major challenge?
Consider what happens when many machines and users interact simultaneously.
Testing at large scale needs many machines and realistic traffic, which is costly and complex to set up.
Adding detailed logs helps debug distributed systems but has a downside. What is it?
Think about how extra work for logging might impact system speed and resources.
While logs help find bugs, too much logging can slow the system and use a lot of disk space.
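As a rough illustration of the trade-off, the sketch below uses Python's standard `logging` module to compare how much output the same simulated workload produces at two log levels. The workload, message text, and the `log_volume` helper are assumptions made for the example:

```python
import io
import logging

def log_volume(level, requests=1000):
    """Return the number of characters a simulated workload logs at a level."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        for i in range(requests):
            # Verbose per-request detail, only emitted at DEBUG.
            logger.debug("request %d: payload=%s", i, "x" * 200)
            if i % 100 == 0:
                logger.warning("request %d was slow", i)
    finally:
        logger.removeHandler(handler)
    return len(stream.getvalue())

debug_volume = log_volume(logging.DEBUG)
warning_volume = log_volume(logging.WARNING)
print(f"DEBUG: {debug_volume} chars, WARNING: {warning_volume} chars")
```

Raising the level to WARNING drops the per-request lines entirely, which is why verbose logging is commonly enabled only while investigating a problem.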
Each of 5 microservices can fail in 3 different ways. To test all single failures independently and all pairs of failures together, how many test cases are needed?
Calculate single failures as 5 microservices × 3 failures each. For pairs, count all unique pairs of failures.
Single failures: 5 × 3 = 15. Pairs: choose 2 of the 15 distinct failures, C(15, 2) = 15 × 14 / 2 = 105. This count includes pairs of failure modes within the same microservice. Total = 15 + 105 = 120 tests.
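The count can be checked by enumerating the test cases directly. The service names and failure-mode names below are placeholders invented for the example:

```python
from itertools import combinations, product

services = [f"service-{i}" for i in range(1, 6)]   # 5 microservices
modes = ["crash", "timeout", "bad-response"]       # 3 hypothetical failure modes

failures = list(product(services, modes))          # 15 distinct failures
singles = [(f,) for f in failures]                 # one failure per test
pairs = list(combinations(failures, 2))            # unordered pairs of failures
test_cases = singles + pairs

# Pairs whose two failures hit different services.
cross_service_pairs = [p for p in pairs if p[0][0] != p[1][0]]

print(len(singles), len(pairs), len(test_cases))   # prints "15 105 120"
print(len(cross_service_pairs))                    # prints "90"
```

If two failure modes of the same microservice cannot occur together, only the 90 cross-service pairs apply and the total drops to 15 + 90 = 105.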