Concept Flow - Bucketing for sampling
Start with large dataset
Define number of buckets
Apply hash function on key
Assign each record to a bucket
Select specific bucket(s) for sampling
Use sampled bucket data for analysis
End
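The dataset is split into a fixed number of buckets by applying a hash function to a key on each record, so records with the same key always land in the same bucket. Sampling is then done by reading only one or more selected buckets instead of scanning the full dataset.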
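The flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not any particular engine's implementation: the names `bucket_of` and `sample_by_bucket`, the `user_id` key, the choice of SHA-256 as the hash, and the bucket count of 10 are all assumptions made for the example.

```python
import hashlib

def bucket_of(key, num_buckets):
    """Map a key to a bucket index in [0, num_buckets) using a stable hash.

    SHA-256 is an assumed choice; any deterministic hash would do.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def sample_by_bucket(records, key_fn, num_buckets, selected):
    """Keep only the records whose key hashes into one of the selected buckets."""
    selected = set(selected)
    return [r for r in records if bucket_of(key_fn(r), num_buckets) in selected]

# Hypothetical dataset: 10,000 records keyed by user_id.
records = [{"user_id": i, "amount": i % 50} for i in range(10_000)]

# Define 10 buckets and sample bucket 3 only.
sample = sample_by_bucket(records, lambda r: r["user_id"], num_buckets=10, selected=[3])
print(len(sample))  # about one tenth of the records if the hash spreads keys evenly
```

Because the bucket depends only on the key, rerunning the sample returns the same records, and selecting k of N buckets yields roughly k/N of the data when keys are spread evenly across buckets.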