0
0
PostgreSQLquery~5 mins

TABLESAMPLE for random sampling in PostgreSQL

Choose your learning style9 modes available
Introduction

TABLESAMPLE helps you get a small random part of a big table quickly. It saves time when you don't need all data.

You want to test queries on a small random set instead of the whole table.
You need a quick estimate of data without scanning everything.
You want to create a sample dataset for training a model.
You want to check data quality by looking at random rows.
You want to speed up reports by using a sample instead of full data.
Syntax
PostgreSQL
SELECT * FROM table_name TABLESAMPLE method (percentage);

method is usually SYSTEM or BERNOULLI.

percentage is how much of the table you want, like 10 for 10%.

Examples
Gets about 10% of rows using SYSTEM method, which is faster but less random.
PostgreSQL
SELECT * FROM employees TABLESAMPLE SYSTEM (10);
Gets about 5% of rows using BERNOULLI method, which is more random but slower.
PostgreSQL
SELECT * FROM sales TABLESAMPLE BERNOULLI (5);
Gets about 1% of customers with only id and name columns.
PostgreSQL
SELECT id, name FROM customers TABLESAMPLE SYSTEM (1);
Sample Program

This creates a small table of fruits, inserts 8 rows, then selects about 50% of them randomly using SYSTEM method.

PostgreSQL
CREATE TABLE fruits (id SERIAL PRIMARY KEY, name TEXT);
INSERT INTO fruits (name) VALUES ('Apple'), ('Banana'), ('Cherry'), ('Date'), ('Elderberry'), ('Fig'), ('Grape'), ('Honeydew');

SELECT * FROM fruits TABLESAMPLE SYSTEM (50);
OutputSuccess
Important Notes

TABLESAMPLE does not guarantee exact percentage, just approximate.

SYSTEM method samples pages, so rows in same page come together.

BERNOULLI method checks each row independently but is slower.

Summary

Use TABLESAMPLE to quickly get a random part of a table.

SYSTEM is faster but less random; BERNOULLI is slower but more random.

Good for testing, sampling, and speeding up queries on big tables.