0
0
PostgreSQLquery~30 mins

TABLESAMPLE for random sampling in PostgreSQL - Mini Project: Build & Apply

Choose your learning style9 modes available
Using TABLESAMPLE for Random Sampling in PostgreSQL
📖 Scenario: You work for a bookstore that wants to analyze a small random sample of its sales data to understand customer preferences without scanning the entire large sales table.
🎯 Goal: Build a SQL query using TABLESAMPLE SYSTEM to randomly select approximately 10% of the rows from the sales table.
📋 What You'll Learn
Create a sales table with columns sale_id, book_title, and price.
Insert exactly 5 rows into the sales table with specified values.
Write a query using TABLESAMPLE SYSTEM (10) to randomly sample about 10% of the rows.
Add an ORDER BY sale_id clause to the sampling query.
💡 Why This Matters
🌍 Real World
Random sampling helps businesses analyze a manageable subset of large data quickly, saving time and resources.
💼 Career
Data analysts and database administrators often use TABLESAMPLE to perform quick exploratory data analysis on big datasets.
Progress0 / 4 steps
1
Create the sales table and insert data
Create a table called sales with columns sale_id (integer), book_title (text), and price (numeric). Then insert these exact rows: (1, 'The Hobbit', 15.99), (2, '1984', 12.50), (3, 'Dune', 18.00), (4, 'Pride and Prejudice', 9.99), (5, 'The Catcher in the Rye', 14.00).
PostgreSQL
Need a hint?

Use CREATE TABLE to define the table and INSERT INTO with multiple rows to add data.

2
Set the sampling percentage
Create a variable or comment to indicate the sampling percentage as 10 for the TABLESAMPLE SYSTEM clause.
PostgreSQL
Need a hint?

Use a comment or variable to note the sampling percentage 10 for clarity.

3
Write the sampling query using TABLESAMPLE SYSTEM (10)
Write a SQL query to select all columns from sales using TABLESAMPLE SYSTEM (10) to randomly sample about 10% of the rows.
PostgreSQL
Need a hint?

Use TABLESAMPLE SYSTEM (10) right after the table name in the FROM clause.

4
Add ORDER BY sale_id to the sampling query
Modify the sampling query to add ORDER BY sale_id at the end to sort the sampled rows by sale_id.
PostgreSQL
Need a hint?

Add ORDER BY sale_id after the sampling clause to sort the results.