Bird
Raised Fist0
PostgreSQLquery~15 mins

Logical replication basics in PostgreSQL - Deep Dive

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - Logical replication basics
What is it?
Logical replication is a way to copy data changes from one PostgreSQL database to another in real time. It sends only the changes made to the data, not the entire database. This allows different databases to stay in sync while still being able to operate independently. It is useful for scaling, backups, and data integration.
Why it matters
Without logical replication, copying data between databases would require full backups or manual syncing, which is slow and error-prone. Logical replication solves this by efficiently sending only the changes, keeping data fresh and consistent across systems. This is crucial for businesses that need high availability, disaster recovery, or distributed data processing.
Where it fits
Before learning logical replication, you should understand basic PostgreSQL concepts like tables, transactions, and physical replication. After mastering logical replication, you can explore advanced topics like conflict handling, multi-master replication, and replication slots.
Mental Model
Core Idea
Logical replication streams only the data changes from one database to another, allowing them to stay synchronized without copying everything.
Think of it like...
It's like subscribing to a newsletter that sends you only the new articles instead of the entire magazine every time it publishes.
Publisher (Primary DB) ──changes──▶ Logical Replication Stream ──▶ Subscriber (Replica DB)

┌───────────────┐           ┌─────────────────────┐           ┌───────────────┐
│   Primary DB  │           │ Logical Replication  │           │  Replica DB   │
│  (Publisher)  │──────────▶│      Stream         │──────────▶│ (Subscriber)  │
└───────────────┘           └─────────────────────┘           └───────────────┘
Build-Up - 7 Steps
1
FoundationWhat is Logical Replication
🤔
Concept: Logical replication copies only data changes, not whole files.
In PostgreSQL, logical replication sends INSERT, UPDATE, and DELETE changes from one database to another. Unlike physical replication, it does not copy the entire database files but streams changes as logical operations.
Result
You get a replica database that reflects changes made on the primary without copying everything.
Understanding that logical replication works at the data change level helps you see why it is flexible and efficient.
2
FoundationKey Components: Publisher and Subscriber
🤔
Concept: Logical replication involves a publisher that sends changes and a subscriber that receives them.
The publisher is the source database that streams changes. The subscriber is the target database that applies those changes. You set up publications on the publisher and subscriptions on the subscriber to control what data is replicated.
Result
You can control which tables and changes are sent and received between databases.
Knowing these roles clarifies how data flows and how to configure replication.
3
IntermediateSetting Up a Publication
🤔Before reading on: do you think a publication can include all tables or only specific ones? Commit to your answer.
Concept: A publication defines which tables and changes to send from the publisher.
You create a publication on the primary database using SQL commands like CREATE PUBLICATION. You can specify all tables or select specific tables to include. This controls what data changes are sent to subscribers.
Result
The publisher knows exactly which data changes to stream to subscribers.
Understanding publications lets you tailor replication to your needs, avoiding unnecessary data transfer.
4
IntermediateCreating a Subscription
🤔Before reading on: do you think a subscription can replicate data from multiple publishers? Commit to your answer.
Concept: A subscription connects the subscriber to a publisher's publication to receive changes.
On the subscriber database, you create a subscription using CREATE SUBSCRIPTION, specifying the publisher's connection info and the publication name. The subscriber then starts receiving and applying changes automatically.
Result
The subscriber database stays updated with changes from the publisher.
Knowing how subscriptions work helps you set up and manage data flow from publishers.
5
IntermediateReplication Slots and Their Role
🤔
Concept: Replication slots keep track of what changes have been sent to subscribers to avoid data loss.
PostgreSQL uses replication slots to remember the position in the write-ahead log (WAL) for each subscriber. This ensures that changes are not removed before subscribers receive them, preventing data loss.
Result
Reliable delivery of changes without missing any updates.
Understanding replication slots is key to managing replication durability and resource usage.
6
AdvancedHandling Conflicts and Data Types
🤔Before reading on: do you think logical replication supports all data types and conflict scenarios automatically? Commit to your answer.
Concept: Logical replication has limitations with some data types and conflict handling requires care.
Certain data types like large objects or sequences need special handling. Also, if the subscriber modifies replicated tables, conflicts can occur. Logical replication assumes the subscriber is mostly read-only or carefully managed to avoid conflicts.
Result
You must plan replication carefully to avoid data inconsistencies.
Knowing these limits prevents surprises and data corruption in production.
7
ExpertPerformance and Internals of Logical Replication
🤔Before reading on: do you think logical replication impacts primary database performance significantly? Commit to your answer.
Concept: Logical replication streams changes asynchronously and uses WAL decoding internally.
PostgreSQL decodes changes from the WAL and sends them over a replication protocol. This process adds some overhead but is designed to minimize impact. Understanding this helps optimize replication settings and troubleshoot performance.
Result
Efficient replication with manageable performance costs.
Knowing the internal mechanics helps experts tune replication for high-load environments.
Under the Hood
Logical replication works by reading the write-ahead log (WAL) of the primary database and decoding the changes into logical operations like INSERT, UPDATE, DELETE. These operations are sent over a replication protocol to subscribers, which apply them to their tables. Replication slots track the progress to ensure no data is lost. The process is asynchronous, meaning changes are sent shortly after they happen but not instantly.
Why designed this way?
Logical replication was designed to provide flexible, table-level replication that physical replication could not offer. Physical replication copies entire data files and requires identical database setups, limiting use cases. Logical replication allows selective replication, supports different schemas, and enables more complex replication topologies. The design balances efficiency with flexibility by using WAL decoding and replication slots.
┌───────────────┐       WAL       ┌───────────────┐       Stream       ┌───────────────┐
│ Primary DB    │───────────────▶│ WAL Decoder   │───────────────▶│ Subscriber DB │
│ (Publisher)  │                │ (Logical Rep) │                │ (Subscriber)  │
└───────────────┘                └───────────────┘                └───────────────┘
       │
       │
       ▼
┌───────────────┐
│ Replication   │
│ Slot Tracker  │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does logical replication copy the entire database files? Commit yes or no.
Common Belief:Logical replication copies the whole database just like physical replication.
Tap to reveal reality
Reality:Logical replication only sends data changes (inserts, updates, deletes), not entire database files.
Why it matters:Believing this leads to expecting full copies and misunderstanding replication speed and flexibility.
Quick: Can logical replication replicate schema changes automatically? Commit yes or no.
Common Belief:Logical replication automatically replicates schema changes like adding columns or tables.
Tap to reveal reality
Reality:Logical replication does not replicate schema changes; these must be applied manually on subscribers.
Why it matters:Assuming automatic schema sync causes errors and data inconsistencies when schemas differ.
Quick: Can subscribers modify replicated tables freely without causing issues? Commit yes or no.
Common Belief:Subscribers can freely modify replicated tables without conflicts.
Tap to reveal reality
Reality:Modifying replicated tables on subscribers can cause conflicts and data inconsistencies unless carefully managed.
Why it matters:Ignoring this can lead to data corruption and replication failures.
Quick: Does logical replication support all PostgreSQL data types out of the box? Commit yes or no.
Common Belief:Logical replication supports all data types without exceptions.
Tap to reveal reality
Reality:Some data types like large objects or sequences require special handling or are not fully supported.
Why it matters:Not knowing this causes replication errors or missing data.
Expert Zone
1
Logical replication can replicate between different major PostgreSQL versions if compatible, enabling rolling upgrades.
2
Replication slots can cause disk bloat if subscribers lag or disconnect, requiring careful monitoring.
3
Publications can be altered dynamically to add or remove tables without restarting replication.
When NOT to use
Logical replication is not suitable when you need exact physical copies or very low-latency failover; physical replication or streaming replication is better. Also, for multi-master setups with frequent conflicts, consider specialized tools like BDR or third-party solutions.
Production Patterns
In production, logical replication is used for selective data distribution, real-time analytics replicas, zero-downtime upgrades, and integrating heterogeneous systems. It is common to combine logical replication with monitoring tools and scripts to handle schema changes and conflict detection.
Connections
Event-driven architecture
Logical replication streams data changes as events, similar to event-driven systems.
Understanding logical replication as event streaming helps grasp how data flows asynchronously and can trigger reactions in other systems.
Version control systems
Both track changes incrementally rather than copying entire states.
Seeing logical replication like version control clarifies why only changes are sent, saving time and space.
Supply chain logistics
Logical replication is like just-in-time delivery of goods, sending only what is needed when needed.
This connection highlights efficiency and timing in data synchronization.
Common Pitfalls
#1Trying to replicate schema changes automatically.
Wrong approach:ALTER TABLE replicated_table ADD COLUMN new_col INT; -- expecting subscriber to update automatically
Correct approach:Manually apply ALTER TABLE commands on both publisher and subscriber databases.
Root cause:Misunderstanding that logical replication does not handle schema changes.
#2Modifying replicated tables on subscriber without precautions.
Wrong approach:UPDATE replicated_table SET col = 'value' WHERE id = 1; -- on subscriber without conflict handling
Correct approach:Avoid modifying replicated tables on subscriber or use conflict management strategies.
Root cause:Assuming subscriber is independent and can freely change replicated data.
#3Ignoring replication slot disk usage growth.
Wrong approach:Not monitoring replication slots, leading to disk space exhaustion.
Correct approach:Regularly monitor and manage replication slots, drop unused slots to free space.
Root cause:Lack of awareness about replication slot behavior and resource impact.
Key Takeaways
Logical replication streams only data changes, not entire database files, making it efficient and flexible.
It uses publications and subscriptions to control what data is sent and received between databases.
Schema changes are not replicated automatically and must be managed manually to avoid errors.
Replication slots track progress and ensure no data loss but require monitoring to prevent resource issues.
Understanding the internal WAL decoding and asynchronous nature helps optimize and troubleshoot replication setups.

Practice

(1/5)
1. What is the main purpose of logical replication in PostgreSQL?
easy
A. To copy data changes from specific tables between databases
B. To create a full backup of the database
C. To optimize query performance
D. To encrypt data during transfer

Solution

  1. Step 1: Understand logical replication concept

    Logical replication copies only data changes from selected tables, not the entire database.
  2. Step 2: Compare options with concept

    Encrypting data during transfer, creating a full backup of the database, and optimizing query performance are unrelated to logical replication.
  3. Final Answer:

    To copy data changes from specific tables between databases -> Option A
  4. Quick Check:

    Logical replication = Copy data changes [OK]
Hint: Logical replication copies changes, not full backups [OK]
Common Mistakes:
  • Confusing logical replication with physical backup
  • Thinking it copies entire database
  • Assuming it improves query speed
2. Which SQL command is used to create a publication for logical replication?
easy
A. CREATE REPLICATION SLOT myslot LOGICAL;
B. CREATE SUBSCRIPTION mysub CONNECTION 'conninfo' PUBLICATION mypub;
C. CREATE PUBLICATION mypub FOR ALL TABLES;
D. CREATE DATABASE mydb WITH REPLICATION;

Solution

  1. Step 1: Identify command for publication creation

    The command to create a publication is CREATE PUBLICATION followed by publication name and tables.
  2. Step 2: Analyze options

    Only CREATE PUBLICATION mypub FOR ALL TABLES; creates a publication. CREATE SUBSCRIPTION mysub CONNECTION 'conninfo' PUBLICATION mypub; creates a subscription, CREATE REPLICATION SLOT myslot LOGICAL; creates a replication slot, and CREATE DATABASE mydb WITH REPLICATION; creates a database.
  3. Final Answer:

    CREATE PUBLICATION mypub FOR ALL TABLES; -> Option C
  4. Quick Check:

    Publication creation = CREATE PUBLICATION [OK]
Hint: Publication uses CREATE PUBLICATION, subscription uses CREATE SUBSCRIPTION [OK]
Common Mistakes:
  • Mixing publication and subscription commands
  • Using replication slot command for publication
  • Confusing database creation with replication setup
3. Given the following commands on the publisher:
CREATE PUBLICATION mypub FOR TABLE customers;
And on the subscriber:
CREATE SUBSCRIPTION mysub CONNECTION 'host=source dbname=mydb user=replicator password=secret' PUBLICATION mypub;
What will happen when a new row is inserted into the customers table on the publisher?
medium
A. The new row will be replicated to the subscriber's customers table
B. The new row will not be replicated because subscription is missing
C. The entire customers table will be copied again
D. An error will occur because publications cannot replicate inserts

Solution

  1. Step 1: Understand publication and subscription setup

    The publication includes the customers table, and the subscription connects to it, enabling replication of changes.
  2. Step 2: Analyze effect of insert on replication

    Inserts on the published table are sent to the subscriber, so the new row will appear there.
  3. Final Answer:

    The new row will be replicated to the subscriber's customers table -> Option A
  4. Quick Check:

    Insert on published table = replicated row [OK]
Hint: Inserts on published tables replicate if subscription exists [OK]
Common Mistakes:
  • Thinking subscription is missing when it is created
  • Assuming full table copy on each insert
  • Believing inserts are not replicated
4. You created a subscription but notice no data is replicating. Which of the following is a likely cause?
medium
A. Logical replication does not support inserts
B. The subscriber database is offline
C. You forgot to create a replication slot on the subscriber
D. The publication does not include the tables you want to replicate

Solution

  1. Step 1: Check publication includes tables

    If the publication does not include the desired tables, no changes will be sent to the subscriber.
  2. Step 2: Evaluate other options

    The subscriber database is offline is unlikely because subscription requires the subscriber to be online. Logical replication does not support inserts is false; inserts are supported. You forgot to create a replication slot on the subscriber is incorrect because replication slots are created on the publisher, not subscriber.
  3. Final Answer:

    The publication does not include the tables you want to replicate -> Option D
  4. Quick Check:

    Missing tables in publication = no replication [OK]
Hint: Ensure publication includes tables to replicate [OK]
Common Mistakes:
  • Confusing replication slot location
  • Assuming inserts are unsupported
  • Ignoring publication table list
5. You want to replicate only changes from the orders table but exclude the order_logs table, which is large and not needed on the subscriber. How should you set up the publication?
hard
A. CREATE PUBLICATION mypub FOR TABLE orders, order_logs;
B. CREATE PUBLICATION mypub FOR TABLE orders;
C. CREATE PUBLICATION mypub FOR ALL TABLES;
D. CREATE PUBLICATION mypub FOR TABLE order_logs;

Solution

  1. Step 1: Understand selective table replication

    To replicate only the orders table, the publication must include only that table.
  2. Step 2: Analyze options for correct table inclusion

    CREATE PUBLICATION mypub FOR ALL TABLES; replicates all tables, including order_logs which is unwanted. CREATE PUBLICATION mypub FOR TABLE orders, order_logs; includes both tables. CREATE PUBLICATION mypub FOR TABLE order_logs; includes only order_logs, which is unwanted.
  3. Final Answer:

    CREATE PUBLICATION mypub FOR TABLE orders; -> Option B
  4. Quick Check:

    Include only needed tables in publication [OK]
Hint: List only needed tables in publication to exclude others [OK]
Common Mistakes:
  • Using FOR ALL TABLES when exclusion is needed
  • Including unwanted tables in publication
  • Confusing publication and subscription roles