0
0
PostgreSQLquery~15 mins

Logical replication basics in PostgreSQL - Deep Dive

Choose your learning style9 modes available
Overview - Logical replication basics
What is it?
Logical replication is a way to copy data changes from one PostgreSQL database to another in real time. It sends only the changes made to the data, not the entire database. This allows different databases to stay in sync while still being able to operate independently. It is useful for scaling, backups, and data integration.
Why it matters
Without logical replication, copying data between databases would require full backups or manual syncing, which is slow and error-prone. Logical replication solves this by efficiently sending only the changes, keeping data fresh and consistent across systems. This is crucial for businesses that need high availability, disaster recovery, or distributed data processing.
Where it fits
Before learning logical replication, you should understand basic PostgreSQL concepts like tables, transactions, and physical replication. After mastering logical replication, you can explore advanced topics like conflict handling, multi-master replication, and replication slots.
Mental Model
Core Idea
Logical replication streams only the data changes from one database to another, allowing them to stay synchronized without copying everything.
Think of it like...
It's like subscribing to a newsletter that sends you only the new articles instead of the entire magazine every time it publishes.
Publisher (Primary DB) ──changes──▶ Logical Replication Stream ──▶ Subscriber (Replica DB)

┌───────────────┐           ┌─────────────────────┐           ┌───────────────┐
│   Primary DB  │           │ Logical Replication  │           │  Replica DB   │
│  (Publisher)  │──────────▶│      Stream         │──────────▶│ (Subscriber)  │
└───────────────┘           └─────────────────────┘           └───────────────┘
Build-Up - 7 Steps
1
FoundationWhat is Logical Replication
🤔
Concept: Logical replication copies only data changes, not whole files.
In PostgreSQL, logical replication sends INSERT, UPDATE, and DELETE changes from one database to another. Unlike physical replication, it does not copy the entire database files but streams changes as logical operations.
Result
You get a replica database that reflects changes made on the primary without copying everything.
Understanding that logical replication works at the data change level helps you see why it is flexible and efficient.
2
FoundationKey Components: Publisher and Subscriber
🤔
Concept: Logical replication involves a publisher that sends changes and a subscriber that receives them.
The publisher is the source database that streams changes. The subscriber is the target database that applies those changes. You set up publications on the publisher and subscriptions on the subscriber to control what data is replicated.
Result
You can control which tables and changes are sent and received between databases.
Knowing these roles clarifies how data flows and how to configure replication.
3
IntermediateSetting Up a Publication
🤔Before reading on: do you think a publication can include all tables or only specific ones? Commit to your answer.
Concept: A publication defines which tables and changes to send from the publisher.
You create a publication on the primary database using SQL commands like CREATE PUBLICATION. You can specify all tables or select specific tables to include. This controls what data changes are sent to subscribers.
Result
The publisher knows exactly which data changes to stream to subscribers.
Understanding publications lets you tailor replication to your needs, avoiding unnecessary data transfer.
4
IntermediateCreating a Subscription
🤔Before reading on: do you think a subscription can replicate data from multiple publishers? Commit to your answer.
Concept: A subscription connects the subscriber to a publisher's publication to receive changes.
On the subscriber database, you create a subscription using CREATE SUBSCRIPTION, specifying the publisher's connection info and the publication name. The subscriber then starts receiving and applying changes automatically.
Result
The subscriber database stays updated with changes from the publisher.
Knowing how subscriptions work helps you set up and manage data flow from publishers.
5
IntermediateReplication Slots and Their Role
🤔
Concept: Replication slots keep track of what changes have been sent to subscribers to avoid data loss.
PostgreSQL uses replication slots to remember the position in the write-ahead log (WAL) for each subscriber. This ensures that changes are not removed before subscribers receive them, preventing data loss.
Result
Reliable delivery of changes without missing any updates.
Understanding replication slots is key to managing replication durability and resource usage.
6
AdvancedHandling Conflicts and Data Types
🤔Before reading on: do you think logical replication supports all data types and conflict scenarios automatically? Commit to your answer.
Concept: Logical replication has limitations with some data types and conflict handling requires care.
Certain data types like large objects or sequences need special handling. Also, if the subscriber modifies replicated tables, conflicts can occur. Logical replication assumes the subscriber is mostly read-only or carefully managed to avoid conflicts.
Result
You must plan replication carefully to avoid data inconsistencies.
Knowing these limits prevents surprises and data corruption in production.
7
ExpertPerformance and Internals of Logical Replication
🤔Before reading on: do you think logical replication impacts primary database performance significantly? Commit to your answer.
Concept: Logical replication streams changes asynchronously and uses WAL decoding internally.
PostgreSQL decodes changes from the WAL and sends them over a replication protocol. This process adds some overhead but is designed to minimize impact. Understanding this helps optimize replication settings and troubleshoot performance.
Result
Efficient replication with manageable performance costs.
Knowing the internal mechanics helps experts tune replication for high-load environments.
Under the Hood
Logical replication works by reading the write-ahead log (WAL) of the primary database and decoding the changes into logical operations like INSERT, UPDATE, DELETE. These operations are sent over a replication protocol to subscribers, which apply them to their tables. Replication slots track the progress to ensure no data is lost. The process is asynchronous, meaning changes are sent shortly after they happen but not instantly.
Why designed this way?
Logical replication was designed to provide flexible, table-level replication that physical replication could not offer. Physical replication copies entire data files and requires identical database setups, limiting use cases. Logical replication allows selective replication, supports different schemas, and enables more complex replication topologies. The design balances efficiency with flexibility by using WAL decoding and replication slots.
┌───────────────┐       WAL       ┌───────────────┐       Stream       ┌───────────────┐
│ Primary DB    │───────────────▶│ WAL Decoder   │───────────────▶│ Subscriber DB │
│ (Publisher)  │                │ (Logical Rep) │                │ (Subscriber)  │
└───────────────┘                └───────────────┘                └───────────────┘
       │
       │
       ▼
┌───────────────┐
│ Replication   │
│ Slot Tracker  │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does logical replication copy the entire database files? Commit yes or no.
Common Belief:Logical replication copies the whole database just like physical replication.
Tap to reveal reality
Reality:Logical replication only sends data changes (inserts, updates, deletes), not entire database files.
Why it matters:Believing this leads to expecting full copies and misunderstanding replication speed and flexibility.
Quick: Can logical replication replicate schema changes automatically? Commit yes or no.
Common Belief:Logical replication automatically replicates schema changes like adding columns or tables.
Tap to reveal reality
Reality:Logical replication does not replicate schema changes; these must be applied manually on subscribers.
Why it matters:Assuming automatic schema sync causes errors and data inconsistencies when schemas differ.
Quick: Can subscribers modify replicated tables freely without causing issues? Commit yes or no.
Common Belief:Subscribers can freely modify replicated tables without conflicts.
Tap to reveal reality
Reality:Modifying replicated tables on subscribers can cause conflicts and data inconsistencies unless carefully managed.
Why it matters:Ignoring this can lead to data corruption and replication failures.
Quick: Does logical replication support all PostgreSQL data types out of the box? Commit yes or no.
Common Belief:Logical replication supports all data types without exceptions.
Tap to reveal reality
Reality:Some data types like large objects or sequences require special handling or are not fully supported.
Why it matters:Not knowing this causes replication errors or missing data.
Expert Zone
1
Logical replication can replicate between different major PostgreSQL versions if compatible, enabling rolling upgrades.
2
Replication slots can cause disk bloat if subscribers lag or disconnect, requiring careful monitoring.
3
Publications can be altered dynamically to add or remove tables without restarting replication.
When NOT to use
Logical replication is not suitable when you need exact physical copies or very low-latency failover; physical replication or streaming replication is better. Also, for multi-master setups with frequent conflicts, consider specialized tools like BDR or third-party solutions.
Production Patterns
In production, logical replication is used for selective data distribution, real-time analytics replicas, zero-downtime upgrades, and integrating heterogeneous systems. It is common to combine logical replication with monitoring tools and scripts to handle schema changes and conflict detection.
Connections
Event-driven architecture
Logical replication streams data changes as events, similar to event-driven systems.
Understanding logical replication as event streaming helps grasp how data flows asynchronously and can trigger reactions in other systems.
Version control systems
Both track changes incrementally rather than copying entire states.
Seeing logical replication like version control clarifies why only changes are sent, saving time and space.
Supply chain logistics
Logical replication is like just-in-time delivery of goods, sending only what is needed when needed.
This connection highlights efficiency and timing in data synchronization.
Common Pitfalls
#1Trying to replicate schema changes automatically.
Wrong approach:ALTER TABLE replicated_table ADD COLUMN new_col INT; -- expecting subscriber to update automatically
Correct approach:Manually apply ALTER TABLE commands on both publisher and subscriber databases.
Root cause:Misunderstanding that logical replication does not handle schema changes.
#2Modifying replicated tables on subscriber without precautions.
Wrong approach:UPDATE replicated_table SET col = 'value' WHERE id = 1; -- on subscriber without conflict handling
Correct approach:Avoid modifying replicated tables on subscriber or use conflict management strategies.
Root cause:Assuming subscriber is independent and can freely change replicated data.
#3Ignoring replication slot disk usage growth.
Wrong approach:Not monitoring replication slots, leading to disk space exhaustion.
Correct approach:Regularly monitor and manage replication slots, drop unused slots to free space.
Root cause:Lack of awareness about replication slot behavior and resource impact.
Key Takeaways
Logical replication streams only data changes, not entire database files, making it efficient and flexible.
It uses publications and subscriptions to control what data is sent and received between databases.
Schema changes are not replicated automatically and must be managed manually to avoid errors.
Replication slots track progress and ensure no data loss but require monitoring to prevent resource issues.
Understanding the internal WAL decoding and asynchronous nature helps optimize and troubleshoot replication setups.