Network partitions and split-brain in RabbitMQ - Commands & Configuration

Introduction
Network partitions happen when parts of a RabbitMQ cluster lose connection with each other. This can cause split-brain, where two parts of the cluster think they are the main cluster, leading to data conflicts and message loss.
When your RabbitMQ cluster nodes are in different data centers and a network failure isolates them.
When you want to understand why your RabbitMQ cluster nodes are not syncing messages properly.
When you need to configure RabbitMQ to handle network failures safely without losing messages.
When troubleshooting unexpected message duplication or loss in a clustered RabbitMQ setup.
When planning RabbitMQ cluster deployment to avoid split-brain scenarios.
Config File - rabbitmq.conf
rabbitmq.conf
cluster_partition_handling = pause_minority

# This setting tells RabbitMQ to pause nodes that are in the minority during a network partition,
# preventing split-brain by stopping conflicting nodes from accepting messages.

# Other common options are 'ignore' (default, risky) and 'autoheal' (automatic recovery but can cause data loss).

The cluster_partition_handling setting controls how RabbitMQ behaves during network partitions.

Setting it to pause_minority pauses nodes that lose majority connectivity, avoiding split-brain.

This helps keep your cluster consistent and safe during network issues.
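To make the minority rule concrete, here is a minimal Python sketch of the decision each node makes under pause_minority. According to the RabbitMQ documentation, a node considers itself in a minority when it can reach half or fewer of the cluster's nodes, itself included. The function name should_pause and its parameters are hypothetical and exist only for illustration; they are not part of RabbitMQ.

```python
def should_pause(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if a node in this partition would pause under pause_minority.

    reachable_nodes counts the node itself plus every peer it can still reach.
    A partition holding exactly half of the nodes is treated as a minority too.
    """
    if not 0 < reachable_nodes <= total_nodes:
        raise ValueError("reachable_nodes must be between 1 and total_nodes")
    return reachable_nodes * 2 <= total_nodes


# Three-node cluster split 2 / 1: the pair keeps running, the single node pauses.
print(should_pause(2, 3))  # False
print(should_pause(1, 3))  # True
# Two-node cluster split 1 / 1: neither side has a majority, so both pause.
print(should_pause(1, 2))  # True
```

The last case suggests why pause_minority is a poor fit for two-node clusters: any partition pauses both nodes.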

Commands
Check the current status of the RabbitMQ cluster to see which nodes are connected and their health.
Terminal
rabbitmqctl cluster_status
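Expected Output
Cluster status of node rabbit@node1 ...
[{nodes,[{disc,[rabbit@node1,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node1,rabbit@node2,rabbit@node3]},
 {partitions,[]}]
This is the Erlang-term format printed by older releases; RabbitMQ 3.8 and later print a sectioned, human-readable report instead. An empty partitions list means no partition has been detected.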
Apply the 'pause_minority' setting by restarting each node, one at a time, after editing rabbitmq.conf. The value is read from the configuration file at node startup, and there is no dedicated rabbitmqctl subcommand for changing it. The command below assumes a systemd-managed installation; use your platform's service manager otherwise.
Terminal
sudo systemctl restart rabbitmq-server
Expected Output
No output on success.
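Check the cluster status again to confirm that all nodes are running and that the partitions list is empty. This output does not show the partition handling mode itself; to confirm the mode, inspect the node's effective configuration, for example with 'rabbitmqctl environment'.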
Terminal
rabbitmqctl cluster_status
Expected Output
Cluster status of node rabbit@node1 ...
[{nodes,[{disc,[rabbit@node1,rabbit@node2,rabbit@node3]}]},
 {running_nodes,[rabbit@node1,rabbit@node2,rabbit@node3]},
 {partitions,[]}]
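Reading this output by eye does not scale to scripted checks. The sketch below assumes a RabbitMQ release (3.8 or later) where 'rabbitmqctl cluster_status --formatter json' is available, and assumes the JSON document carries running_nodes and partitions keys; confirm both against the output of your own version. The helper names and the sample document are hypothetical.

```python
import json
import subprocess


def has_partition(status: dict) -> bool:
    """Return True if the parsed cluster status reports any partition."""
    # An empty list or empty mapping both mean "no partition detected".
    return bool(status.get("partitions"))


def fetch_cluster_status() -> dict:
    """Run rabbitmqctl against the local node and parse its JSON output."""
    result = subprocess.run(
        ["rabbitmqctl", "cluster_status", "--formatter", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hypothetical sample, trimmed to the two fields this sketch reads.
sample = json.loads(
    '{"running_nodes": ["rabbit@node1", "rabbit@node2", "rabbit@node3"],'
    ' "partitions": {}}'
)
print("partition detected" if has_partition(sample) else "no partitions reported")
```

On a cluster node, pass the result of fetch_cluster_status() to has_partition instead of the sample.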
Key Concept

If you remember nothing else from this pattern, remember: configuring RabbitMQ to pause minority nodes during network partitions prevents split-brain and keeps your cluster consistent.

Common Mistakes
Leaving cluster_partition_handling at the default 'ignore' setting.
This allows split-brain to happen, causing data conflicts and message loss during network partitions.
Set cluster_partition_handling to 'pause_minority' or 'autoheal' depending on your safety and availability needs.
Not checking cluster status after changing partition handling settings.
You might miss nodes that are down or partitions that already exist, leaving split-brain risks unnoticed.
Always run 'rabbitmqctl cluster_status' to verify cluster health after configuration changes.
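The choice between 'pause_minority' and 'autoheal' is easier to weigh once it is clear how autoheal recovers. The RabbitMQ documentation describes it as picking a winning partition (the one with the most client connections, then the one with the most nodes, with any remaining tie broken in an unspecified way) and restarting every node outside it. The Python sketch below mirrors that ordering; the Partition type and pick_autoheal_winner function are hypothetical names, and the tie-break shown is only one possible outcome.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    nodes: Tuple[str, ...]
    client_connections: int


def pick_autoheal_winner(partitions: List[Partition]) -> Partition:
    """Pick the surviving partition: most client connections, then most nodes."""
    if not partitions:
        raise ValueError("at least one partition is required")
    # RabbitMQ leaves any remaining tie unspecified; max() keeps the first
    # maximal candidate, which is just one possible outcome.
    return max(partitions, key=lambda p: (p.client_connections, len(p.nodes)))


larger = Partition(("rabbit@node1", "rabbit@node2"), client_connections=10)
smaller = Partition(("rabbit@node3",), client_connections=25)
winner = pick_autoheal_winner([larger, smaller])
print(winner.nodes)  # ('rabbit@node3',)
```

Here the single node wins because it holds more client connections, so the two-node side is restarted and its changes from the partition period are discarded. This is the data-loss risk that autoheal trades for automatic recovery.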
Summary
Use 'rabbitmqctl cluster_status' to check the health and connectivity of your RabbitMQ cluster nodes.
Set 'cluster_partition_handling = pause_minority' in rabbitmq.conf to prevent split-brain by pausing minority nodes during network partitions.
Verify changes by rechecking cluster status to ensure your cluster remains consistent and safe.