Overview

Documentation for getting started with the HA cluster stack on Debian Jessie and beyond, using Pacemaker/Corosync 2.X with the CRM shell.

About this guide

In this guide we will be setting up a simple two-node cluster running an Nginx server with a shared IP.

You should already have two machines ready, running freshly installed copies of Debian 8.

Some things to note before we begin:

Host setup

It's assumed that we have two nodes running Debian Jessie and that both hosts are connected to the same network and can reach each other.

The standard way of building a cluster relies on multicast addressing – make sure this is feasible on your network and does not interfere with other services. (Alternatively, you can switch to unicast; an example configuration is given in Editing corosync.conf below.)
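One way to check multicast connectivity beforehand is the omping tool (packaged for Debian; an optional extra install, not part of the cluster stack itself). Run it on both nodes at the same time, listing the addresses of all participating hosts:

```
# apt-get install omping
# omping 192.168.122.201 192.168.122.202
```

If multicast works, each node reports received multicast responses from the other.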

Installation

Node configuration

Hostnames / IPs:

node01 : 192.168.122.201
node02 : 192.168.122.202

Throughout this guide we assume that the hosts running our nodes have the hostnames node01 and node02 and have been assigned static IPs on the local network. Configure your machines accordingly or substitute your own hostnames and IP addresses.
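If you are not running DNS for these names, a minimal way to make the hostnames resolvable is to add them to /etc/hosts on both machines (a sketch matching the addresses above):

```
192.168.122.201    node01
192.168.122.202    node02
```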

Installing the Pacemaker/Corosync 2.X HA cluster stack

Since Debian Jessie doesn't contain the packages for the new stack, we will use jessie-backports.

cat > /etc/apt/sources.list.d/jessie-backports.list << "EOF"
deb http://http.debian.net/debian jessie-backports main
EOF

Then update the package list and install the packages:

# apt-get update
# apt-get install -t jessie-backports pacemaker crmsh

This will install all necessary dependencies including corosync and fence-agents. We're also installing the CRM shell to manage our cluster.

Next we install the Nginx server on both nodes:

# apt-get install nginx

We also need to disable its automatic startup:

# systemctl disable nginx

In general, resources managed by your cluster should never be (re)started by anyone or anything other than Pacemaker.

As a precaution we might also want to prevent Pacemaker from starting automatically on boot, e.g. in case something goes wrong while setting up the fencing agents later on:

# systemctl disable pacemaker

However, keep in mind that you will then have to run service pacemaker start manually whenever you reboot (one of) your nodes.
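Once the fencing agents are configured and tested later on, you can hand startup back to systemd so the stack comes up automatically after a reboot:

```
# systemctl enable pacemaker
```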

Configuration

Before we can start our cluster we have some configuring to do. This includes changing our corosync.conf file and generating an authkey for secure communication. After that is done we will have to copy these files to all our nodes.

Editing corosync.conf

Open the file /etc/corosync/corosync.conf in your favorite text editor on the first node.

We are making the following changes to the file: setting bindnetaddr to the address of our network and enabling secure communication by setting crypto_cipher and crypto_hash.

For a two-node setup we also need to add two_node: 1 to the quorum block.

Your new corosync.conf should now look something like this:

# Please read the corosync.conf.5 manual page
# Debian-HA ClustersFromScratch sample config
totem {
        version: 2
        
        cluster_name: debian

        token: 3000
        token_retransmits_before_loss_const: 10

        clear_node_high_bit: yes

        crypto_cipher: aes256   # default was 'none'
        crypto_hash: sha1       # default was 'none'

        interface {
                ringnumber: 0

                # set address of the network here; default was '127.0.0.1'
                bindnetaddr: 192.168.122.0

                mcastaddr: 239.255.1.1
                mcastport: 5405

                ttl: 1
        }
}

logging {
        fileline: off

        to_stderr: no
        to_logfile: no
        to_syslog: yes

        syslog_facility: daemon
        debug: off
        
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1             # value added
        expected_votes: 2
}

N.B.: If your network is routed you might have to raise the TTL value to get a reliable connection.

Refer to the following manpages for details on the configuration options:

man corosync.conf
man votequorum

If you want a unicast setup you will have to use transport: udpu and specify a nodelist with the static IPs of the nodes. Your config would look something like the following:

# Please read the corosync.conf.5 manual page
# This is a *partial* config file to show a unicast setup.
totem {
        version: 2
        transport: udpu         # set to 'udpu' (udp unicast)
        
        [...]

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.122.0
                ttl: 1
        }
}

logging { [...] }

quorum {
        provider: corosync_votequorum
        two_node: 1
}

# Here we have to list the network addresses of all nodes manually
nodelist {
        node {
                ring0_addr: 192.168.122.201
        }
        node {
                ring0_addr: 192.168.122.202
        }
}

We are going to transfer the configuration file to the other node host(s) soon, but first we need an authkey because we enabled the 'crypto' settings.

Generating authkey

Run the following on node01:

root@node01:~# corosync-keygen

Warning: The program will most likely say something like 'Press keys on your keyboard to generate entropy.' However, if you type into the corosync-keygen terminal, all your input will be passed on to bash when the program finishes. To avoid this, open another terminal and type away there; corosync-keygen will periodically print a status message.

Once enough entropy has been gathered, it will exit with:

Writing corosync key to /etc/corosync/authkey.

This key is needed for secure communication between the nodes.

Copy to other nodes

Now that we have our files ready on node01 we want to copy them to node02:

root@node01:~# scp /etc/corosync/corosync.conf root@node02:/etc/corosync/corosync.conf
root@node01:~# scp /etc/corosync/authkey root@node02:/etc/corosync/authkey
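To make sure the files arrived intact, you can compare checksums on both hosts; the output must match:

```
root@node01:~# sha256sum /etc/corosync/corosync.conf /etc/corosync/authkey
root@node02:~# sha256sum /etc/corosync/corosync.conf /etc/corosync/authkey
```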

Starting the cluster

Now that our cluster is configured, we're ready to start Corosync & Pacemaker:

service corosync start
service pacemaker start

Check the status of the cluster:

crm status

(Alternatively, use crm_mon --one-shot -V for this)

Expected output:

Last updated: Tue Mar 29 10:02:59 2016
Last change: Tue Mar 29 10:02:50 2016 by root via crm_attribute on node02
Stack: corosync
Current DC: node02 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured

Online: [ node01 node02 ]

Now is a good time to familiarize yourself with the crm shell. Call crm to open the shell and try running status from there:

root@node01:~# crm
crm(live)# status
... [Output omitted] ...
crm(live)# node
crm(live)node# show node01
node01(1084783305): normal
        standby=off
crm(live)node# up
crm(live)# bye

Using the shell to issue such commands has the added advantage of tab completion and the help command should you ever get lost. Also note how the shell prompt changes depending on the level you're in (node, configure, resource, ...)

Adding Resources

Now that our cluster is up and running we can add our resources.

They will have the following names: IP-nginx for the shared IP and Nginx-rsc for the Nginx server.

Nginx and the shared IP

Open up crm configure:

root@node01:~# crm configure
crm(live)configure# 

Issue the following commands:

crm(live)configure#
        property stonith-enabled=no
        property no-quorum-policy=ignore
        property default-resource-stickiness=100

We're not enabling stonith until we set up the fencing agents later on. The no-quorum-policy=ignore is important so the cluster will keep running even with only one node up.

Now to add the actual resources for Nginx:

# replace the IP & network interface with your settings! #
        primitive IP-nginx ocf:heartbeat:IPaddr2 \
                params ip="192.168.122.200" nic="eth1" cidr_netmask="24" \
                meta migration-threshold=2 \
                op monitor interval=20 timeout=60 on-fail=restart
        primitive Nginx-rsc ocf:heartbeat:nginx \
                meta migration-threshold=2 \
                op monitor interval=20 timeout=60 on-fail=restart
        colocation lb-loc inf: IP-nginx Nginx-rsc
        order lb-ord inf: IP-nginx Nginx-rsc
        commit

Here we're setting up the actual resources. See the IPaddr2 RA and Nginx RA for details about the parameters. The colocation and order constraints ensure that the IP and the Nginx server always run on the same node and that the IP is brought up before Nginx is started.

Note: Make sure to assign an unused IP to IP-nginx above, not one of the nodes' IPs!

Your crm status should now look like this:

Last updated: Tue Apr 12 14:53:26 2016          Last change: Tue Apr 12 14:52:45 2016 by root via cibadmin on node02
Stack: corosync
Current DC: node02 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ node01 node02 ]

Full list of resources:

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node01
 Nginx-rsc      (ocf::heartbeat:nginx): Started node01
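At this point you can verify from another machine on the network that the shared IP actually serves the Nginx default page (assuming the floating address 192.168.122.200 configured above, and that curl is installed on the client):

```
$ ping -c 3 192.168.122.200
$ curl -I http://192.168.122.200/
```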

Fencing agents

Fencing is used to put a node into a known state. If there is a problem in our cluster and one node is behaving badly or not responding, we put that node into a state we know is safe, e.g. shut it down or cut it off from the network. ( Fence-Clusterlabs )

To test our fencing setup before committing anything to the live cluster we will use a shadow CIB.

crm(live)# cib new fencing
INFO: cib.new: fencing shadow CIB created
crm(fencing)#

To see all the stonith devices/agents that you have, you can run:

# stonith_admin -I

This lists all the stonith agents available for fencing.

The fence-agents package comes with many fencing agents:

root@node01:~# cd /usr/sbin
root@node01:/usr/sbin# ls fence_*
fence_ack_manual   fence_eps           fence_intelmodular  fence_rsb
fence_alom         fence_hds_cb        fence_ipdu          fence_sanbox2
... and many more ...
root@node01:/usr/sbin# ls fence_* | wc -l
61

Refer to the corresponding manpages for any parameters you need to pass to these. In this example we'll use the fence_virsh agent, which assumes the host is accessible via SSH at 192.168.122.1 and has root access allowed.
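Before wiring an agent into the cluster you can run it by hand to check connectivity and credentials. A sketch for fence_virsh, using the same parameters as in the cluster configuration (the VM name VMnode02 is whatever virsh list shows on your VM host):

```
root@node01:~# fence_virsh -a 192.168.122.1 -l root -p root -n VMnode02 -o status
```

If this reports the VM's power status, the agent is working; only then add it to the cluster.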

crm(fencing)# configure

property stonith-enabled=yes

primitive fence_node01 stonith:fence_virsh \
        params ipaddr=192.168.122.1 port=VMnode01 action=off login=root passwd=root \
        op monitor interval=60s
primitive fence_node02 stonith:fence_virsh \
        params ipaddr=192.168.122.1 port=VMnode02 action=off login=root passwd=root delay=15 \
        op monitor interval=60s
location l_fence_node01 fence_node01 -inf: node01
location l_fence_node02 fence_node02 -inf: node02
commit
up

We turned stonith-enabled off earlier, but now that we're configuring stonith agents we can enable it again.

The port parameter of the fence_virsh-agent expects the name of the VM within virsh.

The delay is set for the second fencing agent so that one node will issue the stonith command faster than the other in case of a lost connection. That way we can be sure only one of the nodes goes down if there is a connectivity issue.

The location constraints are there to ensure the nodes never run their own fencing-agent.

Instead of the passwd=... parameter you could supply the identity_file=... parameter with an SSH keyfile. If this key is protected by a passphrase you still need to supply it via the passwd parameter.

Now we can simulate the status of our cluster with the new configuration:

crm(fencing)# cib cibstatus simulate

Current cluster status:
Online: [ node01 node02 ]

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node01
 Nginx-rsc      (ocf::heartbeat:nginx): Started node01
 fence_node01   (stonith:fence_virsh):  Stopped
 fence_node02   (stonith:fence_virsh):  Stopped

Transition Summary:
 * Start   fence_node01 (node02)
 * Start   fence_node02 (node01)

Executing cluster transition:
 * Resource action: fence_node01    monitor on node02
 * Resource action: fence_node01    monitor on node01
 * Resource action: fence_node02    monitor on node02
 * Resource action: fence_node02    monitor on node01
 * Resource action: fence_node01    start on node02
 * Resource action: fence_node02    start on node01
 * Resource action: fence_node01    monitor=60000 on node02
 * Resource action: fence_node02    monitor=60000 on node01

Revised cluster status:
Online: [ node01 node02 ]

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node01
 Nginx-rsc      (ocf::heartbeat:nginx): Started node01
 fence_node01   (stonith:fence_virsh):  Started node02
 fence_node02   (stonith:fence_virsh):  Started node01

If everything looks correct we commit the changes and switch back to the live cib:

crm(fencing)# cib commit
crm(fencing)# cib use live
crm(live)# 

And check the status again:

...
Online: [ node01 node02 ]

Full list of resources:

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node01
 Nginx-rsc      (ocf::heartbeat:nginx): Started node01
 fence_node01   (stonith:fence_virsh):  Started node02
 fence_node02   (stonith:fence_virsh):  Started node01

Testing the cluster

Now that your cluster is up and running, let's break some things!

Migrating/stopping resources

The first thing we'll look at is how to migrate resources.

crm(live)# resource
crm(live)resource# status IP-nginx
resource IP-nginx is running on: node01 
crm(live)resource# migrate IP-nginx
crm(live)resource# status IP-nginx
resource IP-nginx is running on: node02 
crm(live)resource# constraints IP-nginx
* IP-nginx
  : Node node01                                                                (score=-INFINITY, id=cli-ban-IP-nginx-on-node01)
    Nginx-rsc                                                                    (score=INFINITY, id=lb-loc)

As you can see, after calling resource migrate IP-nginx our IP resource was forced over to the other node (and if you check the status, you'll see that the Nginx server moved with it). To achieve this, crm automatically created a location constraint with a score of -inf so that the resource will never run on node01.

If you check crm configure show you will see the new entry there too:

...
location cli-ban-IP-nginx-on-node01 IP-nginx role=Started -inf: node01
...

If you were to migrate this resource again, it wouldn't be able to run on any of the nodes and would shut down.

Instead we are going to undo the resource migration:

crm(live)# resource unmigrate IP-nginx

This will delete the resource constraint that was created by resource migrate.

If for some reason you still have constraints in your config that you need to get rid of:

crm(live)configure# delete cli-ban-IP-nginx-on-node01


Next we want to stop and restart our resources. Let's check the status once more:

crm(live)# status noheaders
Online: [ node01 node02 ]

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node01
 Nginx-rsc      (ocf::heartbeat:nginx): Started node01
 fence_node01   (stonith:fence_virsh):  Started node02
 fence_node02   (stonith:fence_virsh):  Started node01

As you can see our Nginx resources are running on node01.

We're going to stop the IP and by extension (because of the order constraint) our Nginx-server:

crm(live)# resource stop IP-nginx
crm(live)# resource show
 IP-nginx       (ocf::heartbeat:IPaddr2):       (target-role:Stopped) Stopped
 Nginx-rsc      (ocf::heartbeat:nginx): Stopped
 fence_node01   (stonith:fence_virsh):  Started
 fence_node02   (stonith:fence_virsh):  Started

Note that it did not migrate or restart because we specifically told it to stop running. Even if some rogue process (or admin) were to start an instance of nginx, it would be stopped by pacemaker.

And to start it up again:

crm(live)# resource start IP-nginx
crm(live)# resource show
 IP-nginx       (ocf::heartbeat:IPaddr2):       Started
 Nginx-rsc      (ocf::heartbeat:nginx): Started
 fence_node01   (stonith:fence_virsh):  Started
 fence_node02   (stonith:fence_virsh):  Started

Simulate Nginx server failure

Next we will forcefully kill the nginx server to simulate a crash.

killall -9 nginx

Make sure to do this on the node where Nginx is currently running!

Now check for the nginx process:

pgrep -a nginx

After a while you should notice that the resource agent has restarted the server. In doing so it also incremented the failcount for the resource. We can check this via crm status or crm_mon:

# crm_mon -rfn1
-or-
# crm status inactive failcounts bynode

...
Migration Summary:
* Node node02:
* Node node01:
   Nginx-rsc: migration-threshold=2 fail-count=1 last-failure='Tue Mar 29 15:32:48 2016'

Failed Actions:
* Nginx-rsc_monitor_20000 on node01 'not running' (7): call=36, status=complete, exitreason='none',
    last-rc-change='Tue Mar 29 15:32:48 2016', queued=0ms, exec=0ms

(See crm help status and man crm_mon on how to use the parameters.)

When we configured our IP-nginx and Nginx resources we set migration-threshold=2. This means that once the failcount reaches that limit the resource will be migrated by pacemaker.

Let's test this:

root@node01:~# crm status bynode noheaders
Node node01: online
        IP-nginx        (ocf::heartbeat:IPaddr2):       Started
        Nginx-rsc       (ocf::heartbeat:nginx): Started
        fence_node02    (stonith:fence_virsh):  Started
Node node02: online
        fence_node01    (stonith:fence_virsh):  Started

Failed Actions:
* Nginx-rsc_monitor_20000 on node01 'not running' (7): call=36, status=complete, exitreason='none',
    last-rc-change='Tue Mar 29 15:32:48 2016', queued=0ms, exec=0ms


root@node01:~# killall -9 nginx

root@node01:~# crm status bynode failcounts noheaders
Node node01: online
        fence_node02    (stonith:fence_virsh):  Started
Node node02: online
        fence_node01    (stonith:fence_virsh):  Started
        IP-nginx        (ocf::heartbeat:IPaddr2):       Started
        Nginx-rsc       (ocf::heartbeat:nginx): Started

Migration Summary:
* Node node02:
* Node node01:
   Nginx-rsc: migration-threshold=2 fail-count=2 last-failure='Tue Mar 29 15:49:13 2016'

Failed Actions:
* Nginx-rsc_monitor_20000 on node01 'not running' (7): call=57, status=complete, exitreason='none',
    last-rc-change='Tue Mar 29 15:49:13 2016', queued=0ms, exec=0ms

Note that because of the colocation rule the IP-nginx resource was migrated as well, although we only made the Nginx-rsc fail.

And once again we want to undo the damage we have done:

root@node01:~# crm
crm(live)# resource
crm(live)resource# failcount Nginx-rsc show node01
scope=status  name=fail-count-Nginx-rsc value=2
crm(live)resource# cleanup Nginx-rsc
Cleaning up Nginx-rsc on node01, removing fail-count-Nginx-rsc
Cleaning up Nginx-rsc on node02, removing fail-count-Nginx-rsc
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# failcount Nginx-rsc show node01
scope=status  name=fail-count-Nginx-rsc value=0

As you can see calling crm resource cleanup Nginx-rsc resets the failcount.

Simulate node failure

Here we'll simulate a power-cut.

root@node01:~# crm status noheaders
Online: [ node01 node02 ]

 IP-nginx       (ocf::heartbeat:IPaddr2):       Started node02
 Nginx-rsc      (ocf::heartbeat:nginx): Started node02
 fence_node01   (stonith:fence_virsh):  Started node02
 fence_node02   (stonith:fence_virsh):  Started node01

The resources are running on node02, so this is the node we'll power off.

Since I'm working in a virtual environment, I'll destroy the machine from the VM host:

root@vmhost:~# virsh destroy node02

Resource status transitions:

root@node01:~# crm status noheaders inactive bynode
Node node01: online
        fence_node02    (stonith:fence_virsh):  Started
Node node02: UNCLEAN (offline)
        fence_node01    (stonith:fence_virsh):  Started
        IP-nginx        (ocf::heartbeat:IPaddr2):       Started
        Nginx-rsc       (ocf::heartbeat:nginx): Started

Inactive resources:

root@node01:~# crm status noheaders inactive bynode
Node node01: online
        fence_node02    (stonith:fence_virsh):  Started
        IP-nginx        (ocf::heartbeat:IPaddr2):       Started
        Nginx-rsc       (ocf::heartbeat:nginx): Started
Node node02: OFFLINE

Inactive resources:

 fence_node01   (stonith:fence_virsh):  Stopped

For a moment the second node was in an unknown state, presumably because stonith still had to be triggered. After the node was definitely offline, the first node started up the Nginx server and the IP.

If you start (e.g. virsh start node02) the second machine again the status will be listed as pending until you start pacemaker:

root@node02:~# service pacemaker start


Standby mode

When we put a node in standby mode, it won't be taken into consideration for holding resources; if it was running any, they will automatically be moved to the other node.

This can be very useful for doing maintenance on that node, such as a software upgrade or configuration changes.

crm(live)# node standby node02
crm(live)# node show
node02(1084783306): normal
        standby=on
node01(1084783305): normal
        standby=off

and go back online via

crm(live)# node online node02

Please note that standby can take an additional lifetime parameter. This can be either reboot – meaning that the node will revert to online after its next reboot – or forever. The parameter defaults to forever, so we have to bring the node back online manually.
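For example, to put node02 into standby only until its next reboot, pass the lifetime explicitly (a sketch; see crm help node for the exact syntax of your crmsh version):

```
crm(live)# node standby node02 reboot
```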

Or if you need to shut down just the nginx server for maintenance without the cluster treating it as a failure, you can take the resource out of pacemaker's control:

crm(live)# resource unmanage Nginx-rsc

and to put it back under pacemaker's management:

crm(live)# resource manage Nginx-rsc

Testing Stonith

We've tested the cluster. Now it's time to see if our fencing will run as expected.

It's assumed that both nodes are running together again. We need to choose one of them, and kill the corosync process:

root@node02:~# killall -9 corosync

As we can see from the logs (and via virsh list on the VM-host) our machine should have been shut down.

# journalctl -x
...
warning: Node node02 will be fenced because the node is no longer part of the cluster
...
notice: Peer node02 was terminated (reboot) by node01 for node01: OK
...

And once you restart the VM and pacemaker it should be listed as online once again.

Troubleshooting

If you ever run into any trouble you can run the following command to check your cluster for errors:

crm_verify -LV
crm_verify -LVV...

Add more Vs to increase the output verbosity.
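The systemd journal is another good place to look; it can be narrowed down to just the cluster services:

```
# journalctl -u corosync -u pacemaker --since today
```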

Quorum/Cluster connectivity

Sometimes you start your corosync/pacemaker stack, but each node will report that it is the only one in the cluster. If this happens, first make sure that the hosts are reachable on the network:

root@node01# ping 192.168.122.202
...
root@node02# ping 192.168.122.201
...

If this is working, try setting crypto_cipher and crypto_hash to none for the time being (make sure to edit corosync.conf on both nodes). After making the changes, restart corosync:

# service pacemaker stop
# nano /etc/corosync/corosync.conf
... edit file ...
# service corosync restart
# service pacemaker start

Next, take a look at the transport mode you're using. After corosync starts, it logs the transport to the systemd journal:

root@node01:~# journalctl -x | grep 'Initializing transport'
Apr 18 13:27:30 node01 corosync[451]: [TOTEM ] Initializing transport (UDP/IP Multicast).

Refer to the chapter Editing corosync.conf on how to switch this to UDP Unicast.

Resource configuration

Some helpful resource commands: