High Availability Storage with DRBD + Heartbeat + NFS on Debian 8

Overview

This guide will help you set up a highly available NFS server on Debian Jessie. This is a relatively battle-tested configuration, and there is plenty of information available on how it works.

This guide will give you a setup as follows:

  • One active NFS server with its own public, private and floating IP (VIP)
  • One passive hot standby NFS server with its own public and private IP
  • Automatic failover when one of the nodes becomes unresponsive or unreachable
  • Unicast cluster synchronization, so it works on Linode and other providers where multicast (as used by Corosync, for example) isn’t available

Servers

While writing this guide, I used two KVM virtual machines on Proxmox 4.2 (san01 and san02). Each VM is configured as follows:

  • Default Debian Jessie install from a netinst iso
  • 512MB RAM
  • 1 x 20GB OS disk (all partitions – /dev/sda)
  • 1 x 20GB data disk (/dev/sdb)
  • 2 x NICs per node (one on the LAN and one dedicated to DRBD replication)
  • Nodes:
    • san01 (“node1”) / 192.168.0.242 / eth0
      • DRBD sync network: drbdnode01 / 10.50.40.21 / eth1
    • san02 (“node2”) / 192.168.0.243 / eth0
      • DRBD sync network: drbdnode02 / 10.50.40.22 / eth1
  • Cluster IP address: 192.168.0.245

Configurations

It’s good to get some basics down first.

Packages

Let’s start with some useful packages (install on both nodes):

apt-get install ntp vim  

Hostname

This step matters, since nearly everything in the cluster relies on the servers' hostnames.

Edit /etc/hosts on both nodes (removing the 127.0.1.1 loopback entry for the host's own name):

192.168.0.242   san01.lplinux.com.ar          san01
192.168.0.243   san02.lplinux.com.ar          san02
10.50.40.21     drbdnode01.lplinux.com.ar     drbdnode01
10.50.40.22     drbdnode02.lplinux.com.ar     drbdnode02
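
Since everything later on looks the nodes up by name, it can be worth checking that the file really contains all four names before moving on. The helper below is only a minimal sketch; the function name missing_hosts is made up for this example and is not part of any package.

```shell
# missing_hosts FILE NAME...
# Print each NAME that does not appear as a hostname or alias in FILE.
# Text after a "#" is treated as a comment and ignored.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file" || echo "$name"
    done
}
```

Running missing_hosts /etc/hosts san01 san02 drbdnode01 drbdnode02 on each node should print nothing when every name is present.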

Install and Configure DRBD

DRBD continuously replicates all data from the primary node to the secondary, whichever server holds each role at that point in time.

Install DRBD8 utils on both nodes:

apt-get install drbd8-utils  

Drop in the configs on both nodes. First, /etc/drbd.d/global_common.conf:

global {  
    usage-count yes;
}

common {  
    protocol C;

    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger; halt -f";

        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }

    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }

    net {
        cram-hmac-alg sha1;
    }

    syncer {
        rate 10M;
    }
}

Then add your DRBD resource config in /etc/drbd.d/r0.res:

resource r0 {  
    net {
        shared-secret "a7s1g2ns97";
    }

    # The name after "on" must match the node's hostname (uname -n).
    on san01 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.50.40.21:7788;
        meta-disk internal;
    }

    on san02 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.50.40.22:7788;
        meta-disk internal; 
    } 
} 
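
DRBD matches each "on" section against the local hostname as reported by uname -n, so a mismatch there is an easy mistake to make. Here is a small sketch for checking it; the helper names drbd_on_hosts and drbd_has_host are my own and purely illustrative.

```shell
# drbd_on_hosts FILE
# Print the host name of every "on <host> {" section in a DRBD resource file.
drbd_on_hosts() {
    awk '$1 == "on" { print $2 }' "$1"
}

# drbd_has_host FILE HOST
# Succeed if FILE contains an "on" section for exactly HOST.
drbd_has_host() {
    drbd_on_hosts "$1" | grep -qxF "$2"
}
```

On each node, something like drbd_has_host /etc/drbd.d/r0.res "$(uname -n)" || echo "no section for this node" should stay silent. drbdadm dump r0 is also handy here, since it parses the configuration and prints it back.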

Let’s get DRBD started so the initial sync can get going.

# Create metadata on both nodes
drbdadm create-md r0

# Start DRBD on both nodes
/etc/init.d/drbd start

# On the primary (san01) only: promote it and start the initial sync
drbdadm -- --overwrite-data-of-peer primary r0
drbdadm disconnect r0
drbdadm connect r0

You can check the sync status with this command (the sample output below was taken on the secondary node):

cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
srcversion: 1A9F77B1CA5FF92235C2213
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:13312 dw:13312 dr:0 al:0 bm:0 lo:1 pe:2 ua:0 ap:0 ep:1 wo:f oos:15714812
        [>....................] sync'ed:  0.2% (15344/15356)Mfinish: 0:19:15 speed: 13,312 (13,312) want: 13,440 K/sec
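
If you would rather not keep re-reading that output by eye, the status line can be picked apart with awk. This is a rough sketch that assumes a single device with minor number 0, as in this guide; drbd_field and drbd_in_sync are names I made up for the example.

```shell
# drbd_field KEY [FILE]
# Print the value of KEY (cs, ro or ds) for minor 0, read from FILE
# (default: /proc/drbd).
drbd_field() {
    awk -v key="$1" '
        $1 == "0:" {
            for (i = 2; i <= NF; i++)
                if (index($i, key ":") == 1)
                    print substr($i, length(key) + 2)
        }
    ' "${2:-/proc/drbd}"
}

# drbd_in_sync [FILE]
# Succeed once both sides report UpToDate.
drbd_in_sync() {
    [ "$(drbd_field ds "$@")" = "UpToDate/UpToDate" ]
}
```

A loop such as until drbd_in_sync; do sleep 10; done then simply blocks until the initial sync has finished.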

You can now format the DRBD device with any filesystem you prefer; here I’m using ext4. Run this on the primary node only, since the device is not writable on the secondary:

mkfs.ext4 /dev/drbd0  

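Create the mount point (mkdir /data) and add the DRBD disk to /etc/fstab on both nodes

vim /etc/fstab

# Add a line like this - substitute for your preferred fs and settings.
# noauto keeps the system from mounting it at boot; heartbeat mounts it on the active node.
/dev/drbd0      /data           ext4    noauto          0 0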

Install and Configure NFS

NFS will be used to serve our highly available data.

Let’s start with installing some packages on both nodes:

apt-get install nfs-kernel-server  

Now tell the dependency-based boot system not to start NFS automatically. Heartbeat will start NFS later on.

insserv --remove nfs-kernel-server
insserv --remove nfs-common 

Set up our exports on both nodes:

vim /etc/exports

# Add a line similar to this, change to suit your network and requirements
/data   192.168.0.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)

Install and Configure heartbeat

We’ll use heartbeat to synchronize the cluster, handle fencing, and promote the secondary to primary.

Install heartbeat on both nodes:

apt-get install heartbeat  

Drop in the configs on both nodes as below. You’ll need 3 files in total.

/etc/heartbeat/ha.cf

logfacility     local0  
keepalive 2  
deadtime 10  
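# Unicast to both nodes; heartbeat ignores the entry for its own address,
# so this file can be identical on san01 and san02.
ucast eth0 192.168.0.242
ucast eth0 192.168.0.243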
auto_failback off  
node san01 san02  

/etc/heartbeat/haresources

san01  IPaddr::192.168.0.245/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 nfs-kernel-server  

Note: this line starts with san01 on both nodes – this is the “preferred primary”.
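
Heartbeat brings the resources on that line up from left to right and takes them down in the reverse order, so the floating IP comes first and NFS last. To see the chain spelled out, a tiny sketch like this can help; haresources_order is a hypothetical helper name, not a heartbeat tool.

```shell
# haresources_order FILE
# Print the resources of each haresources line, one per line, in start order.
# The first field (the preferred node) and comment lines are skipped.
haresources_order() {
    awk 'NF && $1 !~ /^#/ { for (i = 2; i <= NF; i++) print $i }' "$1"
}
```

Running haresources_order /etc/heartbeat/haresources on either node lists the four resources in the order they will be started during a failover.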

/etc/heartbeat/authkeys

auth 3  
3 md5 a7s1g2ns97

Note: set a good password, especially if you deploy this in a public cloud!
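
Heartbeat also expects authkeys to be readable by root only (mode 600) and will complain otherwise. One possible way to generate a random secret and get the permissions right in one go is sketched below; write_authkeys is a made-up helper name, and it keeps the same "auth 3 / 3 md5" layout shown above.

```shell
# write_authkeys FILE
# Write a heartbeat authkeys file with a random 64-character hex secret,
# readable and writable by its owner only.
write_authkeys() {
    key=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$1" )
    chmod 600 "$1"
}
```

Run write_authkeys /etc/heartbeat/authkeys on one node and copy the resulting file to the other, since both nodes need the same key.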

Finally, start up heartbeat:

service heartbeat start  

If you run ifconfig on the primary node, you should see that it now holds the floating IP. NFS will also be running, and /data will be mounted.

Testing

This is the best part. Let’s take down the primary server and make sure the secondary takes over seamlessly.

The simplest way to simulate a failure is to stop heartbeat on whichever server is currently the primary.

service heartbeat stop  

Within a few seconds, you should see all services move over to san02, including the floating IP, NFS, and the /data mount. A quick check of cat /proc/drbd should also show san02 set to Primary.
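
To confirm which node currently holds the floating IP without reading through interface listings, a one-line filter over the output of ip is enough. This is just a sketch; has_vip is a name I picked for the example.

```shell
# has_vip ADDRESS
# Read `ip -o -4 addr show` output on stdin and succeed if ADDRESS is
# configured on any interface.
has_vip() {
    grep -qF "inet $1/"
}
```

For example, ip -o -4 addr show | has_vip 192.168.0.245 && echo "this node holds the VIP" should print the message on san02 after the failover and stay silent on san01.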

Pablo Javier Furnari

Linux System Administrator at La Plata Linux