Thursday, April 12, 2012

Linux high availability


Heartbeat clustering in Linux

Configuring high availability on Linux servers using drbd and heartbeat:


The purpose of a Heartbeat cluster is to ensure maximum uptime and minimize downtime: if one server goes down, the other node in the cluster automatically takes over and the virtual/floating IP moves to that node. We can let Heartbeat manage the services that we want to be highly available, for example apache, mysql, asterisk, dhcp, ftp, samba, tftp and many more. If the server running those services goes down, the other node running Heartbeat takes over and starts them.
While configuring high availability between two Linux servers this time (running Red Hat Linux; it should work fine on Fedora or CentOS as well), I documented the steps, which are as follows:

·         Install two Linux systems with one partition reserved for HA (same size on both systems).
 We will need three IPs (one IP for each node and one virtual/floating IP).

·         Download drbd 8.3.0

# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.0.tar.gz

·         Extract drbd.

·         Install flex (yum install flex).

·         cd drbd-8.3.0

·         Make rpm using the below command:

# make rpm

·         cd dist/RPMS/x86_64

·         rpm -ivh *

·         Edit /etc/fstab (vi /etc/fstab) and modify the entry for the reserved partition (in our case it is /mirrors) to:

/dev/sda6               /data                ext3    defaults,noauto        0 0
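The noauto option matters here: it keeps the init scripts from mounting the partition at boot, so that it is only ever mounted through the cluster. Before rebooting, a quick check of the entry can be sketched like this. The helper name has_noauto is made up for illustration and is not part of any package.

```shell
# Hypothetical helper: succeed only if the fstab entry for a mount point
# lists "noauto" among its options.
# $1 = mount point, $2 = fstab path (defaults to /etc/fstab)
has_noauto() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "noauto") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/etc/fstab}"
}

# Example:
#   has_noauto /data && echo "OK: /data will not be mounted at boot"
```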

·         reboot the machine.

·         Edit /etc/drbd.conf. My drbd.conf is below; the disk entries must point at the partition reserved on each node:

global {
    usage-count yes;
}

common {
    syncer { rate 40M; }
    protocol C;
}

resource r0 {
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    split-brain "/usr/lib/drbd/notify-split-brain.sh youremail@domain.com";
  }

  startup {
    wfc-timeout 180;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error   detach;
  }

  net {
    cram-hmac-alg "sha1";
    shared-secret "FooFunFactory";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 100M;
    al-extents 257;
  }

  on HA1 {
    device     /dev/drbd0;
    disk       /dev/cciss/c0d0p2;
    address    10.150.2.50:7788;
    meta-disk  internal;
  }

  on HA2 {
    device     /dev/drbd0;
    disk       /dev/cciss/c0d0p2;
    address    10.150.2.51:7788;
    meta-disk  internal;
  }
}
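
DRBD picks the "on" section to use by comparing its name with the output of uname -n, so the names in drbd.conf have to match the real host names exactly. A minimal sketch of that check follows. The helper names drbd_hosts and host_in_drbd_conf are hypothetical, and the parsing assumes each section starts on a line of the form "on NAME {" as in the file above.

```shell
# Hypothetical helper: list the host names used in "on <host> {" sections.
# $1 = path to drbd.conf (defaults to /etc/drbd.conf)
drbd_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "${1:-/etc/drbd.conf}"
}

# Hypothetical helper: succeed if host $1 has an "on" section in file $2.
host_in_drbd_conf() {
    drbd_hosts "$2" | grep -qxF -- "$1"
}

# Example:
#   host_in_drbd_conf "$(uname -n)" /etc/drbd.conf \
#       || echo "no 'on' section for this host"
```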



·         Create dir /data:

mkdir /data

·         I had to install the prerequisites for Heartbeat, which I did using the command below:

# yum install gettext PyXML libtool-libs libnet openhpi

·         Then I installed the Heartbeat packages:

rpm -ivh heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.3-3.el5.centos.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm

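·         Now let members of the haclient group run drbdsetup and drbdmeta with root privileges (the drbd-peer-outdater handler needs this):

chgrp haclient /sbin/drbdsetup

chmod o-x /sbin/drbdsetup

chmod u+s /sbin/drbdsetup

chgrp haclient /sbin/drbdmeta

chmod o-x /sbin/drbdmeta

chmod u+s /sbin/drbdmeta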

·         Now create the DRBD metadata for resource r0:

drbdadm create-md r0

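·         You will probably receive an error, typically because the partition already contains a file system. If so, zero out the start of the partition with the command below. This destroys the data on it, so use the device you reserved for DRBD: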

dd if=/dev/zero of=/dev/sda6 bs=1M count=128
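
Since dd overwrites the partition without asking, it may be worth confirming first that the device is not mounted. A minimal sketch, assuming the device name /dev/sda6 from the fstab entry; the helper name is_mounted is hypothetical.

```shell
# Hypothetical helper: succeed if device $1 appears as a mounted source
# in the mounts table $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit(found ? 0 : 1) }' \
        "${2:-/proc/mounts}"
}

# Example guard around the wipe:
#   if is_mounted /dev/sda6; then
#       echo "refusing: /dev/sda6 is mounted"
#   else
#       dd if=/dev/zero of=/dev/sda6 bs=1M count=128
#   fi
```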

·         Now try again:

drbdadm create-md r0

·         Now start DRBD:

# service drbd start

·         Make this node primary and start the initial sync (do this on the primary server only, to avoid split-brain):

# drbdadm -- --overwrite-data-of-peer primary r0

·         mkfs.ext3 /dev/drbd0

·         mount /dev/drbd0 /data
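
The initial sync runs in the background, and its progress can be followed with cat /proc/drbd. The sync is finished when the connection state (cs) is Connected and the disk states (ds) read UpToDate/UpToDate. The sketch below pulls just those two fields out of the status file. The helper name drbd_state is hypothetical, and the parsing assumes the usual "N: cs:... ds:..." device line.

```shell
# Hypothetical helper: print "minor connection-state disk-states" for each
# DRBD device found in a /proc/drbd style file.
# $1 = status file (defaults to /proc/drbd)
drbd_state() {
    awk '$1 ~ /^[0-9]+:$/ {
        cs = ""; ds = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^cs:/) cs = substr($i, 4)
            if ($i ~ /^ds:/) ds = substr($i, 4)
        }
        sub(/:$/, "", $1)
        print $1, cs, ds
    }' "${1:-/proc/drbd}"
}

# Example:
#   drbd_state        # reads /proc/drbd on a node where DRBD is loaded
```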

·         Install drbdlinks

·         Edit /etc/drbdlinks.conf,  define mount point (in our case /data) and nodes.

·         Create the directories /var/www/html under /data

·         Type command drbdlinks start

·         Create the haresources, ha.cf and authkeys files in /etc/ha.d/ on both nodes. You can use the files below (edit the IPs and the services that you need to be highly available):


authkeys file located in /etc/ha.d/:

auth 1
1 sha1 Secret1234

Change the permissions of authkeys to 600, since Heartbeat expects this file to be readable by root only:
chmod 600 /etc/ha.d/authkeys
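
Secret1234 is only a placeholder, and a random key is a safer choice. One way to generate the file is sketched below, assuming head, sha1sum and cut from coreutils are available. The helper name make_authkeys is hypothetical.

```shell
# Hypothetical helper: write an authkeys file with a random sha1 key.
# $1 = output path (defaults to /etc/ha.d/authkeys)
make_authkeys() {
    out="${1:-/etc/ha.d/authkeys}"
    # Hash 64 random bytes to get a 40-character hex string.
    key=$(head -c 64 /dev/urandom | sha1sum | cut -d' ' -f1)
    # Create the file with a restrictive umask so it is never world-readable.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$out" )
    chmod 600 "$out"
}

# Example:
#   make_authkeys /etc/ha.d/authkeys
```

The same file has to be present on both nodes, so generate it once and copy it to the other server.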


My ha.cf file in /etc/ha.d:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0
auto_failback off
node HA1
node HA2
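
In this file keepalive is the interval between heartbeats, warntime is when a late heartbeat gets logged, deadtime is when the peer is declared dead, and initdead is the longer grace period used right after boot. As a rule of thumb the values should satisfy keepalive < warntime < deadtime, with initdead at least twice deadtime. The sketch below checks that. The helper names are hypothetical, and it assumes the values are plain whole seconds as in the file above.

```shell
# Hypothetical helper: print the value of a single-valued ha.cf directive.
# $1 = directive name, $2 = ha.cf path (defaults to /etc/ha.d/ha.cf)
hacf_get() {
    awk -v key="$1" '$1 == key { print $2; exit }' "${2:-/etc/ha.d/ha.cf}"
}

# Hypothetical check: keepalive < warntime < deadtime and initdead >= 2 * deadtime.
# $1 = ha.cf path (defaults to /etc/ha.d/ha.cf)
hacf_timing_ok() {
    k=$(hacf_get keepalive "$1"); w=$(hacf_get warntime "$1")
    d=$(hacf_get deadtime "$1");  i=$(hacf_get initdead "$1")
    [ "$k" -lt "$w" ] && [ "$w" -lt "$d" ] && [ $((d * 2)) -le "$i" ]
}

# Example:
#   hacf_timing_ok /etc/ha.d/ha.cf || echo "check the timing values in ha.cf"
```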



My haresources file located in /etc/ha.d:

HA1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 IPaddr::10.150.2.55/24/eth0/10.150.2.255 mysqld asterisk httpd sendmail dhcpd
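
The first field of the line is the node that normally owns the resources. Everything after it is a resource, with :: separating a script name from its arguments (for IPaddr these are address/netmask/interface/broadcast). Heartbeat starts the resources from left to right and stops them from right to left, which is why drbddisk and Filesystem come before the services that live on /data. To see that order at a glance, the line can be split up as sketched below. The helper name list_resources is hypothetical.

```shell
# Hypothetical helper: print the resources on the first haresources line
# owned by node $1, one per line, in start order.
# $2 = haresources path (defaults to /etc/ha.d/haresources)
list_resources() {
    awk -v node="$1" '$1 == node { for (i = 2; i <= NF; i++) print $i; exit }' \
        "${2:-/etc/ha.d/haresources}"
}

# Example:
#   list_resources HA1
```

The haresources file must be identical on both nodes.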

Don't forget to add entries for HA1 and HA2 to the /etc/hosts file on both servers:

My /etc/hosts file containing IPs and host names of both servers:

10.150.2.50 HA1
10.150.2.51 HA2
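
The node names in ha.cf have to match uname -n on each server, and each of them should resolve on both machines. A minimal sketch that lists any node named in ha.cf that has no entry in /etc/hosts follows. The helper name missing_nodes is hypothetical.

```shell
# Hypothetical helper: print every "node" name from ha.cf that does not
# appear as a host name in the hosts file.
# $1 = ha.cf path (defaults to /etc/ha.d/ha.cf)
# $2 = hosts path (defaults to /etc/hosts)
missing_nodes() {
    hacf="${1:-/etc/ha.d/ha.cf}"; hosts="${2:-/etc/hosts}"
    awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' "$hacf" |
    while read -r n; do
        awk -v n="$n" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$hosts" || echo "$n"
    done
}

# Example:
#   missing_nodes      # prints nothing when every node has a hosts entry
```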

·         Start Heartbeat with the command service heartbeat start

·         The primary server should now have the partition mounted and the IP address that we defined in haresources assigned. Apache should also be running on it.
·         To move the resources to the other server, type:
# service heartbeat standby
If all goes well, live replication will be set up between the Linux servers.
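
To see which node currently holds the resources, check where the floating IP is assigned. The sketch below reads the output of ip -o -4 addr show on standard input and looks for the address from haresources (10.150.2.55). The helper name has_ip is hypothetical.

```shell
# Hypothetical helper: succeed if address $1 appears in
# `ip -o -4 addr show` style output read from stdin.
has_ip() {
    awk -v ip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")   # strip the /prefix length
                    if (a[1] == ip) found = 1
                }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example:
#   ip -o -4 addr show | has_ip 10.150.2.55 \
#       && echo "this node holds the floating IP"
```

Running it on both servers before and after service heartbeat standby shows whether the address actually moved.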