Heartbeat clustering in Linux
Configuring high availability on Linux servers using drbd and heartbeat:
While configuring high availability between two Linux servers (running Red Hat Linux; the same steps should work on Fedora or CentOS as well), I documented the steps, which are as follows:
·
Install two Linux systems, each with one partition reserved for HA (same size on both systems).
We will need three IPs (one IP for each node and one virtual/floating IP).
·
Download drbd 8.3.0
# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.0.tar.gz
·
Extract drbd:
# tar xzf drbd-8.3.0.tar.gz
·
Install flex (yum install flex).
·
cd drbd-8.3.0
·
Build the RPMs using the below command:
# make rpm
·
cd dist/RPMS/x86_64
·
rpm -ivh *
·
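Edit /etc/fstab (vi /etc/fstab) and modify the entry for the reserved partition (in our case it is /mirrors) to the line below. The noauto option keeps the partition from being mounted at boot, so that DRBD and heartbeat control it:
/dev/sda6 /data ext3 defaults,noauto 0 0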
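Before rebooting, it may be worth confirming that the entry really carries noauto. The following is a minimal sketch, not part of the original setup; the function name fstab_has_noauto is made up for illustration, and it assumes the mount point is /data as in the line above.

```shell
# fstab_has_noauto FILE MOUNTPOINT
# Succeeds if MOUNTPOINT is listed in FILE (fstab format) with the noauto option.
fstab_has_noauto() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Only run the check if the file is readable on this machine.
if [ -r /etc/fstab ]; then
    if fstab_has_noauto /etc/fstab /data; then
        echo "/data is marked noauto"
    else
        echo "/data is missing from /etc/fstab or not marked noauto"
    fi
fi
```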
·
reboot the machine.
·
Edit /etc/drbd.conf; my drbd.conf is:
global {
    usage-count yes;
}
common {
    syncer { rate 40M; }
    protocol C;
}
resource r0 {
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        split-brain "/usr/lib/drbd/notify-split-brain.sh youremail@domain.com";
    }
    startup {
        wfc-timeout 180;
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
    }
    net {
        cram-hmac-alg "sha1";
        shared-secret "FooFunFactory";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 100M;
        al-extents 257;
    }
    on HA1 {
        device /dev/drbd0;
        disk /dev/cciss/c0d0p2;
        address 10.150.2.50:7788;
        meta-disk internal;
    }
    on HA2 {
        device /dev/drbd0;
        disk /dev/cciss/c0d0p2;
        address 10.150.2.51:7788;
        meta-disk internal;
    }
}
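DRBD matches each "on" section against the output of uname -n, so a host name mismatch is a common reason for the resource not coming up. The sketch below is one way to check this; the helper name drbd_conf_hosts is hypothetical, and it assumes each section opens with "on <host> {" on a single line, as in the file above. When drbdadm is installed, drbdadm dump parses the file and prints it back, which catches syntax errors.

```shell
# drbd_conf_hosts FILE: print the host names used in "on <host> {" sections.
drbd_conf_hosts() {
    awk '$1 == "on" && $3 == "{" { print $2 }' "$1"
}

conf=/etc/drbd.conf
if [ -r "$conf" ]; then
    me=$(uname -n)
    if drbd_conf_hosts "$conf" | grep -qxF "$me"; then
        echo "$me has an 'on' section in $conf"
    else
        echo "warning: no 'on $me' section in $conf" >&2
    fi
    # Syntax check: drbdadm dump fails if the file cannot be parsed.
    if command -v drbdadm >/dev/null 2>&1; then
        drbdadm dump r0 >/dev/null && echo "$conf parses cleanly"
    fi
fi
```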
·
Create dir /data:
mkdir /data
·
I had to install the prerequisites for heartbeat, which I did using the below command:
# yum install gettext PyXML libtool-libs libnet openhpi
·
Then I installed the heartbeat packages:
rpm -ivh heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.3-3.el5.centos.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm
·
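Now let members of the haclient group run drbdsetup and drbdmeta with root privileges (the drbd-peer-outdater handler in drbd.conf needs this):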
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta
·
Now type:
drbdadm create-md r0
·
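You will probably receive an error, typically because the partition still holds a filesystem. If so, zero the start of the partition with the below command (this destroys the data on it, so make sure of= points at the partition you reserved for DRBD):
dd if=/dev/zero of=/dev/sda6 bs=1M count=128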
·
Now try again:
drbdadm create-md r0
·
Now do:
# service drbd start
·
On the primary server only (to avoid split-brain), run:
# drbdadm -- --overwrite-data-of-peer primary r0
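The initial synchronization started by this command runs in the background, and its progress shows up in /proc/drbd. Below is a small sketch for pulling single fields out of that file; drbd_field is a hypothetical helper, and it assumes the 8.3 layout where the line for device 0 starts with "0:" followed by key:value pairs such as cs (connection state) and ds (disk states).

```shell
# drbd_field MINOR KEY: read /proc/drbd-style text on stdin and print the
# value of KEY (for example cs or ds) from the line for device MINOR.
drbd_field() {
    awk -v minor="$1:" -v key="$2:" '
        $1 == minor {
            for (i = 2; i <= NF; i++)
                if (index($i, key) == 1) {
                    print substr($i, length(key) + 1)
                    exit
                }
        }
    '
}

# Only meaningful on a machine where the drbd module is loaded.
if [ -r /proc/drbd ]; then
    echo "connection state: $(drbd_field 0 cs < /proc/drbd)"
    echo "disk states:      $(drbd_field 0 ds < /proc/drbd)"
fi
```

Once the disk states read UpToDate/UpToDate, both nodes hold the same data.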
·
mkfs.ext3 /dev/drbd0
·
mount /dev/drbd0 /data
·
Install drbdlinks
·
Edit /etc/drbdlinks.conf and define the mount point (in our case /data) and the nodes.
·
Create the directories /var/www/html under /data.
·
Run the command drbdlinks start
·
Create the haresources, ha.cf and authkeys files in /etc/ha.d/ on both nodes. You can use the below files (edit the IPs and the services that you need to be highly available):
authkeys file located in /etc/ha.d/:
auth 1
1 sha1 Secret1234
Please enter the below command to change permissions of authkeys to 600:
chmod 600 /etc/ha.d/authkeys
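The secret above is only a placeholder. As a rough sketch, a random secret could be generated and written with the right permissions in one go; gen_secret and write_authkeys are hypothetical helper names, and the sketch assumes /dev/urandom and od are available.

```shell
# gen_secret: print 32 random hex characters read from /dev/urandom.
gen_secret() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# write_authkeys FILE SECRET: write a single-key sha1 authkeys file that
# only its owner can read or write.
write_authkeys() {
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$2" > "$1" ) && chmod 600 "$1"
}
```

For example, write_authkeys /etc/ha.d/authkeys "$(gen_secret)" on one node, then copy the resulting file to the other node, since both nodes must use the same key.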
My ha.cf file in /etc/ha.d:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0
auto_failback off
node HA1
node HA2
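The timing values in ha.cf depend on each other: warntime is meant to fall between keepalive and deadtime, and initdead should not be shorter than deadtime. The following sketch checks that ordering as a rule of thumb; the helper names are made up for illustration, and it assumes the values are plain seconds without a unit suffix.

```shell
# hacf_value FILE KEY: print the first value set for KEY in an ha.cf-style file.
hacf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# hacf_timings_ok FILE: succeed if keepalive < warntime < deadtime <= initdead.
hacf_timings_ok() {
    hb_k=$(hacf_value "$1" keepalive)
    hb_w=$(hacf_value "$1" warntime)
    hb_d=$(hacf_value "$1" deadtime)
    hb_i=$(hacf_value "$1" initdead)
    # A missing or non-numeric value makes the comparison fail.
    { [ "$hb_k" -lt "$hb_w" ] && [ "$hb_w" -lt "$hb_d" ] && [ "$hb_d" -le "$hb_i" ]; } 2>/dev/null
}

if [ -r /etc/ha.d/ha.cf ]; then
    if hacf_timings_ok /etc/ha.d/ha.cf; then
        echo "ha.cf timings look consistent"
    else
        echo "check keepalive, warntime, deadtime and initdead in ha.cf" >&2
    fi
fi
```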
My haresources file located in /etc/ha.d:
HA1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 IPaddr::10.150.2.55/24/eth0/10.150.2.255 mysqld asterisk httpd sendmail dhcpd
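Heartbeat starts the resources on this line from left to right and stops them in reverse order, which is why the DRBD disk and the filesystem come before the IP address and the services. The IPaddr entry packs four values into one field: address, netmask bits, interface and broadcast address. The sketch below only illustrates how that field breaks down; split_ipaddr is a hypothetical helper and not part of heartbeat.

```shell
# split_ipaddr SPEC: print the four parts of an
# "IPaddr::address/netmask_bits/interface/broadcast" haresources entry.
split_ipaddr() {
    spec=${1#IPaddr::}
    old_ifs=$IFS
    IFS=/
    set -- $spec          # split the remaining text on "/"
    IFS=$old_ifs
    printf 'address=%s netmask_bits=%s interface=%s broadcast=%s\n' "$1" "$2" "$3" "$4"
}

split_ipaddr 'IPaddr::10.150.2.55/24/eth0/10.150.2.255'
```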
Please don't forget to add host entries for HA1 and HA2 to /etc/hosts on both servers. The addresses must match the ones used in drbd.conf.
My /etc/hosts entries for the two servers:
10.150.2.50 HA1
10.150.2.51 HA2
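A mistyped address here is easy to miss, so it can help to print what each node name maps to and compare it with the address lines in drbd.conf. This is a minimal sketch with a hypothetical helper, hosts_ip, which reads a hosts-format file directly rather than going through the resolver.

```shell
# hosts_ip FILE NAME: print the first address that FILE (hosts format) maps NAME to.
hosts_ip() {
    awk -v name="$2" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

if [ -r /etc/hosts ]; then
    for node in HA1 HA2; do
        echo "$node -> $(hosts_ip /etc/hosts "$node")"
    done
fi
```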
·
Start heartbeat:
# service heartbeat start
·
Now the primary server should have the partition mounted and the IP address that we defined in haresources assigned. Apache should also be running on it.
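Both conditions can be checked from a shell on either node. The sketch below assumes the floating IP 10.150.2.55 and the mount point /data from the files above, and that the ip command is installed; holds_ip and is_mounted are hypothetical helper names.

```shell
# holds_ip ADDR: read `ip -o -4 addr show` style output on stdin and succeed
# if ADDR is configured on any interface.
holds_ip() {
    awk -v want="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")
                    if (a[1] == want) found = 1
                }
        }
        END { exit (found ? 0 : 1) }
    '
}

# is_mounted MOUNTPOINT FILE: succeed if MOUNTPOINT appears in FILE (/proc/mounts format).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "$2"
}

if command -v ip >/dev/null 2>&1 && ip -o -4 addr show 2>/dev/null | holds_ip 10.150.2.55; then
    echo "floating IP 10.150.2.55 is on this node"
fi
if [ -r /proc/mounts ] && is_mounted /data /proc/mounts; then
    echo "/data is mounted on this node"
fi
```

Running the same check on both nodes before and after a failover shows which one currently holds the resources.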
·
To move the resources to the other server, type:
# service heartbeat standby