Set up a Highly Available NFS Cluster with disk encryption using LUKS, DRBD, Corosync and Pacemaker
Not long ago, I was required to set up a pair of servers as a Highly Available NFS cluster with disk encryption. I ended up building the cluster using:
- CentOS 8 Stream
- LUKS
- DRBD
- Corosync and Pacemaker
A) Preparations
Set up two CentOS 8 Stream VMs. In this article, each VM has two disks, /dev/sda and /dev/sdb, which serve as the OS disk and the data disk respectively. The hostnames and IP addresses of the environment are:
- VM #1: nfs1.example.com / 192.168.10.11
- VM #2: nfs2.example.com / 192.168.10.12
- Service Name / IP: nfs.example.com / 192.168.10.10
B) Setup Disk Encryption
Step 1: Encrypt the data disk
Encrypt the data disk using the cryptsetup command. You need to provide a passphrase to complete the encryption setup.
[root@nfs1 ~]# cryptsetup luksFormat /dev/sdb
WARNING!
========
This will overwrite data on /dev/sdb irrevocably.
Are you sure? (Type 'yes' in capital letters): YES
Enter passphrase for /dev/sdb: abcd1234ABCD
Verify passphrase: abcd1234ABCD
Step 2: Open the encrypted disk
Open the encrypted data disk using the cryptsetup command.
[root@nfs1 ~]# cryptsetup open /dev/sdb cryptedsdb
Enter passphrase for /dev/sdb:
Step 3: Enable auto-unlock of the encrypted disk on boot
Generate a file with 4 KB of random data to be used as a key to unlock the encrypted volume.
[root@nfs1 ~]# mkdir /etc/luks-keys
[root@nfs1 ~]# dd if=/dev/urandom of=/etc/luks-keys/sdb_secret_key bs=512 count=8
8+0 records in
8+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 7.8853e-05 s, 51.9 MB/s
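Since this key file can unlock the disk without a passphrase, it is worth tightening its permissions. A minimal, hedged suggestion that is not part of the original steps:
chmod 700 /etc/luks-keys
chmod 400 /etc/luks-keys/sdb_secret_key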
Add the created key to the encrypted disk.
[root@nfs1 ~]# cryptsetup -v luksAddKey /dev/sdb /etc/luks-keys/sdb_secret_key
Enter any existing passphrase:
Key slot 0 unlocked.
Key slot 1 created.
Command successful.
Verify that the key has been added.
[root@nfs1 ~]# cryptsetup luksDump /dev/sdb | egrep "Keyslots:|luks2"
Keyslots:
0: luks2
1: luks2
Verify that the key is working.
[root@nfs1 ~]# cryptsetup -v luksClose cryptedsdb
Command successful.
[root@nfs1 ~]# cryptsetup -v luksOpen /dev/sdb cryptedsdb --key-file=/etc/luks-keys/sdb_secret_key
Key slot 1 unlocked.
Command successful.
Configure the server to automatically open the encrypted disk on boot.
[root@nfs1 ~]# cryptsetup luksDump /dev/sdb | grep "UUID"
UUID: 7924d99f-8007-4970-8798-698887938626
[root@nfs1 ~]# echo "cryptedsdb UUID=7924d99f-8007-4970-8798-698887938626 /etc/luks-keys/sdb_secret_key luks" > /etc/crypttab
Step 4: Backup the LUKS header
Back up the LUKS header so you can restore it later if required.
[root@nfs1 ~]# cryptsetup -v luksHeaderBackup /dev/sdb --header-backup-file /root/LuksHeaderBackup_sdb.bin
Command successful.
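Should the header ever need to be restored, cryptsetup provides a matching restore operation. A sketch of the restore, assuming the backup file above is intact; run it only against the correct disk:
cryptsetup luksHeaderRestore /dev/sdb --header-backup-file /root/LuksHeaderBackup_sdb.bin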
Step 5: Repeat steps 1 to 4 on the second NFS VM
Steps 1 to 4 above set up disk encryption on VM nfs1; now repeat them to do the same on VM nfs2.
C) Setup DRBD Disk Replication
Step 1: Create and configure the Volume Group and Logical Volume
Perform this step for both VMs nfs1 and nfs2.
[root@nfs1 ~]# pvcreate /dev/mapper/cryptedsdb
Physical volume "/dev/mapper/cryptedsdb" successfully created.
[root@nfs1 ~]# vgcreate drbdvg1 /dev/mapper/cryptedsdb
Volume group "drbdvg1" successfully created
[root@nfs1 ~]# lvcreate -L 1g -n data1 drbdvg1
Logical volume "data1" created.
Step 2: Compile DRBD kernel module from source
(Alternatively, you could install the DRBD modules from the EPEL repository, but you would need to ensure your kernel version is compatible with the DRBD module available there.)
By compiling the DRBD kernel module ourselves, we can ensure it is always compatible with our desired kernel version. Follow the steps below to complete the compilation.
Ensure the system’s kernel has been updated to latest version, and install the necessary packages.
dnf update
dnf install git gcc gcc-c++ make automake autoconf rpm-build kernel-devel kernel-rpm-macros kernel-abi-whitelists elfutils-libelf-devel
Create the directory for rpmbuild.
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
echo '%_topdir %(echo $HOME)/rpmbuild' > ~/.rpmmacros
Download the latest version of DRBD.
(Check https://www.linbit.com/linbit-software-download-page-for-linstor-and-drbd-linux-driver/#drbd9 for the latest version)
curl -L -O https://www.linbit.com/downloads/drbd/9.0/drbd-9.0.24-1.tar.gz
Extract the downloaded tarball, then go into the folder of the extracted DRBD sources.
tar zxf drbd-9.0.24-1.tar.gz
cd drbd-9.0.24-1
Build the RPM packages for DRBD.
make kmp-rpm srpm
Step 3: Install the compiled DRBD kernel module
Install the kernel module package for DRBD.
cd ~/rpmbuild/RPMS/x86_64/
dnf localinstall -y kmod-drbd-9.0.24_4.18.0_227-1.x86_64.rpm
Copy the kernel module package kmod-drbd-9.0.24_4.18.0_227-1.x86_64.rpm to the other node and install it there using the dnf localinstall command.
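Note that the DRBD source tarball above builds only the kernel module; the userspace tools (drbdadm, drbdsetup, drbdmeta) are packaged separately as drbd-utils. If they are not already present on your nodes, install them as well. This is a hedged reminder; the exact package name and repository (for example drbd90-utils from ELRepo) depend on your environment.
dnf install drbd-utils   # package and repository name may differ in your environment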
To avoid an accidental kernel upgrade breaking the DRBD module, lock the kernel version.
dnf install yum-plugin-versionlock
dnf versionlock kernel*
Validate the version lock for the kernel packages.
[root@nfs1 ~]# cat /etc/yum/pluginconf.d/versionlock.list
# Added lock on Mon Aug 17 10:58:21 2020
kernel-devel-0:4.18.0-227.el8.*
kernel-rpm-macros-0:123-1.el8.*
kernel-tools-libs-0:4.18.0-227.el8.*
kernel-core-0:4.18.0-227.el8.*
kernel-tools-0:4.18.0-227.el8.*
kernel-0:4.18.0-227.el8.*
kernel-modules-0:4.18.0-227.el8.*
kernel-abi-whitelists-0:4.18.0-227.el8.*
kernel-headers-0:4.18.0-227.el8.*
Reboot the two VMs.
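After the reboot, it is worth confirming that the running kernel still matches the compiled module. A quick optional check, not in the original steps:
uname -r
modinfo drbd | grep -E '^(filename|version)'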
Step 4: Configure SELinux and Firewall for DRBD
Modify the SELinux policy to exempt DRBD processes (the drbd_t domain) from SELinux control by making the domain permissive.
semanage permissive -a drbd_t
If SELinux then denies drbdsetup access to netlink_generic_socket objects labeled drbd_t, generate and install a local policy module from the audit records to allow it.
[root@nfs1 ~]# ausearch -c 'drbdsetup' --raw | audit2allow -M my-drbdsetup
******************** IMPORTANT ***********************
To make this policy package active, execute:
semodule -i my-drbdsetup.pp
[root@nfs1 ~]# semodule -X 300 -i my-drbdsetup.pp
Allow DRBD port 7789 between the two nodes.
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="<IP of remote node>" port port="7789" protocol="tcp" accept'
firewall-cmd --reload
Step 5: Configure DRBD
Before configuration, create a backup of the configuration file global_common.conf.
cd /etc/drbd.d
cp global_common.conf global_common.conf.default
Modify / add the following configuration in the configuration file global_common.conf:
global {
usage-count no;
}
common {
net {
protocol C;
}
}
Create the resource configuration file /etc/drbd.d/nfsha.res
resource nfsha {
on nfs1.example.com {
device /dev/drbd1;
disk /dev/mapper/drbdvg1-data1;
meta-disk internal;
address 192.168.10.11:7789;
}
on nfs2.example.com {
device /dev/drbd1;
disk /dev/mapper/drbdvg1-data1;
meta-disk internal;
address 192.168.10.12:7789;
}
}
Ensure both files /etc/drbd.d/global_common.conf and /etc/drbd.d/nfsha.res are configured the same on both VMs nfs1 and nfs2.
Create DRBD meta data on VM #1 nfs1.
[root@nfs1 ~]# drbdadm create-md nfsha
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/mapper/drbdvg1-data1 at byte offset 1073737728
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 1073737728
al_offset 1073704960
bm_offset 1073672192
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (32 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
Then create DRBD meta data on VM #2 nfs2.
[root@nfs2 drbd.d]# drbdadm create-md nfsha
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/mapper/drbdvg1-data1 at byte offset 1073737728
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 1073737728
al_offset 1073704960
bm_offset 1073672192
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (32 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
Start the DRBD service on both VMs.
systemctl start drbd
Check the DRBD status on both VMs.
[root@nfs1 ~]# drbdadm status nfsha
nfsha role:Secondary
disk:Inconsistent
nfs2.example.com role:Secondary
peer-disk:Inconsistent
[root@nfs2 ~]# drbdadm status nfsha
nfsha role:Secondary
disk:Inconsistent
nfs1.example.com role:Secondary
peer-disk:Inconsistent
Both nodes are now in the Secondary role.
Now initialize device synchronization by running the command below on VM #1 nfs1; this makes it the primary node.
drbdadm primary --force nfsha
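The initial full synchronization can take a while on larger volumes. You can optionally watch its progress until both sides report UpToDate, for example:
watch -n5 drbdadm status nfsha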
Check the DRBD status again. The VM nfs1 is now Primary while VM nfs2 is now Secondary, and the peer-disk status is UpToDate.
[root@nfs1 ~]# drbdadm status nfsha
nfsha role:Primary
disk:UpToDate
nfs2.example.com role:Secondary
peer-disk:UpToDate
[root@nfs2 ~]# drbdadm status nfsha
nfsha role:Secondary
disk:UpToDate
nfs1.example.com role:Primary
peer-disk:UpToDate
Step 6: Setup filesystem on the DRBD device
On the primary node nfs1, create a filesystem on the DRBD device.
[root@nfs1 ~]# mkfs.xfs /dev/drbd1
meta-data=/dev/drbd1 isize=512 agcount=4, agsize=65532 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=262127, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=1566, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Mount the filesystem and change its SELinux file context.
[root@nfs1 ~]# mkdir /mnt/nfsdata
[root@nfs1 ~]# mount /dev/drbd1 /mnt/nfsdata
[root@nfs1 ~]# semanage fcontext -a -t nfs_t /mnt/nfsdata
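To apply the newly assigned context to the files already under the mount point, a restorecon run is usually needed. A hedged follow-up that the original steps do not show:
restorecon -Rv /mnt/nfsdata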
D) Setup NFS Cluster with Corosync and Pacemaker
Step 1: Installation and basic setup
Install corosync and pacemaker.
dnf --enablerepo=HighAvailability -y install pacemaker pcs corosync
Enable and start the pcsd service.
systemctl enable --now pcsd
Enable cluster mode for daemons in SELinux.
setsebool -P daemons_enable_cluster_mode 1
Allow the high-availability service through the firewall.
firewall-cmd --add-service=high-availability --permanent
firewall-cmd --reload
Set the password for the hacluster user on both nodes.
passwd hacluster
Authorize the two nodes. You need to provide the password of the hacluster user in this step.
[root@nfs1 ~]# pcs host auth nfs1.example.com nfs2.example.com
Username: hacluster
Password: <password>
nfs1.example.com: Authorized
nfs2.example.com: Authorized
Step 2: Create the NFS cluster
Create the NFS cluster using the pcs cluster setup command.
[root@nfs1 ~]# pcs cluster setup nfs-cluster nfs1.example.com nfs2.example.com
No addresses specified for host 'nfs1.example.com', using 'nfs1.example.com'
No addresses specified for host 'nfs2.example.com', using 'nfs2.example.com'
Destroying cluster on hosts: 'nfs1.example.com', 'nfs2.example.com'...
nfs1.example.com: Successfully destroyed cluster
nfs2.example.com: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'nfs1.example.com', 'nfs2.example.com'
nfs1.example.com: successful removal of the file 'pcsd settings'
nfs2.example.com: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'nfs1.example.com', 'nfs2.example.com'
nfs1.example.com: successful distribution of the file 'corosync authkey'
nfs1.example.com: successful distribution of the file 'pacemaker authkey'
nfs2.example.com: successful distribution of the file 'corosync authkey'
nfs2.example.com: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'nfs1.example.com', 'nfs2.example.com'
nfs1.example.com: successful distribution of the file 'corosync.conf'
nfs2.example.com: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Start the cluster service.
[root@nfs1 ~]# pcs cluster start --all
nfs1.example.com: Starting Cluster...
nfs2.example.com: Starting Cluster...
Enable auto-start of the cluster service.
[root@nfs1 ~]# pcs cluster enable --all
nfs1.example.com: Cluster Enabled
nfs2.example.com: Cluster Enabled
Check the current cluster status.
[root@nfs1 ~]# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: nfs1.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Thu Jul 9 12:45:48 2020
* Last change: Thu Jul 9 12:45:34 2020 by hacluster via crmd on nfs1.example.com
* 2 nodes configured
* 0 resource instances configured
Node List:
* Online: [ nfs1.example.com nfs2.example.com ]
PCSD Status:
nfs1.example.com: Online
nfs2.example.com: Online
[root@nfs1 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 nfs1.example.com (local)
2 1 nfs2.example.com
Temporarily disable STONITH.
pcs property set stonith-enabled=false
Step 3: Add DRBD and Filesystem resources to the NFS cluster
Save the cluster configuration (CIB) to a working file so the resources can be configured offline first.
pcs cluster cib nfs-cluster-config
Create and configure the DRBD resource in the working configuration file.
pcs -f nfs-cluster-config resource create NFS-DRBD ocf:linbit:drbd drbd_resource=nfsha op monitor interval=60s
pcs -f nfs-cluster-config resource promotable NFS-DRBD promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
Create and configure the Filesystem resource in the working configuration file.
pcs -f nfs-cluster-config resource create NFS-Data Filesystem device="/dev/drbd1" directory="/mnt/nfsdata" fstype="xfs" options="uquota,pquota"
pcs -f nfs-cluster-config constraint colocation add NFS-Data with NFS-DRBD-clone INFINITY with-rsc-role=Master
pcs -f nfs-cluster-config constraint order promote NFS-DRBD-clone then start NFS-Data
Verify the configuration in the configuration file.
[root@nfs1 ~]# pcs -f nfs-cluster-config resource status
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Stopped: [ nfs1.example.com nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Stopped
Push the configuration to the cluster CIB, then verify the resource status of the cluster.
[root@nfs1 ~]# pcs cluster cib-push nfs-cluster-config
CIB updated
[root@nfs1 ~]# pcs resource status
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs1.example.com ]
* Slaves: [ nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs1.example.com
Step 4: Add NFS Server resource to the NFS cluster
Install the NFS server packages on both nodes.
dnf install nfs-utils
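Since Pacemaker will manage the nfs-server unit itself, it should not also be enabled to start at boot. A hedged suggestion, assuming a default installation:
systemctl disable --now nfs-server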
Allow the NFS services through the firewall on both nodes.
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
Configure the NFS exports file on both nodes.
cat <<EOF > /etc/exports
/mnt/nfsdata/folder1 192.168.10.0/255.255.255.0(rw,sync,root_squash)
/mnt/nfsdata/folder2 192.168.10.0/255.255.255.0(rw,sync,root_squash)
EOF
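The exported directories must exist on the DRBD-backed filesystem. Create them once on the node where /mnt/nfsdata is currently mounted (nfs1 at this point); this step is implied but not shown above:
mkdir -p /mnt/nfsdata/folder1 /mnt/nfsdata/folder2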
Create and configure the NFS resource in the working configuration file.
pcs -f nfs-cluster-config resource create NFS-Server systemd:nfs-server op monitor interval="30s"
pcs -f nfs-cluster-config constraint colocation add NFS-Server with NFS-Data INFINITY
pcs -f nfs-cluster-config constraint order start NFS-Data then start NFS-Server
Verify the configuration in the configuration file.
[root@nfs1 ~]# pcs -f nfs-cluster-config resource status
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Stopped: [ nfs1.example.com nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Stopped
* NFS-Server (systemd:nfs-server): Stopped
Push the configuration to the cluster CIB, then verify the resource status of the cluster.
[root@nfs1 ~]# pcs cluster cib-push nfs-cluster-config
CIB updated
[root@nfs1 ~]# pcs resource status
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs1.example.com ]
* Slaves: [ nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs1.example.com
* NFS-Server (systemd:nfs-server): Started nfs1.example.com
Step 5: Add Virtual IP resource to the NFS cluster
Create a Virtual IP resource for the cluster.
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.10.10 cidr_netmask=32 op monitor interval=30s
pcs constraint colocation add ClusterIP with NFS-Server INFINITY
pcs constraint order start NFS-Server then start ClusterIP
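Once the ClusterIP resource starts, the service IP should appear on the active node's interface. A quick optional check:
ip addr show | grep 192.168.10.10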
Step 6: Verify the cluster
The NFS cluster is now ready for service; verify its status.
[root@nfs1 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs1.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Fri Jul 10 15:36:33 2020
* Last change: Fri Jul 10 15:36:22 2020 by root via cibadmin on nfs1.example.com
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ nfs1.example.com nfs2.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs1.example.com ]
* Slaves: [ nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs1.example.com
* NFS-Server (systemd:nfs-server): Started nfs1.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs1.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Try writing a test file to the filesystem.
echo "Written from node1" >> /mnt/nfsdata/folder1/testfile
Switch over the NFS service from node 1 to node 2.
pcs node standby nfs1.example.com
Verify the cluster is now active on node nfs2.example.com.
[root@nfs2 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs1.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Fri Jul 10 15:37:35 2020
* Last change: Fri Jul 10 15:37:26 2020 by root via cibadmin on nfs1.example.com
* 2 nodes configured
* 5 resource instances configured
Node List:
* Node nfs1.example.com: standby
* Online: [ nfs2.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs2.example.com ]
* Stopped: [ nfs1.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs2.example.com
* NFS-Server (systemd:nfs-server): Started nfs2.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs2.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Check and update the test file from node 2.
[root@nfs2 ~]# cat /mnt/nfsdata/folder1/testfile
Written from node1
[root@nfs2 ~]# echo "Written from node2" >> /mnt/nfsdata/folder1/testfile
[root@nfs2 ~]# cat /mnt/nfsdata/folder1/testfile
Written from node1
Written from node2
[root@nfs2 ~]#
Switch back to node 1.
pcs node unstandby nfs1.example.com
pcs node standby nfs2.example.com
pcs node unstandby nfs2.example.com
Verify the cluster is now active on node nfs1.example.com.
[root@nfs1 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs1.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Fri Jul 10 15:38:00 2020
* Last change: Fri Jul 10 15:37:49 2020 by root via cibadmin on nfs2.example.com
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ nfs1.example.com nfs2.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs1.example.com ]
* Slaves: [ nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs1.example.com
* NFS-Server (systemd:nfs-server): Started nfs1.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs1.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Check the test file.
[root@nfs1 ~]# cat /mnt/nfsdata/folder1/testfile
Written from node1
Written from node2
The verification of the cluster is successful; save the cluster configuration file.
pcs cluster cib nfs-cluster-config
E) Setup VMware Fencing
Step 1: Configure VMware Fencing in vCenter
Perform the below setup using vCenter’s web console:
1. Create a role for the user account that will perform VMware fencing.
- Role Name: Linux HA Fencing
- Permissions: System.Anonymous, System.View, VirtualMachine.Interact.PowerOff, VirtualMachine.Interact.PowerOn
2. Create the user account nfsfence to perform VMware fencing.
3. Add the user created above to the role Linux HA Fencing.
Step 2: Install VMware Fence Agent
On both NFS servers, install the VMware fence agent.
dnf install fence-agents-vmware-rest
Step 3: Add fencing to the NFS cluster
Verify that the NFS nodes can reach the vCenter server.
[root@nfs1 ~]# fence_vmware_rest -a vcenter.example.com -l 'nfsfence@vsphere.local' -p '<password>' --ssl-insecure -z -o status -n nfs1
Status: ON
[root@nfs1 ~]# fence_vmware_rest -a vcenter.example.com -l 'nfsfence@vsphere.local' -p '<password>' --ssl-insecure -z -o status -n nfs2
Status: ON
[root@nfs2 ~]# fence_vmware_rest -a vcenter.example.com -l 'nfsfence@vsphere.local' -p '<password>' --ssl-insecure -z -o status -n nfs1
Status: ON
[root@nfs2 ~]# fence_vmware_rest -a vcenter.example.com -l 'nfsfence@vsphere.local' -p '<password>' --ssl-insecure -z -o status -n nfs2
Status: ON
Create the STONITH fencing device.
pcs stonith create Fence-vCenter fence_vmware_rest pcmk_host_map="nfs1.example.com:nfs1;nfs2.example.com:nfs2" ipaddr=vcenter.example.com ssl=1 ssl_insecure=1 login='nfsfence@vsphere.local' passwd='<password>'
Check the status of the STONITH fencing device.
[root@nfs1 ~]# pcs stonith status
* Fence-vCenter (stonith:fence_vmware_rest): Started nfs1.example.com
Enable STONITH.
pcs property set stonith-enabled=true
Verify the cluster status.
[root@nfs1 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs2.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Tue Aug 11 14:22:31 2020
* Last change: Tue Aug 11 14:22:28 2020 by root via cibadmin on nfs2.example.com
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ nfs1.example.com nfs2.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs1.example.com ]
* Slaves: [ nfs2.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs1.example.com
* NFS-Server (systemd:nfs-server): Started nfs1.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs1.example.com
* Fence-vCenter (stonith:fence_vmware_rest): Started nfs2.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Save the configuration file for the cluster.
pcs cluster cib nfs-cluster-config-with-fencing
Step 4: Test fencing of node nfs1
From node nfs2, fence node nfs1.
[root@nfs2 ~]# pcs stonith fence nfs1.example.com
Node: nfs1.example.com fenced
Verify that all resources have been moved to node nfs2.
[root@nfs2 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs2.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Tue Aug 11 14:24:14 2020
* Last change: Tue Aug 11 14:22:28 2020 by root via cibadmin on nfs2.example.com
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ nfs2.example.com ]
* OFFLINE: [ nfs1.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs2.example.com ]
* Stopped: [ nfs1.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs2.example.com
* NFS-Server (systemd:nfs-server): Started nfs2.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs2.example.com
* Fence-vCenter (stonith:fence_vmware_rest): Started nfs2.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
After node nfs1 comes back online, the fence agent moves back to it automatically.
[root@nfs2 ~]# pcs status
Cluster name: nfs-cluster
Cluster Summary:
* Stack: corosync
* Current DC: nfs2.example.com (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
* Last updated: Tue Aug 11 14:28:25 2020
* Last change: Tue Aug 11 14:22:28 2020 by root via cibadmin on nfs2.example.com
* 2 nodes configured
* 6 resource instances configured
Node List:
* Online: [ nfs1.example.com nfs2.example.com ]
Full List of Resources:
* Clone Set: NFS-DRBD-clone [NFS-DRBD] (promotable):
* Masters: [ nfs2.example.com ]
* Slaves: [ nfs1.example.com ]
* NFS-Data (ocf::heartbeat:Filesystem): Started nfs2.example.com
* NFS-Server (systemd:nfs-server): Started nfs2.example.com
* ClusterIP (ocf::heartbeat:IPaddr2): Started nfs2.example.com
* Fence-vCenter (stonith:fence_vmware_rest): Started nfs1.example.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
F) Custom tuning of the NFS cluster
By default, the totem token timeout of a Pacemaker / Corosync cluster is 1000 ms. In some environments, this may cause unexpected cluster failovers due to transient network instability. I would suggest increasing this parameter to a longer value, say 10000 ms.
Open the file /etc/corosync/corosync.conf on the primary node, and add the parameter token: 10000 under the totem section.
totem {
version: 2
cluster_name: nfs-cluster
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
token: 10000
}
Sync the configuration file to all cluster nodes manually.
[root@nfs1 ~]# pcs cluster sync
nfs1.example.com: Succeeded
nfs2.example.com: Succeeded
Reload Corosync with the command below; the change takes effect on all nodes without downtime.
[root@nfs1 ~]# pcs cluster reload corosync
Corosync reloaded
Verify the result by checking the attribute runtime.config.totem.token.
[root@nfs1 ~]# corosync-cmapctl | grep totem.token
runtime.config.totem.token (u32) = 10000
runtime.config.totem.token_retransmit (u32) = 2380
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
runtime.config.totem.token_warning (u32) = 75
totem.token (u32) = 10000