Tuesday, September 15, 2015

Active/Passive Clustering of MySQL HOWTO [feedly]

Active/Passive Clustering of MySQL HOWTO
http://www.itchyninja.com/activepassive-clustering-of-mysql-howto/

-- via my feedly.com reader

Recently, we experimented with active/passive clustering of MySQL via Corosync and Pacemaker.  While many choose to use DRBD, our requirement was to simply have storage fail over to the other node.  In the real world, this would likely be a SAN.  For our POC, we chose to use NFS.

failover

This document will provide a high level overview of a redundant and highly-available clustering solution for MySQL in an active/passive clustering configuration. While there are many ways to make MySQL highly available each has its strengths and weaknesses. These will be explained herein.

Components of the Solution
This solution requires only freely available Open Source components. Each is depicted in the diagram below:

 Two server nodes (virtual servers or physical servers)
 Virtual IP (VIP)
 Shared Storage such as a SAN with filesystem which can be mounted on either node
 Open Source software (Pacemaker, Corosync, Heartbeat, and MySQL or MariaDB)

What It Is
With an Active/Passive configuration, only two nodes (physical or virtual servers) are needed. A Virtual IP (VIP) is floated between the two nodes and only one node is active at any given time. The application will connect to the VIP and is completely agnostic of which server is active and which is passive.

A Cluster Resource Manager (CRM) called Pacemaker monitors the two nodes on a continuous basis via a heartbeat. Failure detection as well as the failover process is automatic. If a failure of a node is detected, it will failover the VIP and the shared filesystem resource to the passive node and make it the active node. This will be virtually transparent to the application. Poll intervals and required number of sequential failures are completely configurable such that the required downtime to perform a failover can be minimized.

This solution provides the benefits in speed of a standalone MySQL database server without the overhead of replication and the possibility of data corruption and/or inconsistency. There is no need to wait for commits on the passive node since only one node is active at a time. The passive node is running basic Operating System processes only. MySQL will not be running and the storage will not be mounted.

When failover is required, Pacemaker will shutdown MySQL (if running) on the active server and transfer the SAN filesystem over to the passive node, start MySQL, and transfer the VIP. In the event the failed node recovers, the CRM will acknowledge its recovery and allow failover back to it in the event the newly active node fails. This is designed to allow the DBA Team time to repair a failed node and keep the system operational as well as provide a means to perform upgrades and other maintenance on the system.

This option allows you to run any version of MySQL, Percona MySQL, MariaDB, or whatever. There is no dependency on any specific version and no limitations of features such as Foreign Keys, storage engines, etc. If it can run on a standalone MySQL server, it can be clustered with this solution.

The combination of Pacemaker, Heartbeat, and Corosync is essentially the core software in Redhat's Clustering Suite. We are just implementing the components we need for this specific application instead of using Redhat's suite of tools.

There are no associated licensing costs for this clustering technology as all software components are Open Source and thus free for use, including commercial use.

What It Is Not
While this solution offers many benefits, it is not designed to be a one size fits all approach. Nothing is.

This is not a solution to scale database load. Since only one node can be active at a time, it will not help if you need to scale reads or writes.

It is not a fast failover where five nines of uptime are required. It can take time to stop MySQL on the failed node and start it on the passive node. This may involve anywhere from a few seconds to a few minutes depending on system load at the time of the failure. At the same time, it is much faster than a manual process could ever be and with the great stability of a standalone MySQL server may offer from three to five nines of uptime in a given year.

For many applications this is more than enough, while others may demand a more traditional clustering option for MySQL such as Percona XTRADB Cluster (PXC) or MySQL Cluster.

This is not a replication solution. Replication offers many benefits such as duplicating data on multiple drives and allowing faster failover, but at the risk of possibly having data inconsistency. Cluster options such as PXC are very good at guaranteeing consistency while other Master/Master options implement no data collision enforcement.

If you can afford minimal downtime during failovers and/or need features not compatible with traditional clustering options for MySQL, this solution may be a great fit for you. This option offers very good uptime for MySQL beyond what is possible with a standalone server. With no licensing costs or software to purchase, it is a very affordable solution as it is hard to beat…FREE!

Below is the process we implemented as a proof of concept.

Pacemaker Official Documentation

Official documentation can be found for Pacemaker here:

http://clusterlabs.org/doc/

Since we are running version 1.1 of Pacemaker with Corosync 2.x, you can view a single HTML page view of the documentation here:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html

What is Corosync?

The below excerpt is from the official documentation.

Pacemaker is a cluster resource manager.

It achieves maximum availability for your cluster services (aka. resources) by detecting and recovering from node- and resource-level failures by making use of the messaging and membership capabilities provided by your preferred cluster infrastructure (either Corosync or Heartbeat).

Pacemaker's key features include:

  • Detection and recovery of node and service-level failures
  • Storage agnostic, no requirement for shared storage
  • Resource agnostic, anything that can be scripted can be clustered
  • Supports fencing (aka. STONITH) for ensuring data integrity
  • Supports large and small clusters
  • Supports both quorate and resource-driven clusters
  • Supports practically any redundancy configuration
  • Automatically replicated configuration that can be updated from any node
  • Ability to specify cluster-wide service ordering, colocation and anti-colocation
  • Support for advanced service types
    • Clones: for services which need to be active on multiple nodes
    • Multi-state: for services with multiple modes (eg. master/slave, primary/secondary)
  • Unified, scriptable, cluster management tools.

Pacemaker Architecture

Again, from the official documentation:

At the highest level, the cluster is made up of three pieces:

  • Non-cluster-aware components. These pieces include the resources themselves; scripts that start, stop and monitor them; and a local daemon that masks the differences between the different standards these scripts implement.
  • Resource management. Pacemaker provides the brain that processes and reacts to events regarding the cluster.  These events include nodes joining or leaving the cluster; resource events caused by failures, maintenance and scheduled activities; and other administrative actions.  Pacemaker will compute the ideal state of the cluster and plot a path to achieve it after any of these events.  This may include moving resources, stopping nodes and even forcing them offline with remote power switches.
  • Low-level infrastructure. Projects like Corosync, CMAN and Heartbeat provide reliable messaging, membership and quorum information about the cluster.

Node Names & Addresses

For purposes of this guide, the following IP addresses are used:

node1 192.168.2.10
node2 192.168.2.11
NFS Server 192.168.2.12
VIP 192.168.2.9

Cluster Information

Below is general information about the cluster configuration for your reference to make using the guide a little easier.  We will need to develop naming conventions for these.

Cluster Name

Again, this should be a more meaningful name, but we will keep it simple in this guide:

Cluster Name pacemaker1

 Resources

Resources are also known as services (aka processes running on the server).  Our configuration will utilize three resources:

Resource Name Explanation
MySQL The MySQL database service.
nfs-mysql The filesystem resource which manages moving mounted filesystems.  In my test case, this will be via NFS.
ip-192.168.2.9 The VIP resource.

Resource Group

There will only be one resource group, which will be called "mysql-group".   There could be other resource groups, but at this point I cannot think of any others which would be needed for our purposes.

Resource Group Name Explanation
mysql-group This group contains the MySQL, filesystem, and VIP fesources.

 

Installation

Complete the following steps on both nodes:

Disable the built-in CentOS Firewall

In our sandbox, it was too much of a pain to fight with the firewall, so we disabled it entirely.

systemctl disable firewalldservice firewalld stop

Or, you may add an allowance instead:

firewall-cmd –permanent –add-service=high-availabilityfirewall-cmd –add-service=high-availability

Install the Required Software Packages

We only need three packages: pacemaker, pcs, and the resource-agents.

yum install pacemaker pcs resource-agents

Create a SSH User for Cluster Management

The below command sets a "hacluster" linux account with a password of "secret":

echo secret | passwd –stdin hacluster

Start the Pacemaker Daemon

This daemon controls it all.  We tell it to manage all configured resources and resource groups with the below command:

pcs cluster start –all

To ensure that the pcsd daemon starts automatically:

systemctl enable pcsd.serviceservice pcsd start

Setup Host Names

To make configuration easier, I setup hostnames in the /etc/hosts file.  We will likely use actual hostnames for this in our setup.  If you choose to do it in a sandbox, you can do so like this:

nano /etc/hosts

Add the following hostnames with your internal IP addresses:

192.168.2.10 node1192.168.2.11 node2

 

Configuration

Configuration of PCS can be done on node1.  It will be synced to the other node live as changes are made.

You really should not have to edit the configuration files directly.  The pcs commands given herein will do so for you.  The config file for corosync is stored at /etc/corosync/corosync.cnf.  Below is an example of what my setup created:

totem {version: 2secauth: offcluster_name: pacemaker1transport: udpu}nodelist {node {ring0_addr: node1nodeid: 1

}

node {

ring0_addr: node2

nodeid: 2

}

}

 

quorum {

provider: corosync_votequorum

two_node: 1

}

 

logging {

to_syslog: yes

}

Again, you should not even have to touch the above file.  There is no need to create it or edit it.  The pcs commands should generate it for you.

 

Initial Configuration

Authorize the Two Nodes in the Cluster

pcs cluster auth node1 node2 -u hacluster -p secret –force

Setup the Cluster and Name It (Execute on Both Nodes)

pcs cluster setup –force –name pacemaker1 node1 node2

Start the Cluster and All of Its Resources

pcs cluster start –all

Disable Shoot-The-Node-In-The-Head

pcs property set stonith-enabled=false

Tell Cluster to Not Worry About Quorum

pcs property set no-quorum-policy=ignore

Define How the Cluster Migrates Resources

pcs resource defaults migration-threshold=1

Tell Cluster Not to Move Resources Back to Recovered Node Immediately

pcs resource defaults resource-stickiness=100

 

Assign a VIP to the Cluster by Creating a VIP Resource

pcs resource create ip-192.168.2.9 IPaddr2 ip=192.168.2.9

Create a Storage Resource

Define the storage fileserver's IP and mount point name as well as where we want to mount locally.  In this case my NFS server is (192.168.2.12).  It exports "/share" which is mounted locally on the active node in "/u01/mysql/data."

pcs resource create nfs-mysql Filesystem device=192.168.2.12:/share directory=/u01/mysql/data fstype=nfs

Create a MySQL Resource

In creating the MySQL resource, we need to tell PCS where the datadir will be and what user to run MySQL as.  The timeout defines how long to wait on start/stop of MySQL.  In this example, the value is too small at 30 seconds.  This likely needs to be much larger such as 5-10 minutes or so.

The interval tells PCS how often to poll for a problem.  In this example, we are polling every 20 seconds.  Perhaps we want this interval smaller.  Not sure.

pcs resource create MySQL ocf:heartbeat:mysql params datadir=/u01/mysql/data user=mysql op monitor timeout="30″ interval="20″

Keep It All Together

We want to be sure when MySQL fails, the storage and VIP are kept together on the same node.  The order that the resources are listed becomea sort of constraint which specifies what order the resources should be started on the new node in a failover.  Constraints can be created in addition, but for our simple setup the below will work as long as you keep the three resources in the order specified below.

pcs resource group add mysql-group nfs-mysql MySQL ip-192.168.2.9

Syncing the Configuration

Typically syncing configuration between nodes is automatic.  If for some reason, a node is wiped and you want to copy the configuration from a good working node (such as node1), you can use the following command:

pcs cluster sync node1

Cluster Management Commands

This section will detail the commands helpful with managing the cluster and its resources.

Check Cluster Status

This command is very useful to see overall status of the cluster and its resources.  You can see which nodes are active as well as on which node the services/resources are running.  I generally run the below command which tells me everything at once.  There will be additional commands below to check only portions of the overall configuration such as the cluster itself and/or its resources.

pcs status

If you only care about cluster status, it can be done with this command:

pcs cluster status

You can get details on just resources with the following command:

pcs resource show

Restarting a Failed Service

There are a few commands which are helpful to issue when a resource fails to start.  Assuming you have fixed the issue with the resource, you can tell the cluster to restart the service, but this involves a few steps as outlined below.  In our example, we will assume it is MySQL which has failed and is now ready to be restarted.

First, we need to reset the failure count for the MySQL service.

pcs resource failcount reset MySQL

Clear out the failure actions from the "pcs status" command:

pcs resource failcount reset MySQL

Update the resource information:

pcs resource update MySQL

Finally, restart the resource:

pcs resource restart MySQL

Viewing Pacemaker Errors

Sometimes, you will see a message in the terminal telling you Pacemaker had an issue.  You can see a list of aborted issues with the following command:

abrt-cli list

Clearing Pacemaker Errors

If you run the "abrt-cli list" command and see output, you can inspect the directory specified for any debugging information.  When you are done with it, you may clear the error with a command such as the following (replace dirname with the directory specified in the above list command.

abrt-cli remove dirname

 

Failing Over Manually

I am certain there are multiple ways to do this.  Below is a simple method.  If everything is running on node1, you can cause a failure to node2 by stopping the cluster on node1.  The following command issues on node1 will do this:

pcs cluster stop node1 –all

Once it is failed over to node2, you can use the following command to restart the cluster on node1:

pcs cluster start node1 –all

As we gain understanding of PCS, we will likely develop a better procedure than the above, but this works for now.

Restarting Corosync

Although I am somewhat uncertain when this is necessary, the below command restarts the Corosync service:

pcs cluster reload corosync

 

Glossary

Term Meaning
Corosync A messaging protocol to allow the proper communication between cluster nodes.
CRM Cluster Resource Manager.  In our example, we will use Pacemaker.  This tool manages corosync  and heartbeat to monitor a node, its cluster, and resources for failure and manages moving the resources to the new node.
DC Designated Controller.  This is the master decision maker for the cluster.
Heartbeat This component watches nodes for failure and reports back to the CRM.
Pacemaker Pacemaker is a CRM.
PCS Pacemaker Configuration Shell.  The pcs tool will be utilized to configure and manage the cluster nodes and their resources.
Resource Agents A set of scripts which manage a service such as starting and stopping MySQL.  There is an abundance of these, so there is little to no need to modify these manually.  There is a template available in the event a custom one is ever needed.
 

PlanetMySQL Voting: Vote UP / Vote DOWN