ActiveMQ supports a number of high availability (HA) models for ensuring that a broker instance is always online and able to process message traffic. The two most common models involve sharing a filesystem over a network for the purpose of providing either LevelDB or KahaDB to the active and passive broker instances.
These failover models require that an OS-level lock can be obtained and maintained on a file in the LevelDB or KahaDB directories, simply called “lock.”
The first broker instance to obtain the file lock on the lock file becomes the active instance, or master, and the passive or slave instance periodically checks to see if it can lock the file. If it can, it assumes that the master has lost the lock, and it brings itself up into master mode.
Problems with this model
Although the solution is relatively elegant, there are some problems with this model that can lead to either a no-master situation, where the slave isn’t aware that it can lock the file, or even worse, a master-master configuration that results in index and/or journal corruption and ultimately message loss.
Most of these problems stem from things outside of ActiveMQ’s control. For instance, a poorly optimized NFS file store can cause locking data to become stale under load, leading to no-master downtime during failover. Sharing violations in CIFS/SMB network solutions can cause the same problem. SAN solutions that don’t provide accurate lock state to the OS’s VFS can result in master-master scenarios. The sheer variety of file system sharing solutions available makes it nearly impossible for the ActiveMQ community to develop a locking solution that will work under all conditions.
Enter pluggable storage lockers
Since the majority of the problems with this HA solution stem from inaccurate OS-level file locking, the ActiveMQ community introduced the concept of a pluggable storage locker in version 5.7 of the broker. This allows a user to hold the shared lock through a different mechanism, using a row-level JDBC database lock as opposed to an OS-level filesystem lock. The first solution was called a Database Locker, and involved a persistent connection from the broker to a database to maintain the lock. However, this solution proved ineffective under conditions where the master broker crashes or loses its connection to the database, so we won’t go into that solution here.
Instead, we’ll focus on the Lease Database Locker. Rather than requiring a persistent connection to the database, the master broker periodically leases the lock on the database row and renews that lease at a configurable interval. If the master doesn’t renew the lease, the lease expires and the slave is able to obtain the lock and become the new master. This approach tolerates bad network conditions, and it forcibly brings the master node down if it’s unable to renew the lease.
The solution is very easy to implement, and in our experience has proven to be a much more stable and reliable locking solution for the ActiveMQ shared filesystem HA model.
For starters, you’ll need to configure your database. The solution is compatible with any JDBC-compliant database, and we’ve tested it with Postgres, MySQL/MariaDB, Oracle, and Microsoft SQL Server. Next, create a new database user. For our example, we’ll have a user called “activemq” with a password of “activemq”. Then, create a new database called “activemq”. Grant locking and read/write permissions to the “activemq” user, and create a table called “activemq_lock” with the following schema:
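A minimal sketch of such a table, assuming the three-column layout (ID, TIME, BROKER_NAME) that ActiveMQ's JDBC lock statements expect by default. The column types shown are Postgres/MySQL-flavored assumptions and may need adjusting for Oracle or SQL Server, and the GRANT assumes the "activemq" user already exists:

```sql
-- Lock table sketch: one row (ID = 1) will carry the lease.
-- TIME is assumed to hold the lease expiry as epoch milliseconds,
-- BROKER_NAME the name of the broker currently holding the lease.
CREATE TABLE activemq_lock (
    id          BIGINT NOT NULL,
    time        BIGINT,
    broker_name VARCHAR(250),
    PRIMARY KEY (id)
);

-- The lease is taken and renewed with plain UPDATEs, so the broker's
-- user needs read and write access to this table.
GRANT SELECT, INSERT, UPDATE ON activemq_lock TO activemq;
```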
You’ll then need to insert a single row into that table. That row will be the one ActiveMQ attempts to lock, so it’s very important not to skip this step. You can insert a row with a statement like:
INSERT INTO activemq_lock(ID) VALUES (1);
Once you’ve set up your database, you’ll need to alter activemq.xml to use the Lease Database Locker, and create a Spring JDBC connection bean.
Persistence adapter configuration
You’ll need to alter your persistence adapter configuration in a way similar to the following:
<persistenceAdapter>
  <levelDB directory="/tmp/activemq-jdbc-locker-data" lockKeepAlivePeriod="5000">
    <locker>
      <lease-database-locker lockAcquireSleepInterval="10000" dataSource="#postgres-ds">
        <statements>
          <statements lockTableName="activemq_lock"/>
        </statements>
      </lease-database-locker>
    </locker>
  </levelDB>
</persistenceAdapter>
In this example, we’re extending a typical LevelDB persistence store configuration to use a custom locker, the lease-database-locker. We’re telling the broker to renew the row-level lease every five seconds, and for the slave instance to attempt to acquire the lock every 10 seconds. We’re also directing the locker to a Spring datasource called “postgres-ds.” Note that we’re also giving it a “directory” parameter where LevelDB is accessed.
This is because we are still using a network mount to share LevelDB itself; that part of the original model doesn’t change at all. We’re simply replacing the locking mechanism with a JDBC solution, and both brokers will still need to access the persistence store itself over a shared network filesystem.
JDBC connection configuration
The next step is to tell ActiveMQ how to connect to the database. You can do this by creating a standard JDBC connection spring bean, like so:
<bean id="postgres-ds" class="org.postgresql.ds.PGPoolingDataSource" destroy-method="close">
  <property name="serverName" value="localhost"/>
  <property name="databaseName" value="activemq"/>
  <!-- 0 tells the Postgres driver to use its default port -->
  <property name="portNumber" value="0"/>
  <property name="user" value="activemq"/>
  <property name="password" value="activemq"/>
  <property name="dataSourceName" value="postgres"/>
  <property name="initialConnections" value="1"/>
  <property name="maxConnections" value="10"/>
</bean>
In this case, we’re connecting to a Postgres database, and giving the bean an identifier that matches the identifier we configured in the lease-database-locker configuration above. Note that both of these configurations should be identical across both the master and slave configurations. You’ll also need to copy the JDBC driver .jar file into the /lib directory of ActiveMQ to give it access to the class you’re specifying in the bean definition.
Firing it up
That’s it for configuration. Fire up the brokers and watch what happens. You’ll notice some different verbiage in the active broker’s log, similar to:
INFO | amq-master, becoming master with lease expiry Mon Jun 27 15:27:01 EDT 2016 on dataSource: org.postgresql.ds.PGPoolingDataSource@600b90df
Meanwhile, the slave instance will periodically log:
INFO | amq-slave failed to acquire lease. Sleeping for 10000 milli(s) before trying again...
INFO | amq-slave Lease held by amq-master till Mon Jun 27 15:29:23 EDT 2016
As an added bonus, you can also determine which broker is currently in master mode, and monitor failover scenarios, with a simple query to the activemq_lock table.
The broker_name value of the lock row will correspond to the broker name of the current master instance.
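A query along these lines should do the job. It assumes the lock row was inserted with an ID of 1, as above, and that the table also carries a TIME column holding the lease expiry as epoch milliseconds (an assumption about the schema, not something shown in the logs above):

```sql
-- Show who currently holds the lease and when it expires.
SELECT id, broker_name, time
  FROM activemq_lock
 WHERE id = 1;
```

Running it before and after stopping the master is a quick way to confirm that the slave has taken over.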
A note on time
It is extremely important that the master and slave nodes have their time synchronized through an NTP-like solution. When the master broker comes up, it uses a system-generated timestamp to hold the lease. If the clocks on the master and slave drift apart (in our example, by more than five seconds), the slave instance will think the master hasn’t renewed its lease in a timely fashion and will attempt to come up as the master. This can lead to a dreaded master-master situation, and cause journal corruption and message loss.
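To make the drift problem concrete, here is an illustrative sketch of lease-style updates against the lock row. The statements are modeled on how a lease locker of this kind works rather than copied from the broker, and the timestamps, the 10-second lease length, and the 12-second drift are all made-up values. It assumes a lock table with ID, TIME (epoch milliseconds) and BROKER_NAME columns, and a single row with ID 1 whose TIME is still NULL:

```sql
-- 1. Master's clock reads 1000000000000. It takes the lease and stamps
--    an expiry 10 seconds ahead of its own clock.
UPDATE activemq_lock
   SET broker_name = 'amq-master', time = 1000000010000
 WHERE id = 1
   AND (time IS NULL OR time < 1000000000000);
-- One row updated: the master now holds the lease.

-- 2. Four seconds later, a slave with a synchronized clock
--    (1000000004000) tries to take the lease.
UPDATE activemq_lock
   SET broker_name = 'amq-slave', time = 1000000014000
 WHERE id = 1
   AND (time IS NULL OR time < 1000000004000);
-- No rows updated: the stored expiry is still in the slave's future.

-- 3. Same moment, but the slave's clock runs 12 seconds fast
--    (it reads 1000000016000).
UPDATE activemq_lock
   SET broker_name = 'amq-slave', time = 1000000026000
 WHERE id = 1
   AND (time IS NULL OR time < 1000000016000);
-- One row updated: the slave believes the lease expired and takes over
-- while the master is still running.
```

The expiry is written using one machine's clock and checked using another's, which is why the comparison is only meaningful when both clocks agree.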
There are many inherent flaws in the OS-level filesystem locking mechanism used in the traditional network shared filesystem HA model employed by ActiveMQ. None of them are the fault of ActiveMQ; they’re related to the network file share implementation itself. The Lease Database Locker solution provides a much more standard and compliant way of providing this locking, while being a relatively non-invasive and easy configuration change. We’re recommending that many of our customers switch to this model to ensure a stable and highly available messaging implementation with ActiveMQ.