July 8, 2016

ActiveMQ High Availability: Pluggable Storage Lockers

ActiveMQ high availability (HA) models ensure that a broker instance is always online and able to process message traffic. In this blog, learn how pluggable storage lockers help manage broker resource locking in ActiveMQ and get a tutorial on how to implement one method, the Lease Database locker.

EDITOR'S NOTE: In 2016, the ActiveMQ team deprecated the use of LevelDB as a persistence store for ActiveMQ. It is no longer recommended for use with ActiveMQ.

How ActiveMQ High Availability Works
ActiveMQ High Availability Challenges
How to Use Pluggable Storage Lockers For ActiveMQ High Availability
Final Thoughts

How ActiveMQ High Availability Works

The two most common ActiveMQ high availability models involve sharing a filesystem over a network. The purpose is to provide either LevelDB or KahaDB to the active and passive broker instances.

These high availability models require that an OS-level lock can be obtained and maintained on a file in the LevelDB or KahaDB directories, simply called “lock.”

The first broker instance to obtain the file lock on the lock file becomes the active (or leader) instance. The passive (or replica) instance periodically checks whether it can lock the file. If it can, it assumes that the leader has lost the lock, and it brings itself up into leader mode.

ActiveMQ High Availability Challenges

There are some problems with this ActiveMQ high availability model. They can lead either to a no-leader situation, where the replica isn’t aware that it can lock the file, or, even worse, to a leader-leader configuration that results in index and/or journal corruption and ultimately message loss.

Most of these problems stem from things outside of ActiveMQ’s control.

For instance, a poorly optimized NFS file store can cause locking data to become stale under load, leading to no-leader downtime during failover. Sharing violations in CIFS/SMB network solutions can cause the same problem. SAN solutions that don’t provide accurate lock state to the OS’s VFS can result in leader-leader scenarios.

The sheer variety of file system sharing solutions available makes it nearly impossible for the ActiveMQ community to develop a locking solution that will work under all conditions.

How to Use Pluggable Storage Lockers For ActiveMQ High Availability

Since the majority of the problems with this HA solution stem from inaccurate OS-level file locking, the ActiveMQ community introduced the concept of a pluggable storage locker in version 5.7 of the broker.

This allows a user to obtain the shared lock by a different means, using a row-level JDBC database lock instead of an OS-level filesystem lock.

Here, we’ll focus on the Lease Database Locker solution.

Instead of requiring a persistent connection to the database, the leader broker leases the lock on the database row and renews that lease at a configurable interval. If the leader doesn’t renew the lease, the lease expires and the replica is able to obtain the lock and become the new leader. This approach tolerates poor network conditions, and it forcibly brings the leader node down if it’s unable to renew the lease.
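The lease cycle described above can be sketched in a few lines of Python. This is a simplified model for illustration, not ActiveMQ’s implementation: the LeaseRow class, the try_acquire and renew functions, and the broker names are all hypothetical, and the "row" is an in-memory object rather than a database record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseRow:
    """Hypothetical stand-in for the single lock row in the database."""
    broker_name: Optional[str] = None
    expiry_ms: Optional[int] = None  # time at which the current lease lapses


def try_acquire(row: LeaseRow, broker: str, now_ms: int, lease_ms: int) -> bool:
    """Take the lease only if nobody holds it or the current lease has expired."""
    if row.expiry_ms is None or row.expiry_ms < now_ms:
        row.broker_name = broker
        row.expiry_ms = now_ms + lease_ms
        return True
    return False


def renew(row: LeaseRow, broker: str, now_ms: int, lease_ms: int) -> bool:
    """Extend the lease, but only if this broker is still the recorded holder."""
    if row.broker_name == broker:
        row.expiry_ms = now_ms + lease_ms
        return True
    return False


row = LeaseRow()
LEASE_MS = 10_000  # assumed lease length for this illustration

got_a = try_acquire(row, "broker-a", now_ms=0, lease_ms=LEASE_MS)      # row was free
got_b = try_acquire(row, "broker-b", now_ms=4_000, lease_ms=LEASE_MS)  # lease still valid
renewed = renew(row, "broker-a", now_ms=5_000, lease_ms=LEASE_MS)      # lease now runs to 15_000

# broker-a stops renewing, so the lease lapses and broker-b can take over.
took_over = try_acquire(row, "broker-b", now_ms=15_001, lease_ms=LEASE_MS)
# A late renewal from broker-a fails, which is its cue to step down.
stale_renew = renew(row, "broker-a", now_ms=16_000, lease_ms=LEASE_MS)

print(got_a, got_b, renewed, took_over, stale_renew)
```

The point of the model is the last two calls: once the old leader misses its renewal window, the replica wins the row, and the old leader’s next renewal attempt is rejected.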

The solution is very easy to implement, and in our experience has proven to be a much more stable and reliable locking solution for the ActiveMQ shared filesystem HA model.

1. Configure Your Database

For starters, you’ll need to configure your database. The solution is compatible with any JDBC-compliant database. We’ve tested it with Postgres, MySQL/MariaDB, Oracle, and Microsoft SQL Server.

Next, create a new database user. For our example, we’ll have a user called “activemq” with a password of “activemq”. Then, create a new database called “activemq”. Grant locking and read/write permissions to the “activemq” user, and create a table called “activemq_lock” with the following schema:
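As a minimal sketch, assuming the table needs only the ID, TIME, and BROKER_NAME columns that ActiveMQ’s JDBC lockers conventionally use, the layout could look like the statement below. The column types are assumptions and will vary by database, and SQLite’s in-memory database stands in for the real server so the statement can be tried locally.

```python
import sqlite3

# Assumed layout: ID identifies the lock row, TIME holds a millisecond
# timestamp, and BROKER_NAME records the current lease holder.
CREATE_LOCK_TABLE = """
CREATE TABLE activemq_lock (
    ID BIGINT NOT NULL,
    TIME BIGINT,
    BROKER_NAME VARCHAR(250),
    PRIMARY KEY (ID)
)
"""

conn = sqlite3.connect(":memory:")  # stand-in for the real "activemq" database
conn.execute(CREATE_LOCK_TABLE)

# List the columns the table ended up with.
columns = [info[1] for info in conn.execute("PRAGMA table_info(activemq_lock)")]
print(columns)
```

On a real server you would run the CREATE TABLE statement directly as the “activemq” user, adjusting the types to whatever your database calls a 64-bit integer and a short string.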

You’ll then need to insert a single row into that table. That row will be the one ActiveMQ attempts to lock, so it’s very important not to skip this step. You can insert a row with a statement like:

INSERT INTO activemq_lock(ID) VALUES (1);
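To see why the seed row matters, here is a small sketch. It assumes the lease is taken with an UPDATE against row 1; the statement below is a simplified, hypothetical stand-in, not ActiveMQ’s actual SQL, and an in-memory SQLite database is used in place of the real one. Against an empty table the UPDATE touches no rows, so no broker could ever win the lease.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE activemq_lock "
    "(ID BIGINT PRIMARY KEY, TIME BIGINT, BROKER_NAME VARCHAR(250))"
)

# Hypothetical lease grab: claim row 1 if it is unclaimed or its lease has expired.
GRAB = (
    "UPDATE activemq_lock SET BROKER_NAME = ?, TIME = ? "
    "WHERE ID = 1 AND (TIME IS NULL OR TIME < ?)"
)


def grab(broker, now_ms, lease_ms=10_000):
    """Return True if exactly one row was updated, i.e. the lease was taken."""
    cursor = conn.execute(GRAB, (broker, now_ms + lease_ms, now_ms))
    return cursor.rowcount == 1


before_seed = grab("broker-a", now_ms=0)  # table is empty, nothing to update
conn.execute("INSERT INTO activemq_lock(ID) VALUES (1)")
after_seed = grab("broker-a", now_ms=0)   # the seed row can now be claimed

print(before_seed, after_seed)
```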

Once you’ve set up your database, you’ll need to alter activemq.xml to use the Lease Database Locker, and create a Spring JDBC connection bean.

2. Alter Your Persistence Adapter Configuration

You’ll need to alter your persistence adapter configuration in a way similar to the following:

<persistenceAdapter>
    <levelDB directory="/tmp/activemq-jdbc-locker-data" lockKeepAlivePeriod="5000">
        <locker>
            <lease-database-locker lockAcquireSleepInterval="10000" dataSource="#postgres-ds">
                <statements>
                    <statements lockTableName="activemq_lock"/>
                </statements>
            </lease-database-locker>
        </locker>
    </levelDB>
</persistenceAdapter>

In this example, we’re extending a typical LevelDB persistence store configuration to use a custom locker, the lease-database-locker. We’re telling the broker to renew the row-level lease every five seconds, and for the replica instance to attempt to acquire the lock every 10 seconds. We’re also directing the locker to a Spring datasource called “postgres-ds.” Note that we’re also giving it a “directory” parameter where LevelDB is accessed.

This is because we are still using a network mount to share LevelDB itself; that part of the original model doesn’t change at all. We’re simply replacing the locking mechanism with a JDBC solution, but both brokers will still need to access the persistence store itself over a shared network filesystem.
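Because the two timing values live on different elements, it is easy to change one and forget the other. The sketch below reads both attributes from the persistence adapter fragment and checks that the leader renews more often than the replica retries. It parses the fragment on its own, without the XML namespace a full activemq.xml would carry, and the check itself is our assumption about a sensible relationship between the two values.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<persistenceAdapter>
    <levelDB directory="/tmp/activemq-jdbc-locker-data" lockKeepAlivePeriod="5000">
        <locker>
            <lease-database-locker lockAcquireSleepInterval="10000" dataSource="#postgres-ds">
                <statements>
                    <statements lockTableName="activemq_lock"/>
                </statements>
            </lease-database-locker>
        </locker>
    </levelDB>
</persistenceAdapter>
"""

root = ET.fromstring(CONFIG)
store = root.find("levelDB")
locker = store.find("locker/lease-database-locker")

keep_alive_ms = int(store.get("lockKeepAlivePeriod"))            # leader renewal period
acquire_sleep_ms = int(locker.get("lockAcquireSleepInterval"))   # replica retry period

# Assumed rule of thumb: the renewal period should be the shorter of the two.
renews_before_retry = keep_alive_ms < acquire_sleep_ms
print(keep_alive_ms, acquire_sleep_ms, renews_before_retry)
```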

3. Create a JDBC Connection Configuration

The next step is to tell ActiveMQ how to connect to the database. You can do this by creating a standard JDBC connection Spring bean, like so:

<bean id="postgres-ds" class="org.postgresql.ds.PGPoolingDataSource" destroy-method="close">
    <property name="serverName" value="localhost"/>
    <property name="databaseName" value="activemq"/>
    <property name="portNumber" value="0"/>
    <property name="user" value="activemq"/>
    <property name="password" value="activemq"/>
    <property name="dataSourceName" value="postgres"/>
    <property name="initialConnections" value="1"/>
    <property name="maxConnections" value="10"/>
</bean>

In this case, we’re connecting to a Postgres database and giving the bean an identifier that matches the one we configured in the lease-database-locker configuration above. Note that both of these configurations should be identical on the leader and the replica. You’ll also need to copy the JDBC driver .jar file into the /lib directory of ActiveMQ so that the broker can load the class you’re specifying in the configuration. The driver will differ depending on the type of database that you’re using.

4. Fire It Up

That’s it for configuration. Fire up the brokers and watch what happens. You’ll notice some different verbiage in the active broker’s log, similar to:

INFO | amq-master, becoming master with lease expiry Mon Jun 27 15:27:01 EDT 2016 on dataSource: org.postgresql.ds.PGPoolingDataSource@600b90df

Meanwhile, the replica instance will periodically log:

INFO | amq-slave failed to acquire lease. Sleeping for 10000 milli(s) before trying again...
INFO | amq-slave Lease held by amq-master till Mon Jun 27 15:29:23 EDT 2016

As an added bonus, you can also determine which broker is currently in leader mode, and monitor failover scenarios, with a simple query to the activemq_lock table:

SELECT * FROM activemq_lock;

The broker_name value of the lock row will correspond to the broker name of the current leader instance.
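For monitoring, the same lookup can be wrapped in a small helper. The sketch below is hypothetical: current_leader is our own function, the table layout and the sample row are assumptions, and it treats the TIME column as the lease expiry in epoch milliseconds. An in-memory SQLite database stands in for the real one.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE activemq_lock "
    "(ID BIGINT PRIMARY KEY, TIME BIGINT, BROKER_NAME VARCHAR(250))"
)
# Hypothetical row as it might look while amq-master holds the lease.
conn.execute("INSERT INTO activemq_lock VALUES (1, 1467055621000, 'amq-master')")


def current_leader(connection):
    """Return (broker_name, lease_expiry_utc), or None if the row is unclaimed."""
    row = connection.execute(
        "SELECT BROKER_NAME, TIME FROM activemq_lock WHERE ID = 1"
    ).fetchone()
    if row is None or row[0] is None or row[1] is None:
        return None
    name, time_ms = row
    return name, datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc)


leader = current_leader(conn)
print(leader)
```

Polling a helper like this from a monitoring job gives a cheap way to alert on failover: the broker name changes when the replica takes over.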

A note on time

It is extremely important that the leader and replica nodes have their time synchronized through an NTP-like solution. When the leader broker comes up, it’ll use a system-generated time stamp to hold the lease. If the clocks of the leader and replica drift apart (in our example, by more than five seconds), the replica instance will think the leader hasn’t renewed its lease in a timely fashion and attempt to come up as the leader. This can lead to a dreaded leader-leader situation, and cause journal corruption and message loss.
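The five-second figure follows from simple arithmetic, sketched below. The sketch assumes the lease length equals the 10-second lockAcquireSleepInterval from our example and that the lease is renewed every 5 seconds, so at worst only 5 seconds of lease remain just before a renewal. The function name and the drift values are hypothetical.

```python
def replica_sees_expired(lease_expiry_ms: int, leader_now_ms: int, drift_ms: int) -> bool:
    """True if a replica whose clock runs drift_ms ahead of the leader's
    would consider the lease already expired."""
    replica_now_ms = leader_now_ms + drift_ms
    return lease_expiry_ms < replica_now_ms


LEASE_MS = 10_000      # assumed lease length (lockAcquireSleepInterval in the example)
KEEP_ALIVE_MS = 5_000  # renewal period (lockKeepAlivePeriod in the example)

# Worst case: a lease written at t=0 is checked just before its first renewal,
# when only LEASE_MS - KEEP_ALIVE_MS of it remains.
expiry_ms = 0 + LEASE_MS
just_before_renewal_ms = KEEP_ALIVE_MS
tolerated_drift_ms = LEASE_MS - KEEP_ALIVE_MS

small_drift = replica_sees_expired(expiry_ms, just_before_renewal_ms, drift_ms=2_000)
large_drift = replica_sees_expired(expiry_ms, just_before_renewal_ms, drift_ms=6_000)

print(tolerated_drift_ms, small_drift, large_drift)
```

With two seconds of drift the replica still sees a valid lease; with six seconds it sees an expired one and tries to take over while the leader is still healthy.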

Final Thoughts

There are many inherent flaws in the OS-level filesystem locking mechanism used in ActiveMQ high availability. None of them are the fault of ActiveMQ; they’re related to the network file share implementation itself.

The Lease Database Locker offers a more standard and compliant way to provide this locking, while remaining a relatively non-invasive and easy configuration change. We’re recommending that many of our customers switch to this model to ensure a stable and highly available messaging implementation with ActiveMQ.

Of course, ActiveMQ high availability can be tricky to get right. That's why you may consider enlisting the help of open source experts. OpenLogic's open source architects can help you set up ActiveMQ high availability — and get more out of ActiveMQ.
