ActiveMQ ships with two destination policies that let you forcibly terminate the connection of a slow consumer: AbortSlowConsumerStrategy and AbortSlowAckConsumerStrategy. These policies can be a solid remedy when a slow consumer is delaying other consumers or when backlogged messages are gumming up the broker.
The mechanisms behind these two strategies are often misunderstood (and not well documented), but you need to understand how each one detects a slow consumer to decide which is right for your messaging system. Spoiler alert: it’s probably AbortSlowAckConsumerStrategy. Here’s why:
AbortSlowConsumerStrategy looks at a consumer’s prefetch buffer to decide whether the consumer is slow and whether to terminate its client connection. The check is blind to the actual contents of the prefetch buffer, however: it looks only at how many messages the buffer holds. It runs periodically, every 30 seconds by default.
It follows that if you’ve set your prefetch to 1, which is the best setting in most cases involving multiple consumers handling persistent messages, this check may not reflect the true state of the consumer. If your messaging system is constantly pushing messages into prefetch (the norm for most moderately utilized systems), then each time the check runs, there is likely to be one message sitting in prefetch. With a prefetch of 1, that single message means the buffer is full, and because AbortSlowConsumerStrategy looks only at the number of messages in the buffer, it treats the consumer as slow. The result is a false positive: the consumer is aborted even though it is keeping up. A higher prefetch, including the default of 1000, doesn’t eliminate the problem; under high load, this periodic sampling can still produce false positives.
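To reproduce the prefetch-of-1 scenario, one option is a broker-side prefetch limit on the destination policy. This is a minimal sketch: the queuePrefetch attribute sets the broker’s default for matching queues, and a client that sets its own prefetch (for example with the jms.prefetchPolicy.queuePrefetch=1 connection URI option) may take precedence over it.

```xml
<!-- Broker-side default prefetch of 1 for every queue (">" matches all queues) -->
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">" queuePrefetch="1" />
    </policyEntries>
  </policyMap>
</destinationPolicy>
```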
Try it for yourself
Go ahead, try it! Enable the Logging Interceptor and set logAll to true. Then enable the AbortSlowConsumerStrategy as a destination policy, like so:
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">">
        <slowConsumerStrategy>
          <abortSlowConsumerStrategy />
        </slowConsumerStrategy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
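The Logging Interceptor step can be sketched as a plugins entry in the same broker configuration. This assumes a standard activemq.xml, where the plugins element sits directly inside the broker element alongside destinationPolicy.

```xml
<!-- Inside the <broker> element of activemq.xml -->
<plugins>
  <!-- logAll="true" turns on logging for all events the interceptor supports -->
  <loggingBrokerPlugin logAll="true" />
</plugins>
```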
Start a high-frequency load test and watch the log for slow-consumer eviction events. You’ll notice that every 30 seconds, consumer connections are marked as slow and terminated. As a bonus, lower the policy’s checkPeriod parameter to an obscenely low value, like 5 ms. At that short sampling interval, the checks become far more accurate.
This inconsistency has been reported to the community as JIRA AMQ-6231. Rather than providing a fix, the community has suggested a workaround: AbortSlowAckConsumerStrategy. Note that AbortSlowAckConsumerStrategy was introduced in ActiveMQ 5.9. If you’re stuck on an older version, the checkPeriod workaround above may be your only option, but expect higher CPU utilization, since the broker must check the prefetch buffer of every connected consumer very frequently. Of course, if you’re on a pre-5.9 release, the best thing to do is upgrade!
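For the bonus experiment, checkPeriod is set as an attribute on the strategy element. This sketch shows only the policyEntry, which replaces the one in the destinationPolicy block shown earlier; the 5 ms value is meant for experimentation, not production.

```xml
<policyEntry queue=">">
  <slowConsumerStrategy>
    <!-- checkPeriod is in milliseconds; 5 ms is for experimentation only -->
    <abortSlowConsumerStrategy checkPeriod="5" />
  </slowConsumerStrategy>
</policyEntry>
```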
AbortSlowAckConsumerStrategy does what it sounds like. Rather than relying on a relatively inaccurate check of prefetch buffer occupancy, it measures how long a consumer takes to acknowledge a message back to the broker. In most cases, this check is much more accurate. You enable it in much the same way as AbortSlowConsumerStrategy:
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">">
        <slowConsumerStrategy>
          <abortSlowAckConsumerStrategy />
        </slowConsumerStrategy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
You can also set a number of options to further tune the behavior of this strategy:
• maxTimeSinceLastAck – Defaults to 30000 (ms); the amount of time that can pass since the consumer’s last message ack before the consumer is considered slow
• ignoreIdleConsumers – Defaults to true; specifies whether consumers that haven’t had any messages dispatched to them are ignored during the slow check
• maxSlowCount – Defaults to -1 (unlimited); the number of times a consumer can be considered slow before it is aborted
• maxSlowDuration – Defaults to 30000 (ms); the maximum amount of time a consumer can be considered slow before it is aborted
• checkPeriod – Defaults to 30000 (ms); the interval at which consumers are checked for slowness
• abortConnection – Defaults to false; whether the broker politely asks the client to close the slow consumer (false) or simply aborts its TCP connection (true)
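These options are set as attributes on the abortSlowAckConsumerStrategy element. The values below are hypothetical, chosen only to show the syntax: they flag a consumer as slow after 60 seconds without an ack, skip idle consumers, check every 10 seconds, and abort a consumer that remains slow for two minutes. Tune them to your own workload.

```xml
<policyEntry queue=">">
  <slowConsumerStrategy>
    <!-- Illustrative values only; all times are in milliseconds -->
    <abortSlowAckConsumerStrategy
        maxTimeSinceLastAck="60000"
        ignoreIdleConsumers="true"
        maxSlowDuration="120000"
        checkPeriod="10000" />
  </slowConsumerStrategy>
</policyEntry>
```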
A few of these are especially important to understand. First, ignoreIdleConsumers makes consumers that simply haven’t had any messages dispatched to them in a while immune to the eviction check. For systems that sit idle for long periods, this is exactly the behavior you want.
The abortConnection parameter deserves attention as well. By default, the broker sends the client a request to close the slow consumer, but if the consumer process is genuinely hung, how could it respond to that request? If your consumers are prone to hanging or deadlocking, setting this parameter to true makes the broker terminate the connection aggressively instead.
Aborting a slow consumer is often the best way to handle one, so these destination policies can be very helpful in ensuring optimal throughput and client fault tolerance in a messaging system. For most scenarios, however, the original AbortSlowConsumerStrategy is simply inaccurate and can do more harm than good. If you’re on ActiveMQ 5.9 or higher, AbortSlowAckConsumerStrategy is probably your best bet. If you simply can’t upgrade to 5.9 or higher, tune the checkPeriod parameter of AbortSlowConsumerStrategy to a value that suits the frequency of your messages.
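For consumers prone to hanging, the aggressive behavior is a single attribute on the strategy element. A minimal sketch; the element goes inside slowConsumerStrategy exactly as in the AbortSlowAckConsumerStrategy configuration shown earlier.

```xml
<!-- Abort the client's connection outright instead of asking it to close the consumer -->
<abortSlowAckConsumerStrategy abortConnection="true" />
```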
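On a pre-5.9 broker, a middle ground between the 30-second default and the 5 ms experiment might look like the sketch below. The values are hypothetical: a 500 ms checkPeriod samples far more often than the default, and a 60-second maxSlowDuration gives a consumer more time to recover before it is aborted. The right numbers depend on your message rate and how much broker CPU you can spare.

```xml
<!-- Hypothetical pre-5.9 tuning; both values are in milliseconds -->
<abortSlowConsumerStrategy checkPeriod="500" maxSlowDuration="60000" />
```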