Wednesday, December 14, 2011

AutoScaling Amazon SQS Queue Processors

One of my favorite things about running servers in Amazon EC2 is the ability to use AutoScaling to automatically add and remove nodes as web traffic increases and decreases.  Not only does this generally save money, but it also helps prepare a system to handle traffic spikes.

Recently I had some fun setting up AutoScaling for a different use case -- a cluster of machines processing messages on an Amazon SQS queue.  The idea was to add and remove nodes as the number of visible messages on the queue fluctuated.  Again, this keeps costs lower by only running as many nodes as are necessary to process the current workload and handles workload spikes.

Here are sample steps and commands that you can use for setting up AutoScaling for SQS queue processors:

Create an AMI for a node that will process SQS messages.  The node that boots from this AMI should automatically launch one or more queue processing processes.  A user-data script may be useful for this.

Create a Launch Configuration for the queue processor node:

as-create-launch-config MyQueueConfig --image-id [INSERT YOUR AMI ID HERE] --instance-type c1.medium --key [INSERT YOUR KEYNAME HERE] --user-data [INSERT YOUR USER DATA SCRIPT HERE]

Create an AutoScaling group for queue processors:

as-create-auto-scaling-group MyQueueGroup --launch-configuration MyQueueConfig --availability-zones us-east-1b --min-size 1 --max-size 10

Create Policies that add/remove 1 node to/from the cluster.  They will be invoked when the number of messages on the queue grows excessively high or decreases to an acceptable level:

as-put-scaling-policy MyScaleUpPolicy -g MyQueueGroup --adjustment=1 --type ChangeInCapacity

as-put-scaling-policy MyScaleDownPolicy -g MyQueueGroup --adjustment=-1 --type ChangeInCapacity

Create Alarms to scale up/down when the number of messages on the queue grows excessively high or decreases to an acceptable level.  Use the Policy ARN's returned by the previous as-put-scaling-policy commands.

mon-put-metric-alarm --alarm-name MyHighMessagesAlarm --alarm-description "Scale up when number of messages on queue is high" --metric-name ApproximateNumberOfMessagesVisible --namespace AWS/SQS --statistic Average --period 60 --threshold 1000 --comparison-operator GreaterThanThreshold --dimensions QueueName=MyQueue --evaluation-periods 10 --alarm-actions [INSERT MyScaleUpPolicy ARN HERE]

mon-put-metric-alarm --alarm-name MyLowMessagesAlarm --alarm-description "Scale down when number of messages on queue is low" --metric-name ApproximateNumberOfMessagesVisible --namespace AWS/SQS --statistic Average --period 60 --threshold 100 --comparison-operator LessThanThreshold --dimensions QueueName=MyQueue --evaluation-periods 10 --alarm-actions [INSERT MyScaleDownPolicy ARN HERE]

In this example the Alarms cause the cluster to scale up when the number of visible messages on the queue remains above 1000 for 10 consecutive minutes and scale down when the number of visible messages falls below 100 for 10 consecutive minutes.

The Policies above adjust the amount of nodes in the cluster by a specific amount, but it is also possible to specify the adjustment in terms of percentages.  Using --adjustment 10 --type PercentChangeInCapacity would adjust the amount of nodes by 10 percent.

It also would have been possible to base scaling activities on other AWS/SQS metrics such as:
  • NumberOfMessagesSent
  • NumberOfMessagesReceived
  • NumberOfMessagesDeleted
  • NumberOfEmptyReceives
  • ApproximateNumberOfMessagesVisible
  • ApproximateNumberOfMessagesNotVisible
  • ApproximateNumberOfMessagesDelayed
  • SentMessageSize
Here are a few online references relevant to AutoScaling SQS queue processors:


Jeff Rhines said...

Exactly what I was looking for. Thanks!

Myles Megyesi said...

Hey Ken, thanks for the info! I have a question about the scale down procedure. How do you prevent Amazon from killing a server that is currently processing a SQS message?

StuartB__ said...

Don't think you can avoid that. Your processing app has to be stateless I guess, the message in progress will go back in the queue automagically after its invisibility timeout.

Anonymous said...

Thanks Ken for your post.

Do you know if there is a way to terminate instances based on more than a SQS? For instance, imagine I have sqs1 with 500 visible messages and sqs2 with 0 visible messages.
Then, obviously, I don't want to terminate instance until both queues are 0 (AND conditional).
I did not find a way how can I do this conjugation. Do you know if is it possible?


Unknown said...
This comment has been removed by a blog administrator.