AWS can be overwhelming with its number of services, especially when several of them seem to do a very similar job. Let’s look at the cloud-native AWS messaging services: SQS, SNS, Kinesis, and EventBridge. What are the differences, and when should you use which one?
This post is an extension of a thread I posted some time ago on Twitter.
As in that thread, I’m looking only at AWS’s own serverless messaging services.
The goal of the messaging solutions is to decouple the producer from the consumer. The producer(s) do not know about the consumer(s), and vice versa. They can be part of the same (micro)service or live in two separate systems.
Introducing a messaging system instead of invoking resources directly makes the processing asynchronous. As a result, the producer can push messages as fast as it wants, while the consumer processes them at its own pace, scaling out if needed.
While they all introduce a loosely coupled architecture, messaging services from the title have different capabilities. Therefore, depending on the use case, you have to choose an appropriate solution to fit the needs.
AWS messaging services
SQS: Simple Queue Service
Multiple producers can push messages to it, and the consumers will take them off the queue. There may be many consumers, although only one consumer will read an individual message. For that reason, the usual approach is to have a single application reading from the queue. However, you can parallelize processing with multiple application instances, like several Lambda function environments or a fleet of EC2 machines.
In SQS, the consumer pulls the messages. There is a long-polling option, where the consumer calls the SQS and waits until there are messages available or the max polling time is reached. While with a Lambda as a consumer, this is done under the hood, you still pay for it. But more on this later.
SQS is highly scalable, and you don’t need to specify the throughput capacity for it. You simply send as much as you need.
SNS: Simple Notification Service
While SQS is a queue of messages processed by a single consumer, SNS allows sending messages to multiple receivers, or, more precisely, subscribers, because it follows the publisher-subscriber model. Unlike SQS, SNS pushes messages to the receivers, with no polling on the consumer side.
With SNS, you can send messages to:
- Lambda function
- HTTP endpoint
- mobile phone – via SMS
- mobile app – via push notification
- Kinesis Firehose
You can optionally assign filters to individual subscribers to limit messages delivered to them based on the message attributes.
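To make the filtering idea concrete, here is a minimal sketch of attribute-based matching. It is a simplified model, not the real SNS filter policy grammar: the policy shape (attribute name mapped to a list of accepted string values), the `matches` and `fan_out` helpers, and the subscriber names are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified filter policy: attribute name -> accepted values.
FilterPolicy = Dict[str, List[str]]


def matches(policy: FilterPolicy, attributes: Dict[str, str]) -> bool:
    """A message matches when every attribute named in the policy is
    present and holds one of the accepted values."""
    return all(attributes.get(name) in accepted for name, accepted in policy.items())


def fan_out(subscribers: Dict[str, FilterPolicy], attributes: Dict[str, str]) -> List[str]:
    """Return the names of the subscribers that would receive the message."""
    return [name for name, policy in subscribers.items() if matches(policy, attributes)]


subscribers = {
    "billing": {"event_type": ["order_placed", "order_refunded"]},
    "shipping": {"event_type": ["order_placed"], "region": ["eu"]},
    "audit": {},  # an empty policy accepts every message
}

print(fan_out(subscribers, {"event_type": "order_placed", "region": "us"}))
# ['billing', 'audit']
```

The point of the sketch is that the filter lives with the subscription, so the publisher does not need to know who is interested in which messages.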
In SNS, as with SQS, you do not have to provision throughput.
Kinesis Data Streams
While SQS and SNS are pretty simple to tell apart, here the confusion begins.
Kinesis Data Streams is a message streaming service that seems to do the same thing as the SQS. The producer sends messages, and the consumer reads them. Also, like in the SQS, the consumer pulls the messages. But here the similarities end.
Firstly, with Kinesis you can have multiple distinct consumers, all of them getting all the messages from the stream. Each consumer must track the last position in the stream that it read and provide it in the following request to get the next batch of messages. This way, each consumer can process messages at its own pace.
Since multiple consumers can read messages at different rates, messages must be durable. And they are. Kinesis keeps the messages for a specified time, from 24 hours (default) to 365 days. Each consumer can go back in history as far as it wants within that retention period, re-reading the messages. This is useful for re-processing messages after some fault, like a bug in the consumer logic.
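The position tracking and replay described above can be sketched with a toy in-memory stream. This is not the Kinesis API: `InMemoryStream` and `Consumer` are hypothetical classes that model a single shard as an append-only list, with each consumer holding its own checkpoint.

```python
from typing import List


class InMemoryStream:
    """Toy stand-in for a single-shard stream: an append-only list of records."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def put(self, record: str) -> None:
        self._records.append(record)

    def read(self, position: int, limit: int = 2) -> List[str]:
        # Reading never removes records; the caller decides where to start.
        return self._records[position:position + limit]


class Consumer:
    def __init__(self, stream: InMemoryStream) -> None:
        self.stream = stream
        self.position = 0  # each consumer keeps its own checkpoint
        self.seen: List[str] = []

    def poll(self) -> None:
        batch = self.stream.read(self.position)
        self.seen.extend(batch)
        self.position += len(batch)

    def rewind(self, position: int = 0) -> None:
        # Replaying is just moving the checkpoint back.
        self.position = position


stream = InMemoryStream()
for record in ["a", "b", "c"]:
    stream.put(record)

fast, slow = Consumer(stream), Consumer(stream)
fast.poll()    # reads a, b
fast.poll()    # reads c
slow.poll()    # reads a, b only; unaffected by the other consumer
fast.rewind()
fast.poll()    # re-reads a, b

print(fast.seen)  # ['a', 'b', 'c', 'a', 'b']
print(slow.seen)  # ['a', 'b']
```

Because reading does not delete anything, one consumer falling behind or replaying old records has no effect on the others.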
Kinesis is more than just messaging – it’s about streaming vast amounts of data. As such, it comes with two other integrated services: Kinesis Data Firehose and Kinesis Data Analytics. They let you transform, process, analyze, and store the data stream with minimal to no code.
Unfortunately, while Kinesis can handle huge amounts of data, it does not autoscale. Instead, we must explicitly set the throughput by defining the number of shards our stream will consist of, each shard providing a specified capacity.
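As a rough sizing aid, the shard count can be estimated from the expected write load. The sketch below assumes per-shard write limits of 1 MB/s and 1,000 records/s; those numbers are not stated in this post, so check them against the current Kinesis quotas before relying on them. The `shards_needed` helper is hypothetical.

```python
import math

# Assumed per-shard write limits (not stated in the text above).
MAX_MB_PER_SECOND = 1.0
MAX_RECORDS_PER_SECOND = 1000


def shards_needed(records_per_second: float, avg_record_kb: float) -> int:
    """Estimate the shard count from whichever limit is hit first."""
    mb_per_second = records_per_second * avg_record_kb / 1024
    by_bytes = math.ceil(mb_per_second / MAX_MB_PER_SECOND)
    by_records = math.ceil(records_per_second / MAX_RECORDS_PER_SECOND)
    return max(1, by_bytes, by_records)


print(shards_needed(500, 1))     # 1
print(shards_needed(5000, 0.5))  # 5 (limited by record count)
print(shards_needed(2000, 4))    # 8 (limited by bytes)
```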
Kinesis is a powerful service, but you know what Uncle Ben said about great power. So if you plan to use Kinesis together with Lambda, you may want to look at the mistakes I made in the past so that you won’t repeat them.
EventBridge
Amazon EventBridge is the youngest kid on the block, introduced in 2019. Again, it may seem similar to the other services we discussed above.
EventBridge is an event bus for messages that you want to propagate across your (micro)services. Those events can come from state changes of AWS services, other AWS accounts, or external applications like Auth0, Shopify, and others. You can, of course, also send your custom messages.
As with SNS, we have a publisher-subscriber model here. Each subscriber sets filtering rules to select what kind of messages it wants to receive. To give some examples, you may want to trigger a Lambda function whenever a specific Step Function execution finishes (event from an AWS service) or a new user creates an account (custom event).
Apart from triggering Lambda functions, you can also send messages to several AWS services. That includes the other messaging solutions discussed above: SQS, SNS, and Kinesis Data Streams.
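A rule on the bus can be thought of as a pattern that is compared against every incoming event. The sketch below is a heavily simplified matcher, assuming patterns are nested dictionaries whose leaves are lists of accepted values. The `matches` and `route` helpers, the rule names, and the custom `my.app.users` source are hypothetical; the real EventBridge pattern language supports much more.

```python
from typing import Any, Dict, List

Pattern = Dict[str, Any]
Event = Dict[str, Any]


def matches(pattern: Pattern, event: Event) -> bool:
    """Every key in the pattern must exist in the event; nested dicts are
    matched recursively, and leaf lists enumerate the accepted values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(rules: Dict[str, Pattern], event: Event) -> List[str]:
    """Return the names of the rules (and so the targets) the event triggers."""
    return [name for name, pattern in rules.items() if matches(pattern, event)]


rules = {
    "on-workflow-success": {
        "source": ["aws.states"],
        "detail": {"status": ["SUCCEEDED"]},
    },
    "welcome-email": {
        "source": ["my.app.users"],       # hypothetical custom source
        "detail-type": ["UserRegistered"],
    },
}

event = {
    "source": "my.app.users",
    "detail-type": "UserRegistered",
    "detail": {"userId": "42"},
}
print(route(rules, event))  # ['welcome-email']
```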
EventBridge allows for true systems decoupling. With all other solutions, we usually create a single-purpose resource – for example, an SQS queue for specific messages that a single consumer would receive. In the case of EventBridge, on the other hand, we have a single central event bus, to which all the producers write, and all the consumers subscribe.
On top of that, you can archive the messages coming to EventBridge and replay them later. While with a Kinesis stream an individual client chooses to re-process the messages, in EventBridge we trigger re-sending messages at the event bus level.
Oh, and you don’t need to provision capacity for it.
Order and duplicates
When discussing highly scalable, distributed messaging systems, there are always two concerns: duplicated deliveries and order of messages.
SQS, SNS, and EventBridge guarantee at-least-once message delivery. It means that, occasionally, a single message can be delivered to the consumer twice. The best approach to handle this is to make the consumer idempotent: the processing result and system state should be the same after multiple invocations with the same payload.
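One common way to get idempotency is to remember which message IDs were already handled and skip repeats. The sketch below assumes each message carries a unique ID; the `Deposit` and `AccountBalance` names are hypothetical, and the in-memory set stands in for what would be a durable store in a real system.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Deposit:
    message_id: str  # assumed to be unique per logical message
    amount: int


class AccountBalance:
    """Hypothetical consumer that applies each deposit at most once."""

    def __init__(self) -> None:
        self.balance = 0
        self._processed: Set[str] = set()  # a durable store in a real system

    def handle(self, message: Deposit) -> bool:
        if message.message_id in self._processed:
            return False  # duplicate delivery: state stays unchanged
        self.balance += message.amount
        self._processed.add(message.message_id)
        return True


account = AccountBalance()
deposit = Deposit(message_id="msg-1", amount=100)
account.handle(deposit)
account.handle(deposit)  # the same message delivered a second time
print(account.balance)   # 100
```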
Apart from the number of deliveries, the order of messages is not guaranteed either. They will generally be delivered in the same order as the producer sent them, but exceptions may happen.
Those two problems, the number and the order of deliveries, can be mitigated by switching to FIFO SQS queues and FIFO SNS topics. They are slightly more expensive variants with exactly-once, in-order message delivery.
The Kinesis stream is different because there the consumer is responsible for tracking the read position. That removes the duplicate reads problem, at least from the Kinesis point of view. What about the order of the messages? It is preserved, but on the shard level only. That’s why you should put all the related messages, like the actions of a single user, into a single shard, which you do by giving them the same partition key.
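Kinesis picks the shard by taking the MD5 hash of the record’s partition key and looking up which shard owns that 128-bit value. The sketch below illustrates the idea under one simplifying assumption: the hash space is split into equal-sized ranges, one per shard. The `shard_for` helper is hypothetical, and real streams can have uneven ranges after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal-sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# All events of one user share a partition key, so they share a shard
# and keep their relative order.
events = [("user-17", "login"), ("user-42", "login"), ("user-17", "add_to_cart")]
for user_id, action in events:
    print(user_id, action, "-> shard", shard_for(user_id, shard_count=4))
```

Using the user ID as the partition key keeps one user’s actions ordered, while different users still spread across shards.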
Latency
The speed of message delivery differs between SQS, SNS, Kinesis, and EventBridge. With all four, your message will reach the target in under 1 second. While that’s enough for most systems, sometimes we need to reduce the latency to the minimum. In such cases, the delivery speed can be a deciding factor.
|Kinesis Data Streams|~200 ms, or ~70 ms with enhanced fan-out|
Pricing
I skipped an important part of these services’ characteristics: pricing. So let’s fix that now. We will look at how the charges are calculated rather than at the individual prices.
SQS pricing
SQS pricing seems straightforward: you pay per request. But what is a request in this case?
A request is every write and read call you make. Remember that the client needs to poll for messages? So here is a surprise: a Lambda function with an SQS trigger, with no messages in the queue at all, will make a request every 20 seconds (the maximum polling time). This translates to 129,600 requests per month for a queue with 0 messages. No big worries, though: even without the free tier, it’s only $0.05 (in the us-east-1 region). The serious costs start when you actually send messages, and do it at scale.
On the bright side, a single request can contain up to 10 messages. So if you push 10 messages in a single batch call, and read them in the same way, you are billed for only 2 requests (one for write and one for read), not 20. Keep in mind that there are size limitations for a single request, so I’m assuming small payloads here.
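The request arithmetic from the last two paragraphs can be checked with a few lines of code. This sketch follows the simplified accounting used above (one request per send call and one per receive call). The price of $0.40 per million requests is an assumption, chosen because it reproduces the $0.05 figure; the helper names are hypothetical.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 60 * 60
PRICE_PER_MILLION_REQUESTS = 0.40  # assumed us-east-1 price, not from the text


def idle_poll_requests(poll_seconds: int = 20) -> int:
    """Requests made in a month by polling an empty queue back to back."""
    return SECONDS_PER_MONTH // poll_seconds


def requests_for(messages: int, batch_size: int = 10) -> int:
    """Requests needed to write and then read small messages in batches."""
    batches = math.ceil(messages / batch_size)
    return 2 * batches  # one send call and one receive call per batch


def cost(requests: int) -> float:
    return requests * PRICE_PER_MILLION_REQUESTS / 1_000_000


print(idle_poll_requests())                               # 129600
print(round(cost(idle_poll_requests()), 2))               # 0.05
print(requests_for(10), requests_for(10, batch_size=1))   # 2 20
```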
SNS pricing
SNS pricing is quite simple compared to the others. You pay for sending messages to SNS. The first million a month is free.
Then you pay for delivering the messages. The price differs between the recipient types – with emails being the most expensive for some reason. But deliveries to Lambda and SQS are free.
Kinesis Data Streams pricing
With Kinesis, you pay for shards, PUT payload units, and then for several additional capabilities, like data retention beyond the default 24 hours. But let’s focus on those shards and PUT payload units.
As I mentioned before, Kinesis does not autoscale. Instead, we set the number of shards that we want our stream to have. This is the first thing we pay for: each shard, for as long as it exists.
Secondly, we pay for sending messages to the stream. In the simplest scenario, every message corresponds to a single PUT payload unit. But we can send multiple messages to the stream in a batch, and every 25 KB of the payload counts as one PUT payload unit. It also works the other way around: a single message greater than 25 KB will count as multiple PUT payload units.
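The 25 KB rounding is easy to get wrong in your head, so here is a minimal sketch of it. The `put_payload_units` helper is hypothetical and only models the rounding rule described above, assuming 1 KB = 1,024 bytes.

```python
import math

PUT_UNIT_BYTES = 25 * 1024  # 25 KB, per the rule described above


def put_payload_units(payload_bytes: int) -> int:
    """Number of 25 KB units a payload is billed as (at least one)."""
    return max(1, math.ceil(payload_bytes / PUT_UNIT_BYTES))


print(put_payload_units(1_000))       # 1
print(put_payload_units(25 * 1024))   # 1
print(put_payload_units(60 * 1024))   # 3
```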
EventBridge pricing
The EventBridge pricing model is similar to the SNS one. You pay for sending messages to it. You don’t pay for messages sent to EventBridge automatically from various AWS services.
You are also charged for every delivery of the message to the subscribers. It’s a single price per event, no matter the type of subscriber.
Necessary pricing note
Take into account that using those services may generate other charges along the way, for example for Lambda function invocations or for data transfer, whether out to the internet or between regions.
Selecting the proper messaging service
While I can give you some general guidance, choosing the proper messaging service is not always easy. Sometimes you have to balance the pros and cons of alternative approaches, because you can achieve the same effect with different services or a combination of them.
But here are a few tips and examples that I can give you:
- A massive number of events, like a clickstream from an app? Kinesis Data Streams
- System events that multiple services must receive and react to, such as a new user registered? EventBridge
- An event processed by a single consumer and specific for that consumer, like an image processing job to do? SQS
- Sending messages to people? SNS
From the financial point of view, a lot depends on the actual throughput. Just a few messages per hour? SQS will be almost free, while with Kinesis you need to pay for at least one whole shard anyway. But send a clickstream from a popular application, and Kinesis will be cheaper by several orders of magnitude.
I wanted this overview to be brief, but somehow tweets of under 280 characters exploded into several paragraphs each. And still, it covers only the basic ideas of each of the messaging services. They all have additional features that you can utilize.
I hope this will make it a little simpler for you to understand the differences between the SQS, SNS, Kinesis, and EventBridge, and help you make a good choice next time you need a messaging solution.