- AWS Messaging Decision Tree
- Choices explained step by step
- Sending messages to end-users?
- Sending messages to external APIs?
- Sending high volume of messages?
- Analyzing messages stream in real-time?
- Archiving and replaying messages?
- Sending heterogeneous messages?
- Sending messages to multiple consumers?
- The other 10%
- Why no MSK?
- Conclusion & Trivia
Have you ever been stuck deciding between SQS, SNS, Kinesis Streams, and EventBridge? Struggled to pick the right one for your use case? If the answer is “yes”, I’ve got you covered with a simple decision tree to help you make the right decision.
I already wrote an overview of messaging on AWS, describing the capabilities of SQS, SNS, Kinesis Data Streams, and EventBridge. It included a few examples of when to use which service. This follow-up extends that topic and aims to make the choice as easy as possible.
AWS Messaging Decision Tree
You came here to see the nice diagram, so here it is. As per my calculations, following it gives you a 90% chance of making the right choice.
Choices explained step by step
Sending messages to end-users?
If you want to send messages directly to end-users (A2P – application-to-person messaging), the only viable option is SNS. To send SMS or push notifications, you don’t need to implement anything more on your own.
In the past, I developed and maintained an application for sending push notifications to Android, iOS, and Windows Phone (old days). Each provider had a different API and requirements. iOS was especially nasty, with a long-living connection and binary protocol. You don’t ever want to do it yourself, believe me.
While SNS also supports sending emails, it’s intended for alarms and system notifications, not marketing and transactional emails to your users. But it’s perfect for sending CloudWatch Alarms to developers.
Sending messages to external APIs?
AWS can also take messaging to external systems off your shoulders. Using EventBridge, you can communicate over the most common application interface – HTTP(S). With built-in retries and per-second rate limiting, it’s an excellent replacement for a custom Lambda function sending data to an external endpoint.
With flexible customization of the HTTP method, path, authorization, headers, and body, you can reliably send requests to any SaaS application accepting webhooks.
While SNS also supports HTTP endpoints as a destination, it’s rather limited. It does not allow you to modify the request body format. In addition, the target needs to handle signature verification against the provided X509 certificate for authorization. Thus, EventBridge will almost always be a better choice in this field.
Sending high volume of messages?
Kinesis Streams is the best option for sending a high volume of messages, like website or application clickstreams, transaction logs, tracking events – anything requiring high throughput. On the other hand, Kinesis will be far more costly than other services if you send a relatively small number of messages.
A very simplified calculation shows that the threshold is around 200 messages per minute. After that, it’s more profitable to use Kinesis Streams than alternatives. But take it with a grain of salt and calculate for your specific conditions. Data on this chart represents a perfect case with several assumptions (uniform distribution over time, single receiver, etc.).
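To calculate for your own conditions, a rough break-even estimate can be sketched as below. This is only a sketch under several assumptions that are mine, not the article’s: a single shard is enough, every message fits in one billing unit, each queue message costs three requests (send, receive, delete), and there is no free tier. The prices passed in at the bottom are illustrative placeholders, so substitute the current values from the AWS pricing pages for your region.

```python
HOURS_PER_MONTH = 730  # average month length used for the estimate


def break_even_messages_per_minute(
    shard_hour_price: float,
    kinesis_price_per_million: float,
    queue_price_per_million_requests: float,
    requests_per_message: int = 3,
) -> float:
    """Messages per minute above which a single Kinesis shard becomes cheaper."""
    # Kinesis has a fixed cost (the shard) plus a small per-message cost.
    fixed_monthly = shard_hour_price * HOURS_PER_MONTH
    kinesis_per_message = kinesis_price_per_million / 1_000_000
    # The queue-based alternative is billed purely per request.
    queue_per_message = (
        queue_price_per_million_requests * requests_per_message / 1_000_000
    )
    saving_per_message = queue_per_message - kinesis_per_message
    if saving_per_message <= 0:
        return float("inf")  # the shard never pays for itself
    messages_per_month = fixed_monthly / saving_per_message
    return messages_per_month / (HOURS_PER_MONTH * 60)


# Placeholder prices in USD, assumed for illustration only.
threshold = break_even_messages_per_minute(
    shard_hour_price=0.015,
    kinesis_price_per_million=0.014,
    queue_price_per_million_requests=0.40,
)
print(f"Break-even: about {threshold:.0f} messages per minute")
```

Multiple receivers, larger messages, or bursty traffic all shift the result, which is why the number is only a starting point.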
Analyzing messages stream in real-time?
Kinesis Streams is also the go-to option if you need to analyze the messages as a stream, not only as individual values.
Real-time stream analytics includes operations like detecting value changes compared to the previous values and time window-based aggregations. Time windows allow calculations on values aggregated per non-overlapping time intervals (tumbling windows) or in a continuously moving time range (sliding windows).
Such analysis on Kinesis Streams can be done in three ways:
- with Kinesis Analytics
  - using SQL
  - using Apache Flink
- with Lambda (tumbling windows only)
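To make the difference between tumbling and sliding windows concrete, here is a minimal sketch in plain Python. It does not use Kinesis, Flink, or Lambda at all. It only illustrates the two aggregation styles on an in-memory list, and the function names and the `(timestamp, value)` event shape are made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, float]  # (timestamp in seconds, value)


def tumbling_sums(events: List[Event], size: int) -> Dict[int, float]:
    """Sum values per non-overlapping window, keyed by window start time."""
    sums: Dict[int, float] = defaultdict(float)
    for timestamp, value in events:
        window_start = timestamp - timestamp % size
        sums[window_start] += value
    return dict(sums)


def sliding_sums(events: List[Event], size: int) -> List[Tuple[int, float]]:
    """For each event, sum the values from the last `size` seconds, itself included."""
    ordered = sorted(events)
    result: List[Tuple[int, float]] = []
    start = 0  # index of the oldest event still inside the window
    running = 0.0
    for timestamp, value in ordered:
        running += value
        # Drop events that fell out of the (timestamp - size, timestamp] range.
        while ordered[start][0] <= timestamp - size:
            running -= ordered[start][1]
            start += 1
        result.append((timestamp, running))
    return result


events = [(0, 1.0), (5, 2.0), (12, 3.0), (19, 4.0), (21, 5.0)]
print(tumbling_sums(events, 10))  # one sum per fixed 10-second bucket
print(sliding_sums(events, 10))   # one sum per event, looking back 10 seconds
```

In the tumbling case each event lands in exactly one bucket, while in the sliding case the same event contributes to every window that still covers it.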
Archiving and replaying messages?
Keeping the log of events and the ability to push them again through the system to repeat operations or transactions is not an uncommon business requirement. Wouldn’t it be nice to have this supported out of the box?
That’s exactly what EventBridge offers.
Sending heterogeneous messages?
Another reason to choose EventBridge is when you send messages of many different types. With other messaging services, there is a convention to create a single resource (SQS queue, SNS topic, or Kinesis Data Streams stream) per message type. With EventBridge, you push everything to a single event bus and let consumers subscribe to the messages they are interested in. Therefore, if you have a variety of message types, it’s easier to send them all to a single EventBridge event bus than to maintain a set of separate SNS topics.
For instance, you may need to propagate different user activities handled by separate microservices. It can include account creation, successful payment, subscription cancelation, etc. With EventBridge, you don’t need multiple queues or topics, while consumer services still receive only what they are interested in.
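The idea of one bus with per-consumer filtering can be sketched as follows. This is a deliberately simplified matcher written for illustration, not EventBridge’s actual implementation: it only supports "field equals one of these values" and nested objects, while real event patterns offer more operators. The consumer names and event types are hypothetical.

```python
from typing import Any, Dict, List

Pattern = Dict[str, Any]


def matches(pattern: Pattern, event: Dict[str, Any]) -> bool:
    """Return True if every field in the pattern matches the event.

    Simplified rules: a list means "any of these exact values",
    a nested dict means "match the nested object the same way".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event: Dict[str, Any], subscriptions: Dict[str, Pattern]) -> List[str]:
    """Names of the consumers whose pattern matches the event."""
    return [name for name, pattern in subscriptions.items() if matches(pattern, event)]


# Hypothetical consumers, each declaring only what it cares about.
subscriptions = {
    "billing": {"detail-type": ["PaymentSucceeded", "SubscriptionCancelled"]},
    "onboarding": {"detail-type": ["AccountCreated"]},
    "audit": {"source": ["users", "payments"]},
}

event = {
    "source": "payments",
    "detail-type": "PaymentSucceeded",
    "detail": {"userId": "u-1"},
}
print(route(event, subscriptions))
```

Producers publish to one place and stay unaware of who listens. Adding a new consumer means adding a new pattern, not a new topic.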
Sending messages to multiple consumers?
The last distinction in the decision tree is the number of consumers. If you want multiple consumers to be able to receive and process messages, use SNS. Otherwise, with only a single consumer processing messages (like a single Lambda function), use SQS.
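The steps above can be condensed into a small function. This is a sketch that assumes the questions are asked in the same order as the sections of this article. The `Requirements` fields are names I made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    to_end_users: bool = False
    to_external_apis: bool = False
    high_volume: bool = False
    realtime_analytics: bool = False
    archive_and_replay: bool = False
    heterogeneous_messages: bool = False
    multiple_consumers: bool = False


def choose_service(req: Requirements) -> str:
    """Walk the questions top to bottom; the first "yes" decides."""
    if req.to_end_users:
        return "SNS"
    if req.to_external_apis:
        return "EventBridge"
    if req.high_volume or req.realtime_analytics:
        return "Kinesis Data Streams"
    if req.archive_and_replay or req.heterogeneous_messages:
        return "EventBridge"
    if req.multiple_consumers:
        return "SNS"
    return "SQS"  # a single consumer and no special requirements


print(choose_service(Requirements(multiple_consumers=True)))
print(choose_service(Requirements(high_volume=True, multiple_consumers=True)))
```

Because the first matching question wins, a requirement higher in the tree overrides the ones below it.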
The other 10%
I noted above that this decision tree will work for only 90% of cases. What about the other 10%?
Sometimes your choice may be dictated by the unique features and capabilities of individual messaging services, such as:
- processing the queue of messages with rate-limiting with SQS,
- maintaining the messages for up to 365 days and letting the individual consumers fetch the historical records with Kinesis Streams.
Yet another case is service composition. If at any step of the decision tree your answer was “yes, but…”, then more than one service may be needed to achieve the expected result. Thankfully, the services cooperate nicely and can send messages directly from one to another. An example is an SNS topic that sends messages to multiple consumers, including an SQS queue from which a particular client processes messages at its own rate.
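The topic-plus-queue composition can be pictured with a toy in-memory model. The `Topic` and `Queue` classes below are stand-ins written for this example and have nothing to do with the real AWS SDK. They only show the shape of the flow: the topic pushes to every subscriber immediately, while the queue buffers messages until its consumer is ready.

```python
from collections import deque
from typing import Callable, Deque, List


class Queue:
    """Stand-in for an SQS queue: buffers messages until the consumer asks."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self, max_messages: int) -> List[str]:
        batch: List[str] = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


class Topic:
    """Stand-in for an SNS topic: pushes each message to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, deliver: Callable[[str], None]) -> None:
        self._subscribers.append(deliver)

    def publish(self, message: str) -> None:
        for deliver in self._subscribers:
            deliver(message)


topic = Topic()
queue = Queue()
pushed: List[str] = []          # a consumer that handles messages immediately
topic.subscribe(pushed.append)
topic.subscribe(queue.send)     # a consumer that wants to go at its own pace

for i in range(3):
    topic.publish(f"order-{i}")

first_batch = queue.receive(max_messages=2)   # the slow consumer takes two now
second_batch = queue.receive(max_messages=2)  # and the rest later
print(pushed, first_batch, second_batch)
```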
Why no MSK?
(Update from 2022-04-23)
After sharing this decision tree on Twitter and receiving tons of likes, the most common question was: why is there no MSK? MSK is the AWS-managed Kafka service, formally Amazon Managed Streaming for Apache Kafka.
Firstly, I don’t use it. Not knowing something is a solid reason not to recommend it, in my opinion.
But why don’t I use it? I see it as a solution for when you need Kafka in the cloud because of existing applications or specific capabilities. It’s not something I would consider for new, cloud-native applications. Even though AWS manages the cluster and availability, it’s our responsibility to manage the VPC, underlying machine size, storage size, and cluster configuration.
Even if you are not doing serverless, it’s much more to manage than with SQS, SNS, EventBridge, and Kinesis.
Don’t get me wrong – I’m not saying MSK is bad. It’s just that if I included it in the decision tree, it would be “need Kafka? → MSK”.
Conclusion & Trivia
It’s easy to get confused when trying to figure out the right AWS messaging service for your use case. SQS, SNS, Kinesis Streams, and EventBridge have as many similarities as differences in the capabilities they provide. But now, armed with a good decision tree and a little common sense, you’ll always know which service to use when. At least, I hope so. And as always, if you have any questions or doubts, feel free to ask in the comments below.
I’ve played with a GPT-3-based text generation tool. Based on a single-sentence summary of the article, it generated the first paragraph (“Have you ever…”) that I only slightly adjusted. Pretty nice. I’ve also used another single sentence generated by it. 10 points for Gryffindor for pointing out which one – you can guess in the comments.