How to choose the best event source for pub/sub messaging with AWS Lambda
Publish-subscribe (pub/sub) messaging is a versatile pattern that enables loose coupling between message publishers and consumers. AWS offers several options for implementing pub/sub with AWS Lambda, including Amazon SNS, Kinesis streams, and DynamoDB streams as event sources. In this article, we'll take an in-depth look at these options and provide guidance on how to choose the best event source for your use case.
The pub/sub messaging pattern
In the pub/sub pattern, publishers send messages to a topic or channel without knowing who or how many subscribers there are. Subscribers express interest in one or more topics and only receive messages for topics they've subscribed to. The pub/sub model allows for greater scalability and flexibility than traditional point-to-point messaging.
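To make the pattern concrete, here is a minimal in-memory sketch. It is not how any AWS service is implemented; `InMemoryBroker` and the topic names are hypothetical, and the only point is that the publisher never references its subscribers directly.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy broker: each topic maps to a list of subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = InMemoryBroker()
emails: List[str] = []
audit_log: List[str] = []

# Two independent subscribers on the same topic.
broker.subscribe("orders", emails.append)
broker.subscribe("orders", audit_log.append)

delivered = broker.publish("orders", "order-123 created")
ignored = broker.publish("payments", "no one is listening")
```

Adding a third subscriber would not require any change to the publishing code, which is the loose coupling the pattern is after.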
Some common use cases for pub/sub include:
- Notifying multiple consumers about events like changes in data or system state
- Enabling parallel asynchronous processing of messages
- Implementing event-driven architectures and serverless workflows
AWS Lambda is a natural fit for consuming messages in a pub/sub system. Lambda functions can be triggered by messages from SNS topics, Kinesis streams, and DynamoDB streams. The Lambda service takes care of scaling function instances up and down to match the rate of incoming messages.
Overview of AWS event sources for pub/sub with Lambda
There are three main options for event sources that support pub/sub messaging with Lambda:
- Amazon Simple Notification Service (SNS)
- Amazon Kinesis Data Streams
- Amazon DynamoDB Streams
Let's look at each of these in more detail.
Amazon SNS as an event source
Amazon SNS is a fully managed pub/sub messaging service. With SNS, you create a topic and control access to it using AWS Identity and Access Management (IAM). Publishers send messages to the SNS topic, and SNS delivers the message to all subscribers.
Lambda functions can subscribe to SNS topics. Whenever a message is published to the subscribed topic, SNS invokes your function asynchronously and passes the message inside the event payload. SNS is push-based, so the function is invoked as soon as a message is available, with no polling involved.
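As a rough sketch of what the function receives, the handler below reads the message out of the `Records[].Sns` structure that SNS passes to Lambda. It assumes publishers send JSON in the message body; the `order_id` and `status` fields and the sample event are made up for illustration.

```python
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Process each SNS notification delivered in this invocation."""
    processed: List[str] = []
    for record in event.get("Records", []):
        sns = record["Sns"]
        # The payload arrives as a string; this sketch assumes it is JSON.
        body = json.loads(sns["Message"])
        print(f"message {sns['MessageId']}: {body}")
        processed.append(sns["MessageId"])
    return {"processed": processed}


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "MessageId": "msg-1",
                "Subject": "order",
                "Message": json.dumps({"order_id": 123, "status": "created"}),
            },
        }
    ]
}
result = handler(sample_event, None)
```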
Some characteristics of using SNS as an event source for Lambda:
- Automatic triggering of Lambda for each message
- Highly scalable – supports high throughput
- Parallel execution – each message invokes a separate instance of the Lambda function
- Configurable retry policy for failures (Lambda retries, dead-letter queues)
- Message filtering – subscription filter policies let each subscriber receive only the messages whose attributes match
- Fan-out architecture – deliver the same message to multiple subscribers
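The filtering idea can be illustrated with a small simulation. SNS evaluates filter policies itself, before a message ever reaches the function, and real policies support more operators than this; the sketch below only mimics exact string matching, and `matches_policy` and the attribute names are hypothetical.

```python
from typing import Dict, List


def matches_policy(policy: Dict[str, List[str]], attributes: Dict[str, str]) -> bool:
    """Return True when every policy key is present with one of its allowed values.

    Keys are combined with AND; the values listed for a key are combined with OR.
    """
    return all(attributes.get(name) in allowed for name, allowed in policy.items())


# Hypothetical policy: only order events from the EU.
policy = {"event_type": ["order_placed", "order_cancelled"], "region": ["eu"]}

accepted = matches_policy(policy, {"event_type": "order_placed", "region": "eu"})
wrong_type = matches_policy(policy, {"event_type": "order_shipped", "region": "eu"})
missing_attr = matches_policy(policy, {"event_type": "order_placed"})
```

With a policy like this attached to a subscription, the function is only invoked for the messages it cares about, which can reduce both invocations and cost.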
Pros of SNS:
- Simple setup and configuration
- Fully serverless and managed
- Supports a wide variety of subscriber types (Lambda, SQS, HTTP/S, email, SMS)
- Built-in message filtering
Cons of SNS:
- No message ordering guarantees on standard topics
- Limited to maximum message size of 256 KB
- Requires careful management of IAM permissions
SNS is a good choice when you need to fan out messages to multiple destinations and don't need strict message ordering. It's commonly used for notifications, webhooks, and simple workflows where retrying a failed message is enough to recover.
Amazon Kinesis as an event source
Kinesis Data Streams is a scalable real-time data streaming service. Data producers send records to a Kinesis stream, and consumers process the records in real time. Kinesis retains data for 24 hours by default, and the retention period can be extended (to 7 days, or longer with long-term retention), allowing for replay and reprocessing.
Lambda can be configured to poll a Kinesis stream and invoke a function with batches of records. By default, Lambda processes one batch at a time per shard, so the number of shards determines both the read throughput and the function's parallelism.
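A minimal sketch of a batch handler follows. Kinesis record data reaches the function base64-encoded under `Records[].kinesis.data`; the sketch assumes producers wrote JSON, and the `make_record` helper and the sample payloads are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Decode every record in the batch, preserving the order Lambda delivered them in."""
    decoded: List[Dict[str, Any]] = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"])
        decoded.append(
            {
                "partition_key": kinesis["partitionKey"],
                "sequence_number": kinesis["sequenceNumber"],
                "body": json.loads(payload),  # assumes JSON payloads
            }
        )
    return decoded


def make_record(partition_key: str, sequence_number: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a trimmed-down record for local testing (hypothetical helper)."""
    data = base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")
    return {
        "eventSource": "aws:kinesis",
        "kinesis": {"partitionKey": partition_key, "sequenceNumber": sequence_number, "data": data},
    }


sample_event = {
    "Records": [
        make_record("user-1", "1", {"page": "/home"}),
        make_record("user-1", "2", {"page": "/cart"}),
    ]
}
decoded_batch = handler(sample_event, None)
```

Records that share a partition key land on the same shard, which is what makes the per-shard ordering useful in practice.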
Some characteristics of using Kinesis as an event source for Lambda:
- Polling-based – Lambda polls the stream for new records
- Configurable batch size (1 – 10,000) and batch window
- In-order processing per shard
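- Read throughput is capped per shard, so total read capacity grows with the number of shards
- Lambda concurrency scales with the number of shards (one concurrent batch per shard by default), within the account's concurrency limit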
- Records may be processed more than once in event of failure
- Data retention allows for replaying records
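Because records can be delivered more than once, handlers are usually written to be idempotent. The sketch below skips sequence numbers it has already handled and, on failure, reports the failing record through the `batchItemFailures` response so that Lambda can resume from that point. It assumes `ReportBatchItemFailures` is enabled on the event source mapping, and the in-memory `_seen` set is only a stand-in for a durable store; `process_batch`, `make_event`, and `flaky` are hypothetical names.

```python
import base64
from typing import Any, Callable, Dict, List, Set

_seen: Set[str] = set()  # stand-in for a durable store such as a database table


def process_batch(
    event: Dict[str, Any],
    process: Callable[[bytes], None],
    seen: Set[str] = _seen,
) -> Dict[str, List[Dict[str, str]]]:
    """Process records in order, skipping duplicates and stopping at the first failure."""
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        if seq in seen:
            continue  # already handled in an earlier (possibly partial) attempt
        try:
            process(base64.b64decode(record["kinesis"]["data"]))
        except Exception:
            # Report the first failed record; stopping here keeps per-shard order.
            return {"batchItemFailures": [{"itemIdentifier": seq}]}
        seen.add(seq)
    return {"batchItemFailures": []}


def make_event(*payloads: bytes) -> Dict[str, Any]:
    """Build a trimmed-down Kinesis event for local testing (hypothetical helper)."""
    return {
        "Records": [
            {"kinesis": {"sequenceNumber": str(i), "data": base64.b64encode(p).decode("ascii")}}
            for i, p in enumerate(payloads, start=1)
        ]
    }


handled: List[bytes] = []


def flaky(payload: bytes) -> None:
    if payload == b"bad":
        raise ValueError("cannot process")
    handled.append(payload)


event = make_event(b"a", b"bad", b"c")
first_attempt = process_batch(event, flaky)          # fails on the second record
second_attempt = process_batch(event, handled.append)  # duplicate "a" is skipped
```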
Pros of Kinesis:
- Supports high throughput and large messages (up to 1 MB)
- Ordered records within a shard
- Data retention (24 hours by default, extendable to 7 days or longer) allows replayability
- Dedicated throughput per shard
- Scales to handle high-velocity data
Cons of Kinesis:
- More complex to set up and manage than SNS
- In provisioned capacity mode, you have to provision and scale shards yourself, which affects cost
- Polling has some latency vs push model
- Repeated processing of the same records is possible
Kinesis is a good fit for use cases that require high throughput, ordered messaging, or the ability to replay records. Examples include real-time analytics, IoT data ingestion, clickstream processing, and complex event processing.
DynamoDB streams as an event source
DynamoDB is a NoSQL database that supports triggers via DynamoDB streams. Whenever data is modified in a DynamoDB table, the changes can be captured and streamed to Lambda.
With DynamoDB streams, Lambda polls the stream for changes and can be configured to process batches of records. Each Lambda function processes records from a particular shard, similar to Kinesis.
Some unique aspects of DynamoDB streams:
- Tightly integrated with DynamoDB as the source
- Captures create, update, and delete events
- Automatic scaling of shards based on table traffic
- No charge for stream reads made by Lambda triggers, beyond regular DynamoDB pricing
From a development perspective, DynamoDB streams have some limitations:
- Each stream only contains events for one table
- Records describe low-level events, requiring translation to domain events
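- 24-hour data retention, less than Kinesis
- Fewer consumer options than Kinesis: Lambda triggers are the most common consumer, although the stream can also be read through the DynamoDB Streams API or the Kinesis adapter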
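The translation from low-level stream records to domain events might look like the sketch below. The `eventName`, `Keys`, `NewImage`, and `OldImage` fields come from the stream record format; the `Order*` event names, the sample table attributes, and the helper functions are hypothetical, and only string, number, and boolean attributes are unwrapped.

```python
from decimal import Decimal
from typing import Any, Dict, List, Optional

# Hypothetical mapping from stream operations to domain event names.
DOMAIN_EVENT_NAMES = {"INSERT": "OrderCreated", "MODIFY": "OrderUpdated", "REMOVE": "OrderDeleted"}


def unwrap(value: Dict[str, Any]) -> Any:
    """Convert a typed DynamoDB attribute such as {"S": "x"} into a plain value."""
    if "S" in value:
        return value["S"]
    if "N" in value:
        return Decimal(value["N"])  # numbers arrive as strings
    if "BOOL" in value:
        return value["BOOL"]
    return value  # other types are left in wire format in this sketch


def flatten(image: Optional[Dict[str, Dict[str, Any]]]) -> Optional[Dict[str, Any]]:
    if image is None:
        return None
    return {name: unwrap(attr) for name, attr in image.items()}


def to_domain_events(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate each stream record into a domain-level event dictionary."""
    events = []
    for record in event["Records"]:
        change = record["dynamodb"]
        events.append(
            {
                "type": DOMAIN_EVENT_NAMES[record["eventName"]],
                "key": flatten(change["Keys"]),
                # Images are present only if the stream view type includes them.
                "new": flatten(change.get("NewImage")),
                "old": flatten(change.get("OldImage")),
            }
        )
    return events


order = {"order_id": {"S": "o-1"}, "total": {"N": "19.99"}}
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"order_id": {"S": "o-1"}}, "NewImage": order}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"order_id": {"S": "o-1"}}, "OldImage": order}},
    ]
}
domain_events = to_domain_events(sample_event)
```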
Pros of DynamoDB streams:
- Easy to set up triggers for DynamoDB tables
- Automatic scaling and pricing along with table
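- In-order delivery per item, with each change appearing exactly once in the stream (Lambda processing is still at-least-once, so handlers should be idempotent)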
Cons of DynamoDB streams:
- Limited to DynamoDB as the source
- Short data retention, no replay after 24 hours
- Records are low-level DynamoDB events
DynamoDB streams are a good choice when you primarily need to react to changes in a DynamoDB table. Common use cases include notifications, data aggregation, and updating denormalized copies of data.
Cost comparison
The cost of SNS, Kinesis and DynamoDB streams depends on your usage in terms of message volume, size, and velocity. Some high-level characteristics:
- SNS has no upfront cost; you pay per million requests + data transfer out
- Kinesis has a per-shard hourly cost + put payload unit cost
- DynamoDB streams are billed per stream read request, but reads made by Lambda triggers are free, so the cost is mostly the table's own reads/writes + data transfer out
To get a rough idea, here are some monthly cost projections at different usage levels, assuming 1 KB messages:
Event source | 1 msg/sec | 1,000 msg/sec |
---|---|---|
SNS | $0.50 | $43 |
Kinesis (1 shard) | $11 | $32 |
DynamoDB streams | $0 | $1000+ |
Keep in mind these are simplified estimates. In reality, messages are likely to be larger and traffic may be bursty. You'll need to provision some extra capacity for Kinesis and DynamoDB streams to handle peaks.
The key takeaway is that while Kinesis has a higher upfront cost due to per-shard pricing, the per-message cost is lower than SNS at high volumes. DynamoDB streams costs can vary widely depending on table throughput.
Using Lambda as a message broker
Beyond SNS, Kinesis and DynamoDB streams, you can use Lambda functions themselves as brokers to propagate messages between services. The "lambda-fanout" project from AWS Labs demonstrates this pattern.
With lambda-fanout, a Lambda function consumes events from a source like Kinesis and forwards them to destinations like SQS queues or other Lambda functions. This allows propagating messages across accounts and regions where direct integration may not be possible.
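The general shape of such a broker function can be sketched as follows. This is not the lambda-fanout project's code; `fan_out` and the target names are hypothetical, and the senders are injected as plain callables so the example runs without AWS access. In a real deployment each sender would wrap a call to the destination service.

```python
from typing import Any, Callable, Dict, Iterable, List

Sender = Callable[[Dict[str, Any]], None]


def fan_out(records: Iterable[Dict[str, Any]], targets: Dict[str, Sender]) -> Dict[str, List[str]]:
    """Forward every record to every target and collect failures per target."""
    failures: Dict[str, List[str]] = {name: [] for name in targets}
    for record in records:
        for name, send in targets.items():
            try:
                send(record)
            except Exception as exc:
                # One failing target must not block delivery to the others.
                failures[name].append(f"{record.get('id')}: {exc}")
    return failures


queue_messages: List[Dict[str, Any]] = []


def unreliable_target(record: Dict[str, Any]) -> None:
    if record["id"] == "r2":
        raise RuntimeError("target unavailable")


records = [{"id": "r1"}, {"id": "r2"}]
failures = fan_out(records, {"queue": queue_messages.append, "other_function": unreliable_target})
```

Even in this toy form, the function has to decide what to do with partial failures, which hints at the error-handling burden a broker function takes on.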
While a valid approach for some niche use cases, using Lambda as a broker adds complexity and cost. You need to factor in additional messaging latency, error handling, and the operational burden of deploying and monitoring the fanout function. Generally it's best to use Lambda as a broker only when native integration is not possible.
Choosing the right event source
With the different options available, how do you choose the right event source for your pub/sub use case? Here are some of the key factors to consider:
- Scalability and throughput needs
- Ordering and duplicate-handling requirements
- Message size
- Latency sensitivity
- Durability and retention
- Ease of use and management
- Cost at your usage level
Here are some common scenarios and the event source that might fit best:
Scenario | Event Source | Rationale |
---|---|---|
Sending notifications to many subscribers | SNS | Simple pub/sub model, supports multiple subscriber types |
Real-time stream processing | Kinesis | High throughput, supports ordering, long retention |
Triggering on DynamoDB changes | DynamoDB streams | Tightly integrated with table, ordered per item with no duplicate stream records |
Routing messages across regions | Lambda as broker | Enables forwarding where direct integration is not possible |
In reality, many systems will utilize multiple event sources. You might use SNS for notification events, Kinesis for streaming data, and DynamoDB streams for database triggers. The key is to understand the tradeoffs and choose the right tool for each job.
Best practices for Lambda-based pub/sub
Whichever event source you use, there are some best practices to keep in mind for building reliable and scalable pub/sub systems with Lambda:
- Manage Lambda permissions with IAM roles and resource policies
- Control concurrency with reserved concurrency limits
- Avoid long-running tasks that may exceed Lambda timeouts
- Use dead-letter queues for messages that can't be processed
- Monitor performance with Lambda metrics and logging
- Load test to validate scalability and throughput
- Deploy and update functions with CI/CD pipelines
- Consider VPC and security configurations for private resources
Conclusion
AWS provides a spectrum of options for implementing pub/sub messaging with Lambda, each with its own strengths and tradeoffs. SNS is a simple and versatile managed pub/sub service, Kinesis enables high-throughput streaming use cases, and DynamoDB streams unlock database triggers. In some cases, you can even use Lambda as a message broker.
The right choice depends on your requirements for scalability, ordering, durability, and cost. By understanding the capabilities and tradeoffs of each option, you can design an event-driven architecture that is scalable, flexible, and fits your use case. As always with serverless, make sure to follow Lambda best practices around permissions, concurrency, monitoring, and testing for maximum reliability.
Pub/sub messaging is a powerful pattern for building loosely coupled, event-driven systems. By leveraging the managed event sources for Lambda, you can implement pub/sub in a scalable and resilient way without having to manage the underlying messaging infrastructure. Choose the right event source, follow best practices, and let Lambda do the heavy lifting of processing your messages.