How to Implement Log Aggregation for AWS Lambda Functions

As a serverless developer, it's critical to have full visibility into the execution of your Lambda functions for monitoring, auditing, and troubleshooting purposes. With a traditional server-based application, you would typically install a logging agent on each server to collect and forward logs to a central log aggregation service for analysis. But with the serverless paradigm, you don't have access to the underlying infrastructure running your code.

Thankfully, AWS Lambda provides built-in log aggregation to CloudWatch Logs with no extra setup required, as long as your function's execution role has the basic CloudWatch Logs permissions (the default execution role includes them). In this post, we'll dive deep into how Lambda logging works under the hood and explore best practices for implementing a scalable, cost-effective, and automated log aggregation pipeline for your serverless applications on AWS. Let's get started!

Understanding Lambda Logs and CloudWatch Logs

Whenever a Lambda function is invoked, any output written to stdout or stderr (e.g. using console.log in Node.js) is automatically captured by the Lambda runtime and sent to CloudWatch Logs asynchronously in the background. Because delivery happens outside the invocation path, this adds negligible overhead to your function's execution time and memory usage.
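
To see this in action, here's a minimal Node.js handler with no logging-specific setup at all; each console.log call below shows up in CloudWatch Logs as a log event:

exports.handler = async (event) => {
  // Anything written to stdout/stderr is captured by the runtime
  console.log('Processing event:', JSON.stringify(event));

  const result = { statusCode: 200 };
  console.log('Done, returning:', JSON.stringify(result));
  return result;
};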

In CloudWatch Logs, a separate log group is created for each of your Lambda functions. Within each log group, log streams are created – one for every concurrent execution environment of that function. So if your function is processing multiple events simultaneously, their logs will appear in separate streams within the log group.

Here's an example of what the CloudWatch Logs console looks like for a Node.js Lambda function:

/aws/lambda/my-function-name
    2022/04/01/[$LATEST]0123456789abcdef0
        START RequestId: 0123456789abcdef0 Version: $LATEST
        2022-04-01T12:34:56.789Z    0123456789abcdef0   INFO    Hello from Lambda!  
        END RequestId: 0123456789abcdef0
        REPORT RequestId: 0123456789abcdef0 Duration: 123.45 ms Billed Duration: 124 ms Memory Size: 128 MB Max Memory Used: 57 MB  
    2022/04/01/[$LATEST]abcdef0123456789ab
        START RequestId: abcdef0123456789ab Version: $LATEST
        2022-04-01T12:34:57.890Z    abcdef0123456789ab  INFO    Hello again from Lambda!
        ...

As you can see, each invocation generates START, END and REPORT lines with metadata about the request, in addition to any application logs. This structured logging makes it easy to search and filter your logs in CloudWatch.

While CloudWatch Logs provides basic log viewing and searching capabilities, for production applications you'll likely want to forward your logs to a more powerful log aggregation service like Elasticsearch, Splunk, Datadog, or Sumo Logic. Let's look at some options for integrating CloudWatch with external logging solutions.

Streaming Lambda Logs to Amazon Elasticsearch

If you're already using the managed Amazon Elasticsearch Service (Amazon ES), the simplest way to aggregate your Lambda logs is to stream them directly from CloudWatch to Amazon ES. You can set this up right from the CloudWatch Logs console by selecting the log group and choosing "Stream to Amazon Elasticsearch Service".

You'll need to provide the endpoint of your Elasticsearch cluster, as well as the index name and IAM role to use. CloudWatch will then begin forwarding logs to Elasticsearch in near real-time, where you can view and analyze them using Kibana.

Keep in mind that Amazon ES has some limitations compared to self-hosted or other managed Elasticsearch offerings. Be sure to evaluate factors like cost, scale, supported Elasticsearch versions, and plugin availability to determine if it meets your needs.

Forwarding Logs to Other Services via Lambda

If you want to send logs to a service other than Amazon ES, you can use a Lambda function as an intermediary to ship logs from CloudWatch to your desired destination. AWS provides several blueprints for this purpose in the Lambda console, with pre-written functions for services like Sumo Logic, Splunk, and Loggly.

Here's a simple example of a Node.js function that forwards logs to a hypothetical logging HTTP endpoint:

const https = require('https');
const zlib = require('zlib');

exports.handler = async (event) => {
  // CloudWatch Logs delivers data base64-encoded and gzip-compressed
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const parsed = JSON.parse(zlib.gunzipSync(payload).toString('utf8'));
  console.log('Received log events:', JSON.stringify(parsed));

  const logEvents = parsed.logEvents.map((logEvent) => ({
    timestamp: new Date(logEvent.timestamp).toISOString(),
    message: logEvent.message
  }));

  const data = JSON.stringify(logEvents);

  const options = {
    hostname: 'logs.example.com',
    port: 443,
    path: '/logs',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Use the byte length, not the string length, for multi-byte safety
      'Content-Length': Buffer.byteLength(data)
    }
  };

  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      console.log('Status code:', res.statusCode);
      if (res.statusCode >= 400) {
        reject(new Error(`Log endpoint returned ${res.statusCode}`));
      } else {
        resolve();
      }
    });

    req.on('error', (err) => {
      console.error('Error shipping logs:', err);
      reject(err);
    });

    req.write(data);
    req.end();
  });
};

Here the function decompresses and parses the CloudWatch Logs payload, extracts the relevant fields, and sends them to an HTTPS endpoint. You would deploy this function and then subscribe your desired log groups to invoke it.
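
As a rough sketch, subscribing a log group to the forwarder with the AWS SDK for JavaScript might look like this (the function ARN and filter name are placeholders, and the forwarder must also have a resource-based policy allowing CloudWatch Logs to invoke it):

const AWS = require('aws-sdk');
const logs = new AWS.CloudWatchLogs();

async function subscribeLogGroup(logGroupName) {
  await logs.putSubscriptionFilter({
    logGroupName,
    filterName: 'ship-to-forwarder',  // placeholder name
    filterPattern: '',                // empty pattern matches every log event
    destinationArn: 'arn:aws:lambda:us-east-1:123456789012:function:log-forwarder' // placeholder ARN
  }).promise();
}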

However, there are a few important considerations when implementing a log forwarding function:

  1. Ensure you include logic to handle failures and retries to avoid losing log data if the destination is unavailable. You may want to use a library like async.retry or roll your own exponential backoff to resend failed requests (see the sketch after this list).

  2. Be mindful of the execution time of your log forwarding function. Remember that you pay for Lambda invocations based on their duration, so try to batch log events together and keep processing time minimal. You can configure the batch size when subscribing a log group to Lambda.

  3. Monitor the health of your logging infrastructure by tracking metrics like total invocations, errors, latency, and unprocessed log events using CloudWatch metrics. Set up alarms to notify you if failures exceed a threshold.
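
Here's a minimal sketch of the retry logic from point 1, wrapping a hypothetical shipLogs helper (i.e. the HTTPS request from the forwarder above) in simple exponential backoff:

// Retry an async operation with exponential backoff
async function withRetries(operation, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of attempts, surface the error
      const delayMs = 100 * 2 ** attempt;     // 200 ms, 400 ms, 800 ms, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage inside the forwarder handler:
// await withRetries(() => shipLogs(data));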

In the next section, we'll look at how to automate subscribing new log groups and managing retention policies to keep your logging pipeline running smoothly.

Automating Log Subscriptions and Retention Policies

As you add more and more Lambda functions to your application, subscribing each new log group to your forwarder function can become a maintenance burden. Instead, you can automatically subscribe new log groups as they are created using CloudTrail and CloudWatch Events.

First, make sure you have a CloudTrail trail that records management events, so the CreateLogGroup API call is captured. Then, create a new CloudWatch Events rule that triggers on CloudTrail events where the eventName is CreateLogGroup:

resources:
  Resources:
    SubscribeRule: 
      Type: AWS::Events::Rule
      Properties:
        Description: Trigger subscription function on new log group
        EventPattern:
          source: 
            - aws.logs
          detail-type: 
            - AWS API Call via CloudTrail
          detail: 
            eventSource:
              - logs.amazonaws.com
            eventName:
              - CreateLogGroup
        State: ENABLED
        Targets:
          - Arn: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:your-subscribe-function"
            Id: SubscribeFunction

Here we're using CloudFormation to define the event rule, but you can also set it up in the CloudWatch console. The rule triggers a Lambda function that calls the PutSubscriptionFilter API to subscribe the new log group to your forwarder function. (You'll also need a resource-based permission on the subscription function allowing events.amazonaws.com to invoke it.)

Be sure to add logic to avoid infinitely invoking the subscription function by checking that the log group name does not match the one for the forwarder itself! (The example below includes this check.)

Next, let‘s address log retention. By default, when Lambda creates a new log group for a function, its retention policy is set to "Never Expire". While this ensures you never lose logs, the storage costs from CloudWatch Logs can balloon over time, especially with frequently invoked functions.

Since you're already streaming logs to an external service, you likely don't need to keep them in CloudWatch indefinitely. You can update your subscribe function to also call the PutRetentionPolicy API and set log groups to automatically expire after a reasonable period, such as 7 or 30 days.

Here's an example in Python (the forwarder ARN and filter name are placeholders to replace with your own):

import boto3

logs = boto3.client('logs')

# Placeholder ARN and log group name for your forwarder function
FORWARDER_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:log-forwarder'
FORWARDER_LOG_GROUP = '/aws/lambda/log-forwarder'

def lambda_handler(event, context):
    log_group_name = event['detail']['requestParameters']['logGroupName']

    # Skip the forwarder's own log group to avoid an infinite invocation loop
    if log_group_name == FORWARDER_LOG_GROUP:
        return

    # Subscribe the new log group to the forwarder
    logs.put_subscription_filter(
        logGroupName=log_group_name,
        filterName='ship-to-forwarder',
        filterPattern='',  # empty pattern matches all log events
        destinationArn=FORWARDER_ARN
    )

    # Update retention policy to 30 days
    logs.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=30
    )

If you have existing log groups you need to update, you can write a simple script using the DescribeLogGroups API to list all log groups and then call PutRetentionPolicy on each. Be sure to paginate the results in case you have more than 50 log groups (the default page size), as the sketch below does.
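
Sticking with Node.js for consistency with the forwarder, a backfill script might look like this (the 30-day retention value is just an example):

const AWS = require('aws-sdk');
const logs = new AWS.CloudWatchLogs();

async function setRetentionForAllLogGroups(retentionInDays = 30) {
  let nextToken;
  do {
    // DescribeLogGroups returns at most 50 groups per page
    const page = await logs.describeLogGroups({ nextToken }).promise();
    for (const group of page.logGroups) {
      await logs.putRetentionPolicy({
        logGroupName: group.logGroupName,
        retentionInDays
      }).promise();
    }
    nextToken = page.nextToken;
  } while (nextToken);
}

setRetentionForAllLogGroups().catch(console.error);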

For a complete working example of automating log subscriptions and retention policies, check out this open source Serverless Framework project on GitHub: https://github.com/serverless/examples/tree/master/aws-node-cloudwatch-logs-subscription

With these automation best practices in place, you can rest assured that your Lambda logs will flow seamlessly to your centralized log aggregation service without any manual toil. You'll have full visibility to monitor and debug your serverless application with the power and flexibility of your preferred logging solution.

Conclusion

Effective log aggregation is a critical component of any production serverless application. By taking advantage of Lambda's built-in CloudWatch integration and adding a few simple automation steps, you can implement a scalable, cost-effective, and low-maintenance logging pipeline for your Lambda functions.

Whether you choose to stream directly to Amazon Elasticsearch or use a Lambda forwarder to send logs to an external service, aggregating your Lambda logs will give you deeper visibility to optimize performance, troubleshoot bugs, and ensure the health of your serverless architecture. I hope this guide has equipped you with the tools and knowledge you need to implement log aggregation in your own AWS environment.

If you have any questions or suggestions to improve this logging setup, let me know in the comments below. Happy logging!