All managed services will have trade-offs. When Scribd adopted AWS ElastiCache we could no longer use Datadog’s excellent Redis integration and lost some killer metrics we couldn’t live without. We deployed the AWS ElastiCache integration for Datadog which returned the desired metrics back to our dashbards with one notable exception: “slowlog” metrics.
The Redis SLOWLOG
is used to help identify queries
which are taking too long to execute. We use the slowlog metrics provided by the
Datadog Redis integration alert us when a Redis server’s behavior starts to go
south, a key indicator of looming user-impactful production issues.
Since AWS ElastiCache is a managed service, we obviously cannot deploy a Datadog agent onto AWS’ servers to run the Datadog Redis integration. The approach we have taken, which we have now open sourced, is to use AWS Lambda to periodically query our ElastiCache Redis instances and submit the missing slowlog metrics directly to Datadog, just as the Redis integration would have done.
The Lambda job
The first part of the equation is our Lambda job:
elasticache-slowlog-to-datadog
which connects to an AWS ElastiCache host (determined by the REDIS_HOST
parameter),
gather its slowlogs, and submit a
HISTOGRAM
metric type to Datadog. Basically mirroring the functionality of the Datadog Redis integration.
The application is packaged with its required libraries as a ready-to-deploy archive in our releases page. To deploy directly to AWS from the console, upload the “Full zip distribution” and supply the required parameters. I’d recommend using our Terraform module, however.
The Terraform Module
The second part of the equation is the Terraform module: terraform-elasticache-slowlog-to-datadog which will apply the elasticache-slowlog-to-datadog Lambda job to target AWS accounts and ElastiCache instances.
When Lambda jobs include libraries that must be vendored in, as
elasticache-slowlog-to-datadog
does, the existing patterns include building
locally, or uploading artifacts to
S3.
However, I like the approach of maintaining a separate repository and build
pipeline, as this works around Terraform’s intentionally limited build
functionality.
The terraform module consumes the
elasticache-slowlog-to-datadog
artifact.
Usage
To deploy elasticache-slowlog-to-datadog via Terraform, add the following to your terraform file:
module slowlog_check {
source = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
elasticache_endpoint = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
elasticache_security_groups = ["sg-12345"]
subnet_ids = [ "subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef", ]
vpc_id = "vpc-0123456789abcdef"
datadog_api_key = "abc123"
datadog_app_key = "abc123"
namespace = "example"
env = "dev"
tags = {"foo" = "bar"}
}
Conclusion
Using AWS Lambda, we can supplement the metrics we get natively from Datadog’s AWS ElastiCache integration.
Stay apprised of future developments by watching our release pages: