Copy edit and spice up the introduction for Jim's slowlog post

R Tyler Croy 2020-04-28 09:24:17 -07:00
parent d02e40f06d
commit 10401f6798
No known key found for this signature in database
GPG Key ID: E5C92681BEF6CEA2
3 changed files with 101 additions and 57 deletions


@ -1,57 +0,0 @@
---
layout: post
title: "Sending Elasticache slowlog metrics to Datadog"
authors:
- jimp
tags:
- terraform
- monitoring
team: Core Infrastructure
---
# Sending Elasticache slowlog metrics to Datadog
We've recently migrated our Redis workloads to AWS Elasticache. We really like having a managed service, since we don't have any more Redis servers to maintain. However, as with all managed services, there are some tradeoffs. One of those tradeoffs was that we no longer had access to all of [Datadog's Redis integration's](https://docs.datadoghq.com/integrations/redisdb/) features. Instead, we have [Datadog's AWS Elasticache integration](https://docs.datadoghq.com/integrations/amazon_elasticache/#overview). One of the most noticeable features we saw as missing was the lack of slowlog metrics in AWS Elasticache. This metric is useful as it gives us valuable data to alert against when Redis behavior starts running afoul. There is no ability to run a Datadog agent on Elasticache's servers, so we had to obtain the metric some other way.
We decided to use AWS Lambda to periodically query our Elasticache redis instances and submit those missing slowlog metrics directly to Datadog, much as the datadog-agent integration would have otherwise.  
## The Lambda job
We wrote [https://github.com/scribd/elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog) to connect to an AWS Elasticache job (given by the REDIS_HOST parameter), gather its slowlogs, and submit a [HISTOGRAM](https://docs.datadoghq.com/developers/metrics/types/?tab=histogram) metric type to Datadog, consistent with Datadog's Redis integration.
The application is packaged with its required libraries as a ready-to-deploy archive in our [releases page](https://github.com/scribd/elasticache-slowlog-to-datadog/releases). To deploy directly to AWS from the console, upload the “Full zip distribution” and supply the [required parameters](https://github.com/scribd/elasticache-slowlog-to-datadog#parameters). I'd recommend using our Terraform wrapper, however.
## The Terraform wrapper
We wrote [https://github.com/scribd/terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog) to apply the elasticache-slowlog-to-datadog lambda job to target AWS accounts and Elasticache instances. 
When lambda jobs include libraries that must be vendored in, as elasticache-slowlog-to-datadog does, the existing patterns include [building locally, or uploading artifacts to S3](https://www.terraform.io/docs/providers/aws/r/lambda_function.html#specifying-the-deployment-package). However, I like the approach of maintaining a separate repository and build pipeline, as this works around Terraform's [intentionally limited build functionality](https://github.com/hashicorp/terraform/issues/8344#issuecomment-361014199). Instead, the terraform wrapper merely [consumes the elasticache-slowlog-to-datadog artifact](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/blob/master/main.tf#L97).
## Usage
To deploy elasticache-slowlog-to-datadog via terraform, add the following to your terraform file: 
```
module slowlog_check {
  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
  elasticache_security_groups = ["sg-12345"]
  subnet_ids                  = [ "subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef", ]
  vpc_id                      = "vpc-0123456789abcdef"
  datadog_api_key             = "abc123"
  datadog_app_key             = "abc123"
  namespace                   = "example"
  env                         = "dev"
  tags                        = {"foo" = "bar"}
}
```
## Conclusion
Using AWS Lambda, we can supplement the metrics we get natively from Datadog's AWS Elasticache integration.
Stay apprised of future developments by watching our release pages: 
- [elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog/releases)
- [terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/releases)


@ -0,0 +1,95 @@
---
layout: post
title: "Sending ElastiCache slowlog metrics to Datadog"
authors:
- jimp
tags:
- terraform
- elasticache
- aws
- monitoring
team: Core Infrastructure
---
All managed services come with trade-offs. When we adopted AWS ElastiCache, we
could no longer use Datadog's excellent [Redis
integration](https://docs.datadoghq.com/integrations/redisdb/)
and some killer metrics we couldn't live without.
We deployed the [AWS ElastiCache
integration](https://docs.datadoghq.com/integrations/amazon_elasticache/#overview)
for Datadog, which brought some of the desired metrics back to our dashboards,
with one notable exception: "slowlog" metrics. The Redis
[`SLOWLOG`](https://redis.io/commands/slowlog) helps identify queries
that take too long to execute. We use the slowlog metrics provided by the
Datadog Redis integration to alert us when a Redis server's behavior starts to
go south, a key indicator of looming user-impacting production issues.

Since AWS ElastiCache is a managed service, we cannot deploy a
Datadog agent onto AWS' servers to run the Datadog Redis integration. The
approach we have taken, which we have now open sourced, is to use AWS Lambda to
periodically query our ElastiCache Redis instances and submit the missing
slowlog metrics _directly_ to Datadog, just as the Redis integration would have
done.
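
A Lambda that "periodically" polls Redis needs something to invoke it on a
schedule. In Terraform, one common way to do that is a CloudWatch Events rule.
The sketch below is a minimal, hypothetical example and not necessarily how our
module wires things up. The `lambda_function_arn` and `lambda_function_name`
variables, the rule name, and the one-minute interval are all assumptions.

```hcl
# Hypothetical inputs: the ARN and name of an already-deployed Lambda function.
variable "lambda_function_arn" {
  type = string
}

variable "lambda_function_name" {
  type = string
}

# Fire once a minute (an assumed interval, not the project's actual setting).
resource "aws_cloudwatch_event_rule" "slowlog_check" {
  name                = "slowlog-check-schedule"
  schedule_expression = "rate(1 minute)"
}

# Point the scheduled rule at the Lambda function.
resource "aws_cloudwatch_event_target" "slowlog_check" {
  rule = aws_cloudwatch_event_rule.slowlog_check.name
  arn  = var.lambda_function_arn
}

# Grant CloudWatch Events permission to invoke the function.
resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.slowlog_check.arn
}
```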
## The Lambda job
The first part of the equation is our Lambda job,
[elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog),
which connects to an AWS ElastiCache host (determined by the `REDIS_HOST` parameter),
gathers its slowlog entries, and submits a
[`HISTOGRAM`](https://docs.datadoghq.com/developers/metrics/types/?tab=histogram)
metric to Datadog, mirroring the functionality of the Datadog Redis integration.
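
To make the histogram idea a little more concrete, here is a rough sketch of
the summary one polling run might compute. Two assumptions apply, and the
project does not confirm either of them here. First, each run looks only at
slowlog entries whose start time falls after the previous run. Second, it
reports the aggregates Datadog computes by default for histograms: count,
average, median, 95th percentile, and maximum. Each `d_i` is the execution time
in microseconds that `SLOWLOG GET` reports for entry `i`, `s_i` is that entry's
Unix start timestamp, and `P_k` is the k-th percentile. A run with no new
entries has nothing to average.

```latex
D = \{\, d_i \;:\; t_{\mathrm{prev}} < s_i \le t_{\mathrm{now}} \,\}

\mathrm{count} = |D|, \qquad
\mathrm{avg} = \frac{1}{|D|} \sum_{d \in D} d, \qquad
\mathrm{max} = \max D

\mathrm{median} = P_{50}(D), \qquad
\mathrm{95percentile} = P_{95}(D)
```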
The application is packaged with its required libraries as a ready-to-deploy
archive on our [releases
page](https://github.com/scribd/elasticache-slowlog-to-datadog/releases). To
deploy directly to AWS from the console, upload the “Full zip distribution” and
supply the [required
parameters](https://github.com/scribd/elasticache-slowlog-to-datadog#parameters).
I'd recommend using our Terraform wrapper, however.
## The Terraform wrapper
The second part of the equation is our Terraform module,
[terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog),
which applies the elasticache-slowlog-to-datadog Lambda job to target AWS
accounts and ElastiCache instances.
When a Lambda job includes libraries that must be vendored in, as
`elasticache-slowlog-to-datadog` does, the usual patterns are [building
locally or uploading artifacts to
S3](https://www.terraform.io/docs/providers/aws/r/lambda_function.html#specifying-the-deployment-package).
However, I prefer maintaining a separate repository and build pipeline, which
works around Terraform's [intentionally limited build
functionality](https://github.com/hashicorp/terraform/issues/8344#issuecomment-361014199).
The Terraform wrapper then merely [consumes the
elasticache-slowlog-to-datadog
artifact](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/blob/master/main.tf#L97).
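
For illustration, here is a minimal sketch of what consuming a prebuilt
artifact can look like. The zip downloaded from the releases page is handed to
`aws_lambda_function` as-is, so Terraform builds nothing itself, and
`source_code_hash` triggers a redeploy only when the archive changes. This is
not the module's actual code. The variable names and function name are
hypothetical, and the handler and runtime are left as inputs rather than
guessed. The VPC settings and Datadog parameters a real deployment needs are
also omitted. Only `REDIS_HOST` comes from the project's parameters.

```hcl
# Hypothetical inputs: path to a release zip downloaded from the releases page,
# plus the handler, runtime, and IAM role appropriate for that release.
variable "artifact_path" {
  type = string
}

variable "handler" {
  type = string
}

variable "runtime" {
  type = string
}

variable "lambda_role_arn" {
  type = string
}

variable "redis_host" {
  type = string
}

resource "aws_lambda_function" "slowlog_check" {
  function_name = "slowlog-check"
  role          = var.lambda_role_arn
  handler       = var.handler
  runtime       = var.runtime

  # Deploy the prebuilt archive unchanged; redeploy only when its hash changes.
  filename         = var.artifact_path
  source_code_hash = filebase64sha256(var.artifact_path)

  environment {
    variables = {
      REDIS_HOST = var.redis_host
    }
  }
}
```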
## Usage
To deploy elasticache-slowlog-to-datadog via Terraform, add the following to your Terraform configuration:
```hcl
# Example values only: substitute your own endpoint, network IDs, and Datadog keys.
module "slowlog_check" {
  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
  elasticache_security_groups = ["sg-12345"]
  subnet_ids                  = ["subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef"]
  vpc_id                      = "vpc-0123456789abcdef"
  datadog_api_key             = "abc123"
  datadog_app_key             = "abc123"
  namespace                   = "example"
  env                         = "dev"
  tags                        = { "foo" = "bar" }
}
```
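
The example above hardcodes the Datadog keys for brevity. If you would rather
keep them out of version control, one option is to read them with the
`aws_ssm_parameter` data source. The sketch below assumes you have already
stored the keys as SecureString parameters in AWS Systems Manager Parameter
Store, and the parameter names shown are hypothetical. The decrypted values
still end up in Terraform state, so protect your state accordingly.

```hcl
# Hypothetical parameter names; create them as SecureStrings beforehand.
data "aws_ssm_parameter" "datadog_api_key" {
  name            = "/example/datadog/api_key"
  with_decryption = true
}

data "aws_ssm_parameter" "datadog_app_key" {
  name            = "/example/datadog/app_key"
  with_decryption = true
}

module "slowlog_check" {
  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
  elasticache_security_groups = ["sg-12345"]
  subnet_ids                  = ["subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef"]
  vpc_id                      = "vpc-0123456789abcdef"
  # Keys are read from Parameter Store instead of being written inline.
  datadog_api_key             = data.aws_ssm_parameter.datadog_api_key.value
  datadog_app_key             = data.aws_ssm_parameter.datadog_app_key.value
  namespace                   = "example"
  env                         = "dev"
  tags                        = { "foo" = "bar" }
}
```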
## Conclusion
Using AWS Lambda, we can supplement the metrics we get natively from Datadog's AWS ElastiCache integration.
Stay apprised of future developments by watching our release pages:
- [elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog/releases)
- [terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/releases)

tag/elasticache/index.md Normal file

@ -0,0 +1,6 @@
---
layout: tag_page
title: "Tag: elasticache"
tag: elasticache
robots: noindex
---