---
layout: post
title: "How Scribd manages Datadog's AWS integration using Terraform"
date: 2020-04-22
authors:
  - jimp
  - qphou
tags:
  - featured
  - terraform
  - monitoring
team: Core Infrastructure
---

Datadog comes with a built-in AWS integration that ships CloudWatch metrics to your Datadog account. Once enabled, the integration automatically synchronizes whitelisted CloudWatch metrics into your Datadog account.

While this integration is powerful and convenient to use, its setup process is quite involved. As outlined in Datadog's documentation, it requires 18 manual steps, including:

  • finding the right AWS account ID
  • creating the right IAM policy
  • copy-pasting the right AWS resource ID into the Datadog UI
  • etc.
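To make those steps concrete, here is a rough sketch of what they amount to when written directly in Terraform with the Datadog and AWS providers. It is not the terraform-aws-datadog module's actual code. The role name, region, variable names and the heavily trimmed permission list are assumptions. 464622532012 is the AWS account ID that Datadog's documentation uses in the integration's trust policy for its default site.

```hcl
# Sketch only: NOT the terraform-aws-datadog module's code.
# Role name, region, variable names and the trimmed permissions are assumptions.
terraform {
  required_providers {
    aws     = { source = "hashicorp/aws" }
    datadog = { source = "DataDog/datadog" }
  }
}

variable "datadog_api_key" {}
variable "datadog_app_key" {}

provider "aws" {
  region = "us-east-1"
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

locals {
  role_name = "DatadogAWSIntegrationRole"
}

# Manual step: find the right AWS account ID
data "aws_caller_identity" "current" {}

# Manual step: register the account in Datadog, which issues an external ID
resource "datadog_integration_aws" "this" {
  account_id = data.aws_caller_identity.current.account_id
  role_name  = local.role_name
}

# Manual step: let Datadog's AWS account assume a role, guarded by the external ID
data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::464622532012:root"]
    }
    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [datadog_integration_aws.this.external_id]
    }
  }
}

resource "aws_iam_role" "datadog" {
  name               = local.role_name
  assume_role_policy = data.aws_iam_policy_document.assume.json
}

# Manual step: create the IAM policy (heavily trimmed from Datadog's full list)
data "aws_iam_policy_document" "read" {
  statement {
    actions = [
      "cloudwatch:Get*",
      "cloudwatch:List*",
      "ec2:Describe*",
      "tag:GetResources",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "datadog" {
  name   = "DatadogAWSIntegrationPolicy"
  role   = aws_iam_role.datadog.id
  policy = data.aws_iam_policy_document.read.json
}
```

The external ID flows from the Datadog resource into the role's trust policy. This avoids the copy-paste step, because Terraform resolves the dependency order on its own.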

If you have more than a few AWS accounts, you may prefer to use Terraform.

In this blog post, we would like to share how Scribd uses Terraform to automate our Datadog and AWS integration across the organization.

Enable Datadog's built-in AWS integration

To address this problem, we built the terraform-aws-datadog module. With only a couple of lines of HCL, Terraform performs all the steps needed to set up the Datadog integration for a specific AWS account, following Scribd's best practices:

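```hcl
# Resolves the ID of the AWS account Terraform is currently running against
data "aws_caller_identity" "current" {}

variable "datadog_api_key" {} # supplied via a .tfvars file or TF_VAR_datadog_api_key

module "datadog" {
  source          = "git::https://github.com/scribd/terraform-aws-datadog.git?ref=master"
  aws_account_id  = data.aws_caller_identity.current.account_id
  datadog_api_key = var.datadog_api_key
  env             = "prod"
  namespace       = "team_foo"
}
```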

From an AWS account maintainer's point of view, the benefit is that the module offers a convenient way to inherit centralized best practices. For module maintainers, any change to the Datadog integration module can be released through the standard Terraform module release process.
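One way this can look across several accounts is one module instance per account, each pinned to a release tag instead of master, so that each account picks up changes only when its ref is bumped. This is a hypothetical layout. The provider profiles, the v1.0.0 tag and the "staging" env value are illustrative. It also assumes the module inherits its aws provider from the caller rather than configuring one itself.

```hcl
# Hypothetical multi-account layout. Profiles, the v1.0.0 tag and the
# "staging" env value are illustrative assumptions, not Scribd's real setup.
variable "datadog_api_key" {}

provider "aws" {
  alias   = "prod"
  region  = "us-east-1"
  profile = "prod"
}

provider "aws" {
  alias   = "staging"
  region  = "us-east-1"
  profile = "staging"
}

data "aws_caller_identity" "prod" {
  provider = aws.prod
}

data "aws_caller_identity" "staging" {
  provider = aws.staging
}

# Each instance runs against its own account via the aliased provider
module "datadog_prod" {
  source          = "git::https://github.com/scribd/terraform-aws-datadog.git?ref=v1.0.0"
  providers       = { aws = aws.prod }
  aws_account_id  = data.aws_caller_identity.prod.account_id
  datadog_api_key = var.datadog_api_key
  env             = "prod"
  namespace       = "team_foo"
}

module "datadog_staging" {
  source          = "git::https://github.com/scribd/terraform-aws-datadog.git?ref=v1.0.0"
  providers       = { aws = aws.staging }
  aws_account_id  = data.aws_caller_identity.staging.account_id
  datadog_api_key = var.datadog_api_key
  env             = "staging"
  namespace       = "team_foo"
}
```

Pinning to a tag trades automatic updates for predictability. A new module release reaches an account only after someone changes its ref and runs terraform apply.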

CloudWatch log synchronization

Initially, the module only set up the base integration. As adoption increased, various teams added more features to it. One of these features automates log ingestion from CloudWatch.

As with the official AWS integration, the manual instructions for log synchronization are a bit overwhelming.

However, using the terraform-aws-datadog module, we can enable the feature with a single parameter:

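```hcl
variable "datadog_api_key" {} # supplied via a .tfvars file or TF_VAR_datadog_api_key

module "datadog" {
  source                = "git::https://github.com/scribd/terraform-aws-datadog.git?ref=master"
  datadog_api_key       = var.datadog_api_key
  env                   = "prod"
  namespace             = "project_foo"
  # Each listed log group gets a trigger that forwards its logs to Datadog
  cloudwatch_log_groups = ["cloudwatch_log_group_1", "cloudwatch_log_group_2"]
}
```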

That's it. Terraform automatically creates the Datadog serverless function and the triggers for the specified log groups, forwarding all of their CloudWatch logs into Datadog. After running terraform apply, you should see logs showing up in Datadog within minutes.
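Under the hood, a "trigger" here is the usual pattern for Lambda-based log forwarding. Each log group needs a Lambda permission that lets CloudWatch Logs invoke the function, plus a subscription filter that points the group at it. The sketch below shows only that wiring. It assumes a forwarder function already exists, and var.forwarder_arn is a hypothetical input. It is not the module's code, which also creates the forwarder itself.

```hcl
# Sketch only: wires log groups to an existing forwarder Lambda.
# var.forwarder_arn is a hypothetical input, not a module parameter.
variable "forwarder_arn" {}

variable "cloudwatch_log_groups" {
  type    = list(string)
  default = ["cloudwatch_log_group_1", "cloudwatch_log_group_2"]
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Allow CloudWatch Logs to invoke the forwarder on behalf of each log group
resource "aws_lambda_permission" "logs" {
  for_each = toset(var.cloudwatch_log_groups)
  # Statement IDs only allow [a-zA-Z0-9-_], so other characters are replaced
  statement_id  = "AllowCloudWatchLogs-${replace(each.key, "/[^a-zA-Z0-9_-]/", "-")}"
  action        = "lambda:InvokeFunction"
  function_name = var.forwarder_arn
  principal     = "logs.${data.aws_region.current.name}.amazonaws.com"
  source_arn    = "arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:log-group:${each.key}:*"
}

# An empty filter pattern matches every log event in the group
resource "aws_cloudwatch_log_subscription_filter" "datadog" {
  for_each        = toset(var.cloudwatch_log_groups)
  name            = "datadog-forwarder"
  log_group_name  = each.key
  filter_pattern  = ""
  destination_arn = var.forwarder_arn
  depends_on      = [aws_lambda_permission.logs]
}
```

The depends_on ensures the permission exists before AWS validates the subscription filter's destination.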

Future work

With both metrics and logs synchronized into Datadog, we can use Datadog as the central hub for all things monitoring. We plan to add more features to the module as we migrate Scribd's infrastructure into AWS.

Metrics ingested through the official AWS integration are delayed by a couple of minutes, which makes them less than ideal as signals for monitoring critical systems. Automating the Datadog Agent setup could enable real-time metrics synchronization.

The datadog-serverless-functions repo contains two other Lambda-based AWS augmentations that we may add to the module as features: vpc_flow_log_monitoring and rds_enhanced_monitoring.

Stay apprised of future releases by watching our release page.

Special shout-out to Taylor McClure and Hamilton Hord for starting the project, and to Sai Kiran Burle, Kamran Farhadi and Eugene Pimenov for improvements and bug fixes.