lambda-delta-optimize/README.adoc

2.2 KiB

<html lang="en"> <head> </head>

Delta Optimize Lambda

This Lambda function can be used with a periodic trigger to optimize a configured Delta Lake table. Consult the deployment.tf file for an example of how to provision the function in AWS.

Building

Building and testing the Lambda can be done with cargo: cargo test build.

In order to deploy this in AWS Lambda, it must first be built with the cargo lambda command line tool, e.g.:

cargo lambda build --release --output-format zip

This will produce the file: target/lambda/lambda-delta-optimize/bootstrap.zip

Infrastructure

The deployment.tf file contains the necessary Terraform to provision the function, a DynamoDB table for locking, and IAM permissions. This Terraform does not provision an S3 bucket to optimize.

After configuring the necessary authentication for Terraform, the following steps can be used to provision:

cargo lambda build --release --output-format zip
terraform init
terraform plan
terraform apply
Note

Terraform configures the Lambda to run with the smallest amount of memory allowed. For a sizable table, this may not be sufficient for larger tables.

Environment variables

The following environment variables must be set for the function to run properly

Name Value Notes

DATALAKE_LOCATION

s3://my-bucket-name/databases/bronze/http

The s3:// URL of the desired table to optimize.

AWS_S3_LOCKING_PROVIDER

dynamodb

This instructs the deltalake crate to use DynamoDB for locking to provide consistent writes into s3.

OPTIMIZE_DS

yesterday

Only apply optimizations to the ds partition (YYYY-mm-dd), the yesterday value will use the previous day UTC

Licensing

This repository is intentionally licensed under the AGPL 3.0. If your organization is interested in re-licensing this function for re-use, contact me via email for commercial licensing terms: rtyler@brokenco.de

</html>