A simple Lambda function for optimizing Delta tables.
Go to file
R Tyler Croy 65d34abb9c Complete the necessary terraform code for all the necessary resources to run
This terraform code doesn't actually provision a bucket, the function assumes
that a bucket is defined elsewhere
2023-03-26 22:27:43 -07:00
src Initial commit of an optimization function 2023-03-26 21:14:43 -07:00
.gitignore Complete the necessary terraform code for all the necessary resources to run 2023-03-26 22:27:43 -07:00
Cargo.lock Initial commit of an optimization function 2023-03-26 21:14:43 -07:00
Cargo.toml Enable the most optimized release profile possible 2023-03-26 21:18:50 -07:00
LICENSE.txt Initial commit of an optimization function 2023-03-26 21:14:43 -07:00
README.adoc Initial commit of an optimization function 2023-03-26 21:14:43 -07:00
deployment.tf Complete the necessary terraform code for all the necessary resources to run 2023-03-26 22:27:43 -07:00

README.adoc

<html lang="en"> <head> </head>

Optimizing Lambda

This Lambda function exists to optimize the given table on a regular basis.

Infrastructure requirements

In order to deploy this in AWS Lambda, it must first be built with the cargo lambda command line tool, e.g.:

cargo lambda build --release --output zip

This will produce the file: target/lambda/http-to-delta/bootstrap.zip

Environment variables

Name Value Notes

DATALAKE_LOCATION

s3://my-bucket-name/databases/bronze/http

The s3:// URL of the desired bucket to be written, with the prefix for the specific table this function should write to such as in the example value.

AWS_S3_LOCKING_PROVIDER

dynamodb

This instructs the deltalake crate to use DynamoDB for locking to provide consistent writes into s3.

AWS configuration

In addition to setting up the Lambda function with the custom runtime, an S3 bucket for writing Delta records to must be created. The execution role for the Lambda must have access to perform S3 operations on that bucket, as well as access DynamoDB.

Create a DynamoDB table named delta_rs_lock_table with the partition key of key. This will ensure consistent writes to S3 among multiple Lambda functions.

</html>