R Tyler Croy
24834b9d13
Checkpoint every 10 commits.
Technically I suppose you could use the checkpoint lambda, but it's literally
one function call here :P
2023-05-07 19:16:43 -07:00
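The "every 10 commits" rule amounts to a modulo check on the table version that was just committed. Below is a minimal sketch. The constant and function names are hypothetical. Skipping version 0 is an assumption. The actual delta-rs checkpoint call is left out.

```rust
/// Hypothetical interval taken from the commit subject ("every 10 commits").
const CHECKPOINT_INTERVAL: i64 = 10;

/// Returns true when the freshly committed table version should be followed
/// by a checkpoint. Skipping version 0 (table creation) is an assumption.
fn should_checkpoint(version: i64) -> bool {
    version > 0 && version % CHECKPOINT_INTERVAL == 0
}
```

The caller would run this check right after a successful commit and only then write the checkpoint.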
R Tyler Croy
aa8f2df8f2
Url decoded keys are needed much earlier in the processing of events
This commit incorporates a hack to fill in url_decoded_key, which aws_lambda_events
does not populate, to make everything easier downstream of the event loop
2023-05-07 19:09:12 -07:00
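S3 event notifications deliver the object key URL-encoded, with spaces encoded as `+`. A hive-style segment like `ds=2023-05-07` can therefore arrive with `=` written as `%3D`. Below is a minimal std-only decoder that could fill that field early in the event loop. It is a hypothetical helper, not the code from this commit.

```rust
/// Decode an S3 event-notification object key (hypothetical helper).
/// `+` becomes a space and `%XX` becomes the byte 0xXX.
/// Returns None on a malformed escape or if the result is not valid UTF-8.
fn url_decode_key(key: &str) -> Option<String> {
    let bytes = key.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                // A percent escape needs exactly two hex digits after it.
                let hex = bytes.get(i + 1..i + 3)?;
                if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                let hex = std::str::from_utf8(hex).ok()?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}
```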
R Tyler Croy
3d0b48eb3a
Handle url-encoded hive partition key names
S3Object.url_decoded_key has let me down
2023-05-07 18:40:28 -07:00
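Once the key is decoded, the hive partition pairs can be read from the directory segments. Below is a minimal sketch with a hypothetical function name. It assumes the key has already been URL-decoded and that the last segment is always the file name.

```rust
/// Extract hive-style `name=value` pairs from an already-decoded object key,
/// e.g. "tbl/c1=4/c2=foo/part-0.parquet" -> [("c1", "4"), ("c2", "foo")].
/// The final segment (the file name) is never treated as a partition.
fn hive_partitions(decoded_key: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = decoded_key.split('/').collect();
    segments.pop(); // drop the file name
    segments
        .into_iter()
        .filter_map(|segment| segment.split_once('='))
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}
```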
R Tyler Croy
0fcb951063
The `lambda` feature is required to build the lambda
2023-05-07 18:25:01 -07:00
R Tyler Croy
a6d8121d42
Refactor the append_to_table function into the crate
This also should (untested) properly create new transactions with partition
information
2023-05-07 18:24:07 -07:00
R Tyler Croy
b565624df1
Update the readme to include details about using the oxbow lambda function
2023-05-07 18:14:04 -07:00
R Tyler Croy
92bd91beed
Properly prepare the ObjectMeta records to be added to the table
Basically the locations of these objects need to be relative to the table root
(the directory containing _delta_log/), so the prune_prefix argument makes it
easier to convert them. It might make sense for this to be non-optional in the
future, but I'm erring on the side of less refactoring
2023-05-07 18:05:23 -07:00
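Converting a location with a prune prefix comes down to stripping that prefix at a path boundary. Below is a minimal sketch over plain strings, not `ObjectMeta`. The function name is hypothetical. Treating `None` as "already relative" is an assumption about the optional argument.

```rust
/// Make an object location relative to the table root by removing `prune_prefix`.
/// Returns None when the location does not live under the prefix.
fn relative_location(location: &str, prune_prefix: Option<&str>) -> Option<String> {
    match prune_prefix {
        // Assumption: without a prefix the location is used unchanged.
        None => Some(location.to_string()),
        Some(prefix) => {
            let prefix = prefix.trim_end_matches('/');
            let rest = location.strip_prefix(prefix)?;
            // Require a path boundary so "tbl" does not match "tbl2/...".
            let rest = rest.strip_prefix('/')?;
            Some(rest.to_string())
        }
    }
}
```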
R Tyler Croy
b0bba64ada
Allow the lambda function to log to CloudWatch logs
This is pretty key for debugging 😸
2023-05-07 18:00:09 -07:00
R Tyler Croy
dbf5882a9b
Introduce the main lambda functionality of creating or appending to a table
2023-05-07 16:21:46 -07:00
R Tyler Croy
bddb3bdb8e
Properly build with both features (lambda/cli)
I'm still not entirely pleased with this approach. I'm going to play with it
some more. I'm keen to try this rather than having two different binaries built.
2023-05-07 14:51:12 -07:00
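One way a single binary can branch on Cargo features is the `cfg!` macro, which evaluates to a compile-time boolean. Below is a minimal sketch. The `lambda` and `cli` feature names come from this log. The function, and the choice to prefer `lambda` when both features are enabled, are hypothetical.

```rust
/// Report which entry point this build was compiled for (hypothetical helper).
/// `cfg!(feature = "...")` is true only when Cargo enabled that feature.
fn build_mode() -> &'static str {
    if cfg!(feature = "lambda") {
        "lambda" // assumption: lambda wins when both features are on
    } else if cfg!(feature = "cli") {
        "cli"
    } else {
        "none"
    }
}
```

Unlike `#[cfg(...)]`, `cfg!` keeps both branches type-checked in every build, so neither mode can silently stop compiling.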
R Tyler Croy
856bfac746
infer_log_path_from to help the lambda place the _delta_log correctly
2023-05-07 14:45:52 -07:00
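The log path can be guessed from an object key with a simple heuristic. Take the directory above the first hive-style `name=value` segment, or the object's parent directory if the key has no partitions. Below is a hypothetical sketch of that heuristic. It is not necessarily how `infer_log_path_from` works.

```rust
/// Guess where `_delta_log/` should live for a newly written object key.
/// Heuristic (assumption): the table root is everything before the first
/// `name=value` directory segment, or the parent directory if there is none.
fn guess_log_path(key: &str) -> String {
    let segments: Vec<&str> = key.split('/').collect();
    // Only directory segments (everything but the file name) are candidates.
    let dirs = &segments[..segments.len().saturating_sub(1)];
    let root_len = dirs
        .iter()
        .position(|s| s.contains('='))
        .unwrap_or(dirs.len());
    let mut root = dirs[..root_len].join("/");
    if !root.is_empty() {
        root.push('/');
    }
    root.push_str("_delta_log");
    root
}
```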
R Tyler Croy
80e1edbb8d
Add stub directory tree for examples using oxbow
2023-05-07 14:20:52 -07:00
R Tyler Croy
455329c8c2
Scaffolding of the minimum terraform and lambda code to receive the bucket notifications
This is not yet functioning in the true sense of `oxbow`, but it is at least
ready for test cycles with real use-cases in AWS
2023-05-07 14:18:54 -07:00
R Tyler Croy
e88f21ab5c
Correct parsing error in readme
2023-05-07 14:18:14 -07:00
R Tyler Croy
0192d04f69
Add an integration test for validating all the golden tables
This currently fails because a parquet file's schema is not delta compatible
somehow:
thread 'test_all_tables' panicked at 'Failed to convert the schema for creating the table: SchemaError("Invalid data type for Delta Lake: Timestamp(Nanosecond, None)")', /usr/home/tyler/source/github/buoyant-data/oxbow/src/lib.rs:118:10
I have a hunch that this might be similar to delta-io/delta-rs#1286
2023-05-06 14:52:50 -07:00
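Delta Lake's `timestamp` type has microsecond precision, so an Arrow `Timestamp(Nanosecond, None)` column has no direct equivalent. One possible workaround casts the values down to microseconds. That is an assumption, not what this commit did. Below is a minimal sketch of the value conversion on raw epoch integers.

```rust
/// Convert nanosecond epoch timestamps to microseconds (lossy).
/// `div_euclid` floors, so pre-1970 values round toward the earlier instant
/// instead of toward zero.
fn nanos_to_micros(values: &[i64]) -> Vec<i64> {
    values.iter().map(|ns| ns.div_euclid(1_000)).collect()
}
```

Any sub-microsecond digits are discarded, so this only suits data where that precision does not matter.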
R Tyler Croy
5df34ed5f3
Clean up some suggestions from clippy
2023-05-06 14:31:07 -07:00
R Tyler Croy
61e3e98a4b
Support creating delta tables from storage with hive style partitioning schemes
2023-05-06 14:29:16 -07:00
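A table created from hive-partitioned storage needs its partition columns, which can be read off the relative file paths. Below is a minimal sketch with a hypothetical name. It assumes every file must agree on the same columns in the same order and rejects the set otherwise.

```rust
/// Determine the partition columns shared by a set of relative file paths,
/// e.g. ["c1=4/c2=foo/a.parquet", "c1=5/c2=bar/b.parquet"] -> ["c1", "c2"].
/// Returns None if the files disagree on partition columns or their order.
fn partition_columns(paths: &[&str]) -> Option<Vec<String>> {
    let mut columns: Option<Vec<String>> = None;
    for path in paths {
        let mut dirs: Vec<&str> = path.split('/').collect();
        dirs.pop(); // the file name is not a partition
        let these: Vec<String> = dirs
            .iter()
            .filter_map(|d| d.split_once('=').map(|(name, _)| name.to_string()))
            .collect();
        if columns.is_none() {
            columns = Some(these);
        } else if columns.as_ref() != Some(&these) {
            return None;
        }
    }
    Some(columns.unwrap_or_default())
}
```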
R Tyler Croy
0055b693bc
Sync the hive/ test data with the connectors repository
I forgot that I had removed the _delta_log/ originally when testing. I'll need
these to compare the results in the integration tests
2023-05-06 12:04:48 -07:00
R Tyler Croy
7b31ec42e3
Create rust.yml
2023-05-06 11:58:44 -07:00
R Tyler Croy
7fe1204145
Fix code fencing
2023-05-06 09:26:20 -07:00
R Tyler Croy
1355f8a34a
Add some CLI usage information to the README
Before I forget!
2023-05-06 09:25:02 -07:00
R Tyler Croy
b45f11f163
Add an integration test to perform the most simple validation of conversion
This replicates what I was doing in the command line and ensures that there
won't be regressions as I refactor now
2023-05-06 09:17:21 -07:00
R Tyler Croy
b9bc10ec56
Add a slice of the golden data set from delta-io/connectors
2023-05-06 09:06:47 -07:00
R Tyler Croy
b5dd6e77d0
Implement the most simple use-case for a command line invocation
This will convert a single non-partitioned directory into a delta table
2023-05-06 08:32:11 -07:00
R Tyler Croy
bc2d2ccc4c
Move the lib functions into the lib module
2023-05-02 21:05:30 -07:00
R Tyler Croy
ba82fa93d7
Working on discover_parquet_files() for identifying parquet files to import
2023-05-02 21:02:25 -07:00
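Finding the parquet files to import is a recursive directory walk filtered on the `.parquet` extension. Below is a std-only sketch for local files with a hypothetical name. It skips `_delta_log/` because checkpoint files there also end in `.parquet` but are not table data.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect `.parquet` files under `root`, skipping `_delta_log/`.
fn find_parquet_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                // Checkpoints in the transaction log are parquet too; ignore them.
                if path.file_name().map_or(false, |n| n == "_delta_log") {
                    continue;
                }
                pending.push(path);
            } else if path.extension().map_or(false, |e| e == "parquet") {
                found.push(path);
            }
        }
    }
    found.sort(); // read_dir order is platform-dependent
    Ok(found)
}
```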
R Tyler Croy
d3f1c85fa7
Starting implementation of the CLI version with local files for testing
2023-05-02 19:09:24 -07:00
R Tyler Croy
1fa65baecb
Start structuring the project to support a CLI and in the future a lambda mode
2023-05-01 21:30:21 -07:00
R Tyler Croy
97efd6a37c
Add deltalake with the license before starting implementation
2023-05-01 21:01:28 -07:00
R Tyler Croy
a64bc4c69d
initial commit
2023-05-01 20:56:59 -07:00