delta-rs

Commit Graph

Author	SHA1	Message	Date
dependabot[bot]	81593e9194	chore(deps): update sqlparser requirement from 0.44 to 0.46 Updates the requirements on [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) to permit the latest version. - [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md) - [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.44.0...v0.46.0) --- updated-dependencies: - dependency-name: sqlparser dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-07 08:53:54 -07:00
emcake	35664c0ef0	fix: Return unsupported error for merging schemas in the presence of partition columns	2024-05-07 07:12:38 -07:00
KyJah Keys	cfb20f1795	applied cargo fmt	2024-05-07 06:47:12 -07:00
KyJah Keys	7192997604	fix(python, rust): region lookup wasn't working correctly for dynamo	2024-05-07 06:47:12 -07:00
Yijie Shen	e370d34571	fix(rust): unable to read delta table when table contains both null and non-null add stats (#2476 ) # Description Fixes reading a delta table that contains an Add action with `stats_parsed: null`. As shown in the test case, `001.json` contains an Add action with stats, while `002.json` contains an Add action with `stats_parsed: null`. Before this fix, reading fails with: ``` Arrow { source: InvalidArgumentError("all columns in a record batch must have the same length") } ``` The array for `num_records` has two values, while for other stats such as `null_count` the None value is filtered out by `flat_map`, so those arrays have only one value. # Related Issue(s) closes #2477	2024-05-06 21:47:48 +00:00
R Tyler Croy	d7165cfef8	fix: check to see if the file exists before attempting to rename (#2482 ) In the case of /tmp existing on tmpfs with musl, the prior version of this would fail with a cross-device link error before bubbling up a not found error	2024-05-06 21:28:48 +00:00
Ion Koutsouris	e25aed70a0	fix(python, rust): use new schema for stats parsing instead of old (#2480 ) # Description In some edge cases where the schema evolves, the stats were parsed with the old schema, resulting in errors like this: `Exception: Json error: whilst decoding field 'minValues': whilst decoding field 'foo': failed to parse 1000000000000 as Int8` ```python import polars as pl from deltalake import write_deltalake pl.DataFrame({ "foo": [1] }, schema={"foo": pl.Int8}).write_delta("TEST_TABLE_BUG") write_deltalake("TEST_TABLE_BUG", data = pl.DataFrame({ "foo": [1000000000000] }, schema={"foo": pl.Int64}).to_arrow(), mode='overwrite', overwrite_schema=True,engine='rust') ``` Instead of taking the old schema, I added an optional schema that can be passed into the logMapper	2024-05-06 16:39:13 +00:00
Adrian Garcia Badaracco	d0617b5ca1	feat(python): add parameter to DeltaTable.to_pyarrow_dataset() (#2465 ) Otherwise there is no way to union this with another dataset.	2024-05-05 22:14:37 +00:00
R Tyler Croy	e7af965abc	chore: update the deltalake-aws version and clippy for release of #2452	2024-05-04 09:03:40 -07:00
Peter Ke	ad89cc3caf	format	2024-05-04 09:03:40 -07:00
Peter Ke	6ef3caa79a	abort commit	2024-05-04 09:03:40 -07:00
R Tyler Croy	85089b1c74	chore: update the changelog to include rust-v0.17.3	2024-05-01 23:09:01 -07:00
R Tyler Croy	f6d110815c	chore: update the python version and dependencies for release	2024-05-01 14:32:06 -07:00
R Tyler Croy	716acc31b7	chore: bump the metacrate and correct some of the version ranges for patch releases	2024-05-01 14:32:06 -07:00
R Tyler Croy	92128eb7c6	chore: bump deltalake-azure for release	2024-05-01 14:32:06 -07:00
R Tyler Croy	55a0c6ea0d	chore: update the deltalake-azure number for release	2024-05-01 14:32:06 -07:00
R Tyler Croy	b54cf99605	chore: increment the patch version for deltalake-gcp	2024-05-01 14:32:06 -07:00
R Tyler Croy	27c1e48cd9	chore: bump the deltalake-aws version for release	2024-05-01 14:32:06 -07:00
Michele Vigilante	0c8e5d56d3	feat(python, rust): add OBJECT_STORE_CONCURRENCY_LIMIT setting for ObjectStoreFactory (#2458 ) # Description This PR adds a configuration to control concurrent access to the underlying object store. It also includes a visibility change to the S3LogStoreFactory to align it with all other provider implementations. # Related Issue(s) - closes #2457 - resolves #2353 # Documentation https://docs.rs/object_store/latest/object_store/limit/struct.LimitStore.html --------- Co-authored-by: Michele Vigilante <michele.vigilante@radancy.com> Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>	2024-05-01 16:04:41 +00:00
R Tyler Croy	68965afa8d	chore: bump the core and python crate for its next release	2024-05-01 08:37:43 -07:00
Stephen Carman	4dce000f02	feat: cdf reader for delta tables (#2048 ) # Description This PR is the initial work for Change Data Feed (CDF) readers for delta tables. This PR looks a lot larger than it really is because a physical test table is checked in with this which will be removed once the loop is closed on CDF reading/writing. # Related Issue(s) # Documentation https://github.com/delta-io/delta/blob/master/PROTOCOL.md#change-data-files https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-cdc-file --------- Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>	2024-04-30 18:15:39 -07:00
R Tyler Croy	41cb9d7b73	chore: increment the version of the deltalake-gcp crate This change also loosens the meta-crate version dependency to allow more easy upgrades in the future	2024-04-30 11:19:40 -07:00
Adrian Garcia Badaracco	2aa4571a21	Update mod.rs	2024-04-30 11:19:40 -07:00
Adrian Garcia Badaracco	fb0a2decfa	Update crates/gcp/src/storage.rs	2024-04-30 11:19:40 -07:00
Adrian Garcia Badaracco	771991393d	add debug	2024-04-30 11:19:40 -07:00
Adrian Garcia Badaracco	5a301288e2	Add file	2024-04-30 11:19:40 -07:00
Adrian Garcia Badaracco	99a4121681	Handle 429 from GCS	2024-04-30 11:19:40 -07:00
Ion Koutsouris	28ad3950d9	feat(rust): advance state in post commit (#2396 ) # Description We advance the state in the post commit now, so it's done in a single location as per suggestion from @Blajda here: https://github.com/delta-io/delta-rs/pull/2391#issuecomment-2041500757 This PR also supersedes this one: https://github.com/delta-io/delta-rs/pull/2280 # Related Issue(s) - fixes #2279 - fixes #2262	2024-04-27 13:08:43 -04:00
Luis	9d3ecbeb62	chore(rust): bump arrow v51 and datafusion v37.1 (#2395 ) # Description Update the arrow and datafusion dependencies. # Related Issue(s) - closes #2328 --------- Co-authored-by: R. Tyler Croy <rtyler@brokenco.de> Co-authored-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2024-04-26 18:28:50 +00:00
Ion Koutsouris	6a7c684d9b	fix(python): reuse table state in write engine (#2453 ) # Description Instead of reusing the table state, it was being instantiated every time you call write with the rust engine. - https://github.com/delta-io/delta-rs/discussions/2448	2024-04-25 06:37:31 -07:00
R Tyler Croy	f55ddc64a3	Introduce the `Operation` trait for all operations to implement Currently this is nothing but a shim that ensures everything implements IntoFuture, which they all already do, but in the future this will help enforce consistency as well as provide common behaviors.	2024-04-25 01:25:08 -07:00
KyJah Keys	dd358ef8e8	fix(python, rust): remove imds calls from profile auth and region (#2442 ) # Description The AWS SDK uses EC2 instance metadata in the default provider chain, the profile chain and the region provider # Related Issue(s) - closes #2377	2024-04-23 16:23:52 +00:00
Ion Koutsouris	12979dd881	fix(python, rust): check timestamp_ntz in nested fields, add check_can_write_timestamp_ntz in pyarrow writer (#2443 ) # Description The nested fields weren't checked, which meant you could get a timestampNtz in your schema but not have the reader/writer features set. This check is now done recursively.	2024-04-23 15:43:14 +00:00
Ion Koutsouris	da6ed7b39d	fix(python, rust): use from_name during column projection creation (#2441 ) # Description @Blajda I don't think `from_qualified_name_ignore_case` was needed here since the delta_fields don't have relation information, they are just the column names. `from_qualified_name_ignore_case` will try to parse `__delta_rs_c_y--1` and results in `__delta_rs_c_y`, while `from_name` just keeps the column name as-is, which is preferred. # Related Issue(s) - closes https://github.com/delta-io/delta-rs/issues/2438	2024-04-22 22:24:58 -04:00
Ion Koutsouris	15abe448dc	chore: bump python for 0.17 release (#2439 )	2024-04-22 17:24:31 +00:00
Ion Koutsouris	f12834e22f	fix(python,rust): missing remove actions during `create_or_replace` (#2437 ) # Description The overwrite mode never added the remove actions, which causes your table to get in an invalid state.	2024-04-22 17:02:00 +00:00
Igor	5f137ca8af	fix(python): load_as_version with datetime object with no timezone specified (#2429 ) # Description Upon attempting to retrieve the version with a datetime object, the `load_as_version` method throws a `ValueError: Failed to parse datetime string: premature end of input`. Datetime objects without a specified timezone will be treated as UTC datetimes.	2024-04-21 21:38:09 +00:00
Jonas Irgens Kylling	ebbdd69274	feat: implement repartitioned for DeltaScan (#2421 ) # Description This implements repartitioned from the ExecutionPlan trait of DeltaScan. Currently, Delta tables without partitions are read with all their files in a single file group of the underlying `ParquetExec`. This seems to mean that Delta tables without partitions are read without concurrency. With repartitioned we can repartition the DeltaScan to get concurrency when reading.	2024-04-16 18:05:04 +00:00
Ion Koutsouris	9736522b87	feat: lazy static runtime in python (#2424 ) # Description As suggested by @wjones127 to create a lazy static runtime, supersedes this PR: https://github.com/delta-io/delta-rs/pull/1950	2024-04-16 16:35:33 +00:00
Yijie Shen	aa8f4d5390	fix(rust): stats_parsed has different number of records with stats (#2405 ) # Description - `stats_parsed` is a StructArray instead of StringArray - Parsing an `Add` action's `stats` into `stats_parsed` would panic due to the use of `slice.array_data()`. # Related Issue(s) closes #2312 # Documentation https://docs.rs/arrow/51.0.0/arrow/array/struct.GenericByteArray.html#method.value_data --------- Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>	2024-04-15 08:55:17 +02:00
Ion Koutsouris	faa743a6f1	fix(rust): timestamp deserialization format not following protocol + missing timestampNtz deserialization (#2383 ) # Description Our timestamp deserialization format didn't include the %6f to decode this value: 1970-01-01 00:00:00.123456. Additionally during timestampNtz I didn't add deserialization of that primitive type :) - fixes https://github.com/delta-io/delta-rs/issues/2380 - fixes https://github.com/delta-io/delta-rs/issues/2381	2024-04-14 21:29:04 -07:00
Avril Aysha	d49d95ba4b	docs: add Daft integration (#2402 ) This adds an integration page for using Delta Lake with Daft. --------- Co-authored-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>	2024-04-13 08:09:39 +02:00
Ion Koutsouris	133941afdb	fix: time travel when checkpointed and logs removed (#2389 ) # Description It first sets a proper lower boundary instead of always assuming 0, since we can also have checkpointed tables which had logRetention that caused logs to be removed before a checkpoint. - closes https://github.com/delta-io/delta-rs/issues/521	2024-04-12 05:57:46 +00:00
KyJah Keys	64b3e54126	added missing file	2024-04-11 22:36:30 -07:00
KyJah Keys	0d1790306b	feat: added configuration variables to handle EC2 metadata service	2024-04-11 22:36:30 -07:00
Erdem Sarili	3094bd28ce	fix: return error when checkpoints and metadata get out of sync (#2406 ) # Description When a table is corrupted and the `_last_checkpoint` file points to a version that has been deleted, the `list_log_files_with_checkpoint` function panics. With this change, `list_log_files_with_checkpoint` returns an error, allowing callers to react to such issues. # Related Issue(s) - https://github.com/delta-io/delta-rs/issues/2290	2024-04-11 12:52:37 +00:00
Ion Koutsouris	5eade5e1f0	feat(rust): post commit hook (v2), create checkpoint hook (#2391 ) # Description Introduces a post commit hook, which can perform additional actions before returning the FinalizedCommit. The current commit hook creates a checkpoint if the interval condition is met. Also bumps the default interval to 100 commits, since 10 commits can be a bit aggressive. # Related Issue(s) - closes https://github.com/delta-io/delta-rs/issues/913	2024-04-07 12:26:07 -04:00
Alessandro Rinaldi	fef111c129	docs: document required aws permissions (#2393 ) # Description This documents the required AWS permissions on S3 and DynamoDB to interact with deltalakes. # Related Issue(s) - mentions #1091	2024-04-06 22:05:48 +02:00
Peter Ke	69317f821e	fix(rust): remove flush after writing every batch (#2387 ) # Description Reverts https://github.com/delta-io/delta-rs/pull/2318 by removing `flush` after writing each batch since it was causing smaller than expected row groups to be written during compaction. # Related Issue(s) - closes #2386	2024-04-05 00:46:33 +00:00
Ion Koutsouris	6f81b8034d	fix(python, rust): expr parsing date/timestamp (#2357 ) # Description We weren't parsing all scalar values yet, parses date32/64 and timestampmicros now as well. # Related Issue(s) - fixes https://github.com/delta-io/delta-rs/issues/2344	2024-04-02 08:08:39 +02:00
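
The length mismatch described in commit e370d34571 can be illustrated with a minimal Python sketch. It is not the delta-rs implementation: the `stats` list and both helper functions are hypothetical, and plain lists stand in for Arrow arrays. It only shows why dropping missing entries for some columns, but not others, produces columns of different lengths.

```python
# Hypothetical per-file stats: the second file has no parsed stats (None),
# mirroring an Add action with `stats_parsed: null`.
stats = [{"num_records": 3, "null_count": 0}, None]


def column_dropping_missing(rows, key):
    # Skips files without stats, similar in effect to the flat_map
    # filtering described in the commit message.
    return [row[key] for row in rows if row is not None]


def column_keeping_missing(rows, key):
    # Keeps one slot per file, so every column ends up the same length.
    return [row[key] if row is not None else None for row in rows]


num_records = column_keeping_missing(stats, "num_records")
null_count_bad = column_dropping_missing(stats, "null_count")
null_count_ok = column_keeping_missing(stats, "null_count")

# Columns of unequal length cannot form one record batch.
print(len(num_records), len(null_count_bad), len(null_count_ok))
```

Keeping an explicit null per file is one way to keep all stats columns aligned; the actual fix in the commit may differ in detail.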

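Commit 5f137ca8af states that datetime objects without a timezone are treated as UTC. A minimal standard-library sketch of that rule follows. The helper name `as_utc` is hypothetical and is not part of the deltalake API; it only illustrates the normalization step, not how `load_as_version` is implemented.

```python
from datetime import datetime, timedelta, timezone


def as_utc(dt: datetime) -> datetime:
    """Hypothetical helper: treat naive datetimes as UTC, convert aware ones to UTC."""
    if dt.tzinfo is None:
        # Naive datetime: assume it already expresses UTC wall-clock time.
        return dt.replace(tzinfo=timezone.utc)
    # Aware datetime: convert to the equivalent instant in UTC.
    return dt.astimezone(timezone.utc)


naive = datetime(2024, 4, 21, 21, 38, 9)
aware = datetime(2024, 4, 21, 23, 38, 9, tzinfo=timezone(timedelta(hours=2)))

print(as_utc(naive).isoformat())
print(as_utc(aware).isoformat())
```

Both inputs normalize to the same instant, so a string built from either one carries an explicit offset.
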
1423 Commits, All Branches