
A native Rust library for Delta Lake, with bindings to Python
Python docs · Rust docs · Report a bug · Request a feature · Roadmap

Community: #delta-rs in the Delta Lake Slack workspace

The Delta Lake project aims to unlock the power of Delta Lake for as many users and projects as possible by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations API that lets you query, inspect, and operate on your Delta tables with ease.

| Source | Installation command |
| --- | --- |
| PyPI | pip install deltalake |
| Crates.io | cargo add deltalake |


Quick Start

The deltalake library aims to adopt patterns from other libraries in data processing, so getting started should look familiar.

from deltalake import DeltaTable, write_deltalake
import pandas as pd

# write some data into a delta table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("./data/delta", df)

# Load data from the delta table
dt = DeltaTable("./data/delta")
df2 = dt.to_pandas()

assert df.equals(df2)

The same table can also be loaded using the core Rust crate:

use deltalake::{open_table, DeltaTableError};

#[tokio::main]
async fn main() -> Result<(), DeltaTableError> {
    // open the table written in python
    let table = open_table("./data/delta").await?;

    // show all active files in the table
    let files: Vec<_> = table.get_file_uris()?.collect();
    println!("{:?}", files);

    Ok(())
}
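Conceptually, the "active files" listed above come from replaying the table's transaction log: each commit in `_delta_log/` adds or removes data files, and the live set is whatever has been added and not yet removed. The Python sketch below models that replay over a made-up, in-memory list of commits. It only handles `add` and `remove` actions, ignores checkpoints, metadata and protocol actions, and is not the delta-rs implementation; the names `COMMITS` and `active_files` are hypothetical.

```python
import json

# Hypothetical stand-in for the newline-delimited JSON commit files under
# _delta_log/. Only "add" and "remove" actions are modeled here.
COMMITS = [
    '{"add": {"path": "part-0.parquet", "size": 100, "dataChange": true}}\n'
    '{"add": {"path": "part-1.parquet", "size": 120, "dataChange": true}}',
    '{"remove": {"path": "part-0.parquet", "dataChange": true}}\n'
    '{"add": {"path": "part-2.parquet", "size": 90, "dataChange": true}}',
]


def active_files(commits):
    """Replay add/remove actions in commit order and return the live paths."""
    live = {}
    for commit in commits:  # commits must be in version order
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                # A remove cancels any earlier add of the same path.
                live.pop(action["remove"]["path"], None)
    return sorted(live)


print(active_files(COMMITS))  # ['part-1.parquet', 'part-2.parquet']
```

Because later commits override earlier ones, the order of replay matters; a real reader starts from the latest checkpoint instead of version 0 to avoid reading the whole log.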

You can also try Delta Lake using Docker: DockerHub | Docker Repo

Get Involved

We encourage you to reach out, and are committed to providing a welcoming community.

Integrations

Libraries and frameworks that interoperate with delta-rs, in alphabetical order.

Features

The following sections outline core features, such as the supported storage backends and the operations that can be performed on tables, and track the implementation status of the features defined in the Delta protocol.

Cloud Integrations

| Storage | Rust | Python | Comment |
| --- | --- | --- | --- |
| Local | done | done | |
| S3 - AWS | done | done | requires lock for concurrent writes |
| S3 - MinIO | done | done | requires lock for concurrent writes |
| S3 - R2 | done | done | No lock required when using AmazonS3ConfigKey::CopyIfNotExists |
| Azure Blob | done | done | |
| Azure ADLS Gen2 | done | done | |
| Microsoft OneLake | done | done | |
| Google Cloud Storage | done | done | |
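The lock requirement in the S3 rows follows from how a Delta commit works: a writer publishes version N by creating the log entry `_delta_log/<N>.json` (a 20-digit zero-padded version number), and that creation must fail if another writer already created the same entry. A store that only offers unconditional PUTs cannot guarantee this, which is why the table calls for a lock, while a conditional copy or put (as in the R2 row) can provide the guarantee directly. The sketch below is a simplified in-memory model of that "put if absent" commit loop; `InMemoryStore`, `put_if_absent` and `commit` are hypothetical names, not delta-rs or object_store APIs.

```python
import threading


class InMemoryStore:
    """Hypothetical object store that offers an atomic 'put if absent'."""

    def __init__(self):
        self._objects = {}
        self._mutex = threading.Lock()  # makes put_if_absent atomic in this model

    def put_if_absent(self, key, data):
        """Store data under key only if key does not exist; report success."""
        with self._mutex:
            if key in self._objects:
                return False
            self._objects[key] = data
            return True

    def keys(self):
        with self._mutex:
            return sorted(self._objects)


def commit(store, read_version, actions, max_retries=10):
    """Write the next log entry, moving to a later version on conflict.

    Returns the committed version. A real writer would also check whether the
    commits that won the race conflict with its own changes before retrying.
    """
    version = read_version + 1
    for _ in range(max_retries):
        key = f"_delta_log/{version:020d}.json"
        if store.put_if_absent(key, actions):
            return version
        version += 1  # someone else took this version; try the next one
    raise RuntimeError("too many concurrent commits")


store = InMemoryStore()
store.put_if_absent(f"_delta_log/{0:020d}.json", "initial commit")
# Two writers both read version 0 and race for version 1.
print(commit(store, 0, "writer A"))  # 1
print(commit(store, 0, "writer B"))  # 2
```

With unconditional writes, writer B would silently overwrite writer A's version 1 and A's changes would be lost; the atomic check is what turns the race into a retry.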

Supported Operations

| Operation | Rust | Python | Description |
| --- | --- | --- | --- |
| Create | done | done | Create a new table |
| Read | done | done | Read data from a table |
| Vacuum | done | done | Remove unused files and log entries |
| Delete - partitions | | done | Delete a table partition |
| Delete - predicates | done | done | Delete data based on a predicate |
| Optimize - compaction | done | done | Harmonize the size of data files |
| Optimize - Z-order | done | done | Place similar data into the same file |
| Merge | done | done | Merge a target Delta table with source data |
| FS check | done | done | Remove corrupted files from table |
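To illustrate what "place similar data into the same file" means for Z-order: rows are sorted by a key that interleaves the bits of the chosen columns (a Morton code), so rows that are close in every chosen dimension end up adjacent and land in the same file, which lets per-file min/max statistics skip more files at query time. The sketch below handles only two small non-negative integer columns; a real implementation also has to map arbitrary column types to bit strings, and this is not the delta-rs code. `interleave_bits` and `zorder_files` are hypothetical names.

```python
def interleave_bits(x, y, bits=16):
    """Return the Z-order (Morton) code of two non-negative ints below 2**bits.

    Bit i of x goes to position 2*i and bit i of y goes to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def zorder_files(rows, rows_per_file):
    """Sort (x, y) rows by Morton code and chunk them into 'files'."""
    ordered = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


grid = [(x, y) for x in range(4) for y in range(4)]
files = zorder_files(grid, rows_per_file=4)
print(files[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

On a 4x4 grid with four rows per file, each file covers one 2x2 quadrant, so a filter such as `x >= 2 AND y >= 2` touches only one file; sorting by `x` alone would put a full column of `y` values in every file.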

Protocol Support Level

| Writer Version | Requirement | Status |
| --- | --- | --- |
| Version 2 | Append Only Tables | done |
| Version 2 | Column Invariants | done |
| Version 3 | Enforce delta.checkpoint.writeStatsAsJson | open |
| Version 3 | Enforce delta.checkpoint.writeStatsAsStruct | open |
| Version 3 | CHECK constraints | semi-done |
| Version 4 | Change Data Feed | |
| Version 4 | Generated Columns | |
| Version 5 | Column Mapping | |
| Version 6 | Identity Columns | |
| Version 7 | Table Features | |

| Reader Version | Requirement | Status |
| --- | --- | --- |
| Version 2 | Column Mapping | |
| Version 3 | Table Features (requires writer V7) | |
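The version numbers in these tables come from the `protocol` action in a table's log: `minReaderVersion` and `minWriterVersion`, plus, for table-features protocols (reader version 3, writer version 7), explicit `readerFeatures` and `writerFeatures` lists. A client compares these against what it implements before reading or writing. The function below is a simplified, hypothetical gate written against those protocol field names; it is not the check delta-rs performs and it ignores other consistency rules in the protocol specification. `can_access` and its parameters are illustrative names.

```python
def can_access(protocol, max_version, supported_features, kind):
    """Decide whether a client may read ("reader") or write ("writer") a table.

    protocol: dict shaped like a Delta protocol action, for example
      {"minReaderVersion": 3, "minWriterVersion": 7,
       "readerFeatures": ["columnMapping"], "writerFeatures": ["columnMapping"]}
    max_version: highest protocol version the client implements for `kind`.
    supported_features: table feature names the client implements.
    """
    required = protocol[f"min{kind.capitalize()}Version"]
    features_key = f"{kind}Features"
    # Table-features protocols list required features explicitly instead of
    # implying them by version number.
    feature_version = 3 if kind == "reader" else 7
    if required >= feature_version:
        if max_version < feature_version:
            return False  # client predates table features entirely
        missing = set(protocol.get(features_key, [])) - set(supported_features)
        return not missing
    # Legacy protocols: a higher version implies every lower-version feature.
    return required <= max_version


legacy = {"minReaderVersion": 1, "minWriterVersion": 2}
print(can_access(legacy, 2, [], "writer"))  # True
print(can_access(legacy, 1, [], "writer"))  # False

featured = {"minReaderVersion": 3, "minWriterVersion": 7,
            "readerFeatures": ["columnMapping"],
            "writerFeatures": ["columnMapping", "appendOnly"]}
print(can_access(featured, 3, ["columnMapping"], "reader"))  # True
print(can_access(featured, 7, ["appendOnly"], "writer"))  # False
```

The last call fails because the client does not implement `columnMapping`, even though it supports writer version 7; with table features, the version number alone no longer tells a client whether it can write.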