delta-rs/python
Ion Koutsouris e25aed70a0
fix(python, rust): use new schema for stats parsing instead of old (#2480)
# Description
In some edge cases where the schema evolves, the stats were parsed with
the old schema, resulting in errors like this:
`Exception: Json error: whilst decoding field 'minValues': whilst decoding field 'foo': failed to parse 1000000000000 as Int8`

```python
import polars as pl
from deltalake import write_deltalake

# Create the table with an Int8 column.
pl.DataFrame({"foo": [1]}, schema={"foo": pl.Int8}).write_delta("TEST_TABLE_BUG")

# Overwrite it with an Int64 value that does not fit in Int8, evolving the schema.
write_deltalake(
    "TEST_TABLE_BUG",
    data=pl.DataFrame({"foo": [1000000000000]}, schema={"foo": pl.Int64}).to_arrow(),
    mode="overwrite",
    overwrite_schema=True,
    engine="rust",
)
```

Instead of using the old schema, I added an optional schema that can be
passed to the logMapper.
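
The failure mode described above can be illustrated with a simplified, standard-library-only model. This is a sketch, not the delta-rs implementation: `INT_RANGES` and `parse_min_values` are hypothetical names, and the stats string is a hand-written example in the shape of Delta file statistics. It shows why decoding `minValues` against the old `Int8` schema fails, while passing the new `Int64` schema succeeds.

```python
import json

# Inclusive value ranges for two signed integer types (illustrative only).
INT_RANGES = {
    "Int8": (-2**7, 2**7 - 1),
    "Int64": (-2**63, 2**63 - 1),
}


def parse_min_values(stats_json, schema):
    """Decode the 'minValues' entry of a stats JSON string against `schema`.

    `schema` maps column name -> type name. Raises ValueError when a value
    does not fit the declared type.
    """
    stats = json.loads(stats_json)
    parsed = {}
    for column, value in stats["minValues"].items():
        type_name = schema[column]
        low, high = INT_RANGES[type_name]
        if not low <= value <= high:
            raise ValueError(f"failed to parse {value} as {type_name}")
        parsed[column] = value
    return parsed


old_schema = {"foo": "Int8"}   # schema before the overwrite
new_schema = {"foo": "Int64"}  # schema written by the overwrite
stats = (
    '{"numRecords": 1, '
    '"minValues": {"foo": 1000000000000}, '
    '"maxValues": {"foo": 1000000000000}}'
)

# Parsing with the stale schema reproduces the kind of error reported above.
try:
    parse_min_values(stats, old_schema)
    old_schema_error = None
except ValueError as err:
    old_schema_error = str(err)

print(old_schema_error)
# Parsing with the schema that belongs to the new data succeeds.
print(parse_min_values(stats, new_schema))
```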
2024-05-06 16:39:13 +00:00
.cargo
deltalake feat(python): add parameter to DeltaTable.to_pyarrow_dataset() (#2465) 2024-05-05 22:14:37 +00:00
docs feat: merge schema support for the write operation and Python (with Rust engine) 2024-03-05 07:48:28 -08:00
licenses feat(python): add pyarrow to delta compatible schema conversion in writer/merge (#1820) 2023-11-24 21:23:15 +01:00
src chore: update the python version and dependencies for release 2024-05-01 14:32:06 -07:00
stubs feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeListType (#2005) 2024-01-07 11:32:46 +01:00
tests fix(python, rust): use new schema for stats parsing instead of old (#2480) 2024-05-06 16:39:13 +00:00
.gitignore
CONTRIBUTING.md feat: update to include pyarrow-hotfix (#1930) 2023-12-14 00:08:48 +00:00
Cargo.toml chore: update the python version and dependencies for release 2024-05-01 14:32:06 -07:00
Makefile feat: add more info for contributors (#1913) 2023-11-29 09:02:44 +01:00
README.md docs: move dynamo docs into new docs page (#2093) 2024-01-21 19:12:25 +01:00
conftest.py
pyproject.toml fix: fix ruff and mypy version and do formatting (#2240) 2024-03-04 10:08:50 +00:00

README.md

Deltalake-python


Native Delta Lake Python binding based on delta-rs with Pandas integration.

Example

```python
>>> from deltalake import DeltaTable
>>> dt = DeltaTable("../rust/tests/data/delta-0.2.0")
>>> dt.version()
3
>>> dt.files()
['part-00000-cb6b150b-30b8-4662-ad28-ff32ddab96d2-c000.snappy.parquet',
 'part-00000-7c2deba3-1994-4fb8-bc07-d46c948aa415-c000.snappy.parquet',
 'part-00001-c373a5bd-85f0-4758-815e-7eb62007a15c-c000.snappy.parquet']
```

See the user guide for more examples.

Installation

```sh
pip install deltalake
```

NOTE: official binary wheels are statically linked against OpenSSL for remote object store communication. Please file a GitHub issue to request a critical OpenSSL upgrade.

Build custom wheels

Sometimes you may wish to build custom wheels. Maybe you want to try out some unreleased features. Or maybe you want to tweak the optimization of the Rust code.

To compile the package, you will need the Rust compiler and maturin:

```sh
curl https://sh.rustup.rs -sSf | sh -s
pip install maturin
```

Then you can build wheels for your own platform like so:

```sh
maturin build --release --out wheels
```

For a build that is optimized for the system you are on (at the cost of portability):

```sh
RUSTFLAGS="-C target-cpu=native" maturin build --release --out wheels
```

Cross compilation

The above command only works for your current platform. To create wheels for other platforms, you'll need to cross compile. Cross compilation requires installing two additional components: to cross compile Rust code, you will need to install the target with rustup; to cross compile the Python bindings, you will need to install ziglang.

The following example is for manylinux2014. Other targets will require a different Rust target and different Python compatibility tags.

```sh
rustup target add x86_64-unknown-linux-gnu
pip install ziglang
```

Then you can build the wheel with:

```sh
maturin build --release --zig \
    --target x86_64-unknown-linux-gnu \
    --compatibility manylinux2014 \
    --out wheels
```

If you expect to run only on more modern systems, you can pass a newer target-cpu flag to Rust and use a newer compatibility tag for Linux. For example, here we target Haswell (2013) or newer CPUs and Linux systems with glibc 2.24 or later:

```sh
RUSTFLAGS="-C target-cpu=haswell" maturin build --release --zig \
    --target x86_64-unknown-linux-gnu \
    --compatibility manylinux_2_24 \
    --out wheels
```

See the note about RUSTFLAGS in the arrow-rs README.
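
The build variants above differ only in the `RUSTFLAGS` value, the Rust target, and the compatibility tag. As a rough sketch, a small Python helper could assemble them in one place. `maturin_build_command` and `render` are hypothetical names introduced here for illustration and are not part of maturin or deltalake; only the command-line flags already shown above are used.

```python
import shlex


def maturin_build_command(target=None, compatibility=None, target_cpu=None,
                          out_dir="wheels"):
    """Return (env, argv) for a release wheel build (hypothetical helper)."""
    env = {}
    if target_cpu is not None:
        # Same RUSTFLAGS form as in the examples above.
        env["RUSTFLAGS"] = f"-C target-cpu={target_cpu}"
    argv = ["maturin", "build", "--release"]
    if target is not None:
        # The cross-compilation examples above pass --zig together with --target.
        argv += ["--zig", "--target", target]
    if compatibility is not None:
        argv += ["--compatibility", compatibility]
    argv += ["--out", out_dir]
    return env, argv


def render(env, argv):
    """Format the build as a single shell command line for inspection."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(prefix + [shlex.join(argv)])


env, argv = maturin_build_command(
    target="x86_64-unknown-linux-gnu",
    compatibility="manylinux_2_24",
    target_cpu="haswell",
)
print(render(env, argv))
```

To actually run the build, the pair could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`, assuming maturin and the Rust target are already installed.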