Delta with improved vacuum patch

Delta 0.6.x with efficient vacuum (parallel deletion support)

Problem Statement:

Delta Lake vacuum jobs take too long to finish because the underlying file deletion logic is sequential. This is a known issue in Delta Lake v0.6.1.
Ref: https://github.com/delta-io/delta/issues/395

Solution:

The Delta Lake team has resolved this issue, but a stable release containing the fix is not yet available.
Pull Request: https://github.com/delta-io/delta/pull/522
Many organizations run 0.6.x in production and want this fix as part of 0.6.x.
The following quick steps generate a Delta 0.6.1 jar with this patch applied.

Steps for Patch Creation:

1. Git clone the Delta Lake repo at v0.6.1

git clone --branch v0.6.1 https://github.com/delta-io/delta

2. Change Directory

cd delta

3. Compile, Build & Test Repo

build/sbt compile
build/sbt package
build/sbt test

4. Download the patch

wget https://github.com/delta-io/delta/pull/522.patch

5. Apply Patch

git apply --3way 522.patch
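
Before applying the patch, it can help to see which files it touches, since any file that also changed on the 0.6.1 branch is a likely source of conflicts. The sketch below is one way to do that. `patch_files` is a hypothetical helper name, and it assumes the patch uses the usual `diff --git a/<path> b/<path>` headers and that paths contain no spaces.

```shell
# List the files a git-style patch modifies, one per line, without applying it.
# Assumption: each file section starts with "diff --git a/<path> b/<path>"
# and paths contain no spaces.
patch_files() {
  # $1: path to the patch file, e.g. 522.patch
  grep '^diff --git ' "$1" | sed 's|^diff --git a/.* b/||' | sort -u
}

# Usage:
#   patch_files 522.patch
```

If DeltaSQLConf.scala appears in the output, expect to resolve it by hand as described in the next step. `git apply --stat 522.patch` prints a similar summary using git itself.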

6. Resolve the conflict in the config file DeltaSQLConf.scala: keep both properties

val DELTA_CHECKPOINT_PART_SIZE = …
val DELTA_VACUUM_PARALLEL_DELETE_ENABLED = …
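
After editing DeltaSQLConf.scala by hand, a quick check for leftover conflict markers can catch an incomplete resolution before the rebuild. This is a minimal sketch, assuming the standard `<<<<<<<`, `=======` and `>>>>>>>` markers that git writes. `has_conflict_markers` is a hypothetical helper name.

```shell
# Succeed (exit 0) if the file still contains git conflict markers.
has_conflict_markers() {
  # $1: file to inspect, e.g. the path to DeltaSQLConf.scala
  grep -q -E '^(<<<<<<<|=======|>>>>>>>)( |$)' "$1"
}

# Usage (from the repository root):
#   conf_file=$(find . -name DeltaSQLConf.scala)
#   if has_conflict_markers "$conf_file"; then
#     echo "unresolved conflict markers in $conf_file" >&2
#   fi
```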

7. Re-compile, Build, Test:

build/sbt compile
build/sbt package
build/sbt test
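
Because the same three sbt tasks run both before and after the patch, they can be wrapped so that a failure stops the sequence early. The sketch below assumes it is run from the repository root. `run_sbt_tasks` is a hypothetical helper name, and the launcher path is passed in as an argument, not hard-coded.

```shell
# Run sbt tasks one at a time and stop at the first failure.
run_sbt_tasks() {
  # $1: sbt launcher (build/sbt in this repo); remaining args: task names
  sbt_cmd="$1"
  shift
  for task in "$@"; do
    echo "==> $sbt_cmd $task"
    "$sbt_cmd" "$task" || { echo "sbt task '$task' failed" >&2; return 1; }
  done
}

# Usage (from the repository root):
#   run_sbt_tasks build/sbt compile package test
```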

8. Find the new jars

<root-dir>/target/scala-2.11/delta-core_2.11-0.6.1.jar
<root-dir>/target/scala-2.12/delta-core_2.12-0.6.1.jar
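
To confirm that the build produced jars for both Scala versions, the target directory can be searched instead of checked by hand. This is a minimal sketch, assuming the jar naming shown above. `find_delta_jars` is a hypothetical helper name.

```shell
# Print the delta-core jars for a given Delta version found under a directory.
find_delta_jars() {
  # $1: repository root; $2: Delta version, e.g. 0.6.1
  find "$1/target" -type f -name "delta-core_*-$2.jar" | sort
}

# Usage (from the repository root):
#   find_delta_jars . 0.6.1
```

The name pattern ends in the version followed by `.jar`, so any jars with an extra suffix after the version are not listed.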

9. Pre-built Jars
Scala 2.11: delta-core_2.11-0.6.1.jar

Scala 2.12: delta-core_2.12-0.6.1.jar

Swapnil Chougule