This is the first in a series on delta-rs DML internals, based on work I’ve been doing on identity and default columns.
When you run an UPDATE against a Delta table, the engine can’t just patch rows in place — Parquet files are immutable. Instead, it has to figure out which files might contain matching rows, rewrite those files, and leave the rest untouched. Getting that candidate set right is one of the most consequential performance decisions in the whole write path.
Here’s how delta-rs does it today.
The Problem
A Delta table can consist of thousands of Parquet files. An UPDATE with a WHERE clause like:
UPDATE my_table SET status = 'archived' WHERE created_at < '2025-01-01'
…doesn’t need to touch files that only contain rows from 2025 onward. But how does the engine know which files those are without reading all of them?
Step 1: The Transaction Log
Delta Lake’s transaction log tracks metadata about every file that’s part of the current table version. Each add action in the log includes:
- The file path
- Row count and file size
- Min/max statistics per column
- Partition values (if the table is partitioned)
This is the starting point. The engine loads the current snapshot — an EagerSnapshot — and gets the list of active files along with their stats.
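For reference, an add action in the log looks roughly like this (field values are illustrative; per the Delta protocol, stats is a JSON-encoded string inside the action):

```json
{
  "add": {
    "path": "part-00000-abc.snappy.parquet",
    "partitionValues": { "year": "2024" },
    "size": 10485760,
    "dataChange": true,
    "stats": "{\"numRecords\": 50000, \"minValues\": {\"created_at\": \"2024-01-02\"}, \"maxValues\": {\"created_at\": \"2024-06-30\"}, \"nullCount\": {\"created_at\": 0}}"
  }
}
```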
Step 2: Finding Candidate Files with scan_files_where_matches
The core of candidate selection lives in a single function: scan_files_where_matches. The execute function in update.rs calls it directly with the session, snapshot, and predicate:
let maybe_scan_plan = scan_files_where_matches(session, snapshot, predicate).await?;
If it returns None, no files matched and the UPDATE is a no-op — no commit needed. Otherwise, it returns a MatchedFilesScan containing the logical scan plan, the set of matched file paths, and the optimized predicate.
Here’s what happens inside that function:
Predicate Simplification and Validation
First, the predicate gets split into conjunction terms and simplified. Each term is validated through FindFilesExprProperties — a visitor that rejects non-deterministic expressions (only Volatility::Immutable is allowed) and checks whether the predicate touches only partition columns or also data columns.
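To make the idea concrete, here is a minimal, self-contained sketch of what such a validation visitor does. The types (`Expr`, `Volatility`, `ExprProperties`) are simplified stand-ins, not the actual DataFusion or delta-rs definitions:

```rust
// Simplified stand-ins for DataFusion's expression types (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Volatility {
    Immutable, // same input, same output, always (e.g. `a + 1`)
    Stable,    // constant within one query (e.g. `now()`)
    Volatile,  // may differ per row (e.g. `random()`)
}

enum Expr {
    Column(String),
    Literal(i64),
    ScalarFn { volatility: Volatility, args: Vec<Expr> },
    And(Box<Expr>, Box<Expr>),
}

struct ExprProperties {
    partition_only: bool, // does the predicate touch only partition columns?
}

// Walk the expression tree: reject anything non-deterministic, and note
// whether a non-partition (data) column is referenced.
fn validate(
    expr: &Expr,
    partition_cols: &[&str],
    props: &mut ExprProperties,
) -> Result<(), String> {
    match expr {
        Expr::Column(name) => {
            if !partition_cols.contains(&name.as_str()) {
                props.partition_only = false; // touches a data column
            }
            Ok(())
        }
        Expr::Literal(_) => Ok(()),
        Expr::ScalarFn { volatility, args } => {
            if *volatility != Volatility::Immutable {
                return Err(format!("non-deterministic expression: {:?}", volatility));
            }
            for a in args {
                validate(a, partition_cols, props)?;
            }
            Ok(())
        }
        Expr::And(l, r) => {
            validate(l, partition_cols, props)?;
            validate(r, partition_cols, props)
        }
    }
}

fn main() {
    // A predicate over a data column: allowed, but not partition-only.
    let mut props = ExprProperties { partition_only: true };
    let expr = Expr::And(
        Box::new(Expr::Column("created_at".into())),
        Box::new(Expr::Literal(0)),
    );
    assert!(validate(&expr, &["year"], &mut props).is_ok());
    assert!(!props.partition_only);

    // A volatile function is rejected outright.
    let volatile = Expr::ScalarFn { volatility: Volatility::Volatile, args: vec![] };
    assert!(validate(&volatile, &["year"], &mut props).is_err());
}
```

The partition-only distinction matters downstream: a purely partition-scoped predicate can be answered from metadata alone.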
Kernel-Level File Skipping
The simplified predicate terms are converted (best-effort) into a Delta kernel predicate via to_delta_predicate. This kernel predicate is passed to DeltaScanNext::builder().with_file_skipping_predicates(), which uses it to eliminate files at the metadata level — both partition pruning and stats-based data skipping happen here as a single pass through the kernel’s scan infrastructure. If a file’s partition values or column statistics prove the predicate can’t match, it’s skipped without reading any data.
The “best effort” part matters: not all DataFusion expressions can be translated to kernel predicates. Anything that can’t be translated is simply not pushed down — the downstream verification pass catches it.
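The core skipping logic is easy to state outside the kernel's machinery. A toy sketch (not the kernel's actual API; file names and stats invented) for a predicate of the form `created_at < bound`: if a file's minimum is already at or past the bound, no row in it can match.

```rust
// Per-file stats as recorded in the transaction log (illustrative data).
struct FileStats {
    path: &'static str,
    min: &'static str, // min value for created_at (ISO dates compare lexically)
    max: &'static str, // max would serve predicates like `col > bound`
}

/// Return the files that *might* contain rows with `created_at < bound`.
/// A file is dropped only when its stats *prove* the predicate can't match.
fn candidates(files: &[FileStats], bound: &str) -> Vec<&'static str> {
    files
        .iter()
        .filter(|f| f.min < bound)
        .map(|f| f.path)
        .collect()
}

fn main() {
    let files = [
        FileStats { path: "part-0.parquet", min: "2024-03-01", max: "2024-12-31" },
        FileStats { path: "part-1.parquet", min: "2025-02-01", max: "2025-06-30" },
    ];
    // Only part-0 can possibly hold rows with created_at < 2025-01-01.
    assert_eq!(candidates(&files, "2025-01-01"), vec!["part-0.parquet"]);
}
```

Note the asymmetry: skipping can only ever say "definitely no match", never "definitely a match", which is exactly why the next step exists.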
Verification via Aggregate Scan
Kernel file skipping is necessary but not sufficient. It can eliminate files that definitely don’t match, but it can’t confirm that the remaining files actually contain matching rows (min/max stats only give bounds, not certainty).
So delta-rs does a verification pass: it builds a logical plan that scans the remaining files with the full predicate applied, projects just the file path column, and runs a DISTINCT to get the unique set of files that contain at least one matching record:
let files_plan = LogicalPlanBuilder::scan("files_scan", table_source.clone(), None)?
.filter(predicate.clone())?
.project([cast(col(FILE_ID_COLUMN_DEFAULT), DataType::Utf8View)
.alias(FILE_ID_COLUMN_DEFAULT)])?
.distinct()?
.build()?;
This is an actual data scan — it reads Parquet row groups from the candidate files and applies the predicate. The result is the definitive set of files that need rewriting.
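The shape of that plan, filter then project-the-file-path then DISTINCT, can be sketched over plain Rust collections (toy data; the real code operates on DataFusion record batches with a file-id column):

```rust
use std::collections::BTreeSet;

/// Scan rows tagged with the file they came from, keep only rows matching
/// the predicate, and collect the DISTINCT set of file paths, mirroring
/// .filter(predicate).project(file_path).distinct() in the logical plan.
fn files_with_matches<'a>(
    rows: &[(&'a str, i64)],           // (file_path, created_at as a toy i64)
    predicate: impl Fn(i64) -> bool,   // the UPDATE's WHERE clause
) -> BTreeSet<&'a str> {
    rows.iter()
        .filter(|(_, v)| predicate(*v)) // .filter(predicate)
        .map(|(path, _)| *path)         // .project(file path column)
        .collect()                      // .distinct()
}

fn main() {
    // Two candidate files survived stats-based skipping, but only one
    // actually contains a matching row.
    let rows = [
        ("part-0.parquet", 10),
        ("part-0.parquet", 60),
        ("part-1.parquet", 70),
    ];
    let matched = files_with_matches(&rows, |v| v < 50);
    assert_eq!(matched.into_iter().collect::<Vec<_>>(), vec!["part-0.parquet"]);
}
```

A file whose stats said "maybe" but whose rows never satisfy the predicate drops out here, so it is neither rewritten nor listed in the commit.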
Step 3: Rewriting Candidate Files
Once the candidate files are identified, delta-rs reads all rows from those files, not just the matching ones: a Parquet file can't be patched in place, so each candidate must be rewritten whole, with its non-matching rows carried along unchanged.
The clever part is how updated vs. unchanged rows are handled in a single pass. A __delta_rs_update_predicate column is added to the scan: true for rows matching the WHERE clause, null for rows that don’t:
let predicate_null =
when(files_scan.predicate.clone(), lit(true))
.otherwise(lit(ScalarValue::Boolean(None)))?;
Then for each column being updated, a CASE expression conditionally applies the new value:
case(col(UPDATE_PREDICATE_COLNAME))
.when(lit(true), new_value_expr)
.otherwise(col(Column::from_name(field.name())))?
Matching rows get the SET expressions applied; non-matching rows pass through unchanged. The predicate column is dropped, and everything flows into a single write_execution_plan call that writes new Parquet files. The null count on that internal predicate column is used to track metrics — num_updated_rows and num_copied_rows — without any extra computation.
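The row-level semantics of that single pass can be sketched over plain vectors (the real code builds DataFusion expressions and reads the null count from write statistics; names here are illustrative):

```rust
/// Rewrite one column in a single pass: the internal predicate column is
/// Some(true) for rows matching the WHERE clause and None (NULL) otherwise.
/// A CASE on it picks the new value or passes the old one through, and the
/// null count yields num_copied_rows as a byproduct.
fn rewrite_column(
    old: &[i64],
    predicate_col: &[Option<bool>],
    new_value: i64,
) -> (Vec<i64>, usize, usize) {
    let mut num_updated = 0;
    let rewritten: Vec<i64> = old
        .iter()
        .zip(predicate_col)
        .map(|(&v, p)| match p {
            Some(true) => {
                num_updated += 1; // row matched the WHERE clause
                new_value         // CASE true: apply the SET expression
            }
            _ => v,               // NULL predicate: copy through unchanged
        })
        .collect();
    let num_copied = old.len() - num_updated;
    (rewritten, num_updated, num_copied)
}

fn main() {
    let predicate_col = [Some(true), None, Some(true), None, None];
    let (rows, updated, copied) = rewrite_column(&[1, 2, 3, 4, 5], &predicate_col, 99);
    assert_eq!(rows, vec![99, 2, 99, 4, 5]);
    assert_eq!((updated, copied), (2, 3));
}
```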
Finally, the original files get remove actions in the commit, and the new files get add actions.
Why This Matters
The gap between “scan everything” and “scan only candidates” can easily be 100x on a large table. If your stats are missing, kernel file skipping can’t help and the verification scan has to read everything. If your predicates don’t align with how the data is physically laid out, even good stats won’t tighten the candidate set much.
A few practical implications:
- Collect stats: Make sure your write pipeline produces per-column min/max statistics. No stats means no skipping.
- Think about data layout: Z-ordering or clustering on your filter columns tightens the min/max ranges per file, which makes kernel file skipping far more effective.
- Partition wisely: Partition predicates are resolved exactly from metadata — no data scan needed. If you have a natural time dimension, partitioning on it gives the candidate-finding step a fast first pass.
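That last point is worth seeing side by side with stats pruning. A small sketch (invented paths and values), where the partition value comes straight from the log and the answer is exact, not a "maybe" derived from min/max bounds:

```rust
// An add action's partition value, as recorded in the transaction log
// (illustrative; the log stores partition values as strings per column).
struct AddFile {
    path: &'static str,
    partition_year: i32,
}

/// Exact pruning for a predicate like `year < cutoff`: this touches no
/// Parquet data at all, and every surviving file genuinely satisfies the
/// partition predicate, unlike stats-based skipping, which only rules
/// files out.
fn prune_partitions(files: &[AddFile], cutoff: i32) -> Vec<&'static str> {
    files
        .iter()
        .filter(|f| f.partition_year < cutoff)
        .map(|f| f.path)
        .collect()
}

fn main() {
    let files = [
        AddFile { path: "year=2024/part-0.parquet", partition_year: 2024 },
        AddFile { path: "year=2025/part-0.parquet", partition_year: 2025 },
    ];
    assert_eq!(prune_partitions(&files, 2025), vec!["year=2024/part-0.parquet"]);
}
```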
What’s Next
In upcoming posts, I’ll dig into the rest of the DML internals — how DELETE reuses most of this machinery, how MERGE combines match logic with the candidate-finding step, and what changes when identity columns and default columns enter the picture.