
feat(plugin-iceberg): Make MV stitching and incremental refresh cost-based#27820

Draft
tdcmeehan wants to merge 7 commits into
prestodb:masterfrom
tdcmeehan:iceseq_mvr_sw_cb

Contributor

@tdcmeehan tdcmeehan commented May 16, 2026

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

Summary by Sourcery

Introduce cost-based strategies and bounded snapshot advancement for Iceberg materialized view incremental refresh and stitching, with new configuration knobs and planner integration.

New Features:

  • Add configurable max_snapshots_per_refresh property and defaults to bound how far Iceberg materialized view refresh advances base table snapshots per run.
  • Introduce session and config-level strategies to control when stitching and incremental refresh rewrites fire (ALWAYS, NEVER, AUTOMATIC), enabling cost-based selection of MV plans.
  • Persist per-base-table snapshot watermarks and derive incremental refresh predicates from Iceberg sequence numbers to support bounded, lineage-aware refresh on V3 tables.
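
One way to picture how the per-view property and the session default might combine is sketched below. This is not the PR's code: `MaxSnapshotsResolver` and `resolve` are hypothetical names, and the precedence (an explicit view property wins over the session default) is an assumption. The reviewer's guide only states that the view property must be positive and that a default of 0 means unbounded.

```java
import java.util.OptionalInt;

// Hypothetical sketch; names and precedence are assumptions, not the PR's implementation.
final class MaxSnapshotsResolver
{
    // 0 is the documented "unbounded" default for the config/session property.
    static final int UNBOUNDED = 0;

    private MaxSnapshotsResolver() {}

    // Returns the effective bound for one refresh: the view property when set,
    // otherwise the session default (which may be UNBOUNDED).
    static int resolve(OptionalInt viewProperty, int sessionDefault)
    {
        if (viewProperty.isPresent()) {
            int value = viewProperty.getAsInt();
            if (value <= 0) {
                throw new IllegalArgumentException("max_snapshots_per_refresh must be positive: " + value);
            }
            return value;
        }
        if (sessionDefault < 0) {
            throw new IllegalArgumentException("default must be 0 (unbounded) or positive: " + sessionDefault);
        }
        return sessionDefault;
    }
}
```

For example, `resolve(OptionalInt.of(3), 0)` yields 3, while `resolve(OptionalInt.empty(), 0)` yields 0, which this sketch treats as "advance to HEAD".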

Bug Fixes:

  • Prevent snapshot watermark rewinds across rollbacks or non-ancestor heads by validating ancestry before computing bounded targets, avoiding data loss on refresh.
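
The ancestry validation described above can be pictured as a walk up the snapshot parent chain. The sketch below is illustrative only: `SnapshotAncestry`, `isAncestorOf`, and the `Map<Long, Long>` parent map are hypothetical stand-ins for Iceberg's snapshot metadata, not the PR's code.

```java
import java.util.Map;

// Hypothetical sketch; not the PR's implementation.
final class SnapshotAncestry
{
    private SnapshotAncestry() {}

    // parents maps snapshot ID -> parent snapshot ID; root snapshots have no entry.
    // Returns true when candidate is head itself or is reachable from head by following parents.
    static boolean isAncestorOf(Map<Long, Long> parents, long head, long candidate)
    {
        Long current = head;
        int steps = 0;
        // The step limit guards against a malformed (cyclic) parent map.
        while (current != null && steps <= parents.size()) {
            if (current == candidate) {
                return true;
            }
            current = parents.get(current);
            steps++;
        }
        return false;
    }
}
```

If a stored watermark fails this check against the current head (for example after a rollback followed by a new commit), a refresh in this sketch would fall back to a full recompute rather than compute a bounded target from the stale watermark.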

Enhancements:

  • Apply incremental refresh predicates as filters on base table scans, including column exposure/re-projection, and ensure they are not used for query-time stitching.
  • Extend materialized view status to carry both partition predicates and separate incremental-refresh-only predicates, and handle ancestry changes or partition detection failures by forcing full refresh.
  • Adjust Iceberg statistics gathering to drop metadata-column predicates before binding, improving robustness.
  • Enable SelectLowestCostMVRewrite when either explicit cost-based MV selection or AUTOMATIC stitching/incremental strategies are configured.
  • Emit detailed warnings when falling back from incremental refresh or stitching to full recompute due to strategy settings or missing predicates.

Tests:

  • Add extensive Iceberg REST and optimizer tests covering bounded refresh behavior, V2 vs V3 bases, compaction, rollback, partitioned MVs, stitching interactions, and session/default overrides.
  • Add planner rule tests for incremental refresh strategies, automatic cost-based MV rewrite enablement, and stitching strategy behavior.

@prestodb-ci added the from:IBM (PR from IBM) label May 16, 2026
Contributor

sourcery-ai Bot commented May 16, 2026

Reviewer's Guide

Adds cost-based control over Iceberg materialized view incremental refresh and stitching, introduces bounded per-refresh snapshot advancement with row-lineage-aware predicates, exposes new connector and session properties, and updates planner rules plus extensive tests to verify correctness, planning, and warnings.

File-Level Changes

Change Details Files
Change: Add bounded incremental refresh for Iceberg materialized views using per-base snapshot watermarks and hidden sequence predicates, including persistence and rollback handling.
Details:
  • Store per-base snapshot watermarks and optional max_snapshots_per_refresh in Iceberg MV view properties and resolve them with a new chooseTargetSnapshot helper that respects row lineage, ancestry, and optional bounds.
  • Compute per-base MaterializedDataPredicates with both partition-level disjuncts (covering watermark-to-HEAD changes) and a separate incrementalRefreshPredicate based on LAST_UPDATED_SEQUENCE_NUMBER when a bounded target is in effect.
  • Update finishRefreshMaterializedView to advance base watermarks to either HEAD or the bounded target snapshot, preserving progress across refreshes and properly handling empty/rollback cases.
  • Warn and fall back to unbounded behavior when bounded refresh is requested on V2 Iceberg bases that lack row lineage, and treat watermark off-ancestry as NOT_MATERIALIZED to force full refresh.
  • Extend Iceberg tests (REST MVs and base MV tests) to cover bounded behavior, V2 fallback, partitioned views, compaction, rollback, session default override, hidden sequence column usability, and warning emission.
Files:
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSessionProperties.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMaterializedViewProperties.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergWarningCode.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergRestMaterializedViews.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewsBase.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergConfig.java
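
To make the bounded advancement concrete, here is a minimal sketch of picking a per-refresh target. It is not the PR's chooseTargetSnapshot helper: `BoundedRefreshTarget` and `chooseTarget` are hypothetical, the input is assumed to be the already-validated list of snapshot IDs committed after the watermark (oldest first, ending at HEAD), and a bound of 0 is treated as unbounded.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; the real helper also considers row lineage and ancestry.
final class BoundedRefreshTarget
{
    private BoundedRefreshTarget() {}

    // snapshotsAfterWatermark: IDs committed after the stored watermark, oldest first, last element is HEAD.
    // maxSnapshotsPerRefresh: 0 means unbounded, otherwise the maximum number of snapshots to advance.
    // Returns the snapshot the watermark should advance to, or empty when nothing is new.
    static OptionalLong chooseTarget(List<Long> snapshotsAfterWatermark, int maxSnapshotsPerRefresh)
    {
        if (maxSnapshotsPerRefresh < 0) {
            throw new IllegalArgumentException("bound must be 0 (unbounded) or positive");
        }
        if (snapshotsAfterWatermark.isEmpty()) {
            return OptionalLong.empty();
        }
        int size = snapshotsAfterWatermark.size();
        int steps = maxSnapshotsPerRefresh == 0 ? size : Math.min(maxSnapshotsPerRefresh, size);
        return OptionalLong.of(snapshotsAfterWatermark.get(steps - 1));
    }
}
```

With pending snapshots 11, 12, 13, 14 and a bound of 2, the first refresh targets 12 and a second refresh over the remaining 13, 14 reaches HEAD at 14, which is the "preserving progress across refreshes" behavior described above.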
Change: Make incremental refresh planning and execution predicate-aware and optionally cost-based, and ensure bounded predicates never affect stitching reads.
Details:
  • Extend MaterializedViewStatus.MaterializedDataPredicates to carry an incrementalRefreshPredicate separate from partition disjuncts, and thread this through IncrementalRefreshRule.
  • Introduce applyIncrementalRefreshPredicates and an IncrementalRefreshPredicateRewriter that wraps base TableScanNodes with sequence-based filters, adding hidden columns as needed and re-projecting outputs.
  • In IncrementalRefreshRule, gate incremental refresh by a new session strategy, apply predicates to both delta and full plans, and in AUTOMATIC mode emit an MVRewriteCandidatesNode so SelectLowestCostMVRewrite can choose between full and delta plans.
  • Ensure V2 (no row lineage) incremental refresh behaves as unbounded with a warning, and that stitching paths never see the incrementalRefreshPredicate so stale reads remain complete.
  • Add optimizer unit tests to validate presence/absence of sequence predicates in bounded/unbounded/V2/stitching scenarios and new tests for incremental refresh strategy NEVER/AUTOMATIC fallback behavior.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/IncrementalRefreshRule.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/MaterializedViewRewrite.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/MaterializedViewRewriteStrategy.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SelectLowestCostMVRewrite.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/DifferentialPlanRewriter.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/TestIncrementalRefreshRule.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/TestMaterializedViewRewrite.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewOptimizer.java
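
The strategy gating for incremental refresh can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: `IncrementalRefreshChoice`, `Plan`, and `choose` are hypothetical, plan cost is collapsed to a single `double` (the real optimizer compares plan cost estimates, not one number), and ties in AUTOMATIC mode go to the delta plan as an arbitrary choice.

```java
// Hypothetical sketch of the ALWAYS / NEVER / AUTOMATIC gating; not the PR's rule code.
final class IncrementalRefreshChoice
{
    enum Strategy { ALWAYS, NEVER, AUTOMATIC }

    enum Plan { DELTA, FULL }

    private IncrementalRefreshChoice() {}

    // deltaPlanAvailable is false when predicates or partition information are missing,
    // in which case the refresh falls back to a full recompute regardless of strategy.
    static Plan choose(Strategy strategy, boolean deltaPlanAvailable, double deltaCost, double fullCost)
    {
        if (!deltaPlanAvailable || strategy == Strategy.NEVER) {
            return Plan.FULL;
        }
        if (strategy == Strategy.ALWAYS) {
            return Plan.DELTA;
        }
        // AUTOMATIC: both candidates are kept and the cheaper one is selected.
        return deltaCost <= fullCost ? Plan.DELTA : Plan.FULL;
    }
}
```

In the PR itself the AUTOMATIC branch is described as emitting both candidates in an MVRewriteCandidatesNode and leaving the comparison to SelectLowestCostMVRewrite; the function above only mirrors the outcome of that selection.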
Change: Introduce cost-based strategies for when to apply stitching and incremental refresh, wired through system properties, features config, and warnings.
Details:
  • Add MaterializedViewRewriteStrategy enum (ALWAYS, NEVER, AUTOMATIC) and plumb it through FeaturesConfig defaults and configuration keys for both stitching and incremental refresh.
  • Expose new system session properties materialized_view_stitching_strategy and materialized_view_incremental_refresh_strategy, with accessors used by rules like MaterializedViewRewrite, IncrementalRefreshRule, and SelectLowestCostMVRewrite.
  • Adjust SelectLowestCostMVRewrite.isEnabled to consider AUTOMATIC strategies for stitching and incremental refresh in addition to the existing explicit cost-based flag.
  • Emit MATERIALIZED_VIEW_STITCHING_FALLBACK warnings when stitching or incremental refresh is disabled by strategy or cannot be applied due to missing predicates/partition info, and add tests asserting correct enable/disable behavior and absence of fallbacks under AUTOMATIC.
  • Add tests in the Iceberg MV base suite to validate behavior of NEVER and AUTOMATIC strategies for both incremental refresh and stitching in end-to-end queries and refreshes.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
  • presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/SelectLowestCostMVRewrite.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/MaterializedViewRewrite.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSelectLowestCostMVRewrite.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/materializedview/TestMaterializedViewRewrite.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewsBase.java
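
The enablement condition for the cost-based selection rule reduces to a three-way OR. A minimal sketch follows, assuming a boolean for the existing explicit flag; `CostBasedSelectionGate` and its nested enum are hypothetical and only mirror the three values named above, not the real session-property accessors.

```java
// Hypothetical sketch of the enablement check; not the PR's SelectLowestCostMVRewrite code.
final class CostBasedSelectionGate
{
    enum RewriteStrategy { ALWAYS, NEVER, AUTOMATIC }

    private CostBasedSelectionGate() {}

    // The rule runs when cost-based MV selection is switched on explicitly,
    // or when either strategy asks the optimizer to decide (AUTOMATIC).
    static boolean isEnabled(boolean explicitCostBasedSelection, RewriteStrategy stitching, RewriteStrategy incrementalRefresh)
    {
        return explicitCostBasedSelection
                || stitching == RewriteStrategy.AUTOMATIC
                || incrementalRefresh == RewriteStrategy.AUTOMATIC;
    }
}
```

Under this reading, setting only materialized_view_stitching_strategy to AUTOMATIC is enough to activate the rule even when the explicit flag is off.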
Change: Extend Iceberg connector/session configuration and statistics handling to support new MV behavior and metadata constraints.
Details:
  • Add new IcebergConfig property iceberg.materialized-view-default-max-snapshots-per-refresh and matching session property materialized_view_default_max_snapshots_per_refresh, with default 0 meaning unbounded and tests for mapping.
  • Expose connector-level MV table property max_snapshots_per_refresh with validation that the value is positive, plus helper to read it.
  • Update TableStatisticsMaker to strip metadata-column constraints from TupleDomain before binding to an Iceberg expression, avoiding failures when sequence predicates are present.
  • Refactor IcebergUtil to expose supportsRowLineage and a MIN_FORMAT_VERSION_FOR_ROW_LINEAGE constant used for V3 checks instead of hardcoding version numbers in validation logic.
  • Update documentation placeholders for materialized views and Iceberg connector (files touched, content not shown in diff).
Files:
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSessionProperties.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergMaterializedViewProperties.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/TableStatisticsMaker.java
  • presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java
  • presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergConfig.java
  • presto-docs/src/main/sphinx/admin/materialized-views.rst
  • presto-docs/src/main/sphinx/connector/iceberg.rst
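
The statistics change amounts to filtering a per-column constraint map before it is converted. The sketch below uses a plain `Map<String, V>` as a stand-in for the real TupleDomain, and `MetadataConstraintFilter`, `dropMetadataColumns`, and the `_hidden_seq` column name in the usage note are all hypothetical; it illustrates the idea rather than the PR's TableStatisticsMaker code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; a plain map stands in for the connector's constraint type.
final class MetadataConstraintFilter
{
    private MetadataConstraintFilter() {}

    // Returns a copy of the constraints without entries for metadata (hidden) columns,
    // so that only regular data columns are bound when building the statistics filter.
    static <V> Map<String, V> dropMetadataColumns(Map<String, V> constraintsByColumn, Set<String> metadataColumns)
    {
        Map<String, V> result = new LinkedHashMap<>();
        for (Map.Entry<String, V> entry : constraintsByColumn.entrySet()) {
            if (!metadataColumns.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

Given constraints on `order_date` and on a hidden `_hidden_seq` column, only the `order_date` entry survives, and the input map is left unmodified.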

Possibly linked issues

  • #: Yes. This PR delivers RFC-0016 planner-oriented, cost-based MV refresh/stitching behavior, focusing on Iceberg materialized views.


