Skip to content

[GLUTEN-11559][VL] Add incremental C++ build script for fast development iteration#11595

Merged
baibaichen merged 5 commits intoapache:mainfrom
baibaichen:fix/incremental-build-script
Feb 26, 2026
Merged

[GLUTEN-11559][VL] Add incremental C++ build script for fast development iteration#11595
baibaichen merged 5 commits intoapache:mainfrom
baibaichen:fix/incremental-build-script

Conversation

@baibaichen
Copy link
Contributor

@baibaichen baibaichen commented Feb 10, 2026

What changes were proposed in this pull request?

Gluten's incremental C++ builds take ~3 minutes even for no-op builds (make-based workflow), making the AI-driven edit → compile → test → fix loop impractically slow. With this new dev/inc-build-cpp.sh script using Ninja and smart vcpkg timestamp preservation, incremental builds now complete in under 30 seconds and zero-change builds in 0.27 seconds.

Performance Improvements:

  • Switch from Make to Ninja: 4m53s → 0.1s no-op builds (1400x faster)
  • Skip redundant cmake configure steps (let Ninja handle regeneration)
  • Default skip vcpkg check (use --update_vcpkg when needed)
  • Preserve vcpkg timestamps to avoid unnecessary relinking
  • Result: 3m14s → 0.27s total no-op time

Features:

  • Auto-detects BUILD_TYPE from velox _build/debug or _build/release directory
  • Auto-detects SCALA_VERSION from backends-velox/target/scala-X.XX directory
  • Auto-extracts vcpkg options (BUILD_TESTS, ENABLE_S3/GCS/ABFS) from CMakeCache.txt
  • Validates CMakeCache.txt exists (required for incremental build)
  • Self-contained: no dependency on build-helper-functions.sh for easy maintenance

Usage:

./dev/builddep-veloxbe-inc.sh [--build_type=Debug|Release] [--scala_version=2.13] [--update_vcpkg]

Why are the changes needed?

AI-assisted development requires fast iteration cycles. The existing dev/package-vcpkg.sh is designed for full builds and includes overhead from Velox source reset, OS dependency setup, and Maven builds. This script focuses purely on incremental C++ compilation for the edit → compile → test → fix loop.

Does this PR introduce any user-facing change?

No, this is a developer tool for faster C++ incremental builds.

How was this patch tested?

  • Syntax validation: bash -n dev/inc-build-cpp.sh
  • No-op build performance verified: 0.27s (down from 3m14s)
  • Incremental build after single-file change: ~25s
  • All lines ≤ 100 characters for code readability

Related Issue

Fixes #11559 (PR 2 of 2)
Previous PR: #11560


🤖 AI-Assisted Development: This PR was created with Claude Code to optimize the AI-driven development workflow.

@github-actions github-actions bot added the BUILD label Feb 10, 2026
@baibaichen
Copy link
Contributor Author

baibaichen commented Feb 10, 2026

@PHILO-HE @marin-ma

@baibaichen baibaichen force-pushed the fix/incremental-build-script branch from d52add0 to 1041287 Compare February 10, 2026 11:11
Copy link
Member

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for proposing this. Some comments. Please check if they make sense.

for arg in "$@"; do
case $arg in
--build_type=*) BUILD_TYPE="${arg#*=}" ;;
--scala_version=*) SCALA_VERSION="${arg#*=}" ;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this script is for supporting incremental build for C++ code. It seems strange to include scala version as build option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala_version is not for C++ compilation. It is used to determine the target directory when copying native libraries, since the output path differs between Scala 2.12 and 2.13. We assume 2.12 and 2.13 outputs are mutually exclusive, and a full build has already been done, so we can auto-detect it by checking whether scala-2.12 or scala-2.13 exists under the target directory.

fi

TARGET_DIR="$GLUTEN_DIR/backends-velox/target/scala-${SCALA_VERSION}"
TARGET_DIR="$TARGET_DIR/classes/$(platform)/$(arch)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems in the maven build phase, libgluten.so and libvelox.so will be copied to the target directory according to the code below. If so, it may be unnecessary to do the copy here.

https://github.com/apache/incubator-gluten/blob/ae2fc5b5ebeb5bb2b0d60a3c2588f4274597a29c/backends-velox/pom.xml#L229

Copy link
Contributor Author

@baibaichen baibaichen Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script targets the C++-only incremental iteration workflow: we build C++ and immediately place the artifacts where Maven/Scala expects them. If Scala code changes, a subsequent mvn run can skip steps due to timestamp behavior.


# Step 2: Build Velox
step 2 "Building Velox (incremental)"
export simdjson_SOURCE=AUTO Arrow_SOURCE=AUTO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting may be unnecessary now. Could you confirm it in local build?


source "$GLUTEN_DIR/dev/vcpkg/env.sh" $BUILD_OPTIONS

# Restore timestamps for unchanged files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: suggest making it more readable.

** unchanged binary files.

BUILD_OPTIONS="$BUILD_OPTIONS --enable_gcs=$(extract_opt ENABLE_GCS)"
BUILD_OPTIONS="$BUILD_OPTIONS --enable_abfs=$(extract_opt ENABLE_ABFS OFF)"

if [ "$UPDATE_VCPKG" = "ON" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUN_VCPKG_INSTALL seems more clear to developers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use UPDATE_VCPKG because if vcpkg.json changes, we need to refresh/update vcpkg dependencies, and that can happen in practice. RUN_VCPKG_INSTALL sounds like we always just run an install, which is less accurate.

# See the License for the specific language governing permissions and
# limitations under the License.

# Incremental C++ build for Gluten + Velox (no source reset, no Maven, no OS setup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment for clarification?

Only applicable when vcpkg is enabled for static linking.

@FelixYBW
Copy link
Contributor

Is ./dev/inc-build-cpp.sh going to relace builddep-veloxbe.sh? If so we may overwrite builddep-veloxbe.sh. If we need both, we may change the name to ./dev/builddep-veloxbe-inc.sh to align with existing one.

baibaichen and others added 5 commits February 25, 2026 05:55
Add dev/inc-build-cpp.sh for fast incremental C++ builds with auto-detection:

Performance improvements:
- Switch from Make to Ninja: 4m53s -> 0.1s no-op (1400x faster)
- Skip redundant cmake configure steps (use 'cmake --build' directly)
- Default skip vcpkg check (use --update_vcpkg when needed)
- Preserve vcpkg timestamps to avoid unnecessary relinking
- Result: 3m14s -> 0.27s total no-op time

Auto-detection:
- BUILD_TYPE from velox _build/debug or _build/release directory
- SCALA_VERSION from backends-velox/target/scala-X.XX directory
- Vcpkg options (BUILD_TESTS, ENABLE_S3/GCS/ABFS) from CMakeCache.txt
- Validate CMakeCache.txt exists (required for incremental build)

Helper functions extracted to build-helper-functions.sh:
- initialize_num_threads, get_platform, get_maven_arch
- log_build_step, validate_directory
- setup_vcpkg_environment, copy_gluten_libraries

Usage: ./dev/inc-build-cpp.sh [--build_type=Debug|Release] [--scala_version=2.13] [--update_vcpkg]
--update_vcpkg implies dependencies have changed, so preserving
timestamps to avoid Ninja relinking was wrong — it would hide real
library updates and produce incorrect incremental builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…build-cpp.sh

These variables default to AUTO in Velox's CMakeLists.txt and are only
read during cmake configure, not during cmake --build. The export had
no effect on incremental builds.
Align naming with existing builddep-veloxbe.sh convention.
@baibaichen baibaichen force-pushed the fix/incremental-build-script branch from 1041287 to b66f9f3 Compare February 25, 2026 05:56
@baibaichen baibaichen added the AI AI Related issues and PR label Feb 26, 2026
Copy link
Member

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@baibaichen baibaichen merged commit 9d34339 into apache:main Feb 26, 2026
60 of 61 checks passed
@baibaichen baibaichen deleted the fix/incremental-build-script branch February 26, 2026 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI AI Related issues and PR BUILD

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Build] Improve incremental build time

3 participants