[GLUTEN-11559][VL] Add incremental C++ build script for fast development iteration#11595
Conversation
d52add0 to
1041287
Compare
PHILO-HE
left a comment
There was a problem hiding this comment.
Thanks for proposing this. Some comments. Please check if they make sense.
| for arg in "$@"; do | ||
| case $arg in | ||
| --build_type=*) BUILD_TYPE="${arg#*=}" ;; | ||
| --scala_version=*) SCALA_VERSION="${arg#*=}" ;; |
There was a problem hiding this comment.
Given that this script is for supporting incremental build for C++ code. It seems strange to include scala version as build option.
There was a problem hiding this comment.
scala_version is not for C++ compilation. It is used to determine the target directory when copying native libraries, since the output path differs between Scala 2.12 and 2.13. We assume 2.12 and 2.13 outputs are mutually exclusive, and a full build has already been done, so we can auto-detect it by checking whether scala-2.12 or scala-2.13 exists under the target directory.
| fi | ||
|
|
||
| TARGET_DIR="$GLUTEN_DIR/backends-velox/target/scala-${SCALA_VERSION}" | ||
| TARGET_DIR="$TARGET_DIR/classes/$(platform)/$(arch)" |
There was a problem hiding this comment.
It seems in the maven build phase, libgluten.so and libvelox.so will be copied to the target directory according to the code below. If so, it may be unnecessary to do the copy here.
There was a problem hiding this comment.
This script targets the C++-only incremental iteration workflow: we build C++ and immediately place the artifacts where Maven/Scala expects them. If Scala code changes, a subsequent mvn run can skip steps due to timestamp behavior.
dev/inc-build-cpp.sh
Outdated
|
|
||
| # Step 2: Build Velox | ||
| step 2 "Building Velox (incremental)" | ||
| export simdjson_SOURCE=AUTO Arrow_SOURCE=AUTO |
There was a problem hiding this comment.
This setting may be unnecessary now. Could you confirm it in local build?
dev/inc-build-cpp.sh
Outdated
|
|
||
| source "$GLUTEN_DIR/dev/vcpkg/env.sh" $BUILD_OPTIONS | ||
|
|
||
| # Restore timestamps for unchanged files |
There was a problem hiding this comment.
Nit: suggest making it more readable.
** unchanged binary files.
| BUILD_OPTIONS="$BUILD_OPTIONS --enable_gcs=$(extract_opt ENABLE_GCS)" | ||
| BUILD_OPTIONS="$BUILD_OPTIONS --enable_abfs=$(extract_opt ENABLE_ABFS OFF)" | ||
|
|
||
| if [ "$UPDATE_VCPKG" = "ON" ]; then |
There was a problem hiding this comment.
RUN_VCPKG_INSTALL seems more clear to developers?
There was a problem hiding this comment.
I use UPDATE_VCPKG because if vcpkg.json changes, we need to refresh/update vcpkg dependencies, and that can happen in practice. RUN_VCPKG_INSTALL sounds like we always just run an install, which is less accurate.
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # Incremental C++ build for Gluten + Velox (no source reset, no Maven, no OS setup) |
There was a problem hiding this comment.
Could you add a comment for clarification?
Only applicable when vcpkg is enabled for static linking.
|
Is ./dev/inc-build-cpp.sh going to relace builddep-veloxbe.sh? If so we may overwrite builddep-veloxbe.sh. If we need both, we may change the name to ./dev/builddep-veloxbe-inc.sh to align with existing one. |
Add dev/inc-build-cpp.sh for fast incremental C++ builds with auto-detection: Performance improvements: - Switch from Make to Ninja: 4m53s -> 0.1s no-op (1400x faster) - Skip redundant cmake configure steps (use 'cmake --build' directly) - Default skip vcpkg check (use --update_vcpkg when needed) - Preserve vcpkg timestamps to avoid unnecessary relinking - Result: 3m14s -> 0.27s total no-op time Auto-detection: - BUILD_TYPE from velox _build/debug or _build/release directory - SCALA_VERSION from backends-velox/target/scala-X.XX directory - Vcpkg options (BUILD_TESTS, ENABLE_S3/GCS/ABFS) from CMakeCache.txt - Validate CMakeCache.txt exists (required for incremental build) Helper functions extracted to build-helper-functions.sh: - initialize_num_threads, get_platform, get_maven_arch - log_build_step, validate_directory - setup_vcpkg_environment, copy_gluten_libraries Usage: ./dev/inc-build-cpp.sh [--build_type=Debug|Release] [--scala_version=2.13] [--update_vcpkg]
--update_vcpkg implies dependencies have changed, so preserving timestamps to avoid Ninja relinking was wrong — it would hide real library updates and produce incorrect incremental builds. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…build-cpp.sh These variables default to AUTO in Velox's CMakeLists.txt and are only read during cmake configure, not during cmake --build. The export had no effect on incremental builds.
Align naming with existing builddep-veloxbe.sh convention.
1041287 to
b66f9f3
Compare
What changes were proposed in this pull request?
Gluten's incremental C++ builds take ~3 minutes even for no-op builds (make-based workflow), making the AI-driven edit → compile → test → fix loop impractically slow. With this new
dev/inc-build-cpp.shscript using Ninja and smart vcpkg timestamp preservation, incremental builds now complete in under 30 seconds and zero-change builds in 0.27 seconds.Performance Improvements:
Features:
Usage:
Why are the changes needed?
AI-assisted development requires fast iteration cycles. The existing
dev/package-vcpkg.shis designed for full builds and includes overhead from Velox source reset, OS dependency setup, and Maven builds. This script focuses purely on incremental C++ compilation for the edit → compile → test → fix loop.Does this PR introduce any user-facing change?
No, this is a developer tool for faster C++ incremental builds.
How was this patch tested?
bash -n dev/inc-build-cpp.shRelated Issue
Fixes #11559 (PR 2 of 2)
Previous PR: #11560
🤖 AI-Assisted Development: This PR was created with Claude Code to optimize the AI-driven development workflow.