Remove ZetaSQL and Modernize Build Toolchain #224
Remove ZetaSQL integration and filter_query functionality to eliminate complex dependency chain and simplify build process. The filter_query feature relied on ZetaSQL for SQL query parsing and validation.
Changes:
- Remove com_google_zetasql from WORKSPACE
- Comment out filter_query dependencies in metadata_store BUILD
- Remove filter_query implementation from query executors
- Remove filter_query tests from metadata_store_test.py
- Return UnimplementedError for filter_query usage
This prepares the codebase for toolchain modernization by reducing dependency complexity. Users relying on filter_query should migrate to alternative filtering approaches.
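One possible migration path for filter_query users is to list everything and filter in Python. The sketch below illustrates only that pattern: `ArtifactRecord`, `filter_client_side`, and the sample data are hypothetical stand-ins invented for the example, not ml-metadata types or APIs, and the commented SQL-like condition is only an illustration of the kind of filter being replaced.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for an artifact returned by an unfiltered list call."""
    id: int
    uri: str
    properties: dict = field(default_factory=dict)


def filter_client_side(
    records: Iterable[ArtifactRecord],
    predicate: Callable[[ArtifactRecord], bool],
) -> List[ArtifactRecord]:
    """Apply in Python the condition that was previously pushed to the backend."""
    return [record for record in records if predicate(record)]


# Stand-in for the full listing fetched from the store.
all_artifacts = [
    ArtifactRecord(1, "/data/train", {"split": "train"}),
    ArtifactRecord(2, "/data/eval", {"split": "eval"}),
    ArtifactRecord(3, "/models/v1"),
]

# Roughly what a condition like "uri LIKE '/data/%'" would have selected.
data_artifacts = filter_client_side(
    all_artifacts, lambda a: a.uri.startswith("/data/")
)
print([a.id for a in data_artifacts])
```

Filtering this way costs one pass over the full listing, so it may be noticeably slower than a backend-side filter on large stores.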
Upgrade build system components to latest stable versions for improved compatibility with Ubuntu 24.04 and GCC 13.
Bazel Dependencies:
- rules_foreign_cc 0.9.0 -> 0.12.0 (removed obsolete patch)
- Bazel Skylib 1.5.0 -> 1.7.1
- Abseil 20230802.1 (kept for compatibility)
- Google Test 1.12.1 -> 1.15.2
- pybind11 2.10.1 -> 2.13.6 (Python 3.13 support)
- SQLite 3.39.2 -> 3.47.2
Build Configuration:
- Add --host_cxxopt="-std=c++17" for host tool compilation
- Fix GCC 13 compatibility in libmysqlclient.BUILD
- Add MariaDB stdint.h type mappings
Fixes:
- Abseil C++14 requirement error in host tool compilations
- GCC 13 implicit function declaration warnings
- MariaDB type compatibility with modern compilers
Build Performance Improvement
Before vs After
Improvement: 5m 58s faster (74% reduction), ~3.8x speedup
Performance Analysis
The dramatic build time improvement is almost entirely due to ZetaSQL removal:
Primary Factor: ZetaSQL Dependency Chain (~6 minutes saved)
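The headline figures can be cross-checked with a little arithmetic. The sketch below assumes only the stated 5m 58s saving and 74% reduction; the before and after times it derives are inferred from those two numbers and are not the measured values.

```python
# Figures stated in the summary line.
saved_seconds = 5 * 60 + 58   # 5m 58s
reduction = 0.74              # fraction of the original build time removed

# Inferred, not measured: the totals implied by the two stated figures.
before_seconds = saved_seconds / reduction
after_seconds = before_seconds - saved_seconds

# A reduction r corresponds to a speedup of 1 / (1 - r).
speedup = before_seconds / after_seconds


def fmt(seconds: float) -> str:
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


print(f"implied before: {fmt(before_seconds)}")
print(f"implied after:  {fmt(after_seconds)}")
print(f"speedup:        {speedup:.2f}x")
```

The implied totals are about 8 minutes before and about 2 minutes after, which agrees with the quoted ~3.8x speedup.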
Toolchain Modernization
The dependency upgrades (Bazel Skylib, rules_foreign_cc, Google Test, SQLite) likely contributed little to the build time improvement. These updates primarily provide:
Conclusion
ZetaSQL removal is responsible for virtually all of the 74% build time reduction. The toolchain modernization delivers value through improved maintainability and future compatibility.
Machine Information
czgdp1807@qgpu1:~/ml-metadata$ uname -a
Linux qgpu1 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec 5 13:09:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
czgdp1807@qgpu1:~/ml-metadata$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper 2970WX 24-Core Processor
CPU family: 23
Model: 8
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 74%
CPU max MHz: 3000.0000
CPU min MHz: 2200.0000
BogoMIPS: 5987.95
czgdp1807@qgpu1:~/ml-metadata$ grep MemTotal /proc/meminfo | awk '{print $2/1024/1024 " GB\n"}'
125.726 GB
Native build on linux is now possible with |
Add automated setup script and environment configuration for building ml-metadata natively on Linux without Docker.
- environment.yml: mamba environment with Python 3.11, GCC toolchain
- setup_and_build.sh: automated Miniforge/Bazelisk install and build
- .bazelrc: GCC compatibility flags (C11/C++17, warning suppressions)
Enables building on SSH/bare metal systems without sudo access.
7424cff to 073d4d9
Enable native builds on Apple Silicon by fixing compilation issues:
- Update zlib to 1.3.1 and patch upb for C23 compatibility
- Configure libmysqlclient and PostgreSQL for ARM64 architecture
- Fix Python extension symbol exports for macOS linker
- Add conda environment specification for macOS development
Resolves platform-specific type conflicts, iconv detection, and x86-specific assembly code issues on darwin_arm64.
The --no-as-needed flag is GNU ld specific and not supported on macOS. Make it Linux-only and add macOS-specific -Wl,-undefined,dynamic_lookup flag for ARM64 builds. Fixes protoc/grpc plugin linking failures.
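The platform split described above can be pictured as a small selection function. This is only an illustrative sketch in Python: the PR makes this choice in the Bazel build configuration, and `extension_link_flags` is a hypothetical helper name, not something in the repository. The `-Wl,` prefix on `--no-as-needed` assumes the flag is passed through the compiler driver.

```python
import platform
from typing import List, Optional


def extension_link_flags(system: Optional[str] = None) -> List[str]:
    """Pick linker flags per platform (hypothetical helper for illustration)."""
    system = system or platform.system()
    if system == "Linux":
        # GNU ld only: keep libraries linked even if no symbol looks used.
        return ["-Wl,--no-as-needed"]
    if system == "Darwin":
        # macOS linker: leave undefined symbols to be resolved at load time.
        return ["-Wl,-undefined,dynamic_lookup"]
    # Unknown platform: add nothing rather than guess.
    return []


for name in ("Linux", "Darwin"):
    print(name, extension_link_flags(name))
```

Returning an empty list for unrecognized platforms avoids passing a flag the linker may reject, which is the failure mode the commit fixes on macOS.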
Change --copt to --conlyopt for C-specific warning flags to fix "unrecognized command line option" errors on Linux. Make macOS iconv configuration conditional via environment variable. Add Linux conda environment specification with manylinux2014 toolchain.
Add conda/micromamba-based GitHub Actions workflows for building and testing ml-metadata wheels. Uses GCC 8.5.0 to match manylinux2014 compatibility, supports Python 3.9-3.11, and includes automated PyPI publishing.
Remove obsolete CI workflows in preparation for updated build system. The removed workflows relied on Docker-based builds that are being replaced with native platform builds.
Overview
This PR removes the ZetaSQL dependency, modernizes the build toolchain to support Ubuntu 24.04, GCC 13, and Python 3.13, and enables native builds on macOS ARM64 (Apple Silicon). The changes improve long-term maintainability, reduce build complexity, and eliminate Docker dependencies while maintaining backward compatibility across Linux and macOS platforms.
ZetaSQL Removal
Removed ZetaSQL integration and the filter_query functionality to eliminate the complex dependency chain and simplify the build process.
Changes:
- Removed com_google_zetasql from WORKSPACE
- Removed filter_query in query executors (returns UnimplementedError)
- Removed filter_query tests from metadata_store_test.py
Impact: Users relying on filter_query should migrate to standard list operations with client-side filtering.
Toolchain Updates Completed
Bazel Dependencies
Build Configuration (.bazelrc)
- darwin_arm64 config
- -Wl,-undefined,dynamic_lookup for proper Python extension symbol resolution
- -Wno-error=c23-extensions for modern Clang compatibility
- CMAKE_ICONV_FLAG environment variable for libmysqlclient iconv configuration
- GNU ld flag (--no-as-needed) applied only on Linux
- C11 (--conlyopt=-std=c11)
- C++17 (--cxxopt/-std=c++17, --host_cxxopt="-std=c++17")
- C-specific warning flags (-Wno-array-parameter, -Wno-implicit-function-declaration) via --conlyopt
- -fpermissive
- -Wno-error
Platform-Specific Fixes
- darwin_arm64 config for Apple Silicon
- uint/ushort/ulong type mappings for GCC 13 / Ubuntu 24.04
- -liconv linking for character encoding support via conda libiconv
- _GNU_SOURCE to fix implicit function declarations
- CMAKE_ICONV_FLAG environment variable
CI/CD - Conda-Based Native Builds
Replaced Docker-based CI with native conda builds for better platform support and maintainability.
- Removed .github/workflows/build.yml and .github/workflows/test.yml (Docker-based)
- .github/workflows/conda-build.yml: Native build workflow with micromamba
- .github/workflows/conda-test.yml: Native test workflow
- ci/environment-macos.yml: macOS-specific conda environment (libiconv, delocate)
- ci/environment.yml: Linux-specific conda environment (GCC 8.x, auditwheel, manylinux2014 sysroot)
- ubuntu-latest, macos-latest (x86_64 and ARM64 via GitHub-hosted runners)
- auditwheel (Linux manylinux2014), delocate (macOS)
Updates Intentionally Avoided
Bazel 6.5.0 (7.4.1 attempted)
Reason: gRPC 1.46.3 incompatible with Bazel 7.x
Error: 'apple_common.multi_arch_split' value has no field or method 'platform_type'
Blocker: Requires gRPC upgrade to 1.50+ which has breaking API changes
Abseil 20230802.1 (20240722.0 attempted)
Reason: API changes break existing code
Error: 'StrCat' is not a member of 'absl'
Blocker: Requires comprehensive codebase refactoring for new Abseil API
Protobuf 3.21.9 (3.25.5 attempted)
Reason: Bazel rules removed in newer versions
Error: file '@com_google_protobuf//:protobuf.bzl' does not contain symbol 'cc_proto_library'
Blocker: Requires migration to new Protobuf Bazel rules (protobuf-22.x or 4.x)
PostgreSQL 12.1 (16.6 and 17.2 attempted)
Reason: File structure completely changed between major versions
Error: missing input file '@postgresql//:src/port/thread.c' (and others)
Blocker: Requires complete rewrite of postgresql.BUILD for new file structure
Next Steps
Future Improvements
- Rewrite postgresql.BUILD for modern PostgreSQL versions
- Use macos-14 when available for native ARM64 CI builds
Testing Checklist
Compatibility
- filter_query feature disabled
Tested on:
Build System: Bazel 6.1.0 via Bazelisk 1.20.0