-
Notifications
You must be signed in to change notification settings - Fork 3
Maint/upgrade to duckdb v1.4 #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| # RDKit vcpkg Port Notes | ||
|
|
||
| ## Summary | ||
|
|
||
| RDKit is not available in vcpkg. Creating a C++ only port is feasible. | ||
|
|
||
| ## Current Build Issue | ||
|
|
||
| The project uses: | ||
| - **vcpkg toolchain** for DuckDB extension build system | ||
| - **Spack-installed RDKit** at `/mnt/aux-data/teague/Dev/spack/var/spack/environments/duckdb/.spack-env/view/` | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This path seems to be specific to your system. Is this intended? Also in other portions of this file, for example line 34 |
||
|
|
||
| The conflict: RDKit's cmake config (`rdkit-targets.cmake:61-64`) declares dependencies on `Boost::system`, `Boost::serialization`, `Boost::iostreams`. When vcpkg's toolchain intercepts `find_package(Boost)`, it looks in vcpkg's install tree where Boost doesn't exist. | ||
|
|
||
| ## RDKit Dependencies (from Spack recipe) | ||
|
|
||
| ### Required | ||
| - `boost` (+system +serialization +iostreams) - **in vcpkg** | ||
| - `sqlite` - **in vcpkg** | ||
|
|
||
| ### Optional (for full build) | ||
| - `freetype` - **in vcpkg** | ||
| - `eigen3` (for 3D descriptors) - **in vcpkg** | ||
| - `coordgen` - **NOT in vcpkg** (would need port) | ||
| - `maeparser` - **NOT in vcpkg** (would need port) | ||
| - `freesasa` - **NOT in vcpkg** | ||
| - Python/NumPy (for wrappers) - not needed for C++ only | ||
|
|
||
| ### For C++ Only Build | ||
| Only need: boost, sqlite, optionally eigen3/freetype. All available in vcpkg. | ||
|
|
||
| ## vcpkg Port Structure | ||
|
|
||
| A port requires two files in `/home/teague/Dev/vcpkg/ports/rdkit/`: | ||
|
|
||
| ### vcpkg.json | ||
| ```json | ||
| { | ||
| "name": "rdkit", | ||
| "version": "2024.03.3", | ||
| "description": "RDKit: Open-Source Cheminformatics Software", | ||
| "homepage": "https://www.rdkit.org", | ||
| "license": "BSD-3-Clause", | ||
| "dependencies": [ | ||
| "boost-system", | ||
| "boost-serialization", | ||
| "boost-iostreams", | ||
| "sqlite3", | ||
| { | ||
| "name": "vcpkg-cmake", | ||
| "host": true | ||
| }, | ||
| { | ||
| "name": "vcpkg-cmake-config", | ||
| "host": true | ||
| } | ||
| ], | ||
| "features": { | ||
| "freetype": { | ||
| "description": "Build with FreeType support", | ||
| "dependencies": ["freetype"] | ||
| }, | ||
| "3d": { | ||
| "description": "Build 3D descriptor calculators", | ||
| "dependencies": ["eigen3"] | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ### portfile.cmake (skeleton) | ||
| ```cmake | ||
| vcpkg_from_github( | ||
| OUT_SOURCE_PATH SOURCE_PATH | ||
| REPO rdkit/rdkit | ||
| REF Release_2024_03_3 | ||
| SHA512 <calculate-sha512> | ||
| HEAD_REF master | ||
| ) | ||
|
|
||
| vcpkg_cmake_configure( | ||
| SOURCE_PATH "${SOURCE_PATH}" | ||
| OPTIONS | ||
| -DRDK_INSTALL_INTREE=OFF | ||
| -DRDK_BUILD_PYTHON_WRAPPERS=OFF | ||
| -DRDK_BUILD_COORDGEN_SUPPORT=OFF | ||
| -DRDK_BUILD_MAEPARSER_SUPPORT=OFF | ||
| -DRDK_BUILD_FREESASA_SUPPORT=OFF | ||
| -DRDK_BUILD_YAEHMOP_SUPPORT=OFF | ||
| -DRDK_BUILD_XYZ2MOL_SUPPORT=OFF | ||
| ) | ||
|
|
||
| vcpkg_cmake_install() | ||
| vcpkg_cmake_config_fixup(CONFIG_PATH lib/cmake/rdkit) | ||
| vcpkg_copy_pdbs() | ||
|
|
||
| file(REMOVE_RECURSE "${CURRENT_PACKAGES_DIR}/debug/include") | ||
| vcpkg_install_copyright(FILE_LIST "${SOURCE_PATH}/license.txt") | ||
| ``` | ||
|
|
||
| ## Effort Estimate (C++ Only) | ||
|
|
||
| | Task | Time | | ||
| |------|------| | ||
| | Create vcpkg.json | 15 min | | ||
| | Create portfile.cmake | 30 min | | ||
| | Debug build issues | 1-2 hours | | ||
| | Test on Linux | 30 min | | ||
| | **Total** | **2-4 hours** | | ||
|
|
||
| ## Alternative: Fix Current Build | ||
|
|
||
| Instead of creating a port, could: | ||
| 1. Add boost to vcpkg manifest so `Boost::system` target exists | ||
| 2. Disable vcpkg toolchain's find_package override for specific packages | ||
| 3. Use Spack entirely (remove vcpkg toolchain) | ||
|
|
||
| ## References | ||
|
|
||
| - Spack recipe: `spack edit rdkit` | ||
| - RDKit CMake docs: https://github.com/rdkit/rdkit/blob/master/Code/cmake/Modules/ | ||
| - vcpkg port tutorial: https://learn.microsoft.com/en-us/vcpkg/get_started/get-started-adding-to-registry | ||
| - Example complex port: `/home/teague/Dev/vcpkg/ports/eigen3/` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,7 +4,7 @@ | |
|
|
||
| #define DUCKDB_EXTENSION_MAIN | ||
| #include "cast.hpp" | ||
| #include "duckdb/main/extension_util.hpp" | ||
| #include "duckdb/main/extension/extension_loader.hpp" | ||
| #include "duckdb_rdkit_extension.hpp" | ||
| #include "mol_compare.hpp" | ||
| #include "mol_formats.hpp" | ||
|
|
@@ -20,39 +20,43 @@ | |
|
|
||
| namespace duckdb { | ||
|
|
||
| static void LoadInternal(DatabaseInstance &instance) { | ||
| duckdb_rdkit::RegisterTypes(instance); | ||
| duckdb_rdkit::RegisterCasts(instance); | ||
| duckdb_rdkit::RegisterFormatFunctions(instance); | ||
| duckdb_rdkit::RegisterCompareFunctions(instance); | ||
| duckdb_rdkit::RegisterDescriptorFunctions(instance); | ||
| static void LoadInternal(ExtensionLoader &loader) { | ||
| duckdb_rdkit::RegisterTypes(loader); | ||
| duckdb_rdkit::RegisterCasts(loader); | ||
| duckdb_rdkit::RegisterFormatFunctions(loader); | ||
| duckdb_rdkit::RegisterCompareFunctions(loader); | ||
| duckdb_rdkit::RegisterDescriptorFunctions(loader); | ||
|
|
||
| for (auto &fun : SDFFunctions::GetTableFunctions()) { | ||
| ExtensionUtil::RegisterFunction(instance, fun); | ||
| loader.RegisterFunction(fun); | ||
| } | ||
|
|
||
| // SDF replacement scan | ||
| auto &instance = loader.GetDatabaseInstance(); | ||
| auto &config = DBConfig::GetConfig(instance); | ||
| config.replacement_scans.emplace_back(SDFFunctions::ReadSDFReplacement); | ||
| } | ||
|
|
||
| void DuckdbRdkitExtension::Load(DuckDB &db) { LoadInternal(*db.instance); } | ||
| void DuckdbRdkitExtension::Load(ExtensionLoader &loader) { | ||
| LoadInternal(loader); | ||
| } | ||
|
|
||
| std::string DuckdbRdkitExtension::Name() { return "duckdb_rdkit"; } | ||
|
|
||
| } // namespace duckdb | ||
|
|
||
| #ifdef DUCKDB_BUILD_LOADABLE_EXTENSION | ||
| extern "C" { | ||
|
|
||
| DUCKDB_EXTENSION_API void duckdb_rdkit_init(duckdb::DatabaseInstance &db) { | ||
| duckdb::DuckDB db_wrapper(db); | ||
| db_wrapper.LoadExtension<duckdb::DuckdbRdkitExtension>(); | ||
| DUCKDB_CPP_EXTENSION_ENTRY(duckdb_rdkit, loader) { | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I noticed the parquet extension uses this macro I'm not familiar with why to use one or the other. Is there an advantage of one approach over the other? |
||
| duckdb::LoadInternal(loader); | ||
| } | ||
|
|
||
| DUCKDB_EXTENSION_API const char *duckdb_rdkit_version() { | ||
| return duckdb::DuckDB::LibraryVersion(); | ||
| } | ||
| } | ||
| #endif | ||
|
|
||
| #ifndef DUCKDB_EXTENSION_MAIN | ||
| #error DUCKDB_EXTENSION_MAIN not defined | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| #pragma once | ||
| #include "common.hpp" | ||
| namespace duckdb_rdkit { | ||
| void RegisterCompareFunctions(DatabaseInstance &instance); | ||
| void RegisterCompareFunctions(ExtensionLoader &loader); | ||
| } // namespace duckdb_rdkit |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| #pragma once | ||
| #include "common.hpp" | ||
| namespace duckdb_rdkit { | ||
| void RegisterDescriptorFunctions(DatabaseInstance &instance); | ||
| void RegisterDescriptorFunctions(ExtensionLoader &loader); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand what this markdown file is for?