This guide documents the installation process for DPVO on modern NVIDIA GPUs (RTX 40/50 series) with CUDA 12.8 and PyTorch 2.9.1.
- GPU: NVIDIA RTX 5090 (Blackwell architecture, compute capability 12.0)
- CUDA: 12.8
- Python: 3.12
- PyTorch: 2.9.1
- NVIDIA Driver compatible with CUDA 12.8
- Conda (Miniconda or Anaconda)
- C++ compiler (g++)
- CUDA Toolkit 12.8
```bash
conda create -n visual_slam python=3.12
conda activate visual_slam
```

```bash
# PyTorch 2.9.1 with CUDA 12.8
pip install torch torchvision

# Other dependencies
pip install tensorboard numba tqdm einops pypose kornia numpy plyfile evo opencv-python yacs
```

Set the CUDA architecture for your GPU and build:
```bash
# For RTX 5090 (Blackwell, sm_120)
export TORCH_CUDA_ARCH_LIST="12.0"

# For RTX 4090 (Ada Lovelace, sm_89)
# export TORCH_CUDA_ARCH_LIST="8.9"

# For RTX 3090 (Ampere, sm_86)
# export TORCH_CUDA_ARCH_LIST="8.6"

# Download Eigen 3.4.0 (header-only; no separate build step needed)
wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
unzip eigen-3.4.0.zip -d modules

# Build and install
cd methods/dpvo
pip install --no-build-isolation .
```

DPViewer requires the Pangolin library for 3D visualization.
⚠️ CRITICAL (ABI Compatibility): Pangolin MUST be built with `-D_GLIBCXX_USE_CXX11_ABI=1` to match PyTorch 2.9.1. If you have an existing Pangolin installation, you must rebuild it with this flag. Otherwise, you will get `undefined symbol` errors when importing DPViewer.
```bash
# Install dependencies
sudo apt-get install libglew-dev libpython3-dev libeigen3-dev libgl1-mesa-dev \
    libwayland-dev libxkbcommon-dev wayland-protocols libepoxy-dev

# Clone Pangolin (or use existing clone)
cd modules
git clone https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin

# IMPORTANT: Clean any previous build
rm -rf build
mkdir build && cd build

# Build with CXX11 ABI=1 (MUST match PyTorch 2.9.1)
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig
cd ../..
```

Verify the Pangolin ABI (the symbol should use `std::__cxx11::basic_string`):

```bash
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"

# Expected output (ABI=1, correct):
# pangolin::CreateWindowAndBind(std::__cxx11::basic_string<char, ...>, ...)

# Wrong output (ABI=0, needs rebuild):
# pangolin::CreateWindowAndBind(std::string, ...)
```
```bash
# Clean any previous build
cd modules
rm -rf DPViewer/build DPViewer/*.egg-info

# Install
pip install --no-build-isolation ./DPViewer
```

Verify the DPViewer installation:

```bash
python -c "from dpviewerx import Viewer; print('DPViewer OK')"
```

The classical backend uses DBoW2 for closing very large loops.
Step 1. Install the OpenCV C++ API:

```bash
sudo apt-get install -y libopencv-dev
```

Step 2. Build and install DBoW2:

```bash
cd DBoW2
mkdir -p build && cd build
cmake ..
make -j$(nproc)
sudo make install
cd ../..
```

Step 3. Install DPRetrieval:

```bash
pip install --no-build-isolation ./DPRetrieval/
```

The `pip install .` command builds and installs:
- `dpvo` - Main Python package
- `cuda_corr` - CUDA extension for correlation operations (FP16 supported)
- `cuda_ba` - CUDA extension for bundle adjustment
- `lietorch_backends` - CUDA extension for Lie group operations (SE3, SO3, Sim3)
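As a quick sanity check after installation, a small sketch like the following (not part of the repository) reports which of these modules are importable in the current environment without actually importing them:

```python
import importlib.util

def check_extensions(names=("dpvo", "cuda_corr", "cuda_ba", "lietorch_backends")):
    """Return {module_name: importable?} for the packages and CUDA
    extensions built by `pip install .`, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(check_extensions())
```

If any entry is `False` after a successful build, the wrong environment is likely active.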
The CUDA extensions have been modified to support FP16 (Half precision) for Automatic Mixed Precision (AMP) training.
Set `amp: true` in your training config:

```yaml
training:
  amp: true  # Enable Automatic Mixed Precision
```

Run the test script to verify that the FP16 operations work correctly:

```bash
python correlation_test.py
```

Expected output:
```
============================================================
CUDA Correlation Extension - FP16 Support Test
============================================================
GPU: NVIDIA GeForce RTX 5090
CUDA: 12.8
PyTorch: 2.9.1+cu128

[Test 1] corr forward - FP32 ... PASSED
[Test 2] corr forward - FP16 (Half) ... PASSED
[Test 3] corr backward - FP32 ... PASSED
[Test 4] corr backward - FP16 (Half) ... PASSED
[Test 5] patchify forward - FP32 ... PASSED
[Test 6] patchify forward - FP16 (Half) ... PASSED
[Test 7] patchify backward - FP16 (Half) ... PASSED
[Test 8] corr forward with autocast ... PASSED
[Test 9] Numerical consistency (FP32 vs FP16) ... PASSED

Total: 9/9 tests passed
All tests passed! FP16 support is working correctly.
```
CUDA Kernel Changes (`dpvo/altcorr/correlation_kernel.cu`):
- Added `#include <cuda_fp16.h>` for Half type support
- Added `#include <ATen/cuda/Atomic.cuh>` for `gpuAtomicAdd` (type-agnostic atomic operations)
- Changed `atomicAdd` to `gpuAtomicAdd` for FP16 compatibility
- Uses `AT_DISPATCH_FLOATING_TYPES_AND_HALF` for type dispatch
Correlation Layer Changes (`dpvo/altcorr/correlation.py`):
- The `coords` tensor is always converted to float32 (required by the CUDA kernel design)
- The `fmap1` and `fmap2` tensors support FP16 for faster computation
Lietorch Changes (`dpvo/lietorch/group_ops.py`):
- All Lie group operations (Exp, Log, Inv, Mul, Adj, AdjT, etc.) convert inputs to float32
- Outputs are converted back to original dtype after computation
- This ensures numerical stability for quaternion and matrix operations
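The convert-compute-restore pattern described above can be sketched as follows. This is a minimal illustration using NumPy arrays in place of torch tensors, and `normalize_quaternion` is a stand-in for the real Lie group ops, not DPVO code:

```python
import numpy as np

def compute_in_fp32(op):
    """Decorator: upcast the input to float32, run the op, then cast
    the result back to the caller's original dtype."""
    def wrapper(x):
        orig_dtype = x.dtype
        out = op(x.astype(np.float32))  # numerically stable in FP32
        return out.astype(orig_dtype)   # caller keeps its dtype
    return wrapper

@compute_in_fp32
def normalize_quaternion(q):
    # Stand-in for a Lie group op: unit-normalize a quaternion.
    return q / np.linalg.norm(q)

q16 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float16)
q_out = normalize_quaternion(q16)
print(q_out.dtype)  # float16: output dtype matches the input
```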
Training Script Changes (`train.py`):
- Added `torch.amp.GradScaler` for gradient scaling
- Wrapped the forward pass with `torch.amp.autocast('cuda')`
Utility Changes (`utils/utils.py`):
- Added a `@torch.amp.autocast('cuda', enabled=False)` decorator to `kabsch_umeyama` (SVD requires FP32)
FP16 vs FP32 comparison shows:
- Mean relative error: < 1% (typically ~0.6%)
- Max absolute difference: ~0.07
- Results are acceptable for training
- ~30% faster training iterations
- ~40% less GPU memory usage
- Enables larger batch sizes or longer sequences
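The FP32-vs-FP16 consistency check can be reproduced in spirit with a small NumPy experiment. This is illustrative only: the exact figures depend on the operation and data, and will differ from DPVO's correlation kernels:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

ref = a @ b                                                      # FP32 reference
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

rel_err = np.abs(half - ref).mean() / np.abs(ref).mean()         # mean relative error
max_abs = np.abs(half - ref).max()                               # max absolute difference
print(f"mean relative error: {rel_err:.4f}, max abs diff: {max_abs:.4f}")
```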
After installation, verify by running:

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"
```

Expected output:

```
PyTorch: 2.9.1+cu128
CUDA: 12.8
GPU: NVIDIA GeForce RTX 5090
```

```bash
python -c "import dpvo; print('DPVO imported successfully')"
```

```bash
# With visualization
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --viz

# Without visualization (if DPViewer is not installed)
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --plot
```

| GPU Series | Architecture | Compute Capability | NVCC Flag |
|---|---|---|---|
| RTX 5090/5080 | Blackwell | 12.0 | sm_120 |
| RTX 4090/4080 | Ada Lovelace | 8.9 | sm_89 |
| RTX 3090/3080 | Ampere | 8.6 | sm_86 |
| RTX 2080 Ti | Turing | 7.5 | sm_75 |
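The mapping in the table can be expressed as a small helper. This is a sketch, not part of DPVO; it simply reformats the compute capability string that `nvidia-smi` reports:

```python
def arch_flags(compute_cap: str):
    """Map a compute capability string (e.g. "12.0", as printed by
    `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) to
    the TORCH_CUDA_ARCH_LIST value and the corresponding sm flag."""
    major, minor = compute_cap.strip().split(".")
    return f"{major}.{minor}", f"sm_{major}{minor}"

print(arch_flags("12.0"))  # ('12.0', 'sm_120')
print(arch_flags("8.9"))   # ('8.9', 'sm_89')
```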
Note: The RTX 50 series (Blackwell) requires CUDA 12.8+ and a PyTorch build with sm_120 support (such as the cu128 wheels used above).
To check your GPU's compute capability:

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv
```

When building CUDA extensions with `pip install`, pip normally creates an isolated virtual environment for the build process. This causes problems because:
- PyTorch dependency: The CUDA extensions need to link against PyTorch's CUDA libraries during compilation
- Build isolation: In an isolated environment, pip installs only the packages listed in the build requirements (`build-system.requires` in `pyproject.toml`), but the extensions need the exact PyTorch version you installed
- Header files: The extensions include PyTorch header files (`<torch/extension.h>`), which must match your installed version
Without `--no-build-isolation`:

```
ModuleNotFoundError: No module named 'torch'
```

With `--no-build-isolation`:
- pip uses your current conda environment directly
- The build can access your installed PyTorch
- Headers and libraries match correctly
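For context, a typical CUDA extension declares its build requirements along these lines in `pyproject.toml` (an illustrative fragment, not DPVO's actual file). With build isolation enabled, pip installs a fresh copy of whatever is listed here into a temporary environment instead of reusing the PyTorch in your conda environment:

```toml
[build-system]
requires = ["setuptools", "wheel", "torch"]  # installed fresh in the isolated env
build-backend = "setuptools.build_meta"
```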
The C++ ABI (Application Binary Interface) defines how C++ code is compiled at the binary level, including:
- How function names are encoded (name mangling)
- How `std::string` and other STL types are represented in memory
- How exceptions are handled
GCC introduced a new ABI in GCC 5.1 (2015), controlled by the flag `_GLIBCXX_USE_CXX11_ABI`:
- ABI=0 (old): pre-C++11 `std::string` implementation
- ABI=1 (new): C++11-compliant `std::string` with small string optimization
All C++ libraries that share objects (like `std::string`) must use the same ABI. If they don't:

```
undefined symbol: _ZN8pangolin19CreateWindowAndBindENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE...
```

This error shows that Pangolin was built with ABI=0 (uses `std::string`) but DPViewer expects ABI=1 (uses `std::__cxx11::basic_string`).

Check the PyTorch ABI:

```bash
python -c "import torch; print('CXX11_ABI:', torch._C._GLIBCXX_USE_CXX11_ABI)"
```

PyTorch 2.9.1 uses ABI=1 (True), so all linked libraries must also use ABI=1.
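The `nm -DC` inspection can be mimicked with a tiny helper that classifies a demangled symbol by ABI. This is a sketch for intuition; a real check should run `nm` on the actual library:

```python
def cxx11_abi_of(demangled_symbol: str) -> int:
    """Guess the _GLIBCXX_USE_CXX11_ABI setting a library was built with
    from one of its demangled symbols (as printed by `nm -DC`):
    the new ABI tags std::string as std::__cxx11::basic_string."""
    return 1 if "__cxx11" in demangled_symbol else 0

old = "pangolin::CreateWindowAndBind(std::string, ...)"
new = "pangolin::CreateWindowAndBind(std::__cxx11::basic_string<char, ...>, ...)"
print(cxx11_abi_of(old), cxx11_abi_of(new))  # 0 1
```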
Step-by-step fix for a Pangolin ABI mismatch:

```bash
# 1. Check current Pangolin ABI
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# If it shows "std::string" instead of "std::__cxx11::basic_string", a rebuild is needed

# 2. Go to the Pangolin directory and clean
cd ~/Pangolin
rm -rf build

# 3. Rebuild with the correct ABI
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig

# 4. Verify the fix
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# Should now show: std::__cxx11::basic_string<char, ...>

# 5. Rebuild DPViewer
cd ~/park/SOLUTION/DPVO
pip uninstall dpviewer -y
rm -rf DPViewer/build DPViewer/*.egg-info
pip install --no-build-isolation ./DPViewer

# 6. Verify DPViewer
python -c "from dpviewerx import Viewer; print('DPViewer OK')"
```

For DPViewer (already configured in `CMakeLists.txt`):
```cmake
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
```

This error occurs with PyTorch 2.x due to a deprecated API.

Cause: PyTorch 2.x deprecated the `.type()` method, which returned `at::DeprecatedTypeProperties`. The new API uses `.scalar_type()`, which returns `at::ScalarType` directly.

Fix: Change all `.type()` calls to `.scalar_type()`:

```cpp
// Before (PyTorch 1.x)
AT_DISPATCH_FLOATING_TYPES(tensor.type(), "kernel_name", ([&] { ... }));

// After (PyTorch 2.x)
AT_DISPATCH_FLOATING_TYPES(tensor.scalar_type(), "kernel_name", ([&] { ... }));
```

Files modified:
| File | Changes |
|---|---|
| `dpvo/altcorr/correlation_kernel.cu` | 4 fixes |
| `dpvo/lietorch/src/lietorch_gpu.cu` | 19 fixes |
| `dpvo/lietorch/src/lietorch_cpu.cpp` | 19 fixes |
| `dpvo/lietorch/include/dispatch.h` | Updated macro |
Set the correct CUDA architecture for your GPU:

```bash
export TORCH_CUDA_ARCH_LIST="12.0"  # Adjust for your GPU
pip install --no-build-isolation .
```

This means the CUDA code was compiled for a different GPU architecture than your device.

Check your GPU:

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv
```

Check PyTorch's supported architectures:

```bash
python -c "import torch; print(torch.cuda.get_arch_list())"
```

Rebuild with the correct architecture:

```bash
export TORCH_CUDA_ARCH_LIST="12.0"  # Match your GPU
pip uninstall dpvo lietorch cuda_corr cuda_ba -y
pip install --no-build-isolation .
```

This occurs in DPViewer when pybind11 can't find CMake's Python module properly.
Fix applied in `DPViewer/CMakeLists.txt`:

```cmake
# Use find_package(Python ...) not find_package(Python3 ...)
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)
```

CMake may find the system Python instead of the conda Python.

Check which Python CMake finds:

```
-- Found Python: /usr/bin/python3.10 (wrong!)
-- Found Python: /home/user/miniconda3/envs/dpvo/bin/python3.12 (correct!)
```
Fix applied in `DPViewer/setup.py`:

```python
cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]
```

Pangolin libraries are not in the library path.

Fix:

```bash
sudo ldconfig

# Or set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
```

This is a warning, not an error. The code works but uses a deprecated API.
Location: `dpvo/net.py:187`

Fix (optional):

```python
# Before
from torch.cuda.amp import autocast

# After
from torch.amp import autocast
# Use: autocast('cuda', enabled=True)
```

All `.type()` calls were changed to `.scalar_type()` for PyTorch 2.x compatibility.
`DPViewer/CMakeLists.txt`:

```cmake
# Python detection for pybind11
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)

# Match PyTorch's ABI
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
```

`DPViewer/setup.py`:

```python
# Ensure CMake uses the conda Python
cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]
```