Skip to content

Latest commit

 

History

History
532 lines (380 loc) · 13.9 KB

File metadata and controls

532 lines (380 loc) · 13.9 KB

DPVO Installation Guide

This guide documents the installation process for DPVO on modern NVIDIA GPUs (RTX 40/50 series) with CUDA 12.8 and PyTorch 2.8.

Environment

  • GPU: NVIDIA RTX 5090 (Blackwell architecture, compute capability 12.0)
  • CUDA: 12.8
  • Python: 3.12
  • PyTorch: 2.9.1

Prerequisites

  • NVIDIA Driver compatible with CUDA 12.8
  • Conda (Miniconda or Anaconda)
  • C++ compiler (g++)
  • CUDA Toolkit 12.8

Installation Steps

1. Create Conda Environment

conda create -n visual_slam python=3.12
conda activate visual_slam

2. Install Python Packages

# Pytorch 2.9.1 with CUDA 12.8
pip install torch torchvision


# Other dependencies
pip install tensorboard numba tqdm einops pypose kornia numpy plyfile evo opencv-python yacs

3. Build CUDA Extensions

Set the CUDA architecture for your GPU and build:

# For RTX 5090 (Blackwell, sm_120)
export TORCH_CUDA_ARCH_LIST="12.0"

# For RTX 4090 (Ada Lovelace, sm_89)
# export TORCH_CUDA_ARCH_LIST="8.9"

# For RTX 3090 (Ampere, sm_86)
# export TORCH_CUDA_ARCH_LIST="8.6"

# Install and BUild eigen3
wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
unzip eigen-3.4.0.zip -d modules

# Build and install
cd methods/dpvo
pip install --no-build-isolation .

4. Install Pangolin (Required for DPViewer)

DPViewer requires the Pangolin library for 3D visualization.

⚠️ CRITICAL: ABI Compatibility

Pangolin MUST be built with -D_GLIBCXX_USE_CXX11_ABI=1 to match PyTorch 2.9.1. If you have an existing Pangolin installation, you must rebuild it with this flag. Otherwise, you will get undefined symbol errors when importing DPViewer.

# Install dependencies
sudo apt-get install libglew-dev libpython3-dev libeigen3-dev libgl1-mesa-dev \
    libwayland-dev libxkbcommon-dev wayland-protocols libepoxy-dev

# Clone Pangolin (or use existing clone)
cd modules
git clone https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin

# IMPORTANT: Clean any previous build
rm -rf build
mkdir build && cd build

# Build with CXX11 ABI=1 (MUST match PyTorch 2.9.1)
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig
cd ../..

Verify Pangolin ABI (should show std::__cxx11::basic_string):

nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# Expected output (ABI=1, correct):
#   pangolin::CreateWindowAndBind(std::__cxx11::basic_string<char, ...>, ...)
# Wrong output (ABI=0, needs rebuild):
#   pangolin::CreateWindowAndBind(std::string, ...)

5. Install DPViewer (Optional - for visualization)

# Clean any previous build
cd modules
rm -rf DPViewer/build DPViewer/*.egg-info

# Install
pip install --no-build-isolation ./DPViewer

Verify DPViewer installation:

python -c "from dpviewerx import Viewer; print('DPViewer OK')"

6. Install Classical Backend (Optional - for large loop closure)

The classical backend uses DBoW2 for closing very large loops.

Step 1. Install OpenCV C++ API:

sudo apt-get install -y libopencv-dev

Step 2. Build and Install DBoW2:

cd DBoW2
mkdir -p build && cd build
cmake ..
make -j$(nproc)
sudo make install
cd ../..

Step 3. Install DPRetrieval:

pip install --no-build-isolation ./DPRetrieval/

What Gets Installed

The pip install . command builds and installs:

  1. dpvo - Main Python package
  2. cuda_corr - CUDA extension for correlation operations (FP16 supported)
  3. cuda_ba - CUDA extension for bundle adjustment
  4. lietorch_backends - CUDA extension for Lie group operations (SE3, SO3, Sim3)

FP16/AMP Support

The CUDA extensions have been modified to support FP16 (Half precision) for Automatic Mixed Precision (AMP) training.

Enabling AMP

Set amp: true in your training config:

training:
  amp: true  # Enable Automatic Mixed Precision

Verifying FP16 Support

Run the test script to verify FP16 operations work correctly:

python correlation_test.py

Expected output:

============================================================
CUDA Correlation Extension - FP16 Support Test
============================================================
GPU: NVIDIA GeForce RTX 5090
CUDA: 12.8
PyTorch: 2.9.1+cu128

[Test 1] corr forward - FP32 ... PASSED
[Test 2] corr forward - FP16 (Half) ... PASSED
[Test 3] corr backward - FP32 ... PASSED
[Test 4] corr backward - FP16 (Half) ... PASSED
[Test 5] patchify forward - FP32 ... PASSED
[Test 6] patchify forward - FP16 (Half) ... PASSED
[Test 7] patchify backward - FP16 (Half) ... PASSED
[Test 8] corr forward with autocast ... PASSED
[Test 9] Numerical consistency (FP32 vs FP16) ... PASSED

Total: 9/9 tests passed
All tests passed! FP16 support is working correctly.

FP16 Implementation Details

CUDA Kernel Changes (dpvo/altcorr/correlation_kernel.cu):

  • Added #include <cuda_fp16.h> for Half type support
  • Added #include <ATen/cuda/Atomic.cuh> for gpuAtomicAdd (type-agnostic atomic operations)
  • Changed atomicAdd to gpuAtomicAdd for FP16 compatibility
  • Uses AT_DISPATCH_FLOATING_TYPES_AND_HALF for type dispatch

Correlation Layer Changes (dpvo/altcorr/correlation.py):

  • coords tensor is always converted to float32 (required by CUDA kernel design)
  • fmap1, fmap2 tensors support FP16 for faster computation

Lietorch Changes (dpvo/lietorch/group_ops.py):

  • All Lie group operations (Exp, Log, Inv, Mul, Adj, AdjT, etc.) convert inputs to float32
  • Outputs are converted back to original dtype after computation
  • This ensures numerical stability for quaternion and matrix operations

Training Script Changes (train.py):

  • Added torch.amp.GradScaler for gradient scaling
  • Wrapped forward pass with torch.amp.autocast('cuda')

Utility Changes (utils/utils.py):

  • Added @torch.amp.autocast('cuda', enabled=False) decorator to kabsch_umeyama (SVD requires FP32)

Numerical Accuracy

FP16 vs FP32 comparison shows:

  • Mean relative error: < 1% (typically ~0.6%)
  • Max absolute difference: ~0.07
  • Results are acceptable for training

Benefits of AMP Training

  • ~30% faster training iteration
  • ~40% less GPU memory usage
  • Enable larger batch sizes or longer sequences

Verification

After installation, verify by running:

python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"
=> 
PyTorch: 2.9.1+cu128
CUDA: 12.8
GPU: NVIDIA GeForce RTX 5090

python -c "import dpvo; print('DPVO imported successfully')"

Running Demo

# With visualization
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --viz

# Without visualization (if DPViewer not installed)
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --plot

Troubleshooting

CUDA Architecture Reference

GPU Series Architecture Compute Capability NVCC Flag
RTX 5090/5080 Blackwell 12.0 sm_120
RTX 4090/4080 Ada Lovelace 8.9 sm_89
RTX 3090/3080 Ampere 8.6 sm_86
RTX 2080 Ti Turing 7.5 sm_75

Note: RTX 50 series (Blackwell) requires CUDA 12.8+ and PyTorch nightly builds with sm_120 support.

To check your GPU's compute capability:

nvidia-smi --query-gpu=name,compute_cap --format=csv

Why --no-build-isolation is Required

When building CUDA extensions with pip install, pip normally creates an isolated virtual environment for the build process. This causes problems because:

  1. PyTorch dependency: The CUDA extensions need to link against PyTorch's CUDA libraries during compilation
  2. Build isolation: In an isolated environment, pip installs only packages listed in build-requires, but the extensions need the exact PyTorch version you installed
  3. Header files: The extensions include PyTorch header files (<torch/extension.h>) which must match your installed version

Without --no-build-isolation:

ModuleNotFoundError: No module named 'torch'

With --no-build-isolation:

  • pip uses your current conda environment directly
  • The build can access your installed PyTorch
  • Headers and libraries match correctly

Understanding C++ ABI Compatibility

What is the C++ ABI?

The C++ ABI (Application Binary Interface) defines how C++ code is compiled at the binary level, including:

  • How function names are encoded (name mangling)
  • How std::string and other STL types are represented in memory
  • How exceptions are handled

GCC introduced a new ABI in GCC 5.1 (2015) with the flag _GLIBCXX_USE_CXX11_ABI:

  • ABI=0 (old): Pre-C++11 std::string implementation
  • ABI=1 (new): C++11 compliant std::string with small string optimization

Why Does This Matter?

All C++ libraries that share objects (like std::string) must use the same ABI. If they don't:

undefined symbol: _ZN8pangolin19CreateWindowAndBindENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE...

This error shows Pangolin was built with ABI=0 (uses std::string) but DPViewer expects ABI=1 (uses std::__cxx11::basic_string).

Check PyTorch's ABI

python -c "import torch; print('CXX11_ABI:', torch._C._GLIBCXX_USE_CXX11_ABI)"

PyTorch 2.9.1 uses ABI=1 (True), so all linked libraries must also use ABI=1.

Fixing ABI Mismatches

Step-by-step fix for Pangolin ABI mismatch:

# 1. Check current Pangolin ABI
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# If it shows "std::string" instead of "std::__cxx11::basic_string", rebuild is needed

# 2. Go to Pangolin directory and clean
cd ~/Pangolin
rm -rf build

# 3. Rebuild with correct ABI
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig

# 4. Verify the fix
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# Should now show: std::__cxx11::basic_string<char, ...>

# 5. Rebuild DPViewer
cd ~/park/SOLUTION/DPVO
pip uninstall dpviewer -y
rm -rf DPViewer/build DPViewer/*.egg-info
pip install --no-build-isolation ./DPViewer

# 6. Verify DPViewer
python -c "from dpviewerx import Viewer; print('DPViewer OK')"

For DPViewer (already configured in CMakeLists.txt):

add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)

Error: cannot convert 'at::DeprecatedTypeProperties' to 'c10::ScalarType'

This error occurs with PyTorch 2.x due to deprecated API.

Cause: PyTorch 2.x deprecated the .type() method which returned at::DeprecatedTypeProperties. The new API uses .scalar_type() which returns at::ScalarType directly.

Fix: Change all .type() calls to .scalar_type():

// Before (PyTorch 1.x)
AT_DISPATCH_FLOATING_TYPES(tensor.type(), "kernel_name", ([&] { ... }));

// After (PyTorch 2.x)
AT_DISPATCH_FLOATING_TYPES(tensor.scalar_type(), "kernel_name", ([&] { ... }));

Files modified:

File Changes
dpvo/altcorr/correlation_kernel.cu 4 fixes
dpvo/lietorch/src/lietorch_gpu.cu 19 fixes
dpvo/lietorch/src/lietorch_cpu.cpp 19 fixes
dpvo/lietorch/include/dispatch.h Updated macro

Error: nvcc fatal: Unsupported gpu architecture 'compute_XX'

Set the correct CUDA architecture for your GPU:

export TORCH_CUDA_ARCH_LIST="12.0"  # Adjust for your GPU
pip install --no-build-isolation .

Error: CUDA error: no kernel image is available for execution on the device

This means the CUDA code was compiled for a different GPU architecture than your device.

Check your GPU:

nvidia-smi --query-gpu=name,compute_cap --format=csv

Check PyTorch supported architectures:

python -c "import torch; print(torch.cuda.get_arch_list())"

Rebuild with correct architecture:

export TORCH_CUDA_ARCH_LIST="12.0"  # Match your GPU
pip uninstall dpvo lietorch cuda_corr cuda_ba -y
pip install --no-build-isolation .

Error: Unknown CMake command "python_add_library"

This occurs in DPViewer when pybind11 can't find CMake's Python module properly.

Fix applied in DPViewer/CMakeLists.txt:

# Use find_package(Python ...) not find_package(Python3 ...)
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)

Error: CMake finds wrong Python version

CMake may find system Python instead of conda Python.

Check which Python CMake finds:

-- Found Python: /usr/bin/python3.10 (wrong!)
-- Found Python: /home/user/miniconda3/envs/dpvo/bin/python3.12 (correct!)

Fix applied in DPViewer/setup.py:

cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]

Error: libpango_windowing.so: cannot open shared object file

Pangolin libraries are not in the library path.

Fix:

sudo ldconfig
# Or set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

FutureWarning: torch.cuda.amp.autocast(args...) is deprecated

This is a warning, not an error. The code works but uses deprecated API.

Location: dpvo/net.py:187

Fix (optional):

# Before
from torch.cuda.amp import autocast

# After
from torch.amp import autocast
# Use: autocast('cuda', enabled=True)

Code Modifications Summary

PyTorch 2.x Compatibility

All .type() calls changed to .scalar_type() for PyTorch 2.x compatibility.

DPViewer CMake Fixes

DPViewer/CMakeLists.txt:

# Python detection for pybind11
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)

# Match PyTorch's ABI
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)

DPViewer/setup.py:

# Ensure CMake uses conda Python
cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]