This guide documents the installation process for DPVO on modern NVIDIA GPUs (RTX 40/50 series) with CUDA 12.8 and PyTorch 2.9.1.
- GPU: NVIDIA RTX 5090 (Blackwell architecture, compute capability 12.0)
- CUDA: 12.8
- Python: 3.12
- PyTorch: 2.9.1
- NVIDIA Driver compatible with CUDA 12.8
- Conda (Miniconda or Anaconda)
- C++ compiler (g++)
- CUDA Toolkit 12.8
```bash
conda create -n visual_slam python=3.12
conda activate visual_slam
```

```bash
# PyTorch 2.9.1 with CUDA 12.8
pip install torch torchvision

# Other dependencies
pip install tensorboard numba tqdm einops pypose kornia numpy plyfile evo opencv-python yacs
```

Set the CUDA architecture for your GPU and build:
```bash
# For RTX 5090 (Blackwell, sm_120)
export TORCH_CUDA_ARCH_LIST="12.0"

# For RTX 4090 (Ada Lovelace, sm_89)
# export TORCH_CUDA_ARCH_LIST="8.9"

# For RTX 3090 (Ampere, sm_86)
# export TORCH_CUDA_ARCH_LIST="8.6"

# Download Eigen 3.4.0 (header-only; no separate build step needed)
wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
unzip eigen-3.4.0.zip -d modules

# Build and install
cd methods/dpvo
pip install --no-build-isolation .
```

DPViewer requires the Pangolin library for 3D visualization.
⚠️ CRITICAL (ABI Compatibility): Pangolin MUST be built with `-D_GLIBCXX_USE_CXX11_ABI=1` to match PyTorch 2.9.1. If you have an existing Pangolin installation, you must rebuild it with this flag. Otherwise, you will get `undefined symbol` errors when importing DPViewer.
```bash
# Install dependencies
sudo apt-get install libglew-dev libpython3-dev libeigen3-dev libgl1-mesa-dev \
    libwayland-dev libxkbcommon-dev wayland-protocols libepoxy-dev

# Clone Pangolin (or use existing clone)
cd modules
git clone https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin

# IMPORTANT: Clean any previous build
rm -rf build
mkdir build && cd build

# Build with CXX11 ABI=1 (MUST match PyTorch 2.9.1)
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig
cd ../..
```

Verify the Pangolin ABI (the symbol should use `std::__cxx11::basic_string`):

```bash
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"

# Expected output (ABI=1, correct):
# pangolin::CreateWindowAndBind(std::__cxx11::basic_string<char, ...>, ...)

# Wrong output (ABI=0, needs rebuild):
# pangolin::CreateWindowAndBind(std::string, ...)
```
```bash
# Clean any previous build
cd modules
rm -rf DPViewer/build DPViewer/*.egg-info

# Install
pip install --no-build-isolation ./DPViewer
```

Verify the DPViewer installation:

```bash
python -c "from dpviewerx import Viewer; print('DPViewer OK')"
```

The classical backend uses DBoW2 for closing very large loops.
Step 1. Install the OpenCV C++ API:

```bash
sudo apt-get install -y libopencv-dev
```

Step 2. Build and install DBoW2:

```bash
cd DBoW2
mkdir -p build && cd build
cmake ..
make -j$(nproc)
sudo make install
cd ../..
```

Step 3. Install DPRetrieval:

```bash
pip install --no-build-isolation ./DPRetrieval/
```

The `pip install .` command builds and installs:
- `dpvo` - Main Python package
- `cuda_corr` - CUDA extension for correlation operations (FP16 supported)
- `cuda_ba` - CUDA extension for bundle adjustment
- `lietorch_backends` - CUDA extension for Lie group operations (SE3, SO3, Sim3)
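As a quick sanity check after installation, a small sketch like the following (not part of the repository) reports which of these modules are importable in the current environment without actually importing them:

```python
import importlib.util

def check_extensions(names=("dpvo", "cuda_corr", "cuda_ba", "lietorch_backends")):
    """Return {module_name: importable?} for the packages and CUDA
    extensions built by `pip install .`, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(check_extensions())
```

If any entry is `False` after a successful build, the wrong environment is likely active.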
The CUDA extensions have been modified to support FP16 (Half precision) for Automatic Mixed Precision (AMP) training.
Set `amp: true` in your training config:

```yaml
training:
  amp: true  # Enable Automatic Mixed Precision
```

Run the test script to verify that the FP16 operations work correctly:

```bash
python correlation_test.py
```

Expected output:
```
============================================================
CUDA Correlation Extension - FP16 Support Test
============================================================
GPU: NVIDIA GeForce RTX 5090
CUDA: 12.8
PyTorch: 2.9.1+cu128

[Test 1] corr forward - FP32 ... PASSED
[Test 2] corr forward - FP16 (Half) ... PASSED
[Test 3] corr backward - FP32 ... PASSED
[Test 4] corr backward - FP16 (Half) ... PASSED
[Test 5] patchify forward - FP32 ... PASSED
[Test 6] patchify forward - FP16 (Half) ... PASSED
[Test 7] patchify backward - FP16 (Half) ... PASSED
[Test 8] corr forward with autocast ... PASSED
[Test 9] Numerical consistency (FP32 vs FP16) ... PASSED

Total: 9/9 tests passed
All tests passed! FP16 support is working correctly.
```
CUDA Kernel Changes (`dpvo/altcorr/correlation_kernel.cu`):
- Added `#include <cuda_fp16.h>` for Half type support
- Added `#include <ATen/cuda/Atomic.cuh>` for `gpuAtomicAdd` (type-agnostic atomic operations)
- Changed `atomicAdd` to `gpuAtomicAdd` for FP16 compatibility
- Uses `AT_DISPATCH_FLOATING_TYPES_AND_HALF` for type dispatch
Correlation Layer Changes (`dpvo/altcorr/correlation.py`):
- The `coords` tensor is always converted to float32 (required by the CUDA kernel design)
- The `fmap1` and `fmap2` tensors support FP16 for faster computation
Lietorch Changes (`dpvo/lietorch/group_ops.py`):
- All Lie group operations (Exp, Log, Inv, Mul, Adj, AdjT, etc.) convert inputs to float32
- Outputs are converted back to original dtype after computation
- This ensures numerical stability for quaternion and matrix operations
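The convert-compute-restore pattern described above can be sketched as follows. This is a minimal illustration using NumPy arrays in place of torch tensors, and `normalize_quaternion` is a stand-in for the real Lie group ops, not DPVO code:

```python
import numpy as np

def compute_in_fp32(op):
    """Decorator: upcast the input to float32, run the op, then cast
    the result back to the caller's original dtype."""
    def wrapper(x):
        orig_dtype = x.dtype
        out = op(x.astype(np.float32))  # numerically stable in FP32
        return out.astype(orig_dtype)   # caller keeps its dtype
    return wrapper

@compute_in_fp32
def normalize_quaternion(q):
    # Stand-in for a Lie group op: unit-normalize a quaternion.
    return q / np.linalg.norm(q)

q16 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float16)
q_out = normalize_quaternion(q16)
print(q_out.dtype)  # float16: output dtype matches the input
```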
Training Script Changes (`train.py`):
- Added `torch.amp.GradScaler` for gradient scaling
- Wrapped the forward pass with `torch.amp.autocast('cuda')`
Utility Changes (`utils/utils.py`):
- Added a `@torch.amp.autocast('cuda', enabled=False)` decorator to `kabsch_umeyama` (SVD requires FP32)
FP16 vs FP32 comparison shows:
- Mean relative error: < 1% (typically ~0.6%)
- Max absolute difference: ~0.07
- Results are acceptable for training
- ~30% faster training iterations
- ~40% less GPU memory usage
- Enables larger batch sizes or longer sequences
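The FP32-vs-FP16 consistency check can be reproduced in spirit with a small NumPy experiment. This is illustrative only: the exact figures depend on the operation and data, and will differ from DPVO's correlation kernels:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

ref = a @ b                                                      # FP32 reference
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

rel_err = np.abs(half - ref).mean() / np.abs(ref).mean()         # mean relative error
max_abs = np.abs(half - ref).max()                               # max absolute difference
print(f"mean relative error: {rel_err:.4f}, max abs diff: {max_abs:.4f}")
```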
After installation, verify by running:

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"
```

Expected output:

```
PyTorch: 2.9.1+cu128
CUDA: 12.8
GPU: NVIDIA GeForce RTX 5090
```

```bash
python -c "import dpvo; print('DPVO imported successfully')"
```

```bash
# With visualization
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --viz

# Without visualization (if DPViewer is not installed)
python demo.py --imagedir=<path_to_images> --calib=<path_to_calibration> --stride=1 --plot
```

| GPU Series | Architecture | Compute Capability | NVCC Flag |
|---|---|---|---|
| RTX 5090/5080 | Blackwell | 12.0 | sm_120 |
| RTX 4090/4080 | Ada Lovelace | 8.9 | sm_89 |
| RTX 3090/3080 | Ampere | 8.6 | sm_86 |
| RTX 2080 Ti | Turing | 7.5 | sm_75 |
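The mapping in the table can be expressed as a small helper. This is a sketch, not part of DPVO; it simply reformats the compute capability string that `nvidia-smi` reports:

```python
def arch_flags(compute_cap: str):
    """Map a compute capability string (e.g. "12.0", as printed by
    `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) to
    the TORCH_CUDA_ARCH_LIST value and the corresponding sm flag."""
    major, minor = compute_cap.strip().split(".")
    return f"{major}.{minor}", f"sm_{major}{minor}"

print(arch_flags("12.0"))  # ('12.0', 'sm_120')
print(arch_flags("8.9"))   # ('8.9', 'sm_89')
```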
Note: The RTX 50 series (Blackwell) requires CUDA 12.8+ and a PyTorch build with sm_120 support (such as the cu128 wheels used above).
To check your GPU's compute capability:

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv
```

When building CUDA extensions with `pip install`, pip normally creates an isolated virtual environment for the build process. This causes problems because:
- PyTorch dependency: The CUDA extensions need to link against PyTorch's CUDA libraries during compilation
- Build isolation: In an isolated environment, pip installs only the packages listed in the build requirements (`build-system.requires` in `pyproject.toml`), but the extensions need the exact PyTorch version you installed
- Header files: The extensions include PyTorch header files (`<torch/extension.h>`), which must match your installed version
Without `--no-build-isolation`:

```
ModuleNotFoundError: No module named 'torch'
```

With `--no-build-isolation`:
- pip uses your current conda environment directly
- The build can access your installed PyTorch
- Headers and libraries match correctly
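For context, a typical CUDA extension declares its build requirements along these lines in `pyproject.toml` (an illustrative fragment, not DPVO's actual file). With build isolation enabled, pip installs a fresh copy of whatever is listed here into a temporary environment instead of reusing the PyTorch in your conda environment:

```toml
[build-system]
requires = ["setuptools", "wheel", "torch"]  # installed fresh in the isolated env
build-backend = "setuptools.build_meta"
```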
The C++ ABI (Application Binary Interface) defines how C++ code is compiled at the binary level, including:
- How function names are encoded (name mangling)
- How `std::string` and other STL types are represented in memory
- How exceptions are handled
GCC introduced a new ABI in GCC 5.1 (2015), controlled by the flag `_GLIBCXX_USE_CXX11_ABI`:
- ABI=0 (old): pre-C++11 `std::string` implementation
- ABI=1 (new): C++11-compliant `std::string` with small string optimization
All C++ libraries that share objects (like `std::string`) must use the same ABI. If they don't:

```
undefined symbol: _ZN8pangolin19CreateWindowAndBindENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE...
```

This error shows that Pangolin was built with ABI=0 (uses `std::string`) but DPViewer expects ABI=1 (uses `std::__cxx11::basic_string`).

Check the PyTorch ABI:

```bash
python -c "import torch; print('CXX11_ABI:', torch._C._GLIBCXX_USE_CXX11_ABI)"
```

PyTorch 2.9.1 uses ABI=1 (True), so all linked libraries must also use ABI=1.
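The `nm -DC` inspection can be mimicked with a tiny helper that classifies a demangled symbol by ABI. This is a sketch for intuition; a real check should run `nm` on the actual library:

```python
def cxx11_abi_of(demangled_symbol: str) -> int:
    """Guess the _GLIBCXX_USE_CXX11_ABI setting a library was built with
    from one of its demangled symbols (as printed by `nm -DC`):
    the new ABI tags std::string as std::__cxx11::basic_string."""
    return 1 if "__cxx11" in demangled_symbol else 0

old = "pangolin::CreateWindowAndBind(std::string, ...)"
new = "pangolin::CreateWindowAndBind(std::__cxx11::basic_string<char, ...>, ...)"
print(cxx11_abi_of(old), cxx11_abi_of(new))  # 0 1
```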
Step-by-step fix for a Pangolin ABI mismatch:

```bash
# 1. Check current Pangolin ABI
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# If it shows "std::string" instead of "std::__cxx11::basic_string", a rebuild is needed

# 2. Go to the Pangolin directory and clean
cd ~/Pangolin
rm -rf build

# 3. Rebuild with the correct ABI
mkdir build && cd build
cmake .. -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1"
make -j$(nproc)
sudo make install
sudo ldconfig

# 4. Verify the fix
nm -DC /usr/local/lib/libpango_display.so | grep "CreateWindowAndBind"
# Should now show: std::__cxx11::basic_string<char, ...>

# 5. Rebuild DPViewer
cd ~/park/SOLUTION/DPVO
pip uninstall dpviewer -y
rm -rf DPViewer/build DPViewer/*.egg-info
pip install --no-build-isolation ./DPViewer

# 6. Verify DPViewer
python -c "from dpviewerx import Viewer; print('DPViewer OK')"
```

For DPViewer (already configured in `CMakeLists.txt`):
```cmake
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
```

This error occurs with PyTorch 2.x due to a deprecated API.

Cause: PyTorch 2.x deprecated the `.type()` method, which returned `at::DeprecatedTypeProperties`. The new API uses `.scalar_type()`, which returns `at::ScalarType` directly.

Fix: Change all `.type()` calls to `.scalar_type()`:

```cpp
// Before (PyTorch 1.x)
AT_DISPATCH_FLOATING_TYPES(tensor.type(), "kernel_name", ([&] { ... }));

// After (PyTorch 2.x)
AT_DISPATCH_FLOATING_TYPES(tensor.scalar_type(), "kernel_name", ([&] { ... }));
```

Files modified:
| File | Changes |
|---|---|
| `dpvo/altcorr/correlation_kernel.cu` | 4 fixes |
| `dpvo/lietorch/src/lietorch_gpu.cu` | 19 fixes |
| `dpvo/lietorch/src/lietorch_cpu.cpp` | 19 fixes |
| `dpvo/lietorch/include/dispatch.h` | Updated macro |
Set the correct CUDA architecture for your GPU:

```bash
export TORCH_CUDA_ARCH_LIST="12.0"  # Adjust for your GPU
pip install --no-build-isolation .
```

This means the CUDA code was compiled for a different GPU architecture than your device.

Check your GPU:

```bash
nvidia-smi --query-gpu=name,compute_cap --format=csv
```

Check PyTorch's supported architectures:

```bash
python -c "import torch; print(torch.cuda.get_arch_list())"
```

Rebuild with the correct architecture:

```bash
export TORCH_CUDA_ARCH_LIST="12.0"  # Match your GPU
pip uninstall dpvo lietorch cuda_corr cuda_ba -y
pip install --no-build-isolation .
```

This occurs in DPViewer when pybind11 can't find CMake's Python module properly.
Fix applied in `DPViewer/CMakeLists.txt`:

```cmake
# Use find_package(Python ...) not find_package(Python3 ...)
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)
```

CMake may find the system Python instead of the conda Python.

Check which Python CMake finds:

```
-- Found Python: /usr/bin/python3.10 (wrong!)
-- Found Python: /home/user/miniconda3/envs/dpvo/bin/python3.12 (correct!)
```
Fix applied in `DPViewer/setup.py`:

```python
cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]
```

Pangolin libraries are not in the library path.

Fix:

```bash
sudo ldconfig

# Or set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
```

This is a warning, not an error. The code works but uses a deprecated API.
Location: `dpvo/net.py:187`

Fix (optional):

```python
# Before
from torch.cuda.amp import autocast

# After
from torch.amp import autocast
# Use: autocast('cuda', enabled=True)
```

All `.type()` calls were changed to `.scalar_type()` for PyTorch 2.x compatibility.
`DPViewer/CMakeLists.txt`:

```cmake
# Python detection for pybind11
find_package(Python 3.12 REQUIRED COMPONENTS Interpreter Development)

# Match PyTorch's ABI
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
```

`DPViewer/setup.py`:

```python
# Ensure CMake uses the conda Python
cmake_args = [
    "-DPython_EXECUTABLE={}".format(sys.executable),
    "-DPython3_EXECUTABLE={}".format(sys.executable),
    # ...
]
```