Build content for a 3-hour conference deployment tutorial #636

@jacobtomlinson

A goal for 2026 is to build enough tutorial material on deployment to fill a full 3-hour tutorial slot at a conference such as GTC, PyCon, or SciPy.

This content should also be built as orthogonal, self-contained chapters that can be extracted and reused in other material, so that we can extend our reach beyond the tutorial itself.

Topics

The material should broadly cover the following topics:

  • The software stack from the driver, through CUDA, to Python
  • Common tools and package managers for installing GPU Python code (pip, uv, conda, pixi)
  • Verifying software environments
  • Troubleshooting common install problems
  • Multi-node deployments (Spark, Dask, Ray)
  • Monitoring
    • Local monitoring with nvidia-smi and nvtop
    • Broad monitoring with Prometheus and DCGM
  • Debugging
    • Attaching debuggers or running traces in managed cloud environments
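
As a possible starting point for the "Verifying software environments" topic, here is a minimal sketch of an environment check. It only tests whether command-line tools are on `PATH` and whether Python modules can be found, and optionally asks `nvidia-smi` for the GPU name and driver version. The function names `check_environment` and `query_driver` are hypothetical, and the default module names (`cupy`, `numba`) are illustrative assumptions rather than a list taken from this issue.

```python
import importlib.util
import shutil
import subprocess


def check_environment(modules=("cupy", "numba"), tools=("nvidia-smi",)):
    """Return a dict mapping each check to True (found) or False (missing).

    The default module and tool names are illustrative assumptions only.
    Pass top-level module names; nothing is imported, only located.
    """
    report = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{tool}"] = shutil.which(tool) is not None
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        report[f"module:{name}"] = found
    return report


def query_driver():
    """Return one 'name, driver_version' string per GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
    for gpu in query_driver():
        print(f"gpu: {gpu}")
```

A check like this only confirms that the pieces are present. It does not confirm that the driver, CUDA, and package versions are compatible with each other, which is the harder part of the troubleshooting topic above.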

Prior work

Much of this material already exists but has not been put together in a cohesive way. The following resources will be useful:
