This repository was archived by the owner on May 13, 2025. It is now read-only.

@mpc.run_multiprocess can not work correctly since CUDA initialization error occurs #511

@SlInevitable2003

I installed the requirements as instructed and can run Tutorial 1 successfully, which suggests that the basic dependencies are installed. However, Tutorial 2 does not work for me. When I execute

import crypten
import crypten.mpc as mpc
import crypten.communicator as comm

@mpc.run_multiprocess(world_size=2)
def examine_arithmetic_shares():
    # Encrypt the tensor as arithmetic secret shares across the two parties
    x_enc = crypten.cryptensor([1, 2, 3], ptype=crypten.mpc.arithmetic)

    # Each party prints its own view of the encrypted tensor, in rank order
    rank = comm.get().get_rank()
    crypten.print(f"\nRank {rank}:\n {x_enc}\n", in_order=True)

x = examine_arithmetic_shares()

I get the following error:

Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/CrypTen/tutorials/crypten/mpc/context.py", line 29, in _launch
    crypten.init()
  File "/home/CrypTen/tutorials/crypten/mpc/context.py", line 29, in _launch
    crypten.init()
  File "/home/CrypTen/tutorials/crypten/__init__.py", line 77, in init
    _setup_prng()
  File "/home/CrypTen/tutorials/crypten/__init__.py", line 77, in init
    _setup_prng()
  File "/home/CrypTen/tutorials/crypten/__init__.py", line 202, in _setup_prng
    generators[key][device] = torch.Generator(device=device)
  File "/home/CrypTen/tutorials/crypten/__init__.py", line 202, in _setup_prng
    generators[key][device] = torch.Generator(device=device)
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR:root:One of the parties failed. Check past logs

It seems that the error is caused by initializing CUDA in multiple processes. Please help me with that!
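
One way to test this hypothesis is to hide the GPUs from Python before anything touches CUDA. The CUDA runtime does not support being used in a forked child once the parent has initialized it, so if the parent process (for example, a notebook kernel that already ran Tutorial 1) has touched CUDA, the workers launched by @mpc.run_multiprocess may fail exactly as in the traceback above. The sketch below is a minimal, CPU-only workaround under assumptions: hide_cuda_devices is a hypothetical helper name, not part of CrypTen, and it assumes that CrypTen only creates CUDA generators when torch reports that CUDA is available. It must run in a fresh interpreter (restart the notebook kernel first), before the first import of torch or crypten.

```python
import os


def hide_cuda_devices() -> None:
    """Hypothetical helper: hide every GPU from this process.

    Worker processes started later inherit this environment, so they
    see no CUDA devices either. This only has an effect if it runs
    before CUDA is first initialized in the process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_cuda_devices()

try:
    # Import torch only after the environment variable is set.
    import torch

    print("CUDA visible to torch:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If the tutorial cell runs after this, the error was most likely caused by CUDA state carried over from the parent process. This disables GPU use for the whole session, so it is a diagnostic step rather than a fix for GPU workloads.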
