Reticulate #1

@fra-pcmgf

Description

I am adding reticulate support to the EMD computation so that it can be called directly from R.

  • Refactored the code into gene_distance_calculate.py
  • Added a progress bar using tqdm
  • Added parameters for setting the number of processes and maxIterations
  • Switched to concurrent.futures.ThreadPoolExecutor. Threads share the data in memory, so it no longer has to be copied or read once per worker, which leaves a large memory footprint on systems with many CPUs. This also seems to fix issues on macOS, where starting other processes did not work well
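
A minimal sketch of the thread-based approach described above, not the actual code in gene_distance_calculate.py. The names `pair_distance`, `compute_distances`, and `n_processes` are hypothetical, and the absolute-difference sum is only a stand-in for the real EMD computation. tqdm is optional here and falls back to a no-op if it is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from tqdm import tqdm  # progress bar, as mentioned in the description
except ImportError:
    # Fallback so the sketch still runs without tqdm installed.
    def tqdm(iterable, **kwargs):
        return iterable


def pair_distance(data, pair):
    """Hypothetical stand-in for the per-pair EMD computation."""
    i, j = pair
    return sum(abs(a - b) for a, b in zip(data[i], data[j]))


def compute_distances(data, pairs, n_processes=4):
    """Compute one distance per pair using threads that share `data`."""
    with ThreadPoolExecutor(max_workers=n_processes) as pool:
        # Every thread reads the same `data` object; nothing is copied or pickled.
        results = pool.map(lambda p: pair_distance(data, p), pairs)
        # Wrapping the result iterator lets tqdm advance as results arrive.
        return list(tqdm(results, total=len(pairs)))


data = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [2.0, 0.0, 2.0]]
pairs = [(0, 1), (0, 2), (1, 2)]
distances = compute_distances(data, pairs, n_processes=2)
```

Note that threads only run in parallel where the underlying computation releases the GIL, as many native routines do, so the speedup depends on how the distance itself is implemented.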

Other ideas

  • Use multiprocessing.shared_memory.SharedMemory. An alternative to ThreadPoolExecutor, but more cumbersome.
  • Optimize the chunksize of pool.imap (did not change performance significantly: 48 min on 8 CPUs without and 46 min with)
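
To illustrate why the shared-memory alternative is more cumbersome, here is a rough sketch using the standard library's `multiprocessing.shared_memory` module. The values and the packing format are made up for illustration, and the second handle is opened in the same process only to keep the example short; in practice a worker process would attach using the block's name.

```python
import struct
from multiprocessing import shared_memory

values = [0.5, 1.5, 2.5]      # illustrative data only
fmt = f"{len(values)}d"       # three C doubles

# The owner creates the block and writes the data into it once.
shm = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
try:
    struct.pack_into(fmt, shm.buf, 0, *values)

    # A worker attaches to the existing block by name instead of copying the data.
    worker_view = shared_memory.SharedMemory(name=shm.name)
    try:
        read_back = list(struct.unpack_from(fmt, worker_view.buf, 0))
    finally:
        worker_view.close()   # every handle must be closed explicitly
finally:
    shm.close()
    shm.unlink()              # the owner must also free the block
```

The explicit sizing, naming, closing, and unlinking are the bookkeeping that the thread-based version avoids.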
