This repository contains the GPU-accelerated Julia code and the data needed to reproduce the figures in the publications
Quenches across the self-organization transition in multimode cavities
T. Keller, V. Torggler, S. Jäger, S. Schütz, H. Ritsch, and G. Morigi
New J. Phys. 20 025004 (2018). doi: 10.1088/1367-2630/aaa161
as well as
Phases of cold atoms interacting via photon-mediated long-range forces
T. Keller, S. Jäger, and G. Morigi
J. Stat. Mech. 064002 (2017). doi: 10.1088/1742-5468/aa71d7
We consider a system of
In a semi-classical regime the system is then described by the following set of
where
respectively. The Wiener processes
Because of the long-range interactions, the system relaxes to a thermal steady state in distinct stages spread over several orders of magnitude in time. This appears to pose a problem even for advanced numerical solvers such as the excellent DifferentialEquations.jl package.
We therefore use a stochastic Heun method with a fixed time step, according to
with drift vector
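A stochastic Heun step can be sketched as follows. This is a generic Python/NumPy illustration, not the repository's Julia code, and it does not reproduce the equations of motion above. It assumes SDEs of the hypothetical form dX = f(X) dt + g dW with additive noise of amplitude `g`. Under that assumption, the same Wiener increment ΔW ~ N(0, Δt) is reused in the Euler-Maruyama predictor and in the trapezoidal corrector. The names `heun_step`, `integrate`, `drift` and `noise_amp` are hypothetical.

```python
import numpy as np

def heun_step(x, drift, noise_amp, dt, rng):
    """One stochastic Heun step for dX = drift(X) dt + noise_amp dW.

    Assumes additive (state-independent) noise, so a single Wiener
    increment dW ~ N(0, dt) serves both predictor and corrector.
    """
    dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    f0 = drift(x)
    x_pred = x + f0 * dt + noise_amp * dW             # Euler-Maruyama predictor
    f1 = drift(x_pred)
    return x + 0.5 * (f0 + f1) * dt + noise_amp * dW  # trapezoidal corrector

def integrate(x0, drift, noise_amp, dt, n_steps, seed=0):
    """Advance x0 by n_steps fixed-size Heun steps and return the final state."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)  # copy so the caller's array is not modified
    for _ in range(n_steps):
        x = heun_step(x, drift, noise_amp, dt, rng)
    return x
```

With `noise_amp = 0`, the scheme reduces to the deterministic second-order Heun method. This gives a quick sanity check: integrating dx/dt = -x from x = 1 with dt = 0.01 for 100 steps agrees with exp(-1) to within 1e-5.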
The observables of interest are calculated only at predefined, logarithmically spaced times to account for the very different time scales of the system.
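Recording observables only at logarithmically spaced times can be sketched as follows. This is again a hypothetical Python illustration, not the repository's implementation. Step indices between 1 and `n_steps` are spaced evenly in log10 and rounded to integers. Duplicates at early times are dropped, so the number of outputs may be smaller than `n_points`. `step` and `observable` are placeholders for one integrator step and for the measured quantity.

```python
import numpy as np

def log_spaced_steps(n_steps, n_points):
    """Sorted, unique integer step indices in [1, n_steps], roughly log-spaced."""
    raw = np.logspace(0, np.log10(n_steps), n_points)
    return np.unique(np.round(raw).astype(int))

def run_with_log_output(x0, step, observable, n_steps, n_points):
    """Apply `step` n_steps times; evaluate `observable` only at log-spaced steps."""
    checkpoints = log_spaced_steps(n_steps, n_points)
    times, values = [], []
    x = x0
    k = 0  # index of the next checkpoint to hit
    for n in range(1, n_steps + 1):
        x = step(x)
        if k < len(checkpoints) and n == checkpoints[k]:
            times.append(n)
            values.append(observable(x))
            k += 1
    return np.array(times), np.array(values)
```

This keeps the fixed time step of the integrator but makes the cost of measurements grow only logarithmically with the simulated time.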
The structure of the SDEs describing the system makes them well suited to a parallel solution on a GPU. We assign one GPU thread to each particle to compute its two equations, and we group all N threads of one system into a block. Each block therefore represents an independent trajectory of the SDE solution. By running several blocks simultaneously, we also parallelize the repeated SDE solutions needed to obtain statistically significant results.
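The thread/block layout can be mimicked on the CPU with a two-dimensional array, as in the hypothetical NumPy sketch below. This is not the repository's GPU kernel. Row b stands for block b (one independent trajectory) and column j for thread j (one particle). Summing over axis 1 corresponds to the intra-block reduction. Averaging over rows combines independent trajectories into an ensemble mean with a standard error. The observable mean_j cos(x_j) is only a placeholder, not necessarily one of the order parameters used in the publications.

```python
import numpy as np

def order_parameter_per_block(x):
    """Placeholder observable mean_j cos(x_j) for each block.

    x has shape (n_blocks, n_threads): row = block/trajectory,
    column = thread/particle. The sum over axis 1 is the intra-block reduction.
    """
    n_threads = x.shape[1]
    return np.cos(x).sum(axis=1) / n_threads

def ensemble_average(per_block):
    """Mean over independent trajectories and its standard error."""
    n_blocks = per_block.shape[0]
    mean = per_block.mean()
    sem = per_block.std(ddof=1) / np.sqrt(n_blocks) if n_blocks > 1 else 0.0
    return mean, sem
```

For example, `x = np.random.default_rng(0).uniform(0, 2 * np.pi, size=(64, 256))` would represent 64 trajectories of 256 hypothetical particles. Then `ensemble_average(order_parameter_per_block(x))` returns a mean close to zero, because the positions are uniformly random.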
Simulating a typical system size of
Running the simulations on a GPU yields a 190× speed-up in this case. Two bottlenecks limit this speed-up: the sum that must be performed across all threads in a block when calculating the order parameters, and the number of blocks available on the device for running SDE trajectories in parallel. Of course, the CPU version could also be parallelized to some degree using multiple CPU threads, and it is likely not as optimized as it could be.
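A block-wide sum on a GPU is commonly implemented as a tree (parallel) reduction. The NumPy emulation below assumes that standard shared-memory pattern and does not reproduce the repository's actual kernel. Each loop iteration stands for one stage, separated on the GPU by a barrier such as CUDA's `__syncthreads()`. In each stage, the first `stride` threads add the value `stride` positions ahead. Summing N values takes ceil(log2 N) synchronized stages, with half as many threads active in each successive stage. This illustrates why the reduction scales worse than the per-particle updates. The function name `tree_reduce` is hypothetical.

```python
import numpy as np

def tree_reduce(values):
    """Emulate a shared-memory tree reduction over one block's per-thread values.

    The input is zero-padded to the next power of two. Returns the total and
    the number of synchronized stages the reduction would need on a GPU.
    """
    buf = np.array(values, dtype=float)
    n = 1
    while n < buf.size:
        n *= 2
    buf = np.concatenate([buf, np.zeros(n - buf.size)])
    stride = n // 2
    stages = 0
    while stride > 0:
        # "threads" 0..stride-1 each add the element stride positions ahead
        buf[:stride] += buf[stride:2 * stride]
        stride //= 2
        stages += 1
    return buf[0], stages
```

For 1000 particles per block, the values are padded to 1024 and the reduction needs 10 stages. With 256 particles it needs 8.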

