Reference repository on running semi-analytical models (SAMs) over cosmological simulations. I describe how to run several codes: Dhalos, Galform and SHARK.
Dhalos needs to be run before the Galform and SHARK SAMs, since it puts the merger trees into the format those codes require. It can be run in parallel.
I have not uploaded the updated versions to GitHub, since the code should be made public by John. There are 2 versions right now:
- The version I developed during my time working at UAM. This code is already compiled on taurus: /home/chandro/dhalo-trees_rhalf+snap_nfids_nftab
- The version I am using during my PhD (supposedly a more updated one). This code still needs to be compiled; I have tried and was not able to do it: /home/chandro/dhalo-trees_clagos_mod
There are different parts of the code to be run:
- Find_Descendants: it creates the connections between halos/subhalos and their descendants, given the halo/subhalo catalogues from the halo finder and the particle catalogues. This part does not need to be executed if we are provided with merger tree catalogues that only need to be processed by Dhalos. To run the Find_Descendants executable:
mpirun -np [number of halo/subhalo catalogue files] ./path/to/build/find_descendants [parameter_file]
- Build_Trees: using the connections generated by Find_Descendants, it builds the merger trees (the temporal history of each halo/subhalo). Another possibility is to reprocess merger tree catalogues already created by other codes: it modifies the catalogues as described in Jiang+2014 (e.g. separating halos linked by a bridge of particles) and, more importantly, it produces merger trees in the format required to run the SHARK and GALFORM SAMs on top of them. To run the Build_Trees executable:
mpirun -np 1 ./path/to/build/build_trees [parameter_file] (mpirun NOT implemented for Gadget4 merger trees)
- Trace_Particles: for type 2 galaxies (orphan galaxies: those whose host subhalo has lost so many particles while sinking into the central subhalo that it is no longer detected). It generates particle catalogues containing only the most bound particles of these orphan galaxies, so that the SAM takes their positions and velocities from those most bound particles instead of using an analytical expression. To run the Trace_Particles executable:
mpirun -np [number of halo/subhalo catalogue files] ./path/to/build/trace_particles [parameter_file]
All these executables can be run through the Slurm queueing system. One example submission script is given here:
- submit_mpi_UNITsim+CT.sh: script used to run the whole UNIT simulation in parallel over 3 nodes (sbatch submit_mpi_UNITsim+CT.sh). In this case I simply ran Build_Trees across different nodes (mpirun IS used here, as it is implemented for ConsistentTrees merger trees).
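A minimal sketch of what such a submission script could look like (the node count, Slurm options, build path and parameter file here are placeholders; the actual submit_mpi_UNITsim+CT.sh may differ):
#!/bin/bash
#SBATCH --job-name=dhalos_build_trees
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --time=24:00:00
# One MPI task per node running Build_Trees over the ConsistentTrees merger trees
mpirun -np $SLURM_NTASKS ./path/to/build/build_trees UNITsim+CT.txt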
Some examples of parameter files:
- UNITsim+CT.txt and UNITsim+CT.cfg: parameter files used when I ran the whole UNIT simulation with ConsistentTrees merger trees on taurus. In this case I only ran Build_Trees. The data generated are in /data8/vgonzalez/SAMs/trees/
- Gadget4_DhaloMT.txt: parameter file to run over Gadget4 data, generating Dhalos' own merger trees. In this case you run first Find_Descendants, then Build_Trees and optionally Trace_Particles.
- Gadget4_G4MT.txt: parameter file to run over Gadget4 data using the Gadget4 merger trees. In this case you only run Build_Trees and optionally Trace_Particles.
When using Trace_Particles, it is important to run Gadget4 with "SUBFIND_ORPHAN_TREATMENT" activated, so that catalogues containing only the most bound particles of the orphan galaxies are produced instead of catalogues with all the simulated particles (this saves memory). In any case, Gadget4 always outputs the catalogues with all the simulated particles as well. Set the flag "update_tree_files" to T to modify the positions and velocities of interpolated halos, but make sure you have a backup copy of the merger trees in case anything goes wrong.
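For reference, the corresponding Gadget4 settings look roughly like this (a minimal excerpt; the example value is a placeholder):
# Gadget4 Config.sh (compile-time option): produce snapshot files containing only
# the most bound particles, used for the orphan galaxy treatment
SUBFIND_ORPHAN_TREATMENT
# Gadget4 parameter file: number of files per snapshot, which must match
# the n_files_ids / n_files_tab flags described below
NumFilesPerSnapshot    8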
- rhalf_consistenttrees_aprox (Build_Trees): whether or not an approximation for the half-mass radius is used. Specific to the UNIT simulation and its ConsistentTrees merger trees: not all the merger tree catalogues had the half-mass radius data, so we used Rhalf=Rvir/2 (see the sketch after this list).
- snap0 (Build_Trees): whether or not snapshot 0 is considered. Specific to the UNIT simulation and its ConsistentTrees merger trees: snapshot 0 was not included in the merger tree catalogues.
- n_files_ids (Find_Descendants): number of files the particle data are split into when constructing the merger trees directly with Dhalos from Gadget data. Specific to the formats that work with Subfind ("LGADGET2", "LGADGET3", "PGADGET3", "COCO", "GADGET4_HDF5", "EAGLE"). It corresponds to the flag "NumFilesPerSnapshot" when running Gadget4.
- n_files_tab (Find_Descendants and Build_Trees): number of files the halo-subhalo data are split into when constructing the merger trees directly with Dhalos from Gadget data. Specific to the formats that work with Subfind ("LGADGET2", "LGADGET3", "PGADGET3", "COCO", "GADGET4_HDF5", "EAGLE"). It also corresponds to the flag "NumFilesPerSnapshot" when running Gadget4.
- Gadget4 descendants (Gadget4 merger trees, handled in the Build_Trees/gadget4_descendants.f90 code) are implemented for more than 1 file.
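The sketch below is purely illustrative: it only indicates which flag is read by which executable, with example values; copy the exact syntax from the example parameter files above (e.g. Gadget4_DhaloMT.txt).
# Hypothetical excerpt of a Dhalos parameter file (layout and values are examples only)
rhalf_consistenttrees_aprox = T    # Build_Trees: use Rhalf = Rvir/2 (UNIT + ConsistentTrees only)
snap0 = F                          # Build_Trees: snapshot 0 missing from the UNIT ConsistentTrees catalogues
n_files_ids = 8                    # Find_Descendants: equal to NumFilesPerSnapshot in Gadget4
n_files_tab = 8                    # Find_Descendants and Build_Trees: equal to NumFilesPerSnapshot in Gadget4
update_tree_files = T              # update positions/velocities of interpolated halos (keep a backup of the trees)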
Semi-analytical model described in Lacey+2016. The code is in /home/chandro/galform
To run the code we need a reference parameter file (.ref file) where the different parameter values are defined.
- UNIT.ref: example for the UNIT simulation based on the gp19 model. Parameters to vary: the path "aquarius_tree_file" (the Build_Trees output from Dhalos), "trace_particles" = true or false (whether Trace_Particles was run or not), the path "aquarius_particle_file" (the Trace_Particles output from Dhalos), the cosmology and power spectrum parameters, and the simulation volumes. It is also important to have the power spectrum of the cosmology employed. To generate it I used the CAMB code:
- camb_Pk.py: given the cosmology and power spectrum parameters, it generates the P(k) for the .ref file (PKfile parameter).
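A minimal sketch of that step (the output file name is an assumption; the cosmology and power spectrum parameters are supplied to camb_Pk.py as described above):
# Generate the linear P(k) table with CAMB for the chosen cosmology,
# then point the PKfile parameter of the .ref file at the resulting table
python camb_Pk.py              # writes e.g. pk_UNIT.dat (name assumed)
grep PKfile UNIT.ref           # check/edit this entry so it points to the new table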
All these parameters can still be modified later using the following scripts, which run Galform and send jobs to Slurm queues (the Slurm configuration can be modified as you wish).
To run Galform you need to provide a model, an Nbody simulation, a redshift and a subvolume.
- run_galform.csh: although the .ref file is used as a reference, you can modify/overwrite some parameter values. In this way a wide variety of models and simulations is defined, as well as the output properties you can choose. Flags: set only "galform" (to run Galform) and "elliott" (to produce the desired output for the emulator) to true, and set "models_dir" to the output path. It generates the same number of subvolumes as the input Dhalos merger trees are distributed in. (I have usually used gp19.vimal as the reference model and then changed some values.) If there are no particle files (Trace_Particles has not been run), remove that parameter with:
./delete_variable.csh $galform_inputs_file aquarius_particle_file
To delete a parameter:
./delete_variable.csh $galform_inputs_file [parameter_name]
To change a parameter value:
./replace_variable.csh $galform_inputs_file [parameter_name] [new_parameter_value]
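For example, to point a run at the Dhalos output when Trace_Particles has not been run (the tree path here is a placeholder; the parameter names are the ones listed for UNIT.ref above):
# Override the tree path, disable the orphan-particle treatment and drop the unused parameter
./replace_variable.csh $galform_inputs_file aquarius_tree_file /path/to/dhalos/output/trees
./replace_variable.csh $galform_inputs_file trace_particles false
./delete_variable.csh $galform_inputs_file aquarius_particle_file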
- qsub_galform.csh: you choose a model and a simulation and it is sent to a Slurm queue (1 Slurm job, 1 subvolume per job, 16 CPUs per subvolume). Run with:
./qsub_galform.csh
- qsub_galform_par.sh: sends 1 or more models in parallel (1 Slurm job, 64 subvolumes per job, 1 CPU per subvolume). More efficient: 1 redshift/model at a time. Run with:
./qsub_galform_par.sh
- qsub_galform_par_eff.sh: sends 1 or more models in parallel (1 Slurm job, 128 subvolumes per job, 1 CPU per subvolume). Even more efficient: 2 redshifts/models at the same time. Run with:
./qsub_galform_par_eff.sh
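As an illustration of the "1 job, many subvolumes, 1 CPU per subvolume" layout, a rough sketch is given below; it assumes run_galform.csh accepts the model, simulation, redshift and subvolume as arguments, and uses placeholder names, so the real qsub_galform_par*.sh scripts may differ:
#!/bin/bash
#SBATCH --job-name=galform_par
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
# Launch the 64 subvolumes of one model/redshift inside a single job, one CPU each
for ivol in $(seq 0 63); do
    srun -n 1 -c 1 ./run_galform.csh gp19.vimal UNITsim 0.0 $ivol &
done
wait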
- run_galform_em.csh: the main difference with respect to "run_galform.csh" is that Galform uses the model "gp19.vimal.em.project", in which each Galform run has a different set of free parameters (those whose variation we are going to study). The input free parameters take the values of the corresponding Latin Hypercube position, and each model is stored in a different directory whose name indicates the parameter values employed.
- qsub_galform_par_em.sh: it reads the Latin Hypercube parameters from a file (each line contains the 10 parameter values), for 1 redshift/model at a time (4 jobs, 16 subvolumes per job, 1 CPU per subvolume). Run with:
./qsub_galform_par_em.sh
- qsub_galform_par_em_eff.sh: it reads the Latin Hypercube parameters from a file (each line contains the 10 parameter values), for 2 redshifts/models at the same time (1 job, 128 subvolumes per job, 1 CPU per subvolume). Run with:
./qsub_galform_par_em_eff.sh
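A rough sketch of how such a Latin Hypercube file can be consumed, one 10-parameter set per line (the file name and the way the values are passed to run_galform_em.csh are assumptions; the real scripts also handle the Slurm side as described above):
# Read one line (= one set of 10 free parameters) at a time and launch that emulator model
while read -r p1 p2 p3 p4 p5 p6 p7 p8 p9 p10; do
    ./run_galform_em.csh "$p1" "$p2" "$p3" "$p4" "$p5" "$p6" "$p7" "$p8" "$p9" "$p10"
done < latin_hypercube.txt     # file name assumed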
Semi-analytical model described in Lagos+2018. The code is in /home/chandro/shark
To run it we need to provide a parameter file collecting the free parameter values, the path to the Dhalos output and a file indicating the redshift-snapshot correspondence.
- UNIT.cfg: example of parameter file for the UNIT simulation once processed by Dhalos.
Not parallelized: run this way, the output is not distributed into subvolumes; all the subvolumes are written together.
- 1 subvolume:
./shark parameter_file "simulation_batches = [nº of subvolume]"
- All subvolumes:
./shark parameter_file -t [nº of threads] "simulation_batches = [0-maximum_subvolume]"
Parallelized: I suppose you can apply the same strategy as for Galform (sending different subvolumes to different Slurm jobs), but the command
./shark-submit parameter_file "simulation_batches=[0-maximum_subvolume]"
does not work, failing with the error "sbatch: error: Unable to open file hpc/shark-run".
Therefore, I have implemented a new "shark-run" file that sends 1 subvolume per job; this way the output is distributed over the different subvolumes and it is easier to run in parallel.
- shark-run: new shark-run to launch 1 subvolume per job. What remains is to implement the same parallelization carried out for Galform (sending more than 1 emulator training model and making the code as efficient as possible).
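A minimal sketch of the one-subvolume-per-job idea (job options and the maximum subvolume are placeholders; the actual shark-run in this repository may differ):
# Submit one Slurm job per subvolume, each running shark on a single batch
maxsub=63
for i in $(seq 0 $maxsub); do
    sbatch --job-name=shark_$i --ntasks=1 --cpus-per-task=1 \
        --wrap="./shark UNIT.cfg \"simulation_batches = $i\""
done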
You can find more info on the website https://shark-sam.readthedocs.io/en/latest/. For example, I did not run it varying the different parameters, but how to do so is described in https://shark-sam.readthedocs.io/en/latest/optim.html