This project investigates the ability of diffusion models to reconstruct cosmological initial conditions (ICs) under different scenarios.
We evaluate the model’s performance in the presence of noisy observations (halo fields) and varying numbers of training samples in order to study its robustness and reconstruction quality.
The core architecture of this project is based on “Posterior Sampling of the Initial Conditions of the Universe from Non-Linear Large-Scale Structures Using Score-Based Generative Models” (arXiv:2304.03788).
- We first generate datasets from the Quijote Latin Hypercube set of N-Body simulations for both z = 127 (initial conditions) and z = 0 (dark-matter and halo fields).
- We then apply a pixel-based diffusion model to these observational datasets to evaluate reconstruction performance under different conditions.
- Finally, we assess the model’s accuracy using three key metrics: Power Spectrum, Cross-Correlation Coefficient, and Transfer Function.
Details
The dataset used for this project is based on the Quijote simulation suite, which provides large-scale N-body simulations of the Universe.
These simulations are used here to generate both the initial-condition density fields (z = 127) and the observational fields — including dark matter and halo density fields (z = 0).
For this project, we use density fields from the Latin Hypercube (LH) set to train and test the model. Specifically, we employ the 2,000-simulation LH ensemble, in which each simulation is initialized with a different random seed and cosmological parameters sampled across broad ranges. You can access the Quijote datasets at the official website: https://quijote-simulations.readthedocs.io. After downloading, use the generation scripts in this repository to produce voxelized 3D grids for training and testing.
Initial Conditions (z = 127):
To generate the IC density fields, you need to download the Latin Hypercube snapshot files from the Quijote simulation suite.
Each simulation ID includes 8 HDF5 snapshot files, which together contain the particle data necessary to reconstruct the dark matter density field at z = 127.
Running the IC generation script below will convert these snapshots into voxelized 3D density fields for all 2,000 simulations.
IC (z = 127) Generation Code
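As a rough illustration of what this voxelization step involves, here is a minimal sketch using the Pylians3 readgadget and MAS_library packages; the snapshot path, grid size, output name, and mass-assignment scheme are placeholder assumptions, and the repository's generation script remains the reference implementation.

```python
import numpy as np
import readgadget              # Pylians3 snapshot reader
import MAS_library as MASL     # Pylians3 mass-assignment routines

snapshot = 'Quijote/LH/0/ICs/ics'   # hypothetical path to one simulation's IC snapshot
grid     = 128                      # voxels per side (assumed resolution)
ptype    = [1]                      # dark matter particles

# Read the box size and particle positions (readgadget returns kpc/h; convert to Mpc/h)
header  = readgadget.header(snapshot)
BoxSize = header.boxsize / 1e3
pos     = (readgadget.read_block(snapshot, "POS ", ptype) / 1e3).astype(np.float32)

# Deposit particles on a 3D grid with a CIC kernel and convert to overdensity delta
delta = np.zeros((grid, grid, grid), dtype=np.float32)
MASL.MA(pos, delta, BoxSize, 'CIC')
delta /= np.mean(delta, dtype=np.float64)
delta -= 1.0

np.save('LH_0_z127_128.npy', delta)  # hypothetical output name
```

The same pattern applies to the z = 0 snapshots (snapdir_004), changing only the snapshot path.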
Dark Matter Density Fields (z = 0):
For z = 0, you can either:
- Use the pre-generated LH dark matter density fields available at 64³ or 128³ resolution, or
- Recompute your own 3D density grids at custom resolutions by downloading the z = 0 snapshots (snapdir_004) for each of the 2,000 simulation IDs and running the following script:
DM (z = 0) Generation Code
Halo Density Fields (z = 0):
Halo density fields are constructed from halo catalogs derived via the Friends-of-Friends (FoF) algorithm applied to each simulation.
You can use the script below to generate voxelized halo density grids at your chosen resolution:
Halo Field Generation Code
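For reference, here is a minimal sketch of gridding an FoF catalog with Pylians3's readfof and MAS_library; the catalog path, grid size, and output name are assumptions, and number-count weighting (rather than mass weighting) is an illustrative choice.

```python
import numpy as np
import readfof                 # Pylians3 FoF catalog reader
import MAS_library as MASL

snapdir = 'Quijote/Halos/FoF/LH/0'   # hypothetical path to one simulation's FoF catalogs
snapnum = 4                          # snapshot 4 corresponds to z = 0
grid    = 128                        # assumed resolution
BoxSize = 1000.0                     # Mpc/h, Quijote box size

# Read halo positions (kpc/h -> Mpc/h)
FoF   = readfof.FoF_catalog(snapdir, snapnum, long_ids=False,
                            swap=False, SFR=False, read_IDs=False)
pos_h = (FoF.GroupPos / 1e3).astype(np.float32)

# Number-count halo overdensity field on a CIC grid
delta_h = np.zeros((grid, grid, grid), dtype=np.float32)
MASL.MA(pos_h, delta_h, BoxSize, 'CIC')
delta_h /= np.mean(delta_h, dtype=np.float64)
delta_h -= 1.0

np.save('LH_0_halo_128.npy', delta_h)  # hypothetical output name
```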
After generating the individual samples for both redshifts (z = 127 and z = 0), use the stacking script in the Dataset/ folder to combine all simulation IDs into single large .npy arrays for training. Depending on your training setup, you can select how many samples to include (e.g., 100, 500, 1900) based on simulation IDs and split the dataset into training and testing subsets accordingly.
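The stacking step itself is straightforward; a minimal NumPy sketch is shown below, assuming per-simulation .npy files named by simulation ID (the file layout here is hypothetical, and the repository's stacking script is the reference).

```python
import numpy as np

n_train = 1900                       # e.g., 100, 500, or 1900
ids     = range(n_train)             # simulation IDs used for training

# Load each voxelized field and stack along a new leading axis
fields = [np.load(f'fields/LH_{i}_halo_128.npy').astype(np.float32) for i in ids]
stack  = np.stack(fields)            # shape (n_train, 128, 128, 128)

np.save('quijote128_halo_train_1900.npy', stack)
```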
For demonstration purposes, three small stacked dataset samples are included in the Dataset/ folder:
- quijote128_halo_train_3.npy — stacked sample of z = 0 halo density fields (3 simulations)
- quijote128_dm_train_3.npy — stacked sample of z = 0 dark matter density fields (3 simulations)
- quijote128_z127_train_3.npy — stacked sample of z = 127 initial-condition fields (3 simulations)
These examples allow users to verify the dataset format and test the training and sampling pipeline without downloading the full dataset.
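A quick way to check the demo files is to load them and inspect their shapes; the expected shape below assumes 128³ grids, matching the file names.

```python
import numpy as np

halo = np.load('Dataset/quijote128_halo_train_3.npy')
dm   = np.load('Dataset/quijote128_dm_train_3.npy')
ics  = np.load('Dataset/quijote128_z127_train_3.npy')

print(halo.shape, dm.shape, ics.shape)   # expected: (3, 128, 128, 128) for each array
```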
The complete generated datasets (2,000 samples per redshift/type) are available on Google Drive:
Complete Generated Dataset
Details
The stacked datasets of both redshifts (z = 0 dark matter/halo fields and z = 127 initial condition fields) are fed into the conditional diffusion model for training.
We begin by training the model with 100 samples, gradually increasing the number up to the full training set (1900 samples) to examine the sensitivity of the model to training dataset size. Additionally, we incorporate the velocity field into the dark matter density field to analyze how the inclusion of extra physical information affects model performance. We also evaluate the model’s robustness using noisy observational data, such as redshift-space dark matter and halo density fields.
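As an illustration of how the redshift-space conditioning fields could be built, the sketch below displaces particle positions along one axis using their peculiar velocities before gridding. It uses Pylians3's redshift_space_library; the snapshot path, cosmology values, and line-of-sight axis are placeholder assumptions rather than the exact settings used in our runs.

```python
import numpy as np
import readgadget
import redshift_space_library as RSL   # Pylians3 RSD utilities
import MAS_library as MASL

snapshot = 'Quijote/LH/0/snapdir_004/snap_004'   # hypothetical z = 0 snapshot path
grid, BoxSize, axis = 128, 1000.0, 0             # assumed resolution, Mpc/h box, LOS axis
redshift = 0.0
Omega_m, Omega_l = 0.3175, 0.6825                # placeholder cosmology
Hubble = 100.0 * np.sqrt(Omega_m * (1.0 + redshift)**3 + Omega_l)  # km/s/(Mpc/h)

pos = (readgadget.read_block(snapshot, "POS ", [1]) / 1e3).astype(np.float32)  # Mpc/h
vel = readgadget.read_block(snapshot, "VEL ", [1]).astype(np.float32)          # km/s

# Move particles to redshift space along the chosen axis (modifies pos in place)
RSL.pos_redshift_space(pos, vel, BoxSize, Hubble, redshift, axis)

# Grid the displaced particles into a redshift-space overdensity field
delta_rsd = np.zeros((grid, grid, grid), dtype=np.float32)
MASL.MA(pos, delta_rsd, BoxSize, 'CIC')
delta_rsd /= np.mean(delta_rsd, dtype=np.float64)
delta_rsd -= 1.0
```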
The corresponding training script is provided here:
Training Code
Training is performed on 4 NVIDIA A100 GPUs available on the UVA Rivanna supercomputing cluster,
using a batch size of 4 per GPU (effective total batch size of 16) for 400 epochs.
The full training process with 1900 samples takes approximately 17 hours.
All key hyperparameters—such as the number of epochs, batch size, learning rate, and model configuration—can be modified in the corresponding configuration file to adapt to different datasets or experiments.
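For orientation, a hypothetical excerpt of such a configuration is shown below; the key names and the learning rate are illustrative only, and the actual options live in the repository's configuration file.

```python
# Hypothetical training configuration excerpt (illustrative names and values)
config = {
    'n_train_samples': 1900,      # 100 / 500 / 800 / 1900 in our experiments
    'epochs': 400,
    'batch_size_per_gpu': 4,      # 4 x A100 GPUs -> effective batch size 16
    'learning_rate': 1e-4,        # illustrative value, not taken from the repo
    'grid_size': 128,
    'conditioning': 'halo',       # e.g., 'dm', 'dm_rsd', or 'halo'
}
```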
Details
After training, the model enters the sampling phase, where it generates reconstructed initial conditions from unseen test data. During sampling, the model takes the observed z = 0 halo/DM field as input and progressively denoises it to reconstruct the corresponding z = 127 initial condition field.
Before sampling, generate the observation/ground-truth pairs with make_obs.py. This script prepares the inputs required for sampling and the targets used later for evaluation.
The sampling process is executed using the following script: sample.py
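Conceptually, the reverse-diffusion loop inside sample.py can be pictured as the schematic sketch below. This is a DDPM-style ancestral sampler written purely for illustration, with an assumed network interface model(x_t, obs, t) and an illustrative noise schedule; the actual sampler follows the score-based formulation of arXiv:2304.03788.

```python
import torch

@torch.no_grad()
def sample_ic(model, obs, n_steps=1000):
    """Schematic DDPM-style ancestral sampling conditioned on the z = 0 field `obs`.

    Assumes model(x_t, obs, t) predicts the noise added at step t; the real network
    interface, noise schedule, and sampler are defined in the repository's code.
    """
    device = obs.device
    # Linear beta schedule (illustrative only)
    betas      = torch.linspace(1e-4, 2e-2, n_steps, device=device)
    alphas     = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(obs)                        # start from pure Gaussian noise
    for t in reversed(range(n_steps)):
        t_batch = torch.full((obs.shape[0],), t, device=device)
        eps  = model(x, obs, t_batch)                # predicted noise at step t
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                         # reconstructed z = 127 field
```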
The number of generated samples can be adjusted as a hyperparameter in the configuration file.
Once sampling is complete, the generated outputs are combined into a single file using the stacking script: combine_samples.py
This combined file is then used to evaluate the model’s reconstruction performance. Evaluation is carried out using: result.py
The evaluation script computes three primary metrics to quantify reconstruction accuracy (a computational sketch follows the list):
- Power Spectrum — measures the statistical similarity of large-scale modes.
- Cross-Correlation Coefficient — quantifies the phase alignment between reconstructed and true fields.
- Transfer Function — evaluates the scale-dependent amplitude accuracy.
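As an illustration of these three metrics, the sketch below computes them with Pylians3's Pk_library from a true and a reconstructed IC field; the file names are placeholders, and result.py remains the authoritative implementation.

```python
import numpy as np
import Pk_library as PKL

BoxSize    = 1000.0   # Mpc/h
delta_true = np.load('ic_true.npy').astype(np.float32)           # hypothetical file names
delta_rec  = np.load('ic_reconstructed.npy').astype(np.float32)

# Auto- and cross-power spectra of the two fields: (fields, BoxSize, axis, MAS, threads)
Pk = PKL.XPk([delta_true, delta_rec], BoxSize, 0, ['CIC', 'CIC'], 4)

k       = Pk.k3D
P_true  = Pk.Pk[:, 0, 0]     # monopole of the true field
P_rec   = Pk.Pk[:, 0, 1]     # monopole of the reconstructed field
P_cross = Pk.XPk[:, 0, 0]    # cross-power monopole

C_k = P_cross / np.sqrt(P_true * P_rec)   # cross-correlation coefficient
T_k = np.sqrt(P_rec / P_true)             # transfer function
```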
Details
This section presents the results of the experiments conducted under different training and observation conditions.
We first examine the sensitivity of the model to the number of training samples. The model was trained with 100, 500, 800, and 1900 samples, and its performance was evaluated using the three metrics: Power Spectrum, Cross-Correlation Coefficient, and Transfer Function.
As shown in the figure below, increasing the number of training samples primarily improves the Transfer Function, while P(k) and C(k) show only marginal gains (e.g., C(k) increases slightly from 0.88 with 100 samples to 0.90 with 1900 samples).
Finally, we test the model under noisy observational conditions. We condition the model on redshift-space dark matter fields and halo fields, both of which represent more realistic and noisier observations compared to ideal real-space dark matter fields.
As seen in the plots below, conditioning on noisier data degrades reconstruction accuracy—especially the cross-correlation coefficient (C(k)):
- Using halo fields as observations reduces C(k) from 0.90 to 0.60.
- Using redshift-space dark matter fields decreases C(k) from 0.90 to 0.87.
These results show that as observation noise increases, model performance—particularly in C(k)—deteriorates more noticeably.
Giti Doolabi
PhD Student, Computer Science
University of Virginia