Data documentation
- Available data products
- Simulation files structure
- Simulation metadata
- Simulation datasets
- Catalog fields
Available data products
For each simulation, we store:
- raw simulation lightcone particle count maps stored at Healpix nside=2048 up to z<3.5 (69 shells per simulation),
- halo catalog snapshots created using the friends-of-friends halo finder, with halo mass of M ~ 10^13 M⊙, at every time step,
- projected full sky weak lensing, galaxy density, and intrinsic alignments maps for a Stage-III forecast, including baryonification, at the nside=512,
- projected KiDS-1000 lensing and intrinsic alignment maps with grid extended with baryonic feedback parameters, in the TFRecords format, from Fluri et al. 2022.
Simulation files structure
The CosmoGrid directory tree follows this structure:
CosmoGrid/dataset_type/simulation_set/cosmology/realization/
where:
-
dataset_type
are the availabe datasets, and currently containraw
,stage3_forecast
,processed/KiDS1000_data_products
. Theraw
dataset contains the raw data used to create other sets. -
simulation_set
: there are three simulation setsgrid
,fiducial
andbenchmarks
-
cosmology
: each directory contains simulations for a given cosmological parameter set -
realization
: each independent or quasi-independent realization for the given cosmology. This can be from a single, replicated N-body run, or using the shell permutation scheme
Simulation metadata
File CosmoGridV1_metainfo.h5
contains metadata useful for working with the simulaitons: cosmological parameters, random seeds, file paths, boundaries of shells, etc.
Simulations and input parameters
Dataset simulations
contains a list of all unique simulation runs, divided by simulation_set
. Dataset parameters
contains a list of all unique cosmology input parameters. The colums in these datasets are as follows, with parameters
containing only a relevant sub-set of them:
column | data type | content |
---|---|---|
As, H0, O_cdm, O_nu, Ob, Ol, Om, m_nu, ns, w0, wa, s8 | f8 | cosmological parameters |
bary_Mc, bary_nu | f8 | baryonification parameters |
pkd_seed | f8 | seed for initial conditions |
sobol_seed | i8 | sobol index (for the grid) |
seed_index | i4 | index of the initial conditions seed |
delta | S128 | type of the delta run (for the fiducial) |
sobol_index | i4 | index of the sobol sequence (for the grid) |
benchmark_type | S128 | benchmark type (for benchmarks: particle_count, box_size, fiducial_bench, redshift_resolution) |
id_sim | i4 | index of the unique simulation |
id_param | i4 | index of the unique parameter set |
path_sim | S128 | path to the simulation directory in the CosmoGrid dirs structure |
path_par | S128 | path to the parameters directory in the CosmoGrid dirs structure |
box_size_Mpc_over_h | f8 | box size of the simulation |
n_particles | i8 | number of particles of the simulation |
n_shells | i4 | number of shells generated |
n_steps | i4 | number of PkdGrav3 steps |
Shell information
The dataset shell_info
contains information about shell boundaries for each simulation. Inside this dataset one should follow the CosmoGridV1 file structure, for example shell_info/CosmoGrid/raw/grid/cosmo_203124
. These dataset names contain the raw
key, but it can be replaced by the relevant other set. This table contains the shell_info
data in an easy-to-access form, as described in Section Shell information below.
Simulation datasets
Here we describe the available simulations datasets. The main one is the raw
dataset. It was created during the Fluri et al. 2022 (F22) project for the KiDS-1000 analysis, but can be used for other new projects.
Raw lightcone shells
The raw lightcone shells are stored in CosmoGrid/raw
and contain: raw shells with nside=2048, snapshot halo catalogs, full sky shells at nside=512 with and without baryonification. The raw grid has been assigned baryon feedback parameters as in F22, but one can re-do this and apply a new different baryon feedback model, if needed.
file name | file content | comments |
---|---|---|
realiztation/compressed_shells.npz | raw shells at nside=2048 | numpy compressed store with two members: shells : contains the 69 shells, shell_info : table with information about shell boundaries (redshift, comoving distance in Mpc/h) |
cosmology/params.yml | yaml file with cosmology parameters, baryonification parameters, random seeds | baryonification parameters are only used for the exended grid |
cosmology/pkd_halos.tar.gz | compressed PkdGrav3 halo shapshots for each timestep | the compressed store contains files like pkd_halos/CosmoML.XXXXX.fofstats.0 , which are the raw PkdGrav3 halo output from the friends-of-friends algorithm, XXXXX is the time step |
cosmology/Halofile_MinParts=100.npz | profiled halos with minimum of 100 particles, from the F22 analysis | Numpy compressed store with the following fields shell_info as above, halos catalog of halos with profile parameters obtained by NFW profile fitting in F22 (see below for catalog fields description) |
cosmology/param_files.tar.gz | compressed collection of logs and configuration files | The files are: baryonification_params.py : configuration input to the shell baryonification code baryonified_shells.npz.info : log of the baryonification code class_processed.hdf5 : HDF5 file with background quantities as a function of redshift from the CLASS code concept.params : input to the CONCEPT initial conditions code cosmology.par : PkdGrav3 input file CosmoML.log : raw PkdGrav3 log |
cosmology/shells_nside=512.npz | numpy compressed store with 69 maps at nside=512 resolution, without baryonification | contains the same fields as compressed_shells.npz |
cosmology/baryonified_shells.npz | numpy compressed store with 69 maps at nside=512 resolution, with baryonification | contains the same fields as compressed_shells.npz |
cosmology/pkd_spectra.tar.gz | compressed raw PkdGrav3 power spectra output | a single filer per time step, like CosmoML.XXXXX.pk , where XXXXX is the time step |
Stage-III forecast probe maps
Probe maps that can be used for making forecasts for Stage-3 large scale structure surveys are stored in CosmoGrid/stage3_forecast
and condain: full sky projected weak lensing, intrinsic alignment, and galaxy clustering maps at nside=512 for a Stage-III survey forecast. This data is described in Kacprzak et al. 2022.
file name | file content | comments |
---|---|---|
realization/projected_probes_maps_baryonified512.h5 | HDF5 store with baryonified probe maps for 4 redshift bins for lensing, clustering and intrinsic alignment probes | the HDF5 file has the following structure: probe/sample |
realization/projected_probes_maps_nobaryons512.h5 | Same as above, but with no baryonification | same as above |
realization/shell_permutations_index.h5 | HDF5 store with information about the shell selection for the shell permutation scheme | contains datsets: shell_groups : list of shell groups taken from different simulations perms_info : information which simulation to use for each shell group and whether to apply rotations or flips (see below for description of this table) |
probe_weights_kg_ia_dg.h5 | HDF store with probe projection kernels, single value for shell mean redshift | datasets are organized as probe/sample |
KiDS-1000 weak lensing analysis data
The data used by Fluri et al. 2022 for the KiDS-1000 analysis with deep learning is stored in CosmoGrid/processed/KiDS1000_data_products
, and contains: KiDS-1000 lensing maps at nside=512, with and without baryonification, pre-processed noise maps that were used to train and evaluate the networks. This data is described in Fluri et al. 2022. It is stored in the TFRecord format and requires decoding. Additionally, only the relevant patches are stored. For the grid, we provide 250 tfrecord
files named grid_data_{num:03d}.tfrecord
, where num
corresponds to the file number. Each file contains the samples from 10 cosmological parameter combinations and their labels used for the evaluations of the networks used in Fluri et al. 2022. We provide these sample with or without applied baryonification. The fiducial maps are split into signal only patches with or without applied baryonification and pure noise maps that were used for training or evaluation. Furthermore, a single sample of a fiducial file contains the maps of all delta simulations with the same seed to alleviate the calculations of the derivatives. To decode the files we provide the read_TFR.py
script that implements the following functions:
def get_fidu_dset(fname, baryon=False):
"""
Returns a dataset of a fiducial TFRecord file
:param fname: file name to decode
:param baryon: If baryonification was applied to this file
:return: A tensorflow dataset where each sample is a dictionary containing either 15 (if baryon=False)
or 19 elements (if baryon=True). Each entry is named "patch_{num}" and has the shape
(149504, 10) where the first dimension indicates the pixels in NEST ordering and the second
dimension the shear field with the first 5 entries corresponding to gamma_1 of the five
redshift bins followed by gamma_2. The ordering of the patches is:
fiducial, -delta omega_m, +delta omega_m, -delta sigma_8, +delta sigma_8
-delta h_0, +delta h_0, -delta omega_b, +delta omega_b,
-delta n_s, +delta n_s, -delta w_0, +delta w_0,
-delta A_IA, +delta A_IA, -delta log10M_c, +delta log10M_c,
-delta nu, +delta nu
where the baryonification parameter patches (log10M_c and nu) are only included if baryon=True.
"""
def get_grid_dset(fname, baryon=False):
"""
Returns a dataset of a gird TFRecord file
:param fname: file name to decode
:param baryon: If baryonification was applied to this file
:return: A tensorflow dataset containing two elements. The first element has a shape of (149504, 10)
where the first dimension indicates the pixels in NEST ordering and the second
dimension the shear field with the first 5 entries corresponding to gamma_1 of the five
redshift bins followed by gamma_2. The second element is the label of the cosmology having a
length of 9 if baryon=False and 11 if baryon=True. The order of the label is:
omega_m, sigma_8, h_0, omega_b, n_s, w_0, A_IA, log10M_c, nu, omega_nu, A_s
where the baryonification parameters (log10M_c and nu) are only included if baryon=True.
"""
def get_noise_dset(fname):
"""
Returns a dataset of a noise TFRecord file
:param fname: file name to decode
:return: A tensorflow dataset containing an element of a noise map with a shape of (149504, 10)
where the first dimension indicates the pixels in NEST ordering and the second
dimension the shear field with the first 5 entries corresponding to gamma_1 of the five
redshift bins followed by gamma_2.
"""
Additionally we provide a file pixel_indices_NEST.npy
containing the indices of the patches such that they can be maped onto full sky maps, as the following snippet illustrates:
from read_TFR import get_fidu_dset
import healpy as hp
import numpy as np
# load the dataset
fname = ...
dset = get_fidu_dset(fname)
# get the first element
b = next(iter(dset))
# load the pixel indices
pix = np.load("pixel_indices_NEST.npy")
# create and fill the map, note that the map is in NEST ordering
m = np.zeros(hp.nside2npix(512))
m[pix] = b.numpy()[:,5] # gamma_1 of z-bin 5
# plot
hp.mollview(m, nest=True)
Catalog fields
In this section we describe fields in various catalogs contained by the files above.
Shell information
Information about shells boundaries and centers. Those are different for every cosmology, but the same for all realizations for the same cosmology. It can be found in compressed_shells.npz
in the shell_info
field.
field | data type | content |
---|---|---|
shell_name | U512 | file name of the original shell (not needed) |
shell_id | i4 | id of the shell |
lower_z | f4 | redshift of the lower boundary of the shell |
upper_z | f4 | redshift of the upper boundary of the shell |
lower_com | f4 | comoving distance to the lower boundary of the shell |
upper_com | f4 | comoving distance to the upper boundary of the shell |
shell_com | f4 | comoving distance to the center of the shell |
Profiled halo catalogs
The halo catalog with profile paameters are condained in files like Halofile_MinParts=100.npz
and have the following attributes for each halo:
field | data type | content |
---|---|---|
ID | <i8 | Unique ID number of the halo. |
IDhost | <i8 | ID of the host halo, a dummy variable set to -1. |
Mvir | <f8 | NFW profile viral mass of the halo [Msun/h]. |
Nvir | <i8 | Number of particles inside the NFW viral radius. |
x | <f8 | x-coordinate of the halo in the lightcone [kpc/h]. |
y | <f8 | y-coordinate of the halo in the lightcone [kpc/h]. |
z | <f8 | z-coordinate of the halo in the lightcone [kpc/h]. |
rvir | <f8 | NFW viral radius of the halo [kpc/h]. |
cvir | <f8 | NFW concentration of the halo. |
tfNFW_cvir | <f8 | Concentration of the truncated NFW profile, dummy variable set to -1. |
tfNFW_tau | <f8 | Tau of the truncated NFW profile, dummy variable set to -1. |
tfNFW_Mvir | <f8 | Viral mass of the truncated NFW profile, dummy variable set to -1. |
shell_id | <i4 | ID of the shell to which the halo corresponds, see shell_info tables in the shell files. |
Shell permutation index
Configuration for shell permutation scheme. It can be found in shell_permutations_index.h5
files.
field | data type | content |
---|---|---|
id_sim | <i4 | simulation id for the shell group (0-6 for the grid and 0-199 for the fiducial |
rot | <i4 | 0, 1, 2, or 3, apply the rotation by a multiple of rot * 45 deg |
flip_ud | <i4 | 0 or 1, if to apply the the up-down flip |
flip_lr | <i4 | 0 or 1, if to apply the the left-right flip |