
Halo catalogues

A halo catalogue holds the per-halo state vectors for one (gravity, redshift, imodel, ibox) realisation. The pipeline keeps two on-disk representations of it:

  • the upstream plain-text CatshortV.*.DAT files (read-only, never modified), and
  • a per-realisation HDF5 mirror halo.hdf5 written under OUTPUT_ROOT.

Downstream stages always read from halo.hdf5; the .DAT file is parsed only when the HDF5 mirror is missing, which in normal use means once per realisation.

Output layout

<OUTPUT_ROOT>/gravity_<gravity>/z_<z:.2f>/imodel_<imodel>/box_<ibox>/halo.hdf5

halocat.config.get_output_dir gives you the directory:

from halocat import config as C

out_dir = C.get_output_dir("LCDM", 0.25, imodel=1, ibox=1)
halo_path = f"{out_dir}/halo.hdf5"
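
For orientation, the layout template above can be reproduced with a small standalone helper. This is only a sketch of the template, not the implementation of get_output_dir: the function name sketch_halo_path and the root "/tmp/output_root" are hypothetical placeholders.

```python
from pathlib import Path


def sketch_halo_path(output_root, gravity, z, imodel, ibox):
    """Build the halo.hdf5 path following the documented layout template."""
    return (
        Path(output_root)
        / f"gravity_{gravity}"
        / f"z_{z:.2f}"          # redshift is formatted with two decimals
        / f"imodel_{imodel}"
        / f"box_{ibox}"
        / "halo.hdf5"
    )


# "/tmp/output_root" stands in for the real OUTPUT_ROOT (assumption).
path = sketch_halo_path("/tmp/output_root", "LCDM", 0.25, imodel=1, ibox=1)
print(path.as_posix())
# /tmp/output_root/gravity_LCDM/z_0.25/imodel_1/box_1/halo.hdf5
```

In real code, prefer C.get_output_dir, which knows the configured OUTPUT_ROOT.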

Reading the catalogue

read_halo_hdf5 returns a plain dict[str, np.ndarray] keyed by CATALOGUE_COLUMNS:

from halocat.io import read_halo_hdf5
from halocat import config as C

data = read_halo_hdf5(halo_path)
print(len(data[C.MASS_COLUMN]), "haloes")
print("columns:", list(data))

The standard columns are:

Column                     Description                               Units
x, y, z                    comoving position                         Mpc/h
vx, vy, vz                 peculiar velocity                         km/s
Mtot                       total halo mass (the mass column)         M⊙/h
Mbound                     bound-particle mass                       M⊙/h
Rvir                       virial radius                             Mpc/h
Vrms, Vcirc                velocity dispersion / circular vel        km/s
Cvir, Lambda, Xoff, ...    shape & spin diagnostics

MASS_COLUMN (= "Mtot") names the column that all downstream measurements use as the halo mass.
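
Because the catalogue is a plain dict of equal-length arrays, a mass cut and the (N, 3) position/velocity arrays need only NumPy. The sketch below uses a synthetic dict with made-up values in place of the output of read_halo_hdf5, and it assumes the cut is a threshold on log10(Mtot); check scripts/example_load_halo.py for the convention actually used.

```python
import numpy as np

# Synthetic stand-in for the dict returned by read_halo_hdf5 (made-up values).
data = {
    "x": np.array([10.0, 250.0, 900.0]),
    "y": np.array([20.0, 300.0, 100.0]),
    "z": np.array([30.0, 350.0, 500.0]),
    "vx": np.array([-120.0, 40.0, 310.0]),
    "vy": np.array([15.0, -80.0, 5.0]),
    "vz": np.array([60.0, 220.0, -140.0]),
    "Mtot": np.array([5.0e12, 4.0e13, 2.0e14]),
}

# Assumption: the cut is expressed in log10 of the mass in Msun/h.
log10_mass_cut = 13.5
keep = np.log10(data["Mtot"]) >= log10_mass_cut

# Stack the selected component columns into (N, 3) arrays.
pos = np.column_stack([data[k][keep] for k in ("x", "y", "z")])
vel = np.column_stack([data[k][keep] for k in ("vx", "vy", "vz")])

print(pos.shape, vel.shape)  # (2, 3) (2, 3)
```

The same boolean mask can be applied to any other column, since all columns share the halo ordering.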

File-level attributes

halo.hdf5 carries a small set of file-level attributes describing the realisation:

import h5py
with h5py.File(halo_path, "r") as f:
    print(dict(f.attrs))
# {'box_size': 1024.0, 'gravity': 'LCDM', 'ibox': 1, 'imodel': 1,
#  'redshift': 0.25, 'snapnum': 137,
#  'source_dat': '/cosma8/.../CatshortV.0137.DAT'}
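
These attributes make it cheap to confirm that a file really belongs to the realisation you asked for. The helper below is a hypothetical sketch, not part of halocat; it works on any mapping with the attribute names shown above, so a plain dict stands in here for dict(f.attrs).

```python
import math


def attrs_match(attrs, gravity, redshift, imodel, ibox, tol=1e-6):
    """Return True if the file-level attributes describe the requested realisation."""
    return (
        attrs["gravity"] == gravity
        # Compare redshift with a tolerance rather than exact float equality.
        and math.isclose(float(attrs["redshift"]), redshift, abs_tol=tol)
        and int(attrs["imodel"]) == imodel
        and int(attrs["ibox"]) == ibox
    )


# Values copied from the example output above (source_dat omitted).
attrs = {
    "box_size": 1024.0, "gravity": "LCDM", "ibox": 1, "imodel": 1,
    "redshift": 0.25, "snapnum": 137,
}

print(attrs_match(attrs, "LCDM", 0.25, imodel=1, ibox=1))  # True
print(attrs_match(attrs, "fRn1", 0.25, imodel=1, ibox=1))  # False
```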

Auto-reformat from .DAT

If halo.hdf5 is missing, run_single builds it from the upstream .DAT file on demand:

from halocat.pipeline import run_single

status = run_single(
    "LCDM", 0.25, imodel=1, ibox=1,
    do_halo=True, do_hmf=False, do_tpcf=False, do_vel=False,
)
assert status["ok"]

The same path is taken transparently by HMFLoader.get, XiHHLoader.get, and XiHHLoader.measure_pair when they need halo data and the HDF5 mirror is not yet on disk.
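
The underlying pattern is "build the mirror if it is absent, then always read the mirror". The sketch below illustrates that pattern only; it is not the halocat implementation. The function load_with_mirror and its callbacks are hypothetical, and JSON is used as a stand-in for HDF5 so the example runs with the standard library alone.

```python
import json
import tempfile
from pathlib import Path


def load_with_mirror(mirror_path, parse_source, write_mirror, read_mirror):
    """Return data from the mirror, building it from the source on first use."""
    mirror_path = Path(mirror_path)
    if not mirror_path.exists():
        # The expensive parse runs only when the mirror is missing.
        write_mirror(mirror_path, parse_source())
    return read_mirror(mirror_path)


parse_calls = []  # records how often the "slow" parser runs


def fake_parse():
    parse_calls.append(1)
    return {"Mtot": [1.0e13, 2.0e13]}  # made-up values


def write_json(path, data):
    path.write_text(json.dumps(data))


def read_json(path):
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    mirror = Path(tmp) / "halo.json"  # stand-in for halo.hdf5
    first = load_with_mirror(mirror, fake_parse, write_json, read_json)
    second = load_with_mirror(mirror, fake_parse, write_json, read_json)

print(len(parse_calls))  # 1
```

Reading back from the mirror even on the first call keeps both code paths returning data in the same form.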

Fiducial-cosmology realisations

Three fiducial cosmology runs are exposed through the same loaders by sentinel imodel values:

  • ("LCDM", imodel=0) — DESI_MGx100/GR (ΛCDM fiducial)
  • ("fRn1", imodel=0) — F5n1 (|f_R0| = 1e-5)
  • ("fRn1", imodel=-1) — F6n1 (|f_R0| = 1e-6)

Each has 100 boxes (ibox=1..100) and 27 snapshots. See Configuration → Fiducial cosmology runs for the full table of redshifts and source paths.

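from halocat.pipeline import run_single
# HMFLoader and XiHHLoader are assumed to be imported already from their halocat modules.

status = run_single("LCDM", 0.25, imodel=0, ibox=1)   # writes halo.hdf5 etc.
hmf    = HMFLoader().get("LCDM", 0.25, imodel=0, ibox=1)
xi     = XiHHLoader().get("fRn1", 0.25, imodel=-1, ibox=1)   # F6n1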
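
When scripting over many realisations it can help to keep the sentinel combinations in one place. The lookup below is a hypothetical convenience, not a halocat API; the keys and labels are taken from the list above.

```python
# (gravity, imodel) -> fiducial run label, as listed in this section.
FIDUCIAL_RUNS = {
    ("LCDM", 0): "DESI_MGx100/GR",
    ("fRn1", 0): "F5n1",
    ("fRn1", -1): "F6n1",
}


def fiducial_label(gravity, imodel):
    """Return the fiducial run label, or None for any other combination."""
    return FIDUCIAL_RUNS.get((gravity, imodel))


print(fiducial_label("fRn1", -1))  # F6n1
print(fiducial_label("LCDM", 1))   # None
```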

Worked example

scripts/example_load_halo.py is a minimal CLI walkthrough — it loads a realisation, prints attributes / column summary, demonstrates a mass cut, and constructs (N, 3) position/velocity arrays:

python3 scripts/example_load_halo.py --gravity LCDM \
    --redshift 0.25 --imodel 1 --ibox 1 --mass-cut 13.5