halocat

halocat is a small Python package for processing GLAM halo catalogues from the DEGRACE-pilot simulation suite. It turns the raw plain-text CatshortV.*.DAT catalogues into per-realisation HDF5 mirrors and computes three derived statistics on top of them:

  • the halo mass function (HMF),
  • the halo–halo two-point correlation function xi_hh, and
  • pairwise velocity moments via the pairvel / PairVel.jl wrapper.

Every measurement is load-or-measure: existing outputs are read straight from disk, and missing ones are produced on demand by the pipeline.
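The load-or-measure idea can be illustrated with a minimal sketch. It does not show halocat's actual implementation: the helper name `load_or_measure`, the JSON cache format and the `fake_measurement` stand-in are all assumptions made for illustration (the package itself stores HDF5 outputs).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_measure(path: Path, measure: Callable[[], dict]) -> dict:
    """Return the cached result at `path`, measuring and caching it if absent.

    Hypothetical helper: not part of the halocat API.
    """
    if path.exists():
        # Existing output: read it straight from disk.
        return json.loads(path.read_text())
    # Missing output: produce it on demand, then cache it for the next call.
    result = measure()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


calls = []  # records how often the expensive measurement actually runs


def fake_measurement() -> dict:
    """Stand-in for an expensive measurement (made-up numbers)."""
    calls.append(1)
    return {"log10M": [13.0, 13.5], "dndlog10M": [1e-4, 3e-5]}


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "LCDM" / "hmf.json"
    first = load_or_measure(out, fake_measurement)   # measures and writes
    second = load_or_measure(out, fake_measurement)  # reads the cached file
```

The second call never reaches `fake_measurement`, which is the property that makes repeated loader calls cheap once an output exists.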

What it gives you

  • halocat.io.read_halo_hdf5 — read a reformatted halo catalogue as a dict[str, np.ndarray] keyed by halocat.config.CATALOGUE_COLUMNS.
  • halocat.HMFLoader / halocat.XiHHLoader — load-or-measure data loaders returning typed records (HMFRecord, XiHHRecord) and supporting sub-grid stacking via get_grid(...).
  • halocat.XiHHLoader.measure_pair measures xi_hh(r | bin1, bin2) for an arbitrary pair of finite-width log-mass bins, returned as an in-memory XiHHPairRecord (never written to disk).
  • halocat.pipeline.run_single / run_all — orchestrate the full pipeline for one realisation or the full sub-grid.
  • A halocat console-script and scripts/ drivers for batch use on COSMA.
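To make "finite-width log-mass bins" concrete, here is a rough sketch of selecting haloes in two such bins from a catalogue held as a dict of arrays. It runs on synthetic data and does not call halocat. The column name "mass", the helper `select_log_mass_bin` and the half-open interval convention are assumptions; the real column names are those listed in halocat.config.CATALOGUE_COLUMNS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a catalogue dict; "mass" is a hypothetical column name.
catalogue = {"mass": 10.0 ** rng.uniform(12.5, 14.5, size=1000)}


def select_log_mass_bin(mass: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Boolean mask for haloes with lo <= log10(mass) < hi (assumed convention)."""
    log10m = np.log10(mass)
    return (log10m >= lo) & (log10m < hi)


# The same two bins as in the measure_pair call in the example on this page.
bin1 = select_log_mass_bin(catalogue["mass"], 13.0, 13.3)
bin2 = select_log_mass_bin(catalogue["mass"], 13.7, 14.0)

print(bin1.sum(), bin2.sum())  # number of haloes in each bin
```

Because the two intervals do not overlap, no halo belongs to both masks, so a pair measurement between them is a pure cross-correlation.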

At a glance

from halocat import config as C, HMFLoader, XiHHLoader
from halocat.io import read_halo_hdf5

# 1. Load a halo catalogue
halo_path = f"{C.get_output_dir('LCDM', 0.25, 1, 1)}/halo.hdf5"
data = read_halo_hdf5(halo_path)

# 2. Load (or measure) the HMF and the static-grid xi_hh
hmf = HMFLoader().get("LCDM", 0.25, imodel=1, ibox=1)
xi  = XiHHLoader().get("LCDM", 0.25, imodel=1, ibox=1)

# 3. Custom mass-bin pair, in-memory only
rec = XiHHLoader().measure_pair(
    "LCDM", 0.25, 1, 1,
    log10M1=(13.0, 13.3), log10M2=(13.7, 14.0),
)
print(rec.r.shape, rec.xi.shape)

Where to next

  • Installation — set up the cosemu environment and install halocat in editable mode.
  • Quick start — walk through one full realisation end to end.
  • User guide — task-oriented pages for each data product.
  • API reference — auto-generated from the source docstrings.

Convention reminders

r centres for xi_hh

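Bin centres are always the arithmetic mean of adjacent r_edges, so they depend only on the binning and are identical for every realisation. The package never uses pycorr.sepavg for this: it returns NaN for empty bins and its values vary between realisations, either of which would break sub-grid stacking.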
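The convention can be sketched with NumPy as follows. The edge values and the two sepavg-like arrays are made-up numbers, not output from pycorr or halocat. They only illustrate why a shared, edge-derived r axis stacks cleanly while a per-realisation average does not.

```python
import numpy as np

# Hypothetical separation bin edges; the real edges come from the package configuration.
r_edges = np.array([1.0, 2.0, 4.0, 8.0])

# Arithmetic mean of adjacent edges: depends only on r_edges.
r_centres = 0.5 * (r_edges[:-1] + r_edges[1:])

# Made-up pair-averaged separations for two realisations; the middle bin of
# realisation B is empty, so its average is NaN.
sepavg_a = np.array([1.6, 3.1, 6.2])
sepavg_b = np.array([1.5, np.nan, 6.3])

# Averaging the per-realisation values propagates the NaN into the stack.
stacked_sepavg = np.mean([sepavg_a, sepavg_b], axis=0)

print(r_centres)       # finite and identical for every realisation
print(stacked_sepavg)  # contains NaN in the middle bin
```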

Source .DAT files are read-only

The pipeline never modifies CatshortV.*.DAT. The reformatter writes a fresh HDF5 mirror under halocat.config.OUTPUT_ROOT and downstream stages only read from there.