`halocat.dataloader`¶

dataloader ¶

Lazy load-or-measure dataloaders for halo statistics.

Provides: - HMFLoader → HMFRecord for hmf.hdf5 - XiHHLoader → XiHHRecord for xi_hh.hdf5

In each case loader.get(gravity, redshift, imodel, ibox) reads the existing file if present, otherwise runs the corresponding stage of halocat.pipeline.run_single to measure it on demand.

Both classes inherit the shared load-or-measure plumbing (constructor, path / exists / get / measure) from :class:_BaseHaloStatLoader. The per-statistic get_grid stays in each subclass because the stacked-array shape varies.

HMFRecord `dataclass` ¶

HMFRecord(log10M_bin_edges: ndarray, log10M_bin_left: ndarray, log10M_bin_centre: ndarray, counts: ndarray, dndlog10M: ndarray, n_gt_M: ndarray, attrs: dict)

In-memory view of a single hmf.hdf5 file.

log10M_bin_edges `instance-attribute` ¶

log10M_bin_edges: ndarray

log10M_bin_left `instance-attribute` ¶

log10M_bin_left: ndarray

log10M_bin_centre `instance-attribute` ¶

log10M_bin_centre: ndarray

counts `instance-attribute` ¶

counts: ndarray

dndlog10M `instance-attribute` ¶

dndlog10M: ndarray

n_gt_M `instance-attribute` ¶

n_gt_M: ndarray

attrs `instance-attribute` ¶

attrs: dict

gravity `property` ¶

gravity: str

redshift `property` ¶

redshift: float

imodel `property` ¶

imodel: int

ibox `property` ¶

ibox: int

box_size `property` ¶

box_size: float

HMFLoader ¶

HMFLoader(overwrite: bool = False, write_halo_hdf5: bool = False, logger=None)

Bases: _BaseHaloStatLoader

Load-or-measure dataloader for the halo mass function.

Parameters:

Name	Type	Description	Default
`overwrite`	`bool`	If True, always re-measure even when `hmf.hdf5` already exists.	`False`
`write_halo_hdf5`	`bool`	Force-rewrite `halo.hdf5` (re-run the DAT→HDF5 reformat) when measuring with `overwrite=True`. Default False. The pipeline always reads halo data from `halo.hdf5` and auto-reformats it from the `.DAT` source if the HDF5 file is missing, so this flag only controls forced re-reformatting, not whether the file exists.	`False`
`logger`	`Logger or None`	Optional logger. A default one is created if omitted.	`None`

Examples:

>>> loader = HMFLoader()
>>> rec = loader.get("LCDM", 0.25, imodel=1, ibox=1)
>>> rec.dndlog10M[:3]
array([...])

>>> grid = loader.get_grid(
...     gravities=["LCDM"], redshifts=[0.25, 0.50],
...     imodels=[1, 2], iboxes=[1],
... )
>>> grid["dndlog10M"].shape
(1, 2, 2, 1, 30)

Source code in halocat/dataloader.py

def __init__(self, overwrite: bool = False, write_halo_hdf5: bool = False,
             logger=None):
    self.overwrite = overwrite
    self.write_halo_hdf5 = write_halo_hdf5
    self.log = logger or get_logger("halocat.dataloader")

output_filename `class-attribute` `instance-attribute` ¶

output_filename = 'hmf.hdf5'

overwrite `instance-attribute` ¶

overwrite = overwrite

write_halo_hdf5 `instance-attribute` ¶

write_halo_hdf5 = write_halo_hdf5

log `instance-attribute` ¶

log = logger or get_logger('halocat.dataloader')

get_grid ¶

get_grid(gravities: list[str] | None = None, redshifts: list[float] | None = None, imodels: list[int] | None = None, iboxes: list[int] | None = None, skip_missing: bool = False) -> dict

Load (or measure) HMFs across a sub-grid and stack them.

Returns a dict with axis labels, the shared mass binning (edges, left edges, and centres), and arrays of shape (G, Z, M, B, K) for dndlog10M / counts / n_gt_M, plus a bool present mask of shape (G, Z, M, B).

With the default skip_missing=False and overwrite=False, any missing realisation is measured on demand. Set skip_missing=True to leave gaps as NaN instead of triggering measurement.

Source code in halocat/dataloader.py

def get_grid(self,
             gravities: list[str] | None = None,
             redshifts: list[float] | None = None,
             imodels: list[int] | None = None,
             iboxes: list[int] | None = None,
             skip_missing: bool = False) -> dict:
    """Load (or measure) HMFs across a sub-grid and stack them.

    Returns a dict with axis labels, the shared mass binning (edges,
    left edges, and centres), and arrays of shape ``(G, Z, M, B, K)``
    for ``dndlog10M`` / ``counts`` / ``n_gt_M``, plus a bool ``present``
    mask of shape ``(G, Z, M, B)``.

    With the default ``skip_missing=False`` and ``overwrite=False``, any
    missing realisation is measured on demand. Set ``skip_missing=True``
    to leave gaps as NaN instead of triggering measurement.
    """
    gravities = list(gravities) if gravities else list(C.GRAVITIES)
    redshifts = list(redshifts) if redshifts else list(C.REDSHIFTS)
    imodels = list(imodels) if imodels else list(C.IMODELS)
    iboxes = list(iboxes) if iboxes else list(C.IBOXES)

    G, Z, M, B = len(gravities), len(redshifts), len(imodels), len(iboxes)

    centre = edges = left = None
    dndlog10M = counts = n_gt_M = None
    present = np.zeros((G, Z, M, B), dtype=bool)

    for ig, gravity in enumerate(gravities):
        for iz, z in enumerate(redshifts):
            for im, imodel in enumerate(imodels):
                for ib, ibox in enumerate(iboxes):
                    path = self.path(gravity, z, imodel, ibox)
                    if skip_missing and not os.path.exists(path):
                        continue
                    rec = self.get(gravity, z, imodel, ibox)

                    if centre is None:
                        centre = rec.log10M_bin_centre
                        edges = rec.log10M_bin_edges
                        left = rec.log10M_bin_left
                        K = centre.size
                        dndlog10M = np.full((G, Z, M, B, K), np.nan)
                        counts = np.full((G, Z, M, B, K), np.nan)
                        n_gt_M = np.full((G, Z, M, B, K), np.nan)
                    elif rec.log10M_bin_centre.shape != centre.shape \
                            or not np.allclose(rec.log10M_bin_centre, centre):
                        raise ValueError(
                            f"inconsistent HMF binning in {path}"
                        )

                    dndlog10M[ig, iz, im, ib] = rec.dndlog10M
                    counts[ig, iz, im, ib] = rec.counts
                    n_gt_M[ig, iz, im, ib] = rec.n_gt_M
                    present[ig, iz, im, ib] = True

    if centre is None:
        raise FileNotFoundError(
            "no HMF files found in the requested sub-grid"
        )

    return {
        "gravities": np.array(gravities),
        "redshifts": np.array(redshifts),
        "imodels": np.array(imodels),
        "iboxes": np.array(iboxes),
        "log10M_bin_edges": edges,
        "log10M_bin_left": left,
        "log10M_bin_centre": centre,
        "dndlog10M": dndlog10M,
        "counts": counts,
        "n_gt_M": n_gt_M,
        "present": present,
    }

path `classmethod` ¶

path(gravity: str, redshift: float, imodel: int, ibox: int) -> str

Source code in halocat/dataloader.py

@classmethod
def path(cls, gravity: str, redshift: float, imodel: int, ibox: int) -> str:
    return os.path.join(
        C.get_output_dir(gravity, redshift, imodel, ibox),
        cls.output_filename,
    )

exists `classmethod` ¶

exists(gravity: str, redshift: float, imodel: int, ibox: int) -> bool

Source code in halocat/dataloader.py

@classmethod
def exists(cls, gravity: str, redshift: float,
           imodel: int, ibox: int) -> bool:
    return os.path.exists(cls.path(gravity, redshift, imodel, ibox))

measure ¶

measure(gravity: str, redshift: float, imodel: int, ibox: int)

Force measurement and return the record.

Source code in halocat/dataloader.py

def measure(self, gravity: str, redshift: float,
            imodel: int, ibox: int):
    """Force measurement and return the record."""
    kw = dict(do_halo=self.write_halo_hdf5,
              do_hmf=False, do_tpcf=False, do_vel=False)
    kw.update(self._RUN_KWARGS)
    status = run_single(
        gravity, redshift, imodel, ibox,
        overwrite=True, logger=self.log, **kw,
    )
    if not status["ok"]:
        raise RuntimeError(
            f"{self._STAT_NAME} measurement failed for {gravity} "
            f"z={redshift:.2f} m{imodel} b{ibox}: {status['error']}"
        )
    return self._read(self.path(gravity, redshift, imodel, ibox))

get ¶

get(gravity: str, redshift: float, imodel: int, ibox: int)

Return the record, measuring on demand if missing.

Source code in halocat/dataloader.py

def get(self, gravity: str, redshift: float,
        imodel: int, ibox: int):
    """Return the record, measuring on demand if missing."""
    path = self.path(gravity, redshift, imodel, ibox)
    if self.overwrite or not os.path.exists(path):
        self.log.info(
            f"{type(self).__name__}: measuring {gravity} "
            f"z={redshift:.2f} m{imodel} b{ibox}"
        )
        return self.measure(gravity, redshift, imodel, ibox)
    return self._read(path)

XiHHPairRecord `dataclass` ¶

XiHHPairRecord(r_edges: ndarray, r: ndarray, xi: ndarray, log10M1: ndarray, log10M2: ndarray, n1: int, n2: int, attrs: dict)

In-memory ξ_hh for one user-supplied mass-bin pair (never on disk).

Returned by :py:meth:XiHHLoader.measure_pair. Unlike :class:XiHHRecord, which collects all auto and cross combinations from the static grid stored in xi_hh.hdf5, this record holds one correlation for an arbitrary pair of finite-width mass bins (log10M1, log10M2) chosen at call time.

Attributes:

Name	Type	Description
`r_edges`	`ndarray`	Shape `(K + 1,)` separation bin edges (Mpc/h).
`r`	`ndarray`	Shape `(K,)` arithmetic centres of `r_edges` (never `pycorr.sepavg`).
`xi`	`ndarray`	Shape `(K,)` correlation values. `NaN` in bins with no pairs or with an empty selection on either side.
`log10M1, log10M2`	`ndarray`	Shape `(2,)` `[lo, hi]` log10(M / [Msun/h]) bin edges of the two halo samples that were correlated.
`n1, n2`	`int`	Halo counts selected in `log10M1` and `log10M2`.
`attrs`	`dict`	Realisation metadata: `"gravity"`, `"redshift"`, `"imodel"`, `"ibox"`, `"box_size"`.

Notes

gravity / redshift / imodel / ibox / box_size are also exposed as read-only properties for convenience. is_auto returns True iff log10M1 == log10M2.

r_edges `instance-attribute` ¶

r_edges: ndarray

r `instance-attribute` ¶

r: ndarray

xi `instance-attribute` ¶

xi: ndarray

log10M1 `instance-attribute` ¶

log10M1: ndarray

log10M2 `instance-attribute` ¶

log10M2: ndarray

n1 `instance-attribute` ¶

n1: int

n2 `instance-attribute` ¶

n2: int

attrs `instance-attribute` ¶

attrs: dict

gravity `property` ¶

gravity: str

redshift `property` ¶

redshift: float

imodel `property` ¶

imodel: int

ibox `property` ¶

ibox: int

box_size `property` ¶

box_size: float

is_auto `property` ¶

is_auto: bool

XiHHRecord `dataclass` ¶

XiHHRecord(r_edges: ndarray, mass_bins: ndarray, pairs: list, pair_indices: ndarray, r: ndarray, xi: ndarray, n1: ndarray, n2: ndarray, log10M1: ndarray, log10M2: ndarray, attrs: dict)

In-memory view of a single xi_hh.hdf5 file.

Attributes:

Name	Type	Description
`r_edges`	`(K+1,) separation bin edges (Mpc/h).`
`mass_bins`	`(P, 2) array of (log10M_low, log10M_high) for each mass bin`	the file was measured for.
`pairs`	`list of group names, e.g. ['M0_M0', 'M0_M1', ...].`
`pair_indices`	`(Q, 2) int array of the (i, j) mass-bin indices for each`	pair group, with i ≤ j (auto + cross).
`r`	`(Q, K) per-pair separation centres (or sepavg from pycorr).`
`xi`	`(Q, K) per-pair correlation function.`
`n1, n2`	`(Q,) int counts of haloes used in each side of each pair.`
`log10M1, log10M2`	`(Q, 2) per-pair mass-bin edges.`
`attrs`	`top-level HDF5 attributes.`

r_edges `instance-attribute` ¶

r_edges: ndarray

mass_bins `instance-attribute` ¶

mass_bins: ndarray

pairs `instance-attribute` ¶

pairs: list

pair_indices `instance-attribute` ¶

pair_indices: ndarray

r `instance-attribute` ¶

r: ndarray

xi `instance-attribute` ¶

xi: ndarray

n1 `instance-attribute` ¶

n1: ndarray

n2 `instance-attribute` ¶

n2: ndarray

log10M1 `instance-attribute` ¶

log10M1: ndarray

log10M2 `instance-attribute` ¶

log10M2: ndarray

attrs `instance-attribute` ¶

attrs: dict

gravity `property` ¶

gravity: str

redshift `property` ¶

redshift: float

imodel `property` ¶

imodel: int

ibox `property` ¶

ibox: int

box_size `property` ¶

box_size: float

XiHHLoader ¶

XiHHLoader(overwrite: bool = False, write_halo_hdf5: bool = False, logger=None)

Bases: _BaseHaloStatLoader

Load-or-measure dataloader for the halo–halo 2PCF.

Parameters:

Name	Type	Description	Default
`overwrite`	`bool`	If True, always re-measure even when `xi_hh.hdf5` already exists.	`False`
`write_halo_hdf5`	`bool`	Force-rewrite `halo.hdf5` (re-run the DAT→HDF5 reformat) when measuring with `overwrite=True`. Halo data are always read from `halo.hdf5`; if the file is missing the pipeline auto-reformats it from the `.DAT` source, so this flag only affects forced re-reformatting.	`False`
`logger`	`Logger or None`	Optional logger.	`None`

Examples:

>>> loader = XiHHLoader()
>>> rec = loader.get("LCDM", 0.25, imodel=1, ibox=1)
>>> rec.xi.shape           # (n_pairs, n_r_bins)
(10, 60)
>>> rec.pairs[:3]
['M0_M0', 'M0_M1', 'M0_M2']

Source code in halocat/dataloader.py

def __init__(self, overwrite: bool = False, write_halo_hdf5: bool = False,
             logger=None):
    self.overwrite = overwrite
    self.write_halo_hdf5 = write_halo_hdf5
    self.log = logger or get_logger("halocat.dataloader")

output_filename `class-attribute` `instance-attribute` ¶

output_filename = 'xi_hh.hdf5'

overwrite `instance-attribute` ¶

overwrite = overwrite

write_halo_hdf5 `instance-attribute` ¶

write_halo_hdf5 = write_halo_hdf5

log `instance-attribute` ¶

log = logger or get_logger('halocat.dataloader')

measure_pair ¶

measure_pair(gravity: str, redshift: float, imodel: int, ibox: int, log10M1: tuple[float, float], log10M2: tuple[float, float] | None = None, *, r_edges: ndarray | None = None) -> XiHHPairRecord

Measure ξ_hh(r) for one custom mass-bin pair, in-memory only.

Unlike :py:meth:get and :py:meth:measure, this method never writes to disk and never reads xi_hh.hdf5. It loads the halo catalogue, applies the requested (log10M1, log10M2) selection, and runs pycorr for that single pair only. The halo catalogue is read from halo.hdf5; if the HDF5 mirror is missing, the source CatshortV.*.DAT is reformatted on demand (the .DAT itself is never modified).

Parameters:

Name	Type	Description	Default
`gravity`	`str`	One of :data:`halocat.config.GRAVITIES`.	required
`redshift`	`float`	Must be a key of :data:`halocat.config.Z_SNAP_MAP`.	required
`imodel`	`int`	Cosmological model index (1..64).	required
`ibox`	`int`	Realisation index (1..5).	required
`log10M1`	`tuple of (float, float)`	`(lo, hi)` log10(M / [Msun/h]) bin edges; haloes with `lo <= log10M < hi` are selected for the first sample.	required
`log10M2`	`tuple of (float, float) or None`	Edges for the second sample. `None` (the default) requests the auto-correlation of `log10M1`.	`None`
`r_edges`	`ndarray`	Separation bin edges (Mpc/h). Defaults to :data:`halocat.config.R_EDGES` (which may have been overridden via `apply_xi_hh_config` from the YAML config).	`None`

Returns:

Name	Type	Description
`record`	`XiHHPairRecord`	In-memory record carrying `r`, `xi`, the mass-bin metadata, and halo counts. `r` is the arithmetic mean of `r_edges`; empty bins are flagged with `xi = NaN`.

Raises:

Type	Description
`RuntimeError`	If `halo.hdf5` is missing and the on-demand reformat from the source `CatshortV.*.DAT` fails.

Examples:

Auto-correlation of a single finite-width bin:

>>> loader = XiHHLoader()
>>> rec = loader.measure_pair(
...     "LCDM", 0.25, imodel=1, ibox=1,
...     log10M1=(13.0, 13.5),
... )
>>> rec.is_auto, rec.n1, rec.xi.shape
(True, 87587, (60,))

Cross-correlation between two non-overlapping bins, with custom r_edges:

>>> import numpy as np
>>> rec_x = loader.measure_pair(
...     "LCDM", 0.25, 1, 1,
...     log10M1=(13.0, 13.3),
...     log10M2=(13.7, 14.0),
...     r_edges=np.logspace(-1, 2, 31),
... )
>>> rec_x.is_auto
False

Source code in halocat/dataloader.py

def measure_pair(self,
                 gravity: str, redshift: float, imodel: int, ibox: int,
                 log10M1: tuple[float, float],
                 log10M2: tuple[float, float] | None = None,
                 *,
                 r_edges: np.ndarray | None = None) -> XiHHPairRecord:
    """Measure ξ_hh(r) for one custom mass-bin pair, in-memory only.

    Unlike :py:meth:`get` and :py:meth:`measure`, this method never
    writes to disk and never reads ``xi_hh.hdf5``. It loads the halo
    catalogue, applies the requested ``(log10M1, log10M2)``
    selection, and runs ``pycorr`` for that single pair only. The
    halo catalogue is read from ``halo.hdf5``; if the HDF5 mirror is
    missing, the source ``CatshortV.*.DAT`` is reformatted on demand
    (the ``.DAT`` itself is never modified).

    Parameters
    ----------
    gravity : str
        One of :data:`halocat.config.GRAVITIES`.
    redshift : float
        Must be a key of :data:`halocat.config.Z_SNAP_MAP`.
    imodel : int
        Cosmological model index (1..64).
    ibox : int
        Realisation index (1..5).
    log10M1 : tuple of (float, float)
        ``(lo, hi)`` log10(M / [Msun/h]) bin edges; haloes with
        ``lo <= log10M < hi`` are selected for the first sample.
    log10M2 : tuple of (float, float) or None, optional
        Edges for the second sample. ``None`` (the default) requests
        the auto-correlation of ``log10M1``.
    r_edges : numpy.ndarray, optional
        Separation bin edges (Mpc/h). Defaults to
        :data:`halocat.config.R_EDGES` (which may have been
        overridden via ``apply_xi_hh_config`` from the YAML config).

    Returns
    -------
    record : XiHHPairRecord
        In-memory record carrying ``r``, ``xi``, the mass-bin
        metadata, and halo counts. ``r`` is the arithmetic mean of
        ``r_edges``; empty bins are flagged with ``xi = NaN``.

    Raises
    ------
    RuntimeError
        If ``halo.hdf5`` is missing and the on-demand reformat from
        the source ``CatshortV.*.DAT`` fails.

    Examples
    --------
    Auto-correlation of a single finite-width bin:

    >>> loader = XiHHLoader()
    >>> rec = loader.measure_pair(
    ...     "LCDM", 0.25, imodel=1, ibox=1,
    ...     log10M1=(13.0, 13.5),
    ... )
    >>> rec.is_auto, rec.n1, rec.xi.shape
    (True, 87587, (60,))

    Cross-correlation between two non-overlapping bins, with custom
    ``r_edges``:

    >>> import numpy as np
    >>> rec_x = loader.measure_pair(
    ...     "LCDM", 0.25, 1, 1,
    ...     log10M1=(13.0, 13.3),
    ...     log10M2=(13.7, 14.0),
    ...     r_edges=np.logspace(-1, 2, 31),
    ... )
    >>> rec_x.is_auto
    False
    """
    from .io import read_halo_hdf5
    from .tpcf import _compute_xi_one

    log10M1 = tuple(float(x) for x in log10M1)
    if log10M2 is None:
        log10M2 = log10M1
    else:
        log10M2 = tuple(float(x) for x in log10M2)

    halo_path = os.path.join(
        C.get_output_dir(gravity, redshift, imodel, ibox), "halo.hdf5"
    )
    if not os.path.exists(halo_path):
        status = run_single(
            gravity, redshift, imodel, ibox,
            overwrite=False, logger=self.log,
            do_halo=True, do_hmf=False, do_tpcf=False, do_vel=False,
        )
        if not status["ok"]:
            raise RuntimeError(
                f"halo.hdf5 reformat failed for {gravity} "
                f"z={redshift:.2f} m{imodel} b{ibox}: {status['error']}"
            )

    data = read_halo_hdf5(halo_path)
    log10m = np.log10(data[C.MASS_COLUMN])

    sel1 = (log10m >= log10M1[0]) & (log10m < log10M1[1])
    pos1 = np.array([data["x"][sel1], data["y"][sel1], data["z"][sel1]])

    is_auto = log10M1 == log10M2
    if is_auto:
        pos2 = None
        n2 = int(pos1.shape[1])
    else:
        sel2 = (log10m >= log10M2[0]) & (log10m < log10M2[1])
        pos2 = np.array([data["x"][sel2], data["y"][sel2], data["z"][sel2]])
        n2 = int(pos2.shape[1])

    r_edges_arr = (np.asarray(r_edges, dtype=np.float64)
                   if r_edges is not None
                   else np.asarray(C.R_EDGES, dtype=np.float64))
    r_centres = 0.5 * (r_edges_arr[:-1] + r_edges_arr[1:])

    self.log.info(
        f"XiHHLoader.measure_pair: {gravity} z={redshift:.2f} "
        f"m{imodel} b{ibox}  log10M1={log10M1} log10M2={log10M2}  "
        f"n1={int(pos1.shape[1])} n2={n2} ({'auto' if is_auto else 'cross'})"
    )

    xi = _compute_xi_one(pos1, pos2, r_edges_arr, float(C.BOX_SIZE))

    return XiHHPairRecord(
        r_edges=r_edges_arr,
        r=r_centres,
        xi=xi,
        log10M1=np.asarray(log10M1, dtype=np.float64),
        log10M2=np.asarray(log10M2, dtype=np.float64),
        n1=int(pos1.shape[1]),
        n2=n2,
        attrs={
            "gravity": gravity,
            "redshift": float(redshift),
            "imodel": int(imodel),
            "ibox": int(ibox),
            "box_size": float(C.BOX_SIZE),
        },
    )

get_grid ¶

get_grid(gravities: list[str] | None = None, redshifts: list[float] | None = None, imodels: list[int] | None = None, iboxes: list[int] | None = None, skip_missing: bool = False) -> dict

Load (or measure) xi_hh across a sub-grid and stack.

Returns a dict with axis labels, the shared r_edges / mass_bins / pairs metadata, and arrays of shape (G, Z, M, B, P, K) for r / xi, plus n1 / n2 of shape (G, Z, M, B, P) and a present mask of shape (G, Z, M, B).

Source code in halocat/dataloader.py

def get_grid(self,
             gravities: list[str] | None = None,
             redshifts: list[float] | None = None,
             imodels: list[int] | None = None,
             iboxes: list[int] | None = None,
             skip_missing: bool = False) -> dict:
    """Load (or measure) xi_hh across a sub-grid and stack.

    Returns a dict with axis labels, the shared ``r_edges`` / ``mass_bins``
    / ``pairs`` metadata, and arrays of shape ``(G, Z, M, B, P, K)`` for
    ``r`` / ``xi``, plus ``n1`` / ``n2`` of shape ``(G, Z, M, B, P)`` and
    a ``present`` mask of shape ``(G, Z, M, B)``.
    """
    gravities = list(gravities) if gravities else list(C.GRAVITIES)
    redshifts = list(redshifts) if redshifts else list(C.REDSHIFTS)
    imodels = list(imodels) if imodels else list(C.IMODELS)
    iboxes = list(iboxes) if iboxes else list(C.IBOXES)

    G, Z, M, B = len(gravities), len(redshifts), len(imodels), len(iboxes)

    r_edges = mass_bins = pairs = pair_indices = None
    r_arr = xi_arr = None
    n1_arr = n2_arr = None
    present = np.zeros((G, Z, M, B), dtype=bool)

    for ig, gravity in enumerate(gravities):
        for iz, z in enumerate(redshifts):
            for im, imodel in enumerate(imodels):
                for ib, ibox in enumerate(iboxes):
                    path = self.path(gravity, z, imodel, ibox)
                    if skip_missing and not os.path.exists(path):
                        continue
                    rec = self.get(gravity, z, imodel, ibox)

                    if pairs is None:
                        r_edges = rec.r_edges
                        mass_bins = rec.mass_bins
                        pairs = list(rec.pairs)
                        pair_indices = rec.pair_indices
                        P = len(pairs)
                        K = rec.r.shape[1]
                        r_arr = np.full((G, Z, M, B, P, K), np.nan)
                        xi_arr = np.full((G, Z, M, B, P, K), np.nan)
                        n1_arr = np.full((G, Z, M, B, P), -1, dtype=int)
                        n2_arr = np.full((G, Z, M, B, P), -1, dtype=int)
                    elif list(rec.pairs) != pairs or rec.r.shape[1] != K:
                        raise ValueError(
                            f"inconsistent xi_hh schema in {path}"
                        )

                    r_arr[ig, iz, im, ib] = rec.r
                    xi_arr[ig, iz, im, ib] = rec.xi
                    n1_arr[ig, iz, im, ib] = rec.n1
                    n2_arr[ig, iz, im, ib] = rec.n2
                    present[ig, iz, im, ib] = True

    if pairs is None:
        raise FileNotFoundError(
            "no xi_hh files found in the requested sub-grid"
        )

    return {
        "gravities": np.array(gravities),
        "redshifts": np.array(redshifts),
        "imodels": np.array(imodels),
        "iboxes": np.array(iboxes),
        "r_edges": r_edges,
        "mass_bins": mass_bins,
        "pairs": np.array(pairs),
        "pair_indices": pair_indices,
        "r": r_arr,
        "xi": xi_arr,
        "n1": n1_arr,
        "n2": n2_arr,
        "present": present,
    }

path `classmethod` ¶

path(gravity: str, redshift: float, imodel: int, ibox: int) -> str

Source code in halocat/dataloader.py

@classmethod
def path(cls, gravity: str, redshift: float, imodel: int, ibox: int) -> str:
    return os.path.join(
        C.get_output_dir(gravity, redshift, imodel, ibox),
        cls.output_filename,
    )

exists `classmethod` ¶

exists(gravity: str, redshift: float, imodel: int, ibox: int) -> bool

Source code in halocat/dataloader.py

@classmethod
def exists(cls, gravity: str, redshift: float,
           imodel: int, ibox: int) -> bool:
    return os.path.exists(cls.path(gravity, redshift, imodel, ibox))

measure ¶

measure(gravity: str, redshift: float, imodel: int, ibox: int)

Force measurement and return the record.

Source code in halocat/dataloader.py

def measure(self, gravity: str, redshift: float,
            imodel: int, ibox: int):
    """Force measurement and return the record."""
    kw = dict(do_halo=self.write_halo_hdf5,
              do_hmf=False, do_tpcf=False, do_vel=False)
    kw.update(self._RUN_KWARGS)
    status = run_single(
        gravity, redshift, imodel, ibox,
        overwrite=True, logger=self.log, **kw,
    )
    if not status["ok"]:
        raise RuntimeError(
            f"{self._STAT_NAME} measurement failed for {gravity} "
            f"z={redshift:.2f} m{imodel} b{ibox}: {status['error']}"
        )
    return self._read(self.path(gravity, redshift, imodel, ibox))

get ¶

get(gravity: str, redshift: float, imodel: int, ibox: int)

Return the record, measuring on demand if missing.

Source code in halocat/dataloader.py

def get(self, gravity: str, redshift: float,
        imodel: int, ibox: int):
    """Return the record, measuring on demand if missing."""
    path = self.path(gravity, redshift, imodel, ibox)
    if self.overwrite or not os.path.exists(path):
        self.log.info(
            f"{type(self).__name__}: measuring {gravity} "
            f"z={redshift:.2f} m{imodel} b{ibox}"
        )
        return self.measure(gravity, redshift, imodel, ibox)
    return self._read(path)

halocat.dataloader¶

dataloader ¶

HMFRecord dataclass ¶

log10M_bin_edges instance-attribute ¶

log10M_bin_left instance-attribute ¶

log10M_bin_centre instance-attribute ¶

counts instance-attribute ¶

dndlog10M instance-attribute ¶

n_gt_M instance-attribute ¶

attrs instance-attribute ¶

gravity property ¶

redshift property ¶

imodel property ¶

ibox property ¶

box_size property ¶

HMFLoader ¶

output_filename class-attribute instance-attribute ¶

overwrite instance-attribute ¶

write_halo_hdf5 instance-attribute ¶

log instance-attribute ¶

get_grid ¶

path classmethod ¶

exists classmethod ¶

measure ¶

get ¶

XiHHPairRecord dataclass ¶

r_edges instance-attribute ¶

r instance-attribute ¶

xi instance-attribute ¶

log10M1 instance-attribute ¶

log10M2 instance-attribute ¶

n1 instance-attribute ¶

n2 instance-attribute ¶

attrs instance-attribute ¶

gravity property ¶

redshift property ¶

imodel property ¶

ibox property ¶

box_size property ¶

is_auto property ¶

XiHHRecord dataclass ¶

r_edges instance-attribute ¶

mass_bins instance-attribute ¶

pairs instance-attribute ¶

pair_indices instance-attribute ¶

r instance-attribute ¶

xi instance-attribute ¶

n1 instance-attribute ¶

n2 instance-attribute ¶

log10M1 instance-attribute ¶

log10M2 instance-attribute ¶

attrs instance-attribute ¶

gravity property ¶

redshift property ¶

imodel property ¶

ibox property ¶

box_size property ¶

XiHHLoader ¶

output_filename class-attribute instance-attribute ¶

overwrite instance-attribute ¶

write_halo_hdf5 instance-attribute ¶

log instance-attribute ¶

measure_pair ¶

get_grid ¶

path classmethod ¶

exists classmethod ¶

measure ¶

get ¶

`halocat.dataloader`¶

HMFRecord `dataclass` ¶

log10M_bin_edges `instance-attribute` ¶

log10M_bin_left `instance-attribute` ¶

log10M_bin_centre `instance-attribute` ¶

counts `instance-attribute` ¶

dndlog10M `instance-attribute` ¶

n_gt_M `instance-attribute` ¶

attrs `instance-attribute` ¶

gravity `property` ¶

redshift `property` ¶

imodel `property` ¶

ibox `property` ¶

box_size `property` ¶

output_filename `class-attribute` `instance-attribute` ¶

overwrite `instance-attribute` ¶

write_halo_hdf5 `instance-attribute` ¶

log `instance-attribute` ¶

path `classmethod` ¶

exists `classmethod` ¶

XiHHPairRecord `dataclass` ¶

r_edges `instance-attribute` ¶

r `instance-attribute` ¶

xi `instance-attribute` ¶

log10M1 `instance-attribute` ¶

log10M2 `instance-attribute` ¶

n1 `instance-attribute` ¶

n2 `instance-attribute` ¶

attrs `instance-attribute` ¶

gravity `property` ¶

redshift `property` ¶

imodel `property` ¶

ibox `property` ¶

box_size `property` ¶

is_auto `property` ¶

XiHHRecord `dataclass` ¶

r_edges `instance-attribute` ¶

mass_bins `instance-attribute` ¶

pairs `instance-attribute` ¶

pair_indices `instance-attribute` ¶

r `instance-attribute` ¶

xi `instance-attribute` ¶

n1 `instance-attribute` ¶

n2 `instance-attribute` ¶

log10M1 `instance-attribute` ¶

log10M2 `instance-attribute` ¶

attrs `instance-attribute` ¶

gravity `property` ¶

redshift `property` ¶

imodel `property` ¶

ibox `property` ¶

box_size `property` ¶

output_filename `class-attribute` `instance-attribute` ¶

overwrite `instance-attribute` ¶

write_halo_hdf5 `instance-attribute` ¶

log `instance-attribute` ¶

path `classmethod` ¶

exists `classmethod` ¶