halocat.dataloader
Lazy load-or-measure dataloaders for halo statistics.
Provides:
- HMFLoader → HMFRecord for hmf.hdf5
- XiHHLoader → XiHHRecord for xi_hh.hdf5
In each case loader.get(gravity, redshift, imodel, ibox) reads the
existing file if present, otherwise runs the corresponding stage of
halocat.pipeline.run_single to measure it on demand.
Both classes inherit the shared load-or-measure plumbing (constructor,
path / exists / get / measure) from _BaseHaloStatLoader. The per-statistic
get_grid stays in each subclass because the stacked-array shape varies.
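The load-or-measure pattern described above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: the class name `LoadOrMeasureSketch`, the JSON file, the directory layout, and the placeholder measurement are stand-ins chosen for the example, and `path` / `exists` are written as plain methods here rather than classmethods. It is not the halocat implementation.

```python
import json
import tempfile
from pathlib import Path


class LoadOrMeasureSketch:
    """Toy stand-in for the shared plumbing; not the halocat implementation."""

    filename = "stat.json"  # hypothetical; the real loaders use hmf.hdf5 / xi_hh.hdf5

    def __init__(self, root, overwrite=False):
        self.root = Path(root)
        self.overwrite = overwrite
        self.n_measured = 0  # counts how often the expensive step ran

    def path(self, gravity, redshift, imodel, ibox):
        # Hypothetical directory layout, chosen only for this example.
        return self.root / gravity / f"z{redshift:.2f}" / f"m{imodel}_b{ibox}" / self.filename

    def exists(self, *key):
        return self.path(*key).is_file()

    def measure(self, *key):
        # Placeholder for the expensive pipeline stage.
        self.n_measured += 1
        record = {"key": list(key), "value": 42.0}
        target = self.path(*key)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(record))
        return record

    def get(self, *key):
        # Read the cached file unless it is missing or overwrite is set.
        if self.exists(*key) and not self.overwrite:
            return json.loads(self.path(*key).read_text())
        return self.measure(*key)


with tempfile.TemporaryDirectory() as tmp:
    loader = LoadOrMeasureSketch(tmp)
    first = loader.get("LCDM", 0.25, 1, 1)   # file missing: measured and written
    second = loader.get("LCDM", 0.25, 1, 1)  # file present: read back from disk
    runs = loader.n_measured
```

The second call returns the same record without re-running the measurement, which is the behaviour the loaders expose through get.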
HMFRecord dataclass
HMFRecord(log10M_bin_edges: ndarray, log10M_bin_left: ndarray, log10M_bin_centre: ndarray, counts: ndarray, dndlog10M: ndarray, n_gt_M: ndarray, attrs: dict)
In-memory view of a single hmf.hdf5 file.
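The field names suggest the usual mass-function quantities, but this page does not define them. The sketch below shows one conventional reading, under assumptions that are not confirmed here: dndlog10M is taken as counts per box volume per bin width, and n_gt_M as the cumulative number density above each left bin edge. The box size, edges, and counts are made-up values.

```python
# Assumed conventions (not confirmed by this page):
#   dndlog10M = counts / (box volume * bin width in log10 M)
#   n_gt_M[i] = number density of haloes above the left edge of bin i
box_size = 100.0                      # hypothetical box side length
edges = [12.0, 12.5, 13.0, 13.5]      # log10M_bin_edges, K + 1 values
counts = [800, 200, 50]               # haloes per bin, K values

volume = box_size ** 3
left = edges[:-1]
centre = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
width = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
dndlog10M = [c / (volume * w) for c, w in zip(counts, width)]

# Cumulative counts from the high-mass end, then reversed back to bin order.
n_gt_M = []
running = 0
for c in reversed(counts):
    running += c
    n_gt_M.append(running / volume)
n_gt_M.reverse()
```

Under these assumptions every per-bin array has K entries while the edge array has K + 1.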
HMFLoader
Bases: _BaseHaloStatLoader
Load-or-measure dataloader for the halo mass function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| overwrite | bool | If True, always re-measure even when | False |
| write_halo_hdf5 | bool | Force-rewrite | False |
| logger | Logger or None | Optional logger. A default one is created if omitted. | None |
Examples:
>>> loader = HMFLoader()
>>> rec = loader.get("LCDM", 0.25, imodel=1, ibox=1)
>>> rec.dndlog10M[:3]
array([...])
>>> grid = loader.get_grid(
...     gravities=["LCDM"], redshifts=[0.25, 0.50],
...     imodels=[1, 2], iboxes=[1],
... )
>>> grid["dndlog10M"].shape
(1, 2, 2, 1, 30)
Source code in halocat/dataloader.py
get_grid
get_grid(gravities: list[str] | None = None, redshifts: list[float] | None = None, imodels: list[int] | None = None, iboxes: list[int] | None = None, skip_missing: bool = False) -> dict
Load (or measure) HMFs across a sub-grid and stack them.
Returns a dict with axis labels, the shared mass binning (edges,
left edges, and centres), and arrays of shape (G, Z, M, B, K)
for dndlog10M / counts / n_gt_M, plus a bool present
mask of shape (G, Z, M, B).
With the default skip_missing=False and overwrite=False, any
missing realisation is measured on demand. Set skip_missing=True
to leave gaps as NaN instead of triggering measurement.
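The stacking and gap handling can be sketched without any dependencies. The example below uses nested lists in place of the arrays the loader returns, and the helper names `stack_grid` and `shape` as well as the three-bin records are hypothetical. A key missing from `records` stands in for a realisation that was skipped with skip_missing=True.

```python
import math


def stack_grid(records, gravities, redshifts, imodels, iboxes, n_bins):
    """Stack per-realisation vectors into nested lists of shape (G, Z, M, B, K).

    `records` maps (gravity, redshift, imodel, ibox) to a list of K floats;
    keys absent from it play the role of realisations skipped via skip_missing.
    """
    gap = [math.nan] * n_bins
    values = [[[[list(records.get((g, z, m, b), gap))
                 for b in iboxes]
                for m in imodels]
               for z in redshifts]
              for g in gravities]
    present = [[[[(g, z, m, b) in records
                  for b in iboxes]
                 for m in imodels]
                for z in redshifts]
               for g in gravities]
    return {"values": values, "present": present}


def shape(nested):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


records = {
    ("LCDM", 0.25, 1, 1): [1.0, 2.0, 3.0],
    ("LCDM", 0.50, 1, 1): [4.0, 5.0, 6.0],
    ("LCDM", 0.25, 2, 1): [7.0, 8.0, 9.0],
    # ("LCDM", 0.50, 2, 1) is deliberately missing
}
grid = stack_grid(records, ["LCDM"], [0.25, 0.50], [1, 2], [1], n_bins=3)
```

Here `shape(grid["values"])` is (1, 2, 2, 1, 3) and `shape(grid["present"])` is (1, 2, 2, 1); the missing realisation shows up as a row of NaN with a False entry in the mask, so downstream code can filter on the mask instead of testing for NaN.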
path classmethod
exists classmethod
measure
Force measurement and return the record.
get
Return the record, measuring on demand if missing.
XiHHPairRecord dataclass
XiHHPairRecord(r_edges: ndarray, r: ndarray, xi: ndarray, log10M1: ndarray, log10M2: ndarray, n1: int, n2: int, attrs: dict)
In-memory ξ_hh for one user-supplied mass-bin pair (never on disk).
Returned by XiHHLoader.measure_pair. Unlike XiHHRecord, which collects
all auto and cross combinations from the static grid stored in
xi_hh.hdf5, this record holds one correlation for an arbitrary pair of
finite-width mass bins (log10M1, log10M2) chosen at call time.
Attributes:
| Name | Type | Description |
|---|---|---|
| r_edges | ndarray | Shape |
| r | ndarray | Shape |
| xi | ndarray | Shape |
| log10M1, log10M2 | ndarray | Shape |
| n1, n2 | int | Halo counts selected in |
| attrs | dict | Realisation metadata: |
Notes
gravity / redshift / imodel / ibox / box_size
are also exposed as read-only properties for convenience.
is_auto returns True iff log10M1 == log10M2.
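A minimal sketch of how such convenience properties could sit on top of attrs is given below. `PairRecordSketch` is a hypothetical, trimmed-down class, the attrs keys are assumed to match the property names, and plain tuples replace the ndarray fields so that == yields a single boolean; with real arrays an element-wise comparison would have to be reduced first.

```python
from dataclasses import dataclass, field


@dataclass
class PairRecordSketch:
    """Hypothetical, trimmed-down analogue of XiHHPairRecord."""

    log10M1: tuple
    log10M2: tuple
    attrs: dict = field(default_factory=dict)

    @property
    def gravity(self):
        # Read-only view onto the metadata dict (no setter is defined).
        return self.attrs["gravity"]

    @property
    def redshift(self):
        return self.attrs["redshift"]

    @property
    def is_auto(self):
        # Auto-correlation iff both samples use the same mass bin.
        return tuple(self.log10M1) == tuple(self.log10M2)


meta = {"gravity": "LCDM", "redshift": 0.25}
auto = PairRecordSketch((13.0, 13.5), (13.0, 13.5), meta)
cross = PairRecordSketch((13.0, 13.3), (13.7, 14.0), meta)
```

Because the properties define no setter, an assignment such as `auto.gravity = "other"` raises AttributeError, which is what makes them read-only.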
XiHHRecord dataclass
XiHHRecord(r_edges: ndarray, mass_bins: ndarray, pairs: list, pair_indices: ndarray, r: ndarray, xi: ndarray, n1: ndarray, n2: ndarray, log10M1: ndarray, log10M2: ndarray, attrs: dict)
In-memory view of a single xi_hh.hdf5 file.
Attributes:
| Name | Type | Description |
|---|---|---|
| r_edges | ndarray | (K+1,) separation bin edges (Mpc/h). |
| mass_bins | ndarray | (P, 2) array of (log10M_low, log10M_high) for each mass bin the file was measured for. |
| pairs | list | list of group names, e.g. ['M0_M0', 'M0_M1', ...]. |
| pair_indices | ndarray | (Q, 2) int array of the (i, j) mass-bin indices for each pair group, with i ≤ j (auto + cross). |
| r | ndarray | (Q, K) per-pair separation centres (or sepavg from pycorr). |
| xi | ndarray | (Q, K) per-pair correlation function. |
| n1, n2 | ndarray | (Q,) int counts of haloes used in each side of each pair. |
| log10M1, log10M2 | ndarray | (Q, 2) per-pair mass-bin edges. |
| attrs | dict | top-level HDF5 attributes. |
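The relation between the P mass bins and the Q pair groups follows from the i ≤ j constraint: each auto pair plus each cross pair counted once gives Q = P(P + 1)/2. The sketch below enumerates them with the standard library; the value P = 4 and the assumption that group names follow the M{i}_M{j} pattern in this order are illustrative, inferred from the example names rather than read from any file.

```python
from itertools import combinations_with_replacement

n_mass_bins = 4  # P; an illustrative value, not read from any file

# All (i, j) with i <= j: autos on the diagonal plus each cross pair once.
pair_indices = list(combinations_with_replacement(range(n_mass_bins), 2))
pairs = [f"M{i}_M{j}" for i, j in pair_indices]

n_pairs = len(pairs)  # Q = P * (P + 1) / 2
```

With four mass bins this gives ten pair groups starting 'M0_M0', 'M0_M1', 'M0_M2', which would be consistent with the (10, 60) xi shape shown in the XiHHLoader example if that file was measured for four bins.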
XiHHLoader
Bases: _BaseHaloStatLoader
Load-or-measure dataloader for the halo–halo 2PCF.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| overwrite | bool | If True, always re-measure even when | False |
| write_halo_hdf5 | bool | Force-rewrite | False |
| logger | Logger or None | Optional logger. | None |
Examples:
>>> loader = XiHHLoader()
>>> rec = loader.get("LCDM", 0.25, imodel=1, ibox=1)
>>> rec.xi.shape # (n_pairs, n_r_bins)
(10, 60)
>>> rec.pairs[:3]
['M0_M0', 'M0_M1', 'M0_M2']
measure_pair
measure_pair(gravity: str, redshift: float, imodel: int, ibox: int, log10M1: tuple[float, float], log10M2: tuple[float, float] | None = None, *, r_edges: ndarray | None = None) -> XiHHPairRecord
Measure ξ_hh(r) for one custom mass-bin pair, in-memory only.
Unlike get and measure, this method never
writes to disk and never reads xi_hh.hdf5. It loads the halo
catalogue, applies the requested (log10M1, log10M2)
selection, and runs pycorr for that single pair only. The
halo catalogue is read from halo.hdf5; if the HDF5 mirror is
missing, the source CatshortV.*.DAT is reformatted on demand
(the .DAT itself is never modified).
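The selection step can be sketched in isolation. The helper `select_pair` below is hypothetical and works on a plain list of log10 masses; it assumes half-open bins [low, high), a convention this page does not state, and treats an omitted second bin as a request for the auto-correlation, as the first example under Examples suggests. The pair counting itself (pycorr in the real method) is left out.

```python
def select_pair(log10_masses, log10M1, log10M2=None):
    """Split a catalogue into the two samples of one mass-bin pair.

    Assumes half-open bins [low, high); the real edge convention is not
    stated on this page.
    """
    if log10M2 is None:
        log10M2 = log10M1  # second bin omitted: auto-correlation

    def in_bin(bounds):
        low, high = bounds
        return [i for i, m in enumerate(log10_masses) if low <= m < high]

    sample1 = in_bin(log10M1)
    sample2 = in_bin(log10M2)
    is_auto = tuple(log10M1) == tuple(log10M2)
    return sample1, sample2, is_auto


masses = [12.8, 13.0, 13.2, 13.4, 13.5, 13.8, 14.1]  # made-up log10 masses
idx1, idx2, auto = select_pair(masses, (13.0, 13.5))
x1, x2, cross = select_pair(masses, (13.0, 13.3), (13.7, 14.0))
```

The lengths of the two index lists play the role of n1 and n2 in the returned record.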
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gravity | str | One of :data: | required |
| redshift | float | Must be a key of :data: | required |
| imodel | int | Cosmological model index (1..64). | required |
| ibox | int | Realisation index (1..5). | required |
| log10M1 | tuple of (float, float) | | required |
| log10M2 | tuple of (float, float) or None | Edges for the second sample. | None |
| r_edges | ndarray | Separation bin edges (Mpc/h). Defaults to :data: | None |
Returns:
| Name | Type | Description |
|---|---|---|
| record | XiHHPairRecord | In-memory record carrying |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If |
Examples:
Auto-correlation of a single finite-width bin:
>>> loader = XiHHLoader()
>>> rec = loader.measure_pair(
...     "LCDM", 0.25, imodel=1, ibox=1,
...     log10M1=(13.0, 13.5),
... )
>>> rec.is_auto, rec.n1, rec.xi.shape
(True, 87587, (60,))
Cross-correlation between two non-overlapping bins, with custom
r_edges:
>>> import numpy as np
>>> rec_x = loader.measure_pair(
...     "LCDM", 0.25, 1, 1,
...     log10M1=(13.0, 13.3),
...     log10M2=(13.7, 14.0),
...     r_edges=np.logspace(-1, 2, 31),
... )
>>> rec_x.is_auto
False
get_grid
get_grid(gravities: list[str] | None = None, redshifts: list[float] | None = None, imodels: list[int] | None = None, iboxes: list[int] | None = None, skip_missing: bool = False) -> dict
Load (or measure) xi_hh across a sub-grid and stack.
Returns a dict with axis labels, the shared r_edges / mass_bins
/ pairs metadata, and arrays of shape (G, Z, M, B, P, K) for
r / xi, plus n1 / n2 of shape (G, Z, M, B, P) and
a present mask of shape (G, Z, M, B). Since r and xi are stacked
per pair, the P axis here runs over pair groups (the axis written Q
in XiHHRecord), not over mass bins.
path classmethod
exists classmethod
measure
Force measurement and return the record.
get
Return the record, measuring on demand if missing.