Threshold → bin rebinning of xi_hh¶
For correlation-function modelling we need the bin-averaged differential halo–halo 2PCF \(\xi_{hh}^{\mathrm{bin}}(r; x_c, y_c)\), where \(x = \log_{10} M\) and the bin around \(x_c\) is \([x_c - \Delta\log M / 2,\, x_c + \Delta\log M / 2]\).
Measuring it directly in narrow bins gives poor signal-to-noise —
each bin only contains a fraction of the haloes. The standard trick is
to measure \(\xi_{hh}\) from cumulative mass-threshold samples (high
S/N: every sample contains all haloes above some threshold), then
rebin analytically. This page documents
halocat.xi_hh_rebin which does exactly that.
The closed-form identity¶
Define
with \(\xi_{AB}(r; x, y) := \xi_{hh}(r \,|\, n_h(\ge x), n_h(\ge y))\) the threshold-pair correlation function. The differential identity
(where \(\bar n = \mathrm{d}n_h / \mathrm{d}\log_{10} M\)) integrates over the bin \([x_-, x_+] \times [y_-, y_+]\) to the exact mixed second difference
with \(\Delta n(x_c) = n_h(\ge x_-) - n_h(\ge x_+)\) (the halo count inside the bin) and \((x_\pm) = x_c \pm \Delta \log M / 2\).
No derivative is approximated. Numerical error enters only via interpolation of \(F\) to the four bin-edge corners. This is qualitatively better than finite-differencing \(\xi_{AB}\) directly, which would amplify measurement noise.
Two entry points¶
| Function | Inputs | Output |
|---|---|---|
cumulative_to_binned_xi_hh |
raw arrays logM_in, nh_cum, xi_AB, logM_out, dlogM |
(logM_out, xi_bin) |
rebin_xi_hh_record |
XiHHRecord measured with threshold-style mass bins |
new XiHHRecord (pair-sparse) |
The record-level wrapper recovers \(n_h(\ge x)\) from the auto-pairs'
halo counts (n1) divided by box_size**3 — same haloes that went
into \(\xi\), no separate HMF needed.
Worked example (raw arrays)¶
import numpy as np
from halocat import cumulative_to_binned_xi_hh
# Threshold grid (input)
logM_in = np.linspace(12.0, 15.0, 31)
nh_cum = ... # shape (31,), strictly positive
xi_AB = ... # shape (n_r, 31, 31)
# Differential bin grid (output)
logM_out = np.array([12.5, 13.0, 13.5, 14.0])
dlogM = 0.2
_, xi_bin = cumulative_to_binned_xi_hh(
logM_in, nh_cum, xi_AB, logM_out, dlogM,
interp_method="linear", # or "cubic" for ~h^4 vs ~h^2 in the bin edge interp
)
print(xi_bin.shape) # (n_r, 4, 4)
interp_method="linear" is the sane default — log-linear in \(\log M\)
for \(n_h(\ge)\), linear-in-\((\log M, \log M)\) for \(\xi_{AB}\). Use
"cubic" if your input grid is coarse and the (analytic) \(\xi_{AB}\)
is smooth.
Worked example (halocat record)¶
If you already have a threshold-style XiHHRecord from
XiHHLoader — e.g. measured by setting
MASS_BINS to [(12.5, np.inf), (12.7, np.inf), …] — the wrapper
gives you a drop-in replacement record with the standard pair-sparse
schema:
import numpy as np
from halocat import XiHHLoader, rebin_xi_hh_record
import halocat.config as C
C.MASS_BINS = [(t, np.inf) for t in
np.round(np.arange(12.5, 14.5001, 0.1), 6)]
rec_thr = XiHHLoader(overwrite=True).get("LCDM", 0.25, imodel=1, ibox=1)
rec_bin = rebin_xi_hh_record(
rec_thr,
logM_out=np.array([12.7, 12.9, 13.1]),
dlogM=0.2,
interp_method="linear",
)
print(rec_bin.pairs) # ['M0_M0', 'M0_M1', 'M0_M2', 'M1_M1', ...]
print(rec_bin.xi.shape) # (6, n_r)
print(rec_bin.attrs["rebin_method"]) # 'cumulative_to_binned'
The returned record is never written to disk — it's an in-memory
derived view, like
XiHHLoader.measure_pair.
Edge-case policy¶
| Condition | Behaviour |
|---|---|
logM_in not strictly increasing |
ValueError |
Any nh_cum <= 0 |
ValueError |
dlogM <= 0 |
ValueError |
Bin edges outside [logM_in[0], logM_in[-1]] |
ValueError ("extend logM_in past logM_out_max + dlogM/2") |
| Any \(\Delta n(x_c) \le 0\) | ValueError |
interp_method not in {'linear', 'cubic'} |
ValueError |
xi_AB asymmetric beyond asymmetry_atol |
governed by asymmetry_policy ∈ {'warn', 'error', 'ignore'} (default 'warn'); symmetrised iff symmetrize_input=True (default) |
xi_threshold_rec.mass_bins[:, 1] not all +inf |
logger warning; proceeds (lower edges are still valid thresholds) |
The r-axis is never interpolated — only the two mass axes are rebinned.
Memory¶
Peak transient memory is \(5 \times N_M^2 \times N_r \times 8\) bytes for
the four corner cubes plus the assembled integral, on top of the
\((N_{\mathrm{in}}, N_{\mathrm{in}}, N_r)\) input cube held inside
scipy.interpolate.RegularGridInterpolator. For halocat-typical sizes
this is well under 10 MB. There is no chunking-over-r logic; the
implementation is intentionally simple.
Validation¶
tests/test_xi_hh_rebin.py validates the implementation against an
analytic linear-bias / power-law-HMF mock at three precision tiers:
| Grid | Method | Bound | Failure means |
|---|---|---|---|
| 0.01 dex, edges align | linear | rel-err < 1e-10 |
bug in physics or mixed-diff algebra |
| 0.05 dex, edges off-grid | linear | rel-err < 1e-3 |
bug in numerics layer (linear interp wired wrong) |
| 0.05 dex, edges off-grid | cubic | rel-err < 5e-6 |
bug in interp_method plumbing through to RGI |
Run with pytest tests/test_xi_hh_rebin.py -v.