Threshold → bin rebinning of `xi_hh`¶

For correlation-function modelling we need the bin-averaged differential halo–halo 2PCF \(\xi_{hh}^{\mathrm{bin}}(r; x_c, y_c)\), where \(x = \log_{10} M\) and the bin around \(x_c\) is \([x_c - \Delta\log M / 2,\, x_c + \Delta\log M / 2]\).

Measuring it directly in narrow bins gives poor signal-to-noise — each bin only contains a fraction of the haloes. The standard trick is to measure \(\xi_{hh}\) from cumulative mass-threshold samples (high S/N: every sample contains all haloes above some threshold), then rebin analytically. This page documents halocat.xi_hh_rebin which does exactly that.

The closed-form identity¶

Define

\[ F(x, y; r) \;:=\; n_h(\ge x)\, n_h(\ge y)\, [1 + \xi_{AB}(r; x, y)], \]

with \(\xi_{AB}(r; x, y) := \xi_{hh}(r \,|\, n_h(\ge x), n_h(\ge y))\) the threshold-pair correlation function. The differential identity

\[ \bar n(x)\, \bar n(y)\, [1 + \xi_{hh}(r|x,y)] \;=\; \frac{\partial^2 F}{\partial x\, \partial y} \]

(where \(\bar n = \mathrm{d}n_h / \mathrm{d}\log_{10} M\)) integrates over the bin \([x_-, x_+] \times [y_-, y_+]\) to the exact mixed second difference

\[ \Delta n(x_c)\, \Delta n(y_c)\, [1 + \xi_{hh}^{\mathrm{bin}}(r; x_c, y_c)] \;=\; F(x_-, y_-) - F(x_-, y_+) - F(x_+, y_-) + F(x_+, y_+), \]

with \(\Delta n(x_c) = n_h(\ge x_-) - n_h(\ge x_+)\) (the halo count inside the bin) and \((x_\pm) = x_c \pm \Delta \log M / 2\).

No derivative is approximated. Numerical error enters only via interpolation of \(F\) to the four bin-edge corners. This is qualitatively better than finite-differencing \(\xi_{AB}\) directly, which would amplify measurement noise.

Two entry points¶

Function	Inputs	Output
`cumulative_to_binned_xi_hh`	raw arrays `logM_in`, `nh_cum`, `xi_AB`, `logM_out`, `dlogM`	`(logM_out, xi_bin)`
`rebin_xi_hh_record`	`XiHHRecord` measured with threshold-style mass bins	new `XiHHRecord` (pair-sparse)

The record-level wrapper recovers \(n_h(\ge x)\) from the auto-pairs' halo counts (n1) divided by box_size**3 — same haloes that went into \(\xi\), no separate HMF needed.

Worked example (raw arrays)¶

import numpy as np
from halocat import cumulative_to_binned_xi_hh

# Threshold grid (input)
logM_in = np.linspace(12.0, 15.0, 31)
nh_cum  = ...                           # shape (31,), strictly positive
xi_AB   = ...                           # shape (n_r, 31, 31)

# Differential bin grid (output)
logM_out = np.array([12.5, 13.0, 13.5, 14.0])
dlogM = 0.2

_, xi_bin = cumulative_to_binned_xi_hh(
    logM_in, nh_cum, xi_AB, logM_out, dlogM,
    interp_method="linear",   # or "cubic" for ~h^4 vs ~h^2 in the bin edge interp
)
print(xi_bin.shape)            # (n_r, 4, 4)

interp_method="linear" is the sane default — log-linear in \(\log M\) for \(n_h(\ge)\), linear-in-\((\log M, \log M)\) for \(\xi_{AB}\). Use "cubic" if your input grid is coarse and the (analytic) \(\xi_{AB}\) is smooth.

Worked example (halocat record)¶

If you already have a threshold-style XiHHRecord from XiHHLoader — e.g. measured by setting MASS_BINS to [(12.5, np.inf), (12.7, np.inf), …] — the wrapper gives you a drop-in replacement record with the standard pair-sparse schema:

import numpy as np
from halocat import XiHHLoader, rebin_xi_hh_record
import halocat.config as C

C.MASS_BINS = [(t, np.inf) for t in
               np.round(np.arange(12.5, 14.5001, 0.1), 6)]

rec_thr = XiHHLoader(overwrite=True).get("LCDM", 0.25, imodel=1, ibox=1)

rec_bin = rebin_xi_hh_record(
    rec_thr,
    logM_out=np.array([12.7, 12.9, 13.1]),
    dlogM=0.2,
    interp_method="linear",
)
print(rec_bin.pairs)        # ['M0_M0', 'M0_M1', 'M0_M2', 'M1_M1', ...]
print(rec_bin.xi.shape)     # (6, n_r)
print(rec_bin.attrs["rebin_method"])   # 'cumulative_to_binned'

The returned record is never written to disk — it's an in-memory derived view, like XiHHLoader.measure_pair.

Edge-case policy¶

Condition	Behaviour
`logM_in` not strictly increasing	`ValueError`
Any `nh_cum <= 0`	`ValueError`
`dlogM <= 0`	`ValueError`
Bin edges outside `[logM_in[0], logM_in[-1]]`	`ValueError` ("extend logM_in past `logM_out_max + dlogM/2`")
Any \(\Delta n(x_c) \le 0\)	`ValueError`
`interp_method` not in `{'linear', 'cubic'}`	`ValueError`
`xi_AB` asymmetric beyond `asymmetry_atol`	governed by `asymmetry_policy` ∈ `{'warn', 'error', 'ignore'}` (default `'warn'`); symmetrised iff `symmetrize_input=True` (default)
`xi_threshold_rec.mass_bins[:, 1]` not all `+inf`	logger warning; proceeds (lower edges are still valid thresholds)

The r-axis is never interpolated — only the two mass axes are rebinned.

Memory¶

Peak transient memory is \(5 \times N_M^2 \times N_r \times 8\) bytes for the four corner cubes plus the assembled integral, on top of the \((N_{\mathrm{in}}, N_{\mathrm{in}}, N_r)\) input cube held inside scipy.interpolate.RegularGridInterpolator. For halocat-typical sizes this is well under 10 MB. There is no chunking-over-r logic; the implementation is intentionally simple.

Validation¶

tests/test_xi_hh_rebin.py validates the implementation against an analytic linear-bias / power-law-HMF mock at three precision tiers:

Grid	Method	Bound	Failure means
0.01 dex, edges align	linear	rel-err `< 1e-10`	bug in physics or mixed-diff algebra
0.05 dex, edges off-grid	linear	rel-err `< 1e-3`	bug in numerics layer (linear interp wired wrong)
0.05 dex, edges off-grid	cubic	rel-err `< 5e-6`	bug in `interp_method` plumbing through to RGI

Run with pytest tests/test_xi_hh_rebin.py -v.

Threshold → bin rebinning of xi_hh¶