Skip to content

halocat.xi_hh_rebin

xi_hh_rebin

Threshold→bin rebinning of the halo–halo 2PCF.

Convert ξ_hh measured from cumulative mass-threshold samples,

ξ_AB(r; x, y) := ξ_hh(r | n_h(>=x), n_h(>=y)),     x = log10 M

into the bin-averaged differential

ξ_hh^bin(r; x_c, y_c)
    := ξ_hh(r | x_c ± dlogM/2, y_c ± dlogM/2)

via the exact closed-form mixed-difference identity

Dn(x_c) Dn(y_c) [1 + ξ_hh^bin(r; x_c, y_c)]
    = F(x-, y-) - F(x-, y+) - F(x+, y-) + F(x+, y+),

with F(x, y; r) := n_h(>=x) n_h(>=y) [1 + ξ_AB(r; x, y)] and Dn(x_c) := n_h(>=x-) - n_h(>=x+). The identity is exact given F; numerical error enters only through interpolation of F to the four bin edges. There is no derivative approximation.

The submodule is split into three layers:

  • physics — :func:_mixed_difference, the closed-form algebra on already-evaluated F arrays.
  • numerics — :func:_interp_nh_cum, :func:_interp_xi_AB_at_edges, the interpolation of n_h(>=) and ξ_AB onto bin edges.
  • driver — :func:cumulative_to_binned_xi_hh (raw arrays) and :func:rebin_xi_hh_record (halocat-native XiHHRecord wrapper).

cumulative_to_binned_xi_hh

cumulative_to_binned_xi_hh(logM_in, nh_cum, xi_AB, logM_out, dlogM, *, interp_method: Literal['linear', 'cubic'] = 'linear', symmetrize_input: bool = True, symmetrize_output: bool = True, asymmetry_atol: float = 0.001, asymmetry_policy: _AsymmetryPolicy = 'warn') -> tuple[np.ndarray, np.ndarray]

Rebin ξ_hh from cumulative threshold samples to differential bins.

Implements the closed-form mixed-difference identity

.. math::

\Delta n(x_c) \Delta n(y_c) [1 + \xi^{bin}{hh}(r; x_c, y_c)] = F(x-, y_-) - F(x_-, y_+) - F(x_+, y_-) + F(x_+, y_+),

with F(x, y; r) = n_h(>=x) n_h(>=y) [1 + \xi_{AB}(r; x, y)].

Parameters:

Name Type Description Default
logM_in array_like

Shape (N_in,). Strictly increasing grid of input log10 mass thresholds.

required
nh_cum array_like

Shape (N_in,). Cumulative halo number density n_h(>= logM_in); strictly positive. Units (Mpc/h)^-3.

required
xi_AB array_like

Shape (N_r, N_in, N_in). Threshold-pair correlation function ξ_hh(r | n_h(>=logM_in[i]), n_h(>=logM_in[j])). Symmetric in the last two axes up to noise; see asymmetry_policy.

required
logM_out array_like

Shape (N_M,). Centres of the output mass bins.

required
dlogM float

Output bin full width (dex). The bin around logM_out[k] is [logM_out[k] - dlogM/2, logM_out[k] + dlogM/2].

required
interp_method ('linear', 'cubic')

Method used to interpolate ξ_AB onto the bin edges via :class:scipy.interpolate.RegularGridInterpolator. n_h(>=) is always interpolated log-linearly in logM. Default: "linear".

'linear'
symmetrize_input bool

If True, replace ξ_AB with 0.5 * (ξ_AB + ξ_AB.swapaxes(1, 2)) before processing. Default: True.

True
symmetrize_output bool

Apply the same symmetrisation to the returned xi_bin. Default: True.

True
asymmetry_atol float

Absolute tolerance on max|ξ_AB - ξ_AB.T| / max|ξ_AB| used when asymmetry_policy != 'ignore'. Default: 1e-3.

0.001
asymmetry_policy ('warn', 'error', 'ignore')

Action when ξ_AB is asymmetric above asymmetry_atol: 'warn' logs via the halocat.xi_hh_rebin logger and proceeds, 'error' raises ValueError, 'ignore' skips the check entirely. Default: "warn".

'warn'

Returns:

Name Type Description
logM_out ndarray

Shape (N_M,). Echo of the input bin centres.

xi_bin ndarray

Shape (N_r, N_M, N_M). Bin-averaged differential ξ_hh.

Raises:

Type Description
ValueError

If logM_in is not strictly increasing, nh_cum has any non-positive entry, dlogM <= 0, output bin edges fall outside [logM_in[0], logM_in[-1]], any Dn(x_c) <= 0, interp_method is unrecognised, or asymmetry_policy triggers an error.

Notes

The r-axis is never interpolated by this function. Only the two mass axes are rebinned.

The implementation is exact in the closed form: the only numerical approximation is the interpolation of F to the four bin edges.

See Also

rebin_xi_hh_record : halocat-native wrapper that takes an XiHHRecord and returns another XiHHRecord.

Source code in halocat/xi_hh_rebin.py
def cumulative_to_binned_xi_hh(
    logM_in,
    nh_cum,
    xi_AB,
    logM_out,
    dlogM,
    *,
    interp_method: Literal["linear", "cubic"] = "linear",
    symmetrize_input: bool = True,
    symmetrize_output: bool = True,
    asymmetry_atol: float = 1e-3,
    asymmetry_policy: _AsymmetryPolicy = "warn",
) -> tuple[np.ndarray, np.ndarray]:
    """Rebin ξ_hh from cumulative threshold samples to differential bins.

    Implements the closed-form mixed-difference identity

    .. math::

       \\Delta n(x_c) \\Delta n(y_c) [1 + \\xi^{bin}_{hh}(r; x_c, y_c)]
            = F(x_-, y_-) - F(x_-, y_+) - F(x_+, y_-) + F(x_+, y_+),

    with ``F(x, y; r) = n_h(>=x) n_h(>=y) [1 + \\xi_{AB}(r; x, y)]``.

    Parameters
    ----------
    logM_in : array_like
        Shape ``(N_in,)``. Strictly increasing grid of input log10
        mass thresholds.
    nh_cum : array_like
        Shape ``(N_in,)``. Cumulative halo number density
        ``n_h(>= logM_in)``; strictly positive. Units (Mpc/h)^-3.
    xi_AB : array_like
        Shape ``(N_r, N_in, N_in)``. Threshold-pair correlation
        function ``ξ_hh(r | n_h(>=logM_in[i]), n_h(>=logM_in[j]))``.
        Symmetric in the last two axes up to noise; see
        ``asymmetry_policy``.
    logM_out : array_like
        Shape ``(N_M,)``. Centres of the output mass bins.
    dlogM : float
        Output bin full width (dex). The bin around ``logM_out[k]`` is
        ``[logM_out[k] - dlogM/2, logM_out[k] + dlogM/2]``.
    interp_method : {'linear', 'cubic'}, optional
        Method used to interpolate ξ_AB onto the bin edges via
        :class:`scipy.interpolate.RegularGridInterpolator`.
        ``n_h(>=)`` is always interpolated log-linearly in ``logM``.
        Default: ``"linear"``.
    symmetrize_input : bool, optional
        If True, replace ``ξ_AB`` with ``0.5 * (ξ_AB + ξ_AB.swapaxes(1, 2))``
        before processing. Default: True.
    symmetrize_output : bool, optional
        Apply the same symmetrisation to the returned ``xi_bin``.
        Default: True.
    asymmetry_atol : float, optional
        Absolute tolerance on ``max|ξ_AB - ξ_AB.T| / max|ξ_AB|`` used
        when ``asymmetry_policy != 'ignore'``. Default: ``1e-3``.
    asymmetry_policy : {'warn', 'error', 'ignore'}, optional
        Action when ``ξ_AB`` is asymmetric above ``asymmetry_atol``:
        ``'warn'`` logs via the ``halocat.xi_hh_rebin`` logger and
        proceeds, ``'error'`` raises ``ValueError``, ``'ignore'``
        skips the check entirely. Default: ``"warn"``.

    Returns
    -------
    logM_out : numpy.ndarray
        Shape ``(N_M,)``. Echo of the input bin centres.
    xi_bin : numpy.ndarray
        Shape ``(N_r, N_M, N_M)``. Bin-averaged differential ξ_hh.

    Raises
    ------
    ValueError
        If ``logM_in`` is not strictly increasing, ``nh_cum`` has any
        non-positive entry, ``dlogM <= 0``, output bin edges fall
        outside ``[logM_in[0], logM_in[-1]]``, any ``Dn(x_c) <= 0``,
        ``interp_method`` is unrecognised, or ``asymmetry_policy``
        triggers an error.

    Notes
    -----
    The r-axis is **never** interpolated by this function. Only the
    two mass axes are rebinned.

    The implementation is exact in the closed form: the only numerical
    approximation is the interpolation of F to the four bin edges.

    See Also
    --------
    rebin_xi_hh_record :
        halocat-native wrapper that takes an ``XiHHRecord`` and returns
        another ``XiHHRecord``.
    """
    logM_in = np.ascontiguousarray(logM_in, dtype=np.float64)
    nh_cum = np.ascontiguousarray(nh_cum, dtype=np.float64)
    xi_AB = np.ascontiguousarray(xi_AB, dtype=np.float64)
    logM_out = np.ascontiguousarray(logM_out, dtype=np.float64)
    dlogM = float(dlogM)

    # ---- shape & sanity checks ----
    if logM_in.ndim != 1:
        raise ValueError("logM_in must be 1-D")
    N_in = logM_in.size
    if nh_cum.shape != (N_in,):
        raise ValueError(
            f"nh_cum must have shape (N_in,) = ({N_in},); got {nh_cum.shape}"
        )
    if xi_AB.ndim != 3 or xi_AB.shape[1:] != (N_in, N_in):
        raise ValueError(
            f"xi_AB must have shape (N_r, N_in, N_in); got {xi_AB.shape} "
            f"with N_in={N_in}"
        )
    if not np.all(np.diff(logM_in) > 0):
        raise ValueError("logM_in must be strictly increasing")
    if np.any(nh_cum <= 0):
        raise ValueError("nh_cum must be strictly positive everywhere")
    if dlogM <= 0:
        raise ValueError("dlogM must be > 0")
    if interp_method not in ("linear", "cubic"):
        raise ValueError(
            f"interp_method must be 'linear' or 'cubic'; got {interp_method!r}"
        )

    _check_symmetry(xi_AB, asymmetry_atol, asymmetry_policy)
    if symmetrize_input:
        xi_AB = 0.5 * (xi_AB + np.swapaxes(xi_AB, 1, 2))

    # ---- bin edges ----
    half = 0.5 * dlogM
    edges_lo = logM_out - half
    edges_hi = logM_out + half
    eps = 1e-12 * (logM_in[-1] - logM_in[0])
    if (edges_lo.min() < logM_in[0] - eps
            or edges_hi.max() > logM_in[-1] + eps):
        raise ValueError(
            f"output bin edges [{edges_lo.min():.4f}, {edges_hi.max():.4f}] "
            f"fall outside logM_in range "
            f"[{logM_in[0]:.4f}, {logM_in[-1]:.4f}]; "
            f"extend logM_in past logM_out_max + dlogM/2"
        )

    # ---- n_h(>=) at bin edges (log-linear in logM) ----
    nh_lo = _interp_nh_cum(logM_in, nh_cum, edges_lo)
    nh_hi = _interp_nh_cum(logM_in, nh_cum, edges_hi)
    dnh = nh_lo - nh_hi
    if np.any(dnh <= 0):
        raise ValueError(
            "non-positive halo count Dn(x_c) in some bin — check the "
            "input HMF or widen the output bins"
        )

    # ---- xi_AB at the four bin-edge corners ----
    xi_ll, xi_lh, xi_hl, xi_hh_ = _interp_xi_AB_at_edges(
        logM_in, xi_AB, edges_lo, edges_hi, method=interp_method,
    )

    # ---- assemble F at corners ----
    nlo_x = nh_lo[:, None, None]
    nlo_y = nh_lo[None, :, None]
    nhi_x = nh_hi[:, None, None]
    nhi_y = nh_hi[None, :, None]

    F_ll = nlo_x * nlo_y * (1.0 + xi_ll)
    F_lh = nlo_x * nhi_y * (1.0 + xi_lh)
    F_hl = nhi_x * nlo_y * (1.0 + xi_hl)
    F_hh = nhi_x * nhi_y * (1.0 + xi_hh_)

    # ---- physics: closed-form mixed second difference ----
    xi_bin_xyR = _mixed_difference(F_ll, F_lh, F_hl, F_hh, dnh, dnh)

    xi_bin = np.moveaxis(xi_bin_xyR, -1, 0)                  # (N_r, N_M, N_M)

    if symmetrize_output:
        xi_bin = 0.5 * (xi_bin + np.swapaxes(xi_bin, 1, 2))

    return logM_out, xi_bin

rebin_xi_hh_record

rebin_xi_hh_record(xi_threshold_rec: XiHHRecord, logM_out, dlogM, *, interp_method: Literal['linear', 'cubic'] = 'linear', symmetrize_input: bool = True, symmetrize_output: bool = True, asymmetry_atol: float = 0.001, asymmetry_policy: _AsymmetryPolicy = 'warn') -> XiHHRecord

Rebin a threshold-style XiHHRecord into a differential one.

halocat-native wrapper around :func:cumulative_to_binned_xi_hh. Densifies the input record's pair-sparse storage into a ξ_AB cube, recovers n_h(>=) from the auto-pair halo counts n1, runs the closed-form rebin, then repackages the result into a new XiHHRecord with the standard pair-sparse layout. Never writes to disk.

Parameters:

Name Type Description Default
xi_threshold_rec XiHHRecord

Record measured with threshold-style mass bins. Each row of mass_bins is interpreted as (log10M_threshold, +inf); only the lower edge is used as the threshold. A finite upper edge triggers a one-time logger warning.

required
logM_out array_like

Shape (N_M,). Centres of the output (differential) mass bins.

required
dlogM float

Output bin full width (dex).

required
interp_method ('linear', 'cubic')

Forwarded to :func:cumulative_to_binned_xi_hh. Default "linear".

'linear'
symmetrize_input bool

Forwarded. Default True for both.

True
symmetrize_output bool

Forwarded. Default True for both.

True
asymmetry_atol float

Forwarded. Default 1e-3.

0.001
asymmetry_policy ('warn', 'error', 'ignore')

Forwarded. Default "warn".

'warn'

Returns:

Name Type Description
rebinned XiHHRecord

New record with:

  • r_edges and r carried over from the input.
  • mass_bins set to [(c - dlogM/2, c + dlogM/2) for c in logM_out].
  • pairs / pair_indices covering the upper triangle i ≤ j of the new mass-bin grid.
  • xi shape (N_pairs, N_r); same convention as :class:XiHHRecord from the on-disk reader.
  • n1 / n2 derived from the rebinned halo counts.
  • log10M1 / log10M2 set to per-pair edges.
  • attrs copied from the input and extended with rebin_method, rebin_interp, and rebin_dlogM.

Raises:

Type Description
ValueError

Same conditions as :func:cumulative_to_binned_xi_hh, plus if any auto-pair M{i}_M{i} is missing from xi_threshold_rec (its n1 is needed to recover n_h(>=)) or its n1 is non-positive.

See Also

cumulative_to_binned_xi_hh : underlying raw-array driver.

Source code in halocat/xi_hh_rebin.py
def rebin_xi_hh_record(
    xi_threshold_rec: XiHHRecord,
    logM_out,
    dlogM,
    *,
    interp_method: Literal["linear", "cubic"] = "linear",
    symmetrize_input: bool = True,
    symmetrize_output: bool = True,
    asymmetry_atol: float = 1e-3,
    asymmetry_policy: _AsymmetryPolicy = "warn",
) -> XiHHRecord:
    """Rebin a threshold-style ``XiHHRecord`` into a differential one.

    halocat-native wrapper around :func:`cumulative_to_binned_xi_hh`.
    Densifies the input record's pair-sparse storage into a ``ξ_AB``
    cube, recovers ``n_h(>=)`` from the auto-pair halo counts ``n1``,
    runs the closed-form rebin, then repackages the result into a new
    ``XiHHRecord`` with the standard pair-sparse layout. Never writes
    to disk.

    Parameters
    ----------
    xi_threshold_rec : XiHHRecord
        Record measured with threshold-style mass bins. Each row of
        ``mass_bins`` is interpreted as ``(log10M_threshold, +inf)``;
        only the lower edge is used as the threshold. A finite upper
        edge triggers a one-time logger warning.
    logM_out : array_like
        Shape ``(N_M,)``. Centres of the output (differential) mass
        bins.
    dlogM : float
        Output bin full width (dex).
    interp_method : {'linear', 'cubic'}, optional
        Forwarded to :func:`cumulative_to_binned_xi_hh`. Default
        ``"linear"``.
    symmetrize_input, symmetrize_output : bool, optional
        Forwarded. Default True for both.
    asymmetry_atol : float, optional
        Forwarded. Default ``1e-3``.
    asymmetry_policy : {'warn', 'error', 'ignore'}, optional
        Forwarded. Default ``"warn"``.

    Returns
    -------
    rebinned : XiHHRecord
        New record with:

        - ``r_edges`` and ``r`` carried over from the input.
        - ``mass_bins`` set to
          ``[(c - dlogM/2, c + dlogM/2) for c in logM_out]``.
        - ``pairs`` / ``pair_indices`` covering the upper triangle
          ``i ≤ j`` of the new mass-bin grid.
        - ``xi`` shape ``(N_pairs, N_r)``; same convention as
          :class:`XiHHRecord` from the on-disk reader.
        - ``n1`` / ``n2`` derived from the rebinned halo counts.
        - ``log10M1`` / ``log10M2`` set to per-pair edges.
        - ``attrs`` copied from the input and extended with
          ``rebin_method``, ``rebin_interp``, and ``rebin_dlogM``.

    Raises
    ------
    ValueError
        Same conditions as :func:`cumulative_to_binned_xi_hh`, plus
        if any auto-pair ``M{i}_M{i}`` is missing from
        ``xi_threshold_rec`` (its ``n1`` is needed to recover
        ``n_h(>=)``) or its ``n1`` is non-positive.

    See Also
    --------
    cumulative_to_binned_xi_hh : underlying raw-array driver.
    """
    if xi_threshold_rec.mass_bins.size == 0:
        raise ValueError("xi_threshold_rec has no mass_bins")

    mass_bins_in = np.asarray(xi_threshold_rec.mass_bins, dtype=np.float64)
    logM_in = mass_bins_in[:, 0]

    upper = mass_bins_in[:, 1]
    if not np.all(np.isinf(upper) & (upper > 0)):
        _log.warning(
            "xi_threshold_rec.mass_bins has finite upper edges %s; "
            "treating each lower edge as a threshold but the upstream "
            "ξ may not be a true cumulative-threshold sample",
            list(upper),
        )

    xi_AB, n1_thr = _densify_xi_record(xi_threshold_rec)

    if np.any(n1_thr <= 0):
        missing = [int(i) for i, n in enumerate(n1_thr) if n <= 0]
        raise ValueError(
            f"xi_threshold_rec is missing auto-pair n1 for threshold "
            f"index(es) {missing}; cannot recover n_h(>=)"
        )

    box_size = float(xi_threshold_rec.attrs.get("box_size", np.nan))
    if not np.isfinite(box_size) or box_size <= 0:
        raise ValueError(
            "xi_threshold_rec.attrs['box_size'] is missing or non-positive; "
            "needed to convert auto-pair n1 into n_h(>=)"
        )
    volume = box_size ** 3
    nh_cum = n1_thr.astype(np.float64) / volume

    # NaN entries in xi_AB are expected from the upstream measurement
    # at small r where pair bins are empty. They propagate naturally
    # through the closed-form mixed difference so the output ξ_bin is
    # NaN in those bins — which is the correct signal for "no
    # information here". Just log once.
    if np.any(np.isnan(xi_AB)):
        _log.info(
            "xi_threshold_rec.xi has %d NaN entries (likely empty pair "
            "bins from the upstream measurement); they will propagate "
            "through the rebin and the output ξ_bin is NaN in those bins",
            int(np.isnan(xi_AB).sum()),
        )

    logM_out_arr = np.ascontiguousarray(logM_out, dtype=np.float64)
    _, xi_bin = cumulative_to_binned_xi_hh(
        logM_in, nh_cum, xi_AB, logM_out_arr, dlogM,
        interp_method=interp_method,
        symmetrize_input=symmetrize_input,
        symmetrize_output=symmetrize_output,
        asymmetry_atol=asymmetry_atol,
        asymmetry_policy=asymmetry_policy,
    )

    # Recover bin halo counts (Dn) at the new bin edges, for n1/n2.
    half = 0.5 * float(dlogM)
    edges_lo = logM_out_arr - half
    edges_hi = logM_out_arr + half
    nh_lo = _interp_nh_cum(logM_in, nh_cum, edges_lo)
    nh_hi = _interp_nh_cum(logM_in, nh_cum, edges_hi)
    n_per_bin = np.rint((nh_lo - nh_hi) * volume).astype(np.int64)

    # ---- repackage as a new XiHHRecord ----
    N_M = logM_out_arr.size
    new_mass_bins = np.column_stack([edges_lo, edges_hi])

    pair_keys: list[str] = []
    pair_indices: list[tuple[int, int]] = []
    rows_xi = []
    rows_log10M1 = []
    rows_log10M2 = []
    rows_n1 = []
    rows_n2 = []

    for i in range(N_M):
        for j in range(i, N_M):
            pair_keys.append(f"M{i}_M{j}")
            pair_indices.append((i, j))
            rows_xi.append(xi_bin[:, i, j])
            rows_log10M1.append(new_mass_bins[i])
            rows_log10M2.append(new_mass_bins[j])
            rows_n1.append(n_per_bin[i])
            rows_n2.append(n_per_bin[j])

    K = xi_threshold_rec.r_edges.size - 1 if xi_threshold_rec.r_edges.size \
        else xi_bin.shape[0]
    if xi_threshold_rec.r_edges.size:
        r_centres = 0.5 * (xi_threshold_rec.r_edges[:-1]
                           + xi_threshold_rec.r_edges[1:])
        new_r = np.broadcast_to(r_centres[None, :], (len(pair_keys), K)).copy()
    else:
        new_r = np.empty((len(pair_keys), 0))

    new_attrs = dict(xi_threshold_rec.attrs)
    new_attrs.update({
        "rebin_method": "cumulative_to_binned",
        "rebin_interp": interp_method,
        "rebin_dlogM": float(dlogM),
    })

    return replace(
        xi_threshold_rec,
        mass_bins=new_mass_bins,
        pairs=pair_keys,
        pair_indices=np.array(pair_indices, dtype=int),
        r=new_r,
        xi=np.stack(rows_xi).astype(np.float64),
        n1=np.asarray(rows_n1, dtype=int),
        n2=np.asarray(rows_n2, dtype=int),
        log10M1=np.stack(rows_log10M1),
        log10M2=np.stack(rows_log10M2),
        attrs=new_attrs,
    )