Representations

The quantity an emulator learns is rarely the raw quantity that predict() returns: each emulator targets the most learnable representation and reconstructs the physical quantity on the way out. Knowing the representation explains both the accuracy behaviour and the caveats.

ΛCDM

| Property | Emulated target | Reconstruction |
|---|---|---|
| `hmf` | weighted PCA + per-component GP on n(>M) | direct |
| `pk_mm` | boost over EH98 linear theory | low-k end anchored to P_lin below k = 0.03 h/Mpc, blended to fully emulated by k = 0.1 h/Mpc |
| `xi_mm` | EH98 ξ_lin + emulated Δξ | signed sum (ξ crosses zero near the BAO) |
| `b_cum` | peak-height b(ν) polynomial + residual GP | anchored to Tinker-2010 above log₁₀M = 14 |
| `vel_*` | per-moment transformed central moment | see Velocity |
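
The `pk_mm` reconstruction can be pictured as a blend between pure linear theory and the emulated boost. The sketch below only illustrates that idea: the smoothstep-in-ln k weight, the names `blend_weight` and `reconstruct_pk`, and the assumption that `p_lin` is the same linear spectrum the boost is defined against are all hypothetical; only the two scales come from the table.

```python
import math

K_ANCHOR = 0.03  # h/Mpc: below this, P(k) is tied to linear theory
K_FULL = 0.1     # h/Mpc: above this, the emulated boost is used unmodified


def blend_weight(k: float) -> float:
    """Weight on the emulated boost: 0 below K_ANCHOR, 1 above K_FULL.

    Hypothetical choice: a cubic smoothstep in ln k between the two scales.
    """
    if k <= K_ANCHOR:
        return 0.0
    if k >= K_FULL:
        return 1.0
    t = math.log(k / K_ANCHOR) / math.log(K_FULL / K_ANCHOR)
    return t * t * (3.0 - 2.0 * t)


def reconstruct_pk(k: float, p_lin: float, boost: float) -> float:
    """Blend between no boost (1, pure linear theory) and the emulated boost."""
    w = blend_weight(k)
    effective_boost = (1.0 - w) * 1.0 + w * boost
    return effective_boost * p_lin
```

Any monotone weight that is exactly 0 below the anchor scale and exactly 1 above the upper scale has the same qualitative effect; the smoothstep just avoids a kink at either end.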

Per-moment velocity transform

The velocity moments span a huge dynamic range and some change sign, so each is emulated under a per-moment transform:

- log for the strictly positive moments (c20, c02, c40, c04, c22);
- arcsinh (sign-preserving) for the signed moments (m10, c12, c30).
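
As a minimal sketch, the forward and inverse transforms could look like the following, assuming plain log and arcsinh with no rescaling. The function names and the two moment sets are hypothetical, and core/emulator.py may well scale a moment before applying arcsinh.

```python
import math

# Moment names as listed above; which transform each gets follows the text.
POSITIVE_MOMENTS = {"c20", "c02", "c40", "c04", "c22"}
SIGNED_MOMENTS = {"m10", "c12", "c30"}


def forward(name: str, value: float) -> float:
    """Map a physical moment to the space the GP is trained in."""
    if name in POSITIVE_MOMENTS:
        return math.log(value)  # requires value > 0
    if name in SIGNED_MOMENTS:
        return math.asinh(value)  # odd function, so the sign is preserved
    raise KeyError(f"unknown moment {name!r}")


def inverse(name: str, z: float) -> float:
    """Map a GP prediction back to the physical moment."""
    if name in POSITIVE_MOMENTS:
        return math.exp(z)
    if name in SIGNED_MOMENTS:
        return math.sinh(z)
    raise KeyError(f"unknown moment {name!r}")
```

arcsinh behaves like x near zero and like sign(x)·ln(2|x|) for large |x|, so it compresses the dynamic range much as log does while remaining defined through a sign change.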

This single change brought the high-order moments from an interior leave-one-out (LOO) χ ≈ 3–9 down to ≈ 0.4–0.7. The transform lives in core/emulator.py, in PCAGPEmulator and PerBinGPEmulator; pickles saved before the transform existed still load through a getattr fallback.
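
A getattr fallback of this kind works because default unpickling restores an object's saved `__dict__` without calling `__init__`, so an attribute introduced after the pickle was written is simply absent. A toy sketch of the pattern; the class `PerMomentGP` and the attributes `_transform` and `_raw_predict` are hypothetical, not the real names in core/emulator.py:

```python
import math


class PerMomentGP:
    """Toy stand-in for an emulator whose pickles may predate the transform."""

    def __init__(self, transform: str = "identity"):
        self._transform = transform

    def _raw_predict(self, theta):
        # Placeholder for the GP mean in the transformed space.
        return sum(theta)

    def predict(self, theta):
        # Old pickles lack `_transform`; fall back to "identity", which
        # reproduces the behaviour they were trained with.
        transform = getattr(self, "_transform", "identity")
        z = self._raw_predict(theta)
        if transform == "log":
            return math.exp(z)
        if transform == "arcsinh":
            return math.sinh(z)
        return z


# Simulate loading an old pickle: the instance is created with __new__ and
# __init__ never runs, so `_transform` is never set.
old = PerMomentGP.__new__(PerMomentGP)
```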

f(R): seed-paired boosts

The 64 f(R) design models reuse the same five initial-condition seeds as their ΛCDM twins. Taking the ratio or difference of each matched-seed pair cancels most of the cosmic variance, so the f(R) artifacts emulate the modified-gravity boost rather than the absolute quantity, and compose it onto the pinned ΛCDM artifact:

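\[X_{f(R)}(\theta_5) = B(\theta_5)\, X_{\Lambda\mathrm{CDM}}(\theta_5[:4]) \quad \text{(multiplicative)}, \qquad X_{f(R)}(\theta_5) = X_{\Lambda\mathrm{CDM}}(\theta_5[:4]) + \Delta(\theta_5) \quad \text{(additive)}\]

where θ_5[:4] denotes the first four (ΛCDM) parameters of the five-parameter vector.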
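
The composition step can be sketched as a small function that evaluates the pinned base on the first four parameters and the boost on all five. `compose_fr` and its `form` strings are hypothetical; for the velocity moments the text leaves open whether δ is added before or after the arcsinh transform, so they are deliberately not covered here.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


def compose_fr(theta5: Sequence[float], base: Predictor, boost: Predictor,
               form: str) -> float:
    """Return the f(R) prediction from a ΛCDM base and a modified-gravity boost.

    theta5[:4] (the ΛCDM parameters) go to the base; the full theta5,
    including the fifth, modified-gravity parameter, goes to the boost.
    """
    if len(theta5) != 5:
        raise ValueError("expected 5 parameters")
    x_lcdm = base(theta5[:4])
    b = boost(theta5)
    if form == "multiplicative":  # e.g. hmf, pk_mm: ratio boost
        return b * x_lcdm
    if form == "additive":        # e.g. xi_mm: difference boost
        return x_lcdm + b
    raise ValueError(f"unknown boost form {form!r}")
```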

| Property | Boost form | Class |
|---|---|---|
| `hmf` | multiplicative ratio | `MGBoostEmulator` |
| `pk_mm` | multiplicative ratio | `MGBoostEmulator` |
| `xi_mm` | additive Δξ (ξ crosses zero) | `MGBoostEmulator` |
| `vel_*` (7 of 8) | additive δ (moments cross zero), arcsinh transform | `VelMomentBoostEmulator` |
| `vel_m10` | no boost: direct 5-parameter GP | `VelMomentEmulator` |
| `b_cum` | no boost: direct 5-parameter peak-height GP | `PeakHeightEmulator` |

The boost artifact embeds and pins the sha256 of its ΛCDM base, so retraining the base forces a re-pin. Whether each property is better served by the boost or by a direct 5-parameter GP was decided empirically (fRn1_boost_evidence.py, velocity_frn1_evidence.py): the boost wins where seed-pairing cancels variance, and the direct GP wins where the target is intrinsically smooth in θ (b_cum, m10).
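
A pin check of this kind could be written with `hashlib`. This is a minimal sketch, assuming the pin is the hex sha256 of the base artifact file on disk; `sha256_of` and `check_base_pin` are hypothetical names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex sha256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_base_pin(pinned_sha256: str, base_path: str) -> None:
    """Refuse to compose a boost onto a base it was not trained against."""
    actual = sha256_of(base_path)
    if actual != pinned_sha256:
        raise RuntimeError(
            f"ΛCDM base {base_path} has sha256 {actual[:12]}..., but the boost "
            f"was pinned to {pinned_sha256[:12]}...; retrain or re-pin the boost"
        )
```

Failing loudly is the point of the design: a boost silently composed onto a retrained base would no longer cancel the same seed noise it was fitted against.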

Why m10 and b_cum go direct

The mean infall m10 and the cumulative bias b_cum are smooth, high-S/N functions of θ; the seed-paired difference adds noise without cancelling much, so the direct five-parameter GP wins.
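
Whether pairing helps comes down to the correlation between the two runs. In a toy model where each run has unit variance and the matched-seed pair has correlation ρ, Var(a − b) = 2(1 − ρ), so the paired difference is quieter than a single run only when ρ > 0.5. The Monte Carlo sketch below checks this; it is an illustration under that assumed Gaussian model, not the evidence scripts, and `paired_difference_variance` is a hypothetical name.

```python
import random
import statistics


def paired_difference_variance(rho: float, n: int = 20000, seed: int = 0):
    """Return (variance of a single run, variance of a seed-paired difference).

    Toy model: each run = shared seed noise + independent noise, scaled so
    that Var(a) = Var(b) = 1 and Cov(a, b) = rho.
    """
    rng = random.Random(seed)
    single, diff = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        a = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        b = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
        single.append(a)
        diff.append(a - b)
    return statistics.pvariance(single), statistics.pvariance(diff)
```

With ρ = 0.95 the difference has roughly a tenth of the single-run variance; with ρ = 0.2 it has more, the regime the text describes for b_cum and m10.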

Anchoring and masks

- pk_mm: the low-k anchor closes the frozen-seed offset (see Caveats) on large scales by tying the boost to linear theory.
- ξ / ξ_hh: there is no working large-scale anchor. An r-space anchor and a hybrid built from the Hankel transform of the anchored P(k) were both tested and rejected because they degrade the BAO, so the frozen-seed offset remains open there; see Caveats.
- ξ fractional error: meaningless where |ξ| < 0.01 (the zero crossing near the BAO), so the suite masks those bins and instead quotes χ against the across-box standard error of the mean (SEM).
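
The masking rule can be sketched per r bin: χ is computed everywhere against the across-box SEM, while the fractional error is withheld where |ξ| < 0.01. The function name `xi_errors`, the input layout, and the choice to apply the threshold to the across-box mean rather than to the prediction are all assumptions.

```python
import math
import statistics

XI_FLOOR = 0.01  # below this |ξ|, fractional error is not quoted


def xi_errors(xi_pred, xi_boxes):
    """Per-bin error metrics for an emulated ξ(r).

    xi_pred: emulated ξ in each r bin.
    xi_boxes: per-box measurements, one list of bins per simulation box
              (at least two boxes, so the standard deviation is defined).
    Returns (fractional_error, chi); fractional_error is None in masked bins.
    """
    n_boxes = len(xi_boxes)
    frac, chi = [], []
    for i, pred in enumerate(xi_pred):
        samples = [box[i] for box in xi_boxes]
        mean = statistics.fmean(samples)
        sem = statistics.stdev(samples) / math.sqrt(n_boxes)  # across-box SEM
        chi.append((pred - mean) / sem)
        frac.append(None if abs(mean) < XI_FLOOR else (pred - mean) / mean)
    return frac, chi
```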