Representations

The object you call predict() on rarely models the raw quantity directly: each emulator targets the most learnable representation and reconstructs the physical quantity on the way out. Knowing the representation explains the accuracy behaviour and the caveats.

ΛCDM

| Property | Emulated target | Reconstruction |
|---|---|---|
| hmf | weighted-PCA + per-component GP on n(>M) | direct |
| pk_mm | boost over EH98 linear theory | low-k limb anchored to P_lin below k = 0.03 h/Mpc, blended to fully-emulated by 0.1 |
| xi_mm | EH98 ξ_lin + emulated Δξ | signed sum; ξ crosses zero near the BAO |
| b_cum | peak-height b(ν) polynomial + residual GP | Tinker-2010 anchored above log₁₀M = 14 |
| vel_* | per-moment transformed central moment | see Velocity |
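
The pk_mm row can be illustrated with a minimal sketch of a low-k anchor. The two wavenumbers come from the table; everything else is an assumption made for illustration: the smoothstep weight in log k, the function names `blend_weight` and `reconstruct_pk`, and the placeholder arrays. The real blending function is not stated here.

```python
import numpy as np

K_ANCHOR = 0.03  # h/Mpc: at and below this, the result is pure linear theory
K_FULL = 0.1     # h/Mpc: at and above this, the result is fully emulated


def blend_weight(k):
    """Weight on the emulated boost: 0 below K_ANCHOR, 1 above K_FULL.

    The smoothstep in log k is an assumption; only the two endpoints
    of the blend are taken from the text.
    """
    t = (np.log(k) - np.log(K_ANCHOR)) / (np.log(K_FULL) - np.log(K_ANCHOR))
    t = np.clip(t, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


def reconstruct_pk(k, p_lin, boost):
    """Compose P(k) = P_lin(k) * B_eff(k), with B_eff -> 1 on large scales."""
    w = blend_weight(k)
    boost_eff = 1.0 + w * (boost - 1.0)
    return p_lin * boost_eff


k = np.array([0.01, 0.03, 0.05, 0.1, 1.0])
p_lin = np.full_like(k, 100.0)  # placeholder linear spectrum, not real data
boost = np.full_like(k, 1.2)    # placeholder emulated boost, not real data
pk = reconstruct_pk(k, p_lin, boost)
```

With this weighting the first two entries of `pk` equal the linear spectrum, the last two equal the fully boosted value, and the entry at k = 0.05 h/Mpc lies in between.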

Per-moment velocity transform

The velocity moments span a huge dynamic range and some change sign, so each is emulated under a per-moment transform:

  • log for the strictly positive moments (c20, c02, c40, c04, c22);
  • arcsinh (sign-preserving) for the signed moments (m10, c12, c30).

This single change brought the high-order moments from an interior leave-one-out (LOO) χ of ≈ 3–9 down to ≈ 0.4–0.7. The transform lives in core/emulator.py (in PCAGPEmulator and PerBinGPEmulator) and stays backward-compatible with old pickles through a getattr fallback.
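
As a rough illustration of the per-moment transform, the sketch below maps each moment into its training space and back. The moment names are taken from the list above; the function names `forward` and `inverse` are hypothetical, and any scaling applied inside the real arcsinh transform is not shown because it is not described here.

```python
import numpy as np

POSITIVE = {"c20", "c02", "c40", "c04", "c22"}  # strictly positive moments
SIGNED = {"m10", "c12", "c30"}                  # moments that change sign


def forward(name, y):
    """Map a velocity moment into the space the GP would be trained in."""
    y = np.asarray(y, dtype=float)
    if name in POSITIVE:
        return np.log(y)       # compresses the dynamic range, needs y > 0
    if name in SIGNED:
        return np.arcsinh(y)   # log-like for large |y|, linear near 0, keeps sign
    raise KeyError(f"unknown moment: {name}")


def inverse(name, z):
    """Undo forward() to recover the physical moment."""
    z = np.asarray(z, dtype=float)
    if name in POSITIVE:
        return np.exp(z)
    if name in SIGNED:
        return np.sinh(z)
    raise KeyError(f"unknown moment: {name}")


signed_values = np.array([-250.0, 0.0, 3.0])    # placeholder values
positive_values = np.array([1e-3, 1.0, 4e4])    # placeholder values
roundtrip_signed = inverse("c30", forward("c30", signed_values))
roundtrip_positive = inverse("c40", forward("c40", positive_values))
```

Both round trips return the input values to floating-point precision, and arcsinh preserves the sign of every entry, which is why it suits the moments that cross zero.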

f(R) — seed-paired boosts

The 64 f(R) design models reuse the same five initial-condition seeds as their ΛCDM twins. Differencing each matched-seed pair cancels most of the cosmic variance, so the f(R) artifacts emulate the modified-gravity boost rather than the absolute quantity and compose it onto the pinned ΛCDM artifact. With θ_5 the five-parameter f(R) vector and θ_5[:4] its first four (ΛCDM) components, the multiplicative case reads:

\[X_{f(R)}(\theta_5) = B(\theta_5)\, X_{\Lambda\mathrm{CDM}}(\theta_5[:4])\]

| Property | Boost form | Class |
|---|---|---|
| hmf | multiplicative ratio | MGBoostEmulator |
| pk_mm | multiplicative ratio | MGBoostEmulator |
| xi_mm | additive Δξ (ξ crosses zero) | MGBoostEmulator |
| vel_* (7 of 8) | additive δ (moments cross zero), arcsinh | VelMomentBoostEmulator |
| vel_m10 | direct 5-parameter GP | VelMomentEmulator |
| b_cum | direct 5-parameter peak-height GP | PeakHeightEmulator |
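
The two boost forms in the table compose differently onto the ΛCDM prediction. The sketch below is a toy under stated assumptions: `predict_fr` and the lambda stand-ins are hypothetical and return placeholder numbers, not output from the real MGBoostEmulator or VelMomentBoostEmulator. It only shows the θ_5[:4] slicing and the multiplicative versus additive composition.

```python
import numpy as np


def predict_fr(theta5, base_fn, boost_fn, form):
    """Compose an f(R) prediction from a ΛCDM base and an emulated boost.

    base_fn takes the first four parameters; boost_fn takes all five.
    """
    theta5 = np.asarray(theta5, dtype=float)
    base = base_fn(theta5[:4])   # ΛCDM artifact sees only the ΛCDM subset
    boost = boost_fn(theta5)     # boost artifact sees the full vector
    if form == "multiplicative":
        return boost * base      # hmf, pk_mm: ratio times base
    if form == "additive":
        return base + boost      # xi_mm, vel_*: a ratio is ill-defined at zero
    raise ValueError(f"unknown boost form: {form}")


theta5 = [0.3, 0.8, 0.7, 0.96, 1e-5]  # placeholder parameter vector

# Placeholder stand-ins for the pinned base and the boost emulator.
pk_fr = predict_fr(
    theta5,
    base_fn=lambda t4: np.array([200.0, 50.0]),
    boost_fn=lambda t5: np.array([1.05, 1.20]),
    form="multiplicative",
)
xi_fr = predict_fr(
    theta5,
    base_fn=lambda t4: np.array([0.020, -0.001]),
    boost_fn=lambda t5: np.array([0.002, 0.003]),
    form="additive",
)
```

In the additive example the second ΛCDM value is negative while the composed value is positive, which is the situation where a multiplicative ratio would be unusable.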

The boost artifact embeds and pins the sha256 of its ΛCDM base, so retraining the base forces a re-pin. Whether each property prefers the boost or a direct 5-parameter GP was decided empirically (fRn1_boost_evidence.py, velocity_frn1_evidence.py): the boost wins where seed-pairing cancels variance, and the direct GP wins where the target is intrinsically smooth in θ (b_cum, m10).
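
The pinning idea can be sketched with the standard-library hashlib. The artifact layout (a dict with a `base_sha256` key) and the helper names are assumptions for illustration; the real artifact format is not described here, and in-memory bytes stand in for the serialized files.

```python
import hashlib


def sha256_of(payload: bytes) -> str:
    """Hex digest of a serialized artifact."""
    return hashlib.sha256(payload).hexdigest()


def make_boost_artifact(boost_payload: bytes, base_payload: bytes) -> dict:
    """Bundle the boost with the digest of the ΛCDM base it was trained against."""
    return {"boost": boost_payload, "base_sha256": sha256_of(base_payload)}


def check_pin(artifact: dict, base_payload: bytes) -> None:
    """Refuse to compose the boost onto a base other than the pinned one."""
    actual = sha256_of(base_payload)
    if actual != artifact["base_sha256"]:
        raise RuntimeError(
            "ΛCDM base differs from the pinned digest; the boost must be re-pinned"
        )


base_v1 = b"lcdm-base-weights-v1"        # placeholder bytes, not a real artifact
base_v2 = b"lcdm-base-weights-v2"        # stands in for a retrained base
artifact = make_boost_artifact(b"boost-weights", base_v1)

check_pin(artifact, base_v1)             # passes silently
try:
    check_pin(artifact, base_v2)
    pin_held = True
except RuntimeError:
    pin_held = False                     # retrained base is rejected
```

Hashing the serialized base rather than a version string means any change to the base, however small, invalidates the pin.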

Why m10 and b_cum go direct

The mean infall m10 and the cumulative bias b_cum are smooth, high-S/N functions of θ; the seed-paired difference adds noise without cancelling much, so the direct five-parameter GP wins.
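
A toy numerical experiment shows both regimes. It uses synthetic Gaussian noise with made-up amplitudes, not simulation data: in the first case the two members of a pair share a large seed-dependent fluctuation, in the second they only carry small independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000          # number of synthetic seed pairs
sigma_seed = 1.0   # shared, seed-dependent fluctuation (toy amplitude)
sigma_own = 0.1    # independent noise of each run (toy amplitude)

# Case 1: seed noise is shared, as for variance-dominated targets.
shared = rng.normal(0.0, sigma_seed, n)
x_lcdm = 10.0 + shared + rng.normal(0.0, sigma_own, n)
x_fr = 11.0 + shared + rng.normal(0.0, sigma_own, n)
var_direct_shared = x_fr.var()             # expectation: sigma_seed**2 + sigma_own**2
var_paired_shared = (x_fr - x_lcdm).var()  # expectation: 2 * sigma_own**2

# Case 2: nothing shared, as for a smooth, high-S/N target.
y_lcdm = 10.0 + rng.normal(0.0, sigma_own, n)
y_fr = 11.0 + rng.normal(0.0, sigma_own, n)
var_direct_indep = y_fr.var()              # expectation: sigma_own**2
var_paired_indep = (y_fr - y_lcdm).var()   # expectation: 2 * sigma_own**2
```

When the shared term dominates, the paired difference has far less scatter than the direct quantity. When there is little shared noise to cancel, differencing roughly doubles the variance, which is the regime where a direct GP is the better choice.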

Anchoring and masks

  • pk_mm low-k anchor closes the frozen-seed offset (below) on large scales by tying the boost to linear theory.
  • ξ / ξ_hh have no working large-scale anchor — an r-space anchor and a Hankel-of-anchored-P(k) hybrid were both tested and rejected (they degrade the BAO). The frozen-seed offset therefore remains open there; see Caveats.
  • ξ fractional error is meaningless where |ξ| < 0.01 (the BAO zero crossing); the suite masks there and quotes χ vs the across-box SEM instead.
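
The masking rule in the last bullet can be sketched as follows. The 0.01 threshold is from the text; the definition of χ as the residual divided by the across-box standard error of the mean, the function name `xi_metrics`, and the three-box example array are assumptions for illustration.

```python
import numpy as np

XI_FLOOR = 0.01  # below this |ξ| the fractional error is not quoted


def xi_metrics(xi_pred, xi_boxes):
    """Return (fractional error, chi) for a predicted ξ(r).

    xi_boxes has shape (n_boxes, n_r); its mean is taken as the truth and
    its standard error of the mean (SEM) as the noise level.
    """
    xi_true = xi_boxes.mean(axis=0)
    sem = xi_boxes.std(axis=0, ddof=1) / np.sqrt(xi_boxes.shape[0])
    chi = (xi_pred - xi_true) / sem

    # Fractional error only where |ξ| is safely away from zero; NaN elsewhere.
    frac = np.full_like(xi_true, np.nan)
    keep = np.abs(xi_true) >= XI_FLOOR
    frac[keep] = xi_pred[keep] / xi_true[keep] - 1.0
    return frac, chi


xi_boxes = np.array([
    [1.0, 0.004, -0.020],
    [1.2, 0.006, -0.030],
    [1.1, 0.005, -0.025],
])                                          # placeholder measurements
xi_pred = np.array([1.122, 0.006, -0.025])  # placeholder prediction
frac, chi = xi_metrics(xi_pred, xi_boxes)
```

The middle bin sits at |ξ| = 0.005, so its fractional error is masked, while χ stays finite and well defined in every bin.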