Representations
predict() rarely emulates the raw quantity directly: each emulator targets
the most learnable representation and reconstructs the physical quantity on the
way out. Knowing the representation explains the accuracy behaviour and the
caveats.
ΛCDM
| Property | Emulated target | Reconstruction |
|---|---|---|
| hmf | weighted-PCA + per-component GP on n(>M) | direct |
| pk_mm | boost over EH98 linear theory | low-k limb anchored to P_lin below k = 0.03 h/Mpc, blended to fully-emulated by 0.1 |
| xi_mm | EH98 ξ_lin + emulated Δξ | signed sum; ξ crosses zero near the BAO |
| b_cum | peak-height b(ν) polynomial + residual GP | Tinker-2010 anchored above log₁₀M = 14 |
| vel_* | per-moment transformed central moment | see Velocity |
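The pk_mm reconstruction in the table can be illustrated with a minimal sketch. The thresholds 0.03 and 0.1 h/Mpc come from the table; everything else is an assumption made for illustration: the function names `blend_weight` and `anchored_pk` are hypothetical, and the smoothstep-in-log-k blend shape is a guess, since the page does not state the actual blend function.

```python
import math

K_ANCHOR = 0.03  # h/Mpc: at or below this, pure linear theory (from the table)
K_FULL = 0.1     # h/Mpc: at or above this, fully emulated (from the table)


def blend_weight(k):
    """Weight on the emulated boost: 0 below K_ANCHOR, 1 above K_FULL."""
    if k <= K_ANCHOR:
        return 0.0
    if k >= K_FULL:
        return 1.0
    # Assumption: smoothstep in log k; the real blend shape may differ.
    t = math.log(k / K_ANCHOR) / math.log(K_FULL / K_ANCHOR)
    return t * t * (3.0 - 2.0 * t)


def anchored_pk(k, p_lin, boost):
    """Pull the emulated boost toward 1 (so P -> P_lin) on large scales."""
    w = blend_weight(k)
    return p_lin * (1.0 + w * (boost - 1.0))
```

Under these assumptions, a call at k = 0.01 h/Mpc returns `p_lin` unchanged whatever the emulated boost is, and a call at k = 0.2 h/Mpc returns the plain product of `p_lin` and the boost.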
Per-moment velocity transform
The velocity moments span a huge dynamic range and some change sign, so each is emulated under a per-moment transform:
- log for the strictly positive moments (c20, c02, c40, c04, c22);
- arcsinh (sign-preserving) for the signed moments (m10, c12, c30).
This is the single change that brought the high-order moments from interior-LOO
χ ≈ 3–9 down to ≈ 0.4–0.7. The transform lives in core/emulator.py
(PCAGPEmulator/PerBinGPEmulator, backward-compatible via a getattr fallback
for old pickles).
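A minimal sketch of the per-moment transform and its inverse follows. It is not the code in core/emulator.py: the function names `forward` and `inverse` are hypothetical, and the arcsinh is applied without any scale factor, which is an assumption (the real transform may rescale its argument first). Only the assignment of moments to log or arcsinh is taken from the list above.

```python
import math

LOG_MOMENTS = {"c20", "c02", "c40", "c04", "c22"}  # strictly positive
ASINH_MOMENTS = {"m10", "c12", "c30"}              # signed


def forward(name, value):
    """Map a physical moment to the space the emulator is trained in."""
    if name in LOG_MOMENTS:
        return math.log(value)    # raises ValueError if value <= 0
    if name in ASINH_MOMENTS:
        return math.asinh(value)  # assumption: no scale factor
    raise KeyError(f"unknown moment: {name}")


def inverse(name, latent):
    """Map an emulator output back to the physical moment."""
    if name in LOG_MOMENTS:
        return math.exp(latent)
    if name in ASINH_MOMENTS:
        return math.sinh(latent)
    raise KeyError(f"unknown moment: {name}")
```

The arcsinh behaves like a logarithm for large |x| and like the identity near zero, which is why it can compress the dynamic range of a moment that changes sign.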
f(R): seed-paired boosts
The 64 f(R) design models reuse the same five initial-condition seeds as their ΛCDM twins. Differencing the matched-seed pair cancels most of the cosmic variance, so the f(R) artifacts emulate the modified-gravity boost, not the absolute quantity, and compose it onto the pinned ΛCDM artifact:
| Property | Boost form | Class |
|---|---|---|
| hmf | multiplicative ratio | MGBoostEmulator |
| pk_mm | multiplicative ratio | MGBoostEmulator |
| xi_mm | additive Δξ (ξ crosses zero) | MGBoostEmulator |
| vel_* (7 of 8) | additive δ (moments cross zero), arcsinh | VelMomentBoostEmulator |
| vel_m10 | direct 5-parameter GP | VelMomentEmulator |
| b_cum | direct 5-parameter peak-height GP | PeakHeightEmulator |
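The two boost forms in the table can be sketched as below. This is an illustration under stated assumptions, not the MGBoostEmulator implementation: `paired_boost` and `compose` are hypothetical names, the quantities are scalars for brevity, and averaging the per-seed ratios (rather than, say, taking a ratio of seed means) is an assumption.

```python
def paired_boost(fr_by_seed, lcdm_by_seed, form):
    """Seed-averaged boost from matched-seed pairs (same seed order in both)."""
    if len(fr_by_seed) != len(lcdm_by_seed):
        raise ValueError("seed lists must be paired one-to-one")
    if form == "multiplicative":
        per_seed = [f / l for f, l in zip(fr_by_seed, lcdm_by_seed)]
    elif form == "additive":
        per_seed = [f - l for f, l in zip(fr_by_seed, lcdm_by_seed)]
    else:
        raise ValueError(f"unknown boost form: {form!r}")
    # Assumption: average the per-seed boosts; the real pipeline may differ.
    return sum(per_seed) / len(per_seed)


def compose(base_prediction, boost, form):
    """Apply an emulated boost to the ΛCDM base prediction."""
    if form == "multiplicative":
        return base_prediction * boost
    if form == "additive":
        return base_prediction + boost
    raise ValueError(f"unknown boost form: {form!r}")
```

For example, with made-up ΛCDM values of 90 and 110 for two seeds and f(R) values of 99 and 121 for the same seeds, the per-seed ratio is 1.1 in both cases even though the absolute values differ by about 20% between seeds. That shared seed-to-seed scatter is what the pairing removes.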
The boost artifact embeds and pins the sha256 of its ΛCDM base, so retraining
the base forces a re-pin. Whether each property prefers the boost or a direct
5-parameter GP was decided empirically (fRn1_boost_evidence.py,
velocity_frn1_evidence.py): the boost wins where the seed-pairing cancels
variance, the direct GP wins where the target is intrinsically smooth in θ
(b_cum, m10).
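The sha256 pin can be pictured with a short sketch using only the standard library. How the real artifact stores and checks the hash is not described here, so `sha256_of`, `check_pin`, and the idea of passing the pinned digest as a string are all hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex sha256 of a file, read in chunks so large artifacts fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check_pin(base_path, pinned_sha256):
    """Refuse to compose a boost onto a base it was not trained against."""
    actual = sha256_of(base_path)
    if actual != pinned_sha256:
        raise RuntimeError(
            f"ΛCDM base changed: pinned {pinned_sha256[:12]}, found {actual[:12]}; "
            "re-pin the boost artifact"
        )
```

A check of this kind fails loudly after a base retrain, because any change to the base file changes its digest.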
Why m10 and b_cum go direct
The mean infall m10 and the cumulative bias b_cum are smooth, high-S/N
functions of θ; the seed-paired difference adds noise without cancelling
much, so the direct five-parameter GP wins.
Anchoring and masks
- pk_mm low-k anchor closes the frozen-seed offset (below) on large scales by tying the boost to linear theory.
- ξ / ξ_hh have no working large-scale anchor: an r-space anchor and a Hankel-of-anchored-P(k) hybrid were both tested and rejected (they degrade the BAO). The frozen-seed offset therefore remains open there; see Caveats.
- ξ fractional error is meaningless where |ξ| < 0.01 (the BAO zero crossing); the suite masks there and quotes χ vs the across-box SEM instead.
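The masking rule in the last bullet can be sketched as follows. The 0.01 threshold is from the text; the rest is assumed for illustration: `xi_error` and `chi_vs_sem` are hypothetical names, the mask is applied to the across-box mean rather than to the prediction, and the SEM is the sample standard deviation divided by the square root of the number of boxes.

```python
import math
import statistics

XI_FLOOR = 0.01  # below this |ξ|, the fractional error is not quoted


def chi_vs_sem(pred, boxes):
    """Deviation of the prediction from the box mean, in units of the SEM."""
    mean = statistics.fmean(boxes)
    sem = statistics.stdev(boxes) / math.sqrt(len(boxes))
    return (pred - mean) / sem


def xi_error(pred, boxes):
    """Return ('frac', value) away from the zero crossing, ('chi', value) near it."""
    mean = statistics.fmean(boxes)
    if abs(mean) < XI_FLOOR:  # assumption: mask on the measured mean
        return ("chi", chi_vs_sem(pred, boxes))
    return ("frac", pred / mean - 1.0)
```

Near the zero crossing the denominator of the fractional error goes to zero, so even a tiny absolute miss looks enormous. Dividing by the across-box SEM instead keeps the metric finite and comparable across r.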