User Guide

POPS Regression

POPSRegression is a Bayesian regression method designed for low-noise data where standard Bayesian approaches underestimate uncertainty due to model misspecification.

Background

In many scientific applications, a surrogate model (e.g. a polynomial) is fit to near-deterministic data from simulations. When the model class cannot represent the target function exactly, standard Bayesian regression methods such as BayesianRidge capture only epistemic uncertainty (which decays as more data is added) and aleatoric uncertainty (noise, which is negligible here). They miss the dominant source of error, misspecification uncertainty: the systematic error that comes from using the wrong model class.
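The effect described above can be reproduced with a minimal NumPy sketch. It is not part of POPSRegression: it uses ordinary least squares as a stand-in for BayesianRidge, a noise-free target, and hypothetical helper names (`target`, `features`, `fit_and_score`). The point it illustrates is that the epistemic standard deviation of the fitted mean shrinks as training points are added, while the true error of a degree-4 polynomial does not.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander


def target(x):
    # Noise-free "simulation" output that a quartic cannot represent.
    return np.sin(x) * x


def features(x, degree=4):
    # Monomial features on x rescaled to [-1, 1] for numerical stability.
    return polyvander(x / 10.0, degree)


def fit_and_score(n_train, x_test):
    """Return (mean epistemic std, mean true |error|) on x_test."""
    x_train = np.linspace(-10, 10, n_train)
    X, y = features(x_train), target(x_train)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    # Classical least-squares parameter covariance (assumed stand-in
    # for the epistemic part of a Bayesian linear model).
    sigma2 = resid @ resid / (n_train - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    X_test = features(x_test)
    ep_std = np.sqrt(np.einsum("ij,jk,ik->i", X_test, cov, X_test))
    true_err = np.abs(X_test @ w - target(x_test))
    return ep_std.mean(), true_err.mean()


x_test = np.linspace(-9.5, 9.5, 200)
std_small, err_small = fit_and_score(40, x_test)
std_large, err_large = fit_and_score(4000, x_test)
print(f"n=40:   epistemic std {std_small:.3f}, true error {err_small:.3f}")
print(f"n=4000: epistemic std {std_large:.3f}, true error {err_large:.3f}")
```

With 100 times more data the epistemic term falls by roughly a factor of ten, yet the true error stays at about the same level. That gap is what misspecification uncertainty is meant to cover.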

POPS (Pointwise Optimal Parameter Sets) corrects this by:

  1. Fitting a BayesianRidge model to obtain the posterior mean weights.

  2. Computing pointwise corrections: for each training point, finding the parameter perturbation that would fit that point exactly.

  3. Constructing a posterior over these corrections (via a hypercube in PCA space or as a raw ensemble).

  4. Using this posterior to estimate misspecification uncertainty in predictions.
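Step 2 can be sketched in plain NumPy. The sketch below is an illustration under stated assumptions, not the library's implementation: it assumes a ridge-regularized least-squares fit in place of BayesianRidge, and it picks, among all perturbations that fit point i exactly, the one that increases the global quadratic loss the least. The function name `fit_with_pointwise_corrections` and the `ridge` value are hypothetical.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander


def fit_with_pointwise_corrections(X, y, ridge=1e-8):
    """Return ridge weights w and an (n_samples, n_features) array of
    corrections such that x_i @ (w + deltas[i]) == y_i for every i."""
    A = X.T @ X + ridge * np.eye(X.shape[1])       # loss Hessian
    w = np.linalg.solve(A, X.T @ y)                # global minimizer
    A_inv_xT = np.linalg.solve(A, X.T)             # column i is A^{-1} x_i
    leverage = np.einsum("ij,ji->i", X, A_inv_xT)  # x_i^T A^{-1} x_i
    residual = y - X @ w
    # Minimizer of d^T A d subject to x_i @ d = residual_i:
    #   d_i = A^{-1} x_i * residual_i / leverage_i
    deltas = (A_inv_xT * (residual / leverage)).T
    return w, deltas


# Same kind of data as in Basic Usage, with x rescaled to [-1, 1]
# inside the features (an assumption made for conditioning).
rng = np.random.RandomState(42)
x = np.sort(rng.uniform(-1, 1, 30)) * 10
y = np.sin(x) * x + 0.01 * rng.randn(30)
X = polyvander(x / 10.0, 4)

w, deltas = fit_with_pointwise_corrections(X, y)
fitted = np.einsum("ij,ij->i", X, w + deltas)  # each point with its own weights
print(np.abs(fitted - y).max())
```

Each row of `deltas` is one pointwise correction; the spread of these rows is the raw material for the posterior built in step 3.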

Basic Usage

>>> import numpy as np
>>> from popsregression import POPSRegression
>>> from sklearn.preprocessing import PolynomialFeatures
>>> rng = np.random.RandomState(42)
>>> x = np.sort(rng.uniform(-1, 1, 30)) * 10
>>> y = np.sin(x) * x + 0.01 * rng.randn(30)
>>> poly = PolynomialFeatures(degree=4, include_bias=True)
>>> X = poly.fit_transform(x.reshape(-1, 1))
>>> model = POPSRegression()
>>> model.fit(X, y)
POPSRegression()
>>> y_pred, y_std = model.predict(X, return_std=True)

The returned y_std combines both epistemic and misspecification uncertainty.

Posterior Types

POPSRegression supports two posterior forms:

  • 'hypercube' (default): Fits a PCA-aligned hypercube to the pointwise corrections. This tends to give conservative uncertainty bounds and is recommended for most use cases.

  • 'ensemble': Uses the raw pointwise corrections directly as posterior samples. This can be useful when the number of features is small relative to the number of training points.

>>> model = POPSRegression(posterior='ensemble')
>>> model.fit(X, y)
POPSRegression(posterior='ensemble')
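The difference between the two posterior forms can be illustrated with a small NumPy sketch. It is a sketch under assumptions rather than the library's code: the corrections are synthetic, the principal axes come from an uncentered SVD (whether POPSRegression centers the corrections is not stated here), and all function names are hypothetical. The bounds computed are for the correction to the mean prediction, not the prediction itself.

```python
import numpy as np


def hypercube_from_corrections(deltas):
    """Principal axes of the corrections and the box that encloses them."""
    _, _, Vt = np.linalg.svd(deltas, full_matrices=False)
    proj = deltas @ Vt.T                  # coordinates along each axis
    return Vt, proj.min(axis=0), proj.max(axis=0)


def sample_hypercube(Vt, lo, hi, n_samples, rng):
    """Draw corrections uniformly from the axis-aligned box."""
    u = rng.uniform(lo, hi, size=(n_samples, lo.size))
    return u @ Vt                         # back to weight space


def hypercube_bounds(X_new, Vt, lo, hi):
    """Exact min/max of x @ delta over the box, for each row x of X_new."""
    C = X_new @ Vt.T                      # coefficient of each axis
    upper = np.maximum(C * lo, C * hi).sum(axis=1)
    lower = np.minimum(C * lo, C * hi).sum(axis=1)
    return lower, upper


rng = np.random.RandomState(0)
# Synthetic stand-in for pointwise corrections: 30 points, 5 weights.
deltas = rng.randn(30, 5) * np.array([1.0, 0.5, 0.2, 0.1, 0.05])
X_new = rng.randn(8, 5)

Vt, lo, hi = hypercube_from_corrections(deltas)
samples = sample_hypercube(Vt, lo, hi, 200, rng)
lower, upper = hypercube_bounds(X_new, Vt, lo, hi)

# 'ensemble'-style spread: evaluate every raw correction directly.
ens_pred = X_new @ deltas.T               # shape (8, 30)
print(ens_pred.min(axis=1) - lower)       # non-negative
print(upper - ens_pred.max(axis=1))       # non-negative
```

Because every raw correction lies inside the box, the hypercube bounds always enclose the ensemble spread, which matches the description of the hypercube as the more conservative choice.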

Prediction Options

The predict method supports several return options:

>>> model = POPSRegression().fit(X, y)
>>> y_pred = model.predict(X)  # mean only
>>> y_pred, y_std = model.predict(X, return_std=True)  # + combined std
>>> y_pred, y_max, y_min = model.predict(X, return_bounds=True)  # + bounds (max, then min)
>>> y_pred, y_ep_std = model.predict(X, return_epistemic_std=True)  # + epistemic only

All options can be combined; the extra outputs follow the mean in the order shown here:

>>> result = model.predict(
...     X, return_std=True, return_bounds=True, return_epistemic_std=True
... )
>>> y_pred, y_std, y_max, y_min, y_ep_std = result
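One simple way to use the returned bounds is to check how often held-out targets fall between them. The helper below is a hypothetical utility, not part of POPSRegression, and the arrays are made-up numbers standing in for `y_min` and `y_max` from `predict(..., return_bounds=True)`.

```python
import numpy as np


def coverage(y_true, y_min, y_max):
    """Fraction of targets lying inside [y_min, y_max] (inclusive)."""
    y_true, y_min, y_max = map(np.asarray, (y_true, y_min, y_max))
    return float(np.mean((y_true >= y_min) & (y_true <= y_max)))


# Illustrative values only.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_min = np.array([-1.0, 0.5, 2.5, 2.0])
y_max = np.array([1.0, 1.5, 3.0, 4.0])
frac = coverage(y_true, y_min, y_max)
print(frac)  # 0.75: the third target lies below its lower bound
```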

Pipeline Integration

POPSRegression is fully compatible with scikit-learn pipelines:

>>> from sklearn.pipeline import make_pipeline
>>> pipe = make_pipeline(PolynomialFeatures(degree=4), POPSRegression())
>>> pipe.fit(x.reshape(-1, 1), y)
Pipeline(...)
>>> pipe.predict(x.reshape(-1, 1))
array([...])

References