.. title:: User guide : contents

.. _user_guide:

==========
User Guide
==========

.. _pops_regression:

POPS Regression
---------------

:class:`~popsregression.POPSRegression` is a Bayesian regression method
designed for low-noise data where standard Bayesian approaches underestimate
uncertainty due to model misspecification.

Background
~~~~~~~~~~

In many scientific applications, a surrogate model (e.g. a polynomial) is fit
to near-deterministic data from simulations. When the model class cannot
perfectly represent the target function, standard Bayesian regression methods
like :class:`~sklearn.linear_model.BayesianRidge` only capture *epistemic*
uncertainty (which decays with more data) and *aleatoric* uncertainty (noise,
which is negligible). They miss the dominant source of error:
**misspecification uncertainty** — the systematic error from using the wrong
model class.

POPS (Pointwise Optimal Parameter Sets) corrects this by:

1. Fitting a BayesianRidge model to obtain the posterior mean weights.
2. Computing pointwise corrections: for each training point, finding the
   parameter perturbation that would fit that point exactly.
3. Constructing a posterior over these corrections (via a hypercube in PCA
   space or as a raw ensemble).
4. Using this posterior to estimate misspecification uncertainty in predictions.

Basic Usage
~~~~~~~~~~~

::

    >>> import numpy as np
    >>> from popsregression import POPSRegression
    >>> from sklearn.preprocessing import PolynomialFeatures
    >>> rng = np.random.RandomState(42)
    >>> x = np.sort(rng.uniform(-1, 1, 30)) * 10
    >>> y = np.sin(x) * x + 0.01 * rng.randn(30)
    >>> poly = PolynomialFeatures(degree=4, include_bias=True)
    >>> X = poly.fit_transform(x.reshape(-1, 1))
    >>> model = POPSRegression()
    >>> model.fit(X, y)  # doctest: +ELLIPSIS
    POPSRegression()
    >>> y_pred, y_std = model.predict(X, return_std=True)

The returned ``y_std`` combines both epistemic and misspecification uncertainty.

Posterior Types
~~~~~~~~~~~~~~~

POPSRegression supports two posterior forms:

- ``'hypercube'`` (default): Fits a PCA-aligned hypercube to the pointwise
  corrections. This tends to give conservative uncertainty bounds and is
  recommended for most use cases.

- ``'ensemble'``: Uses the raw pointwise corrections directly as posterior
  samples. This can be useful when the number of features is small relative
  to the number of training points.

::

    >>> model = POPSRegression(posterior='ensemble')
    >>> model.fit(X, y)  # doctest: +ELLIPSIS
    POPSRegression(posterior='ensemble')

Prediction Options
~~~~~~~~~~~~~~~~~~

The ``predict`` method supports several return options::

    >>> model = POPSRegression().fit(X, y)
    >>> y_pred = model.predict(X)  # mean only
    >>> y_pred, y_std = model.predict(X, return_std=True)  # + combined std
    >>> y_pred, y_max, y_min = model.predict(X, return_bounds=True)  # + bounds
    >>> y_pred, y_ep_std = model.predict(X, return_epistemic_std=True)  # + epistemic only

All options can be combined::

    >>> result = model.predict(
    ...     X, return_std=True, return_bounds=True, return_epistemic_std=True
    ... )
    >>> y_pred, y_std, y_max, y_min, y_ep_std = result

Pipeline Integration
~~~~~~~~~~~~~~~~~~~~

POPSRegression is fully compatible with scikit-learn pipelines::

    >>> from sklearn.pipeline import make_pipeline
    >>> pipe = make_pipeline(PolynomialFeatures(degree=4), POPSRegression())
    >>> pipe.fit(x.reshape(-1, 1), y)  # doctest: +ELLIPSIS
    Pipeline(...)
    >>> pipe.predict(x.reshape(-1, 1))  # doctest: +ELLIPSIS
    array([...])

References
~~~~~~~~~~

.. [1] Swinburne, T.D. and Perez, D. (2025).
       "Parameter uncertainties for imperfect surrogate models in the
       low-noise regime."
       Machine Learning: Science and Technology, 6, 015008.