POPSRegression
- class popsregression.POPSRegression(*, max_iter=300, tol=0.001, alpha_1=1e-06, alpha_2=1e-06, lambda_1=1e-06, lambda_2=1e-06, alpha_init=None, lambda_init=None, compute_score=False, fit_intercept=False, copy_X=True, verbose=False, mode_threshold=1e-08, resample_density=1.0, resampling_method='uniform', percentile_clipping=0.0, leverage_percentile=50.0, posterior='hypercube')
Bayesian regression for low-noise data with misspecification uncertainty.
Fits a linear model using BayesianRidge, then estimates weight uncertainties accounting for model misspecification using the POPS (Pointwise Optimal Parameter Sets) algorithm [1]. Unlike standard Bayesian regression, the aleatoric noise precision alpha_ is not used for predictions, because aleatoric noise should be negligible in the low-noise regime.
Standard Bayesian regression can only estimate epistemic and aleatoric uncertainties. In the low-noise limit, weight uncertainties (sigma_ in BayesianRidge) are significantly underestimated, as they only account for epistemic uncertainties that decay with increasing data. POPS corrects this by estimating misspecification uncertainty from pointwise optimal parameter sets.
- Parameters:
- max_iter : int, default=300
Maximum number of iterations for the BayesianRidge convergence loop.
- tol : float, default=1e-3
Convergence threshold. Stop the algorithm if the coefficient vector has converged.
- alpha_1 : float, default=1e-6
Shape parameter for the Gamma distribution prior over alpha_.
- alpha_2 : float, default=1e-6
Inverse scale (rate) parameter for the Gamma distribution prior over alpha_.
- lambda_1 : float, default=1e-6
Shape parameter for the Gamma distribution prior over lambda_.
- lambda_2 : float, default=1e-6
Inverse scale (rate) parameter for the Gamma distribution prior over lambda_.
- alpha_init : float, default=None
Initial value for alpha_ (precision of the noise). If None, alpha_init is 1 / Var(y).
- lambda_init : float, default=None
Initial value for lambda_ (precision of the weights). If None, lambda_init is 1.
- compute_score : bool, default=False
If True, compute the log marginal likelihood at each step.
- fit_intercept : bool, default=False
Whether to fit an intercept. If True, a constant column is appended to X (rather than centering) so that the intercept participates in the POPS posterior estimation.
- copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.
- verbose : bool, default=False
Verbose mode when fitting the model.
- mode_threshold : float, default=1e-8
Eigenvalue threshold (relative to max) for determining the effective dimensionality of the POPS posterior. Eigenvalues below mode_threshold * max_eigenvalue are discarded.
- resample_density : float, default=1.0
Number of resampled points per training point. The actual number of samples is max(100, int(resample_density * n_samples)).
- resampling_method : {'uniform', 'sobol', 'latin', 'halton'}, default='uniform'
Quasi-random sampling method for generating points within the POPS hypercube posterior.
- percentile_clipping : float, default=0.0
Percentile to clip from each end when determining hypercube bounds. The hypercube spans the [percentile_clipping, 100 - percentile_clipping] percentile range. Should be between 0 and 50.
- leverage_percentile : float, default=50.0
Only training points with leverage scores above this percentile are used for POPS posterior estimation. Higher values accelerate fitting by focusing on high-leverage points.
- posterior : {'hypercube', 'ensemble'}, default='hypercube'
Form of the POPS parameter posterior:
'hypercube': fit an axis-aligned box in PCA space (default).
'ensemble': use raw pointwise corrections directly.
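The sampling-related parameters above can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper names (n_resamples, high_leverage_mask, clipped_box_samples) and the projections array are hypothetical, the leverage score is assumed to be the usual hat-matrix diagonal, and only plain uniform sampling inside the box is shown.

```python
import numpy as np


def n_resamples(resample_density, n_samples):
    # Sample count as stated in the resample_density description.
    return max(100, int(resample_density * n_samples))


def high_leverage_mask(X, leverage_percentile=50.0):
    # Assumption: leverage is the hat-matrix diagonal
    # h_i = x_i^T (X^T X)^+ x_i (pseudo-inverse for robustness).
    gram_pinv = np.linalg.pinv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, gram_pinv, X)
    # Keep only points strictly above the requested percentile.
    return leverage > np.percentile(leverage, leverage_percentile)


def clipped_box_samples(projections, percentile_clipping, n_draws, rng):
    # Per-axis bounds spanning the
    # [percentile_clipping, 100 - percentile_clipping] percentile range.
    lower = np.percentile(projections, percentile_clipping, axis=0)
    upper = np.percentile(projections, 100.0 - percentile_clipping, axis=0)
    # Uniform draws inside the axis-aligned box.
    draws = rng.uniform(lower, upper, size=(n_draws, projections.shape[1]))
    return draws, lower, upper


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
# Hypothetical stand-in for pointwise corrections projected onto PCA modes.
projections = rng.normal(size=(40, 2))

mask = high_leverage_mask(X, leverage_percentile=50.0)
draws, lower, upper = clipped_box_samples(
    projections, 5.0, n_resamples(1.0, len(X)), rng
)
print(mask.sum(), draws.shape)
```

With 40 training points and resample_density=1.0, the floor of 100 samples applies, so small data sets are still resampled reasonably densely.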
- Attributes:
- coef_ : ndarray of shape (n_features,)
Coefficients of the regression model (posterior mean).
- intercept_ : float
Independent term in the decision function. Set to 0.0 if fit_intercept=False.
- alpha_ : float
Estimated precision of the noise. Not used for prediction.
- lambda_ : float
Estimated precision of the weights.
- sigma_ : ndarray of shape (n_features, n_features)
Estimated epistemic variance-covariance matrix of the weights.
- misspecification_sigma_ : ndarray of shape (n_features, n_features)
Estimated misspecification variance-covariance matrix from POPS.
- posterior_samples_ : ndarray of shape (n_features, n_posterior_samples)
Samples from the POPS posterior, representing plausible weight perturbations.
- scores_ : ndarray of shape (n_iter_,)
Value of the log marginal likelihood at each iteration. Only available if compute_score=True.
- n_iter_ : int
The actual number of iterations to reach convergence.
- n_features_in_ : int
Number of features seen during fit.
- feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
See also
sklearn.linear_model.BayesianRidge : Bayesian ridge regression without misspecification correction.
sklearn.linear_model.ARDRegression : Bayesian ARD regression.
References
[1] Swinburne, T.D. and Perez, D. (2025). “Parameter uncertainties for imperfect surrogate models in the low-noise regime.” Machine Learning: Science and Technology, 6, 015008. doi:10.1088/2632-2153/ad9fce
Examples
>>> import numpy as np
>>> from popsregression import POPSRegression
>>> rng = np.random.RandomState(0)
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = np.dot(X, np.array([1, 2])) + 0.01 * rng.randn(4)
>>> reg = POPSRegression()
>>> reg.fit(X, y)
POPSRegression()
>>> reg.predict(np.array([[3, 5]]))
array([...])
Methods
fit(X, y[, sample_weight]) : Fit the POPS regression model.
get_metadata_routing() : Get metadata routing of this object.
get_params([deep]) : Get parameters for this estimator.
predict(X[, return_std, return_bounds, ...]) : Predict using the POPS regression model.
score(X, y[, sample_weight]) : Return coefficient of determination on test data.
set_fit_request(*[, sample_weight]) : Configure whether metadata should be requested to be passed to the fit method.
set_params(**params) : Set the parameters of this estimator.
set_predict_request(*[, return_bounds, ...]) : Configure whether metadata should be requested to be passed to the predict method.
set_score_request(*[, sample_weight]) : Configure whether metadata should be requested to be passed to the score method.
- fit(X, y, sample_weight=None)
Fit the POPS regression model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,)
Target values.
- sample_weight : array-like of shape (n_samples,), default=None
Individual weights for each sample.
- Returns:
- self : object
Returns the instance itself.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X, return_std=False, return_bounds=False, return_epistemic_std=False)
Predict using the POPS regression model.
In addition to the standard return_std from BayesianRidge, this method can return prediction bounds (min/max over the posterior) and epistemic-only uncertainty.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Samples to predict for.
- return_std : bool, default=False
If True, return the combined (misspecification + epistemic) standard deviation.
- return_bounds : bool, default=False
If True, return the max and min predictions over the POPS posterior samples.
- return_epistemic_std : bool, default=False
If True, return the epistemic-only standard deviation (from sigma_, excluding misspecification).
- Returns:
- y_mean : ndarray of shape (n_samples,)
Predicted mean values.
- y_std : ndarray of shape (n_samples,)
Combined standard deviation. Only returned if return_std=True.
- y_max : ndarray of shape (n_samples,)
Upper bound from posterior samples. Only returned if return_bounds=True.
- y_min : ndarray of shape (n_samples,)
Lower bound from posterior samples. Only returned if return_bounds=True.
- y_epistemic_std : ndarray of shape (n_samples,)
Epistemic-only standard deviation. Only returned if return_epistemic_std=True.
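As a rough guide to how the optional outputs could relate to the fitted attributes, the sketch below computes them with plain NumPy. It rests on assumptions not stated above: that the combined variance is the sum of the epistemic and misspecification contributions, x^T (sigma_ + misspecification_sigma_) x, and that the bounds are the extremes of X @ posterior_samples_ added to the mean prediction. The function sketch_predict and the toy arrays are hypothetical and do not come from the library.

```python
import numpy as np


def sketch_predict(X, coef, sigma, misspecification_sigma, posterior_samples):
    # Mean prediction of the linear model (no intercept in this sketch).
    y_mean = X @ coef
    # Quadratic forms x^T S x, one per row of X.
    var_epistemic = np.einsum("ij,jk,ik->i", X, sigma, X)
    var_misspec = np.einsum("ij,jk,ik->i", X, misspecification_sigma, X)
    # Assumption: the two variance contributions simply add.
    y_std = np.sqrt(var_epistemic + var_misspec)
    # Assumption: posterior samples are weight perturbations with shape
    # (n_features, n_posterior_samples), so X @ samples gives one shift
    # of the prediction per sample.
    shifts = X @ posterior_samples
    y_max = y_mean + shifts.max(axis=1)
    y_min = y_mean + shifts.min(axis=1)
    y_epistemic_std = np.sqrt(var_epistemic)
    return y_mean, y_std, y_max, y_min, y_epistemic_std


# Toy stand-ins for fitted attributes (illustrative values only).
X = np.array([[1.0, 2.0], [3.0, 5.0]])
coef = np.array([1.0, 2.0])
sigma = 1e-4 * np.eye(2)
misspecification_sigma = 1e-2 * np.eye(2)
posterior_samples = np.array([[0.1, -0.1, 0.0],
                              [0.0, 0.05, -0.05]])

y_mean, y_std, y_max, y_min, y_epistemic_std = sketch_predict(
    X, coef, sigma, misspecification_sigma, posterior_samples
)
print(y_mean, y_std, y_max, y_min, y_epistemic_std)
```

Under these assumptions the combined standard deviation can never be smaller than the epistemic-only one, which matches the description of return_std as misspecification plus epistemic uncertainty.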
- score(X, y, sample_weight=None)
Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
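The definition of \(R^2\) above can be checked by hand with a few lines of NumPy. This is a minimal sketch of the unweighted, single-output case only; the helper name r2 is illustrative and is not the scikit-learn implementation.

```python
import numpy as np


def r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # u: residual sum of squares.
    u = ((y_true - y_pred) ** 2).sum()
    # v: total sum of squares around the mean of the true values.
    v = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2(y_true, y_true)                       # exact predictions
constant = r2(y_true, np.full(4, y_true.mean()))   # always predict the mean
reversed_order = r2(y_true, y_true[::-1])          # worse than the mean

print(perfect, constant, reversed_order)  # 1.0 0.0 -3.0
```

The three cases cover the statements in the text: a perfect model scores 1.0, a constant mean predictor scores 0.0, and a model worse than the mean predictor scores below zero.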
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> POPSRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- set_predict_request(*, return_bounds: bool | None | str = '$UNCHANGED$', return_epistemic_std: bool | None | str = '$UNCHANGED$', return_std: bool | None | str = '$UNCHANGED$') -> POPSRegression
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- return_bounds : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_bounds parameter in predict.
- return_epistemic_std : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_epistemic_std parameter in predict.
- return_std : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_std parameter in predict.
- Returns:
- self : object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> POPSRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.
Examples using popsregression.POPSRegression
POPS vs BayesianRidge: Uncertainty for Low-Noise Surrogates