GPax models

Gaussian Processes - Fully Bayesian Implementation

class gpax.models.gp.ExactGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None)[source]

Bases: object

Gaussian process class

Parameters:

input_dim (int) – Number of input dimensions
kernel (Union[str, Callable[[Array, Array, Dict[str, Array], Array], Array]]) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over kernel hyperparameters. Use it when passing your custom kernel.
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).
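A custom kernel is a function of two input arrays, a dictionary of hyperparameters, and a noise term, as the type signature above indicates. The sketch below shows that shape for an RBF kernel in plain Python so that it runs without JAX. The function name rbf_kernel and the key names "k_length" and "k_scale" are assumptions made for illustration; a kernel passed to the model would need to operate on JAX arrays rather than lists.

```python
import math

def rbf_kernel(X, Z, params, noise=0.0):
    """RBF kernel between two lists of points (each point is a list of floats).

    params["k_length"] and params["k_scale"] are hypothetical key names
    for the lengthscale and the output scale.
    """
    K = []
    for i, x in enumerate(X):
        row = []
        for j, z in enumerate(Z):
            sq_dist = sum(((a - b) / params["k_length"]) ** 2 for a, b in zip(x, z))
            k_ij = params["k_scale"] * math.exp(-0.5 * sq_dist)
            # Observation noise only enters the diagonal of the train-train matrix
            if X is Z and i == j:
                k_ij += noise
            row.append(k_ij)
        K.append(row)
    return K

X = [[0.0], [1.0], [2.0]]
params = {"k_length": 1.0, "k_scale": 2.0}
K = rbf_kernel(X, X, params, noise=0.1)
```

The priors returned by kernel_prior would then need to use the same dictionary keys that the kernel reads.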

Examples

Regular GP for sparse noisy observations

>>> # Get random number generator keys for training and prediction
>>> rng_key, rng_key_predict = gpax.utils.get_keys()
>>> # Initialize model
>>> gp_model = gpax.ExactGP(input_dim=1, kernel='Matern')
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> gp_model.fit(rng_key, X, y)  # X and y are arrays with dimensions (n, 1) and (n,)
>>> # Make a noiseless prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new, noiseless=True)

GP with custom noise prior

>>> gp_model = gpax.ExactGP(
>>>     input_dim=1, kernel='RBF',
>>>     noise_prior_dist = numpyro.distributions.HalfNormal(.1)
>>> )
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> gp_model.fit(rng_key, X, y)  # X and y are arrays with dimensions (n, 1) and (n,)
>>> # Make a noiseless prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new, noiseless=True)

GP with custom probabilistic model as its mean function

>>> # Define a deterministic mean function
>>> mean_fn = lambda x, param: param["a"]*x + param["b"]
>>>
>>> # Define priors over the mean function parameters (to make it probabilistic)
>>> def mean_fn_prior():
>>>     a = numpyro.sample("a", numpyro.distributions.Normal(3, 1))
>>>     b = numpyro.sample("b", numpyro.distributions.Normal(0, 1))
>>>     return {"a": a, "b": b}
>>>
>>> # Initialize structural GP model
>>> sgp_model = gpax.ExactGP(
>>>     input_dim=1, kernel='Matern',
>>>     mean_fn=mean_fn, mean_fn_prior=mean_fn_prior)
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> sgp_model.fit(rng_key, X, y)  # X and y are numpy arrays with dimensions (n, 1) and (n,)
>>> # Make a noiseless prediction on new inputs
>>> y_pred, y_samples = sgp_model.predict(rng_key_predict, X_new, noiseless=True)

model(X, y=None, **kwargs)[source]

GP probabilistic model with inputs X and targets y

Return type: None

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)[source]

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None
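The jitter keyword matters when the covariance matrix is close to singular, for example when two training inputs coincide. The following sketch uses a small hand-written Cholesky routine (not the library's implementation) to show the factorization failing on a rank-deficient matrix and succeeding once 1e-6 is added to the diagonal. The helper names cholesky and add_jitter are hypothetical.

```python
import math

def cholesky(A):
    """Plain-Python Cholesky factorization; raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def add_jitter(A, jitter=1e-6):
    """Return a copy of A with jitter added to every diagonal entry."""
    return [[a + (jitter if i == j else 0.0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]

# Two identical inputs give a rank-1 (singular) covariance matrix
K = [[1.0, 1.0], [1.0, 1.0]]

try:
    cholesky(K)
    failed_without_jitter = False
except ValueError:
    failed_without_jitter = True

# With a small diagonal term the factorization goes through
L = cholesky(add_jitter(K, 1e-6))
```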

get_samples(chain_dim=False)[source]

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]
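For reference, the mean and covariance in question follow the standard GP conditioning formulas. The notation below is an assumption of this sketch rather than the library's: X and y are the training data, X_* the new inputs, m the mean function, K the kernel, and \sigma^2 the noise variance from one posterior sample.

```latex
\begin{aligned}
\mu_*    &= m(X_*) + K(X_*, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1}\,\bigl(y - m(X)\bigr), \\
\Sigma_* &= K(X_*, X_*) - K(X_*, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1}\,K(X, X_*).
\end{aligned}
```

In the usual convention, a prediction that includes noise additionally adds \sigma^2 to the diagonal of \Sigma_*, while a noiseless prediction does not.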

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)[source]

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]
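The chunking idea described above can be sketched independently of the library. The helper predict_in_chunks and the stand-in toy_predict are hypothetical names; the real method also handles random keys, posterior samples, and devices, which are left out here.

```python
def predict_in_chunks(predict_fn, X_new, batch_size=100):
    """Apply predict_fn to consecutive slices of X_new and concatenate the results."""
    outputs = []
    for start in range(0, len(X_new), batch_size):
        batch = X_new[start:start + batch_size]
        # Only one batch is processed at a time, which bounds peak memory use
        outputs.extend(predict_fn(batch))
    return outputs

# Hypothetical stand-in for a model's predict method
def toy_predict(batch):
    return [2.0 * x for x in batch]

X_new = [float(i) for i in range(250)]
# 250 points with batch_size=100 are processed as batches of 100, 100, and 50
y_all = predict_in_chunks(toy_predict, X_new, batch_size=100)
```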

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)[source]

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. Defaults to False: new/unseen data is assumed to follow the same distribution as the training data, so the noise that the model assigns to the training data is also included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions
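To make the two return values concrete, here is a minimal sketch with made-up numbers. It assumes (this is not stated above) that the sampled predictions are arranged as (number of parameter samples, n, number of points), and it replaces the multivariate normal draw with independent Gaussian noise for brevity.

```python
import random

rng = random.Random(0)
num_param_samples, n, num_points = 4, 2, 3

# Hypothetical posterior means, one vector per sample of the GP parameters
means = [[rng.gauss(0.0, 1.0) for _ in range(num_points)]
         for _ in range(num_param_samples)]

# n draws around each mean (independent noise here, a simplification of MVN sampling)
y_samples = [[[m + rng.gauss(0.0, 0.1) for m in mean] for _ in range(n)]
             for mean in means]

# "Center of mass": average the per-sample means point by point
y_pred = [sum(mean[j] for mean in means) / num_param_samples
          for j in range(num_points)]
```

Note that y_pred averages the means themselves, not the noisy draws in y_samples.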

sample_from_prior(rng_key, X, num_samples=10)[source]: Samples from prior predictive distribution at X

class gpax.models.uigp.UIGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None, sigma_x_prior_dist=None)[source]

Bases: ExactGP

Gaussian process with uncertain inputs

This class extends the standard Gaussian Process model to handle uncertain inputs. It allows for incorporating the uncertainty in input data into the GP model, providing a more robust prediction.

Parameters:

input_dim (int) – Number of input dimensions
kernel (Union[str, Callable[[Array, Array, Dict[str, Array], Array], Array]]) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over kernel hyperparameters. Use it when passing your custom kernel.
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).
sigma_x_prior_dist (Optional[Distribution]) – Optional custom prior for the input uncertainty (sigma_x). Defaults to HalfNormal(0.1) under the assumption that data is normalized to (0, 1).
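The idea of uncertain inputs can be pictured as follows: the recorded inputs are treated as noisy versions of latent "true" inputs, with the noise scale sigma_x drawn from its prior. The sketch below is a generative illustration under that assumption, written in plain Python; it is not the library's model code, and the variable names are hypothetical.

```python
import random

rng = random.Random(0)

# Observed (recorded) inputs, assumed normalized to (0, 1)
X_obs = [0.1, 0.4, 0.7, 0.9]

# HalfNormal(0.1) draw: absolute value of a zero-mean normal with scale 0.1
sigma_x = abs(rng.gauss(0.0, 0.1))

# One plausible set of latent inputs consistent with the observed ones
X_latent = [x + rng.gauss(0.0, sigma_x) for x in X_obs]
```

With data normalized to (0, 1), a scale of 0.1 corresponds to input errors of roughly a tenth of the input range, which is why the default prior is tied to that normalization.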

Examples

UIGP with custom prior over sigma_x

>>> # Get random number generator keys for training and prediction
>>> rng_key, rng_key_predict = gpax.utils.get_keys()
>>> # Initialize model
>>> gp_model = gpax.UIGP(input_dim=1, kernel='Matern', sigma_x_prior_dist=gpax.utils.halfnormal_dist(0.5))
>>> # Run HMC to obtain posterior samples for the model parameters
>>> gp_model.fit(rng_key, X, y, num_warmup=2000, num_samples=10000)
>>> # Make a prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new)

model(X, y=None, **kwargs)[source]

Gaussian process model for uncertain (stochastic) inputs

Return type: None

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of UIGP parameters

Return type: Tuple[Array, Array]

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. Defaults to False: new/unseen data is assumed to follow the same distribution as the training data, so the noise that the model assigns to the training data is also included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

class gpax.models.hskgp.VarNoiseGP(input_dim, kernel, noise_kernel='RBF', mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_kernel_prior=None, lengthscale_prior_dist=None, noise_mean_fn=None, noise_mean_fn_prior=None, noise_lengthscale_prior_dist=None)[source]

Bases: ExactGP

Heteroskedastic Gaussian process class

Parameters:

input_dim (int) – Number of input dimensions
kernel (Union[str, Callable[[Array, Array, Dict[str, Array], Array], Array]]) – Main kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
noise_kernel (Union[str, Callable[[Array, Array, Dict[str, Array], Array], Array]]) – Noise kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over main kernel hyperparameters. Use it when passing your custom kernel.
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over noise kernel hyperparameters. Use it when passing your custom kernel.
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over main kernel lengthscale. Defaults to LogNormal(0, 1).
noise_mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional noise mean function
noise_mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over noise mean function
noise_lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over noise kernel lengthscale. Defaults to LogNormal(0, 1).

Examples

Use two different kernels with default priors for main and noise processes

>>> # Get random number generator keys for training and prediction
>>> rng_key, rng_key_predict = gpax.utils.get_keys()
>>> # Initialize model
>>> gp_model = gpax.VarNoiseGP(input_dim=1, kernel='RBF', noise_kernel='Matern')
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> gp_model.fit(rng_key, X, y)
>>> # Make a prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new)
>>> # Get the inferred noise samples (for training data)
>>> data_variance = gp_model.get_data_var_samples()

Specify custom kernel lengthscale priors for main and noise kernels

>>> lscale_prior = gpax.utils.gamma_dist(5, 1)  # equivalent to numpyro.distributions.Gamma(5, 1)
>>> noise_lscale_prior = gpax.utils.halfnormal_dist(1)  # equivalent to numpyro.distributions.HalfNormal(1)
>>> # Initialize model
>>> gp_model = gpax.VarNoiseGP(
>>>    input_dim=1, kernel='RBF', noise_kernel='Matern',
>>>    lengthscale_prior_dist=lscale_prior, noise_lengthscale_prior_dist=noise_lscale_prior)
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> gp_model.fit(rng_key, X, y)
>>> # Make a prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new)
>>> # Get the inferred noise samples (for training data)
>>> data_variance = gp_model.get_data_var_samples()

model(X, y=None, **kwargs)[source]

Heteroskedastic GP probabilistic model with inputs X and targets y

Return type: None

get_mvn_posterior(X_new, params, *args, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of heteroskedastic GP parameters

Return type: Tuple[Array, Array]

get_data_var_samples()[source]: Returns samples with inferred (training) data variance - aka noise
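"Data variance" here refers to noise whose size changes with the input. The sketch below generates such heteroskedastic data in plain Python so the concept is concrete. The linear log-variance function and the exponential link are assumptions chosen for illustration, not a description of how the library parameterizes its noise process.

```python
import math
import random

rng = random.Random(0)

# 20 evenly spaced inputs on [0, 1]
X = [i / 19 for i in range(20)]

# Hypothetical latent "noise process": log-variance grows linearly with x
log_var = [-4.0 + 3.0 * x for x in X]

# Exponentiating keeps every variance strictly positive
noise_var = [math.exp(v) for v in log_var]

# Observations: a smooth signal plus noise whose scale depends on the input
y = [math.sin(6.0 * x) + rng.gauss(0.0, math.sqrt(s2)) for x, s2 in zip(X, noise_var)]
```

A homoskedastic model would have to describe this data with a single noise level, which is the limitation the heteroskedastic class is meant to address.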

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. Defaults to False: new/unseen data is assumed to follow the same distribution as the training data, so the noise that the model assigns to the training data is also included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

class gpax.models.mngp.MeasuredNoiseGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, lengthscale_prior_dist=None)[source]

Bases: ExactGP

Gaussian Process model that incorporates measured noise. This class extends the ExactGP model by allowing the inclusion of measured noise variances in the GP framework. Unlike standard GP models where noise is typically inferred, this model uses noise values obtained from repeated measurements at the same input points.
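As an illustration of where the measured noise might come from, the following sketch reduces repeated readings at each input point to a mean target and a noise variance. The data values are made up, and using the sample variance of the readings (rather than, say, the variance of the mean) is an assumption that depends on the measurement setup.

```python
import statistics

# Hypothetical repeated measurements: 3 input points, 4 readings each
X = [[0.0], [0.5], [1.0]]
repeats = [
    [1.02, 0.97, 1.01, 1.00],
    [1.48, 1.61, 1.39, 1.52],
    [2.30, 1.75, 2.10, 1.85],
]

# One mean target per input point, shape (n,)
y_mean = [statistics.mean(r) for r in repeats]

# One sample variance per input point, shape (n,)
noise = [statistics.variance(r) for r in repeats]
```

The resulting X, y_mean, and noise have the (n, 1), (n,), and (n,) layout that the example for this class expects.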

Parameters:

input_dim (int) – Number of input dimensions
kernel (Union[str, Callable[[Array, Array, Dict[str, Array], Array], Array]]) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over kernel hyperparameters. Use it when passing your custom kernel.
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).

Examples

>>> # Get random number generator keys for training and prediction
>>> key1, key2 = gpax.utils.get_keys()
>>> # Initialize model
>>> gp_model = gpax.MeasuredNoiseGP(input_dim=1, kernel='Matern')
>>> # Run HMC to obtain posterior samples for the GP model parameters
>>> gp_model.fit(key1, X, y_mean, noise)  # X, y_mean, and noise have dimensions (n, 1), (n,), and (n,)
>>> # Make a prediction on new inputs by extrapolating noise variance with either linear regression ('linreg') or Gaussian process regression ('gpreg')
>>> y_pred, y_samples = gp_model.predict(key2, X_new, noise_prediction_method='linreg')

model(X, y=None, measured_noise=None, **kwargs)[source]

GP model that accepts measured noise

Return type: None

fit(rng_key, X, y, measured_noise, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)[source]

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
measured_noise (Array) – 1D vector with measured noise
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=True, device=None, noise_prediction_method='linreg', **kwargs)[source]

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. In this class it is set to True by default (see the signature above), unlike the other GP classes documented here, where it defaults to False so that the modeled noise is included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
noise_prediction_method (str) – Method for extrapolating noise variance to new/test data. Choose between ‘linreg’ and ‘gpreg’. Defaults to ‘linreg’.
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions

linreg(x, y, x_new, **kwargs)[source]

gpreg(x, y, x_new, **kwargs)[source]
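The 'linreg' option extrapolates the measured noise to new inputs with linear regression. A minimal sketch of that idea for one-dimensional inputs, using ordinary least squares in plain Python, is given below. The function name linreg_extrapolate is hypothetical and the library's own linreg may differ in its details.

```python
def linreg_extrapolate(x, y, x_new):
    """Fit y = a*x + b by ordinary least squares (1D inputs) and evaluate at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept
    return [a * xi + b for xi in x_new]

x_train = [0.0, 1.0, 2.0, 3.0]
measured_noise = [0.10, 0.20, 0.30, 0.40]  # exactly linear, for easy checking
noise_new = linreg_extrapolate(x_train, measured_noise, [4.0, 5.0])
```

A straight line can extrapolate to negative values far from the training data, so a linear fit of a variance is best checked before it is used.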

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

class gpax.models.vgp.vExactGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None)[source]

Bases: ExactGP

Gaussian process class for vector-valued targets

Parameters:

input_dim (int) – number of input dimensions
kernel (str) – type of kernel (‘RBF’, ‘Matern’, ‘Periodic’)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – optional custom priors over kernel hyperparameters (uses LogNormal(0,1) by default)
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – optional priors over mean function parameters
noise_prior (Optional[Callable[[], Dict[str, Array]]]) – optional custom prior for observation noise
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).

model(X, y=None, **kwargs)[source]

GP probabilistic model with inputs X and vector-valued targets y

Return type: None

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters. Wrapper over self._get_mvn_posterior.

Return type: Tuple[Array, Array]

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)[source]

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. Defaults to False: new/unseen data is assumed to follow the same distribution as the training data, so the noise that the model assigns to the training data is also included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

Gaussian Processes - Approximate Bayesian

class gpax.models.vigp.viGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None, guide='delta')[source]

Bases: ExactGP

Variational inference based Gaussian process

Parameters:

input_dim (int) – Number of input dimensions
kernel (str) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over kernel hyperparameters; uses LogNormal(0,1) by default
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).
guide (str) – Auto-guide option, use ‘delta’ (default) or ‘normal’

Examples

Use viGP to reconstruct data from sparse noisy observations

>>> # Get random number generator keys
>>> rng_key, rng_key_predict = gpax.utils.get_keys()
>>> # Initialize model
>>> gp_model = gpax.viGP(input_dim=1, kernel='Matern')
>>> # Run variational inference to obtain a MAP estimate for the GP model parameters
>>> gp_model.fit(rng_key, X, y, num_steps=1000)  # X and y are arrays with dimensions (n, 1) and (n,)
>>> # Make a noiseless prediction on new inputs
>>> y_pred, y_samples = gp_model.predict(rng_key_predict, X_new, noiseless=True)

fit(rng_key, X, y, num_steps=1000, step_size=0.005, progress_bar=True, print_summary=True, device=None, **kwargs)[source]

Run variational inference to learn GP (hyper)parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector with (number of points, number of features) dimensions
y (Array) – 1D target vector with (n,) dimensions
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of training
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_samples()[source]

Get posterior samples

Return type: Dict[str, Array]

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, predict_fn=None, noiseless=False, device=None, **kwargs)[source]

Make prediction at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one-by-one to avoid a memory overflow

Return type: Tuple[Array, Array]

predict(rng_key, X_new, samples=None, noiseless=False, device=None, **kwargs)[source]

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
noiseless (bool) – Noise-free prediction. Defaults to False: new/unseen data is assumed to follow the same distribution as the training data, so the noise that the model assigns to the training data is also included in the prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass (mean) of the sampled means and all the sampled predictions

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

model(X, y=None, **kwargs)

GP probabilistic model with inputs X and targets y

Return type: None

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

class gpax.models.sparse_gp.viSparseGP(input_dim, kernel, mean_fn=None, kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None, guide='delta')[source]

Bases: viGP

Variational inference-based sparse Gaussian process

Parameters:

input_dim (int) – Number of input dimensions
kernel (str) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over kernel hyperparameters; uses LogNormal(0,1) by default
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1).
guide (str) – Auto-guide option, use ‘delta’ (default) or ‘normal’

model(X, y=None, Xu=None, **kwargs)[source]

Probabilistic sparse Gaussian process regression model

Return type: None

fit(rng_key, X, y, inducing_points_ratio=0.1, inducing_points_selection='random', num_steps=1000, step_size=0.005, progress_bar=True, print_summary=True, device=None, **kwargs)[source]

Run variational inference to learn sparse GP (hyper)parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector with (number of points, number of features) dimensions
y (Array) – 1D target vector with (n,) dimensions
inducing_points_ratio (float) – Inducing points ratio. Must be a float between 0 and 1. Default value is 0.1.
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of training
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None
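
To make the inducing_points_ratio argument concrete, here is a minimal NumPy sketch of random inducing-point selection. It assumes the ratio is taken relative to the number of training points and that 'random' means sampling training inputs without replacement; `select_inducing_points` is a hypothetical helper, and the rounding rule used inside gpax may differ.

```python
import numpy as np

def select_inducing_points(X, ratio=0.1, seed=0):
    """Pick a random subset of training inputs to serve as inducing points."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be a float between 0 and 1")
    n_inducing = max(1, int(round(ratio * X.shape[0])))
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_inducing, replace=False)
    return X[np.sort(idx)]

X = np.arange(400, dtype=float).reshape(200, 2)
Xu = select_inducing_points(X, ratio=0.1)  # 20 of the 200 points
```

A smaller ratio gives a cheaper but coarser approximation, since the sparse GP summarizes the data through these points.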

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

get_samples()

Get posterior samples

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise by default for the training data, we also want to include that noise in our prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, predict_fn=None, noiseless=False, device=None, **kwargs)

Make predictions at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

Deep Kernel Learning - Fully Bayesian Implementation

class gpax.models.dkl.DKL(input_dim, z_dim=2, kernel='RBF', kernel_prior=None, nn=None, nn_prior=None, latent_prior=None, hidden_dim=None, **kwargs)[source]

Bases: ExactGP

Fully Bayesian implementation of deep kernel learning

Parameters:

input_dim (int) – Number of input dimensions
z_dim (int) – Latent space dimensionality (defaults to 2)
kernel (str) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over kernel hyperparameters; uses LogNormal(0,1) by default
nn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Custom neural network (‘feature extractor’); uses a 3-layer MLP with hyperbolic tangent activations by default
nn_prior (Optional[Callable[[], Dict[str, Array]]]) – Priors over the weights and biases in ‘nn’; uses normal priors by default
latent_prior (Optional[Callable[[Array], Dict[str, Array]]]) – Optional prior over the latent space (BNN embedding); uses none by default
hidden_dim (Optional[List[int]]) – Optional custom MLP architecture. For example, [16, 8, 4] corresponds to a 3-layer neural network backbone with 16, 8, and 4 neurons and tanh() activations. The latent layer is added automatically and does not have to be specified here. Defaults to [64, 32].
**kwargs – Optional custom prior distributions over observational noise (noise_prior_dist) and kernel lengthscale (lengthscale_prior_dist)
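
As an illustration of how hidden_dim and z_dim determine the feature extractor's layer shapes, the NumPy sketch below builds a tanh MLP with a latent layer appended automatically. The helpers `init_mlp` and `mlp_forward` are hypothetical; drawing weights from a standard normal stands in for a single sample from the priors, and a linear latent layer is an assumption rather than a documented detail.

```python
import numpy as np

def init_mlp(input_dim, hidden_dim, z_dim, seed=0):
    """Create (weight, bias) pairs for the hidden layers plus the latent layer."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, *hidden_dim, z_dim]  # the latent layer is appended here
    return [(rng.normal(size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(X, params):
    """Apply tanh hidden layers followed by a linear latent layer (assumed)."""
    h = X
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

params = init_mlp(input_dim=36, hidden_dim=[16, 8, 4], z_dim=2)
X = np.random.default_rng(1).normal(size=(10, 36))
z = mlp_forward(X, params)  # embedding passed on to the GP kernel
```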

Examples

DKL with image patches as inputs and a 1-d vector as targets

>>> # Get random number generator keys for training and prediction
>>> key1, key2 = gpax.utils.get_keys()
>>> # Input data dimensions are (n, height*width*channels)
>>> data_dim = X.shape[-1]
>>> # Initialize DKL model with 2 latent dimensions
>>> dkl = gpax.DKL(data_dim, z_dim=2, kernel='RBF')
>>> # Train model by parallelizing HMC chains on a single GPU
>>> dkl.fit(key1, X, y, num_warmup=333, num_samples=333, num_chains=3, chain_method='vectorized')
>>> # Obtain posterior mean and samples from DKL posterior at new inputs
>>> # using batches to avoid memory overflow
>>> y_pred, y_samples = dkl.predict_in_batches(key2, X_new)

model(X, y=None, **kwargs)[source]

DKL probabilistic model

Return type: None

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)[source]

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

embed(X_new)[source]

Embeds data into the latent space using the inferred weights of the DKL’s Bayesian neural network

Return type: Array

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise by default for the training data, we also want to include that noise in our prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make predictions at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

Deep Kernel Learning - Approximate Bayesian

class gpax.models.vidkl.viDKL(input_dim, z_dim=2, kernel='RBF', kernel_prior=None, nn=None, nn_prior=True, latent_prior=None, guide='delta', **kwargs)[source]

Bases: ExactGP

Implementation of variational inference-based deep kernel learning

Parameters:

input_dim (Union[int, Tuple[int]]) – Input features dimensions (e.g. 64*64 for a stack of flattened 64-by-64 images)
z_dim (int) – Latent space dimensionality (defaults to 2)
kernel (str) – Kernel function (‘RBF’, ‘Matern’, ‘Periodic’, or custom function)
kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over kernel hyperparameters; uses LogNormal(0,1) by default
nn (Optional[Callable[[Array], Array]]) – Custom neural network (‘feature extractor’); uses a 3-layer MLP with ReLU activations by default
nn_prior (bool) – Places probabilistic priors over NN weights and biases (Default: True)
latent_prior (Optional[Callable[[Array], Dict[str, Array]]]) – Optional prior over the latent space (NN embedding); uses none by default
guide (str) – Auto-guide option, use ‘delta’ (default) or ‘normal’
**kwargs – Optional custom prior distributions over observational noise (noise_prior_dist) and kernel lengthscale (lengthscale_prior_dist)

Examples

vi-DKL with image patches as inputs and a 1-d vector as targets

>>> # Get random number generator keys for training and prediction
>>> key1, key2 = gpax.utils.get_keys()
>>> # Input data dimensions are (n, height*width*channels)
>>> data_dim = X.shape[-1]
>>> # Initialize vi-DKL model with 2 latent dimensions
>>> dkl = gpax.viDKL(input_dim=data_dim, z_dim=2, kernel='RBF')
>>> # Train the model
>>> dkl.fit(key1, X, y, num_steps=1000, step_size=0.005)
>>> # Obtain posterior mean and variance ('uncertainty') at new inputs
>>> y_mean, y_var = dkl.predict(key2, X_new)

model(X, y=None, **kwargs)[source]

DKL probabilistic model

Return type: None

single_fit(rng_key, X, y, num_steps=1000, step_size=0.005, print_summary=True, progress_bar=True, **kwargs)[source]

Optimizes parameters of a single DKL model

Return type: None

fit(rng_key, X, y, num_steps=1000, step_size=0.005, print_summary=True, progress_bar=True, **kwargs)[source]

Run stochastic variational inference to learn the parameters of the DKL model(s)

Parameters:

rng_key (array) – random number generator key
X (Array) – Input high-dimensional features
y (Array) – Target output (scalar or vector)
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
print_summary (bool) – print summary at the end of training
progress_bar (bool) – show progress bar (works only for scalar outputs)

get_mvn_posterior(X_new, nn_params, k_params, noiseless=False, y_residual=None, **kwargs)[source]

Returns predictive mean and covariance at new points (mean and cov, where cov.diagonal() is ‘uncertainty’) given a single set of DKL parameters

Return type: Tuple[Array, Array]

sample_from_posterior(rng_key, X_new, n=1000, noiseless=False, **kwargs)[source]

Samples from the DKL posterior at X_new points

Return type: Tuple[Array]

get_samples()[source]

Returns a tuple with trained NN weights and kernel hyperparameters

Return type: Tuple[Dict[str, Array]]

predict_in_batches(rng_key, X_new, batch_size=100, params=None, noiseless=False, **kwargs)[source]

Make predictions at X_new with sampled DKL parameters by splitting the input array into chunks (“batches”) and running self.predict on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

predict(rng_key, X_new, params=None, noiseless=False, *args, **kwargs)[source]

Make predictions at X_new points using the trained DKL model(s)

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – New inputs
params (Optional[Tuple[Dict[str, Array]]]) – Tuple with neural network weights and kernel parameters (optional)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise for the training data, we also want to include that noise in our prediction.

Return type:

Tuple[Array, Array]

Returns:

Predictive mean and variance

fit_predict(rng_key, X, y, X_new, num_steps=1000, step_size=0.005, n_models=1, batch_size=100, noiseless=False, ensemble_method='vectorized', print_summary=True, progress_bar=True, **kwargs)[source]

Run SVI to learn DKL model(s) parameters and make a prediction with trained model(s) on new data. Allows using an ensemble of models.

Parameters:

rng_key (array) – random number generator key
X (Array) – Input high-dimensional features
y (Array) – Target output (scalar or vector)
X_new (Array) – New (‘test’) data
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
n_models (int) – number of models in the ensemble (defaults to 1)
batch_size (int) – prediction batch size (to avoid memory overflows)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise for the training data, we also want to include that noise in our prediction.
ensemble_method (str) – ‘vectorized’ (single GPU) or ‘parallel’ (multiple GPUs)
print_summary (bool) – print summary at the end of sampling
progress_bar (bool) – show progress bar (works only for scalar outputs)

Return type:

Tuple[Array, Array]

Returns:

Predictive mean and variance

embed(X_new)[source]

Use trained neural network(s) to embed the input data into the latent space(s)

Return type: Array

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

Infinite-width Bayesian Neural Networks

class gpax.models.ibnn.iBNN(input_dim, depth=3, activation='erf', mean_fn=None, nngp_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None)[source]

Bases: ExactGP

Infinite-width Bayesian neural net (iBNN)

Parameters:

input_dim (int) – Number of input dimensions
depth (int) – The number of layers in the corresponding infinite-width neural network.
activation (str) – activation function (‘erf’ or ‘relu’)
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
nngp_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over NNGP kernel hyperparameters; uses LogNormal(0,1) by default
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
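
For intuition about the depth and activation arguments, the sketch below implements the well-known arcsine recursion for the NNGP kernel of an infinitely wide network with erf activations. It is a generic NumPy sketch, not the gpax source: `nngp_erf_kernel` is a hypothetical name, and the parameter names `var_w` and `var_b` (weight and bias variances) and the depth-0 base case are assumptions.

```python
import numpy as np

def nngp_erf_kernel(X1, X2, var_w=1.0, var_b=1.0, depth=3, eps=1e-7):
    """NNGP kernel for erf activations via the arcsine recursion."""
    d = X1.shape[-1]
    # Depth-0 (input layer) covariances
    K12 = var_b + var_w * (X1 @ X2.T) / d
    K11 = var_b + var_w * (X1 * X1).sum(-1) / d
    K22 = var_b + var_w * (X2 * X2).sum(-1) / d
    for _ in range(depth):
        norm = np.sqrt((1 + 2 * K11)[:, None] * (1 + 2 * K22)[None, :])
        frac = np.clip(2 * K12 / norm, -1 + eps, 1 - eps)
        K12 = var_b + 2 * var_w / np.pi * np.arcsin(frac)
        # Propagate the per-point variances through the same layer
        K11 = var_b + 2 * var_w / np.pi * np.arcsin(2 * K11 / (1 + 2 * K11))
        K22 = var_b + 2 * var_w / np.pi * np.arcsin(2 * K22 / (1 + 2 * K22))
    return K12

X = np.random.default_rng(0).normal(size=(6, 3))
K = nngp_erf_kernel(X, X, depth=3)
K0 = nngp_erf_kernel(X, X, depth=0)
```

Each loop iteration corresponds to one hidden layer, so a larger depth applies the nonlinearity to the covariance more times.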

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None
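
The role of the jitter term can be seen in a small NumPy sketch. The kernel matrix below is singular because two inputs coincide, and adding a small value to the diagonal makes it strictly positive definite so that a Cholesky factorization is well defined. The helper `add_jitter` is hypothetical and only mirrors the documented default of 1e-6.

```python
import numpy as np

def add_jitter(K, jitter=1e-6):
    """Add a small positive term to the diagonal of a covariance matrix."""
    return K + jitter * np.eye(K.shape[0])

# Two identical inputs make the RBF kernel matrix rank-deficient
X = np.array([[0.0], [0.0], [1.0]])
K = np.exp(-0.5 * (X - X.T) ** 2)

K_stable = add_jitter(K)
L = np.linalg.cholesky(K_stable)  # succeeds because K_stable is positive definite
```

The perturbation is tiny relative to the kernel values, so the factorized matrix stays close to the original.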

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

model(X, y=None, **kwargs)

GP probabilistic model with inputs X and targets y

Return type: None

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise by default for the training data, we also want to include that noise in our prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make predictions at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

Multi-Task Learning

class gpax.models.mtgp.MultiTaskGP(input_dim, data_kernel, num_latents=None, shared_input_space=False, num_tasks=None, rank=None, mean_fn=None, data_kernel_prior=None, mean_fn_prior=None, noise_prior=None, noise_prior_dist=None, lengthscale_prior_dist=None, W_prior_dist=None, v_prior_dist=None, output_scale=False, **kwargs)[source]

Bases: ExactGP

Gaussian process for multi-task/fidelity learning

Parameters:

input_dim (int) – Number of input dimensions
data_kernel (str) – Kernel function operating on data inputs (‘RBF’, ‘Matern’, ‘Periodic’, or a custom function)
num_latents (int) – Number of latent functions. Typically equal to or less than the number of tasks
shared_input_space (bool) – If True, assumes that all tasks share the same input space and uses a multivariate kernel (Kronecker product). If False (default), assumes that different tasks have different number of observations and uses a multitask kernel (elementwise multiplication). In that case, the task indices must be appended as the last column of the input vector.
num_tasks (int) – Number of tasks. This is only needed if shared_input_space is True.
rank (Optional[int]) – Rank of the weight matrix in the task kernel. Cannot be larger than the number of tasks. Higher rank implies higher correlation. Uses (num_tasks - 1) when not specified.
mean_fn (Optional[Callable[[Array, Dict[str, Array]], Array]]) – Optional deterministic mean function (use ‘mean_fn_prior’ to make it probabilistic)
data_kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over the data kernel hyperparameters
mean_fn_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over mean function parameters
noise_prior_dist (Optional[Distribution]) – Optional custom prior distribution over the observational noise variance. Defaults to LogNormal(0,1).
lengthscale_prior_dist (Optional[Distribution]) – Optional custom prior distribution over kernel lengthscale. Defaults to LogNormal(0, 1)
W_prior_dist (Optional[Distribution]) – Optional custom prior distribution over W in the task kernel, \(WW^T + diag(v)\). Defaults to Normal(0, 10).
v_prior_dist (Optional[Distribution]) – Optional custom prior distribution over v in the task kernel, \(WW^T + diag(v)\). Must be non-negative. Defaults to LogNormal(0, 1)
task_kernel_prior – Optional custom priors over task kernel parameters; Defaults to Normal(0, 10) for weights W and LogNormal(0, 1) for variances v.
output_scale (bool) – Option to sample data kernel’s output scale. Defaults to False to avoid over-parameterization (the scale is already absorbed into task kernel)
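
The two kernel constructions mentioned for shared_input_space can be sketched in NumPy as follows. This is an illustration under assumptions, not the gpax implementation: the helpers `task_kernel` and `rbf` are hypothetical, the ordering of the Kronecker factors is assumed, and W and v are random stand-ins for sampled values.

```python
import numpy as np

def task_kernel(W, v):
    """Task covariance WW^T + diag(v)."""
    return W @ W.T + np.diag(v)

def rbf(X1, X2, lengthscale=1.0):
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
num_tasks, rank = 3, 2
W = rng.normal(size=(num_tasks, rank))
v = np.exp(rng.normal(size=num_tasks))  # non-negative, as required for v
B = task_kernel(W, v)

# shared_input_space=True: every task is observed at the same inputs,
# so the full covariance is a Kronecker product
X = np.linspace(0, 1, 4)[:, None]
K_shared = np.kron(B, rbf(X, X))  # shape (3 * 4, 3 * 4)

# shared_input_space=False: task index is the last column of the input
X_aug = np.array([[0.0, 0], [0.5, 0], [1.0, 0], [0.25, 1], [0.75, 1], [0.5, 2]])
task = X_aug[:, -1].astype(int)
K_data = rbf(X_aug[:, :-1], X_aug[:, :-1])
K_multi = B[np.ix_(task, task)] * K_data  # elementwise multiplication
```

In the second case each entry of the data kernel is scaled by the covariance between the tasks that the two observations belong to, which is why tasks may have different numbers of observations.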

model(X, y=None, **kwargs)[source]

Multitask GP probabilistic model with inputs X and targets y

Return type: None

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None, **kwargs)

Run Hamiltonian Monte Carlo to infer the GP parameters

Parameters:

rng_key (array) – random number generator key
X (Array) – 2D feature vector
y (Array) – 1D target vector
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

None

get_mvn_posterior(X_new, params, noiseless=False, **kwargs)

Returns parameters (mean and cov) of multivariate normal posterior for a single sample of GP parameters

Return type: Tuple[Array, Array]

get_samples(chain_dim=False)

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, noiseless=False, device=None, **kwargs)

Make prediction at X_new points using posterior samples for GP parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – new inputs with (number of points, number of features) dimensions
samples (Optional[Dict[str, Array]]) – optional (different) samples with GP parameters
n (int) – number of samples from Multivariate Normal posterior for each HMC sample with GP parameters
filter_nans (bool) – filter out samples containing NaN values (if any)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise by default for the training data, we also want to include that noise in our prediction.
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`
**jitter – Small positive term added to the diagonal part of a covariance matrix for numerical stability (Default: 1e-6)

Return type:

Tuple[Array, Array]

Returns: Center of mass of the sampled means and all the sampled predictions

predict_in_batches(rng_key, X_new, batch_size=100, samples=None, n=1, filter_nans=False, predict_fn=None, noiseless=False, device=None, **kwargs)

Make predictions at X_new with sampled GP parameters by splitting the input array into chunks (“batches”) and running predict_fn (defaults to self.predict) on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

class gpax.models.vi_mtdkl.viMTDKL(input_dim, z_dim=2, data_kernel='RBF', num_latents=None, shared_input_space=False, num_tasks=None, rank=None, data_kernel_prior=None, nn=None, nn_prior=True, guide='delta', W_prior_dist=None, v_prior_dist=None, task_kernel_prior=None, **kwargs)[source]

Bases: viDKL

Implementation of the variational inference-based deep kernel learning for multi-task/fidelity problems

Parameters:

input_dim (int) – Number of input dimensions, not counting the column with task indices (if any)
z_dim (int) – Latent space dimensionality (defaults to 2)
data_kernel (str) – Kernel function operating on data inputs (‘RBF’, ‘Matern’, ‘Periodic’, or a custom function)
num_latents (int) – Number of latent functions. Typically equal to or less than the number of tasks
shared_input_space (bool) – If True, assumes that all tasks share the same input space and uses a multivariate kernel (Kronecker product). If False (default), assumes that different tasks have different number of observations and uses a multitask kernel (elementwise multiplication). In that case, the task indices must be appended as the last column of the input vector.
num_tasks (int) – Number of tasks. This is only needed if shared_input_space is True.
rank (Optional[int]) – Rank of the weight matrix in the task kernel. Cannot be larger than the number of tasks. Higher rank implies higher correlation. Uses (num_tasks - 1) when not specified.
data_kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional priors over kernel hyperparameters; uses LogNormal(0,1) by default
nn (Optional[Callable[[Array], Array]]) – Custom neural network (‘feature extractor’); uses a 3-layer MLP with ReLU activations by default
nn_prior (bool) – Places probabilistic priors over NN weights and biases (Default: True)
latent_prior – Optional prior over the latent space (NN embedding); uses none by default
guide (str) – Auto-guide option, use ‘delta’ (default) or ‘normal’
W_prior_dist (Optional[Distribution]) – Optional custom prior distribution over W in the task kernel, \(WW^T + diag(v)\). Defaults to Normal(0, 10).
v_prior_dist (Optional[Distribution]) – Optional custom prior distribution over v in the task kernel, \(WW^T + diag(v)\). Must be non-negative. Defaults to LogNormal(0, 1)
task_kernel_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom priors over task kernel parameters; Defaults to Normal(0, 10) for weights W and LogNormal(0, 1) for variances v.
**kwargs – Optional custom prior distributions over observational noise (noise_prior_dist) and kernel lengthscale (lengthscale_prior_dist)

model(X, y=None, **kwargs)[source]

Multitask DKL probabilistic model

Return type: None

get_mvn_posterior(X_new, nn_params, k_params, noiseless=False, y_residual=None, **kwargs)[source]

Returns predictive mean and covariance at new points (mean and cov, where cov.diagonal() is ‘uncertainty’) given a single set of DKL parameters

Return type: Tuple[Array, Array]

embed(X_new)

Use trained neural network(s) to embed the input data into the latent space(s)

Return type: Array

fit(rng_key, X, y, num_steps=1000, step_size=0.005, print_summary=True, progress_bar=True, **kwargs)

Run stochastic variational inference to learn the parameters of the DKL model(s)

Parameters:

rng_key (array) – random number generator key
X (Array) – Input high-dimensional features
y (Array) – Target output (scalar or vector)
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
print_summary (bool) – print summary at the end of training
progress_bar (bool) – show progress bar (works only for scalar outputs)

fit_predict(rng_key, X, y, X_new, num_steps=1000, step_size=0.005, n_models=1, batch_size=100, noiseless=False, ensemble_method='vectorized', print_summary=True, progress_bar=True, **kwargs)

Run SVI to learn DKL model(s) parameters and make a prediction with trained model(s) on new data. Allows using an ensemble of models.

Parameters:

rng_key (array) – random number generator key
X (Array) – Input high-dimensional features
y (Array) – Target output (scalar or vector)
X_new (Array) – New (‘test’) data
num_steps (int) – number of SVI steps
step_size (float) – step size schedule for Adam optimizer
n_models (int) – number of models in the ensemble (defaults to 1)
batch_size (int) – prediction batch size (to avoid memory overflows)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise for the training data, we also want to include that noise in our prediction.
ensemble_method (str) – ‘vectorized’ (single GPU) or ‘parallel’ (multiple GPUs)
print_summary (bool) – print summary at the end of sampling
progress_bar (bool) – show progress bar (works only for scalar outputs)

Return type:

Tuple[Array, Array]

Returns:

Predictive mean and variance

get_samples()

Returns a tuple with trained NN weights and kernel hyperparameters

Return type: Tuple[Dict[str, Array]]

predict(rng_key, X_new, params=None, noiseless=False, *args, **kwargs)

Make predictions at X_new points using the trained DKL model(s)

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – New inputs
params (Optional[Tuple[Dict[str, Array]]]) – Tuple with neural network weights and kernel parameters (optional)
noiseless (bool) – Noise-free prediction. It is set to False by default as new/unseen data is assumed to follow the same distribution as the training data. Hence, since we introduce a model noise for the training data, we also want to include that noise in our prediction.

Return type:

Tuple[Array, Array]

Returns:

Predictive mean and variance

predict_in_batches(rng_key, X_new, batch_size=100, params=None, noiseless=False, **kwargs)

Make predictions at X_new with sampled DKL parameters by splitting the input array into chunks (“batches”) and running self.predict on each of them one by one to avoid a memory overflow

Return type: Tuple[Array, Array]

sample_from_posterior(rng_key, X_new, n=1000, noiseless=False, **kwargs)

Samples from the DKL posterior at X_new points

Return type: Tuple[Array]

sample_from_prior(rng_key, X, num_samples=10): Samples from prior predictive distribution at X

single_fit(rng_key, X, y, num_steps=1000, step_size=0.005, print_summary=True, progress_bar=True, **kwargs)

Optimizes parameters of a single DKL model

Return type: None

Structured Probabilistic Models

class gpax.models.spm.sPM(model, model_prior, noise_prior=None, noise_prior_dist=None)[source]

Bases: object

Bayesian inference with a structured probabilistic model. Serves as a wrapper over NumPyro’s functions for Bayesian inference on probabilistic models.

Parameters:

model (Callable[[Array, Dict[str, Array]], Array]) – Deterministic model of the system’s expected behavior.
model_prior (Callable[[], Dict[str, Array]]) – Priors over model parameters
noise_prior (Optional[Callable[[], Dict[str, Array]]]) – Optional custom prior for observation noise; uses LogNormal(0,1) by default.

model(X, y=None)[source]

Full probabilistic model

Return type: None

fit(rng_key, X, y, num_warmup=2000, num_samples=2000, num_chains=1, chain_method='sequential', progress_bar=True, print_summary=True, device=None)[source]

Run HMC to infer parameters of the structured probabilistic model

Parameters:

rng_key (array) – random number generator key
X (Array) – 1D or 2D ‘feature vector’ with \((n,)\) or \(n \times num\_features\) dimensions
y (Array) – 1D ‘target vector’ with \((n,)\) dimensions
num_warmup (int) – number of HMC warmup states
num_samples (int) – number of HMC samples
num_chains (int) – number of HMC chains
chain_method (str) – ‘sequential’, ‘parallel’ or ‘vectorized’
progress_bar (bool) – show progress bar
print_summary (bool) – print summary at the end of sampling
device (Type[Device]) – optionally specify a cpu or gpu device on which to run the inference; e.g., device=jax.devices("cpu")[0]

Return type:

None

get_samples(chain_dim=False)[source]

Get posterior samples (after running the MCMC chains)

Return type: Dict[str, Array]

get_param_means()[source]: Returns mean value for each probabilistic parameter in the model

sample_from_prior(rng_key, X, num_samples=10)[source]: Samples from prior predictive distribution at X

sample_single_posterior_predictive(rng_key, X_new, params, n_draws)[source]

predict(rng_key, X_new, samples=None, n=1, filter_nans=False, take_point_predictions_mean=True, device=None)[source]

Make prediction at X_new points using posterior model parameters

Parameters:

rng_key (Array) – random number generator key
X_new (Array) – 2D vector with new (‘test’) data of \(n \times num\_features\) dimensionality
samples (Optional[Dict[str, Array]]) – optional posterior samples
n (int) – number of samples to draw from normal distribution per single HMC sample
filter_nans (bool) – filter out samples containing NaN values (if any)
take_point_predictions_mean (bool) – take a mean of point predictions (without sampling from the normal distribution)
device (Type[Device]) – optionally specify a cpu or gpu device on which to make a prediction; e.g., `device=jax.devices("gpu")[0]`

Return type:

Tuple[Array, Array]

Returns:

Point predictions (or their mean) and posterior predictive distribution
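
The two returned quantities can be illustrated with a NumPy sketch that uses synthetic posterior samples in place of the output of get_samples(). Everything below is an assumption made for illustration: the linear `model`, the parameter names "a" and "b", the random stand-in samples, and the treatment of "noise" as a variance.

```python
import numpy as np

def model(x, params):
    """Deterministic model with the documented (inputs, parameter dict) signature."""
    return params["a"] * x + params["b"]

rng = np.random.default_rng(0)
num_samples = 500
# Synthetic stand-ins for posterior samples (not produced by HMC here)
samples = {
    "a": rng.normal(2.0, 0.1, size=num_samples),
    "b": rng.normal(-1.0, 0.1, size=num_samples),
    "noise": np.full(num_samples, 0.05),
}

X_new = np.linspace(0, 1, 20)
# One point prediction per posterior sample
point_preds = np.stack([
    model(X_new, {"a": a, "b": b}) for a, b in zip(samples["a"], samples["b"])
])
y_mean = point_preds.mean(axis=0)  # mean of the point predictions

# Posterior predictive draws: add observation noise to each point prediction
noise_std = np.sqrt(samples["noise"])[:, None]
y_sampled = point_preds + noise_std * rng.normal(size=point_preds.shape)
```

The spread of `point_preds` reflects uncertainty in the model parameters alone, while `y_sampled` also includes the observation noise.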