pymor.algorithms.ml.vkoga.regressor

Module Contents

class pymor.algorithms.ml.vkoga.regressor.VKOGARegressor(kernel=GaussianKernel(), criterion='fp', max_centers=20, tol=1e-06, reg=1e-12)[source]

Bases: pymor.core.base.BasicObject

Scikit-learn-style regressor using the VKOGASurrogate.

The regressor uses pyMOR's weak_greedy algorithm in its fit method to select centers according to the given criterion. A usage sketch is given after the parameter list below.

The algorithm is described in [WH13] and [SH21].

Parameters:
  • kernel – Kernel to use in the regressor. The kernel is assumed to have scalar-valued output; for vector-valued targets, the interpolant uses vector-valued coefficients, i.e., the prediction is computed as a linear combination of kernel evaluations with vector-valued weights. The kernel needs to follow the scikit-learn kernel interface; in particular, a __call__ method for (vectorized) evaluation of the kernel and a diag method for computing the diagonal of the kernel matrix are required. For convenience, a Gaussian kernel is provided in pymor.algorithms.ml.vkoga.kernels.

  • criterion – Selection criterion for the greedy algorithm. Possible values are 'fp', 'f' and 'p'.

  • max_centers – Maximum number of selected centers in the greedy algorithm.

  • tol – Tolerance for the weak greedy algorithm.

  • reg – Regularization parameter for the kernel interpolation.
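A minimal usage sketch (the toy data and the assumed array shapes, (n_samples, dim) inputs and (n_samples, output_dim) targets, are illustrative assumptions; the import paths follow the module paths documented on this page):

    import numpy as np

    from pymor.algorithms.ml.vkoga.kernels import GaussianKernel
    from pymor.algorithms.ml.vkoga.regressor import VKOGARegressor

    # toy data: two-dimensional inputs, a single (scalar) target component
    rng = np.random.default_rng(0)
    X = rng.uniform(-1., 1., size=(200, 2))
    Y = np.sin(np.pi * X[:, 0:1]) * X[:, 1:2]

    # greedy center selection with the 'fp' criterion and at most 50 centers
    regressor = VKOGARegressor(kernel=GaussianKernel(), criterion='fp',
                               max_centers=50, tol=1e-6, reg=1e-12)
    regressor.fit(X, Y)            # returns the trained regressor
    Y_pred = regressor.predict(X)  # prediction obtained by the VKOGASurrogate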

Methods

  • fit – Fit VKOGA surrogate using pyMOR's weak greedy algorithm.

  • predict – Predict the target for the input X.

fit(X, Y)[source]

Fit VKOGA surrogate using pyMOR’s weak greedy algorithm.

Parameters:
  • X – Training inputs.

  • Y – Training targets.

Returns:

The trained regressor.

predict(X)[source]

Predict the target for the input X.

Parameters:

X – Input for which to compute the prediction.

Returns:

Prediction obtained by the VKOGASurrogate.

class pymor.algorithms.ml.vkoga.regressor.VKOGASurrogate(kernel, X_train, F_train, criterion='fp', reg=1e-12)[source]

Bases: pymor.algorithms.greedy.WeakGreedySurrogate

Surrogate for the weak greedy error used when training the VKOGARegressor.

Not intended to be used directly.

Methods

  • evaluate – Evaluate the surrogate for given parameters.

  • extend – Extends the kernel interpolant by adding a new center and updating all quantities.

  • predict

evaluate(mus, return_all_values=False)[source]

Evaluate the surrogate for given parameters.

Parameters:
  • mus – List of parameters for which to estimate the approximation error. When parallelization is used, mus can be a RemoteObject.

  • return_all_values – See below.

Returns:

  • If return_all_values is True, a NumPy array of the estimated errors.

  • If return_all_values is False, the maximum estimated error as first return value and the corresponding parameter as second return value (see the schematic sketch below).
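Although VKOGASurrogate is not intended for direct use, the following schematic sketch illustrates the two return modes and the interplay of evaluate and extend in a weak greedy loop. The direct construction of the surrogate, the toy data and the calling convention for mus (a plain NumPy array of training inputs is passed here) are assumptions made purely for illustration; in practice, VKOGARegressor.fit drives this loop via pyMOR's weak_greedy:

    import numpy as np

    from pymor.algorithms.ml.vkoga.kernels import GaussianKernel
    from pymor.algorithms.ml.vkoga.regressor import VKOGASurrogate

    # toy candidate set and targets (assumed shapes and calling conventions)
    X_train = np.linspace(-1., 1., 100).reshape(-1, 1)
    F_train = np.sin(np.pi * X_train)
    surrogate = VKOGASurrogate(GaussianKernel(), X_train, F_train, criterion='fp')

    # return_all_values=True: estimated error for every candidate parameter
    errors = surrogate.evaluate(X_train, return_all_values=True)

    # default: maximum estimated error and the parameter at which it is attained
    max_err, worst_mu = surrogate.evaluate(X_train)

    # a weak greedy loop repeatedly extends by the worst-approximated center
    for _ in range(20):
        max_err, worst_mu = surrogate.evaluate(X_train)
        if max_err <= 1e-6:
            break
        surrogate.extend(worst_mu)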

extend(mu)[source]

Extends the kernel interpolant by adding a new center and updating all quantities.

Incrementally add a new center \(\mu\) with the corresponding function value from the training data. This method updates

  • the selected centers (self._centers)

  • the indices of the selected centers in the training set (self._centers_idx)

  • the coefficients of the interpolant (self._coefficients)

  • the (squared) power function evaluations on the training set (self._power2)

  • the Newton basis evaluations on the training set (self._V)

  • the residual \(f - s_f\) at the training points (self.res)

  • the inverse of the Cholesky factor of the kernel matrix (self._C)

  • the coefficients in the Newton basis (self._z).

In the following derivation we leave out the regularization term for simplicity and restrict ourselves to scalar kernels. Let \(K_n\) denote the full kernel matrix for the current set of selected centers \(X_n\). Since \(K_n\) is a kernel matrix, it is in particular symmetric positive-definite, so it admits a Cholesky decomposition \(K_n=L_nL_n^\top\). The inverse of the Cholesky factor, \(C_n := L_{n}^{-1}\), is used to efficiently compute and update the coefficients of the kernel interpolant. The formulas for computing and updating \(C_n\) are given below.
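As a small plain-NumPy illustration of why \(C_n\) is useful (a Gaussian kernel with unit shape parameter is assumed purely for this example): since \(K_n^{-1} = C_n^\top C_n\), the interpolation system can be solved with two matrix-vector products involving the triangular factor.

    import numpy as np

    # three one-dimensional centers and the kernel matrix for k(x, y) = exp(-|x - y|^2)
    X = np.array([[0.0], [0.5], [1.0]])
    K = np.exp(-(X - X.T) ** 2)    # kernel matrix K_n
    L = np.linalg.cholesky(K)      # K_n = L_n L_n^T
    C = np.linalg.inv(L)           # C_n = L_n^{-1}

    f = np.array([0.0, 1.0, 0.0])  # target values at the centers
    c = C.T @ (C @ f)              # solves K_n c = f, since K_n^{-1} = C_n^T C_n
    assert np.allclose(K @ c, f)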

Updating the coefficients of the kernel interpolant: Let \(c_n\in\mathbb{R}^n\) denote the coefficients of the kernel interpolant for the first \(n\) selected centers (\(X_n\)), i.e.

\[K_n c_n = f_n,\]

where \(f_n\in\mathbb{R}^n\) corresponds to the vector of target values at the selected centers. The kernel interpolant is then given as

\[s_f^n(x) := \sum\limits_{i=1}^{n} (c_n)_i k(x, x_i),\]

where \(X_n=\{x_1,\dots,x_n\}\subset\mathbb{R}^d\) is the set containing the \(n\) selected centers. When adding \(\mu\) as a new center, i.e. \(X_{n+1}=X_n\cup \{\mu\}\), the new coefficient vector \(c_{n+1}\in\mathbb{R}^{n+1}\) is given as

\[\begin{split}c_{n+1} = C_{n+1}^\top \begin{bmatrix} \bar{z}_n \\ z_{n+1} \end{bmatrix},\end{split}\]

where \(\bar{z}_n=[z_1,\ldots,z_n]^\top\in\mathbb{R}^n\) collects the coefficients of the current interpolant in the Newton basis (see self._z below) and \(z_{n+1}\in\mathbb{R}\) is a new coefficient, which is computed via the residual \(r_n\) and the power function \(P_n\) for the centers \(X_n\) evaluated at \(\mu\) (the definitions of these quantities are given below):

\[\begin{align*} z_{n+1} = \frac{r_n(\mu)}{P_n(\mu)}. \end{align*}\]

The residual \(r_n=f-s_f^n\) is defined as the difference between the target function \(f\) and the current kernel interpolant \(s_f^n\) and can be updated via the following recursion:

\[\begin{align*} r_n(x) = r_{n-1}(x) - z_n V_n(x) \end{align*}\]

with initial residual \(r_0 = f\). Here, \(V_n\) denotes the \(n\)-th function of the Newton basis of

\[\begin{align*} V(X_n) := \mathrm{span}\{k(\,\cdot\,,x_1), \dots, k(\,\cdot\,,x_n)\}. \end{align*}\]

The first Newton basis function is given as

\[\begin{align*} V_{1}(x)= \frac{k(x, x_1)}{\sqrt{k(x_1, x_1)}}. \end{align*}\]

The subsequent Newton basis functions are computed via the Gram-Schmidt algorithm as

\[\begin{align*} V_{n+1}(x) = \frac{k(x, \mu) - \sum_{j=1}^n V_j(x) V_j(\mu)}{P_n(\mu)}. \end{align*}\]

The power function \(P_n\) measures, for \(x\in\mathbb{R}^d\), how well the current subspace can represent \(k(\,\cdot\,,x)\), i.e., the projection error in the reproducing kernel Hilbert space. The initial squared power function is given as

\[\begin{align*} P_0^2(x) = k(x, x) \end{align*}\]

and the incremental update of the squared power function \(P_n^2\) can be computed by

\[\begin{align*} P_n^2(x) = P_{n-1}^2(x) - V_n(x)^2. \end{align*}\]

We are going to track this quantity evaluated at all training points during the iterations of the greedy algorithm.

What remains is the computation of the inverse Cholesky factor \(C_n\): initially, it is given as

\[\begin{align*} C_1 = \frac{1}{V_1(\mu)} \end{align*}\]

and is updated via

\[\begin{split}C_{n+1} = \begin{bmatrix} C_n & 0 \\ -\frac{1}{V_{n+1}(\mu)}\bar{v}_{n}(\mu) C_n & \frac{1}{V_{n+1}(\mu)} \end{bmatrix},\end{split}\]

where \(\bar{v}_{n}(\mu) := [V_{1}(\mu),\ldots,V_{n}(\mu)]\in\mathbb{R}^{1\times n}\) denotes the row vector of Newton basis evaluations at \(\mu\).
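Putting the preceding update formulas together, a plain-NumPy sketch of the incremental extension step might look as follows. This is an illustrative toy reimplementation, not pyMOR's actual VKOGASurrogate: the class name, the attribute layout, the Gaussian kernel with unit shape parameter, the scalar targets, the omitted regularization and the simple power-function-based ('p'-type) selection rule are all assumptions made for this example.

    import numpy as np

    def gauss_kernel(A, B):
        """Gaussian kernel matrix k(a, b) = exp(-|a - b|^2) (unit shape parameter assumed)."""
        return np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

    class ToyVKOGAState:
        """Tracks the quantities listed above for scalar targets (no regularization)."""

        def __init__(self, X_train, f_train):
            self.X_train, self.f_train = X_train, f_train
            self.centers_idx = []                     # indices of the selected centers
            self.V = np.zeros((len(X_train), 0))      # Newton basis evaluations on the training set
            self.power2 = np.ones(len(X_train))       # P_0^2(x) = k(x, x) = 1 for this kernel
            self.res = f_train.copy()                 # residual r_0 = f
            self.C = np.zeros((0, 0))                 # inverse Cholesky factor C_n
            self.z = np.zeros(0)                      # coefficients in the Newton basis
            self.coefficients = np.zeros(0)           # kernel-basis coefficients c_n

        def extend(self, i):
            """Add training point i as a new center and update all tracked quantities."""
            P_mu = np.sqrt(self.power2[i])                                   # P_n(mu)
            k_col = gauss_kernel(self.X_train, self.X_train[i:i + 1])[:, 0]  # k(., mu) on the training set
            V_new = (k_col - self.V @ self.V[i]) / P_mu                      # Newton basis V_{n+1}
            z_new = self.res[i] / P_mu                                       # z_{n+1} = r_n(mu) / P_n(mu)
            self.res = self.res - z_new * V_new                              # r_{n+1} = r_n - z_{n+1} V_{n+1}
            self.power2 = self.power2 - V_new ** 2                           # P_{n+1}^2 = P_n^2 - V_{n+1}^2
            n = self.C.shape[0]
            self.C = np.block([[self.C, np.zeros((n, 1))],                   # inverse Cholesky update
                               [-(self.V[i] @ self.C / P_mu)[None, :], np.array([[1. / P_mu]])]])
            self.V = np.column_stack([self.V, V_new])
            self.z = np.append(self.z, z_new)
            self.coefficients = self.C.T @ self.z                            # c_{n+1} = C_{n+1}^T [z_1, ..., z_{n+1}]^T
            self.centers_idx.append(i)

    # toy training data (candidate centers) and target values
    rng = np.random.default_rng(0)
    X_train = rng.uniform(-1., 1., size=(50, 2))
    f_train = np.sin(np.pi * X_train[:, 0]) * X_train[:, 1]

    state = ToyVKOGAState(X_train, f_train)
    for _ in range(5):                              # 'p'-type selection for illustration:
        state.extend(int(np.argmax(state.power2)))  # add the point with the largest power function value

    # the interpolant now reproduces f at the selected centers
    s_f = gauss_kernel(X_train, X_train[state.centers_idx]) @ state.coefficients
    assert np.allclose(s_f[state.centers_idx], f_train[state.centers_idx])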

Parameters:

mu – Parameter (center) to add.

predict(X)[source]