pymor.algorithms.ml.vkoga.regressor¶
Module Contents¶
- class pymor.algorithms.ml.vkoga.regressor.VKOGARegressor(kernel=GaussianKernel(), criterion='fp', max_centers=20, tol=1e-06, reg=1e-12)[source]¶
Bases: pymor.core.base.BasicObject

Scikit-learn-style regressor using the VKOGASurrogate.

The regressor uses weak_greedy in its fit method to select centers according to the given criterion. The algorithm is described in [WH13] and [SH21].
- Parameters:
kernel – Kernel to use in the regressor. The kernel is assumed to have a scalar-valued output. For vector-valued outputs, the interpolant uses vector-valued coefficients, i.e., the prediction is computed as a linear combination of kernel evaluations with vector-valued weights. The interface of the kernel needs to follow the scikit-learn interface; in particular, a __call__ method for (vectorized) evaluation of the kernel and a diag method for computing the diagonal of the kernel matrix are required (a sketch of such a kernel follows this parameter list). For convenience, a Gaussian kernel is provided in pymor.algorithms.ml.vkoga.kernels.
criterion – Selection criterion for the greedy algorithm. Possible values are 'fp', 'f' and 'p'.
max_centers – Maximum number of selected centers in the greedy algorithm.
tol – Tolerance for the weak greedy algorithm.
reg – Regularization parameter for the kernel interpolation.
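As an illustration of this interface, here is a minimal sketch of a custom kernel (the class name and the Laplace kernel choice are assumptions for illustration only; for most purposes the provided GaussianKernel can be used directly):

    import numpy as np


    class IllustrativeLaplaceKernel:
        """Hypothetical kernel exposing the __call__/diag interface described above."""

        def __init__(self, length_scale=1.0):
            self.length_scale = length_scale

        def __call__(self, X, Y=None):
            # Vectorized evaluation of the kernel matrix; X has shape (n, d),
            # Y has shape (m, d), the result has shape (n, m).
            Y = X if Y is None else Y
            dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
            return np.exp(-dists / self.length_scale)

        def diag(self, X):
            # Diagonal of the kernel matrix, i.e. k(x, x) for each row of X.
            return np.ones(len(X))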
Methods

- fit – Fit VKOGA surrogate using pyMOR's weak greedy algorithm.
- predict – Predict the target for the input X.

- fit(X, Y)[source]¶
Fit VKOGA surrogate using pyMOR’s weak greedy algorithm.
- Parameters:
X – Training inputs.
Y – Training targets.
- Returns:
The trained regressor.
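A minimal usage sketch for fit; the training data below is synthetic and purely illustrative, and GaussianKernel is the default kernel from the class signature above:

    import numpy as np

    from pymor.algorithms.ml.vkoga.kernels import GaussianKernel
    from pymor.algorithms.ml.vkoga.regressor import VKOGARegressor

    rng = np.random.default_rng(0)

    # Synthetic training data: 200 two-dimensional inputs, two-dimensional targets.
    X_train = rng.uniform(-1., 1., size=(200, 2))
    Y_train = np.column_stack([np.sin(np.pi * X_train[:, 0]),
                               np.cos(np.pi * X_train[:, 1])])

    # fit returns the trained regressor, so the call can be chained.
    regressor = VKOGARegressor(kernel=GaussianKernel(), criterion='fp',
                               max_centers=50, tol=1e-5).fit(X_train, Y_train)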
- predict(X)[source]¶
Predict the target for the input X.
- Parameters:
X – Input for which to compute the prediction.
- Returns:
Prediction obtained by the VKOGASurrogate.
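Continuing the sketch from fit above, predict evaluates the trained surrogate at new inputs:

    # Evaluate the trained surrogate at unseen inputs.
    X_test = rng.uniform(-1., 1., size=(20, 2))
    Y_pred = regressor.predict(X_test)

    # Compare against the known target of the synthetic example.
    Y_ref = np.column_stack([np.sin(np.pi * X_test[:, 0]),
                             np.cos(np.pi * X_test[:, 1])])
    print('maximal prediction error:', np.max(np.abs(Y_pred - Y_ref)))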
- class pymor.algorithms.ml.vkoga.regressor.VKOGASurrogate(kernel, X_train, F_train, criterion='fp', reg=1e-12)[source]¶
Bases: pymor.algorithms.greedy.WeakGreedySurrogate

Surrogate for the weak greedy error used when training the VKOGARegressor.

Not intended to be used directly.
Methods

- evaluate – Evaluate the surrogate for given parameters.
- extend – Extends the kernel interpolant by adding a new center and updating all quantities.
- evaluate(mus, return_all_values=False)[source]¶
Evaluate the surrogate for given parameters.
- Parameters:
mus – List of parameters for which to estimate the approximation error. When parallelization is used, mus can be a RemoteObject.
return_all_values – See below.
- Returns:
If return_all_values is True, a NumPy array of the estimated errors.
If return_all_values is False, the maximum estimated error as first return value and the corresponding parameter as second return value.
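A small NumPy sketch of the return convention for return_all_values=False (the errors and parameters below are stand-ins, not values produced by the surrogate):

    import numpy as np

    mus = [np.array([0.1]), np.array([0.5]), np.array([0.9])]   # stand-in parameter list
    errors = np.array([0.3, 0.7, 0.2])                          # stand-in estimated errors

    # Convention for return_all_values=False: maximum error and corresponding parameter.
    max_idx = np.argmax(errors)
    max_error, max_error_mu = errors[max_idx], mus[max_idx]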
- extend(mu)[source]¶
Extends the kernel interpolant by adding a new center and updating all quantities.
Incrementally add a new center \(\mu\) with corresponding function value from the training data. This function updates

- the selected centers (self._centers),
- the indices of the selected centers in the training set (self._centers_idx),
- the coefficients of the interpolant (self._coefficients),
- the (squared) power function evaluations on the training set (self._power2),
- the Newton basis evaluations on the training set (self._V),
- the residual \(f - s_f\) at the training points (self.res),
- the inverse of the Cholesky factor of the kernel matrix (self._C),
- the coefficients in the Newton basis (self._z).
In the following derivation we leave out the regularization term for simplicity and restrict ourselves to scalar kernels. Let \(K_n\) denote the full kernel matrix for the current set of selected centers \(X_n\). Since \(K_n\) is a kernel matrix, it is in particular positive-definite, so it admits a Cholesky decomposition \(K_n=L_nL_n^\top\). The inverse of the Cholesky factor, \(C_n := L_{n}^{-1}\), is used to efficiently compute and update the coefficients of the kernel interpolant; the formulas for computing and updating \(C_n\) are given below.
Updating the coefficients of the kernel interpolant: Let \(c_n\in\mathbb{R}^n\) denote the coefficients associated with the kernel interpolant for the first \(n\) selected centers \(X_n\), i.e.
\[K_n c_n = f_n,\]

where \(f_n\in\mathbb{R}^n\) corresponds to the vector of target values at the selected centers. The kernel interpolant is then given as

\[s_f^n(x) := \sum\limits_{i=1}^{n} (c_n)_i k(x, x_i),\]

where \(X_n=\{x_1,\dots,x_n\}\subset\mathbb{R}^d\) is the set containing the \(n\) selected centers. When adding \(\mu\) as new center, i.e. \(X_{n+1}=X_n\cup \{\mu\}\), the new coefficient vector \(c_{n+1}\in\mathbb{R}^{n+1}\) is given as
\[\begin{split}c_{n+1} = C_{n+1}^\top \begin{bmatrix} \bar{z}_n \\ z_{n+1} \end{bmatrix},\end{split}\]

where \(\bar{z}_n = [z_1,\ldots,z_n]^\top\in\mathbb{R}^n\) collects the coefficients in the Newton basis (tracked in self._z) and \(z_{n+1}\in\mathbb{R}\) is an unknown coefficient, which is computed via the residual \(r_n\) and the power function \(P_n\) for the centers \(X_n\) evaluated at \(\mu\) (the definitions of these quantities are given below):
\[\begin{align*} z_{n+1} = \frac{r_n(\mu)}{P_n(\mu)}. \end{align*}\]

The residual \(r_n=f-s_f^n\) is defined as the difference between the target function \(f\) and the current kernel interpolant \(s_f^n\) and can be updated via the following recursion:
\[\begin{align*} r_n(x) = r_{n-1}(x) - z_n V_n(x) \end{align*}\]

with initial residual \(r_0 = f\). Here, \(V_n\) denotes the \(n\)-th Newton basis function of

\[\begin{align*} V(X_n) := \mathrm{span}\{k(\,\cdot\,,x_1), \dots, k(\,\cdot\,,x_n)\}. \end{align*}\]

The first Newton basis function is given as

\[\begin{align*} V_{1}(x)= \frac{k(x, x_1)}{\sqrt{k(x_1, x_1)}}. \end{align*}\]

The subsequent Newton basis functions are computed via the Gram-Schmidt algorithm as

\[\begin{align*} V_{n+1}(x) = \frac{k(x, \mu) - \sum_{j=1}^n V_j(x) V_j(\mu)}{P_n(\mu)}. \end{align*}\]

Regarding the power function, which measures for \(x\in\mathbb{R}^d\) how well the current subspace can represent \(k(\,\cdot\,,x)\) (i.e., the projection error in the reproducing kernel Hilbert space): the initial squared power function is given as

\[\begin{align*} P_0^2(x) = k(x, x) \end{align*}\]

and the incremental update of the squared power function \(P_n^2\) can be computed by

\[\begin{align*} P_n^2(x) = P_{n-1}^2(x) - V_n(x)^2. \end{align*}\]

We track this quantity, evaluated at all training points, during the iterations of the greedy algorithm.
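A NumPy sketch of these two recursions evaluated on the training set, mirroring what is tracked in self._V and self._power2 (the kernel is assumed to follow the __call__ interface described for the VKOGARegressor; variable names are illustrative, not the actual implementation):

    import numpy as np


    def newton_and_power_update(kernel, X_train, V, power2, mu_idx):
        """One incremental update of the Newton basis values and the squared
        power function on the training set, following the recursions above.

        V       : (N, n) values of V_1, ..., V_n at the N training points
        power2  : (N,)   current squared power function P_n^2 at the training points
        mu_idx  : index of the newly selected center mu within X_train
        """
        mu = X_train[mu_idx]
        # Kernel column k(x, mu) for all training points x.
        k_mu = kernel(X_train, mu[None, :])[:, 0]
        # V_j(mu) for j = 1, ..., n can be read off the tracked values.
        v_mu = V[mu_idx]
        # P_n(mu) is the square root of the tracked squared power function at mu.
        p_mu = np.sqrt(power2[mu_idx])
        # Gram-Schmidt step: V_{n+1}(x) = (k(x, mu) - sum_j V_j(x) V_j(mu)) / P_n(mu).
        V_new = (k_mu - V @ v_mu) / p_mu
        # P_{n+1}^2(x) = P_n^2(x) - V_{n+1}(x)^2.
        power2_new = power2 - V_new**2
        return np.column_stack([V, V_new]), power2_new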
It remains to compute the inverse Cholesky factor \(C_n\): the initial factor is given as

\[\begin{align*} C_1 = \frac{1}{V_1(\mu)} \end{align*}\]

and is updated via

\[\begin{split}C_{n+1} = \begin{bmatrix} C_n & 0 \\ -\frac{1}{V_{n+1}(\mu)}\bar{v}_{n}(\mu) C_n & \frac{1}{V_{n+1}(\mu)} \end{bmatrix},\end{split}\]

where \(\bar{v}_{n}(\mu)\in\mathbb{R}^n\) denotes the row vector \([V_{1}(\mu),\ldots,V_{n}(\mu)]\).
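A NumPy sketch of this update together with the coefficient update from further above, under the same assumptions as the derivation (scalar targets, regularization omitted). Note that \(V_{n+1}(\mu)=P_n(\mu)\) by the Gram-Schmidt formula, so a single value suffices; the helper name is illustrative:

    import numpy as np


    def cholesky_inverse_and_coefficient_update(C, z, v_mu, V_new_mu, r_mu):
        """One incremental update of the inverse Cholesky factor and the
        interpolant coefficients, following the formulas above.

        C        : (n, n) inverse Cholesky factor C_n = L_n^{-1}
        z        : (n,)   coefficients z_1, ..., z_n in the Newton basis
        v_mu     : (n,)   values V_1(mu), ..., V_n(mu) at the new center
        V_new_mu : float  V_{n+1}(mu), which equals P_n(mu)
        r_mu     : float  residual r_n(mu) = f(mu) - s_f^n(mu)
        """
        n = C.shape[0]
        # New Newton-basis coefficient z_{n+1} = r_n(mu) / P_n(mu).
        z_new = r_mu / V_new_mu
        # Block update of the inverse Cholesky factor.
        C_next = np.zeros((n + 1, n + 1))
        C_next[:n, :n] = C
        C_next[n, :n] = -(v_mu @ C) / V_new_mu
        C_next[n, n] = 1.0 / V_new_mu
        # Coefficients of the interpolant in the standard kernel basis:
        # c_{n+1} = C_{n+1}^T [z_1, ..., z_n, z_{n+1}]^T.
        z_next = np.append(z, z_new)
        c_next = C_next.T @ z_next
        return C_next, z_next, c_next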
- Parameters:
mu – Parameter (center) to add.