Greedy algorithms for optimal measurements selection in state estimation using reduced models



Introduction

State estimation from data in parametric PDEs
One typical recovery problem in a Hilbert space V is the following: we observe m measurements of an unknown element u ∈ V and would like to recover u up to some accuracy. Specifically, we observe

z_i = ℓ_i(u), i = 1, ..., m, (1.1)

where the ℓ_i are linearly independent continuous linear functionals over V. In this paper we consider a Hilbert space V and denote by ‖·‖ its norm and by ⟨·,·⟩ its inner product. The knowledge of z = (z_i)_{i=1,...,m} is equivalent to that of the orthogonal projection w = P_{W_m} u, where

W_m := span{ω_1, ..., ω_m}, (1.2)

and ω_i ∈ V are the Riesz representers of the linear functionals ℓ_i, that is,

⟨ω_i, v⟩ = ℓ_i(v), v ∈ V. (1.3)

Obviously, there are infinitely many v ∈ V such that P_{W_m} v = w, and the only way to recover u up to a guaranteed accuracy is to combine the measurements with some a-priori information on u.
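The passage from the measurements z to the projection w = P_{W_m} u amounts to solving a Gram system in the representers ω_i. The following is a minimal sketch in a finite-dimensional stand-in V = ℝ^N with the Euclidean inner product playing the role of ⟨·,·⟩ (all sizes and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 5, 3
Omega = rng.standard_normal((m, N))   # rows: Riesz representers omega_i
u = rng.standard_normal(N)            # unknown state
z = Omega @ u                         # measurements z_i = <omega_i, u>

# w = P_{W_m} u = sum_j c_j omega_j, where the coefficients c solve the
# Gram system G c = z with G_ij = <omega_i, omega_j>
G = Omega @ Omega.T
c = np.linalg.solve(G, z)
w = Omega.T @ c

# Sanity check: w agrees with the orthogonal projection of u onto span{omega_i}
Q, _ = np.linalg.qr(Omega.T)          # orthonormal basis of W_m
assert np.allclose(w, Q @ (Q.T @ u))
```

Note that w reproduces the data exactly (Omega @ w == z), while u itself is not determined by z alone, which is the point of the last sentence above.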
One particularly relevant scenario is when u is a solution to some parameter-dependent PDE of the general form

P(u, a) = 0, (1.4)

where P is a differential operator and a is a parameter that describes some physical property and lives in a given set A. We assume that for each a ∈ A the problem is well posed, that is, there exists a solution u(a) ∈ V. In such a scenario, our prior on u is that it belongs to the set

M := {u(a) : a ∈ A}, (1.5)

which is sometimes called the solution manifold. We assume that this manifold is a pre-compact set in V, which allows us to approximate it by finite dimensional spaces. For example, one could consider the diffusion equation

−div(a∇u) = f, (1.6)

on a given domain D, with fixed right-hand side f and homogeneous Dirichlet boundary conditions. With A a set of symmetric matrix valued functions a such that

r|ξ|² ≤ ξᵀ a(x) ξ ≤ R|ξ|², x ∈ D, ξ ∈ ℝᵈ, (1.7)

for some 0 < r ≤ R < ∞, the solution u(a) is well defined in V = H¹₀(D). Assuming in addition that A is compact in L∞(D), the compactness of M in V follows by standard continuity properties of the map a ↦ u(a).

Reduced model based estimation
The above prior is generally not easily exploitable, due to the fact that M does not have a simple geometry. For example, it is not a convex set, which prevents the use of classical convex optimization techniques when trying to recover u in such a set. On the other hand, the particular PDE structure often allows one to derive interesting approximation properties of the solution manifold M by linear spaces V_n of moderate dimension n. Such spaces can for example be obtained through a parametrization a = a(y) of the parameter, where y = (y_1, ..., y_d), using polynomial approximations of the form

u_n(y) = Σ_{ν∈Λ_n} v_ν y^ν, (1.8)

where Λ_n is a conveniently chosen set of multi-indices with #(Λ_n) = n. See in particular [7,8], where convergence estimates of the form

sup_{y∈U} ‖u(y) − u_n(y)‖ ≤ C n^{−s}, (1.9)

are established for some s > 0, even when d = ∞. Thus, all solutions in M are approximated in the space V_n := span{v_ν : ν ∈ Λ_n}. Another typical instance is provided by reduced bases. In such approximations, one directly builds a family of n-dimensional spaces V_n := span{v_1, ..., v_n} with v_i ∈ M, in such a way that all solutions in M are uniformly well approximated in V_n. In particular, the approximation rate compares favorably with that achieved by the best n-dimensional spaces, that is, with the Kolmogorov n-width of M; see [4,9]. The common feature of these reduced models is that they yield a hierarchy of spaces (V_n)_{n≥1} with dim(V_n) = n such that the solution manifold is uniformly well approximated in such spaces, in the sense that

sup_{u∈M} dist(u, V_n) ≤ ε_n, (1.10)

where ε_n is some known tolerance. Therefore, one natural option is to replace M by the simpler prior class described by the cylinder

K := {v ∈ V : dist(v, V_n) ≤ ε_n}, (1.11)

for some given n. This point of view is adopted in [12] and further analyzed in [5], where the optimal recovery solution u*(w) from the data w is characterized. This solution is defined as the center of the ellipsoid

K_w := K ∩ (w + W_m^⊥), (1.12)

and is equivalently given by

u*(w) := argmin{dist(v, V_n) : v ∈ V, P_{W_m} v = w}. (1.13)

It can be computed from the data w by solving a finite set of linear equations. The worst case performance of this reconstruction is given by

max_{u∈K} ‖u − u*(P_{W_m} u)‖ = μ(V_n, W_m) ε_n, (1.14)

where for any pair of closed subspaces (X, Y) of V we set μ(X, Y) := β(X, Y)^{−1}, with β(X, Y) the inf-sup constant

β(X, Y) := inf_{x∈X} sup_{y∈Y} ⟨x, y⟩ / (‖x‖ ‖y‖) = inf_{x∈X} ‖P_Y x‖ / ‖x‖. (1.15)

It is also shown in [5] that β(V_n, W_m) can be computed as the smallest singular value of the n × m cross-Gramian matrix with entries ⟨φ_i, ψ_j⟩ between any pair of orthonormal bases (φ_i)_{i=1,...,n} and (ψ_j)_{j=1,...,m} of V_n and W_m, respectively.
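The singular-value characterization of the inf-sup constant translates directly into a few lines of linear algebra. A sketch in a finite-dimensional stand-in V = ℝ^N, where orthonormal bases of random subspaces are obtained by QR factorization (all sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 50, 4, 10
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))   # orthonormal basis of Vn
Psi, _ = np.linalg.qr(rng.standard_normal((N, m)))   # orthonormal basis of Wm

Gram = Phi.T @ Psi                                   # entries <phi_i, psi_j>
beta = np.linalg.svd(Gram, compute_uv=False).min()   # beta(Vn, Wm)

# Cross-check against the variational definition: for every unit-norm v in Vn,
# ||P_Wm v|| >= beta; here v runs over the basis vectors phi_i themselves.
v_norms = [np.linalg.norm(Psi @ (Psi.T @ (Phi @ a))) for a in np.eye(n)]
assert all(nv >= beta - 1e-12 for nv in v_norms)
```

The quantity μ(V_n, W_m) = 1/beta then bounds the worst case recovery error relative to ε_n, as in (1.14).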
Remark 1.1 As already mentioned, the map w ↦ u*(w) in (1.13) is linear. Conversely, it can easily be checked that any linear recovery algorithm may be rewritten in the form of (1.13) for a certain space V_n. On the other hand, let us note that these linear recovery methods apply to general solution manifolds, which may result from nonlinear PDEs.

Optimal measurement selection
For a given reduced model space V_n with accuracy ε_n, one natural objective is therefore to maintain μ(V_n, W_m) at a moderate size, with a number of measurements m ≥ n as small as possible.
Note that taking W_m = V_n would automatically give the minimal value μ(V_n, W_m) = 1 with m = n. However, in a typical data acquisition scenario, the measurements that constitute the basis of W_m are chosen from within a limited class. This is the case, for example, when placing m pointwise sensors at various locations within the physical domain D.
We model this restriction by asking that the ℓ_i are picked within a dictionary D of V', that is, a set of linear functionals normalized according to ‖ℓ‖_{V'} = 1, which is complete in the sense that ℓ(v) = 0 for all ℓ ∈ D implies that v = 0. With an abuse of notation, we identify D with the subset of V that consists of the Riesz representers ω of the above linear functionals ℓ. With such an identification, D is a set of functions normalized according to ‖ω‖ = 1, such that the finite linear combinations of elements of D are dense in V. Our task is therefore to pick {ω_1, ..., ω_m} ⊂ D in such a way that

β(V_n, W_m) ≥ β*, (1.16)

for some prescribed 0 < β* < 1, with m larger than n but as small as possible. In particular, we may introduce the minimal value of m for which this is possible,

m* = m*(β*, V_n, D) := min{m : (1.16) holds for some {ω_1, ..., ω_m} ⊂ D}. (1.17)
We start §2 with a discussion of the typical range of m* compared to n. We first show the following two "extreme" results:
• For any V_n and D, there exists β* > 0 such that m* = n, that is, the inf-sup condition (1.16) holds with the minimal possible number of measurements. However, this β* could be arbitrarily close to 0.
• For any prescribed β * > 0 and any model space V n , there are instances of dictionaries D such that m * is arbitrarily large.
We then discuss particular cases of relevant dictionaries for the particular space V = H¹₀(D), with inner product and norm

⟨u, v⟩ := ∫_D ∇u · ∇v and ‖u‖ := ‖∇u‖_{L²(D)}, (1.18)

motivated by the aforementioned example of parametric elliptic PDEs. These dictionaries model local sensors, either as point evaluations (which are continuous functionals only when D is univariate) or as local averages.
In such a case, we provide upper estimates of m* for spaces V_n that satisfy some inverse estimates, such as finite element or trigonometric polynomial spaces. The optimal value m* is proved to be of the same order as n when the sensors are uniformly spaced. This a-priori analysis is not available for more general spaces V_n, such as reduced basis spaces, which are preferred to finite element spaces due to their advantageous approximation properties.
For such general spaces, we need a strategy to select the measurements. In practice, V is of finite but very large dimension, and D is of finite but very large cardinality

M := #(D) ≫ 1. (1.19)

For this reason, the exhaustive search for a set {ω_1, ..., ω_m} ⊂ D maximizing β(V_n, W_m) for a given m > 1 is out of reach. One natural alternative is to rely on greedy algorithms, where the ω_j are picked incrementally.
Our starting point for the design of such algorithms is the observation that (1.16) is equivalent to having

σ_m := max_{v∈V_n, ‖v‖=1} ‖v − P_{W_m} v‖ ≤ σ* := (1 − β*²)^{1/2} < 1. (1.20)

Therefore, our objective is to construct a space W_m spanned by m elements from D that captures all unit norm vectors of V_n with the prescribed accuracy σ* < 1. This leads us to study and analyze algorithms which may be thought of as generalizations of the well-studied orthogonal matching pursuit algorithm (OMP), to which the algorithms studied here reduce when applied to the case n = 1 with a unit norm vector φ_1 that generates V_1. Two algorithms are proposed and analyzed in §3 and §4, respectively. In particular, we show that they always converge, ensuring that (1.16) holds for m sufficiently large, and we also give conditions on D that allow us to estimate a-priori the minimal value of m for which this happens. We close our paper in §5 with numerical experiments that illustrate the ability of our greedy algorithms to pick good points. In particular, we return to the case of dictionaries of point evaluations or local averages, and show that the selection performed by the greedy algorithms is near optimal, in the sense that it achieves (1.16) after a number of iterations of the same order as that established for m* in the results of §2 when V_n is a trigonometric polynomial space. We also illustrate the interest of the greedy selection of the measurements for reduced basis spaces. We close with some remarks on the relevance of the method in the case of hyperbolic PDEs, for which reduced basis approximation is known to be less effective than for elliptic or parabolic problems, due to the presence of shocks.

Remark 1.2
The problem of optimal placement of sensors, which corresponds to the particular setting where the linear functionals are point evaluations or local averages as in §2, has been extensively studied since the 1970's in control and systems theory. In this context, the state function to be estimated is the realization of a Gaussian stochastic process, typically obtained as the solution of a linear PDE with a white noise forcing term. The error is then measured in the mean square sense, rather than in the worst case sense (1.14), which is the point of view adopted in our work. The function to be minimized with respect to the sensor locations is then the trace of the error covariance, while we aim at minimizing the inverse inf-sup constant μ(V_n, W_m). See in particular [3], where the existence and characterization of optimal sensor locations is established in this stochastic setting. Continuous optimization algorithms have been proposed for computing the optimal sensor location, see e.g. [1,6,16]. One common feature with our approach is that the criterion to be minimized is non-convex, which leads to potential difficulties when the number of sensors is large. This is our main motivation for introducing a greedy selection algorithm, which in addition allows us to consider more general dictionaries.

A-priori analysis of measurement selection strategies
The goal of this section is to present several results that provide estimates on the inf-sup constant β(V_n, W_m). We first discuss general dictionaries D and approximation spaces V_n, in which case it is not possible to control how large m has to be compared to n in order to ensure a prescribed inf-sup constant. We then show that this goal can be met with m proportional to n for more specific choices of dictionaries and approximation spaces.

Two extreme results
The following result shows that one can always achieve a positive inf-sup constant β(V_n, W_n) using the minimal number of measurements m = n; however, there is no guarantee on the size of this inf-sup constant, other than that it is positive.
Theorem 2.1 Given a space V n of dimension n and any complete dictionary D, there exists a selection {ω 1 , . . ., ω n } from D such that β(V n , W n ) > 0.
Proof: We define the ω_i inductively, together with certain functions φ_i that constitute a basis of V_n. We first pick any element φ_1 ∈ V_n of unit norm. Since the dictionary is complete, there exists ω_1 ∈ D such that ⟨ω_1, φ_1⟩ ≠ 0. At step i, having chosen ω_1, ..., ω_{i−1}, we pick φ_i of unit norm in V_n ∩ W_{i−1}^⊥, which is possible since dim(W_{i−1}) ≤ i − 1 < n, and then, by completeness, ω_i ∈ D such that ⟨ω_i, φ_i⟩ ≠ 0. With such a selection procedure, we find that the cross-Gramian matrix (⟨ω_i, φ_j⟩)_{i,j=1,...,n} is lower triangular with non-zero diagonal entries. This shows both that {ω_1, ..., ω_n} and {φ_1, ..., φ_n} are bases of W_n and V_n, respectively, and that there is no non-trivial element of V_n that is orthogonal to W_n, which means that β(V_n, W_n) > 0. □

On the negative side, the following result shows that for a general space V_n and dictionary D, there is no hope of controlling the value of β(V_n, W_m) from below, even for arbitrarily large m (here we assume that V is infinite dimensional).
Theorem 2.2 Given any space V_n of dimension n > 0, any ε > 0 and any m > 0, there exists a dictionary D such that for any selection {ω_1, ..., ω_m} from D, one has β(V_n, W_m) ≤ ε.
Proof: It suffices to prove the result for n = 1, since enriching the space V_n has the effect of lowering the inf-sup constant. We thus take V_1 = ℝφ for some φ ∈ V of unit norm, and we take for D an orthonormal basis of V. By an appropriate rotation, we can always choose this basis so that |⟨φ, ω⟩| ≤ ε m^{−1/2} for every ω ∈ D. For any selection {ω_1, ..., ω_m} from D, we then have ‖P_{W_m} φ‖² = Σ_{i=1}^m |⟨φ, ω_i⟩|² ≤ ε², and therefore β(V_1, W_m) ≤ ε. □

Pointwise evaluations
In this section, we consider the particular dictionary that consists of all point evaluation functionals δ_x for x ∈ D, where D is some domain of ℝ^d. This means that the Hilbert space V should be a reproducing kernel Hilbert space (RKHS) of functions defined on D, that is, a Hilbert space that continuously embeds in C(D). Examples of such spaces are the Sobolev spaces H^s(D) for s > d/2, possibly with additional boundary conditions. As a simple example, motivated by parametric second order elliptic PDEs, for a bounded univariate interval I we consider V = H¹₀(I), which is continuously embedded in C(I). Without loss of generality we take I = ]0,1[. For every x ∈ ]0,1[, the Riesz representer of δ_x is given by the solution of −ω'' = δ_x with zero boundary conditions. Normalizing this solution with respect to the V-norm (1.18), we obtain

ω_x(t) = (x(1−x))^{−1/2} t(1−x) for 0 ≤ t ≤ x, and ω_x(t) = (x(1−x))^{−1/2} x(1−t) for x ≤ t ≤ 1. (2.3)

For any set of m distinct points 0 < x_1 < ... < x_m < 1, the associated measurement space W_m = span{ω_{x_1}, ..., ω_{x_m}} coincides with the space of piecewise affine functions with nodes at x_1, ..., x_m that vanish at the boundary. Denoting x_0 := 0 and x_{m+1} := 1, we have

P_{W_m} v = I_{W_m} v, (2.4)

where I_{W_m} v denotes the piecewise affine interpolation of v at the points x_0, ..., x_{m+1}. As an example for the space V_n, let us consider the span of the Fourier basis (here orthonormalized in V),

φ_k(t) := (√2/(πk)) sin(kπt), k = 1, ..., n. (2.5)

With such choices, we can establish a lower bound (1.16) on β(V_n, W_m) with a number of measurements m* that scales linearly with n, when using equally spaced measurements, as shown by the following result.

Theorem 2.3 Let V_n be spanned by the Fourier basis (2.5) and let W_m be the measurement space associated with the equally spaced points x_k := k/(m+1), k = 1, ..., m, with m ≥ n. Then one has

β(V_n, W_m) ≥ (1 − (n/(m+1))²)^{1/2}. (2.6)
Proof: We introduce the interpolation operator I_{W_m} : V → W_m, so that the projection error in the V-norm is bounded by

‖v − P_{W_m} v‖ ≤ ‖v − I_{W_m} v‖ ≤ (h/π) ‖v''‖_{L²}, (2.7)

with h := max_{0≤k≤m} |x_{k+1} − x_k|. The constant 1/π in the second inequality is optimal for the interpolation error and can be derived from the Poincaré inequality ‖g‖_{L²(]a,b[)} ≤ ((b−a)/π) ‖g'‖_{L²(]a,b[)}, applied on each interval ]x_k, x_{k+1}[ to g = (v − I_{W_m} v)', which has zero mean on each such interval. On the other hand, one has the inverse estimate

‖v''‖_{L²} ≤ πn ‖v‖, v ∈ V_n, (2.8)

and therefore

‖v − P_{W_m} v‖ ≤ h n ‖v‖, v ∈ V_n. (2.9)

When using equally spaced measurements, we have h = (m+1)^{−1}, and therefore

‖P_{W_m} v‖² = ‖v‖² − ‖v − P_{W_m} v‖² ≥ (1 − (n/(m+1))²) ‖v‖², v ∈ V_n, (2.10)

which gives (2.6). □

In particular, β(V_n, W_m) ≥ β* is ensured provided that

m + 1 ≥ n (1 − β*²)^{−1/2}. (2.11)

Bearing in mind that β(V_n, W_m) = 0 for m < n, it is fair to say that the estimate (2.11) gives a relatively sharp estimation of the minimal number of measurements m* which is required.

Remark 2.4 The interplay between pointwise evaluation and Fourier reconstruction has been the object of much attention in the area of sparse recovery and compressed sensing. It is known in particular that, with high probability, trigonometric polynomials of degree N which are n-sparse can be exactly reconstructed from their sampling at m points drawn at random according to the uniform measure, provided that m exceeds n by logarithmic factors. We refer to [13] for an instance of such results. In our case the setting is different, since we are searching for a recovery in the fixed trigonometric space V_n, which allows us to have m exactly proportional to n.
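These quantities can be checked numerically from the explicit formulas of this section: by the reproducing property, ⟨ω_x, v⟩ = v(x)/√(x(1−x)), and ⟨ω_x, ω_y⟩ = min(x,y)(1−max(x,y))/√(x(1−x)y(1−y)). The sketch below (illustrative values n = 4, m = 12) computes β(V_n, W_m) as the square root of the smallest eigenvalue of the matrix (⟨P_{W_m}φ_i, P_{W_m}φ_j⟩) = A G^{−1} Aᵀ and compares it with the lower bound of Theorem 2.3:

```python
import numpy as np

n, m = 4, 12
phi = lambda k, x: np.sqrt(2) / (np.pi * k) * np.sin(k * np.pi * x)
x = np.arange(1, m + 1) / (m + 1)       # equally spaced points in ]0,1[

# A_ki = <phi_k, omega_xi> = phi_k(x_i) / sqrt(x_i (1 - x_i))
A = np.array([[phi(k, xi) / np.sqrt(xi * (1 - xi)) for xi in x]
              for k in range(1, n + 1)])

# Gram matrix G_ij = <omega_xi, omega_xj>
X, Y = np.meshgrid(x, x, indexing="ij")
G = (np.minimum(X, Y) * (1 - np.maximum(X, Y))
     / np.sqrt(X * (1 - X) * Y * (1 - Y)))

# beta^2 = smallest eigenvalue of A G^{-1} A^T
M = A @ np.linalg.solve(G, A.T)
beta = np.sqrt(np.linalg.eigvalsh(M).min())
print(beta, np.sqrt(1 - (n / (m + 1)) ** 2))  # beta should exceed the bound
```

There is no discretization here: all inner products are exact, so the computed beta is the true inf-sup constant for these points.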

Local averages
We return to the situation where D is a general domain in ℝ^d. Since real world pointwise sensors have a non-zero point spread, it is natural to model them by linear functionals that are local averages rather than point evaluations, that is,

ℓ_{x,τ}(u) := ∫_D u(t) ϕ_τ(t − x) dt, (2.12)

where

ϕ_τ := τ^{−d} ϕ(·/τ), (2.13)

for some fixed radial function ϕ compactly supported in the unit ball B = {|x| ≤ 1} of ℝ^d and such that ∫ϕ = 1, and with τ > 0 representing the point spread. Here, we assume in addition that ϕ belongs to H¹₀(B). Taking the measurement point x at distance at least τ from the boundary of D, we are ensured that the support of ϕ_τ(· − x) is fully contained in D.
Note that in space dimension d > 1, the point evaluation functionals are not continuous on H¹₀(D); however, the above local averages are. We may therefore use these functionals in the case where V = H¹₀(D), in arbitrary dimension d. The corresponding Riesz representers ω_{x,τ} ∈ V are, up to normalization in the V-norm, the solutions to

−Δω_{x,τ} = ϕ_τ(· − x), (2.14)

with homogeneous Dirichlet boundary conditions.
For simplicity we work in dimension d ≤ 3. We consider measurement points x_1, ..., x_m that are uniformly spaced, in the sense that they are the vertices of a quasi-uniform mesh T_h for the domain D with meshsize h > 0, so that

m ∼ h^{−d}. (2.15)

We are interested in the approximation properties of the corresponding space W_m. We cannot hope for an estimate similar to (2.7), since these approximation properties should also depend on τ > 0. This is reflected by the following direct estimate.
Lemma 2.5 One has the estimate for the projection error in the V-norm

‖v − P_{W_m} v‖ ≤ C (τ + h²/τ) ‖Δv‖_{L²(D)}, v ∈ H²(D) ∩ V, (2.16)

where C is independent of (h, τ).
Proof: We shall establish the following estimate: for every w ∈ L²(D), there exist coefficients (c_i)_{i=1,...,m} such that

⟨w − Σ_{i=1}^m c_i g_{x_i,τ}, ψ⟩ ≤ C (τ + h²/τ) ‖w‖_{L²(D)} ‖ψ‖, ψ ∈ V, (2.17)

where g_{x,τ} := ϕ_τ(· − x). Then, (2.16) immediately follows by applying (2.17) to w = Δv. In order to prove (2.17), we first introduce the nodal P1 finite element basis (ψ_i) associated to the mesh points x_i. Then, under some reasonable geometric assumptions on D (piecewise smoothness of its boundary), we have the interpolation error estimate

‖f − I_h f‖_{L²(D)} ≤ C h² ‖f‖_{H²(D)}, I_h f := Σ_i f(x_i) ψ_i, f ∈ H²(D). (2.18)

Here we have used the fact that d ≤ 3 in order to be able to apply point evaluation to the elements of H²(D). We introduce the dual space (H²(D))', which differs from H^{−2}(D), and obtain by a duality argument that, for an appropriate choice of the c_i,

⟨w − Σ_i c_i δ_{x_i}, f⟩ = ⟨w, f − I_h f⟩ ≤ C h² ‖w‖_{L²(D)} ‖f‖_{H²(D)}, f ∈ H²(D). (2.20)

In order to derive (2.17), we note that g_{x_i,τ} = δ_{x_i} * ϕ_τ, from which it follows that for any w ∈ L²(D),

⟨w − Σ_i c_i g_{x_i,τ}, ψ⟩ = ⟨w, ψ − ψ * ϕ_τ⟩ + ⟨w − Σ_i c_i δ_{x_i}, ψ * ϕ_τ⟩. (2.21)

Here, the convolution with ψ is meant in the sense ψ * ϕ_τ := ψ̄ * ϕ_τ, where ψ̄ is the extension of ψ by 0 outside of D.

For the first term on the right side of (2.21), we write ⟨w, ψ − ψ*ϕ_τ⟩ ≤ ‖w‖_{L²(D)} ‖ψ − ψ*ϕ_τ‖_{L²(D)}. We now remark that the extension ψ ↦ ψ̄ by 0 outside of D is continuous from V to H¹(ℝ^d).

Thus, for ψ ∈ V, we may write ‖ψ − ψ*ϕ_τ‖_{L²(D)} ≤ ‖ψ̄ − ψ̄*ϕ_τ‖_{L²(ℝ^d)} ≤ C τ ‖ψ̄‖_{H¹(ℝ^d)} ≤ C τ ‖ψ‖, up to a change in the constant C in the last inequality, and where the second inequality is a standard estimate for regularization by convolution. It follows that

⟨w, ψ − ψ*ϕ_τ⟩ ≤ C τ ‖w‖_{L²(D)} ‖ψ‖. (2.24)

For the second term on the right side of (2.21), we write ⟨w − Σ_i c_i δ_{x_i}, ψ*ϕ_τ⟩ ≤ C h² ‖w‖_{L²(D)} ‖ψ*ϕ_τ‖_{H²}, where we have used (2.20). Using again the extension ψ̄ of ψ by 0 outside of D, we write ‖ψ*ϕ_τ‖_{H²} ≤ C ‖ψ̄‖_{H¹(ℝ^d)} ‖∇ϕ_τ‖_{L¹} ≤ C τ^{−1} ‖ψ‖, where the first inequality is straightforward by Fourier transform, and the second inequality follows from the definition of ϕ_τ by scaling. It follows that

⟨w − Σ_i c_i δ_{x_i}, ψ*ϕ_τ⟩ ≤ C h² τ^{−1} ‖w‖_{L²(D)} ‖ψ‖. (2.26)

By summation of (2.24) and (2.26), we reach (2.17). □

Remark 2.6 The estimate (2.16) deteriorates for too small or too large τ. The choice τ ∼ h gives the optimal approximation order O(h).
We may use the above approximation result to estimate the number of local average measurements needed to control the inf-sup constant β(V_n, W_m), provided that V_n satisfies some inverse estimate. As an example, consider the case where D = [0,1]^d and where V_n is spanned by a tensorized Fourier basis with frequencies at most n^{1/d} in each variable. Combining the direct estimate (2.16) with the inverse estimate ‖Δv‖_{L²} ≤ C n^{1/d} ‖v‖, v ∈ V_n, we obtain

max{‖v − P_{W_m} v‖ : v ∈ V_n, ‖v‖ = 1} ≤ C (τ + h²/τ) n^{1/d}. (2.28)

Using m = J^d equally spaced measurements at the points of the tensorized grid, we find that h ∼ m^{−1/d} and therefore

max{‖v − P_{W_m} v‖ : v ∈ V_n, ‖v‖ = 1} ≤ C (τ + m^{−2/d}/τ) n^{1/d}. (2.29)

This shows that β(V_n, W_m) ≥ β* can be achieved provided that

C (τ + m^{−2/d}/τ) n^{1/d} ≤ (1 − β*²)^{1/2}. (2.30)

In this analysis, the number m of required measurements grows for large or small values of τ, due to the deterioration of the approximation estimate already noted in Remark 2.6. This fact will be confirmed in the numerical experiments given in §5.2. In the case where τ is of the optimal order τ ∼ h ∼ m^{−1/d}, the above condition becomes C(n/m)^{1/d} ≤ (1 − β*²)^{1/2}, showing that (1.16) can be achieved with a number of measurements m* that scales linearly with n, similar to the univariate pointwise evaluation case described by Theorem 2.3. Note that we are not able to recover the results on univariate pointwise evaluation from the results on local averages, since the estimate (2.30) deteriorates as τ → 0, even when d = 1. This hints that there is some room for improvement in this estimate.

A collective OMP algorithm
In this section we discuss a first numerical algorithm for the incremental selection of the spaces W_m, inspired by the orthogonal matching pursuit (OMP) algorithm, which is recalled below. More precisely, our algorithm may be viewed as applying the OMP algorithm for the collective approximation of the elements of an orthonormal basis of V_n by linear combinations of m members of the dictionary.
Our objective is to reach the bound (1.20) for the quantity σ_m. Note that this quantity can also be written as

σ_m = max_{v∈V_n, ‖v‖=1} ‖v − P_{W_m} v‖ = ‖(I − P_{W_m})|_{V_n}‖,

that is, σ_m is the spectral norm of I − P_{W_m} restricted to V_n.

Description of the algorithm
When n = 1, there is only one unit vector φ_1 ∈ V_1, up to a sign change. A commonly used strategy for approximating φ_1 by a small combination of elements from D is to apply a greedy algorithm, the most prominent one being orthogonal matching pursuit (OMP): we iteratively select

ω_k := argmax_{ω∈D} |⟨φ_1 − P_{W_{k−1}} φ_1, ω⟩|, (3.1)

where W_{k−1} := span{ω_1, ..., ω_{k−1}} and W_0 := {0}. In practice, one often relaxes the above maximization by taking ω_k such that

|⟨φ_1 − P_{W_{k−1}} φ_1, ω_k⟩| ≥ κ max_{ω∈D} |⟨φ_1 − P_{W_{k−1}} φ_1, ω⟩|,

for some fixed 0 < κ < 1, for example κ = 1/2. This is known as the weak OMP algorithm, but we refer to it as OMP as well. It has been studied in [2,10]; see also [14] for a complete survey on greedy approximation.
For a general value of n, one natural strategy is to define our greedy algorithm as follows: we iteratively select

ω_k := argmax_{ω∈D} Σ_{i=1}^n |⟨φ_i − P_{W_{k−1}} φ_i, ω⟩|², (3.2)

where (φ_1, ..., φ_n) is an orthonormal basis of V_n. Note that in the case n = 1, we obtain the original OMP algorithm applied to φ_1.
As to the implementation of this algorithm, we take (φ_1, ..., φ_n) to be any orthonormal basis of V_n. Then

Σ_{i=1}^n |⟨φ_i − P_{W_{k−1}} φ_i, ω⟩|² = Σ_{i=1}^n |⟨φ_i, ω − P_{W_{k−1}} ω⟩|² = ‖P_{V_n}(ω − P_{W_{k−1}} ω)‖². (3.3)

Therefore, at every step k, we have

ω_k = argmax_{ω∈D} ‖P_{V_n}(ω − P_{W_{k−1}} ω)‖,

which amounts to a stepwise optimization of a similar nature as in the standard OMP. Note that, while the basis (φ_1, ..., φ_n) is used for the implementation, the actual definition of the greedy selection algorithm is independent of the choice of this basis, in view of (3.3). It only involves V_n and the dictionary D. Similar to OMP, we may weaken the algorithm by taking ω_k such that

Σ_{i=1}^n |⟨φ_i − P_{W_{k−1}} φ_i, ω_k⟩|² ≥ κ² max_{ω∈D} Σ_{i=1}^n |⟨φ_i − P_{W_{k−1}} φ_i, ω⟩|²,

for some fixed 0 < κ < 1.
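The stepwise optimization above can be sketched in a finite-dimensional stand-in V = ℝ^N, with a dictionary of normalized random vectors and the orthonormal basis of W_k updated by Gram-Schmidt (all sizes illustrative, κ = 1):

```python
import numpy as np

def collective_omp(Phi, Dict, m_steps):
    """Greedy selection in the spirit of (3.2): at each step pick the dictionary
    element maximizing sum_i |<phi_i - P_{W_{k-1}} phi_i, omega>|^2."""
    N, n = Phi.shape
    selected = []
    Q = np.zeros((N, 0))                      # orthonormal basis of W_{k-1}
    for _ in range(m_steps):
        R = Phi - Q @ (Q.T @ Phi)             # residuals phi_i - P phi_i
        scores = np.linalg.norm(Dict.T @ R, axis=1)   # score per element
        j = int(np.argmax(scores))
        selected.append(j)
        w = Dict[:, j] - Q @ (Q.T @ Dict[:, j])       # Gram-Schmidt update
        Q = np.column_stack([Q, w / np.linalg.norm(w)])
    return selected, Q

rng = np.random.default_rng(2)
N, n, M = 60, 3, 200
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))    # orthonormal basis of Vn
Dict = rng.standard_normal((N, M))
Dict /= np.linalg.norm(Dict, axis=0)                  # normalized dictionary

sel, Q = collective_omp(Phi, Dict, m_steps=25)
r_m = np.linalg.norm(Phi - Q @ (Q.T @ Phi)) ** 2      # residual r_m
beta = np.linalg.svd(Phi.T @ Q, compute_uv=False).min()
print(r_m, beta)
```

An already selected element has score zero at later steps (its component lies in W_k), so the greedy loop never repeats a selection.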
For such a basis, we introduce the residual quantity

r_m := Σ_{i=1}^n ‖φ_i − P_{W_m} φ_i‖².

This quantity allows us to control the validity of (1.16), since we have σ_m² ≤ r_m, and therefore (1.16) holds with β* = γ provided that r_m ≤ σ*² = 1 − γ². The inequality σ_m² ≤ r_m simply expresses the fact that the Hilbert-Schmidt norm controls the spectral norm. On the other hand, in dimension n, the Hilbert-Schmidt norm can be up to n^{1/2} times the spectral norm. This lack of sharpness is one principal limitation of our convergence analysis, which uses the fact that we can estimate the decay of r_m, but not directly that of σ_m.

Convergence analysis
By analogy to the analysis of OMP provided in [10], we introduce for any Ψ = (ψ_1, ..., ψ_n) ∈ V^n the quantity

‖Ψ‖_1(D) := inf{ Σ_{ω∈D} ‖c_ω‖_{ℓ²} : Ψ = Σ_{ω∈D} c_ω ω, c_ω ∈ ℝⁿ },

where Ψ = Σ_{ω∈D} c_ω ω means that ψ_i = Σ_{ω∈D} c_{ω,i} ω for i = 1, ..., n. This quantity is a norm on the subspace of V^n on which it is finite.
Given that Φ = (φ_1, ..., φ_n) is any orthonormal basis of V_n, we write

J(V_n) := ‖Φ‖_1(D).

This quantity is indeed independent of the orthonormal basis Φ: if Φ̃ = (φ̃_1, ..., φ̃_n) is another orthonormal basis, we have Φ̃ = UΦ where U is unitary. Therefore any representation Φ = Σ_{ω∈D} c_ω ω induces the representation Φ̃ = Σ_{ω∈D} (U c_ω) ω, with the equality Σ_{ω∈D} ‖U c_ω‖_{ℓ²} = Σ_{ω∈D} ‖c_ω‖_{ℓ²}. One important observation is that if Φ = (φ_1, ..., φ_n) is an orthonormal basis of V_n and if Φ = Σ_{ω∈D} c_ω ω, one has

n = Σ_{i=1}^n ‖φ_i‖² = Σ_{ω∈D} ⟨c_ω, (⟨φ_i, ω⟩)_{i=1,...,n}⟩ ≤ Σ_{ω∈D} ‖c_ω‖_{ℓ²} ‖P_{V_n} ω‖ ≤ Σ_{ω∈D} ‖c_ω‖_{ℓ²}.

Therefore, we always have J(V_n) ≥ n. Using the quantity J(V_n), we can generalize the result of [10] on the OMP algorithm in the following way.
Theorem 3.2 Assuming that J(V_n) < ∞, the collective OMP algorithm satisfies

r_m ≤ κ^{−2} J(V_n)² (m+1)^{−1}, m ≥ 0. (3.4)
Proof: For m = 0, we have r_0 = n ≤ κ^{−2} J(V_n)². We then write

r_m ≤ r_{m−1} − Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|².

On the other hand, if Φ = Σ_{ω∈D} c_ω ω, we have

r_{m−1} = Σ_{i=1}^n ⟨φ_i − P_{W_{m−1}} φ_i, φ_i⟩ = Σ_{ω∈D} Σ_{i=1}^n c_{ω,i} ⟨φ_i − P_{W_{m−1}} φ_i, ω⟩,

which by the Cauchy-Schwarz inequality implies

r_{m−1} ≤ Σ_{ω∈D} ‖c_ω‖_{ℓ²} (Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω⟩|²)^{1/2} ≤ κ^{−1} J(V_n) (Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|²)^{1/2},

and therefore

Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|² ≥ κ² r_{m−1}² / J(V_n)².

This implies that

r_m ≤ r_{m−1} (1 − κ² r_{m−1} / J(V_n)²),

from which (3.4) follows by the same elementary induction argument as used in [10]. □

Remark 3.3 Note that the right side of (3.4) is always larger than n(m+1)^{−1}, which is consistent with the fact that β(V_n, W_m) = 0 if m < n.
One natural strategy for selecting the measurement space W_m is therefore to apply the above described greedy algorithm until the first value m = m(n) is met such that β(V_n, W_m) ≥ γ. According to (3.4), this value satisfies

m(n) ≤ κ^{−2} J(V_n)² (1 − γ²)^{−1}. (3.5)

For a general dictionary D and space V_n, we have no control on the quantity J(V_n), which could even be infinite, and therefore the above result does not guarantee that the above selection strategy eventually meets the target bound β(V_n, W_m) ≥ γ. In order to treat this case, we establish a perturbation result similar to that obtained in [2] for the standard OMP algorithm.
Theorem 3.4 Let Φ = (φ_1, ..., φ_n) be an orthonormal basis of V_n and let Ψ = (ψ_1, ..., ψ_n) ∈ V^n be arbitrary. Then the collective OMP algorithm satisfies

r_m ≤ 4 (‖Φ − Ψ‖² + κ^{−2} ‖Ψ‖_1(D)² (m+1)^{−1}), m ≥ 0,

where ‖Φ − Ψ‖² := Σ_{i=1}^n ‖φ_i − ψ_i‖².

Corollary 3.5 For any n-dimensional space V_n, the application of the collective OMP algorithm gives lim_{m→+∞} r_m = 0.

Proof: Given δ > 0, since the finite linear combinations of elements of D are dense in V, we may pick Ψ = (ψ_1, ..., ψ_n), with each ψ_i a finite combination of elements of D, such that 4‖Φ − Ψ‖² ≤ δ/2. One obviously has ‖Ψ‖_1(D) < ∞, and therefore one has 4κ^{−2}‖Ψ‖_1(D)²(m+1)^{−1} ≤ δ/2 for m ≥ m(δ) sufficiently large. It follows that r_m ≤ δ for m ≥ m(δ). □

The above corollary shows that, for any 0 < γ < 1, one has β(V_n, W_m) ≥ γ for m large enough.
Remark 3.6 One alternative strategy to select the measurements could be to apply the OMP algorithm separately to each basis element φ_j. This leads, for each j ∈ {1, ..., n}, to the selection of ω_{1,j}, ..., ω_{m_j,j} ∈ D and to a space

W_m := span{ω_{i,j} : i = 1, ..., m_j, j = 1, ..., n}, m := m_1 + ... + m_n.

The following argument shows that, even when optimizing the choice of the number of iterations m_j used for each basis element, this strategy leads to convergence bounds that are not as good as those achieved by the collective OMP algorithm. Since the residual satisfies

r_m ≤ Σ_{j=1}^n ‖φ_j‖_1(D)² (m_j + 1)^{−1},

we optimize by choosing m_j := m ‖φ_j‖_1(D) (Σ_{j'=1}^n ‖φ_{j'}‖_1(D))^{−1}. This leads to the convergence bound

r_m ≤ (Σ_{j=1}^n ‖φ_j‖_1(D))² m^{−1}.

This bound is not as good as (3.4), since we have

J(V_n) ≤ Σ_{j=1}^n ‖φ_j‖_1(D).

Remark 3.7 In the case n = 1, a well-known variant of the OMP algorithm, also discussed in [2], is the so-called relaxed greedy algorithm. This variant avoids the computation of the projection onto W_k: the approximation of φ_1 is updated by

f_k := α_k f_{k−1} + β_k ω_k,

for appropriate weights (α_k, β_k), where ω_k is selected by a greedy maximization over the dictionary. This strategy is generalized in [11] to the collective setting (with α_k = 1 − (k+1)^{−1} and β_k minimizing the norm of the residual), and is proved to achieve similar convergence properties as the collective OMP algorithm.

A worst case OMP algorithm
We present in this section a variant of the previous collective OMP algorithm. In our numerical experiments presented in §5, this variant performs better than the collective OMP algorithm; however, its analysis is more delicate. In particular, we do not obtain convergence bounds that are as good.

Description of the algorithm
We first take the unit-norm vector in V_n that is least well captured by W_{k−1},

v_k := argmax{ ‖v − P_{W_{k−1}} v‖ : v ∈ V_n, ‖v‖ = 1 }, (4.1)

and then define ω_k by applying one step of OMP to this vector, that is,

|⟨v_k − P_{W_{k−1}} v_k, ω_k⟩| ≥ κ max_{ω∈D} |⟨v_k − P_{W_{k−1}} v_k, ω⟩|, (4.2)

for some fixed 0 < κ < 1.
Remark 4.1 A variant of this algorithm was previously suggested in [12], in the particular case where the dictionary consists of the ω_{x,τ} in (2.14) associated to the local average functionals in (2.12), with ϕ_τ a Gaussian of fixed width (see algorithm 2, called SGreedy, therein). In this variant, the selection is done by searching for the point x_k where |v_k − P_{W_{k−1}} v_k| attains its maximum, and taking ω_k := ω_{x_k,τ}. There is no evidence provided that this selection process has convergence properties similar to those that we prove next for the selection by (4.2).
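In a finite-dimensional stand-in, the worst captured unit vector in (4.1) can be obtained from the top eigenvector of the symmetric matrix (⟨φ_i − P_{W_{k−1}}φ_i, φ_j⟩)_{i,j}, whose largest eigenvalue equals σ²_{k−1}. A sketch (κ = 1, all sizes illustrative):

```python
import numpy as np

def worst_case_omp(Phi, Dict, m_steps):
    """Greedy selection in the spirit of (4.1)-(4.2): take the unit vector of
    Vn least well captured by W_{k-1}, then one OMP step on it."""
    N, n = Phi.shape
    selected, sigmas = [], []
    Q = np.zeros((N, 0))                      # orthonormal basis of W_{k-1}
    for _ in range(m_steps):
        R = Phi - Q @ (Q.T @ Phi)             # (I - P_{W_{k-1}}) on the basis
        S = Phi.T @ R                         # symmetric PSD, top eig = sigma^2
        vals, vecs = np.linalg.eigh(S)
        sigmas.append(np.sqrt(max(vals[-1], 0.0)))    # sigma_{k-1}
        v_res = R @ vecs[:, -1]               # v_k - P v_k for the worst vector
        j = int(np.argmax(np.abs(Dict.T @ v_res)))    # one OMP step
        selected.append(j)
        w = Dict[:, j] - Q @ (Q.T @ Dict[:, j])
        Q = np.column_stack([Q, w / np.linalg.norm(w)])
    return selected, sigmas, Q

rng = np.random.default_rng(3)
N, n, M = 60, 3, 200
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))
Dict = rng.standard_normal((N, M))
Dict /= np.linalg.norm(Dict, axis=0)
sel, sigmas, Q = worst_case_omp(Phi, Dict, 25)
print(sigmas[0], sigmas[-1])                  # sigma starts at 1 and decays
```

Since the spaces W_k are nested, the recorded values of σ_{k−1} are non-increasing, which gives a simple consistency check on the implementation.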

Convergence analysis
The first result gives a convergence rate for r_m under the assumption that J(V_n) < ∞, similar to Theorem 3.2, however with a multiplicative constant that is inflated by a factor n².
Theorem 4.2 Assuming that J(V_n) < ∞, the worst case OMP algorithm satisfies

r_m ≤ n² κ^{−2} J(V_n)² (m+1)^{−1}, m ≥ 0. (4.3)

Proof: As in the proof of Theorem 3.2, we use the inequality

r_m ≤ r_{m−1} − Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|²,

and try to bound Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|² from below. For this purpose, we write v_m = Σ_{j=1}^n a_j φ_j for a vector a = (a_1, ..., a_n) such that ‖a‖_{ℓ²} = 1. Thus, if Φ = Σ_{ω∈D} c_ω ω, we may write

σ_{m−1}² = ⟨v_m − P_{W_{m−1}} v_m, v_m⟩ = Σ_{ω∈D} ⟨a, c_ω⟩ ⟨v_m − P_{W_{m−1}} v_m, ω⟩,

and therefore

σ_{m−1}² ≤ J(V_n) max_{ω∈D} |⟨v_m − P_{W_{m−1}} v_m, ω⟩| ≤ κ^{−1} J(V_n) |⟨v_m − P_{W_{m−1}} v_m, ω_m⟩|.

Next we find by Cauchy-Schwarz that

|⟨v_m − P_{W_{m−1}} v_m, ω_m⟩|² = |Σ_{j=1}^n a_j ⟨φ_j − P_{W_{m−1}} φ_j, ω_m⟩|² ≤ Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|².

We have thus obtained the lower bound

Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|² ≥ κ² σ_{m−1}⁴ / J(V_n)² ≥ κ² r_{m−1}² / (n² J(V_n)²),

where we have used r_{m−1} ≤ n σ_{m−1}², from which we conclude in a similar way as in Theorem 3.2. □

For the general case, we establish a perturbation result similar to Theorem 3.4, with again a multiplicative constant that depends on the dimension of V_n.

Theorem 4.3 Let Φ = (φ_1, ..., φ_n) be an orthonormal basis of V_n and let Ψ = (ψ_1, ..., ψ_n) ∈ V^n be arbitrary. Then the application of the worst case OMP algorithm on the space V_n gives

r_m ≤ 4 n² (‖Φ − Ψ‖² + κ^{−2} ‖Ψ‖_1(D)² (m+1)^{−1}), m ≥ 0,

where ‖Φ − Ψ‖² := Σ_{i=1}^n ‖φ_i − ψ_i‖².

Proof: We write, as in the proof of Theorem 4.2, v_m = Σ_{j=1}^n a_j φ_j for a vector a = (a_1, ..., a_n) such that ‖a‖_{ℓ²} = 1. If Ψ = Σ_{ω∈D} c_ω ω, we now reach

σ_{m−1}² = Σ_{ω∈D} ⟨a, c_ω⟩ ⟨v_m − P_{W_{m−1}} v_m, ω⟩ + Σ_{j=1}^n a_j ⟨v_m − P_{W_{m−1}} v_m, φ_j − ψ_j⟩.

By the Cauchy-Schwarz and Young inequalities, the second term is bounded by (1/2) r_{m−1} + (1/2) n² ‖Φ − Ψ‖². By subtracting, we thus obtain

σ_{m−1}² − (1/2) r_{m−1} − (1/2) n² ‖Φ − Ψ‖² ≤ ‖Ψ‖_1(D) max_{ω∈D} |⟨v_m − P_{W_{m−1}} v_m, ω⟩|.

Proceeding in a similar way as in the proof of Theorem 4.2, we obtain a lower bound for Σ_{i=1}^n |⟨φ_i − P_{W_{m−1}} φ_i, ω_m⟩|², and we conclude in the same way as in the proof of Theorem 3.4. □

By the exact same argument as in the proof of Corollary 3.5, we find that the worst case OMP algorithm converges for any space V_n, even when J(V_n) is not finite.
Corollary 4.4 For any n-dimensional space V_n, the application of the worst case OMP algorithm on the space V_n gives lim_{m→+∞} r_m = 0.

Application to point evaluation
Let us now estimate m(n) when the points are chosen by the greedy algorithms that we have introduced. This boils down to estimating J(V_n). In this simple case, expansions in the dictionary are obtained from the representation v = −∫₀¹ v''(x) (x(1−x))^{1/2} ω_x dx, valid for smooth v ∈ V. Thus, using the basis functions φ_k defined by (2.5), we obtain J(V_n) ≤ C n^{3/2}. Estimate (3.5) for the convergence of the collective OMP approach then yields m(n) ≤ C n³ (1 − γ²)^{−1}, while for the worst case OMP, estimate (4.3) gives m(n) ≤ C n⁵ (1 − γ²)^{−1}. As already anticipated, these bounds deviate from the optimal estimation (2.11), due to the use of the Hilbert-Schmidt norm in the analysis. The numerical results in the next section reveal that the greedy algorithms actually behave much better in this case.

Numerical tests
All numerical tests presented in this section were performed in Python using the standard NumPy and SciPy packages.

Pointwise evaluation
In this test we consider the setting outlined in §2.2. That is, we have V = H¹₀(I) with the interval I = ]0,1[, the approximation space is spanned by the Fourier basis, V_n = span{φ_k : k = 1, ..., n} with φ_k(t) = (√2/(πk)) sin(kπt), and we pick the measurement functionals that define the space W_m from a dictionary of pointwise evaluations in ]0,1[. The dictionary D is thus composed of the elements ω_x, the normalized Riesz representers of function evaluation at the point x, given explicitly in (2.3).
The selection of the evaluation points is performed using both the collective OMP and the worst-case OMP algorithms. The inner product between ω_x and φ_k can be computed using an explicit formula, hence there is no discretization or approximation in this aspect of our implementation. In both algorithms we require a search of the dictionary D to find the element that maximizes either (3.2) or (4.2). Evidently, we require D to be of finite size to perform this search, so we use the discretized dictionary

D := {ω_{i/M} : i = 1, ..., M − 1},

where we take M to be some large number. In practice we found that M = 10⁴ offered similar results to any larger number, so we kept this value.
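A lightweight stand-in for this experiment (with a much smaller dictionary, M = 400, for runtime) can be obtained by representing H¹₀ functions through the cell averages of their derivatives on a fine grid, so that the V-inner product becomes a scaled Euclidean product. The sketch below compares equally spaced and randomly drawn points; the greedy routines of §3-§4 can be run in this same representation by applying them to the `omega` vectors:

```python
import numpy as np

N = 2000                                     # fine discretization grid
dt = 1.0 / N
t_mid = (np.arange(N) + 0.5) * dt

def omega(x):                                # derivative of the (normalized)
    g = np.where(t_mid < x, 1.0 - x, -x)     # hat function representing delta_x
    return g / np.sqrt(x * (1.0 - x))

def beta(n, points):                         # inf-sup constant beta(Vn, Wm)
    Phi = np.stack([np.sqrt(2) * np.cos(k * np.pi * t_mid)
                    for k in range(1, n + 1)], axis=1)   # phi_k' (orthonormal)
    W = np.stack([omega(x) for x in points], axis=1)
    Qw, _ = np.linalg.qr(W * np.sqrt(dt))    # orthonormalize W_m
    Gram = (Phi * np.sqrt(dt)).T @ Qw        # cross-Gramian
    return np.linalg.svd(Gram, compute_uv=False).min()

n, m = 4, 12
grid = np.arange(1, 400) / 400               # discretized dictionary points
rng = np.random.default_rng(0)
b_equi = beta(n, np.arange(1, m + 1) / (m + 1))
b_rand = beta(n, rng.choice(grid, size=m, replace=False))
print(b_equi, b_rand)                        # equispaced is typically better
```

Up to the discretization error of the grid, b_equi should be close to the exact value computed from the closed-form inner products.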
We compare the inf-sup constant β(V_n, W_m) for the evaluation points picked by these two OMP algorithms with the one obtained when the points are picked at random with uniform law on ]0,1[. In all cases, the selected points are nested as we increase m ≥ n, that is, W_m ⊂ W_{m+1}, so that β is monotone increasing in m. We also compare with the value of β(V_n, W_m) obtained for equally spaced points, which in view of Theorem 2.3 are expected to be a near optimal choice, but are not nested. Results in Fig 5.1 show the behavior of β(V_n, W_m) as m increases for two representative values of n. They reveal that the worst-case OMP algorithm produces a slightly better behaved inf-sup constant than the collective OMP algorithm, not far from the near optimal value obtained with evenly spaced evaluations. In contrast, the random selection is clearly suboptimal. We note that the obtained curves (solid lines) are not monotone increasing, due to the fact that the points selected for different values of n have no nestedness properties. An alternative strategy that leads to a nested sequence consists in intertwining the greedy algorithm with the growth of V_n: assuming that we have selected m(n) points such that β(V_n, W_{m(n)}) > β*, we apply the greedy algorithm for V_{n+1} to enrich W_{m(n)} until we reach the first value m(n+1) such that β(V_{n+1}, W_{m(n+1)}) > β*. We may again use either the collective or the worst-case OMP algorithm. In this fashion we construct a nested sequence W_{m(1)} ⊂ ... ⊂ W_{m(n)} ⊂ ... that pairs with each V_n and always ensures an inf-sup constant larger than β*. This strategy bears similarity to the generalized empirical interpolation method [15] where, at each step, one adds a new function to V_n and a new linear functional to W_m. In that case, we always have m = n, but no theoretical guarantee that β(V_n, W_n) remains bounded away from zero.
The resulting curves are also plotted on Fig 5.2 (dashed lines).We see that we do not pay a significant penalty, as m(n) is not significantly worse for this incremental method that when W m is built from nothing for each V n .We again find that the worst-case algorithm is slightly superior to the collective algorithm.
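In a discretized setting, the inf-sup constant β(V_n, W_m) can be computed as the smallest singular value of the cross-Gramian between orthonormal bases of the two spaces. The following is a minimal sketch under that assumption; the function name is ours:

```python
import numpy as np

def inf_sup(Vn_basis, Wm_basis):
    """beta(V_n, W_m) = min over unit-norm v in V_n of ||P_{W_m} v||.

    Vn_basis: (n, N) rows spanning V_n; Wm_basis: (m, N) rows spanning W_m.
    """
    qV, _ = np.linalg.qr(Vn_basis.T)     # orthonormalize V_n (columns)
    qW, _ = np.linalg.qr(Wm_basis.T)     # orthonormalize W_m (columns)
    n, m = qV.shape[1], qW.shape[1]
    if m < n:
        return 0.0   # some direction of V_n is invisible to W_m
    G = qW.T @ qV    # (m, n) cross-Gramian
    return float(np.linalg.svd(G, compute_uv=False)[-1])
```

With nested spaces W_m ⊂ W_{m+1}, the projection ||P_{W_m} v|| can only grow with m, which is the monotonicity observed in the experiments.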

Local averages
In this test we perform the collective and worst-case OMP algorithms on dictionaries of local averages. Here we build our measurement spaces W_m against a Fourier space V_n, working on the unit interval I = ]0, 1[ as in §5.1.
The dictionary D is the collection of Riesz representers ω_{x,τ} of M local averages of width τ. We also consider reduced basis spaces V_n built from snapshots in M associated with points y_1, . . ., y_n ∈ [0, 1]; such a space consists of piecewise constant functions on the intervals [y_i, y_{i+1}], assuming that these points have been increasingly ordered. By taking y to be the midpoint of the largest of these intervals, which has length at least n^{-1}, it is easily checked that ||χ_[0,y] − P_{V_n} χ_[0,y]|| ≥ (1/2) n^{-1/2}, (5.9) and therefore the approximation rate is not better than with the Fourier space. More generally, it can be checked that the Kolmogorov n-width of M in V decays like n^{-1/2}, that is, no linear approximation method can achieve a better rate. This is illustrated by the left side of Figure 5.6, which shows the slow decay of the approximation error for the reduced basis spaces, slightly worse than when using Fourier spaces. It is of course still possible to apply the collective and worst-case OMP algorithms in order to select measurements in this setting. Here, we use a dictionary of local averages, which are continuous linear functionals on L^2. The behaviour of the inf-sup constant is displayed on the right side of Figure 5.6, and shows that the greedy algorithms perform well, for both Fourier and reduced bases. However, the recovery performance is affected by the fact that both of these linear spaces have poor approximation properties over the class M. More generally, we stress that for this class of problems, linear recovery methods can be highly suboptimal. Consider for example the case where the data are given by a single measurement in the form of the global average ℓ(u) = ∫_0^1 u(t) dt. (5.11) Then we find that ℓ(χ_[0,y]) = y, so that an optimal reconstruction map from w = ℓ(u) that gives exact recovery is simply given by A*(w) = χ_[0,w]. (5.12) This map is non-linear, since χ_[0,w_1] + χ_[0,w_2] obviously differs from χ_[0,w_1+w_2].
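The exact nonlinear recovery from the single global-average measurement can be checked numerically. The sketch below uses a uniform grid on ]0, 1[ of our own choosing to represent the indicator functions:

```python
import numpy as np

# Recover u = chi_[0,y] from its global average: ell(chi_[0,y]) = y,
# so the (nonlinear) reconstruction map is A*(w) = chi_[0,w].
N = 10_000                        # grid resolution (an assumption)
t = (np.arange(N) + 0.5) / N      # midpoints of a uniform grid on ]0,1[

def indicator(y):
    """Discretized chi_[0,y] on the grid t."""
    return (t <= y).astype(float)

y = 0.37
w = indicator(y).mean()           # the single measurement ell(u)
u_rec = indicator(w)              # nonlinear reconstruction A*(w)
# w coincides with y up to grid resolution, so u_rec matches u exactly
```

Note that no linear map from the scalar w to V can reproduce all indicator functions, which is exactly the suboptimality of linear recovery stressed above.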

Remark 3.1. The quantity r_m^{1/2} is the Hilbert-Schmidt norm of the operator I − P_{W_m} restricted to V_n. The inequality σ_m ≤ r_m^{1/2} is then immediate, since the spectral norm of an operator never exceeds its Hilbert-Schmidt norm.

Figure 5.1: Comparison of collective OMP, worst-case OMP, random (uniformly distributed) and equally spaced point selection, using a Fourier basis V_n, with n = 20 or n = 40, and increasing values of m ≥ n.

Figure 5.2: Minimum m = m(n) required for β(V_n, W_m) > β* = 0.5 against a range of n, using collective OMP and worst-case OMP, and intertwining the growth of V_n and W_m (dashed).

Fig 5.2 displays the minimum value m = m(n) required to make β(V_n, W_m) > β* := 0.5 for a range of n. It shows a clear linear trend, with a rate of increase almost equal to 1 for the worst-case OMP algorithm. This shows that the estimates on m(n) obtained in §4.3 from our theoretical results are in this case too pessimistic. We note that the obtained curves (solid lines) are not monotone increasing, due to the fact that the points selected for different values of n have no nestedness properties. An alternative strategy that leads to a nested sequence consists in intertwining the greedy algorithm with the growth of V_n: assuming that we have selected m(n) points such that β(V_n, W_{m(n)}) > β*, we apply the greedy algorithm for V_{n+1} to enrich W_{m(n)} until we reach the first value m(n+1) such that β(V_{n+1}, W_{m(n+1)}) > β*. We may again use either the collective or worst-case OMP algorithm. In this fashion we construct a nested sequence W_{m(1)} ⊂ . . . ⊂ W_{m(n)} ⊂ . . . that pairs with each V_n and always ensures an inf-sup constant larger than β*. This strategy bears similarity to the generalized empirical interpolation method [15] where, at each step, one adds a new function to V_n and a new linear functional to W_m. In that case, we always have m = n, but there is no theoretical guarantee that β(V_n, W_n) remains bounded away from zero. The resulting curves are also plotted on Fig 5.2 (dashed lines). We see that we do not pay a significant penalty, as m(n) is not significantly worse for this incremental method than when W_m is built from scratch for each V_n. We again find that the worst-case algorithm is slightly superior to the collective algorithm.
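The intertwined strategy can be sketched as follows in a discretized setting. The greedy step here is a worst-case-OMP-like choice of our own design (pick the candidate most correlated with the residual of the worst-approximated direction of V_n), not the paper's exact criterion; function names are illustrative:

```python
import numpy as np

def orthonormal_rows(A):
    """Orthonormal basis (as rows) of the row span of A, via thin QR."""
    q, _ = np.linalg.qr(A.T)
    return q.T

def intertwined(dictionary, full_basis, beta_star=0.5):
    """For each n, enrich the selected set until beta(V_n, W_m) > beta_star.

    dictionary: (M, N) candidate representers; full_basis: (n_max, N).
    Returns the nested counts m(1) <= m(2) <= ... <= m(n_max).
    """
    sel, m_of_n = [], []
    d_norm2 = np.sum(dictionary**2, axis=1)
    for n in range(1, full_basis.shape[0] + 1):
        Vn = orthonormal_rows(full_basis[:n])
        while True:
            Wm = (orthonormal_rows(dictionary[sel]) if sel
                  else np.zeros((0, Vn.shape[0] and Vn.shape[1])))
            G = Wm @ Vn.T                     # (m, n) cross-Gramian
            # beta(V_n, W_m): zero while m < n, else smallest singular value
            beta = (np.linalg.svd(G, compute_uv=False)[-1]
                    if len(sel) >= n else 0.0)
            if beta > beta_star:
                break
            # worst-approximated unit direction v in V_n
            if sel:
                vt = np.linalg.svd(G, full_matrices=True)[2]
                v = vt[-1] @ Vn               # last right-singular direction
            else:
                v = Vn[0]
            resid = v - Wm.T @ (Wm @ v)
            sel.append(int(np.argmax((dictionary @ resid)**2 / d_norm2)))
        m_of_n.append(len(sel))
    return m_of_n
```

Because the selection list is only ever appended to, the resulting spaces W_{m(1)} ⊂ W_{m(2)} ⊂ . . . are nested by construction, as in the strategy described above.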