CONSISTENT ESTIMATION OF A CONVEX DENSITY AT THE ORIGIN

Motivated by Hampel's bird migration problem, Groeneboom, Jongbloed, and Wellner (2001b) established the asymptotic distribution theory for the nonparametric Least Squares and Maximum Likelihood estimators of a convex and decreasing density g_0 at a fixed point t_0 > 0. However, estimation of the distribution function of the birds' resting times involves estimation of g_0' at 0, a boundary point at which the estimators are not consistent. In this paper, we focus on the Least Squares estimator, g̃_n. Our goal is to show that consistent estimators of both g_0(0) and g_0'(0) can be based solely on g̃_n. Following the idea of Kulikov and Lopuhaä (2006) in monotone estimation, we show that it suffices to take g̃_n(n^{-α}) and g̃_n'(n^{-α}), with α ∈ (0, 1/3). We establish their joint asymptotic distributions and show that α = 1/5 should be taken, as it yields the fastest rates of convergence.


Introduction
Suppose that we are interested in estimating the distribution function of resting periods of migrating birds in a certain habitat area. We denote the true distribution by F_0 and assume that it admits a density f_0 with lim_{t→∞} f_0(t) = 0. The birds are marked individually so that they can be identified, and some of them are captured in mist nets with a certain probability. Those birds that are caught exactly twice yield a distribution of observed (minimum) resting periods.
The question is: what is the relationship between the observed resting period distribution and the true one? Assuming that f_0 admits a finite second moment, and if g_0 is the density of the observed spans between two capture dates, Hampel (1987) showed that this relationship is given by

  g_0(t) = (2/μ_2) ∫_t^∞ (x − t) f_0(x) dx,   (1.1)

where μ_2 = ∫_0^∞ t² f_0(t) dt. Inverting (1.1) yields, at any continuity point t > 0 of F_0,

  F_0(t) = 1 − g_0'(t)/g_0'(0),

where g_0'(0) denotes here the right derivative of g_0 at 0. See also Anevski (2003) for an explicit derivation. Groeneboom, Jongbloed, and Wellner (2001b) considered two nonparametric estimators of g_0: the Least Squares and Maximum Likelihood estimators, denoted hereafter by LSE and MLE. Under the assumption that g_0'' is continuous in a neighborhood of a fixed point t_0 with g_0''(t_0) > 0, they showed that these estimators are piecewise linear and that

  ( n^{2/5} (ḡ_n(t_0) − g_0(t_0)), n^{1/5} (ḡ_n'(t_0) − g_0'(t_0)) ) →_d ( c_0(t_0) H^{(2)}(0), c_1(t_0) H^{(3)}(0) ),   (1.2)

where ḡ_n denotes either one of the estimators and H is the "invelope" of the process

  Y(t) = ∫_0^t W(s) ds + t⁴ for t ≥ 0,  Y(t) = ∫_t^0 W(s) ds + t⁴ for t < 0,   (1.3)

in the following sense: H is an almost surely unique random process satisfying (i) H(t) ≤ Y(t) for all t ∈ ℝ; (ii) H^{(2)} is convex; and (iii) H(t) = Y(t) if H^{(2)} changes slope at t. In view of (ii), condition (iii) can also be rewritten as ∫_{−∞}^{∞} (H(t) − Y(t)) dH^{(3)}(t) = 0 (see Groeneboom, Jongbloed, and Wellner (2001a)). Here, W is a standard two-sided Brownian motion starting at 0. The constants c_0(t_0) and c_1(t_0) are given by

  c_0(t_0) = ( (1/24) g_0(t_0)² g_0''(t_0) )^{1/5},  c_1(t_0) = ( (1/24)³ g_0(t_0) (g_0''(t_0))³ )^{1/5}.   (1.4)

The asymptotic result in (1.2) no longer applies when t_0 is replaced by 0. To study the behavior of the estimators near this boundary point, we focus in this paper on the LSE. The MLE can be handled very similarly, but it involves much more cumbersome calculations, due mainly to the nonlinear nature of the characterization of that estimator.
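The inversion behind (1.1) follows from two differentiations. Assuming the standard integral form of Hampel's relation stated above, the computation is one line:

```latex
% Differentiating g_0(t) = (2/\mu_2) \int_t^\infty (x - t) f_0(x)\,dx twice gives
g_0'(t) = -\frac{2}{\mu_2}\bigl(1 - F_0(t)\bigr),
\qquad
g_0''(t) = \frac{2}{\mu_2}\, f_0(t) \;\ge\; 0,
% so g_0 is convex and nonincreasing, g_0'(0) = -2/\mu_2, and hence
F_0(t) \;=\; 1 + \frac{\mu_2}{2}\, g_0'(t) \;=\; 1 - \frac{g_0'(t)}{g_0'(0)} .
```

This makes explicit why consistent estimation of g_0'(0) is exactly what is needed to recover F_0, and why the convexity constraint on g_0 is natural in this problem.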
The LSE enjoys the property of preserving the shape restrictions of the estimated density g_0, and any consistent estimator of g_0'(0) will yield an estimator of F_0 that is a genuine distribution function (as g̃_n is convex, g̃_n' is nondecreasing) and pointwise consistent if g_0 is differentiable on (0, ∞). This follows from Lemma 3.1 of Groeneboom, Jongbloed, and Wellner (2001b).
To achieve consistency at 0, methods based on boundary corrections of kernel density estimators have been suggested (see e.g. Jones (1993) and the references therein). On the other hand, Cheng (1997) showed how local linear density estimators adjust automatically for boundary effects without requiring any further correction. Consistent estimation of higher derivatives can be achieved using local polynomial density estimators, as shown by Cheng, Fan, and Marron (1997). For estimating g_0'(0), where g_0 is a compactly supported density on [0, 1] assumed to be twice differentiable, their Theorem 1 implies that n^{-1/5} is the optimal rate of convergence; it is achieved by their local quadratic smoother, provided that the kernel function and its derivative are bounded on [0, 1].
In this paper, the goal is to answer the following question: would it be possible, using only the already computed LSE, to construct a consistent estimator of g_0'(0)? This is possible by following the idea of Kulikov and Lopuhaä (2006) of "staying away from 0", which they used to achieve consistency at 0 in monotone estimation. More precisely, we show that for α ∈ (0, 1/3), g̃_n(n^{-α}) and g̃_n'(n^{-α}) are consistent estimators of g_0(0) and g_0'(0), and that α = 1/5 yields the fastest rates of convergence.
The paper is structured as follows. We recall in the next section that the LSE fails to be consistent at 0, and show that its first derivative is not even bounded in probability there. In Section 3, we review the characterization of the LSE, give preliminary results used in the main proof, and explain via heuristic arguments where the main difficulty lies. In Section 4, we present the three joint asymptotic regimes of g̃_n(n^{-α}) and g̃_n'(n^{-α}), depending on whether α ∈ (0, 1/5), α = 1/5 or α ∈ (1/5, 1/3). The optimal rates of convergence, n^{-2/5} and n^{-1/5}, are attained by our modified LSE, which is obtained by replacing the original LSE on the shrinking neighborhood [0, n^{-1/5}] by tangential extrapolation from the point n^{-1/5} to the left. To illustrate the theory, we present in Section 5 numerical results based on simulations, and finish with conclusions and some open questions. The proof of our main result is given in Section 6. As the cases α = 1/5 and α ∈ (1/5, 1/3) share many similarities, we prove the result only for α = 1/5 and indicate at the end of Section 6 how the proof can be adapted to the other case. We also give heuristic arguments for α ∈ (0, 1/5), for which the limiting distribution depends on the same process as in the estimation problem considered by Groeneboom, Jongbloed, and Wellner (2001b) at an interior point.
Throughout the paper, we assume that the true convex density g 0 is twice continuously differentiable at 0, and that g ′′ 0 (0) > 0.

Inconsistency of the Least Squares estimator
Given X_1, …, X_n i.i.d. observations from a convex and nonincreasing density g_0, and G_n their corresponding empirical distribution function, the LSE g̃_n is defined as the unique minimizer of the quadratic criterion

  Φ_n(g) = (1/2) ∫_0^∞ g²(t) dt − ∫_0^∞ g(t) dG_n(t)   (2.5)

over the class of square integrable, convex and nonincreasing functions on (0, ∞). Groeneboom, Jongbloed, and Wellner (2001b) established that the LSE exists, is piecewise linear, and is a density, although the minimization is not restricted to the space of densities. Furthermore, if H̃_n(t) = ∫_0^t (t − s) g̃_n(s) ds and Y_n(t) = ∫_0^t (t − s) dG_n(s), then the nonincreasing and convex piecewise linear function g̃_n is the LSE of g_0 if and only if

  H̃_n(t) ≥ Y_n(t) for all t ≥ 0, with equality if t is a jump point of g̃_n'.   (2.6)

By the inequality condition in (2.6), any jump location τ is a point where H̃_n − Y_n reaches its minimum. Therefore, the derivatives of H̃_n and Y_n at τ have to be equal, that is,

  G̃_n(τ) = G_n(τ),   (2.7)

where G̃_n is the cumulative distribution function corresponding to g̃_n.
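The process Y_n appearing in the characterization is cheap to compute from data: integrating by parts, Y_n(t) = ∫_0^t (t − s) dG_n(s) = ∫_0^t G_n(s) ds = (1/n) Σ_i (t − X_i)_+. A minimal numerical sketch (the sample and function names are ours, not from the paper) checking this identity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=200)  # toy sample from Exp(1)

def Y_n(t, X):
    # Y_n(t) = int_0^t (t - s) dG_n(s) = (1/n) sum_i (t - X_i)_+
    return float(np.mean(np.clip(t - X, 0.0, None)))

def Y_n_by_parts(t, X, m=200000):
    # same quantity via Y_n(t) = int_0^t G_n(s) ds, by midpoint quadrature
    s = (np.arange(m) + 0.5) * (t / m)
    Gn = np.searchsorted(np.sort(X), s, side="right") / len(X)
    return float(Gn.sum() * (t / m))
```

Both evaluations agree up to quadrature error; the closed form Σ(t − X_i)_+/n is the one used in practice, while the ∫_0^t G_n form is the one that reappears in the localization of Section 6.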
We now show that g̃_n(0) and g̃_n'(0) are not consistent estimators of g_0(0) and g_0'(0). This is a consequence of the following lemma.
Lemma 2.1 If Y_1 and Y_2 are two independent standard Exponentials, then g̃_n(0) converges in distribution to a nondegenerate limit that can be expressed in terms of Y_1 and Y_2 and that differs from g_0(0). Hence, g̃_n(0) is not consistent.

Local perturbation functions
The characterization of g̃_n given in (2.6) follows from taking the directional derivative of the quadratic functional Φ_n defined in (2.5) in the direction of the perturbation function s ↦ (t − s)_+, where z_+ is the usual notation for the positive part of z. To make sure that the resulting function g̃_n(s) + ε(t − s)_+ for small ε belongs to the class of convex functions, ε is only allowed to take positive values when t is not necessarily a knot of g̃_n. This leads to the inequality part of the characterization in (2.6). When t is a knot, small negative values of ε are also allowed and the directional derivative is equal to 0. This in turn yields the equality condition in (2.6).
The knots ofg n , or equivalently the jump points ofg ′ n , are defined only implicitly in the sense that there exists no explicit formula which pins down their locations exactly. However, these knots can be obtained numerically as limits of an iterative algorithmic procedure (see Groeneboom, Jongbloed, and Wellner (2003) and also Mammen and van de Geer (1997) in the context of nonparametric regression). To establish the limiting theory, it is necessary to exploit the characterization of the LSE and make use of appropriate perturbation functions to be able to track down the asymptotic behavior of these knots and hence that of the estimator.
Using (2.7), this yields the identity in (3.8). On the other hand, since our estimation problem is local, the most useful perturbation functions are those that vanish outside an interval of the form [τ_1, τ_2].
Such a function, p say, must satisfy that g̃_n + εp is convex and nonincreasing for ε > 0 small enough. As ε ↓ 0, nonnegativity of the corresponding directional derivative implies that

  ∫_{τ_1}^{τ_2} p(s) g̃_n(s) ds ≥ ∫_{τ_1}^{τ_2} p(s) dG_n(s),   (3.9)

and this inequality remains valid if we add an arbitrary constant to p, using again the fact that G̃_n(τ_j) = G_n(τ_j), j = 1, 2.
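Why adding a constant to p is harmless deserves one line of bookkeeping. Writing the directional-derivative inequality as ∫ p g̃_n ds ≥ ∫ p dG_n on [τ_1, τ_2] (our rendering of the display), for any constant c:

```latex
\int_{\tau_1}^{\tau_2} (p + c)\,\tilde g_n\,\mathrm{d}s
 - \int_{\tau_1}^{\tau_2} (p + c)\,\mathrm{d}G_n
 \;=\;
 \underbrace{\int_{\tau_1}^{\tau_2} p\,\tilde g_n\,\mathrm{d}s
 - \int_{\tau_1}^{\tau_2} p\,\mathrm{d}G_n}_{\ge\,0}
 \;+\; c\,\Bigl[\bigl(\tilde G_n - G_n\bigr)(\tau_2)
 - \bigl(\tilde G_n - G_n\bigr)(\tau_1)\Bigr],
```

and the bracket vanishes by (2.7) since τ_1 and τ_2 are jump points of g̃_n'. This is what makes triangular perturbations convenient: only the shape of p on [τ_1, τ_2] matters, not its level.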
The choice of perturbation functions in this problem is actually limited, and the simplest ones to take have the form of a "reversed hat"; i.e., a convex triangular function on a compact set. Let s ∈ (τ_1, τ_2) and let τ̄ be the midpoint of [τ_1, τ_2]. Consider local perturbation functions p_0, p_{1,s} and p_{2,s} of this form. If τ_1 and τ_2 are successive jump points, then g̃_n' is constant on (τ_1, τ_2) and g̃_n is linear on [τ_1, τ_2]. On the other hand, using a Taylor expansion and the assumption that g_0 is twice continuously differentiable to the right of 0, the true density g_0 can be decomposed into a quadratic polynomial plus a lower order remainder. Combining this with the inequality in (3.9) and the properties of p_0, p_{1,s} and p_{2,s} (in this order) yields the inequalities in (3.11), (3.12) and (3.13). The identity in (3.8) and the inequalities in (3.11), (3.12) and (3.13) will prove very useful for deriving the main result. They will be exploited for different purposes, but note that they share the common feature of linking the LSE to an empirical process term. Lemma 6.1 will give us a way to bound the corresponding rates of convergence of these empirical processes, which is essential for establishing the uniform tightness result of Proposition 6.1. This is explained in more detail in the next subsection, where we also point to the main difficulty encountered in establishing the main result.

The crucial role of the distance between jump points
To give some insight into the proof of Theorem 4.1 and to describe the main difficulty of the problem, we focus here on α = 1/5, for which our proof is given in detail. The key argument in proving the weak convergence is to show the following uniform tightness result:

  sup_{t ∈ [M_1 n^{-1/5}, M_2 n^{-1/5}]} n^{2/5} |g̃_n(t) − g_0(t)| = O_p(1)

for all 0 < M_1 < M_2. Actually, we prove a stronger result where the supremum is taken over the bigger interval [τ_−, M_2 n^{-1/5}], with τ_− the last jump point of g̃_n' located before M_1 n^{-1/5} (see Proposition 6.1). In estimating the convex density at an interior point t_0 > 0, Groeneboom, Jongbloed, and Wellner (2001b) did not need to consider this stronger version; it is anyway implied by the n^{-1/5} order of the distance between two jump points τ_− and τ_+ in the neighborhood of t_0 > 0 as n → ∞; i.e.,

  τ_+ − τ_− = O_p(n^{-1/5})   (3.14)

(see their Lemma 4.2). Uniform tightness is the key argument in showing convergence of the second and third derivatives of the "empirical invelope" to the corresponding derivatives of the limit invelope, denoted in this manuscript by H_+.
The "empirical invelope" is the scaled version of the local process H̃_n^loc, which we define in Section 6.
The difficulty in establishing the uniform tightness result in our boundary problem is that it is not known how the jump points cluster to the right of the origin. For example, we do not know whether the probability that a jump point is found in a neighborhood [K_1 n^{-1/5}, K_2 n^{-1/5}], for some 0 < K_1 < K_2, increases to 1 as n → ∞. In the interior point problem considered by Groeneboom, Jongbloed, and Wellner (2001b), the event that g̃_n' has a jump between t_0 − M n^{-1/5} and t_0 + M n^{-1/5} for some M > 0 large enough occurs with probability increasing to 1 as n → ∞. This fact follows again from (3.14) and was used by Groeneboom, Jongbloed, and Wellner (2001b) as a very useful "trick". Indeed, it offered some flexibility in choosing jump points τ_− and τ_+ that are far enough apart to show that

  inf_{t ∈ [τ_−, τ_+]} |g̃_n(t) − g_0(t)| = O_p(n^{-2/5}).   (3.15)

By far enough, we mean that the distance satisfies τ_+ − τ_− > r n^{-1/5}, where r > 0 does not depend on n. Of course, this distance stays bounded with large probability by M n^{-1/5} for some M > 0. The result in (3.15) states that, with increasing probability, we can find at least one point in [τ_−, τ_+] at which the estimation error is of order n^{-2/5}. This turned out to be enough to show the uniform n^{2/5}- and n^{1/5}-tightness of the LSE and its derivative on an arbitrary neighborhood of t_0. This might seem surprising, but it was indeed possible using convexity of the estimator and the true density.
Unfortunately, the "trick" cannot be directly used here, and a new idea was needed. Our approach to get around the problem is to consider both situations: (1) a jump point can be found in a given interval, [K_1 n^{-1/5}, K_2 n^{-1/5}] say, and (2) no jump point can be found there. In our proof of uniform tightness, A. Case 2 and C. Case 2 describe exactly the second situation. There, we make use of the inequalities in (3.12) and (3.13) and the empirical process argument of Lemma 6.1.
Localizing and scaling appropriately the processes H̃_n(t) = ∫_0^t (t − s) g̃_n(s) ds and Y_n(t) = ∫_0^t G_n(s) ds, as is done in Subsection 6.2, yields H̃_n^loc and Y_n^loc. Uniform tightness is then used to show convergence of H̃_n^loc to a limiting process, which must almost surely equal the invelope of the limit of Y_n^loc as n → ∞.
In the next section, we give the three regimes of the joint asymptotic distributions of the consistent estimators g̃_n(n^{-α}) and g̃_n'(n^{-α}), depending on the value of α. This result can be compared directly with Theorem 4.1 of Kulikov and Lopuhaä (2006) in the case where the first derivative of the true monotone density does not vanish (k = 1 in their notation). A corollary follows, in which we give the asymptotic distribution of our modified LSE.

Consistency at 0 and limiting distributions
Let W be a standard two-sided Brownian motion on ℝ starting from 0, and Y the Gaussian drifting process defined on ℝ in (1.3). For t ≥ 0, let Y_+(t) = ∫_0^t W(s) ds + t⁴, and let Y_0 be defined analogously. We consider H_+ and H_0, the invelopes of Y_+ and Y_0, defined in the same sense as the invelope H (see Section 1). Finally, let c_0(0) and c_1(0) be the constants obtained by replacing t_0 by 0 in (1.4), and let c_3(0) = g_0(0). Our main result is stated in the following theorem:

Theorem 4.1 Let α ∈ (0, 1/3).
1. If α ∈ (1/5, 1/3) and t > 0, then the joint limit of the suitably rescaled g̃_n(tn^{-α}) and g̃_n'(tn^{-α}) is expressed in terms of the invelope H_0. 2. If α = 1/5 and t > 0, then it is expressed in terms of the invelope H_+ and the constants c_0(0) and c_1(0). To keep the manuscript to a reasonable length, we do not present the details of the proof of existence and almost sure uniqueness of the processes H_+ and H_0. However, we would like to mention that the proof can be constructed along the lines of Groeneboom, Jongbloed, and Wellner (2001a), by defining a stochastic Least Squares problem in the class of convex functions on finite intervals [0, c] and letting the intervals grow as c → ∞. For example, the invelope H_+ can be shown to be the limit, in an appropriate sense, of the processes H_{+,c} as c → ∞. Here, H_{+,c} is equal to the second integral of the unique minimizer of the corresponding stochastic Least Squares criterion over the class of convex functions g on [0, c] such that g(0) = 0 and g(c) = 12c², and H_{+,c} satisfies four boundary conditions that are necessary and sufficient for this characterization. H_{+,c} is an approximation of H_+, and we can compute it using the Haar construction (see Rogers and Williams (1994)) and the iterative cubic spline algorithm described in Groeneboom, Jongbloed, and Wellner (2003); see Figure 1.

Corollary 4.1 The fastest rates of convergence of g̃_n(n^{-α}) and g̃_n'(n^{-α}) to the true values g_0(0) and g_0'(0) are attained for α = 1/5. Furthermore, if we define the estimator

  ĝ_n(t) = g̃_n(n^{-1/5}) + (t − n^{-1/5}) g̃_n'(n^{-1/5}) for t ∈ [0, n^{-1/5}], and ĝ_n(t) = g̃_n(t) for t > n^{-1/5},

then ĝ_n(0) and ĝ_n'(0) attain these rates. We will refer to this estimator as the modified LSE estimator.

Simulation results
We simulated n = 500 independent random variables from the standard exponential distribution Exp(1) and computed g̃_n using the support reduction algorithm, as described by Groeneboom, Jongbloed, and Wellner (2003) for Least Squares problems. Inconsistency of the estimator and its first derivative at 0 can be clearly seen in the left panels of Figure 2, whereas the right panels illustrate consistency of the modified estimator and its first derivative.
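The modified LSE is simple to implement once g̃_n is available as a piecewise linear function. The sketch below (function names and the toy knot representation are ours; any convex, nonincreasing piecewise linear g̃_n would do) extrapolates tangentially from x_n = n^{-1/5} to the left:

```python
import numpy as np

def eval_pl(knots, vals, t):
    # evaluate the piecewise linear function with the given knots/values
    return float(np.interp(t, knots, vals))

def slope_at(knots, vals, t):
    # slope of the linear piece containing t (right slope at a knot)
    j = min(np.searchsorted(knots, t, side="right"), len(knots) - 1)
    return (vals[j] - vals[j - 1]) / (knots[j] - knots[j - 1])

def modified_lse(knots, vals, n, t):
    # tangential extrapolation of the LSE from x_n = n^(-1/5) to the left
    xn = n ** (-1.0 / 5.0)
    if t >= xn:
        return eval_pl(knots, vals, t)
    return eval_pl(knots, vals, xn) + (t - xn) * slope_at(knots, vals, xn)

# toy convex, nonincreasing piecewise linear stand-in for the LSE
knots = np.array([0.0, 0.5, 1.0, 3.0])
vals = np.array([2.0, 1.2, 0.7, 0.1])
```

Since the tangent line at x_n lies below the convex function g̃_n, the modified estimator never exceeds g̃_n on [0, x_n], which is the direction in which the original LSE misbehaves at the origin.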

Conclusions and some open questions
Based solely on the LSE g̃_n of a convex density g_0 on (0, ∞), with g_0'' continuous to the right of 0 and g_0''(0) > 0, we found that consistent estimation of g_0'(0) is achieved by taking g̃_n'(n^{-α}) with α ∈ (0, 1/3), and that α = 1/5 should be chosen as it yields the fastest rate n^{-1/5}. The limiting distribution involves a process H_+, which is the invelope of the Gaussian process Y_+(t) = ∫_0^t W(s) ds + t⁴, t ≥ 0. Our idea was inspired by the work of Kulikov and Lopuhaä (2006), who found that taking the Grenander estimator at n^{-α} with α ∈ (0, 1) ensures consistency in the monotone estimation problem, and that α = 1/3 yields the optimal rate n^{-1/3}. It is interesting to note that the penalization approach of Woodroofe and Sun (1993) rather forces the data to stay away from 0, at a distance ≫ n^{-1}: their estimator can be viewed as the Grenander estimator for the transformed data λ_n + γ̂_n X_j, where γ̂_n →_p 1 and λ_n is the penalization parameter, which must satisfy λ_n n → ∞, and hence λ_n ≫ n^{-1}. For the harder problem of estimating the slope g_0'(0), an alternative approach based on shifting the data (suggested to us by Jongbloed (2006)) would probably require an even bigger shift to the right of 0. A second approach is to penalize the derivative of the LSE; i.e., to minimize Φ_n(g) plus a penalty term with parameter λ_n over convex g. This gives rise to the following open questions: How big should λ_n be chosen to achieve consistency of the first derivative of the estimator? Would λ_n have the same order as the shift in the approach suggested by Jongbloed (2006)? Could we choose λ_n such that the estimator is a density? How do the rates of convergence and limiting distributions depend on λ_n?
In the monotone problem, the proofs of Kulikov and Lopuhaä (2006) make use of the so-called switching relationship, introduced for the first time by Groeneboom (1985) as a nice geometric interpretation of the Grenander estimator: if f̂_n(x) is the value of this estimator at a point x, and U_n(a) is the (last) location of the maximum of G_n(t) − at over [0, ∞), then f̂_n(x) ≥ a if and only if U_n(a) ≥ x. A similar relationship is still lacking in the convex problem, where the characterization of the estimator is at the level of its second integral. This makes the geometric interpretation of the LSE less obvious. However, this does not mean that one should not explore different ways of viewing this characterization, which might simplify many of the arguments used in the problem of adaptive estimation of a convex density.
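The switching relationship can be checked mechanically: the Grenander estimator is the left derivative of the least concave majorant (LCM) of G_n, and U_n(a) is a last argmax of G_n(t) − at. A small self-contained sketch (function names and the tiny deterministic dataset are ours, chosen so that the hull is easy to verify by hand):

```python
import numpy as np

def lcm_grenander(X):
    # least concave majorant of the points (0, 0), (X_(i), i/n);
    # returns hull vertices t and the (decreasing) slope on each (t_{k-1}, t_k]
    x = np.concatenate(([0.0], np.sort(X)))
    y = np.arange(len(x)) / float(len(x) - 1)
    hull = [0]
    for i in range(1, len(x)):
        # pop the last vertex while it lies on or below the chord to point i
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (y[b] - y[a]) * (x[i] - x[a]) <= (y[i] - y[a]) * (x[b] - x[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    t = x[hull]
    return t, np.diff(y[hull]) / np.diff(t)

def U_n(a, X):
    # last location of the maximum of G_n(t) - a*t over [0, infinity)
    x = np.concatenate(([0.0], np.sort(X)))
    y = np.arange(len(x)) / float(len(x) - 1)
    vals = y - a * x
    return x[len(vals) - 1 - int(np.argmax(vals[::-1]))]

X = np.array([0.5, 1.0, 3.0, 4.0])  # tiny deterministic sample
t, s = lcm_grenander(X)             # hull vertices 0, 1, 4; slopes 1/2, 1/6
```

Here f̂_n equals 1/2 on (0, 1] and 1/6 on (1, 4], so for a between the two slope levels, U_n(a) recovers exactly the point where f̂_n crosses level a. No analogue of this one-line correspondence is available for the convex LSE, whose characterization (2.6) lives at the level of the second integral H̃_n.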
Finally, we would like to note that our modified LSE estimator can be compared to the simple estimator of Kulikov and Lopuhaä (2006). Their adaptive estimator is more efficient, as it minimizes the mean squared error. In our convex problem, it follows from Theorem 4.1 that an adaptive estimator of g_0'(0) can be given by g̃_n'(n^{-1/5} k̂_2 t*), where t* is the minimizer of t ↦ E[H_+^{(3)}(t)]² and k̂_2 is a consistent estimator of k_2. We would like to investigate this in future work, as finding an approximate value for t* would require a more efficient way to generate a sufficient number of invelopes H_+ on a fine grid on (0, ∞).

Proof of Theorem 4.1
We prove Theorem 4.1 for α = 1/5, which yields the fastest rates of convergence.
For α ∈ (1/5, 1/3), the core of the proof remains exactly the same, except for minor changes that will be indicated at the end of this section. We will also give heuristic arguments for the claimed weak convergence for α ∈ (0, 1/5).

Proof of uniform tightness
Proposition 6.1 For 0 < M_1 < M_2, let τ_− be the last jump point of g̃_n' occurring before M_1 n^{-1/5}. Then,

  sup_{t ∈ [τ_−, M_2 n^{-1/5}]} n^{2/5} |g̃_n(t) − g_0(t)| = O_p(1)  and  sup_{t ∈ [τ_−, M_2 n^{-1/5}]} n^{1/5} |g̃_n'(t) − g_0'(t)| = O_p(1).

The following lemmas provide the necessary pieces that will go into the proof of Proposition 6.1.
Lemma 6.1 Let G_0 denote the true cumulative distribution function, and fix x > 0 and r > 0. Consider a VC-subgraph class of functions f_{x,y} defined on [x, y], x ≤ y ≤ x + r, satisfying |f_{x,y}| ≤ C (y − x)^k for some real constant C > 0 and an integer k that are independent of x. Then,

  sup_{x ≤ y ≤ x+r} | ∫_x^y f_{x,y}(t) d(G_n − G_0)(t) | = O_p(n^{-(3+k)/5}).

Proof. The argument is very similar to that of Kim and Pollard (1990) for proving their Lemma 4.1 (page 201), except that their grid mesh [(j−1)n^{-1/3}, jn^{-1/3}), 1 ≤ j ≤ ⌊R_0 n^{1/3}⌋, is replaced here by [(j−1)n^{-1/5}, jn^{-1/5}), 1 ≤ j ≤ ⌊rn^{1/5}⌋. To see intuitively why we obtain the power (3+k)/5 instead of their power 2/3, note that their Maximal inequality 3.1 (i) (see also Theorem 2.14.2 of van der Vaart and Wellner (1996)) implies that, for a fixed ε > 0, there exists some M > 0 independent of j and n such that the supremum over the j-th cell is bounded by M n^{-(3+k)/5} with probability greater than 1 − ε. Summation over 1 ≤ j ≤ ⌊rn^{1/5}⌋ yields the result. See also Groeneboom, Jongbloed, and Wellner (2001b), page 1677. □

Lemma 6.2 For arbitrary 0 ≤ K_1 ≤ K_2 < K_3 ≤ K_4, consider the event T_{K_1,K_2,K_3,K_4} that there exist jump points τ_− ∈ [K_1 n^{-1/5}, K_2 n^{-1/5}] and τ_+ ∈ [K_3 n^{-1/5}, K_4 n^{-1/5}] of g̃_n'. Then, for all ε > 0, there exists c > 0 such that

  P( inf_{t ∈ [τ_−, τ_+]} |g̃_n(t) − g_0(t)| > c n^{-2/5}, T_{K_1,K_2,K_3,K_4} ) < ε.

Proof. Suppose that both the events T_{K_1,K_2,K_3,K_4} and inf_{t ∈ [τ_−, τ_+]} |g̃_n(t) − g_0(t)| > c n^{-2/5} occur. Using the identity (3.8) of Section 3 and Lemma 6.1 with F_{x,r} = {f_{x,y}(t) = y − t, x < y < x + r} (a VC-class with index ≤ 3 by Lemma 2.6.15 of van der Vaart and Wellner (1996)) and k = 1, the relevant empirical process term is O_p(n^{-4/5}). Thus, we can find c > 0 large enough such that P( inf_{t ∈ [τ_−, τ_+]} |g̃_n(t) − g_0(t)| > c n^{-2/5}, T_{K_1,K_2,K_3,K_4} ) < ε, which implies the claimed result. □

Lemma 6.3 Let ε > 0. For any K_1 ≥ 0, there exists K_2 ≥ K_1 such that, for n large enough, the event T_{K_1,K_2} = {∃ a jump point τ : K_1 n^{-1/5} ≤ τ ≤ K_2 n^{-1/5}} occurs with probability greater than 1 − ε. In particular, this implies that, for a given K_1 ≥ 0, there exist K_2 and K_4 such that K_4 ≥ K_3 > K_2 ≥ K_1 and P(T_{K_1,K_2,K_3,K_4}) > 1 − ε.

Proof.
Given K_1 ≥ 0, let τ_1 and τ_2 be the last and first jump points occurring before and after K_1 n^{-1/5}, respectively (τ_1 can be equal to 0). Note that, for r_0 > 0 small enough such that g_0'' > 0 on (0, r_0], the event τ_2 < r_0 occurs with probability tending to 1, since we know that there exists a jump point τ < r_0/2 such that r_0/2 − τ = O_p(n^{-1/5}) with probability tending to 1 (by Lemma 4.2 of Groeneboom, Jongbloed, and Wellner (2001b)). Using now the inequality (3.11) in Section 3 and Lemma 6.1 (with k = 0), we find that τ_2 − τ_1 = O_p(n^{-1/5}). Note that this is the same upper bound obtained by Groeneboom, Jongbloed, and Wellner (2001b) for the distance between jump points in the neighborhood of an interior point t_0 > 0, as we have already mentioned in Subsection 3.2. Thus, there exists M > 0 such that τ_2 < τ_1 + M n^{-1/5} ≤ (K_1 + M) n^{-1/5} with large probability, and hence we can take K_2 = K_1 + M. To show the second assertion of the lemma it suffices, for any K_3 > K_2, to consider τ' the first jump point after K_3 n^{-1/5}. Then there exists K_4 ≥ K_3 such that the probability of the event n^{-1/5} K_3 ≤ τ' ≤ n^{-1/5} K_4 is greater than 1 − ε. □

Proof of Proposition 6.1. In all that follows, we denote B_u = sup_{t ∈ (0, r_0]} g_0''(t), where (0, r_0] is the biggest neighborhood of 0 on which g_0'' is positive and continuous. For convenience, we assume without loss of generality that M_1 = 6/7 and M_2 = 1; the same arguments can be used for any other values 0 < M_1 < M_2, as long as they do not depend on n. Let τ_− be the last jump point occurring before (6/7) n^{-1/5}, and fix ε > 0.

A. Uniform tightness of n^{1/5}(g̃_n' − g_0') from below: We show in the following that there exists c > 0 such that, with probability greater than 1 − ε, n^{1/5}(g̃_n'(t) − g_0'(t)) ≥ −c uniformly on the relevant interval, for n large enough. If we divide [0, n^{-1/5}] into seven equally sized subintervals, then there are only two cases.
We recall again that τ_− is the last jump point occurring before 5n^{-1/5}/7.
Remark. The class D is a VC-class because the class of sets between graphs is a VC-class (van der Vaart and Wellner (1996), Section 2.6, Problem 11). The latter holds since D = D_1 ∩ D_2, where D_1 and D_2 are VC-classes. This follows from the fact that t ↦ f_{x,y}(t) 1_{[x,s]}(t) and t ↦ f_{x,y}(t) 1_{(s,y]}(t) are polynomials of degree at most 1. Hence, D is a VC-class by van der Vaart and Wellner (1996), Lemma 2.6.17 (iii).
B. Uniform tightness of n^{1/5}(g̃_n' − g_0') from above: We now show that there exists c > 0 such that, with probability greater than 1 − ε, n^{1/5}(g̃_n'(t) − g_0'(t)) ≤ c uniformly on [τ_−, n^{-1/5}], for n large enough.
Case 2. As before, consider the case where one of the subintervals does not contain any jump point, and let τ_1 and τ_2 be the last and first jump points occurring before and after the endpoints of that subinterval. Using the inequality in (3.13), Lemma 6.1 and Lemma 6.3, combined with the fact that τ_2 − τ_1 > n^{-1/5}/7 (exactly as in A. Case 2), we can find c > 0 such that the desired bound holds; this is (6.28). Combining (6.27) and (6.28) gives the result.
D. Uniform tightness of n^{2/5}(g̃_n − g_0) from below: We show finally that there exists c > 0 such that, with probability greater than 1 − ε, n^{2/5}(g̃_n(t) − g_0(t)) ≥ −c uniformly on [τ_−, n^{-1/5}], for n large enough. With ξ_1 as above in C. Case 1, i.e., ξ_1 the minimizer of |g̃_n − g_0| on [τ_1^+, τ_2^+], where τ_1^+ is the first jump after n^{-1/5} and τ_2^+ is the first jump after τ_1^+ + n^{-1/5}, we have by convexity the bound in (6.30) with probability greater than 1 − ε. It suffices to take c = (1/2) B_u K² + 2c'. In (6.30), we used the fact that ξ_1 is bounded with increasing probability by M n^{-1/5} for some M > 0 large enough (this follows from Lemma 6.3), and the uniform n^{1/5}-tightness of g̃_n' established above. □
Hence, n^{-α} plays a role very similar to that of t_0 > 0 in the problem of estimating a convex density at an interior point. Thus, the asymptotics in this case can be heuristically deduced by replacing t_0 in (1.2) by n^{-α}, and the constants c_0(t_0) and c_1(t_0) by c_0(0) = lim_{n→∞} c_0(n^{-α}) and c_1(0) = lim_{n→∞} c_1(n^{-α}). □