
A twice-differentiable function $f$ on $\mathbb{R}$ is convex iff $f''(x)\geq 0$ for all $x\in\mathbb{R}$. In this sense, "non-negative" and "monotonically increasing" are related to "convex" in the context of real functions, since they correspond to the conditions $f(x)\geq 0$ and $f'(x)\geq 0$, respectively.

In the context of a dataset $\{(x_1,y_1),\dots,(x_n,y_n)\}$, there are concepts that describe a trend of non-negativity or of increasing behaviour: one can characterize/measure them with $\mathbb{E}(Y)\geq 0$ and $Cov(X,Y)\geq 0$, respectively. Is there a related measure or definition that characterizes datasets with a trend of increasing slope?

I would be especially happy about some literature recommendation.

2 Answers


Let $A_k=(x_k,y_k)$.

If the $x_k$ are assumed to form an increasing sequence $x_1 < x_2 < \cdots < x_n$, the criterion is:

$$\text{for any} \ k=2,3, \cdots n-1,$$ $$\det \pmatrix{ x_{k-1}&x_k&x_{k+1}\\y_{k-1}&y_k&y_{k+1}\\1&1&1} \ge 0 \tag{1}$$

because this determinant is twice the signed area of triangle $A_{k-1}A_kA_{k+1}$. (Convexity amounts to saying that this triangle is positively oriented.)
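As a quick numeric check of condition (1), here is a sketch assuming NumPy (the function name `convexity_dets` is chosen here just for illustration):

```python
import numpy as np

def convexity_dets(x, y):
    """Signed-area determinants (1) for each interior point.

    All values >= 0  <=>  the polyline through the (x_k, y_k) is convex
    (assuming the x_k are strictly increasing).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    dets = []
    for k in range(1, len(x) - 1):
        m = np.array([[x[k-1], x[k], x[k+1]],
                      [y[k-1], y[k], y[k+1]],
                      [1.0,    1.0,  1.0   ]])
        dets.append(np.linalg.det(m))
    return np.array(dets)

x = np.array([0., 1., 2., 3.])
y = x**2                       # a convex sample
print(convexity_dets(x, y))    # all entries non-negative: convex
```

Each determinant equals $(x_k-x_{k-1})(y_{k+1}-y_{k-1})-(x_{k+1}-x_{k-1})(y_k-y_{k-1})$, so the check costs $O(n)$ overall.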

Edit : following a comment by Isidor, condition (1) above is equivalent to the fact that the difference between successive slopes is non-negative:

$$\frac{y_{k+1}-y_{k}} {x_{k+1}-x_{k}}-\frac{y_{k}-y_{k-1}} {x_{k}-x_{k-1}} \ge 0 \tag{2}$$

In the case where the "step" $s$ between successive $x_k$'s is constant, (2) can be rewritten as:

$$y_{k+1}-2y_{k}+y_{k-1} \ge 0$$

where we recognize on the LHS the classical expression for the discrete second derivative (see the first sentence in the question).
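On an equidistant grid, the check therefore reduces to the sign of the second differences; a minimal NumPy sketch (`np.diff(y, n=2)` computes exactly $y_{k+1}-2y_k+y_{k-1}$):

```python
import numpy as np

y = np.array([0., 1., 4., 9., 16.])  # y = x^2 sampled on an equidistant grid
second_diff = np.diff(y, n=2)        # y[k+1] - 2*y[k] + y[k-1]
print(second_diff)                   # [2. 2. 2.]
print((second_diff >= 0).all())      # True: the sample is discretely convex
```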

Jean Marie
  • See for example this answer – Jean Marie Nov 03 '24 at 22:09
  • Very cute :-). – copper.hat Nov 03 '24 at 22:40
  • Thank you Jean Marie. If I read this correctly, your proposed condition is true iff the slope between $x_{i}$ and $x_{i+1}$ is larger than the slope between $x_{i-1}$ and $x_{i}$ for all $i$. However, I am looking for a weaker condition that only tells about the statistical trend of the data and is robust to noise, just as the covariance reveals a trend of a positive/negative relation between $X$ and $Y$ without demanding that it holds true everywhere. – Isidor Konrad Maier Nov 04 '24 at 09:45
  • Two answers: 1) you may be interested in this question 2) In my answer I am just considering the sign of the determinant, but if you consider the value of the determinant (weighted by the extent $x_{i+1}-x_{i-1}$), you can, in a certain statistical sense, "accept" "slightly negative" values... – Jean Marie Nov 04 '24 at 10:25
  • I just added an "Edit" to my answer. – Jean Marie Nov 04 '24 at 13:03
0

If we measure convexity by the covariance between the $x_i$ and the slopes $\Delta_i:=\frac{y_{i+1}-y_i}{x_{i+1}-x_i}$, we obtain

$Cov(X,\Delta Y)=\frac{1}{(n-1)^2}\sum_{k=1}^{n-2}\left(\sum_{j=1}^k\sum_{i=k+1}^{n-1} (x_i-x_j)\right)(\Delta_{k+1}-\Delta_k),$

where each slope $\Delta_i$ is paired with its left endpoint $x_i$, so that both samples have $n-1$ entries.

I derived this from the following representation of the covariance as a linear combination of the $\Delta_k$:

$Cov(X,Y)=\frac{1}{n^2}\sum^{n-1}_{k=1}a_k\Delta_k$ with $a_k=(x_{k+1}-x_k)\sum^k_{j=1}\sum^n_{i=k+1}(x_i-x_j)$

If the $x_i$ are equidistant with common step $s$, then the terms $\sum^k_{j=1}\sum^n_{i=k+1}(x_i-x_j)$ are exactly $\frac{n\,k\,(n-k)}{2}\,s$. Having tested some very skewed distributions of the $x_i$, I believe the formula may be a good guess in general.

I am a bit skeptical about the measure $Cov(X,\Delta Y)$ because the coefficients of the sum decrease only quadratically towards the edges. I feel they should decrease cubically: they are already quadratic in the formula for $Cov(X,Y)$, and a convexity measure should really concentrate its weight on the values in the middle.
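For experimentation, the measure can be sketched numerically as follows (assuming NumPy; `slope_covariance` is a hypothetical name, and each slope $\Delta_i$ is paired with its left endpoint $x_i$ as above):

```python
import numpy as np

def slope_covariance(x, y):
    """Covariance between the left endpoints x_i and the slopes
    Delta_i = (y[i+1] - y[i]) / (x[i+1] - x[i]).
    A positive value indicates a statistical trend of increasing slope."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slopes = np.diff(y) / np.diff(x)
    return np.cov(x[:-1], slopes, bias=True)[0, 1]

x = np.linspace(0., 1., 50)
print(slope_covariance(x, x**2))    # positive: slopes increase (convex trend)
print(slope_covariance(x, -x**2))   # negative: slopes decrease (concave trend)
```

By construction this only asks for a trend: moderate noise on $y$ perturbs the individual slopes but typically leaves the sign of the statistic unchanged, in the same way $Cov(X,Y)>0$ tolerates local violations of monotonicity.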