Notebook

Convex Nonparametric Least Squares (`CNLS`): An alternative formulation¶

CNLS¶

Hildreth (1954) was the first to consider nonparametric regression subject to monotonicity and concavity constraints in the case of a single input variable $x$ (see also Hanson and Pledger 1976). Kuosmanen (2008) extended Hidreth’s approach to the multivariate setting with a vector-valued $\bf{x}$, and coined the term convex nonparametric least squares (CNLS) for this method. CNLS builds upon the assumption that the true but unknown production function f belongs to the set of continuous, monotonic increasing and globally concave functions, $F_2$, imposing exactly the same production axioms as standard DEA.

The CNLS estimator of function $f$ is obtained as the optimal solution to the infinite dimensional least squares problem

\begin{align} & \underset{f} min \sum_{i=1}^n(y_i-f(\bf{x}_i))^2 \\ & \text{s.t.} \\ & f \in F_2 \end{align}

CNLS estimator with the observed data points ($\mathbf{x}_i$, $y_i$)¶

\begin{align} & \underset{\alpha, \beta, \varepsilon} {min} \sum_{i=1}^n\varepsilon_i^2 \\ & \text{s.t.} \\ & y_i = \alpha_i + \beta_i^{'}X_i + \varepsilon_i \quad \forall i \\ & \alpha_i + \beta_i^{'}X_i \le \alpha_j + \beta_j^{'}X_i \quad \forall i, j\\ & \beta_i \ge 0 \quad \forall i \\ \end{align}

where $\alpha_i$ and $\beta_i$ define the intercept and slope parameters of tangent hyperplanes that characterize the estimated piece-wise linear frontier. $\varepsilon_i$ denotes the CNLS residuals.

Alternative formulation¶

To estimate the above CNLS model on Python, we have to resort to an alternative CNLS formulation.

\begin{align} & \underset{\hat{y}, \beta} {min} \frac{1}{2} \mid\mid y_i-\hat{y}_i\mid\mid_2^2 \\ & \text{s.t.} \\ & \hat{y}_i - \hat{y}_j \ge \beta_i^{'}(X_i -X_j) \quad \forall i, j\\ & \beta_i \ge 0 \quad \forall i \\ \end{align}

Proof 1: The constraints in the two formulations are equivalent.

Replace the CNLS residuals by: $\varepsilon_i = y_i − \alpha_i - X_i \beta_i^{'} = y_i - \hat{y}_i$
Left-hand side in second constraint: $\hat{y}_i$
Right-hand side in second constraint: $\hat{y}_j - X_j \beta_j^{'} + X_i \beta_j^{'} = \hat{y}_j + \beta_j^{'} (X_i-X_j)$

Therefore, the second constraint is redefined as:

\begin{align} & \alpha_i + \beta_iX_i \le \alpha_j + \beta_jX_i \quad \forall i, j\\ & \hat{y}_i \le \hat{y}_j + \beta_j^{'} (X_i-X_j) \\ & \hat{y}_j - \hat{y}_i \ge \beta_j^{'} (X_j-X_i) \\ & \hat{y}_i - \hat{y}_j \ge \beta_i^{'} (X_i-X_j) \quad \blacksquare \\ \end{align}

Proof 2: The objective function is a standard quadratic problem (QP).

The standard QP formulation: \begin{align} \text{min} f(x) = \frac{1}{2}x^TAx + q^Tx \end{align}

The objective function in the alternative CNLS formulation: \begin{align} & \text{min} \frac{1}{2} \mid\mid y-\hat{y}\mid\mid_2^2 \\ & = \frac{1}{2} (\hat{y} - y)^T(\hat{y} - y) \\ & = \frac{1}{2}(\hat{y}^T\hat{y} -y^T\hat{y} -\hat{y}^Ty+ y^Ty) \\ & = \frac{1}{2}\hat{y}^T\hat{y} - y^T\hat{y} + \frac{1}{2}y^Ty \\ \end{align}

since $\frac{1}{2}y^Ty$ is a contant term, it is sufficient to solve the QP-form objective function. Note that A is a identity matrix, q= -y. $\blacksquare$

Now, we can estimate the alternative CNLS model based on convex quadratic problem solver in Python, eg., CVXOPT.

References¶

[1] Johnson, A. L. and Kuosmanen, T. (2015) An Introduction to CNLS and StoNED Methods for Efficiency Analysis: Economic Insights and Computational Aspects, in Ray, S. C., Kumbhakar, S. C., and Dua, P. (eds) Benchmarking for Performance Evaluation: A Production Frontier Approach. Springer, pp. 117–186.

[2] Kuosmanen, T., Johnson, A. and Saastamoinen, A. (2015) Stochastic Nonparametric Approach to Efficiency Analysis: A unified Framework, in Zhu, J. (ed.) Data Envelopment Analysis. Springer, pp. 191–244.

Convex Nonparametric Least Squares (CNLS): An alternative formulation¶

CNLS¶

CNLS estimator with the observed data points ($\mathbf{x}_i$, $y_i$)¶

Alternative formulation¶

References¶

Convex Nonparametric Least Squares (`CNLS`): An alternative formulation¶