Quadratic programming is the problem of finding a vector x that minimizes a quadratic function, possibly subject to linear constraints:

$$\min_x \ \tfrac{1}{2}x^T H x + f^T x$$

such that A·x ≤ b, Aeq·x = beq, l ≤ x ≤ u.
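For concreteness, a minimal quadprog call on a small problem of this form might look as follows; the data is purely illustrative.

% Minimize (1/2)x'Hx + f'x subject to one linear inequality and lower bounds.
H = [2 0; 0 2];          % positive definite quadratic term
f = [-2; -5];            % linear term
A = [1 1];               % x1 + x2 <= 3
b = 3;
lb = [0; 0];             % x >= 0

opts = optimoptions('quadprog','Algorithm','interior-point-convex');
[x,fval,exitflag] = quadprog(H,f,A,b,[],[],lb,[],[],opts)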
interior-point-convex quadprog Algorithm

The interior-point-convex algorithm performs the following steps:

Presolve/Postsolve
The algorithm begins by attempting to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step include:
Check if any variables have equal upper and lower bounds. If so, check for feasibility, and then fix and remove the variables.
Check if any linear inequality constraint involves just one variable. If so, check for feasibility, and change the linear constraint to a bound.
Check if any linear equality constraint involves just one variable. If so, check for feasibility, and then fix and remove the variable.
Check if any linear constraint matrix has zero rows. If so, check for feasibility, and delete the rows.
Check if the bounds and linear constraints are consistent.
Check if any variables appear only as linear terms in the objective function and do not appear in any linear constraint. If so, check for feasibility and boundedness, and fix the variables at their appropriate bounds.
Change any linear inequality constraints to linear equality constraints by adding slack variables.
If the algorithm detects an infeasible or unbounded problem, it halts and issues an appropriate exit message.
The algorithm might arrive at a single feasible point, which represents the solution.
If the algorithm does not detect an infeasible or unbounded problem in the presolve step, it continues, if necessary, with the other steps. At the end, the algorithm reconstructs the original problem, undoing any presolve transformations. This final step is the postsolve step.
For details, see Gould and Toint [63].
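As an illustration of the first presolve task (not quadprog internals), the following sketch fixes and removes variables whose lower and upper bounds are equal for a bound-constrained problem, then scatters the solution back in a postsolve step. The variable names are assumptions.

% Remove variables pinned by equal bounds, solve the reduced problem,
% then restore the removed variables (postsolve).
fixed = (lb == ub);                     % variables with equal bounds
xfix  = lb(fixed);                      % their known values
free  = ~fixed;

Hff   = H(free, free);                  % reduced quadratic term
ffree = f(free) + H(free, fixed)*xfix;  % coupling terms become linear terms

xred = quadprog(Hff, ffree, [], [], [], [], lb(free), ub(free));

x = zeros(size(f));                     % postsolve: reassemble the full solution
x(fixed) = xfix;
x(free)  = xred;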
Generate Initial Point

The initial point x0 for the algorithm is:

Initialize x0 to ones(n,1), where n is the number of rows in H.

For components that have both an upper bound ub and a lower bound lb, if a component of x0 is not strictly inside the bounds, the component is set to (ub + lb)/2.

For components that have only one bound, modify the component if necessary to lie strictly inside the bound.
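The following sketch applies these rules directly. It is illustrative rather than quadprog internals, and the unit shift used for one-sided bounds is an assumption, since the text only requires moving the component strictly inside the bound.

% Construct the initial point from the rules above; lb and ub may contain +/-Inf.
n  = size(H, 1);
x0 = ones(n, 1);

both = isfinite(lb) & isfinite(ub);        % two-sided bounds
out  = both & ~(x0 > lb & x0 < ub);        % not strictly inside
x0(out) = (lb(out) + ub(out))/2;

onlyLow  = isfinite(lb) & ~isfinite(ub) & (x0 <= lb);
x0(onlyLow)  = lb(onlyLow) + 1;            % push strictly above the lower bound
onlyHigh = ~isfinite(lb) & isfinite(ub) & (x0 >= ub);
x0(onlyHigh) = ub(onlyHigh) - 1;           % push strictly below the upper bound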
Predictor-Corrector

Similar to the fmincon interior-point algorithm, the interior-point-convex algorithm tries to find a point where the Karush-Kuhn-Tucker (KKT) conditions hold. For the quadratic programming problem described in Quadratic Programming Definition, these conditions are:

$$\begin{aligned}
Hx + f - A_{eq}^T y - \bar{A}^T z &= 0\\
\bar{A}x + s &= \bar{b}\\
A_{eq}x &= b_{eq}\\
s_i z_i &= 0, \quad i = 1,2,\ldots,m\\
s \ge 0, \quad z &\ge 0.
\end{aligned}$$

Here

Ā is the extended linear inequality matrix that includes bounds written as linear inequalities. b̄ is the corresponding linear inequality vector, including bounds.
s is the vector of slacks that convert inequality constraints to equalities. s has length m, the number of linear inequalities and bounds.
z is the vector of Lagrange multipliers corresponding to s.
y is the vector of Lagrange multipliers associated with the equality constraints.
The algorithm first predicts a step from the Newton-Raphson formula, then computes a corrector step. The corrector attempts to better enforce the nonlinear constraint sizi = 0.
Definitions for the predictor step:

rd, the dual residual:

$$r_d = Hx + f - A_{eq}^T y - \bar{A}^T z.$$

req, the primal equality constraint residual:

$$r_{eq} = A_{eq}x - b_{eq}.$$

rineq, the primal inequality constraint residual, which includes bounds and slacks:

$$r_{ineq} = \bar{A}x + s - \bar{b}.$$

rsz, the complementarity residual:

rsz = Sz.

S is the diagonal matrix of slack terms, z is the column matrix of Lagrange multipliers.

rc, the average complementarity:

$$r_c = \frac{s^T z}{m}.$$
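Assuming variables x, s, y, z for the current iterate, and Abar, bbar for the extended inequality data defined above, the residuals can be computed as in this sketch (the variable names are assumptions):

rd    = H*x + f - Aeq'*y - Abar'*z;   % dual residual
req   = Aeq*x - beq;                  % primal equality residual
rineq = Abar*x + s - bbar;            % primal inequality residual (with slacks)
rsz   = s .* z;                       % complementarity residual, S*z elementwise
rc    = (s'*z)/numel(s);              % average complementarity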
In a Newton step, the changes in x, s, y, and z are given by solving a linearization of the KKT system:

$$\begin{bmatrix} H & 0 & -A_{eq}^T & -\bar{A}^T\\ A_{eq} & 0 & 0 & 0\\ \bar{A} & I & 0 & 0\\ 0 & Z & 0 & S \end{bmatrix}
\begin{bmatrix} \Delta x\\ \Delta s\\ \Delta y\\ \Delta z \end{bmatrix}
= -\begin{bmatrix} r_d\\ r_{eq}\\ r_{ineq}\\ r_{sz} \end{bmatrix},$$

where Z is the diagonal matrix of Lagrange multipliers z. However, a full Newton step might be infeasible, because of the positivity constraints on s and z. Therefore, quadprog shortens the step, if necessary, to maintain positivity.
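One common way to shorten the step is a ratio test like the following sketch, which assumes step components dx, ds, dy, dz, a single step length for all variables, and a fraction-to-the-boundary factor of 0.995; quadprog's actual safeguard may differ.

% Largest alpha in (0,1] keeping s + alpha*ds > 0 and z + alpha*dz > 0.
alphaS = min([1; -s(ds < 0) ./ ds(ds < 0)]);
alphaZ = min([1; -z(dz < 0) ./ dz(dz < 0)]);
alpha  = 0.995*min(alphaS, alphaZ);     % stay strictly in the interior

x = x + alpha*dx;
s = s + alpha*ds;
y = y + alpha*dy;
z = z + alpha*dz;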
Additionally, to maintain a "centered" position in the interior, instead of trying to solve sizi = 0, the algorithm takes a positive parameter σ, and tries to solve
sizi = σrc.
quadprog replaces rsz in the Newton step equation with rsz + ΔsΔz – σrc1, where 1 is the vector of ones. Also, quadprog reorders the Newton equations to obtain a symmetric, more numerically stable system for the predictor step calculation.
For details, see Mehrotra [47].
Multiple Corrections

After calculating the corrected Newton step, quadprog can perform more calculations to obtain a longer current step and to prepare for better subsequent steps. These multiple correction calculations can improve both performance and robustness. For details, see Gondzio [62].
Total Relative Error

quadprog calculates a merit function φ at every iteration. The merit function is a measure of feasibility, and is also called total relative error. quadprog stops if the merit function grows too large. In this case, quadprog declares the problem to be infeasible.
The merit function is related to the KKT conditions for the problem (see Predictor-Corrector). Use the following definitions:

The notation Ā and b̄ again means the linear inequality coefficients and right-hand side, augmented with terms to represent bounds. The vector of Lagrange multipliers for the linear inequality constraints, including bound constraints, was called z in Predictor-Corrector, and the vector of multipliers for the linear equality constraints was called y.
The merit function φ combines these residual and multiplier quantities into a single scaled measure of feasibility.
The quadprog iterative display includes a column showing the merit function under the heading Total relative error.
trust-region-reflective quadprog Algorithm

Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.
To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N. This is the trust-region subproblem,
$$\min_s \left\{ q(s),\ s \in N \right\}. \qquad \text{(9-1)}$$
The current point is updated to be x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.
The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.
In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to f at x; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically, the trust-region subproblem is typically stated
$$\min_s \left\{ \tfrac{1}{2} s^T H s + s^T g \ \text{ such that } \ \|Ds\| \le \Delta \right\}, \qquad \text{(9-2)}$$
where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ∥ . ∥ is the 2-norm. Good algorithms exist for solving Equation 9-2 (see [48]); such algorithms typically involve the computation of a full eigensystem and a Newton process applied to the secular equation

$$\frac{1}{\Delta} - \frac{1}{\|s\|} = 0.$$
Such algorithms provide an accurate solution to Equation 9-2. However, they require time proportional to several factorizations of H. Therefore, for large-scale problems a different approach is needed. Several approximation and heuristic strategies, based on Equation 9-2, have been proposed in the literature ([42] and [50]). The approximation approach followed in Optimization Toolbox solvers is to restrict the trust-region subproblem to a two-dimensional subspace S ([39] and [42]). Once the subspace S has been computed, the work to solve Equation 9-2 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.
The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process described below. The solver defines S as the linear space spanned by s1 and s2, where s1 is in the direction of the gradient g, and s2 is either an approximate Newton direction, i.e., a solution to
$$H \cdot s_2 = -g, \qquad \text{(9-3)}$$
or a direction of negative curvature,
$$s_2^T H s_2 < 0. \qquad \text{(9-4)}$$
The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).
A sketch of unconstrained minimization using trust-region ideas is now easy to give:
Formulate the two-dimensional trust-region subproblem.
Solve Equation 9-2 to determine the trial step s.
If f(x + s) < f(x), then x = x + s.
Adjust Δ.
These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f(x + s) ≥ f(x). See [46] and [49] for a discussion of this aspect.
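The following self-contained sketch follows these four steps on the Rosenbrock function. For brevity it solves the subproblem with the Cauchy (steepest descent) step rather than the two-dimensional subspace step the solvers use, and the update constants are illustrative.

% Basic trust-region iteration: model, trial step, accept/reject, adjust Delta.
fun  = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];

x = [-1.9; 2];  Delta = 1;
for iter = 1:200
    g = grad(x);  H = hess(x);
    if norm(g) < 1e-8, break, end

    % Cauchy step: minimize the quadratic model along -g within the region.
    gHg = g'*H*g;
    tau = 1;
    if gHg > 0
        tau = min(1, norm(g)^3/(Delta*gHg));
    end
    s = -tau*(Delta/norm(g))*g;

    predicted = -(g'*s + 0.5*s'*H*s);            % model decrease
    actual    = fun(x) - fun(x + s);             % true decrease
    if actual > 0
        x = x + s;                               % accept the trial point
    end
    if actual < 0.25*predicted
        Delta = Delta/4;                         % poor agreement: shrink N
    elseif actual > 0.75*predicted && abs(norm(s) - Delta) < 1e-12
        Delta = min(2*Delta, 1e3);               % good step at the boundary: grow N
    end
end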
Optimization Toolbox solvers treat a few important special cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.
The subspace trust-region method is used to determine a search direction. However, instead of restricting the step to (possibly) one reflection step, as in the nonlinear minimization case, a piecewise reflective line search is conducted at each iteration. See [45] for details of the line search.
A popular way to solve large symmetric positive definite systems of linear equations Hp = –g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C², where C⁻¹HC⁻¹ is a well-conditioned matrix or a matrix with clustered eigenvalues.
In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when a direction of negative (or zero) curvature is encountered, i.e., dᵀHd ≤ 0. The PCG output direction, p, is either a direction of negative curvature or an approximate (tol controls how approximate) solution to the Newton system Hp = –g. In either case p is used to help define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.
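The following sketch writes out a preconditioned conjugate gradient loop with the negative-curvature exit just described. It is illustrative rather than the toolbox implementation (MATLAB's built-in pcg targets positive definite systems), and the function name and stopping rule are assumptions.

function p = pcg_negcurv(H, M, g, tol, maxit)
% Approximately solve H*p = -g with preconditioner M, exiting early if a
% direction of negative (or zero) curvature is found.
p = zeros(size(g));
r = -g;                          % residual of H*p = -g at p = 0
y = M \ r;                       % preconditioned residual
d = y;
for k = 1:maxit
    Hd  = H*d;
    dHd = d'*Hd;
    if dHd <= 0                  % negative (or zero) curvature encountered
        p = d;                   % return that direction instead
        return
    end
    alpha = (r'*y)/dHd;
    p = p + alpha*d;
    rnew = r - alpha*Hd;
    if norm(rnew) <= tol*norm(g) % approximate solution of H*p = -g
        return
    end
    ynew = M \ rnew;
    beta = (rnew'*ynew)/(r'*y);
    d = ynew + beta*d;
    r = rnew;  y = ynew;
end
end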
Linear constraints complicate the situation described for unconstrained minimization. However, the underlying ideas described previously can be carried through in a clean and efficient way. The trust-region methods in Optimization Toolbox solvers generate strictly feasible iterates.
The general linear equality constrained minimization problem can be written
$$\min\left\{ f(x) \ \text{ such that } \ Ax = b \right\}, \qquad \text{(9-5)}$$
where A is an m-by-n matrix (m ≤ n). Some Optimization Toolbox solvers preprocess A to remove strict linear dependencies using a technique based on the LU factorization of AT [46]. Here A is assumed to be of rank m.
The method used to solve Equation 9-5 differs from the unconstrained approach in two significant ways. First, an initial feasible point x0 is computed, using a sparse least-squares step, so that Ax0 = b. Second, Algorithm PCG is replaced with Reduced Preconditioned Conjugate Gradients (RPCG), see [46], in order to compute an approximate reduced Newton step (or a direction of negative curvature in the null space of A). The key linear algebra step involves solving systems of the form
$$\begin{bmatrix} C & \tilde{A}^T\\ \tilde{A} & 0 \end{bmatrix}\begin{bmatrix} s\\ t \end{bmatrix} = \begin{bmatrix} r\\ 0 \end{bmatrix}, \qquad \text{(9-6)}$$

where Ã approximates A (small nonzeros of A are set to zero provided rank is not lost) and C is a sparse symmetric positive-definite approximation to H, i.e., C = H. See [46] for more details.
The box constrained problem is of the form
$$\min\left\{ f(x) \ \text{ such that } \ l \le x \le u \right\}, \qquad \text{(9-7)}$$
where l is a vector of lower bounds, and u is a vector of upper bounds. Some (or all) of the components of l can be equal to –∞ and some (or all) of the components of u can be equal to ∞. The method generates a sequence of strictly feasible points. Two techniques are used to maintain feasibility while achieving robust convergence behavior. First, a scaled modified Newton step replaces the unconstrained Newton step (to define the two-dimensional subspace S). Second, reflections are used to increase the step size.
The scaled modified Newton step arises from examining the Kuhn-Tucker necessary conditions for Equation 9-7,

$$(D(x))^{-2} g = 0, \qquad \text{(9-8)}$$

where

$$D(x) = \operatorname{diag}\left(|v(x)|^{-1/2}\right),$$
and the vector v(x) is defined below, for each 1 ≤ i ≤ n:
If gi < 0 and ui < ∞ then vi = xi – ui
If gi ≥ 0 and li > –∞ then vi = xi – li
If gi < 0 and ui = ∞ then vi = –1
If gi ≥ 0 and li = –∞ then vi = 1
The nonlinear system Equation 9-8 is not differentiable everywhere. Nondifferentiability occurs when vi = 0. You can avoid such points by maintaining strict feasibility, i.e., restricting l < x < u.
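A direct transcription of this case analysis, with assumed variable names and the scaling convention used in Equation 9-8 here, is:

% Compute v(x) and the scaling matrix D(x); g is the gradient at x,
% l and u are the bound vectors (possibly containing -Inf/Inf).
low  = g >= 0 & isfinite(l);
high = g <  0 & isfinite(u);
v = ones(numel(x), 1);                 % case g_i >= 0 and l_i = -Inf
v(low)  = x(low)  - l(low);
v(high) = x(high) - u(high);
v(g < 0 & ~isfinite(u)) = -1;
D = diag(abs(v).^(-1/2));              % D(x), as in Equation 9-8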
The scaled modified Newton step sk for the nonlinear system of equations given by Equation 9-8 is defined as the solution to the linear system

$$\hat{M} D s_k = -\hat{g} \qquad \text{(9-9)}$$

at the kth iteration, where

$$\hat{g} = D^{-1} g = \operatorname{diag}\left(|v|^{1/2}\right) g, \qquad \text{(9-10)}$$

and

$$\hat{M} = D^{-1} H D^{-1} + \operatorname{diag}(g)\, J^v. \qquad \text{(9-11)}$$
Here Jv plays the role of the Jacobian of |v|. Each diagonal component of the diagonal matrix Jv equals 0, –1, or 1. If all the components of l and u are finite, Jv = diag(sign(g)). At a point where gi = 0, vi might not be differentiable; the corresponding diagonal element of Jv is defined to be 0 at such a point. Nondifferentiability of this type is not a cause for concern because, for such a component, it is not significant which value vi takes. Further, |vi| will still be discontinuous at this point, but the function |vi|·gi is continuous.
Second, reflections are used to increase the step size. A (single) reflection step is defined as follows. Given a step p that intersects a bound constraint, consider the first bound constraint crossed by p; assume it is the ith bound constraint (either the ith upper or ith lower bound). Then the reflection step pR = p except in the ith component, where pRi = –pi.
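A sketch of a single reflection step, with assumed variable names (x is the current strictly feasible point, p the proposed step, l and u the bounds), is:

% Reflect the step p at the first bound constraint it crosses.
stepToLower = (l - x)./p;  stepToLower(p >= 0) = Inf;   % only p_i < 0 can hit l_i
stepToUpper = (u - x)./p;  stepToUpper(p <= 0) = Inf;   % only p_i > 0 can hit u_i
[tCross, i] = min(min(stepToLower, stepToUpper));       % first bound crossed

pR = p;
if tCross < 1            % the step actually intersects a bound
    pR(i) = -p(i);       % reflect the ith component
end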
active-set quadprog Algorithm

Recall the problem quadprog addresses:

$$\min_x \ \tfrac{1}{2} x^T H x + c^T x, \qquad \text{(9-12)}$$

such that A·x ≤ b, Aeq·x = beq, and l ≤ x ≤ u. m is the total number of linear constraints, the sum of the number of rows of A and of Aeq.
The quadprog active-set algorithm is an active-set strategy (also known as a projection method) similar to that of Gill et al., described in [18] and [17]. It has been modified for both Linear Programming (LP) and Quadratic Programming (QP) problems.
The solution procedure involves two phases. The first phase involves the calculation of a feasible point (if one exists). The second phase involves the generation of an iterative sequence of feasible points that converge to the solution.
In this method an active set matrix, Sk, is maintained that is an estimate of the active constraints (i.e., those that are on the constraint boundaries) at the solution point. Specifically, the active set Sk consists of the rows of Aeq, and a subset of the rows of A. Sk is updated at each iteration k, and is used to form a basis for a search direction dk. Equality constraints always remain in the active set Sk. The search direction dk is calculated and minimizes the objective function while remaining on active constraint boundaries. The feasible subspace for dk is formed from a basis Zk whose columns are orthogonal to the estimate of the active set Sk (i.e., SkZk = 0). Thus a search direction, which is formed from a linear summation of any combination of the columns of Zk, is guaranteed to remain on the boundaries of the active constraints.
The matrix Zk is formed from the last m – l columns of the QR decomposition of the matrix SkT (the transpose of the active set matrix Sk), where l is the number of active constraints and l < m. That is, Zk is given by

$$Z_k = Q[\,:,\ l+1:m\,], \qquad \text{(9-13)}$$

where

$$Q^T S_k^T = \begin{bmatrix} R\\ 0 \end{bmatrix}.$$
Once Zk is found, a search direction dk is sought that minimizes the objective function at dk, where dk is in the null space of the active constraints. That is, dk is a linear combination of the columns of Zk: dk = Zkp for some vector p.
Then if you view the quadratic objective function as a function of p, by substituting for dk, the result is
$$q(p) = \tfrac{1}{2}\, p^T Z_k^T H Z_k\, p + c^T Z_k\, p. \qquad \text{(9-14)}$$
Differentiating this with respect to p yields
$$\nabla q(p) = Z_k^T H Z_k\, p + Z_k^T c. \qquad \text{(9-15)}$$

∇q(p) is referred to as the projected gradient of the quadratic function because it is the gradient projected in the subspace defined by Zk. The term ZkTHZk is called the projected Hessian. Assuming the Hessian matrix H is positive definite, the minimum of the function q(p) in the subspace defined by Zk occurs when ∇q(p) = 0, which is the solution of the system of linear equations

$$Z_k^T H Z_k\, p = -Z_k^T c. \qquad \text{(9-16)}$$
The next step is
$$x_{k+1} = x_k + \alpha\, d_k, \quad \text{where } d_k = Z_k\, p. \qquad \text{(9-17)}$$
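A sketch of the null-space step for one iteration, using MATLAB's qr and the gradient gk = Hxk + c from Equation 9-22 (the variable names are assumptions, not quadprog internals):

% Sk holds the gradients of the active constraints as rows; xk is the
% current feasible iterate.
[Q, ~] = qr(Sk');                  % QR decomposition of Sk'
lact = size(Sk, 1);                % number of active constraints
Zk = Q(:, lact+1:end);             % null-space basis: Sk*Zk = 0

gk = H*xk + c;                     % objective gradient at xk
p  = (Zk'*H*Zk) \ (-Zk'*gk);       % projected system, as in Equation 9-16
dk = Zk*p;                         % search direction along the active constraints
xk1 = xk + dk;                     % unit step, if no constraint is violated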
At each iteration, because of the quadratic nature of the objective function, there are only two choices of step length α. A step of unity along dk is the exact step to the minimum of the function restricted to the null space of Sk. If such a step can be taken, without violation of the constraints, then this is the solution to QP (Equation 9-12). Otherwise, the step along dk to the nearest constraint is less than unity and a new constraint is included in the active set at the next iteration. The distance to the constraint boundaries in any direction dk is given by
$$\alpha = \min_{i \in \{1,\dots,m\}} \left\{ \frac{-\left(A_i x_k - b_i\right)}{A_i d_k} \right\}, \qquad \text{(9-18)}$$

which is defined for constraints not in the active set, and where the direction dk is towards the constraint boundary, i.e., Ai·dk > 0.
Lagrange multipliers, λk, are calculated that satisfy the nonsingular set of linear equations
$$S_k^T \lambda_k = g_k. \qquad \text{(9-19)}$$
If all elements of λk are positive, xk is the optimal solution of QP (Equation 9-12). However, if any component of λk is negative, and the component does not correspond to an equality constraint, then the corresponding element is deleted from the active set and a new iterate is sought.
The algorithm requires a feasible point to start. If the initial point is not feasible, then you can find a feasible point by solving the linear programming problem
$$\min_{x,\ \gamma}\ \gamma \quad \text{such that} \quad A_{eq} x = b_{eq}, \quad A_i x \le b_i + \gamma \ \text{ for each row } i \text{ of } A. \qquad \text{(9-20)}$$
The notation Ai indicates the ith row of the matrix A. You can find a feasible point (if one exists) to Equation 9-20 by setting x to a value that satisfies the equality constraints. You can determine this value by solving an under- or overdetermined set of linear equations formed from the set of equality constraints. If there is a solution to this problem, the slack variable γ is set to the maximum inequality constraint at this point.
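One way to set up this phase-1 problem is to append γ as an extra variable and call linprog; the following sketch assumes the original constraint data A, b, Aeq, beq, lb, ub, with Aeq nonempty.

% Minimize gamma subject to A*x - gamma <= b and Aeq*x = beq.
n  = size(A, 2);
mi = size(A, 1);                        % number of inequality rows
fl = [zeros(n,1); 1];                   % objective: gamma only

Aineq  = [A,   -ones(mi,1)];            % A*x - gamma <= b
Aequal = [Aeq,  zeros(size(Aeq,1),1)];  % Aeq*x = beq (gamma unused)

xg = linprog(fl, Aineq, b, Aequal, beq, [lb; -Inf], [ub; Inf]);
xfeas = xg(1:n);                        % feasible for the QP if xg(end) <= 0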
You can modify the preceding QP algorithm for LP problems by setting the search direction d to the steepest descent direction at each iteration, where gk is the gradient of the objective function (equal to the coefficients of the linear objective function):
$$d = -Z_k Z_k^T g_k. \qquad \text{(9-21)}$$
If a feasible point is found using the preceding LP method, the main QP phase is entered. The search direction dk is initialized with a search direction d1 found from solving the set of linear equations
$$H d_1 = -g_k, \qquad \text{(9-22)}$$
where gk is the gradient of the objective function at the current iterate xk (i.e., Hxk + c).