Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
Proposal
Main paper can be found here.
- First-order method for linear programming that achieves high accuracy while remaining amenable to parallelization and scaling to large problems
- Algorithmic improvements to PDHG: adaptive restarting, dynamic primal-dual stepsize selection, presolving techniques, and diagonal preconditioning
- However, the combined algorithm proposed here has no convergence guarantee!
Summary
- Simplex and interior-point methods (IPMs) don't scale well to large problems due to their need for matrix factorization
- primal LP problem is stated as:
- $$\min_x \; c^\top x$$
- subject to:
- $$G x \geq h$$
- $$A x = b$$
- $$l \leq x \leq u$$
- where:
- $$x \in \mathbb{R}^n$$
- $$c \in \mathbb{R}^n$$
- $$G \in \mathbb{R}^{m_1 \times n}$$
- $$h \in \mathbb{R}^{m_1}$$
- $$A \in \mathbb{R}^{m_2 \times n}$$
- $$b \in \mathbb{R}^{m_2}$$
- $$l \in (\mathbb{R} \cup \{-\infty\})^n$$ (lower bounds may be infinite)
- $$u \in (\mathbb{R} \cup \{\infty\})^n$$ (upper bounds may be infinite)
- $$n$$ = number of variables
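A minimal sketch of how an instance in this form might be laid out with NumPy (the toy numbers are illustrative, not from the paper); `K` and `q` are the stacked quantities used by the dual and by PDHG below:

```python
import numpy as np

# Toy instance of: min c^T x  s.t.  G x >= h,  A x = b,  l <= x <= u
c = np.array([1.0, 2.0])          # objective coefficients, n = 2 variables
G = np.array([[1.0, 1.0]])        # inequality rows, m_1 = 1
h = np.array([1.0])
A = np.array([[1.0, -1.0]])       # equality rows, m_2 = 1
b = np.array([0.0])
l = np.array([0.0, 0.0])          # lower bounds (may be -inf)
u = np.array([np.inf, np.inf])    # upper bounds (may be +inf)

K = np.vstack([G, A])             # K stacks G on top of A
q = np.concatenate([h, b])        # q stacks h on top of b
```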
- dual LP problem is then stated as:
- $$\max_y \; q^\top y + l^\top \lambda^+ - u^\top \lambda^-$$
- subject to:
- $$c - K^\top y = \lambda$$
- $$y_{1:m_1} \geq 0$$
- $$\lambda \in \Gamma$$
- $$K = \begin{pmatrix} G \\ A \end{pmatrix}$$
- $$q = \begin{pmatrix} h \\ b \end{pmatrix}$$
- where:
- $$y \in \mathbb{R}^{m_1 + m_2}$$
- $$\lambda \in \mathbb{R}^n$$
- $$\Gamma = \prod_{i=1}^n \Gamma_i$$
- $$\Gamma_i = \{0\}$$ if $$l_i = -\infty, u_i = \infty$$
- $$\Gamma_i = \mathbb{R}^-$$ if $$l_i = -\infty, u_i \in \mathbb{R}$$
- $$\Gamma_i = \mathbb{R}^+$$ if $$l_i \in \mathbb{R}, u_i = \infty$$
- $$\Gamma_i = \mathbb{R}$$ otherwise
- $$\lambda^+ = \max(0, \lambda)$$ (elementwise positive part)
- $$\lambda^- = \max(0, -\lambda)$$ (elementwise negative part, so both parts are nonnegative)
- $$\mathbb{R}^-$$ = set of non-positive real numbers
- $$\mathbb{R}^+$$ = set of non-negative real numbers
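For intuition, a hedged sketch of the $$\Gamma$$ constraint: given a dual vector `y`, the reduced cost is $$\lambda = c - K^\top y$$, and each $$\lambda_i$$ must lie in the cone $$\Gamma_i$$ determined by which bounds are finite. The function name and structure here are mine, not the paper's:

```python
import numpy as np

def project_onto_gamma(lam, l, u):
    """Project reduced costs onto Gamma = prod_i Gamma_i, where Gamma_i is
    {0}, R^-, R^+, or R depending on which of l_i, u_i are finite."""
    both_inf = np.isinf(l) & np.isinf(u)          # Gamma_i = {0}
    upper_only = np.isinf(l) & np.isfinite(u)     # Gamma_i = R^- (non-positive)
    lower_only = np.isfinite(l) & np.isinf(u)     # Gamma_i = R^+ (non-negative)
    out = lam.copy()
    out[both_inf] = 0.0
    out[upper_only] = np.minimum(out[upper_only], 0.0)
    out[lower_only] = np.maximum(out[lower_only], 0.0)
    return out                                    # both finite: Gamma_i = R

# lam = c - K.T @ y   # reduced costs for a candidate dual vector y
```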
- this is represented as the saddle-point problem $$\min_{x \in X} \max_{y \in Y} L(x, y) = c^\top x - y^\top K x + q^\top y$$, where $$X = \{x : l \leq x \leq u\}$$ and $$Y = \{y \in \mathbb{R}^{m_1 + m_2} : y_{1:m_1} \geq 0\}$$
- PDHG solves this iteratively as:
- $$x^{k+1} = \text{proj}_X(x^k - \tau (c - K^\top y^k))$$
- $$y^{k+1} = \text{proj}_Y(y^k + \sigma (q - K (2x^{k+1} - x^k)))$$
- where:
- $$\tau = \frac{\eta}{\omega}$$ = primal stepsize
- $$\sigma = \eta \omega$$ = dual stepsize
- $$\eta \in (0, \infty)$$ = stepsize
- $$\omega \in (0, \infty)$$ = primal weight
- PDHG is known to converge for all $$\eta < \frac{1}{\|K\|_2}$$
- $$\|K\|_2$$ = matrix spectral norm (largest singular value), estimated via power iteration
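A minimal sketch of one plain (non-adaptive) PDHG iteration and a power-iteration estimate of $$\|K\|_2$$, usable with the toy data above; the function names are mine:

```python
import numpy as np

def spectral_norm(K, iters=100, seed=0):
    """Estimate ||K||_2 via power iteration on K^T K."""
    v = np.random.default_rng(seed).standard_normal(K.shape[1])
    for _ in range(iters):
        w = K.T @ (K @ v)                 # one application of K^T K
        v = w / np.linalg.norm(w)
    return np.linalg.norm(K @ v)          # ~ largest singular value

def pdhg_step(x, y, c, q, K, l, u, m1, eta, omega):
    """One PDHG iteration for the saddle-point form above."""
    tau, sigma = eta / omega, eta * omega
    x_new = np.clip(x - tau * (c - K.T @ y), l, u)   # proj onto X = [l, u]
    y_new = y + sigma * (q - K @ (2 * x_new - x))
    y_new[:m1] = np.maximum(y_new[:m1], 0.0)         # proj onto Y: y_{1:m1} >= 0
    return x_new, y_new
```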
PDLP Algorithm
- $$z^{0,0}$$ = initial solution
- $$n = 0$$ = outer loop counter
- $$k = 0$$ = total iteration counter
- $$\eta_c^{0,0} = \frac{1}{\|K\|_\infty}$$
- $$\omega^0 = initializePrimalWeight()$$
- repeat until termination
- $$t = 0$$
- repeat until restart or termination (this check is done once every 40 iterations to reduce overhead)
- $$z^{n,t+1}, \eta^{n,t+1}, \eta_c^{n,t+1} = adaptivePDHGstep(z^{n,t}, \omega^n, \eta_c^{n,t}, k)$$
- $$z'^{n,t+1} = \frac{1}{\sum_{i=1}^{t+1} \eta^{n,i}} \sum_{i=1}^{t+1} \eta^{n,i} z^{n,i}$$ = stepsize-weighted average of the inner-loop iterates (see the incremental sketch after this algorithm)
- $$z_c^{n,t+1} = restartCandidate(z^{n,t+1}, z'^{n,t+1}, z^{n,0})$$
- $$t = t + 1$$
- $$k = k + 1$$
- $$z^{n+1,0} = z_c^{n,t}$$
- $$n = n + 1$$
- $$\omega^n = primalWeightUpdate(z^{n,0}, z^{n-1,0}, \omega^{n-1})$$
- output = $$z^{n,0}$$
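The stepsize-weighted average $$z'^{n,t+1}$$ in the inner loop need not be recomputed from scratch each iteration; a small sketch of an incremental update (my formulation, algebraically equivalent to the sum above):

```python
def update_weighted_average(z_avg, eta_sum, z_new, eta_new):
    """Incrementally maintain z' = (sum_i eta_i * z_i) / (sum_i eta_i)
    without storing past iterates; z_avg and z_new are NumPy arrays."""
    eta_sum_new = eta_sum + eta_new
    z_avg_new = (eta_sum * z_avg + eta_new * z_new) / eta_sum_new
    return z_avg_new, eta_sum_new
```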
adaptivePDHGstep
- $$(x, y) = z^{n,t}$$
- $$\eta = \eta_c^{n,t}$$
- while not done
- $$x' = \text{proj}_X(x - \frac{\eta}{\omega^n} (c - K^\top y))$$
- $$y' = \text{proj}_Y(y + \eta \omega^n (q - K (2x' - x)))$$
- $$\eta_b = \frac{\|(x' - x, y' - y)\|_{\omega^n}^2}{2 (y' - y)^\top K (x' - x)}$$ = the largest stepsize for which the step just taken would still be valid
- $$\|z\|_{\omega} = \sqrt{\omega \|x\|_2^2 + \frac{\|y\|_2^2}{\omega}}$$ for $$z = (x, y)$$
- $$\eta' = \min\left((1 - (k + 1)^{-0.3})\,\eta_b, \; (1 + (k + 1)^{-0.6})\,\eta\right)$$
- return $$((x', y'), \eta, \eta')$$ if $$\eta \leq \eta_b$$ (step accepted; $$\eta'$$ is the next candidate stepsize)
- else set $$\eta = \eta'$$ and retry with the smaller stepsize
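A hedged sketch of the adaptive step; the `abs` guard on the denominator and all names are my additions:

```python
import numpy as np

def adaptive_pdhg_step(x, y, c, q, K, l, u, m1, omega, eta, k):
    """Try a PDHG step; accept it only once eta is below the bound eta_b
    implied by the resulting iterate, shrinking eta otherwise."""
    while True:
        x_new = np.clip(x - (eta / omega) * (c - K.T @ y), l, u)
        y_new = y + eta * omega * (q - K @ (2 * x_new - x))
        y_new[:m1] = np.maximum(y_new[:m1], 0.0)
        dx, dy = x_new - x, y_new - y
        norm_sq = omega * dx @ dx + (dy @ dy) / omega   # ||(dx, dy)||_omega^2
        denom = 2.0 * abs(dy @ (K @ dx))                # abs: guard the sign
        eta_b = norm_sq / denom if denom > 0 else np.inf
        eta_next = min((1 - (k + 1) ** -0.3) * eta_b,
                       (1 + (k + 1) ** -0.6) * eta)
        if eta <= eta_b:
            return (x_new, y_new), eta, eta_next        # step accepted
        eta = eta_next                                  # retry with smaller eta
```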
restartCandidate
- normalized duality gap at $$z = (x, y)$$ for a radius $$r$$: $$\rho_r^n(z) = \frac{1}{r} \max_{\hat{z} = (\hat{x}, \hat{y}) : \|\hat{z} - z\|_{\omega^n} \leq r} \left( L(x, \hat{y}) - L(\hat{x}, y) \right)$$
- evaluated at the radius to a reference point $$z_{ref}$$, write this as $$\mu_n(z, z_{ref}) = \rho^n_{\|z - z_{ref}\|_{\omega^n}}(z)$$
- then the restart candidate is (see the sketch after this list)
- if $$\mu_n(z^{n,t+1}, z^{n,0}) < \mu_n(z'^{n,t+1}, z^{n,0})$$
- $$z^{n,t+1}$$
- else
- $$z'^{n,t+1}$$
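The selection itself is simple once the normalized duality gap can be evaluated; a sketch, with $$\mu_n$$ left abstract as a passed-in function `mu` (computing it involves a maximization over a ball and is non-trivial):

```python
def restart_candidate(z_current, z_avg, z_start, mu):
    """Return whichever of the current iterate and the running average has
    the smaller normalized duality gap relative to the loop's start point.
    `mu(z, z_ref)` is assumed to evaluate mu_n; its computation is omitted."""
    if mu(z_current, z_start) < mu(z_avg, z_start):
        return z_current
    return z_avg
```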
restart criteria
- $$\beta_{sufficient} \in (0, 1)$$
- $$\beta_{necessary} \in (0, \beta_{sufficient})$$
- $$\beta_{artificial} \in (0, 1)$$
- the inner loop restarts if any one of the conditions below holds
- sufficient decay in the normalized duality gap
- $$\mu_n(z_c^{n,t+1}, z^{n,0}) \leq \beta_{sufficient} \, \mu_n(z^{n,0}, z^{n-1,0})$$
- necessary decay + no progress in the normalized duality gap
- $$\mu_n(z_c^{n,t+1}, z^{n,0}) \leq \beta_{necessary} \, \mu_n(z^{n,0}, z^{n-1,0})$$
- and
- $$\mu_n(z_c^{n,t+1}, z^{n,0}) > \mu_n(z_c^{n,t}, z^{n,0})$$
- long inner loop: $$t \geq \beta_{artificial} \, k$$
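A sketch of the combined restart check; the beta values are illustrative placeholders, not the paper's tuned defaults:

```python
BETA_SUFFICIENT = 0.8   # in (0, 1); illustrative
BETA_NECESSARY = 0.2    # in (0, BETA_SUFFICIENT); illustrative
BETA_ARTIFICIAL = 0.5   # in (0, 1); illustrative

def should_restart(mu_cand, mu_cand_prev, mu_last_restart, t, k):
    """mu_cand         = mu_n(z_c^{n,t+1}, z^{n,0})
       mu_cand_prev    = mu_n(z_c^{n,t},   z^{n,0})
       mu_last_restart = mu_n(z^{n,0},     z^{n-1,0})"""
    sufficient = mu_cand <= BETA_SUFFICIENT * mu_last_restart
    necessary = (mu_cand <= BETA_NECESSARY * mu_last_restart
                 and mu_cand > mu_cand_prev)        # decay but no progress
    artificial = t >= BETA_ARTIFICIAL * k           # inner loop too long
    return sufficient or necessary or artificial
```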
initializePrimalWeight
- $$\omega^0 = \frac{\|c\|_2}{\|q\|_2}$$ if both numerator and denominator are $$> \epsilon_{zero}$$
- $$\omega^0 = 1$$ otherwise
- $$\epsilon_{zero}$$ = a small nonzero tolerance
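A direct transcription as a sketch; the tolerance value is an illustrative choice:

```python
import numpy as np

EPS_ZERO = 1e-10  # illustrative choice of the small nonzero tolerance

def initialize_primal_weight(c, q):
    """omega^0 = ||c||_2 / ||q||_2 when both norms are meaningfully nonzero."""
    norm_c, norm_q = np.linalg.norm(c), np.linalg.norm(q)
    if norm_c > EPS_ZERO and norm_q > EPS_ZERO:
        return norm_c / norm_q
    return 1.0
```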
primalWeightUpdate
- $$\Delta_x^n = \|x^{n,0} - x^{n-1,0}\|_2$$
- $$\Delta_y^n = \|y^{n,0} - y^{n-1,0}\|_2$$
- if $$\Delta_x^n > \epsilon_{zero}$$ and $$\Delta_y^n > \epsilon_{zero}$$
- return $$\exp\left(\theta \log\left(\frac{\Delta_y^n}{\Delta_x^n}\right) + (1 - \theta)\log(\omega^{n-1})\right)$$
- otherwise return $$\omega^{n-1}$$
- $$\theta \in [0,1]$$ = exponential smoothing parameter (default = 0.5)
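And the corresponding update, which smooths $$\log \omega$$ toward $$\log(\Delta_y^n / \Delta_x^n)$$ (sketch):

```python
import numpy as np

EPS_ZERO = 1e-10  # same illustrative tolerance as above

def primal_weight_update(x_new, x_prev, y_new, y_prev, omega_prev, theta=0.5):
    """Exponential smoothing of the primal weight in log space."""
    delta_x = np.linalg.norm(x_new - x_prev)
    delta_y = np.linalg.norm(y_new - y_prev)
    if delta_x > EPS_ZERO and delta_y > EPS_ZERO:
        return np.exp(theta * np.log(delta_y / delta_x)
                      + (1 - theta) * np.log(omega_prev))
    return omega_prev
```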
Presolve
- detect inconsistent bounds
- delete empty rows/columns of $$K$$
- remove variables with equal lower and upper bounds
- tighten bounds
- remove duplicate rows of $$K$$
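For a flavor of what such reductions look like, a sketch of one of them, substituting out variables with equal bounds (an illustrative simplification, not the paper's implementation):

```python
import numpy as np

def remove_fixed_variables(K, q, c, l, u):
    """Substitute out variables with l_i = u_i: fold their contribution into
    the right-hand side and drop the corresponding columns of K."""
    fixed = np.isfinite(l) & (l == u)
    q_new = q - K[:, fixed] @ l[fixed]     # constraints lose the fixed terms
    obj_offset = c[fixed] @ l[fixed]       # constant dropped from min c^T x
    keep = ~fixed
    return K[:, keep], q_new, c[keep], l[keep], u[keep], obj_offset
```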
diagonal preconditioning
- helps improve convergence
- idea is to rescale $$K$$ as $$\tilde{K} = D_1^{-1} K D_2^{-1}$$ with $$D_1$$ and $$D_2$$ diagonal matrices, i.e., rows and columns of $$K$$ are divided by the factors below
- Ruiz scaling
- iterative in nature: repeated application drives every row and column of $$\tilde{K}$$ toward unit infinity norm
- $$D_1(i, i) = \sqrt{\|K_{i,\cdot}\|_\infty}$$
- $$D_2(j, j) = \sqrt{\|K_{\cdot,j}\|_\infty}$$
- Pock-Chambolle
- $$\alpha \in [0, 2]$$ = parameter (with $$\alpha = 1$$, both factors below reduce to 1-norms)
- $$D_1(i, i) = \sqrt{\sum_j |K_{ij}|^\alpha}$$
- $$D_2(j, j) = \sqrt{\sum_i |K_{ij}|^{2-\alpha}}$$
- PDLP applies 10 iterations of Ruiz scaling and then applies Pock-Chambolle scaling to the result.
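A hedged sketch of both rescalings in dense NumPy for clarity (the exponent placement follows the Pock-Chambolle lemma as I read it; with $$\alpha = 1$$ both reduce to 1-norms):

```python
import numpy as np

def ruiz_scale(K, iters=10):
    """Iteratively divide each row/column by the square root of its inf-norm;
    returns (d1, K_scaled, d2) with K_scaled = diag(1/d1) @ K @ diag(1/d2)."""
    d1, d2 = np.ones(K.shape[0]), np.ones(K.shape[1])
    Ks = K.astype(float).copy()
    for _ in range(iters):
        r = np.sqrt(np.max(np.abs(Ks), axis=1)); r[r == 0] = 1.0
        s = np.sqrt(np.max(np.abs(Ks), axis=0)); s[s == 0] = 1.0
        Ks = Ks / r[:, None] / s[None, :]
        d1 *= r; d2 *= s              # accumulate the division factors
    return d1, Ks, d2

def pock_chambolle_scale(K, alpha=1.0):
    """Divide row i by sqrt(sum_j |K_ij|^alpha) and column j by
    sqrt(sum_i |K_ij|^(2 - alpha))."""
    r = np.sqrt(np.sum(np.abs(K) ** alpha, axis=1)); r[r == 0] = 1.0
    s = np.sqrt(np.sum(np.abs(K) ** (2 - alpha), axis=0)); s[s == 0] = 1.0
    return r, K / r[:, None] / s[None, :], s

# PDLP's combined scheme: 10 Ruiz iterations, then one Pock-Chambolle pass.
d1, K1, d2 = ruiz_scale(np.array([[1.0, 100.0], [0.01, 1.0]]), iters=10)
r, K2, s = pock_chambolle_scale(K1)
```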