Proposal

Main paper can be found here.

  • First-order method for linear programming that reaches high accuracy while remaining amenable to parallelization and scaling to large problems
  • algorithmic improvements to PDHG: adaptive restarting, dynamic primal-dual step size selection, presolving techniques, and diagonal preconditioning
  • however, the algorithm with all of these enhancements has no convergence guarantee!

Summary

  • simplex and interior-point methods won't scale well to large-scale problems because they rely on matrix factorizations
  • primal LP problem is stated as:
    • $$\min_x c^T x$$
    • subject to:
      • $$G x \ge h$$
      • $$A x = b$$
      • $$l \le x \le u$$
    • where:
      • $$x \in \mathbb{R}^n$$
      • $$c \in \mathbb{R}^n$$
      • $$G \in \mathbb{R}^{m_1 \times n}$$
      • $$h \in \mathbb{R}^{m_1}$$
      • $$A \in \mathbb{R}^{m_2 \times n}$$
      • $$b \in \mathbb{R}^{m_2}$$
      • $$l \in (\mathbb{R} \cup \{-\infty\})^n$$
      • $$u \in (\mathbb{R} \cup \{\infty\})^n$$
      • $$n$$ = number of variables
  • dual LP problem is then stated as:
    • $$\max_y q^T y + l^T \lambda^+ - u^T \lambda^-$$
    • subject to:
      • $$c - K^T y = \lambda$$
      • $$y_{1:m_1} \ge 0$$
      • $$\lambda \in \Gamma$$
      • $$K^T = (G^T, A^T)$$, i.e. $$K$$ is $$G$$ stacked on top of $$A$$
      • $$q^T = (h^T, b^T)$$, stacked the same way
    • where:
      • $$y \in \mathbb{R}^{m_1 + m_2}$$
      • $$\lambda \in \mathbb{R}^n$$
      • $$\Gamma = \prod_{i=1}^n \Gamma_i$$
      • $$\Gamma_i = \{0\}$$ if $$l_i = -\infty$$, $$u_i = \infty$$
      • $$\Gamma_i = \mathbb{R}^-$$ if $$l_i = -\infty$$, $$u_i \in \mathbb{R}$$
      • $$\Gamma_i = \mathbb{R}^+$$ if $$l_i \in \mathbb{R}$$, $$u_i = \infty$$
      • $$\Gamma_i = \mathbb{R}$$ otherwise (both bounds finite)
      • $$\lambda^+ = \max(0, \lambda)$$ and $$\lambda^- = -\min(0, \lambda)$$, the elementwise positive and negative parts, so $$\lambda = \lambda^+ - \lambda^-$$
      • $$\mathbb{R}^-$$ = set of non-positive real numbers
      • $$\mathbb{R}^+$$ = set of non-negative real numbers
  • this is represented as the saddle-point problem $$\min_{x \in X} \max_{y \in Y} L(x, y) = c^T x - y^T K x + q^T y$$, where $$X = \{x \in \mathbb{R}^n : l \le x \le u\}$$ and $$Y = \{y \in \mathbb{R}^{m_1 + m_2} : y_{1:m_1} \ge 0\}$$ are the sets projected onto below
  • PDHG solves this iteratively as:
    • $$x^{k+1} = proj_X(x^k - \tau (c - K^T y^k))$$
    • $$y^{k+1} = proj_Y(y^k + \sigma (q - K (2x^{k+1} - x^k)))$$
    • where:
      • $$\tau = \frac{\eta}{\omega}$$ = primal stepsize
      • $$\sigma = \eta \omega$$ = dual stepsize
      • $$\eta \in (0, \infty)$$ = stepsize
      • $$\omega \in (0, \infty)$$ = primal weight
  • PDHG is known to converge for all values of $$\eta < \frac{1}{\|K\|_2}$$
  • $$\|K\|_2$$ = matrix spectral norm, estimated via power iteration (both the iteration and this estimate are sketched below)
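A minimal sketch of the iteration in NumPy, assuming dense arrays and the sets $$X$$ and $$Y$$ defined above (PDLP itself works with sparse matrices and layers the enhancements below on top); the function and argument names are mine:

```python
import numpy as np

def pdhg_step(x, y, c, q, K, l, u, m1, tau, sigma):
    """One PDHG iteration for the LP saddle-point problem above."""
    # Primal: gradient step on c - K^T y, then proj_X = clip to the box [l, u].
    x_new = np.clip(x - tau * (c - K.T @ y), l, u)
    # Dual: gradient step at the extrapolated primal point 2*x_new - x.
    y_new = y + sigma * (q - K @ (2 * x_new - x))
    # proj_Y: duals of the m1 inequality rows must stay nonnegative.
    y_new[:m1] = np.maximum(y_new[:m1], 0.0)
    return x_new, y_new

def spectral_norm(K, iters=100, seed=0):
    """Estimate ||K||_2 by power iteration on K^T K."""
    v = np.random.default_rng(seed).standard_normal(K.shape[1])
    for _ in range(iters):
        v = K.T @ (K @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(K @ v)
```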

PDLP Algorithm

  • $$z^{0,0}$$ = initial solution
  • $$n = 0$$ = outer loop counter
  • $$k = 0$$ = total iterations
  • $$\eta_c^{0,0} = \frac{1}{\|K\|_\infty}$$
  • $$\omega^0 = initializePrimalWeight()$$
  • repeat until termination
    • $$t = 0$$
    • repeat until restart or termination (this check is done once every 40 iterations to reduce overhead)
      • $$z^{n,t+1}, \eta^{n,t+1}, \eta_c^{n,t+1} = adaptivePDHGstep(z^{n,t}, \omega^n, \eta_c^{n,t}, k)$$
      • $$z'^{n,t+1} = \frac{1}{\sum_{i=1}^{t+1} \eta^{n,i}} \sum_{i=1}^{t+1} \eta^{n,i} z^{n,i}$$ (maintained incrementally in the sketch after this list)
      • $$z_c^{n,t+1} = restartCandidate(z^{n,t+1}, z'^{n,t+1}, z^{n,0})$$
      • $$t = t + 1$$
      • $$k = k + 1$$
    • $$z^{n+1,0} = z_c^{n,t}$$
    • $$n = n + 1$$
    • $$\omega^n = primalWeightUpdate(z^{n,0}, z^{n-1,0}, \omega^{n-1})$$
  • output = $$z^{n,0}$$
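The weighted average $$z'^{n,t+1}$$ can be folded in one iterate at a time, so the inner loop never stores its history. A small accumulator sketch (the class name and layout are mine, not from the paper):

```python
import numpy as np

class WeightedAverage:
    """Running step-size-weighted average of inner-loop iterates: after t
    updates this equals (sum_i eta_i * z_i) / (sum_i eta_i), i.e. z'^{n,t}."""

    def __init__(self):
        self.total_weight = 0.0
        self.average = None

    def update(self, z, eta):
        """Fold in iterate z (a NumPy array) with weight eta > 0."""
        self.total_weight += eta
        if self.average is None:
            self.average = np.array(z, dtype=float)
        else:
            self.average += (eta / self.total_weight) * (z - self.average)
        return self.average
```

On each restart a fresh accumulator would be created, since the average runs over the current inner loop only.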

adaptivePDHGstep

  • $$(x, y) = z^{n,t}$$
  • $$\eta = \eta_c^{n,t}$$
  • while not done
    • $$x' = proj_X(x - \frac{\eta}{\omega^n} (c - K^T y))$$
    • $$y' = proj_Y(y + \eta \omega^n (q - K (2x' - x)))$$
    • $$\eta_b = \frac{\|(x' - x, y' - y)\|_{\omega^n}^2}{2 (y' - y)^T K (x' - x)}$$, where $$\|(x, y)\|_\omega = \sqrt{\omega \|x\|_2^2 + \frac{\|y\|_2^2}{\omega}}$$
    • $$\eta' = \min((1 - (k + 1)^{-0.3})\eta_b, (1 + (k + 1)^{-0.6})\eta)$$
    • return $$((x', y'), \eta, \eta')$$ if $$\eta \le \eta_b$$ (the trial step is accepted once the step size satisfies this local bound; see the sketch after this list)
    • $$\eta = \eta'$$
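The same loop in NumPy, under the dense-array assumptions from earlier (the argument layout is mine; a nonpositive denominator in the $$\eta_b$$ formula imposes no limit on the step size, so it is treated as $$\eta_b = \infty$$):

```python
import numpy as np

def adaptive_pdhg_step(x, y, c, q, K, l, u, m1, omega, eta, k):
    """One adaptive PDHG step: retry with shrinking step sizes until the
    trial step satisfies eta <= eta_b, then return the accepted point."""
    while True:
        # Trial step with primal step eta/omega and dual step eta*omega.
        x_new = np.clip(x - (eta / omega) * (c - K.T @ y), l, u)
        y_new = y + eta * omega * (q - K @ (2 * x_new - x))
        y_new[:m1] = np.maximum(y_new[:m1], 0.0)
        dx, dy = x_new - x, y_new - y
        # eta_b = ||(dx, dy)||_omega^2 / (2 dy^T K dx).
        denom = 2.0 * (dy @ (K @ dx))
        norm_sq = omega * (dx @ dx) + (dy @ dy) / omega
        eta_b = norm_sq / denom if denom > 0 else np.inf
        eta_next = min((1 - (k + 1) ** -0.3) * eta_b,
                       (1 + (k + 1) ** -0.6) * eta)
        if eta <= eta_b:
            return (x_new, y_new), eta, eta_next
        eta = eta_next  # step was too long; shrink and retry
```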

restartCandidate

  • normalized duality gap for a radius $$r$$: $$\rho_r^n(z) = \frac{1}{r} \max_{\hat z = (\hat x, \hat y) : \|\hat z - z\|_{\omega^n} \le r} \left( L(x, \hat y) - L(\hat x, y) \right)$$
  • for a given $$z_{ref}$$, write this as $$\mu_n(z, z_{ref}) = \rho_{\|z - z_{ref}\|_{\omega^n}}^n(z)$$
  • the restart candidate is then whichever point has the smaller normalized duality gap (see the sketch after this list):
  • if $$\mu_n(z^{n,t+1}, z^{n,0}) < \mu_n(z'^{n,t+1}, z^{n,0})$$
    • $$z^{n,t+1}$$
  • else
    • $$z'^{n,t+1}$$
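The choice itself is a one-liner; the expensive part is evaluating $$\mu_n$$, which means maximizing the Lagrangian difference over an $$\omega$$-norm ball. The sketch below therefore takes a hypothetical callable `mu` for $$\mu_n(z, z_{ref})$$ rather than computing the gap itself:

```python
def restart_candidate(z_current, z_average, z_restart, mu):
    """Return whichever of the last iterate and the running average has the
    smaller normalized duality gap relative to the last restart point.

    mu is a hypothetical callable implementing mu_n(z, z_ref).
    """
    if mu(z_current, z_restart) < mu(z_average, z_restart):
        return z_current
    return z_average
```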

restart criteria

  • $$\beta_{sufficient} \in (0, 1)$$
  • $$\beta_{necessary} \in (\beta_{sufficient}, 1)$$
  • $$\beta_{artificial} \in (0, 1)$$
  • the inner loop restarts if any one of the conditions below holds (sketched in code after this list)
  • sufficient decay in the normalized duality gap
    • $$\mu_n(z_c^{n,t+1}, z^{n,0}) \le \beta_{sufficient} \mu_n(z^{n,0}, z^{n-1,0})$$
  • necessary decay + no progress in the normalized duality gap
    • $$\mu_n(z_c^{n,t+1}, z^{n,0}) \le \beta_{necessary} \mu_n(z^{n,0}, z^{n-1,0})$$
    • and
    • $$\mu_n(z_c^{n,t+1}, z^{n,0}) > \mu_n(z_c^{n,t}, z^{n,0})$$
  • long inner loop: $$t \ge \beta_{artificial} k$$
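The three criteria translate directly once the gap values are in hand; a sketch taking precomputed gaps and the $$\beta$$ parameters as inputs (no default values are assumed here):

```python
def should_restart(gap_candidate, gap_prev_candidate, gap_last_restart,
                   t, k, beta_sufficient, beta_necessary, beta_artificial):
    """Evaluate the restart criteria on precomputed normalized duality gaps:
    gap_candidate      = mu_n(z_c^{n,t+1}, z^{n,0})
    gap_prev_candidate = mu_n(z_c^{n,t},   z^{n,0})
    gap_last_restart   = mu_n(z^{n,0},     z^{n-1,0})
    """
    if gap_candidate <= beta_sufficient * gap_last_restart:
        return True   # sufficient decay
    if (gap_candidate <= beta_necessary * gap_last_restart
            and gap_candidate > gap_prev_candidate):
        return True   # only necessary decay, and the gap has begun to grow
    return t >= beta_artificial * k   # artificially long inner loop
```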

initializePrimalWeight

  • $$\omega^0 = \frac{\|c\|_2}{\|q\|_2}$$ if both numerator and denominator are $$> \epsilon_{zero}$$
  • $$\omega^0 = 1$$ otherwise
  • $$\epsilon_{zero}$$ = a small nonzero tolerance

primalWeightUpdate

  • $$\Delta_x^n = \|x^{n,0} - x^{n-1,0}\|_2$$
  • $$\Delta_y^n = \|y^{n,0} - y^{n-1,0}\|_2$$
  • if $$\Delta_x^n > \epsilon_{zero}$$ and $$\Delta_y^n > \epsilon_{zero}$$
    • return $$\exp\left(\theta \log\left(\frac{\Delta_y^n}{\Delta_x^n}\right) + (1 - \theta) \log(\omega^{n-1})\right)$$
  • otherwise return $$\omega^{n-1}$$
  • $$\theta \in [0,1]$$ = exponential smoothing parameter (default = 0.5); both weight routines are sketched below
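A sketch of both routines, with $$\epsilon_{zero}$$ fixed to an assumed 1e-12 (the notes only call it a small nonzero tolerance):

```python
import numpy as np

def initialize_primal_weight(c, q, eps_zero=1e-12):
    """omega^0 = ||c||_2 / ||q||_2 when both norms are meaningfully nonzero."""
    nc, nq = np.linalg.norm(c), np.linalg.norm(q)
    return nc / nq if nc > eps_zero and nq > eps_zero else 1.0

def primal_weight_update(x, x_prev, y, y_prev, omega_prev,
                         theta=0.5, eps_zero=1e-12):
    """Exponential smoothing of omega in log space toward delta_y / delta_x."""
    dx = np.linalg.norm(x - x_prev)
    dy = np.linalg.norm(y - y_prev)
    if dx > eps_zero and dy > eps_zero:
        return np.exp(theta * np.log(dy / dx)
                      + (1 - theta) * np.log(omega_prev))
    return omega_prev
```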

Presolve

  • detect inconsistent bounds
  • delete empty rows/columns of $$K$$
  • remove variables with equal lower and upper bounds
  • tighten bounds
  • remove duplicate rows of $$K$$ (a sketch of two of these reductions follows below)
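Here is a sketch of two of those reductions, written for the equality block $$A x = b$$ with bounds $$l \le x \le u$$: inconsistent-bound detection and elimination of fixed variables, plus dropping the rows the substitution empties. Real presolve code uses tolerances rather than the exact zero tests below:

```python
import numpy as np

def basic_presolve(A, b, l, u):
    """Detect inconsistent bounds, substitute out fixed variables, and drop
    rows emptied by the substitution. Returns the reduced (A, b, l, u)."""
    if np.any(l > u):
        raise ValueError("infeasible: a lower bound exceeds its upper bound")
    fixed = l == u                         # variables pinned by equal bounds
    b = b - A[:, fixed] @ l[fixed]         # fold their fixed values into b
    A, l, u = A[:, ~fixed], l[~fixed], u[~fixed]
    nonempty = np.abs(A).sum(axis=1) > 0
    if np.any(~nonempty & (b != 0)):       # empty row but nonzero rhs
        raise ValueError("infeasible: empty row with nonzero right-hand side")
    return A[nonempty], b[nonempty], l, u
```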

diagonal preconditioning

  • helps improve convergence
  • the idea is to rescale $$K$$ to $$\tilde{K} = D_1^{-1} K D_2^{-1}$$, with $$D_1$$ and $$D_2$$ diagonal matrices built from row and column norms of $$K$$
  • Ruiz scaling
    • iterative in nature: the rescaling is repeated so that the row and column infinity norms are driven toward 1
    • $$D_1(i, i) = \sqrt{\|K_{i,\cdot}\|_\infty}$$
    • $$D_2(j, j) = \sqrt{\|K_{\cdot,j}\|_\infty}$$
  • Pock-Chambolle
    • parameter $$\alpha \in [0, 2]$$
    • $$D_1(i, i) = \sqrt{\|K_{i,\cdot}\|_{2-\alpha}}$$ for each row $$i$$
    • $$D_2(j, j) = \sqrt{\|K_{\cdot,j}\|_{2-\alpha}}$$ for each column $$j \in [1, n]$$
  • PDLP applies 10 iterations of Ruiz scaling and then applies Pock-Chambolle scaling (both sketched below)
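A dense-matrix sketch of both rescalings (PDLP works with sparse $$K$$; the accumulated diagonals are returned so iterates of the rescaled problem can be mapped back):

```python
import numpy as np

def ruiz_scale(K, iters=10):
    """Iterative Ruiz rescaling: repeatedly divide each row and column by the
    square root of its current infinity norm, driving those norms toward 1."""
    K = K.astype(float).copy()
    d1, d2 = np.ones(K.shape[0]), np.ones(K.shape[1])
    for _ in range(iters):
        r = np.sqrt(np.abs(K).max(axis=1))
        c = np.sqrt(np.abs(K).max(axis=0))
        r[r == 0] = 1.0                    # leave empty rows/columns alone
        c[c == 0] = 1.0
        K = K / r[:, None] / c[None, :]
        d1 *= r
        d2 *= c
    return K, d1, d2                       # K_scaled = D1^{-1} K D2^{-1}

def pock_chambolle_scale(K, alpha=1.0):
    """Pock-Chambolle diagonal scaling from the (2 - alpha)-norms of the rows
    and columns, as written above; assumes alpha < 2 so the norm is defined."""
    p = 2.0 - alpha
    d1 = np.sqrt((np.abs(K) ** p).sum(axis=1) ** (1.0 / p))
    d2 = np.sqrt((np.abs(K) ** p).sum(axis=0) ** (1.0 / p))
    d1[d1 == 0] = 1.0
    d2[d2 == 0] = 1.0
    return K / d1[:, None] / d2[None, :], d1, d2
```

Chaining `pock_chambolle_scale(ruiz_scale(K)[0])` reproduces the combined scheme in the last bullet.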