% talk, Aussois, January 2004
\documentclass[color,landscape,12pt]{article} 


\usepackage{color}

\definecolor{blau}{rgb}{0,0,1}
\definecolor{gruen}{rgb}{0.2,.6 ,0.3}
\definecolor{rot}{rgb}{1,0,0}

\newcommand{\Blau}[1]{\textcolor{blau}{#1}}
\newcommand{\Gruen}[1]{\textcolor{gruen}{#1}}
\newcommand{\Rot}[1]{\textcolor{rot}{#1}}

% --------- pdflatex helper declarations  
\newif\ifpdf
\ifx\pdfoutput\undefined
    \pdffalse    % we are not running pdfLaTeX
\else
    \pdfoutput=1 % we are running pdfLaTeX
    \pdftrue
\fi

\ifpdf
   \usepackage[pdftex, colorlinks, bookmarks]{hyperref}
   \usepackage[pdftex,final]{graphicx}
   \pdfcompresslevel=5  % 0: no compression, 9: highest compression,
                        % or, set compress_level in file pdftex.cfg
   \usepackage{times}   % produce readable fonts
\else
   \usepackage{epsf}
   \usepackage[dvips,final]{graphicx}
\fi
% --------- 





\oddsidemargin -12pt
\evensidemargin -12pt
\topmargin -27pt
\textheight 470pt  %% 635
\textwidth 670pt   %% 460

\begin{document}

\newcommand{\diag}{\mbox{diag}}
\newcommand{\Diag}{\mbox{Diag}}
\newcommand{\tr}{\mbox{tr}}
\newcommand{\R}{I\!\!R}
\def \ni {\noindent}

\newtheorem{thm}{Theorem}
\newtheorem{lem}[thm]{Lemma}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{prop}[thm]{Proposition}
\newtheorem{rem}[thm]{Remark}
\newtheorem{expl}[thm]{Example}


\newcommand{\epr}{\hfill $\Box$}
\newcommand{\bpr}{{\it Proof.} }


\title{\Blau{
Semidefinite Programming: Algorithms (Part 2)}}
\author{
{\Rot{Franz Rendl}}
\\
Institut f\"ur Mathematik \\
Universit\"at Klagenfurt \\
A - 9020 Klagenfurt, Austria \\
{\tt franz.rendl@uni-klu.ac.at}}

\date{Waterloo, May 2004}


\maketitle


\eject


\begin{Large}

% --------------------------------
% generic page
% \centerline{\Blau{
% Generic title} }
%
% \ni \Rot{ Linearize:}
%
% $$x^{T}Lx = \langle L, (xx^{T}) \rangle = \langle L, X \rangle$$
%
% \ni new variable is $X$:
%
% \vfill
%{\Rot{ Poljak, Rendl (1995): Primal-dual formulation.}}
%
%\eject
% end generic page
% ---------------------------------



\centerline{\Blau{
Practical Performance of Interior-Point Methods for SDP (1)} }

\bigskip
$$~~ \max \{ \langle C,X \rangle: A(X)=b, X \succeq 0 \} =
   ~~ \min \{ b^Ty: A^T(y) - C = Z \succeq 0 \}
$$

\ni \Rot{Primal-Dual Path-following Methods:}

\ni At start of iteration:
$(X \succ 0, y, Z \succ 0)$

\ni Linearized system to be solved for $(\Delta X, \Delta y, \Delta Z)$:
$$A( \Delta X) = r_P := b - A(X)     \mbox{~~~~ primal residual}$$
$$A^T(\Delta y) -\Delta Z = r_D := Z + C - A^T(y) 
\mbox{~~~~ dual residual}$$
$$Z \Delta X + \Delta Z X = \mu I -ZX \mbox{ ~~~~path residual}$$

\medskip
\ni
The last equation can be reformulated in many ways, all of which 
are derived from the complementarity condition
\Rot{$ ZX = 0 $}.

\medskip
\ni
This is not a square linear system in $(\Delta X, \Delta y, \Delta Z)$:
counting the symmetric matrices $\Delta X, \Delta Z$ and the vector
$\Delta y$ gives $$2 {n+1 \choose 2} + m $$ variables,
but since the left-hand side of the third equation is not symmetric,
the number of equations is $${n+1 \choose 2} + n^2 + m.$$

%\vfill
%{\Rot{ see also: R.D.C. Monteiro, Math. Programming 97 (2003), 209ff}}

\eject

\centerline{\Blau{
Practical Performance of Interior-Point Methods for SDP (2)} }

\bigskip
\ni {\Rot{Direct approach with partial elimination:}}

\ni
Using the second and third equation to eliminate $\Delta X$ and 
$\Delta Z$, and substituting into the first gives

$$\Delta Z = A^T(\Delta y) - r_D, ~~\Delta X = \mu Z^{-1} - X -Z^{-1}
\Delta Z X,$$

\ni
and the final system to be solved:
$$\Rot{A( Z^{-1} A^T( \Delta y)X) = \mu A(Z^{-1}) - b + A(Z^{-1} r_D X)
}$$

\ni
Computational effort:
$$\bullet \mbox{ determine explicitly } Z^{-1}  ~~~O(n^3)$$
$$\bullet \mbox{ several matrix multiplications } ~~~O(n^3)$$
$$\bullet \mbox{ final system of order } m \mbox{ to compute } \Delta y
~~~O(m^3)$$
$$\bullet \mbox{ forming the final system matrix } ~~O(m n^3 + m^2n^2)$$
$$\bullet \mbox{ line search to determine } X^+ := X + t \Delta X, 
Z^+ := Z + t \Delta Z  ~~\mbox{ is at least } O(n^3)$$

\medskip
The effort to form the final system matrix depends on the structure of $A(\cdot)$.


\eject

\centerline{\Blau{
Practical Performance of Interior-Point Methods for SDP (3)} }

\bigskip
\ni {\Rot{Example 1: SDP Relaxation for Max-Cut:}}

$$~~ \max \{ \langle C,X \rangle: \diag(X)=e, X \succeq 0 \} =
   ~~ \min \{ e^Ty: \Diag(y) - C = Z \succeq 0 \}
$$


$$\mbox{ Here: } m = n, ~A(X) = \diag(X), ~ A^T(y) = \Diag(y)$$

\ni 
and the system matrix becomes
$$\diag( Z^{-1} \Diag( \Delta y) X) = (Z^{-1} \circ X) \Delta y.$$

\ni
It can be computed in \Gruen{ $O(n^2)$ } operations.
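The identity above is easy to confirm numerically; a minimal numpy sketch, with random positive definite stand-ins for $Z$ and $X$:

```python
import numpy as np

np.random.seed(1)
n = 8
# random positive definite X and Z (stand-ins for interior iterates)
A = np.random.randn(n, n); X = A @ A.T + n * np.eye(n)
B = np.random.randn(n, n); Z = B @ B.T + n * np.eye(n)
dy = np.random.randn(n)

Zi = np.linalg.inv(Z)
lhs = np.diag(Zi @ np.diag(dy) @ X)     # diag(Z^{-1} Diag(dy) X)
rhs = (Zi * X) @ dy                     # (Z^{-1} o X) dy, Hadamard product
assert np.allclose(lhs, rhs)
```

The equality uses only the symmetry of $X$, which is why the system matrix $Z^{-1} \circ X$ is available in $O(n^2)$ once $Z^{-1}$ is known.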

  \begin{center} 
  \begin{Large}
  \begin{tabular}{|rr|}
\hline
 $n$  &  seconds    \\
\hline
%      200 &    1.84 \\
      400 &    8.92 \\
      600 &   24.10 \\
      800 &   51.45 \\
     1000 &   99.27 \\
     1500 &  314.99 \\
     2000 &  714.21 \\
\hline
    \end{tabular}
    \end{Large}

\bigskip \medskip

   \centerline{ \Large{      Computation times (seconds) 
to solve the SDP  on a PC (Pentium 4, 1.7 GHz).} }
  \end{center}




\vfill
\Rot{ see Helmberg, Rendl, Vanderbei, Wolkowicz: SIOPT (6) 1996, 342ff}


\eject

\centerline{\Blau{
Practical Performance of Interior-Point Methods for SDP (4)} }

\bigskip
\ni {\Rot{Example 2: Lov\'asz theta function:}}

\medskip
\ni
Given a graph $G= (V,E)$ with $|V|=n, ~|E|=m$.
$$~~ \max \{ \langle J,X \rangle: \tr(X)=1, \Rot{x_{ij}=0~ \forall (ij) \in E},
~X \succeq 0 \} 
$$
\ni
Here the number of constraints depends on the number of edges $|E|$.

\bigskip

\ni
{\Rot{If $m \gg n^2/4$, the size of the final system becomes impractical. }}

\bigskip
\ni
{\Gruen{Use an explicit representation of $X$, i.e.
express $X$ through its main diagonal and the non-edge variables. }}

\medskip

\ni
This gives a final system of size roughly $n^2/2- m$, which is smaller than
$m$ whenever $m > n^2/4$.
 
\bigskip

\ni
This makes it possible to compute $\vartheta(G)$ for very sparse and for
very dense graphs. The computationally difficult class consists of
graphs with $m \approx n^2/4$, i.e. where about half of the possible
edges are present.

\vfill
\Rot{see: Dukanovic, Rendl, technical report, Klagenfurt 2004}

\eject

\centerline{\Blau{
Practical Performance of Interior-Point Methods for SDP (5)} }

\bigskip
\ni {\Rot{Iterative solution of linear system:}}

\medskip
\ni
To avoid computing $Z^{-1}$ explicitly, and forming the
system matrix, one could use iterative methods to compute
$(\Delta X, \Delta y, \Delta Z)$.

\bigskip
\ni
{\Rot{Preconditioned Conjugate Gradient method}}


%\vfill
%\Rot{see: Dukanovic, Rendl, technical report, Klagenfurt 2004}

\eject

\centerline{\Blau{
Spectral Bundle Method for SDP (1)} }

\bigskip
\ni {\Rot{Dual as eigenvalue optimization problem:}}

\medskip
\ni
Assume that $A(X)=b$ implies $\tr(X)=a>0$.
(This holds for many combinatorially derived SDPs!)

\medskip

\ni
Reformulate dual
$$
   ~~ \min \{ b^Ty: A^T(y) - C = Z \succeq 0 \}
$$
\ni
as follows.
Adding the (redundant) primal constraint $\tr(X)=a$ introduces a
new dual variable, say $\lambda$, and the dual becomes

$$
   ~~ \min \{ b^Ty + a \lambda: A^T(y) - C  +\lambda I= Z \succeq 0 \}
$$

\ni
At optimality, $Z$ is singular, hence $\lambda_{\min}(Z) =0$.

\medskip
\ni
{\Rot{compute the dual variable $\lambda$ explicitly:}}
$$\lambda_{\max}(-Z) = \lambda_{\max}(C-A^T(y)) -\lambda = 0, 
\Rightarrow
{\Gruen{  \lambda = \lambda_{\max}(C-A^T(y)) } } $$

\ni
The dual is therefore equivalent to
{\Rot{$$\min \{ a \lambda_{\max}(C- A^T(y)) +b^Ty: ~ y \in \R^m \} $$}}

\medskip
\ni
This is a non-smooth, unconstrained convex problem in $y$.
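To make the eigenvalue formulation concrete, a small numpy sketch for the Max-Cut case ($A^T(y)=\Diag(y)$, $b=e$, $a=\tr(X)=n$, chosen here for illustration); it checks convexity of $f$ on a random instance and the weak-duality bound $f(y) \geq x^TCx$ for any cut vector $x$:

```python
import numpy as np

def f(y, C):
    """Dual of the Max-Cut SDP as eigenvalue optimization:
       f(y) = n * lambda_max(C - Diag(y)) + e^T y   (a = tr X = n, b = e)."""
    n = C.shape[0]
    return n * np.linalg.eigvalsh(C - np.diag(y))[-1] + y.sum()

np.random.seed(2)
n = 7
A = np.random.randn(n, n)
C = (A + A.T) / 2
y1, y2 = np.random.randn(n), np.random.randn(n)

# convexity: f((y1+y2)/2) <= (f(y1) + f(y2))/2   (up to roundoff)
assert f((y1 + y2) / 2, C) <= (f(y1, C) + f(y2, C)) / 2 + 1e-9

# weak duality: every f(y) upper-bounds <C, x x^T> for every cut vector x
x = np.sign(np.random.randn(n))
assert f(y1, C) >= x @ C @ x - 1e-9
```

Each evaluation of $f$ costs one extreme-eigenvalue computation, which is exactly what iterative eigensolvers exploit for large sparse $C-A^T(y)$.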


\eject

\centerline{\Blau{
Spectral Bundle Method for SDP (2)} }

\bigskip
\ni {\Rot{Minimizing $f(y) = \lambda_{\max}(C-A^T(y)) + b^Ty$ (taking $a=1$ for simplicity): }}

\medskip
\ni
Note: Evaluating $f(y)$ at $y$ amounts to computing largest eigenvalue
of $C-A^T(y)$.

\medskip
\ni
This can be done by iterative methods for very large (sparse) matrices.

\medskip
\ni
If we have some $y$, how do we move to a better point?

\medskip
\ni
{\Rot{$$\lambda_{\max}(X) = \max \{\langle X, W \rangle: ~\tr(W) = 1,
~ W \succeq 0   \}  $$ }}
\ni Define
$$L(W,y) := \langle C-A^T(y),W \rangle + b^Ty.  $$

\ni
Then $f(y) = \max \{ L(W,y): ~ \tr(W)=1, ~ W \succeq 0 \}.$

\bigskip
\ni
{\Gruen {Idea 1: Minorant for $f(y)$}}

\medskip
\ni
Fix some $n \times k$ matrix $P$ with orthonormal columns; $k\geq 1$ can
be chosen arbitrarily. The choice of $P$ will be explained later.

\ni
Consider $W$ of the form {\Rot{$W = PVP^T$ }} with a new $k \times k$
matrix variable $V$.
$$ \hat{f} (y) := \max \{L(W,y):~ W=PVP^T,~ \tr(V)=1,~ V \succeq 0   \} ~\leq ~f(y) $$
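The minorant property follows from eigenvalue interlacing: for orthonormal $P$, the restricted maximum equals $\lambda_{\max}(P^T(C-A^T(y))P) + b^Ty \leq \lambda_{\max}(C-A^T(y)) + b^Ty$. A quick numerical sketch (Max-Cut operators $A^T(y)=\Diag(y)$, $b=e$, $a=1$, assumed for illustration):

```python
import numpy as np

np.random.seed(3)
n, k = 9, 3
A = np.random.randn(n, n)
C = (A + A.T) / 2
y = np.random.randn(n)
M = C - np.diag(y)                           # C - A^T(y) for A^T(y) = Diag(y)

P, _ = np.linalg.qr(np.random.randn(n, k))   # orthonormal n x k bundle matrix
# restricted model: max over W = P V P^T with tr(V) = 1, V psd
f_hat = np.linalg.eigvalsh(P.T @ M @ P)[-1] + y.sum()
f_val = np.linalg.eigvalsh(M)[-1] + y.sum()
assert f_hat <= f_val + 1e-10                # minorant property
```

The model function only requires eigenvalues of a $k \times k$ matrix, which is the point of the bundle restriction.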

\eject

\centerline{\Blau{
Spectral Bundle Method for SDP (3)} }

\bigskip
\ni {\Gruen{Idea 2: Proximal point approach}}

\medskip
\ni
The function $\hat{f}$ depends on $P$ and will be a good approximation
to $f(y)$ only in some neighbourhood of the current iterate $\hat{y}$.

\ni
Instead of minimizing $f(y)$ we minimize 
{\Rot{ $$\hat{f}(y) + \frac{u}{2}\|y - \hat{y} \|^2.$$  }}
\ni
For fixed $u>0$, this is a strictly convex function of $y$.

\ni
Substituting the definition of $\hat{f}$ gives
$$ {\Rot{
\min_y \max_W L(W,y) + \frac{u}{2} \|y -\hat{y} \|^2 =  \ldots
}} $$
$$  {\Rot{
= \max_{W, ~y = \hat{y} + \frac{1}{u}(A(W) -b)} L(W,y) 
+ \frac{u}{2} \|y -\hat{y} \|^2
} } $$
$$ {\Rot{
= \max_W \langle C - A^T(\hat{y}), W \rangle + b^T\hat{y} 
- \frac{1}{2u} \langle A(W) - b, A(W) -b \rangle.
}} $$

\ni
Note that the last problem is a quadratic SDP in the $k \times k$ matrix $V$.

\medskip
\ni
Once $V$ is computed, we get with $W=PVP^T$ that 
${\Gruen{ y = \hat{y} + \frac{1}{u}(A(W) -b)}}$.
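The closed form of the inner minimization over $y$ is easy to verify numerically; a sketch using the Max-Cut operators $A(W)=\diag(W)$, $b=e$ (chosen here only for illustration) for a fixed feasible $W$:

```python
import numpy as np

np.random.seed(4)
n, u = 6, 2.0
A = np.random.randn(n, n)
C = (A + A.T) / 2
yhat = np.random.randn(n)
B = np.random.randn(n, n)
W = B @ B.T
W /= np.trace(W)                        # W psd with tr(W) = 1
b = np.ones(n)

def L(W, y):                            # L(W,y) = <C - A^T(y), W> + b^T y
    return np.trace((C - np.diag(y)) @ W) + b @ y

ystar = yhat + (np.diag(W) - b) / u     # y = yhat + (1/u)(A(W) - b)
val = L(W, ystar) + u / 2 * np.sum((ystar - yhat) ** 2)
closed = (np.trace((C - np.diag(yhat)) @ W) + b @ yhat
          - np.sum((np.diag(W) - b) ** 2) / (2 * u))
assert np.isclose(val, closed)          # matches the last display above
# ystar indeed minimizes: any other y gives a value at least as large
for _ in range(5):
    y = ystar + np.random.randn(n)
    assert L(W, y) + u / 2 * np.sum((y - yhat) ** 2) >= val - 1e-9
```

For fixed $W$ the objective is a convex quadratic in $y$, so setting its gradient to zero yields exactly the update $y = \hat{y} + \frac{1}{u}(A(W)-b)$.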
\vfill
\Rot{see: Helmberg, Rendl: SIOPT 10, (2000), 673ff}

\eject
\centerline{\Blau{
Spectral Bundle Method for SDP (4)} }

\bigskip
\ni {\Gruen{Update of $P$:}}

\medskip
\ni
Having the new point $y$, we evaluate $f$ at $y$ (a sparse eigenvalue
computation), which also produces an eigenvector $v$ corresponding to
$\lambda_{\max}$.

\ni
The vector $v$ is added as a new column to $P$, and $P$ is purged
by removing unnecessary other columns.

\bigskip
\ni
{\Rot{
Convergence is slow once the iterates are close to the optimum.
 }}


\medskip
\ni
The method can approximately solve SDPs with quite large matrices,
$n \approx 5000$.

\vfill
{\Rot{see also DIMACS challenge for SDP - DIMACS web-page}}

\eject

\centerline{\Blau{
Bundle methods and  SDP   (1)} }

\bigskip
\ni {\Rot{
Dealing with SDP with too many inequality constraints }}

\bigskip
\ni
The number $m$ of (linearly independent) equality constraints is always
at most ${n+1 \choose 2}$, because this is the dimension
of the space $S_n$ of symmetric matrices.

\medskip
\ni 
{\Rot{ But there can be an arbitrary number of inequality constraints !}}

\medskip
\ni
If we do not know whether a constraint is active or not, we introduce
a nonnegative slack variable and turn the constraint into an equality.

\medskip
\ni
This also increases the dimension of the space (we added one more variable).

\bigskip
\ni
Many SDPs from combinatorial optimization can be tightened 
by introducing combinatorial cutting planes (= linear inequalities).

\bigskip 

\ni
{\Gruen{Example: Max-Cut SDP Relaxation with Triangle inequalities}}

\eject

\centerline{
\Blau{Triangle inequalities for Max-Cut:}}

\bigskip
\ni
A simple observation: if $x$ is an arbitrary cut-vector:
$$x \in \{-1,1\}^{n}, ~~f = 
\left( \begin{array}{c} 1 \\ 1\\ 1\\ 0 \\ \vdots \\ 0 \end{array} \right)
~~ \Rightarrow |x^{T}f| \geq 1
$$
Translated to $X=xx^T$:
$$x^{T}f ~ f^{T}x = \langle(xx^{T}),(ff^{T})\rangle = 
\Rot{  \langle X, ff^{T} \rangle \geq 1}$$

\ni
This can be applied to any {\bf triangle } $i <j < k$, and the
nonzero elements of $f$ may also be $-1$.
This gives \Rot{$4 {n \choose 3}$}
linear inequalities.

\medskip \ni
{\Blau{ Triangle Relaxation}}
\Rot{
$$x_{ij} + x_{ik} + x_{jk} \geq -1 ~~~~~~x_{ij} -x_{ik} - x_{jk} \geq -1
~~~~\forall i<j<k$$
}
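A separation routine for these inequalities only needs to scan all triples; a minimal sketch (the function name \texttt{violated\_triangles} is ours, and $\diag(X)=e$ is assumed, so $\langle X, ff^T \rangle \geq 1$ reduces to the sign-pattern form below):

```python
import numpy as np
from itertools import combinations

def violated_triangles(X, tol=1e-8):
    """Scan all 4*C(n,3) triangle inequalities
       s1*x_ij + s2*x_ik + s1*s2*x_jk >= -1 and collect the violated ones.
       Assumes diag(X) = e."""
    viol = []
    for i, j, k in combinations(range(X.shape[0]), 3):
        for s1 in (1, -1):
            for s2 in (1, -1):
                if s1 * X[i, j] + s2 * X[i, k] + s1 * s2 * X[j, k] < -1 - tol:
                    viol.append((i, j, k, s1, s2))
    return viol

# for a cut matrix X = x x^T, no triangle inequality is violated
x = np.array([1, -1, 1, 1, -1])
assert violated_triangles(np.outer(x, x)) == []
```

Enumerating all triples is exactly the $O(n^3)$ blow-up the table below illustrates, which is why in practice only a limited number of violated inequalities is ever added.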

\vfill
\Rot{Deza, Laurent: Hypermetric Inequalities, Padberg: 
Quadric Boolean Polytope}
\eject


\bigskip
\centerline{
\Blau{Dealing with the Triangle Relaxation}}

\medskip

%\begin{table}[t]
  \begin{center}
  \begin{Large}
  \begin{tabular}{|r|r|r|}
\hline
  $n$ & $n \choose 2$ & $4{n \choose 3}$     \\
\hline
  50  & 1,225  & 78,400 \\
  100 & 4,950 & 646,800 \\
  200 & 19,900 & 5,253,600 \\
  500 & 124,750 & 82,834,000 \\
\hline
    \end{tabular}
    \end{Large}

\bigskip
\medskip

   \centerline{Number of triangle constraints as $n$ increases }
%    \label{sdp-met-times}
  \end{center}
%\end{table}

\medskip
\ni
Only ${n \choose 2}$ values determine the off-diagonal part of $X$,
so only a small fraction of the inequalities can be active.
Which are good candidates for active constraints?

%\bigskip
\ni
Explicitly maintaining all these constraints is impractical 
for  $n \geq 25.$

\medskip
\ni
Below: computation times with only a limited number of triangle
inequalities included.

\centerline{
\Rot{Computation times for SDP with $k$ triangles included}}

%\bigskip
%\begin{table}[t]
  \begin{center}
\begin{Large}
    \begin{tabular}{|r|r|r|r|}
\hline
  & $n=100$ & $n=200$ & $n=300$     \\
\hline
$k=500$  &  21 (19)  &  34 (20) &  49 (19) \\
$k=1000$ & 103 (21)  & 136 (22) & 164 (21) \\
$k=1500$ & 304 (24)  & 358 (24) & 422 (24) \\
$k=2000$ & 643 (25)  & 763 (26) & 816 (24) \\
$k=2500$ &1090 (24)  &1313 (26) &1360 (24) \\
\hline
    \end{tabular}
    \end{Large}
  \end{center}

\ni
Computation times (seconds) on a PC (Pentium 4, 1.7 GHz)
to compute the semidefinite relaxation of Max-Cut for a graph
with $n$ nodes and $k$ triangle inequalities. 
The number of interior point iterations is given in parentheses. 

\eject

\centerline{\Blau{
Bundle methods and  SDP   (2)} }


\medskip
\ni
If there are too many inequality constraints, we can look at their
{\Rot{Lagrangian Dual}}. 

\ni
Maintain only part of the constraints explicitly:
{\Gruen{$$X \in F := \{ X: \diag(X)=e, ~ X \succeq 0\} $$ }}

\ni
Remaining constraints (\Rot{$A(X) \leq b$}) are dualized through Lagrangian:
\Gruen{ $$L(X, \gamma) = \langle C, X \rangle + \gamma^{T}(b-A(X))$$}
$$\max_{X \in F,~ A(X)\leq b} \langle C,X \rangle = 
\max_{X \in F} \min_{\gamma \geq 0} L(X,\gamma) $$ 

Dual functional:
$$f(\gamma):= \max_{\Gruen{X \in F}} L(X,\gamma)$$ 

\bigskip

Minimizing $f$ over $\gamma \geq 0$ is equivalent to the original problem.

$f$ is convex, but non-smooth.


\eject

\centerline{\Blau{
Bundle methods and  SDP   (3)} }

\bigskip

\Blau{Bundle methods:}

\ni
minimize $f$ using function and subgradient evaluation only

\ni
Note: function evaluation means solving over \Gruen{$X \in F$}.

\ni
\Gruen{Big graphs (from Helmberg)}.

\ni
The number of bundle iterations is 50 for $n=800$, and 30 
for $n=2000$.


  \begin{center}\begin{Large}
    \begin{tabular}{|r|rr|r|rr|rr|r|r|}

\hline
problem  &  $n$  & $|E|$ & cut   & 
initial bd  &gap (\%) & final bd &gap (\%) & $m$ & time \\
\hline
G1  & 800 & 19176 & 11612 & 12083.2 &  4.06 & 12005.4 & 3.39 &  7372 & 51.76 \\
G6  & 800 & 19176 &  2172 & 2656.2 & 22.29 &  2566.2 & 18.15 & 6983 & 43.11 \\
G11 & 800 &  1600 &   564 &  629.2 & 11.56 &   572.7 & 1.54 & 15946 & 60.20 \\
G14 & 800 &  4694 &  3054 & 3191.6 &  4.51 &  3140.7 & 2.84 &  8973 & 59.68 \\
G18 & 800 &  4694 &   985 & 1166.0 & 18.38&  1063.4 &  7.96& 17635& 69.19 \\
\hline
\hline
G22 & 2000 & 19990 &13293 &14135.9 &  6.34 & 14045.8 & 5.66 & 18325 & 278.06\\ 
G27 & 2000 & 19990  &3293 & 4141.7 & 25.77 & 4048.4 & 22.94 & 15178 & 406.66\\
G39 & 2000 & 11779  &2373 & 2877.7 & 21.27 & 2672.7 & 12.63 & 26471 & 533.36 \\
\hline
    \end{tabular} 
\end{Large}
  \end{center}


\vfill
\Rot{These are currently the best bounds for 
these problems, see Fischer et al., technical report, Klagenfurt, 2004}

\eject






\centerline{\Blau{
Approximation results using SDP} }

\bigskip

\bigskip
\ni 
{\Rot{
Goemans-Williamson hyperplane rounding technique for Max-Cut}}

\ni
{\Rot{
Nesterov Analysis for Max-Cut}}

\ni
{\Rot{
Karger-Motwani-Sudan technique for Graph Coloring
}}
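As a pointer to the first item, the hyperplane rounding step itself is short; a sketch assuming only a matrix $X \succeq 0$ with $\diag(X)=e$ is available (here a rank-one $X = x_0x_0^T$, for which rounding recovers the cut exactly; the function name is ours):

```python
import numpy as np

def hyperplane_rounding(X, C, trials=50, seed=0):
    """Goemans-Williamson rounding: factor X = V^T V via an
       eigendecomposition, draw random hyperplane normals r, round to
       x = sign(V^T r); keep the cut vector with the best value x^T C x."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = (U * np.sqrt(np.clip(w, 0, None))).T     # columns v_i, <v_i,v_j> = X_ij
    best, best_val = None, -np.inf
    for _ in range(trials):
        x = np.sign(V.T @ rng.standard_normal(X.shape[0]))
        x[x == 0] = 1
        val = x @ C @ x
        if val > best_val:
            best, best_val = x, val
    return best, best_val

np.random.seed(5)
n = 5
Asym = np.random.randn(n, n)
C = (Asym + Asym.T) / 2
x0 = np.array([1., -1., 1., -1., 1.])
xr, vr = hyperplane_rounding(np.outer(x0, x0), C)  # rank-one X: cut recovered
```

On an optimal SDP solution $X$ this is the rounding whose expected cut value Goemans and Williamson analyze; the sketch only demonstrates the mechanics, not the 0.878 guarantee.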




\end{Large}


\end{document}
