\documentclass[leqno,12pt]{article}

% ==== pacakges =====
\usepackage[pdftex]{graphicx}
\usepackage{latexsym}
%\usepackage{changebar}
%\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
%\usepackage{theorem}
\usepackage{cite}
\usepackage{color}
%\usepackage[notref,notcite]{showkeys} % For debugging, to be commented !
\newcommand{\RR}{\ensuremath{\mathbb R}}

\newcommand{\NN}{\ensuremath{\mathbb N}}
\newcommand{\di}{\ensuremath{\operatorname{dist}}}


\usepackage[margin=1.3in, top=1.3in, bottom=1.3in]{geometry}

% ==== newcommands =====
\newcommand{\nc}{\newcommand}
\nc{\nt}{\newtheorem}
\nt{thm}{Theorem}[section]
\nt{cor}[thm]{Corollary}
\nt{prop}[thm]{Proposition}
\nt{lem}[thm]{Lemma}
\nt{defn}[thm]{Definition}
\nt{rem}[thm]{Remark}
\nt{exa}[thm]{Example}
\nt{ass}[thm]{Assumption}
\nc{\ip}[2]{\mbox{$\langle #1,#2 \rangle$}}
\nc{\pf}{\noindent{\bf Proof\ \ }}
\nc{\finpf}{\hfill{$\Box$}\linespace}
\nc{\linespace}{\vspace{\baselineskip} \noindent}
\nc{\R}{{\bf R}}
\nc{\bR}{{\overline{\R}}}
\nc{\C}{{\bf C}}
\nc{\E}{{\bf E}}
\nc{\Y}{{\bf Y}}
\nc{\RE}{\mbox{\rm Re}\,}
\nc{\Rn}{{\bf R}^n}
\nc{\Mn}{{\bf M}^n}
\nc{\bx}{\bar{x}}
\nc{\by}{\bar{y}}
\nc{\e}{\epsilon}
\nc{\inT}{\mbox{\rm int}\,}
\nc{\cl}{\mbox{\rm cl}\,}
\nc{\gph}{\mbox{\rm gph}\,}
\nc{\dist}{\mbox{\rm dist}}
\nc{\proj}{\mbox{\rm proj}}
\nc{\disp}{\mbox{\rm disp}}
\nc{\conv}{\mbox{\rm conv}\,}
\nc{\range}{\mbox{\rm rge}\,}
\nc{\rt}{\rightarrow}
\nc{\tra}{\mbox{\rm tr}\,}
\nc{\ri}{\mbox{\rm ri}\,}
\nc{\rb}{\mbox{\rm rb}\,}
\nc{\spann}{\mbox{\rm span}\,}

\nc{\ap}{a^{+}}
\nc{\bp}{b^{+}}
\def\ck{{\mathcal K}}

\def\del{\delta}
\def\slof{\overline{|\nabla f|}}


\nc{\xbar}{{\overline{x}}}
\nc{\vbar}{{\overline{v}}}
\nc{\yhat}{{\widehat{y}}}
\nc{\xhat}{{\widehat{x}}}

\nc{\und}{\quad\mbox{ and }\quad}
\nc{\emp}{\ensuremath{\varnothing}}

\def\tto{\;{\lower 1pt \hbox{$\rightarrow$}}\kern -12pt
           \hbox{\raise 2.8pt \hbox{$\rightarrow$}}\;}
\newenvironment{myequation}{\setcounter{equation}{\value{thm}}
   \begin{equation}}{\addtocounter{thm}{1}\end{equation}}
\newenvironment{myeqnarray}{\setcounter{equation}{\value{thm}}
    \begin{eqnarray}}{\setcounter{thm}{\value{equation}}\end{eqnarray}}
\renewcommand{\theequation}{\arabic{section}.\arabic{equation}}
\nc{\bmye}{\begin{myequation}}
\nc{\emye}{\end{myequation}}

\newcommand{\orth}[1]{{#1}^{\perp}}
\newcommand{\trans}[1]{{#1}^{\top}}

\newcommand{\ra}{\rightarrow}
\newcommand{\norm}[1]{\|#1\|}
\newcommand{\norminf}[1]{\norm{#1}_{\infty}}
\DeclareMathOperator{\blkdiag}{blkdiag}

% Henry's additional definition
\newenvironment{noteH}{\begin{quote}\small\sf HW $\clubsuit$~}{\end{quote}}

% ==== beginning ====
\begin{document}
\title{
Alternating projections and Sturm's error bounds in semidefinite programming}
\author{
Dmitriy Drusvyatskiy\thanks{Department of Mathematics, University of Washington, Seattle, WA 98195; Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1; \texttt{people.orie.cornell.edu/dd379}. Research supported by AFOSR.}
\and
Guoyin Li\thanks{Department of Applied Mathematics, University of New South Wales, Sydney 2052, Australia; \texttt{web.maths.unsw.edu.au/{\raise.17ex\hbox{$\scriptstyle\sim$}}gyli}. Research supported by an ARC Future Fellowship.}
\and
Henry Wolkowicz\thanks{Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1; \texttt{orion.uwaterloo.ca/{\raise.17ex\hbox{$\scriptstyle\sim$}}hwolkowi}. Research supported by The Natural Sciences and Engineering Research Council of Canada and AFOSR.}
}
\maketitle

\begin{abstract}
We observe that Sturm's error bounds readily imply that for semidefinite
programs, the method of alternating projections converges at a rate of
$\mathcal{O}\Big(k^{-\frac{1}{2^{d+1}-2}}\Big)$, where $d$ is the singularity
degree of the problem ---  the minimal number of facial reduction iterations needed to induce Slater's condition. Consequently, for almost all (in the sense of Lebesgue measure) semi-definite programs, alternating projections converge at a worst-case sublinear rate $\mathcal{O}\Big(\frac{1}{\sqrt{k}}\Big)$. We then use alternating projections to numerically test whether Sturm's error bounds are tight.
\end{abstract}
\medskip

\noindent{\bf Key words:} Error bounds, regularity, alternating projections, sublinear convergence, linear matrix inequality, semi-definite program
\medskip

\noindent{\bf AMS 2000 Subject Classification:} 90C22, 65K10, 49M20
\tableofcontents
\listoffigures

\section{Introduction}
%\begin{noteH}
%	look at ??? \cite{MR2160671}?????
%\end{noteH}
\begin{noteH}
Should we add a comment early about ????
... linear rate for subspaces? e.g. Aronszajn
result that convergence rate is linear    with rate $(\cos \theta)^2$; 
$\leq (\cos \theta)^{2k-1} \|P_{psd}\|$
and convergence is linear in convex case provided  intersection of
rel-int nonempty ... B-B '93????
\end{noteH}
In this work, we revisit a basic result of semi-definite programming due to Sturm \cite{sturm}: denoting by $\mathcal{V}$ an affine subspace of symmetric matrices having a nonempty intersection with the positive semi-definite cone $\mathcal{S}^n_+$, the semi-definite feasibility problem
\begin{equation*}
X\in\mathcal{V}\cap \mathcal{S}^n_+
\end{equation*}
always admits a {\em H\"{o}lder error bound}, meaning that on any
compact subset $U$ of $\mathcal{S}^n_+$, the distance of any putative
solution $X\in U$ from the true solution set $\mathcal{V}\cap
\mathcal{S}^n_+$ is bounded by a multiple of a certain power of the
distance of $X$ from the affine space $\mathcal{V}$ and from
$\mathcal{S}^n_+$, separately. Most interestingly, Sturm showed that the
power ({\em H\"{o}lder exponent}) can be set to $\frac{1}{2^d}$, where
$d$ is the {\em singularity degree} of the problem ---  the minimal number of {\em facial reduction} iterations needed to induce Slater's condition. For a discussion on facial reduction see the original work \cite{face_henry} or the more recent manuscripts \cite{gabor,MW13}. What is striking here is that the exponent {\em only} depends on the singularity degree, and not say on the size or the rank of the matrices. 

In this short note, we combine Sturm's error bounds with the recent work \cite{BLY} to conclude that the classical method of alternating projections (that of von Neumann \cite{N_proj}) converges at a rate of $\mathcal{O}\Big(k^{-\frac{1}{2^{d+1}-2}}\Big)$, where $d$ is the singularity degree of the problem. This is not very surprising, since the sublinear rate at which alternating projections converge clearly has to do with the H\"{o}lder regularity of the intersection $\mathcal{V}\cap\mathcal{S}^n_+$, a fact made precise in \cite{BLY}. Nonetheless, this is notable: contrary to the usual mantra that alternating projections converge ``arbitrarily slowly'', for semi-definite feasibility problems the rate of convergence is very specific. Many problems of interest, especially those that are degenerate only due to poor modeling choices, have singularity degree at most one. Consequently, for such problems, the method converges at the worst-case rate $\mathcal{O}\Big(\frac{1}{\sqrt{k}}\Big)$. Moreover, we will show that this worst-case rate is, in a precise mathematical sense, typical for semi-definite problems (even ones that are infeasible). Coming back full circle, we complete the paper by using alternating projections to numerically test whether Sturm's error bounds are tight.

The outline of the paper is as follows. In Section~\ref{sec:sublin}, we
discuss convergence guarantees of the method of alternating projections,
while in Section~\ref{sec:typ}, we observe that typically the singularity degree is no more than one. Finally in Section~\ref{sec:num}, we use alternating projections to numerically test whether Sturm's error bounds are tight.


\section{Sublinear convergence of alternating projections}\label{sec:sublin}
Consider a Euclidean space $\E$ (finite-dimensional real inner product space), along with an inner product $\langle\cdot,\cdot \rangle$ and the induced norm $|\cdot|$. 
Given a convex set $Q\subset \E$, we define the {\em distance function} 
$$\dist(x,Q):=\min_{y\in Q}|x-y|$$ 
and the {\em projection mapping}
$$\proj(x,Q):=\{y\in Q: |x-y|=\dist(x,Q)\}.$$ 
We let $\cl Q$, $\ri Q$, and $\rb Q$ denote the closure, relative interior, and relative boundary of $Q$, respectively. It is standard that if $Q$ is closed, then $\proj(x,Q)$ is a singleton. 

For the rest of the paper, we will consider two closed, convex sets $A$ and $B$ and focus on the feasibility problem: 
\begin{equation}\label{eqn:main_prob}
\textrm{ find some point}\quad x\in A\cap B.
\end{equation}
When working with an infeasible problem, it is useful to define the {\em displacement vector}, denoted
$\disp(A,B)$, to be the minimal norm element of $\cl (A-B)$. Observe that
$A-B$ may not be closed (e.g.,~the difference of the Lorentz cone $\{(x,y,r): |(x,y)|\leq r\}$ and a span of one of its bounding rays), and therefore the closure operation may be necessary. We say that the pair $(A,B)$ is {\em weakly infeasible} if the origin lies in the closure $\cl (A-B)$ but not in the difference $A-B$ itself. Weak infeasibility can be a point of concern in numerical optimization since this property is difficult to detect. When the vector $\disp(A,B)$ is attained, meaning that $\disp(A,B)$ actually lies in $A-B$, we have for any points $a\in A$ and $b\in B$ the equivalence 
$$a-b=\disp(A,B)\qquad\Longleftrightarrow \quad a-b\in N_B(b) \quad \textrm{and}\quad b-a\in N_A(a).$$
Here $$N_A(a)=\{v: \langle v,x-a\rangle \leq 0 \textrm{ for all } x\in A\}$$ is the usual normal cone of convex analysis. See \cite[Facts 1.1]{Heinz_Set} for details.

This note revolves around the {\em method of alternating projections} for solving the feasibility problem \eqref{eqn:main_prob}. Given a current point $a_k\in A$, the method simply iterates the following two steps
\begin{align*}
&\textrm{choose}\quad b_{k}	\in \proj_B(a_k)\\
&\textrm{choose}\quad a_{k+1}\in \proj_A(b_{k}).
\end{align*}
In studying the convergence rate of the method, the following stability property of the intersection $A\cap B$ appears naturally. 
\begin{defn}[H\"{o}lder regularity]
{\rm
Consider two closed, convex subsets $A$ and $B$ of $\E$. We say that the pair $(A,B)$ {\em is} $\gamma${\em-H\"{o}lder regular} if for any compact set $U$, there is a constant $c> 0$ so that  
$$\dist(x,A\cap B)\leq c\cdot\Big(\dist^{\gamma}(x,A)+\dist^{\gamma}(x,B)\Big)\qquad \textrm{ for all }x\in U.$$
We say that $(A,B)$ {\em is} $\gamma${\em-H\"{o}lder regular, up to displacement,} if $\big(A-\disp(A,B),B\big)$ is $\gamma$-H\"{o}lder regular.
}
\end{defn}


\begin{noteH}
???why clearly???attained????can we add - ``by compactness assumption'' ???
\end{noteH}
Clearly when $(A,B)$ is $\gamma$-H\"{o}lder regular, the intersection $A\cap B$ must be nonempty.
Moreover, if the pair $(A,B)$ is $\gamma$-H\"{o}lder regular, up to displacement, the displacement vector $\disp(A,B)$ must be attained by some points $a\in A$ and $b\in B$.

Basic convergence guarantees (with no rate) of alternating projections appear in \cite{conv_alt}. 
\begin{noteH}
	should a reference to classical Von Neumann result be added????
\end{noteH}
Linear convergence under linear regularity is discussed in \cite{Heinz_Set,proj_surv,BBJAT1}.
A sublinear convergence rate of alternating projections under H\"{o}lder regularity was proved in \cite{BLY}, in part using techniques of \cite{proj_surv} and \cite[Lemmas 3 and 4]{GPR}. Since the result and its proof are somewhat scattered throughout the text \cite{BLY}, we provide a proof sketch of the salient points for the reader. 
\begin{thm}{\bf (Convergence rate of alternating projections)}\label{TheAPM:1}
Consider two closed convex sets $A$ and $B$ in $\E$, and let $\{a_k,b_k\}$ be a sequence of iterates generated by alternating projections.
 Then exactly one of the following two situations holds:
 \begin{itemize}
 \item[{\rm (1)}] The iterates $\{a_k\}$ and  $\{b_k\}$ are unbounded in norm, in which case the infimum $\inf\{|a-b|: a\in A, b\in B\}$ is not attained.
 \item[{\rm (2)}] There exist points $\bar{a}\in A$ and $\bar{b} \in B$ satisfying $a_k \to \bar{a}$ and $b_k \to \bar{b}$ and $\bar{a}-\bar{b}=\disp(A,B)$. 
\end{itemize}
In the second case, if the pair $(A, B)$ is $\gamma$-H\"{o}lder regular, up to displacement, then the sequence $\{a_k,b_k\}$ converges at the sublinear rate
\begin{equation}
	\label{eq:sublrate}
\max\{|a_k-\bar{a}|, |b_k-\bar{b}|\}=\mathcal{O}\Big(k^{-\frac{1}{2\gamma^{-1}-2}}\Big).
\end{equation}
Moreover, if the pair $(A, B)$ is linearly regular ($\gamma=1$), up to displacement,  then the convergence is linear.
 \end{thm}
\begin{proof} 
The fact that only the two claimed situations can hold is well-known; see for example \cite[Facts~1.2]{Heinz_Set}. Suppose now that the second alternative holds, and define $v:=\disp(A,B)=\bar{a}-\bar{b}$. Suppose also that the pair $(A-v, B)$ admits a $\gamma$-H\"{o}lderian error bound.
Define for convenience $A_v:=A-v$. Then
a short computation (see \cite[Middle of the proof of Theorem 4.10]{BLY}) shows
\begin{equation}\label{eq:11}
\di^2(b_{k}, A_v)\le \di^2(b_k,A_v\cap B)-\di^2(b_{k+1},A_v\cap B).
\end{equation}
Taking also into account that the pair $(A_v, B)$ is $\gamma$-H\"{o}lder regular, we deduce that there exists a constant $c$ so that
\begin{align}
c^{-2\gamma^{-1}}\cdot
\di^{2\gamma^{-1}}(b_{k}, A_v \cap B )& \le \di^2(b_k,A_v )\nonumber\\
 &\le \di^2(b_{k},A_v \cap B )-\di^2(b_{k+1},A_v \cap B ).\label{PTsSrt:E1}
\end{align}
Thus the constants $\beta_k:=\di^2(b_k,A_v \cap B )$ satisfy the recursion
\begin{align}
\beta_{k+1} \le \beta_k\big(1-\frac{1}{c^{2\gamma^{-1}}}\beta_k^{\gamma^{-1}-1}\big).\label{LreAl:1}
\end{align}
Then by \cite[Lemma 4.1]{BLY}, the constants $\beta_k$ satisfy $$\beta_k=\mathcal{O}\Big((\delta+k)^{-\frac{1}{\gamma^{-1}-1}}\Big),$$ 
for some $\delta$.
In the case $\gamma <1$, the additive term $\delta$ can clearly be set to zero. 
On the other hand, \cite[Example~3.2]{Heinz_Set} shows that $(B_k)_{k\in\NN}$ is Fej\'{e}r monotone with respect to
$A_v \cap B$, and therefore by the standard estimate \cite[Theorem~3.3(iv)]{Heinz_Set}, we have
\begin{align*}
|b_k-\bar{b}|\leq 2\di(b_k,A_v \cap B) =\mathcal{O}\Big(k^{-\frac{1}{2\gamma^{-1}-2}}\Big) .
\end{align*}
In the case $\gamma = 1$, inequality \eqref{LreAl:1} shows a geometric decay in $\beta_k$. Appealing to \cite[Theorem~3.3(iv)]{Heinz_Set} again, linear convergence follows.
\end{proof}


We next turn to the singularity degree of set intersections -- a term coined by Sturm \cite{sturm} and rooted in \cite{face_henry}. From now on, we will exclusively consider the problem 
\begin{equation}\label{eqn:main_con}
\textrm{ find some point}\quad x\in \mathcal{V}\cap\mathcal{K},
\end{equation}
where $\mathcal{V}$ is an affine subspace of $\E$ and $\mathcal{K}$ is a
closed convex cone. We will assume that this problem is feasible and that
$\mathcal{K}$ has a nonempty interior (for simplicity). We then say that the
{\em Slater condition} holds if $\mathcal{V}$ meets the interior of
$\mathcal{K}$. Whenever, the Slater condition fails, one would like to detect
this and to somehow regularize the problem.  
With this in mind, Borwein and Wolkowicz \cite{face_henry} introduced the following procedure, called {\em facial reduction}, to successively embed the problem \eqref{eqn:main_con} in a smaller dimensional space, relative to which the Slater condition does hold. Here, we give an easy conceptual overview; a rigorous numerical study of facial reduction appears in \cite{vris}. To this end, consider some representation $$\mathcal{V}=\{x:\mathcal{A}(x)=b\}$$ for some linear mapping $\mathcal{A}\colon\E\to\R^m$ and for some vector $b\in \R^m$. Then the first iteration of facial reduction consists of solving the auxiliary problem: find $y\in \R^m$ satisfying
\begin{equation}\label{eqn:aux}
0\neq \mathcal{A}^*y\in \mathcal{K}^* \quad \textrm{and}\quad \langle y,b\rangle =0,
\end{equation}
where $\mathcal{A}^*$ denotes the adjoint and $\mathcal{K}^*=\{z: \langle z,x\rangle \geq 0 \textrm{ for all }x\in K\}$ is the polar cone.
The auxiliary problem is feasible if and only if Slater fails. Supposing the latter, let $y$ solve the auxiliary problem. Then the entire feasible region $\mathcal{V}\cap\mathcal{K}$ is contained in the slice $\mathcal{K}\cap (\mathcal{A}^*y)^{\perp}$. We now replace $\mathcal{K}$ with $\mathcal{K}\cap (\mathcal{A}^*y)^{\perp}$ and $\E$ with the linear span of $\mathcal{K}\cap (\mathcal{A}^*y)^{\perp}$, and 
\begin{noteH}
	should we say something on how the definition of $\mathcal A$
	changes? is it just the restriction to the new span? I forget the
	details but I think there was some subtlety to defining the new
	restriction.
\end{noteH}
repeat the procedure. The minimal number of facial reduction iterations needed to end up with a problem satisfying Slater's condition is the {\em singularity degree of the pair} $(\mathcal{V},\mathcal{K})$. 

It will be convenient to extend the definition of singularity degree to situations where $\mathcal{V}$ and $\mathcal{K}$ may not intersect. To this end, when the displacement vector $\disp(\mathcal{V},\mathcal{K})$ is attained, the translated affine subspace $\mathcal{V}- \disp(\mathcal{V},\mathcal{K})$ and the cone $\mathcal{K}$ do intersect and we define the {\em singularity degree of} $(\mathcal{V},\mathcal{K})$, {\em with displacement}, to be
the singularity degree of the pair $\big(\mathcal{V}- \disp(\mathcal{V},\mathcal{K}),\mathcal{K}\big)$. When $\disp(\mathcal{V},\mathcal{K})$ is unattained, we say that the singularity degree of $(\mathcal{V},\mathcal{K})$, with displacement, is equal to $+\infty$.


We next turn to the semi-definite feasibility problem:
\begin{equation*}
\textrm{ find some matrix}\quad X\in \mathcal{V}\cap\mathcal{S}^n_+,
\end{equation*}
where $\mathcal{V}$ is an affine subspace of the Euclidean space of $n\times n$-symmetric matrices $\mathcal{S}^n$ and $\mathcal{S}^n_+$ is the convex cone of $n\times n$ positive semi-definite matrices.
We will always endow $\mathcal{S}^n$ with the trace inner product $\langle X,Y\rangle=\tra XY$ and the Frobenius norm $\|X\|=\sqrt{\langle X,X\rangle}$. 
In \cite{sturm}, Jos F. Sturm discovered a surprising connection between H\"{o}lder regularity and singularity degree in the semi-definite feasibility problem. 

\begin{thm}[Sturm's error bounds for SDP]
Given an affine subspace $\mathcal{V}$ of $\mathcal{S}^n$, the pair $(\mathcal{V},\mathcal{S}^n_+)$ is $\frac{1}{2^d}-$H\"{o}lder regular, with displacement, where $d$ is the singularity degree of $(\mathcal{V},\mathcal{S}^n_+)$, with displacement. 
\end{thm}


Combining Sturm's result with Theorem~\ref{TheAPM:1}, we immediately deduce the main contribution of this section.
\begin{thm}{\bf (Convergence rate of alternating projections for SDP)}
\label{TheAPM:1}
Given an affine subspace
$\mathcal{V}$ of $\mathcal{S}^n$, 
consider the semi-definite feasibility problem:
\begin{equation*}
\textrm{ find some matrix}\quad X\in \mathcal{V}\cap\mathcal{S}^n_+.
\end{equation*}
Letting $\{X_k,Y_k\}$ be the sequence of iterates generated by the method of alternating projections,
exactly one of the following two situations holds:
 \begin{itemize}
 \item[{\rm (1)}] The iterates $\{X_k\}$ and  $\{Y_k\}$ are unbounded in norm, in which case the displacement vector $\disp(\mathcal{V},\mathcal{S}^n_+)$ is not attained.
 \item[{\rm (2)}] There exist matrices $\bar{X}$ and $\bar{Y}$ satisfying $X_k \to \bar{X}$ and $Y_k \to \bar{Y}$, with $\bar{X}-\bar{Y}=\disp(\mathcal{V},\mathcal{S}^n_+)$. 
\end{itemize}
In the second case, the iterates $\{X_k,Y_k\}$ converge at a rate $\mathcal{O}\Big(k^{-\frac{1}{2^{d+1}-2}}\Big)$, where $d$ is the singularity degree of the pair $(\mathcal{V},\mathcal{S}^n_+)$, with displacement.
Moreover, if Slater's condition holds, then the convergence is linear.
 \end{thm}


\section{Typical singularity degree and convergence of alternating projections for SDP}\label{sec:typ}
Consider the feasibility problem
\begin{equation*}
\textrm{ find some point}\quad x\in \mathcal{K}\cap\{x\in\E:\mathcal{A}(x)=b\},
\end{equation*}
where $\mathcal{K}$ is a closed convex cone and $\mathcal{A}\colon\E\to\R^m$ is a linear mapping. It is well known that Slater's condition holds for ``typical'' parameters $(\mathcal{A},b)$ among all parameters $(\mathcal{A},b)$ for which the problem is feasible. For a discussion of various generic properties of such problems, see for example \cite{gen_pat,prep,gen_adrian}. In this brief section, in contrast, we consider the more realistic situation of when perturbations in parameters can yield an infeasible problem, with an eye towards the singularity degree.

We first note that displacement vector of the problem is typically attained. Indeed, this is a direct consequence of \cite{stab}. From now on, all references to a measure on $\E$ will refer specifically to the Lebesgue measure on $\E$.

\begin{prop}[Displacement vector is usually attained]\label{prop:morse}
Consider a closed, convex cone $\mathcal{K}\subset\E$ and a vector
$b\in\R^m$. Then for an open, full-measure set of linear transformations $\mathcal{A}\colon\E\to\R^m$,  the infimum $$\inf\{|x-y|: x\in \mathcal{K}\quad \textrm{and}\quad  \mathcal{A}(y)=b\}$$
is attained.
\end{prop}
\begin{proof}
Define $$\mathcal{L}_{\mathcal{A},b}=\{x\in\E: \mathcal{A}(x)=b\},$$
and $v:=\disp(\mathcal{L}_{\mathcal{A},b},\mathcal{K})$.
Then the set $\mathcal{L}_{\mathcal{A},b}-v-\mathcal{K}$ is closed if and only if $(\ker\mathcal{A})-\mathcal{K}$ is closed. A well-known theorem of Abrams (see for example \cite[Lemma 3.1]{Berman} or \cite[Lemma 17H]{homes}) states that the latter holds if and only if the image $\mathcal{A}(\mathcal{K})$ is closed.
On the other hand, in \cite{stab,stab1} the authors show that the image
$\mathcal{A}(\mathcal{S}^n_+)$ is closed for some open, full-measure set of
transformations $\mathcal{A}$. 
\end{proof}


Next, we observe that though we cannot typically expect Slater's condition to hold (or feasibility to hold for that matter), the singularity degree of the problem, with displacement, is usually at most one. To this end, consider a closed, convex cone $\mathcal{K}\in\E$ and
define the affine space $\mathcal{V}:=\{x\in\E: \mathcal{A}(x)=b\}$.
Then the ``strict complementarity'' condition:
$$\textrm{ there exist } x\in \mathcal{V}\cap\mathcal{K} \textrm{ and }y\in \R^m \textrm{ satisfying } 0\neq \mathcal{A}^*y\in \ri N_{\mathcal{K}}(x).$$
is sufficient for the singularity degree of $(\mathcal{V},\mathcal{K})$ to be one, provided that $\mathcal{K}$ is facially exposed (see \cite[Section 18]{con_ter} for the definition). Indeed, if such $x$ and $y$ exist, then standard convex analysis shows that $y$ solves the auxiliary problem \eqref{eqn:aux}, and moreover that 
 $\mathcal{K}\cap(\mathcal{A}^*y)^{\perp}$ is the minimal exposed face of $\mathcal{K}$ containing $x$ \cite[Theorem A.2]{ext_ineq}. Hence, in particular, $x$ lies in the relative interior of $\mathcal{K}\cap(\mathcal{A}^*y)^{\perp}$ and Slater condition will hold for the second iteration. We are now ready to prove the main result.


\begin{prop}[Singularity degree is typically at most one]\label{prop:sing}
Consider a closed, facially exposed, convex cone $\mathcal{K}\subset\E$ and a
vector $b\in\R^m$. Then for a dense set of linear transformations $\mathcal{A}\colon\E\to\R^m$, the feasibility system
\begin{equation*}
\mathcal{K}\cap\{x:\mathcal{A}(x)=b\}
\end{equation*}
has singularity degree,with displacement, of at most one.
\end{prop}
\begin{proof}
Define $$\mathcal{L}_{\mathcal{A},b}=\{x\in\E: \mathcal{A}(x)=b\},$$
and $v:=\disp(\mathcal{L}_{\mathcal{A},b},\mathcal{K})$. 
By Proposition~\ref{prop:morse}, we may assume that the displacement vector $v$ is attained, that is we may write $v=y-x$ for some $y\in \mathcal{L}_{\mathcal{A},b}$ and some $x\in\mathcal{K}$. Clearly we may suppose that the Slater condition fails, since otherwise $\mathcal{A}$ would certainly lie in the claimed dense set. Moreover, we can assume $v\neq 0$, since the complement of the set of matrices $\mathcal{A}$ for which the problem is feasible but Slater fails is a dense set.
  
Observe
that $v$ lies in $N_{\mathcal{K}}(x)\cap\range \mathcal{A}^*$. Consequently
if $v$ actually lies in $\ri N_{\mathcal{K}}(x)$, then the pair
$(\mathcal{L}_{\mathcal{A},b}-v, \mathcal{K})$ has singularity degree at most
one. Suppose this is not the case, that is we have $v\in\rb
N_{\mathcal{K}}(x)$. Choose then an arbitrary vector $w\in \ri
N_{\mathcal{K}}(x)$ and define an orthogonal transformation $U\colon\E\to\E$,
whose restriction to $\spann\{v,w\}$ is a rotation sending $v$ to $w$, and
whose restriction to  $\spann\{v,w\}^{\perp}$ coincides with the identity
mapping. Define the linear transformation $\widehat{\mathcal{A}}:=\mathcal{A}\circ U^T$ and a point $\hat{y}:=Uy$. Consider the perturbed system
\begin{equation}\label{eqn:extra}
\mathcal{K}\cap\{x:\widehat{\mathcal{A}}(x)=b\}.
\end{equation}
The following properties are then easy to verify:
$$\hat{y} \in \mathcal{L}_{\widehat{\mathcal{A}},b}, \quad Ux=x, \quad \range \widehat{\mathcal{A}}^* = U(\range\mathcal{A}^*).$$
Observe 
$$\hat{y}-x=U(y-x)=Uv=w.$$
Consequently, the inclusion $w\in \big(\ri N_{\mathcal{K}}(x)\big)\cap\range \widehat{\mathcal{A}}^*$ holds. We deduce that the pair $(\mathcal{L}_{\widehat{\mathcal{A}},b}, \mathcal{K})$ has singularity degree, with displacement, of at most one. Letting $w$ tend to $v$, the matrices $\widehat{\mathcal{A}}$ tend to ${\mathcal{A}}$, and the result follows.
\end{proof}

When $\mathcal{V}$ is an affine subspace and the cone $\mathcal{K}$ is
semi-algebraic (e.g.,~the positive semi-definite cone) -- meaning that it can
be written as a finite union of finitely many set, each defined by finitely
many polynomial inequalities --  basic quantifier elimination (see
\cite{Coste-semi}) shows that the set of transformation for which the singularity degree, with displacement, of the pair $(\mathcal{V},\mathcal{K})$ is at most one, is a semi-algebraic set. On the other hand, dense semi-algebraic sets necessarily contain an open full-measure set. 
Thus in this case (e.g.,~for $\mathcal{K}=\mathcal{S}^n_+$), the dense set of
transformations$\mathcal{A}$ in Proposition~\ref{prop:sing} is actually an open, full-measure set. In particular, the following typical behavior of alternating projections is now immediate.

\begin{cor}[Generic convergence of alternating projections in SDP]
For a full-measure set of linear transformations $\mathcal{A}\colon\mathcal{S}^n\to\R^m$, the semi-definite program (for any $b\in\R^m$):
\begin{equation*}
\mathcal{S}^n_+\cap\{X:\mathcal{A}(X)=b\}
\end{equation*}
has singularity degree, with displacement, of at most one, and consequently the iterates generated by alternating projections converge at the rate of $\mathcal{O}\Big(\frac{1}{\sqrt{k}}\Big)$.
\end{cor}


\section{Verifying Sturm's error bounds.} \label{sec:num}
\textcolor{red}{Section to be augmented............}


\begin{thm}[Distance to the intersection]
\label{thm:dist_int}
Consider two closed convex sets $A$ and $B$ in $\E$ and a building block $b\to a^+ \to b^{+}$. Then the inequality
$$\dist(a^{+};A\cap B)\geq \sin^{-1}(\theta)\cdot|\bp-\ap|$$
holds, where $\theta$ is the angle between the two vectors $b-\ap$ and $\bp-\ap$.
\end{thm}
\begin{proof}
Since by definition we have $\ap=P_A(b)$ and $\bp=P_B(\ap)$, the intersection $A\cap B$ lies in the polyhedron
$$L:=\{x\in\E: \langle x-\ap,b-\ap\rangle \leq 0 \quad\textrm{ and }\quad\langle x-\bp, \ap -\bp\rangle \leq 0 \}.$$
Let $x$ be the projection of $\ap$ onto $L$. Clearly we have the lower bound
$\dist(\ap; A\cap B)\geq |x-\ap|$. 
We may write $$\ap- x=\lambda (b-\ap) + \mu(\ap-\bp),$$
for some real numbers $\lambda,\mu \geq 0$. It is easy to verify $\lambda,\mu > 0$ and therefore 
$$\langle x-\ap,b-\ap\rangle = 0\quad\textrm{ and }\quad\langle x-\bp,\ap -\bp\rangle = 0.$$
Multiplying through by $b-\ap$, we obtain
$$0=\lambda |b-\ap|^2 + \mu \langle \ap-\bp, b-\ap \rangle$$
Similarly multiplying through by $\ap-\bp$, we obtain
$$|\ap-\bp|^2=\lambda \langle b-\ap, \ap-\bp\rangle+ \mu |\ap-\bp|^2$$
We deduce
$$\mu=\sin^{-2}(\theta)    \quad\textrm{ and }\quad  \lambda = \sin^{-2}(\theta)\cos(\theta)\frac{|\ap-\bp|}{|b-\ap|}.$$
Finally taking into account the equality $\langle \ap-\bp,b-\ap\rangle=-\frac{\lambda}{\mu}|b-\ap|^2$, we obtain
\begin{align*}
|x-\ap|^2&=\lambda^2 |b-\ap|^2+ \mu^2 |\ap-\bp|^2+2\lambda\mu\langle b-\ap,\ap-\bp\rangle\\
&= \lambda^2 |b-\ap|^2 + \mu^2 |\ap-\bp|^2-2\lambda^2|b-\ap|^2\\
&= \mu^2 |\ap-\bp|^2 -\lambda^2 |b-\ap|^2\\
&= \sin^{-2}(\theta) |\ap-\bp|^2. 
\end{align*}
The result follows.
\end{proof}

\subsection{Figures}

\subsubsection{Random instances}
For each figure we generate a random $b\in \R^m$. We then obtain
$40$ random instances by generating random $A_i\in \mathcal{S}^n, 
i=1,\ldots,m$. This gives us our random linear manifold 
$\mathcal{L}:=\{P\in \mathcal{S}^n: \mathcal{A}P=b\}$. 
Figures \ref{fig:normplots40}, \ref{fig:sineplots40}, \ref{fig:infeasplots40} 
use the same $40$ random instances.

For each random instance, we start with a random starting point $P_0$ and
\emph{accurately} attempt to find a point in $\mathcal{L}\cap
\mathcal{S}_+^n$. This results in limit points $P_{psd} \succeq 0$ and
$P_\ell \in \mathcal{L}$ which yields the displacement vector
$D:=P_{psd}-P_\ell$.


In Figure \ref{fig:normplots40}, page \pageref{fig:normplots40}, 
we look at the rates of convergence for feasible instances obtained
after shifting $b\leftarrow \mathcal{A} P_{psd}$. The sequence
for each of the $40$ instances that we plot is
\[
\sqrt{k}\|P_k-P_{psd}\|,
\]
where $P_{psd}\succeq 0$ is the point for
each instance found when solving for the displacement vector.
These sequences are bounded which empirically verifies the worst case sublinear
$\mathcal{O}\left(\frac{1}{\sqrt{k}}\right)$ convergence rate.
Note that the shift implies that the Slater
condition fails here which accounts for the slower 
\underline{sub}linear convergence rate.

Figure \ref{fig:sineplots40}, page \pageref{fig:sineplots40}, similarly
shifts using $b\leftarrow \mathcal{A} P_{psd}$ to obtain feasibility.
we look at the rate of convergence
using Theorem \ref{thm:dist_int} in combination with Theorem \ref{TheAPM:1}. 
Here $P_k,L_k$ are the $k$-th iterates for the projections onto the
semidefinite cone and linear manifold, respectively, and $\theta_k$ is
the angle between the vectors $P_k-L_k,P_{k+1}-L_k$. Note that
\begin{equation}
\label{eq:sinratio}
\begin{array}{rcl}
\frac {\|P_k-L_k\| }{\sin(\theta_k)}
&\leq & 
\dist(P_k; \mathcal{S}_+^n\cap \mathcal{L}) 
\\&\leq&   C \dist^\gamma(P_k;\mathcal{S}_+^n)+
        C \dist^\gamma(P_k;\mathcal{L})
\\&=& C\|P_k - L_k\|^\gamma.
\end{array}
\end{equation}
Therefore, for each instance we look at the sequence
\[
\frac {\|P_{k+1}-L_k\|^{1-\gamma}}{\sin(\theta_k)}, \gamma=\frac 12.
\]
We see that this result is \emph{tight} in the sense
that the ratio stays bounded above and also bounded from below but above zero.

Figure \ref{fig:infeasplots40}, page \pageref{fig:infeasplots40}, 
looks at the norm rate of convergence
\emph{before shifting for feasibility}. We see that the rate is consistently
better than that obtained after the shift in Figure \ref{fig:normplots40}.


\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{normplotsfig40.pdf}
\caption{After feasibility shift $b \leftarrow {\mathcal A} P_{psd}$; 
$40$ random instances}
	\label{fig:normplots40}
\end{figure}
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{sineplotsfig40.pdf}
\caption{After feasibility shift $b \leftarrow {\mathcal A} P$; 
$40$ random instances}
	\label{fig:sineplots40}
\end{figure}
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{infeasplotsfig40.pdf}
\caption{\underline{Before} feasibility shift $b \leftarrow {\mathcal A} P$; 
$40$ random instances}
	\label{fig:infeasplots40}
\end{figure}

\subsubsection{Instances with specific degree of singularity}
We start with the \emph{worst case} instance for the
dual form $\mathcal{A}^*y\preceq C, C=0$, presented in \cite[Pg ?]{MR2724357}.
To form $\mathcal{A}$, let
\[
A_1=e_1e_1^T,
A_2=e_1e_2^T+e_2e_1^T,\quad
A_i=e_1e_1^T+ e_1e_i^T+ e_ie_1^T, i=3,\ldots,n_d \in \mathcal{S}^{n_d}.
\]
The feasibility problem $\mathcal{A}^* y \preceq 0$ 
has degree of singularity $d=n_d-1$.
We form an equivalent primal form problem as follows.
We form linear independent matrices $A_i, i=n+1,\ldots, n(n+1)/2$ that
are orthogonal to $\{A_i, i=1,\ldots,n\}$. 
We then choose random $B_i\in \mathcal{S}^{n_t}, i=n+1,\ldots, n(n+1)/2$
and set $A_i \leftarrow \blkdiag(A_i,B_i)$, where $\blkdiag$ forms the
block diagonal matrix from its arguments. we now set $n=n_d+t_n$ and,
for each instance we let $Q$ be a random $n\times n$ orthogonal matrix.
The equivalent feasibility problem is
\[
\mathcal{A}X=0, X\succeq 0,
\]
where $\mathcal{A}$ is formed using $QA_iQ^T, i=n+1,\ldots,n(n+1)/2$.

In Figures \ref{fig:degnormplots40}, page 
\pageref{fig:degnormplots40} and
\ref{fig:degsineplots40}, page \pageref{fig:degsineplots40} we present
plots for the degenerate cases where Slater's condition fails and the
degree of singularity $d=n-1$, i.e.,~this is the \emph{worst case}. 
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degnormplotsfig40.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}, d=n-1$ for degenerate case}
	\label{fig:degnormplots40}
\end{figure}
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degsineplotsfig40.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}, d=n-1$ for degenerate case}
	\label{fig:degsineplots40}
\end{figure}

\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degnormplotsfig40d8.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}$ for degenerate case}
	\label{fig:degnormplots40}
\end{figure}
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degsineplotsfig40d8.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}$ for degenerate case}
	\label{fig:degsineplots40}
\end{figure}

\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degnormplotsfig40d7.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}$ for degenerate case}
	\label{fig:degnormplots40}
\end{figure}
\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degsineplotsfig40d7.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}$ for degenerate case}
	\label{fig:degsineplots40}
\end{figure}

\begin{figure}[ht!]
\centering
\includegraphics[width=.8\textwidth]{degsineplotsfig40d4.pdf}
\caption{$\gamma = \frac 1{2^{d+1}-2}$ for degenerate case}
	\label{fig:degsineplots40}
\end{figure}
\subsection{????suggested numerics????}
2 sets of pics with 2 pics each

\begin{enumerate}
\item
first set:  one picture with sine ratios and one pic with convergence
rate norms for many instances
side by side ... increase number of tests to see best picture 


\item
second set:   1 pic with the convergence rate using norms ... to see
increased speed ...  without shifting ... 
\item
number of facial reductions??? generate instance with predefined
degeneracy degree    $k^{\frac 1{2^{d+1}-2}}$
e.g. from  1,2,3,4,...
\end{enumerate}

\section*{Acknowledgements}
We are grateful to Levent Tun\c{c}el for suggesting including the contents of Section~\ref{sec:num}.

\bibliography{bibliography}
\bibliographystyle{plain}

\end{document}