\documentclass[psfig,11pt]{article}
\usepackage{epsfig}
\usepackage{hyperref}
\setlength{\oddsidemargin}{-0.275in}
\setlength{\evensidemargin}{-0.275in}
\setlength{\topmargin}{-0.25in} \setlength{\textwidth}{7.0in}
\textheight9.25in
\begin{document}
\bibliographystyle{plain}

\title{Strategic Project Grants (SPG) on:\\
 Semidefinite Programming and other Large Scale
Convex Programming Techniques to Solve VLSI Circuit Layout
Problems}

\author{
Prof. Terlaky, Tamas   (9319)  \and
       Prof. Vannelli, Anthony  (15296)  \and
       Prof: Wolkowicz, Henry  (11131)
}

\maketitle


\section*{Type of grant: \large{\bf Information and Communications
Technologies}}


\section*{Budget Justification}

\subsection*{Salaries and benefits:}
\begin{itemize}
\item Support of seven graduate students yearly.
      Three with full support ({\$}45000.- per year) and four with partial
      support ({\$}20000.- per year) from this budget.
\item Major support of two postdoctoral fellows at the rate of
      {\$}27000.- per year  \\
      (including fringe benefits).
\item Technical assistance is needed to purchase on a regular base
      from McMaster's
      Computer and Information Services (CIS) to maintain
      the IBM-RS6000 workstation cluster, install new software
      and additional hardware.
\end{itemize}

\subsection*{Equipment or facility:}
\begin{itemize}
\item Maintenance contracts for computer equipment,
file backups, etc...
\end{itemize}


\subsection*{Materials and supplies:}
\begin{itemize}
\item Purchase, upgrade and yearly fee of computer software such as:
      MATLAB, MAPLE, GAMS, AIMMS, CONOPT, MINOS, CPLEX,
      XPRESSMP etc.
\item Office supplies for the Advanced Optimization group including:
      printer paper, transparency, diskettes, cables etc.
\end{itemize}

\subsection*{Travel:}
\begin{itemize}
\item Three conference trips per year, to conferences like:
      ISMP, SIAM, INFORMS, EURO, IFIP, ACM etc.
\item Four conference trips per year for the
      postdoctoral fellows to conferences as mentioned above.
\item Three conference trips per year of the graduate students
      to conferences as mentioned above.
\item Trips to IBM's Watson Research Laboratory and to visit
      colleagues to exchange ideas about project related research and
      implementation issues.
\end{itemize}

\subsection*{Dissemination costs:}
Cooperation with the Fields Institute in Toronto. Workshop and conference costs
for MOPTA held at McMaster University.


\newpage
\part*{Free Form Proposal Description}
\section{Synopsis}
The last 10 years have seen tremendous improvement in need to
solve larger combinatorial problems that are inherent in
circuit/computer chip fabrication. This has coincided with a
revolution in the theory behind optimization and our ability to
solve large scale optimization problems.

The three applicants have been intricately involved in these areas of
research and are perfectly suited to apply these new techniques to
further improve on
the solution of the combinatorial
aspects of manufacturing and VLSI circuit layout problems.


\section{Background}
One of the most important problems in information technology is that of
increasing computer speed. This involves obtaining better, more
efficient, chips and doing this quickly. The techniques for this are
????

\section{Proposed Research}
\subsection{Introduction and Overview}
The aim of this proposal is the development and implementation of
software for solving problems related to VLSI design.

The solution techniques we use come from convex programming relaxations
of hard combinatorial problems in combination with heuristics for final
placements.

\subsection{Related Research}
The three applicants have worked in areas directly and indirectly
related to VLSI and other large scale optimization problems.

\section{Benefits to Canada}
The designers in the Microelectronics Industry are always looking for
faster and more efficient CAD layout packages to deal with the denser,
smaller and faster chips hat are required to meet customer needs.
Our proposed advanced large-scale optimization approaches
will allow for ``near optimal'' placements
to aid in the development of CAD packages that designers can
totally rely on in making circuit or systems related decisions.

In addition, this project will open up direct communication between
the Universities  of Waterloo and McMaster. We hope that this
will result in a long term interaction between the two
universities and, thus, benefit both.

\section{Team Expertise}
The three applicants have expertise in optimization with applications to
VLSI problems. References can be found in Section A of form 100
and/or by using the CURLS of the respective co-investigators.
In the last 3 years:\\
Tamas Terlaky has ???\\
Tony Vannelli has ???\\
Henry Wolkowicz has been involved in several projects dealing with SDP
relaxations of hard combinatorial problems;  a few are
\cite{PardWolk:96,PardWolk:97,OvWo:96,SaVaWo:97,AnWo:00,Wolknonlinassign:99}
and the references therein. In particular, this includes numerical tests
with SDP relaxations for the quadratic assignment problem, graph
partitioning problem, and max-cut problem.
\section{Training}
We expect to have two postdocs and two PhD graduate students directly
involved in this project. They will learn both the optimization
techniques as well as the VLSI background for this project.
\section{Intellectual Property}
\section{Synopsis}
\appendix


OLDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD\\
\begin{center}
{\bf Integrating Mathematical Programming and Advanced Search
Approaches
\\ to Solve VLSI Circuit Layout Problems}
\end{center}

\noindent \textbf{NOTE:} \textit{Papers that are alpha-numeric are
found in Section A of Form 100. Explanations of Research
Contributions and Contributions to the Training of Highly
Qualified Personnel are described in Sections B and C of the Form
100 also. Other references are cited in text because of limited
space problems.}

In the last six years, I (along with my students and colleagues)
have looked at two major overlapping research areas: (i)  {\it
Advanced Techniques in Graph Partitioning and Applications} and
(ii)  {\it Interior Point Methods for Solving VLSI Circuit Layout
Problems}.

During this period, we have been able to formulate, develop and
integrate the underlying mathematical models and optimization
techniques that represent the design problems in several
engineering research areas. A large part of the investigation into
the \textit{related areas} of VLSI circuit layout has allowed us
to develop the underlying heuristics to solve the resulting
NP-hard problems. Moreover, we can assess the quality of
the solutions generated by these heuristic techniques; that is, \textit{%
bounds} on the global optimal solutions have been developed.

This proposal will review the research areas, modeling and
solution techniques that have been developed to solve the outlined
problems. We outline new directions that we plan to pursue in the
next four years in the areas of interior point research and
applications, advanced search techniques and their \textit{joint
application} to VLSI circuit layout.

\begin{center}
\textbf{RESEARCH PROBLEM AREAS, MODELS AND \\[0pt] REVIEW OF
RESEARCH PROGRESS}
\end{center}

\noindent \textbf{A. Advanced Techniques in Graph Partitioning and
Applications}

 In the last decade the technology for manufacturing
integrated circuits (ICs) has seen an explosive growth. Modern
integrated circuits are not only ubiquitous but typically contain
millions of transistors. Due to the inherent complexity of
integrated circuits, the design process is extremely
time--consuming, hence developing algorithms to automate the
process is of paramount importance. Furthermore, the
interconnection of circuit modules is the single most important
factor in performance criteria such as signal delay, power
dissipation and circuit size. The interconnection or {\it circuit
layout} problem is modelled as a sequence of (very large) discrete
optimization problems with the objective function, the overall
wirelength.

The circuit layout problem is further subdivided into three
subproblems: partitioning, placement, and routing. Each one of
these subproblems is briefly discussed below.

The {\it partitioning} subproblem is used to divide the circuit
into smaller, related sub-circuits and eventually into elementary
functional switching elements called {\it cells}. These circuits
can be designed and implemented as separate circuits and connected
at a later stage of the circuit layout process.

The next stage of the circuit layout step, the cell {\it
placement} subproblem, is applied to each partition to determine
positions of the cells while minimizing an estimate of the total
wirelength. This problem has been traditionally formulated as an
optimization problem, with either linear or quadratic functions.

In the last stage of the VLSI layout, the {\it routing} subproblem
is solved to determine the geometric layout of the wires that
connect the cells, called the {\it nets}, according to the
required interconnection defined by netlist. The routing
subproblem is typically divided into two additional subproblems:
{\it global routing} and {\it detailed routing}. In global
routing, the approximate course of the wires is determined using a
simplified description of the netlist. The solution from the
global routing problem is then used in the detailed routing
problem to determine the exact course of the wires in the channels
\textit{Cadence} [Sherwani, \textit{Algorithms for VLSI Design
Automation}, Kluwer, 1993]
%\cite{sherwani95}.

Circuit layout is an NP-hard problem, therefore, the focus of much
research has been to develop heuristic solution methodologies
which yield near--optimal solutions to the problem with reasonable
computational effort. Optimization techniques such as simulated
annealing, tabu search and interior point methods have been used
to solve the partitioning, placement and routing problems. These
techniques have the disadvantages of being computationally
expensive and tend to become trapped in local minima.


We have developed several interacting approaches to model and solve \textbf{%
all three} layout problems. These approaches have lead to good
initial prototype designs that contain short wirelengths and are
placed in a small area. We summarize the research results in the
partitioning area in this section. Our research on the placement
problem is described in the next section.

The decomposition of large-scale systems into several subsystems
is increasingly becoming an important engineering design tool. One
uses
decomposition to find strongly connected subsystems that have the \textit{%
weakest} interaction between them. Such decompositions are useful
because large-scale system analysis can then be performed at the
subsystem level.

The partitioning of a graph or hypergraph into $k$ blocks ($k \geq
2$)
containing no more than $u$ nodes is known to be NP-hard [Preas, \textit{%
Physical Design Automation of VLSI Systems}, Benjamin Publishing
Co., 1988]. The work in this area has led to the development of
three major approaches. First, an \textit{eigenvector} approach is
used to develop fast techniques \textit{O(k n (log n))} (where
$k<<n$ and $n$ is the number of graph or hypergraph nodes) [J3,
J6,J10,C10]. The second contribution is in the development and use
of \textit{advanced node interchange} approaches to find near
optimal solutions. Tabu Search [Glover, \textit{ORSA J. on
Computing}, Vol.1 pp. 190-206 and Vol. 2, pp. 4-32, 1990] combined
with Genetic Algorithms~[Venkatasubramanian and Androulakis,
\textit{Computers in Chemical Eng.} Vol. 15, pp. 217-228, 1991]
have led to the best variants and results for this very difficult
problem [J1,J5,J7,C10].

The Genetic Algorithm has the effect of defining ``general zones''
that are likely to lead to good partitions; the Tabu Search is a
``detailed optimizer'' to find the best partitions in these zones.
Few parameters have to be adjusted to obtain fast convergence
[J7,C10,J1]. Other hybrid node interchange techniques including a
very fast GRASP approach [Feo and Resende, \textit{Operations
Research Letters}, Vol. 8, pp. 67-71, 1989] have been combined
with node clustering to reduce the search regions for the Tabu
Search [J5]. In the last work [J5], the majority of partitions for
circuit
layout benchmark problems using a Genetic Algorithm/Tabu Search are \textbf{%
provably optimal} or within 3\% of optimality using a commercial
mixed integer programming solver CPLEX.

Finally, an interesting VLSI layout problem that requires
assigning wire segments in two layers to minimize the number of
vias (a switch point between the two layers) and wirelengths was
attacked by a new graph theoretic concept called a "signed
hypergraph". An extension of a simple partitioning technique led
to provably near optimal solutions that were within 3\% of
optimality [J3]. One well known VLSI CAD vendor, Cadence has
extended the technique to 20 layer via minimization problems.


These partitioning techniques have been specialized to yield
minimal interconnections for a
critical \textit{power engineering distribution problem} [J6]. The \textit{%
optimal partitions} that are obtained allow utility planners to
allocate future power needs within and between sectors that were
not believed to share load.

\noindent \textbf{B. Interior Point and Convex Optimization for
Solving VLSI Circuit Layout Problems - Theory and Applications}

The design of many fundamental network and systems in engineering
can be modeled as linear or nonlinear or combinatorial
optimization problems. This is certainly the case for many of the
circuit layout, filter, communication system and computer
architecture design problems that arise in electrical engineering
design practice. Very tight performance specifications for these
problems demand \textit{near optimal designs} subject to many
thousands of constraints and parameters (variables). Many
optimization problems that arise in engineering design can be
solved by \textit{interior point (IP)} techniques. The provably
\textit{polynomial time} feature of interior point methods allow
designers to find the optimal solution in a fraction of the time
(in many cases) than \textit{Simplex-based} approaches.

This research led to contributions in three engineering
applications areas. We have shown that placement problems in VLSI
design can be reasonably approximated by solving large-scale LP
and QP models~[J2,B2,S2,C1,C4,C5,C9]. These models were extended
to solve two different \textit{power systems} problems are also
solved with this powerful and flexible approach~[J8,C12,J9].

Just recently, the theory for modelling placement problems was
changed by the discovery of an elegant class of simple convex
optimization models for the placement problem. Classical relative
placement results are QP-based approaches that determine the
relative placement of modules in a two-dimensional area. The
problem with these approaches is that the modules usually overlap
in the centre of the placement area. Researchers have attempted to
{\it repel} modules away from each other while minimizing
wirelengths with limited success. The work described in [S1] (see
enclosed paper) is significant since it describes a novel {\it
convex programming} model that forces modules away from each
other. The class of convex functions are in the shape of a
``teacup with a flat bottom". A line search technique is used to
force solutions to the edge of the ``teacup" which contains {\it
nonoverlapping modules}. The approach is called {\it
Attractor-Repellor Method} and uses a second stage where attractor
modules are seeded into sparse areas to draw modules into empty
spaces. The results are outstanding, beating the best know
simulated annealing results by 3-8\% in a fraction of the time.
This work is the culmination of work described in [C1,B1].

In addition, we have made contributions to the development and
practical implementation of both the \textit{continuous} and \textit{discrete%
} interior point optimizers. We have developed a fundamental
\textit{preconditioned conjugate gradient} (iterative techniques)
[S1,C2,B2] to solve indefinite systems that arise in many of the
newer primal-dual interior point techniques [Vanderbei and
Carpenter, \textit{Math. Prog.}, Vol. 58, pp. 1-32, 1993].
\begin{equation}
\begin{array}{llll}
Min\  & \mathbf{c}^{T}\mathbf{x} & \ \ \ \ \ \ \ \ \ Max\  & \mathbf{b}^{T}%
\mathbf{y} \\
s.t. & \mathbf{Ax}=\mathbf{b} & \Longleftrightarrow \ \ \ \ \ \
s.t. &
\mathbf{A}^{T}\mathbf{y}+\mathbf{z}=\mathbf{c} \\
& \mathbf{x}\geq \mathbf{0} & \ \ \ \ \ \ \ \ \  & \mathbf{z}\geq
\mathbf{0}
\end{array}
\label{basic2}
\end{equation}
This work has focused on \textit{ordering} issues that arise from
the factorization of the first-order conditions for optimality
\begin{equation}
\left[
\begin{array}{cc}
-\mathbf{Z}\mathbf{X}^{-1} & \mathbf{A}^{T} \\
\mathbf{A} & \mathbf{0}
\end{array}
\right] \left[
\begin{array}{c}
\mathbf{\Delta x} \\
\mathbf{\Delta y}
\end{array}
\right] =\left[
\begin{array}{c}
\sigma -\mathbf{X}^{-1}\phi \\
\rho
\end{array}
\right]  \label{indefinite}
\end{equation}
where \textbf{X} and \textbf{Z} are the diagonal matrices
containing the components of \textbf{x} and \textbf{z}
respectively.

An important fundamental result in [S1] is that if the incomplete $\ell $%
-level factorization ($\ell \geq 0$) of the normal equations $\mathbf{AX}%
\mathbf{Z}^{-1}\mathbf{A}^{T}$ exists, then so does the incomplete $\ell $%
-level factorization of the system~(\ref{indefinite}). Our present
strategy, re-orders the rows and columns in ~(\ref{indefinite}) to
reduce fill so that a higher level preconditioner can be used
(other than 0). This work was recognized by a {\bf Best Paper
award at 5th SIAM Conference on Applied Linear Algebra} [C13].

\begin{center}
\textbf{SUMMARY OF PROPOSED RESEARCH}
\end{center}

The work initiated and developed during the past six years has
focused on using (i) interior point and (ii) advanced search
techniques to solve ``specific'' manufacturing and VLSI circuit
layout problems. Both areas of research have been quite fruitful
on their own. However, the inherent ``real world'' combinatorial
aspects of manufacturing and VLSI circuit layout problems are far
too complex for any single optimization technique to entirely
solve the problem on its own. We highlight the research that we
wish to continue in these areas to move towards integration of
these respective continuous and discrete optimizers on relevant
VLSI layouts.

The entire design cycle may be viewed as transformations of
representation in various steps. For example, the layout is
iteratively improved to meet timing specifications of the system.
Another example is to detect design rule violations during design
verification. If such violations are detected, the physical design
step needs to be repeated to \textit{correct} the error. Thus one
of the objectives of VLSI CAD tools is to minimize the number of
iterations and thus reduce the time-to-market. This is usually
accomplished by having efficient heuristics, that can produce near
optimal solutions in reasonable amounts of time.

\noindent \textbf{A. Exploration of Advanced Search Techniques in
Layout}

The two industrial VLSI CAD tools available today such as (i) \textit{%
TimberWolf} [Sechen and Sangiovanni-Vincentelli, \textit{Proc.
13th Design Automation Conference}, pp. 408-416, 1976] and (ii)
\textit{Cadence} [Sherwani, \textit{Algorithms for VLSI Design
Automation}, Kluwer, 1993] rely on the \textit{single} technique
of \textit{Simulated Annealing} [Kirkpatrick \textit{et al},
\textit{Science}, Vol. 220, pp. 671-680, 1983]. Simulated
Annealing often gives excellent solutions but converges slowly to
local optima due to a searching mechanism that may repeatedly
explore the same areas of the feasible solution region. The lack
of memory hurts Simulated Annealing as compared to Tabu Search or
Genetic Algorithms which keep track of their moves. Other
practical heuristics are based on the simple node interchange
suggested by Kernighan and Lin [Kernighan and Lin, \textit{Bell
System Journal}, Vol. 49, pp. 291-307, 1970]. These variants work
in \textit{O (m log (n))} time and space but \textit{often get
easily trapped} in local optima far from the global optimum.

The work initiated in [J1,J2,J3, J5,J7,C4, C10] has led to
efficient approaches for solving partitioning and module location
problems that arise in VLSI layout. The newer works described in
find a balance between the good combinatorial problem solutions
generated by simulated annealing (but with poor speed) and the
simple node interchange approaches that yield ``variable'' local
solutions in quick time. The advanced search techniques that we
plan to follow in this work possess all the desirable features of
a good search technique. They use GRASP and Genetic Algorithms to
find good initial solutions. It adjusts short and long term memory
and aspiration levels in Tabu Search [see enclosed paper, J1] to
perform a fast local neighbourhood search that does not explore
previous fruitless paths over and over like simulated annealing.

The important role of exploring the search space is accomplished
by the Genetic Algorithm which processes information sent back to
it from the Tabu Search heuristic. Thus, new regions are then
established that are likely to contain good local optima. This
approach achieved the required balance between speed and quality
of solution that good heuristics must achieve. The work in [J7]
shows that the combination of Genetic Algorithm/Tabu Search
achieves speed ups of \textit{10-50 times} while generated 2-block
partitions are \textit{5-10\% better} than simulated annealing.
These results were obtained on benchmark circuits up to 6000
nodes.

Finally, we plan to utilize \textit{clustering} [J5] to tackle
partitioning, placement and routing problems containing more than
100,000 nodes. Clustering allows us to keep the features of the
original large problem while reducing the search space for the
promising search techniques described earlier. Our clustering is
based on \textit{binding functions} which allow multiple nodes to
behave as one. In initial testing described in~[S4], the number of
cuts was reduced between 15-40\% while running time was reduced by
as much as a third. We will be extending these promising
approaches to both global and detailed routing.

\noindent \textbf{B. Interior Point Methods for Solving VLSI
Placement and Routing Problems}

During the next three years, we plan to continue our research on
the fundamental placement and routing problems that we analyzed
with interior point methods ~[J2,J3,B1,B2,C7,C5,C9,S2]. We plan to
conduct this research on two interrelated fronts. First, we will
investigate and compare several linear programming (LP) and
quadratic programming (QP) models for both the placement
[J2,S2,B1] and routing problems[J3,J4]. Second, we will continue
to develop and refine our \textit{iterative solvers}
[J2,B2,S2,C5,C9] for solving the expected huge indefinite systems
of equations ($>10^5$ constraints and variables) that evolve from
a numerically stable primal-dual interior point methods such as
the one due to Vanderbei and Carpenter, ~[Math. Prog., Vol. 58,
pp. 1-32, 1993]. We feel that this work will be productive by
focusing on \textit{ordering} issues that arise from the
factorization of the first-order conditions for optimality.

Testing on several partitioning and placement problems reported in
[J2,S2] show the promise on these iterative approaches on large
indefinite systems versus solving normal equations (CPLEX Barrier)
or indefinite
systems solved by \textit{direct} solvers (LOQO)~[\textit{LOQO User's Manual}%
, Tech. Report SOR-92-5, Dept. of Civil Eng. and OR, Princeton,
1992]. On a typical large-scale partitioning problem of 3000
nodes, our iterative approach using a second order level of fill
[S3, p.22] was 30 times faster than LOQO; 150 times faster than
CPLEX (Simplex) and CPLEX (Barrier) did not converge.

Finally, the power of the convex programming
model-Attractor-Repeller Method should allow us to further extend
the fundamental work in [S1,B1,C1] to the more difficult
performance driven placement and routing problems. Here we plan to
use the {\it linear formulations} of {\it delays} (the most
critical aspect) as presented in [J3] so that we can extend the
convex programming technique to this problem. We believe that this
avenue will be especially fruitful given the newer interior point
optimization convex optimization codes that can attack the
resulting models.

{\bf C. Semidefinite Programming Relaxations} \nopagebreak

Combinatorial optimization problems can be modelled as (nonconvex)
quadratically constrained quadratic programs. The Lagrangian dual
of these models result in semidefinite programs, i.e. linear
programs where the variables are matrices and the nonnegativity
constraint is with respect to the positive semidefinite partial
order, \cite{HePoReWo:95,PoReWo:94}. These relaxations have been
shown to provide very tight bounds for many simple combinatorial
optimization problems, e.g. \cite{HeReVaWo:93}. In particular,
they have been successful for equipartitioning of graphs, e.g.
\cite{KaReWoZh:94,Kar:95}.

The semidefinite programming relaxations are solved using
primal-dual interior-point methods, e.g. \cite{HeReVaWo:93}.
However, the linear systems, that need to be solved to find the
Newton direction, can be very large and nonsparse if handled
incorrectly. If handled correctly, conjugate gradient type methods
can be used. \noindent


\noindent \textbf{C. Towards Integration}

A major direction of our research over the next four years is
aimed at the integration of interior point methods which are used
to find relative placements (possibly overlapping) and global
routings with the advanced search techniques which are used to
find non overlapping placements and final detailed routings. An
initial focus will be on the generation of a VLSI circuit layout
for \textit{standard cell} technologies. In a standard cell
technology, one lays out the modules that are of a fixed height
but varying length on a fixed number of rows. The layout is
considered to good if a number of practical factors are optimized
such as chip area, wirelength and timing/delay problems.

The following ``real'' 800 node example (from [J2]) shows the
effect of using (i) iterative IP methods to find initial
overlapping placement (\textit{left-most figure}) (ii) a sequence
of advanced search techniques to partition the chip and reduce
overlap (\textit{two middle figures}) and (iii) a final mapping to
the rows to obtain a legal VLSI circuit layout (\textit{right-most
figure}). \newline 
???????4 figures go here?????
%\centerline{\hbox{
%\psfig{figure=pic1.ps,height=1.2in}
%\psfig{figure=pic2.ps,height=1.2in}
%\psfig{figure=pic3.ps,height=1.2in}
%\psfig{figure=pic4.ps,height=1.2in} }} 
You can see the effect that
the optimizers have in driving the modules from the obvious
useless solution of ``being on top of each other'' in the center
of the chip (no wirelength) to a complete ``non-overlapping'' and
legal placement at the far right. The resulting layout for this
example has wirelength 5\% shorter and area 10\% smaller than a
similar result generated by \textit{TimberWolf} (a competitive
academic CAD package). However, it has larger circuit timing and
delay problems. Along with the pursuit of the individual
optimization problems described in parts A and B of this section,
we wish to explore adding the ``important feedback'' link to
reduce these other objectives. We then plan to integrate our
optimization-based routines into packages such such as
\textit{TimberWolf} or \textit{Cadence} to compare (i.e. benchmark
on large circuits containing $>10^5$ modules and wires) the
generated chips performance with those that are \textbf{not}
generated by this approach.


\noindent

\bibliography{.master,.psd,.publs,.edm,.qap}
\end{document}