
Jon Bentley

I've spent most of my adult life making programs better: easier to use, more powerful, smaller, and cheaper. The most satisfying improvements tend to be the most quantifiable; it feels great to replace thousands of lines of code with dozens of lines. And the most quantifiable improvement of all is a program speedup. After you spend hours or days working on your program, you can state precisely how much faster it runs.

This column describes some of the largest speedups I have seen. We'll start with a simple program for an important problem in network design. We'll then apply four concrete speedups, each involving just a few lines of code. The result is a speedup factor of about 30,000 for typical problem sizes. Along the way, we'll see that hardware changes over the past decade or so give us another speedup factor of about 1,000. And finally, we'll go hog-wild and see speedups of, well, billions and billions.

The Problem

You face this problem when you schedule the half-dozen errands you need to run on a Saturday morning: proceeding from task to task to task with the least hassle. Mathematicians call it the "Traveling Salesman Problem": find the shortest tour that visits n cities.

This elegant problem has fascinated mathematicians for more than a century, yet it also has numerous practical applications. As this column went to press, the Web site http://www.fgi.net/LincolnLegalPapers/natsig.htm showed the tour Abraham Lincoln rode in 1850 through 14 county seats on the Eighth Judicial Circuit in Illinois. SONET rings (and any other rings, for that matter) can be viewed as Traveling Salesman tours. Ethernets without hubs call for finding good (but not necessarily optimal) Traveling Salesman paths. (A tour of n cities has n edges, while the path uses only n-1 edges by not returning to the start city.) Geometric Traveling Salesman algorithms are used to schedule mechanical devices such as drills, plotters, and solderers. Here is the shortest tour through nine points in the plane.

[Diagram: the shortest tour through nine points in the plane, numbered 0 through 8]

Our programs will assume that the n cities are numbered 0 through n-1, as in the diagram above. We determine the distance between two cities with the function

Dist dist1(int i, int j)

(The number 1 in dist1 identifies this as the first distance function.) The type Dist is defined for now as

typedef double Dist;

We will examine possible variations shortly.

We can represent a tour by a permutation of the integer city numbers. In our diagram, for instance, the optimal tour could be represented by the permutation 327815640, or by 17 equivalent permutations including 187230465. A program will represent the permutation by the array p:

int p[MAXN];

The vector p[0..n-1] always contains exactly one of each of the integers 0 through n-1.

The single element 0 has just one permutation. The two elements 0 and 1 have two permutations: 01 and 10. The three elements 0, 1, and 2 have six permutations: 012, 021, 102, 120, 201, and 210. In general, n elements have n! (pronounced "n factorial") permutations, where

n! = n x (n-1) x ... x 3 x 2 x 1

This function grows faster than exponentially:

10! is about 3.62x10^6, 20! is about 2.43x10^18, and 30! is about 2.65x10^32

We will soon see practical interpretations of these unimaginably huge quantities.
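A few lines of C suffice to reproduce those figures; this is an illustrative aside, not part of the column's program. The factorial is computed in floating point because 21! already overflows a 64-bit integer:

```c
/* n! in floating point: exact up to 18!, and accurate enough
   beyond that to reproduce the estimates in the text. */
double factorial(int n)
{   double f = 1.0;
    int i;
    for (i = 2; i <= n; i++)
        f *= i;
    return f;
}
```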

Algorithm 1: The Easiest Program

Our first program is thorough if not speedy. It will look at all n! permutations in the vector p, calculate the cost of each, and store the cheapest tour. We initialize Algorithm 1 with this code:

void solve1()
{   int i;
    for (i = 0; i < n; i++)
        p[i] = i;
    minsum = INF;
    search1(n);
}

This solve function has the suffix 1 to show that it is introduced in Algorithm 1; we'll see advanced versions later.

The n! permutations are constructed recursively by the search1 function. Its single parameter m tells how many elements of p remain to be permuted: it works top-down by leaving p[m..n-1] fixed while it permutes the bottom m elements in p[0..m-1] like this:

void search1(int m)
{   int i;
    if (m == 1) {
        check1();
    } else {
        for (i = 0; i < m; i++) {
            swap(i, m-1);
            search1(m-1);
            swap(i, m-1);
        }
    }
}

When m is one, we call the check1 function on the given array. For larger values of m, we swap each of the m values to be at the top of the current vector, recursively enumerate the subsolutions, then swap it back to its original position. This code is succinct but subtle; take a moment to understand it before you read further.
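One way to convince yourself the recursion is correct is to replace check1 with a counter and verify that it fires exactly n! times. The following standalone sketch mirrors the column's code; the names search_count and permutations are inventions for this test:

```c
#define MAXN 20

static int  p[MAXN];
static long count;   /* how many complete permutations we have seen */

static void swap(int i, int j)
{   int t = p[i];
    p[i] = p[j];
    p[j] = t;
}

/* Same recursion as search1, but check1 is replaced by a counter. */
static void search_count(int m)
{   int i;
    if (m == 1) {
        count++;            /* one complete permutation in p[0..n-1] */
    } else {
        for (i = 0; i < m; i++) {
            swap(i, m-1);
            search_count(m-1);
            swap(i, m-1);   /* restore p before the next choice */
        }
    }
}

long permutations(int n)
{   int i;
    for (i = 0; i < n; i++)
        p[i] = i;
    count = 0;
    search_count(n);
    return count;
}
```

Running permutations(n) for small n confirms the counts 1, 2, 6, 24, ... that the factorial formula predicts.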

Exercise 1. Trace the contents of the vector p as solve1 calls search1(3) to solve a three-city Traveling Salesman Problem.

The search1 function calls this swap function to exchange two elements of p:

void swap(int i, int j)
{   int t = p[i];
    p[i] = p[j];
    p[j] = t;
}

The check1 function calculates the cost of the tour currently in the p vector:

void check1()
{   int i;
    Dist sum = dist1(p[0], p[n-1]);
    for (i = 1; i < n; i++)
        sum += dist1(p[i-1], p[i]);
    save(sum);
}

When a shortest tour is found, its length and permutation are stored in these variables:

Dist minsum;
int minp[MAXN];

This code compares the current sum with the best so far and saves the current tour if it is better:

void save(Dist sum)
{   int i;
    if (sum < minsum) {
        minsum = sum;
        for (i = 0; i < n; i++)
            minp[i] = p[i];
    }
}

Algorithm 1 takes about 40 lines of C code. On a 200MHz Mips R4400 processor, it finds an eight-city tour in 0.6 seconds, a nine-city tour in 6.3 seconds, and a ten-city tour in 70 seconds. At this rate, it would take a few hours to solve a twelve-city problem, and almost a year to solve a fifteen-city problem--time for some speedups!

Exercise 2. Modify the C code to compute the shortest path through n cities (that is, do not include the cost of returning from the last city to the first city).

Algorithm 2: A Fixed Point

Our first speedup will decrease the number of permutations we examine from n! to (n-1)!, for a speedup factor of n. Algorithm 1 inspects all n! permutations. Algorithm 2 will fix one city at the high end of the p array and then inspect the resulting (n-1)! permutations. (A tour is a cycle, so it doesn't matter which city we list last.) This algorithm will find the optimal tour (twice, in fact: once forwards, once backwards), but it avoids unnecessary work. The code for solve2 uses the same initialization but calls search1 with the parameter n-1 rather than n:

void solve2()
{   int i;
    for (i = 0; i < n; i++)
        p[i] = i;
    minsum = INF;
    search1(n-1);
}

Figure 1 shows the run times of the algorithms in seconds on the 200MHz Mips R4400. The inputs consist of n points uniformly distributed over a square. We've seen only Algorithms 1 and 2 so far; we'll see the rest shortly.

I stopped timing each algorithm when its run time exceeded a minute (or when I suspected as much by extrapolation). As predicted, Algorithm 2 is a factor of n faster than Algorithm 1. This lets us solve a problem of size one larger in roughly the same amount of time.

Exercise 3. How would you apply this speedup to the problem of computing paths (rather than tours) described in Exercise 2?

Algorithm 3: Sum Times Not

As Algorithm 2 visits each of the (n-1)! permutations, it computes the cost of the tour from scratch. We can avoid that redundant computation by keeping a new parameter for the sum of the edges included so far in the partial tour. At each recursive call, the search function adds in the cost of the most recent edge:

void search3(int m, Dist sum)
{   int i;
    if (m == 1) {
        check3(sum + dist1(p[0], p[1]));
    } else {
        for (i = 0; i < m; i++) {
            swap(i, m-1);
            search3(m-1, sum + dist1(p[m-1], p[m]));
            swap(i, m-1);
        }
    }
}

We initialize the cost in the solve3 function:

search3(n-1, ZERO);

The check function has to add in the cost of the edge between the two extreme cities:

void check3(Dist sum)
{   sum += dist1(p[0], p[n-1]);
    save(sum);
}

Exercise 4. Mathematically analyze the cost of Algorithm 3 and compare your model with the table of run times.

Algorithm 4: Cheaper Distances

When the cities are geometric points, each distance function involves an expensive square-root operation. Algorithm 3 achieves a speedup by reducing the number of distance calculations. Algorithm 4 achieves further speedup by calculating all possible distances in advance, then replacing calls to the expensive distance function with a cheap table lookup.

We will store the distances in the array:

Dist distarr[MAXN][MAXN];

which is initialized with this code:

void initdist4()
{   int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            distarr[i][j] = dist1(i, j);
}

The function solve4 must begin with the call initdist4();

In functions check4 and search4, we will replace the dist1 function with the new macro dist4:

#define dist4(i, j) (distarr[i][j])

Exercise 5. Use a cost model like the ones described in the April 1997 column ("Cost Models For Sorting," p. 65) to predict the result of this speedup.

In our program, this gives a speedup factor of five.

A Detour: Even Cheaper Distances?

A venerable speedup replaces floating-point operations with integer operations. For this problem, we preserve the numeric precision by multiplying the floating-point distances by a scaling factor, such as 1,000 for 16-bit shorts or 1,000,000 for 32-bit longs. I timed Algorithm 4 with n=9 across several different Intel processors and across four different typedefs of Dist (double, float, long, and short). Figure 2a reports the run times in seconds.

The first six lines in Figure 2a were generated by the same 16-bit object code; the final line was compiled with a 32-bit compiler. On old machines, floating-point operations were much more expensive than integer operations, and converting to integers would have provided an order-of-magnitude speedup. On modern machines, though, the operation costs don't vary much, so it is not worth scaling floating-point values to integers.
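The scaling itself might look like the few lines below; the helper name scaledist and the rounding detail are my assumptions, since the column doesn't show this code:

```c
/* Convert a floating-point distance to a scaled integer.
   The factor 1000 is the column's suggestion for 16-bit shorts;
   use 1000000 for 32-bit longs. */
#define SCALE 1000.0

long scaledist(double d)
{   return (long)(d * SCALE + 0.5);   /* round to nearest, don't truncate */
}
```

With Dist typedef'd to long or short, initdist4 would store scaledist(dist1(i, j)) in the table instead of the raw double.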

This table does provide a simple benchmark for programs that perform control flow and addition. This little experiment indicates that since the mid-1980s, integer hardware has sped up by a factor of about 200, while floating-point hardware has sped up by a factor of about 2,000.

Algorithm 5: Pruning The Search

"Don't keep doing what doesn't work" is good advice for programs as well as for people. If the sum we have computed so far already exceeds the best sum to date, then we should waste no more time examining this permutation. Adding new cities will not shorten the tour. We can therefore add a single line of code to search5, before any other statement:

if (sum > minsum) return;

Algorithms 1 through 4 have a running time almost independent of their input graphs. Algorithm 5, though, is very sensitive to its input. Figure 1 presents data for inputs distributed uniformly over a square. Figure 2b shows the run time of Algorithm 5 for inputs drawn uniformly from a square, for geometric inputs distributed randomly around the perimeter of a circle, and for random symmetric distance matrices. This pruned search is fairly efficient on uniform data and even more efficient on the two other kinds of inputs.

I spent about half an hour coding Algorithm 1 and several hours later that night hacking my way to Algorithm 5. At the price of a few dozen lines of code, Algorithm 5 is significantly faster than Algorithm 1. While the initial program would take almost a year to solve a problem of size n=15, the final code can do the job in about 20 minutes.

And Faster Yet

In the late 1970s, Christos Papadimitriou of Berkeley kindly warned me that "the Traveling Salesman Problem is not just a problem, it's an addiction." After a pleasant night's coding, the bug bit me again, and I spent large chunks of the next week writing TSP code. The result was Algorithms 6, 7, and 8, whose run times are summarized in Figure 2c. The run times are for finding tours among the first n cities of a single set of 30 points, distributed uniformly over a square.

Exercise 6. Explain why adding a point sometimes decreases the run time.

Algorithm 8 can solve a problem of size n=30 in a couple of minutes. The diagram below, for instance, shows a set of 30 points and an optimal tour through them.

[Diagram: 30 points distributed uniformly over a square, and an optimal tour through them]

Algorithm 1 would require about 160 billion billion years to compute this tour, or about ten billion times the age of the universe. Don't hold your breath.

This column does not have enough space for details, so we'll just skim the final three programs. Algorithm 6 uses the minimum spanning tree (described in the October 1996 column, "A Network Design Algorithm," p. 89) as a lower bound on the length of the tour through the points not yet fixed in the permutation. We therefore prune the search with code like

if (sum + mstdist() > minsum) return;

This took a few dozen lines of code.
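The column does not show mstdist, so the sketch below is a guess at the idea, not Bentley's implementation: Prim's algorithm over the m cities still being permuted (p[0..m-1]), reading edge lengths from the Algorithm 4 distance table. The explicit parameter m is also an assumption; the column's mstdist() takes no arguments and presumably reads a global:

```c
#define MAXN 20

typedef double Dist;

int  p[MAXN];
Dist distarr[MAXN][MAXN];

/* Length of a minimum spanning tree of p[0..m-1], via Prim's algorithm.
   The MST is a lower bound on any path through those cities, which is
   what makes it safe to use in the pruning test. */
Dist mstdist(int m)
{   Dist lowcost[MAXN];   /* cheapest edge from each city into the tree */
    int  intree[MAXN];
    Dist total = 0;
    int  i, pass, best;

    for (i = 0; i < m; i++) {
        lowcost[i] = distarr[p[0]][p[i]];
        intree[i] = 0;
    }
    intree[0] = 1;
    for (pass = 1; pass < m; pass++) {
        best = -1;
        for (i = 0; i < m; i++)          /* pick the cheapest outside city */
            if (!intree[i] && (best == -1 || lowcost[i] < lowcost[best]))
                best = i;
        total += lowcost[best];
        intree[best] = 1;
        for (i = 0; i < m; i++)          /* relax edges from the new city */
            if (!intree[i] && distarr[p[best]][p[i]] < lowcost[i])
                lowcost[i] = distarr[p[best]][p[i]];
    }
    return total;
}
```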

When I observed that the same minimum spanning trees were frequently recomputed, I stored their values in a hash table; the result was Algorithm 7, at the cost of a couple dozen lines of code.

Algorithm 8 orders the nodes to search nearest cities first (using about a dozen lines of code). This made only a slight speedup over Algorithm 7 on points distributed uniformly over a square, but it made a huge difference on other kinds of data sets. Figure 3 shows the run time of Algorithm 8 on the three types of inputs described earlier.
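One plausible precomputation behind that ordering is a nearest-first neighbor table: for each city, the other cities sorted by distance, so the search tries short edges early and finds good tours (and hence tight pruning bounds) sooner. The names nbr, initnbrs, and cmpcity are inventions for this sketch, not the column's code:

```c
#include <stdlib.h>

#define MAXN 30

typedef double Dist;

Dist distarr[MAXN][MAXN];
int  nbr[MAXN][MAXN];   /* nbr[i][0..n-1]: cities sorted by distance from i */

static int curcity;     /* city whose neighbor list is being sorted */

static int cmpcity(const void *a, const void *b)
{   Dist da = distarr[curcity][*(const int *)a];
    Dist db = distarr[curcity][*(const int *)b];
    return (da > db) - (da < db);
}

void initnbrs(int n)
{   int i, j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            nbr[i][j] = j;
        curcity = i;
        qsort(nbr[i], n, sizeof(int), cmpcity);
    }
}
```

The search loop would then visit candidates in the order nbr[c][0], nbr[c][1], ... rather than 0, 1, 2, ...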

Principles

Speedups. The past decade has seen incredible speedups in computer hardware: two or three orders of magnitude. Really monumental speedups, though, come from algorithms.

Algorithmic Tricks. We started with the simple recursive Algorithm 1. Avoiding redundant computations led to Algorithms 2, 3, 4, and 7. Pruning the search led to Algorithms 5, 6, and 8; the resulting search algorithms are typically efficient but unpredictable.

Faster And Faster, But Not Fast Enough. Algorithm 8 is a lot faster than Algorithm 1, but it still is exponential. We'll never be able to use it to solve really large problems.

Further Reading

I have only scratched the surface; one could write a book on the Traveling Salesman Problem. In fact, four have. In 1985, John Wiley & Sons (New York) published The Traveling Salesman Problem by E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys. It is an invaluable reference for this fundamental problem.

Solutions To Selected Exercises

2. In the check1 function, replace the definition of sum with

Dist sum = 0;

3. Because the endpoints have special significance, we cannot immediately apply this speedup to computing paths.

6. Although the cost generally grows as the problem size increases, adding a new point sometimes allows more effective pruning.

Jon Bentley is a Member of Technical Staff in the Computing Science Research Center at Bell Labs.



Figure 1

Run Times Of Early Algorithms
(seconds; columns are Algorithms 1 through 5; blank entries were not timed)

   N       1       2       3       4       5
   7     .07
   8     .63     .08     .04
   9    6.31     .70     .31     .06
  10   69.68    6.97    2.81     .57     .10
  11           76.21   29.12    5.74     .40
  12                           63.08    1.91
  13                                   13.71
  14                                   74.86




Figure 2
Run-Time Comparisons


2a. Algorithm 4, n=9, with four typedefs of Dist (seconds)

  Processor        MHz   double   float    long   short
  8086            9.54   117.53  109.51   13.85   12.72
  386               20    30.77   22.31    2.58    2.64
  486SX             33    15.11   14.45     .91     .82
  486DX             33      .99     .93     .77     .71
  Pentium           90      .19     .16     .16     .14
  Pentium Pro      200      .15     .14     .14     .13
  Pentium Pro/32   200      .06     .06     .06     .06



2b. Algorithm 5 on three input distributions (seconds)

  N           11      12      13      14
  Uniform    .40    1.91   13.71   74.86
  Circle     .34     .97    4.24   14.29
  Matrix     .33     .49    1.10    3.80



2c. Algorithms 5 through 8 on uniform inputs
(seconds; blank entries were not timed)

   N       5       6       7       8
  11     .40     .04
  12    1.91     .07
  13   13.71     .08     .02     .01
  14   74.86     .43     .07     .04
  15            1.15     .14     .02
  16            1.94     .21     .04
  17            7.19     .69     .14
  18           10.82     .92     .36
  19           27.27    2.14     .76
  20            5.35     .42     .13
  21           70.56    4.88    1.05
  22           52.31    3.16     .60
  23           49.52    2.61    1.09
  24                   50.38   24.14
  25                  166.64   86.20
  26                   36.20   23.80
  27                   92.85   60.42
  28                          193.00
  29                          137.61
  30                          137.92



Figure 3

Algorithm 8 On Three Distributions (seconds)

  N          20      21      22      23       24
  Matrix    .73    2.14    3.16   45.07   203.46
  Uniform   .13    1.05     .60    1.09    24.14
  Circle    .14     .12     .13     .12      .12

This article first appeared in the June 1997 issue of UNIX Review.
