US History Traveling Salesman Problem

This large instance of the TSP was solved in November 2016.
See the tour in a 38-second movie.
An interactive map can be found on the Tour page.

The Traveling Salesman Problem

Ever have to plan a road trip? That's the traveling salesman problem, or TSP for short. As a mathematics challenge, the TSP is crazy hard. It's the poster child for the world of complexity, explaining that, despite what we may hope, computer speed alone will never solve all the problems dished out by business, engineering, and science.

The reputation of the TSP challenge is summed up by IBM.

The "Traveling Salesman" problem--which has long defied solution by man, or by the fastest computers he uses.
-- IBM Press Release, 1964

And, more colorfully, by Twitter user @FrankLynchBkln.

The Traveling Salesman Problem? You can't HANDLE the Traveling Salesman Problem! #JackNicholsonDoesMath
-- @FrankLynchBkln, March 2, 2014

There is even a thriller movie set around the world-changing consequences of a fast solution method for the TSP.

The problem is certainly simple to state. You are given a list of places to visit and you need to find the shortest-possible route to hit all your stops and return home safely. Easy. If the list is short.

With three or four sites to visit you can visualize the routes in your head and zoom in on the best one. For a dozen or so stops, a simple computer algorithm can, in a snap, check every possible tour for you, one by one. Beyond that, things get tricky.

So, for the mathematical challenge, it's go big or go home. That's why a report on a road trip to 50 USA landmarks went viral in the spring of 2015. Even with the world's fastest supercomputer, there is no way you can check all tours through 50 points. The story appeared in publications around the world, including an article in the Washington Post and a radio interview on NPR's Weekend Edition.

The author of the 50-landmarks study, Randal Olson, aimed to find only a good solution, rather than the one tour that was better than every other. This is an important area of research, but it's not the one that has "long defied solution by man." To really solve the TSP, you need to produce a perfect, shortest-possible, route to visit your target destinations. And, this is the most tricky part, to know that you have a perfect route.

Going Big

The research communities haven't been idle in the 52 years since IBM's press release. Far from it. The TSP is one of the most intensely studied problems in applied mathematics. And progress has been made, but always on the geometric version, where we are allowed to travel in straight lines from point to point.

Now, if you have spent even a day behind the wheel of a car or walking streets and trails, you know that a typical trip from A to B is not going to follow Euclid's recommended straight line. But how can a humble mathematician possibly know every shortcut, slipping between buildings and along dark allies, to find the absolute best way to reach a single destination, let alone tackle the road TSP itself?

The idea of Randal Olson was to rely on the fantastic service provided by Google Maps for point-to-point travel distances. Ask Google for the shortest way to walk from A to B and it will respond with excellent step-by-step directions. The level of detail covered by Google Maps is amazing.

Following Olson's work, a number of people created similar Google-based tours in the US and abroad. The largest of these was a route through 200 Tesla Superchargers in the United States. That is great, but how far can this be pushed? What are the limits when it comes to this uncrackable problem?

To explore what might be possible, we created two, much larger, test instances. The first, as a warm-up exercise, lists the locations of some 24,727 pubs in the United Kingdom. For the real target, we gathered 50,000 sites from the National Register of Historic Places. And, trying to make mathematics great again, we aimed to find perfect solutions to the problems, that is, shortest-possible routes.

The National Register has over 90,000 sites, but we filtered out some of these to get down to our target of 50,000. We actually went overboard, as we describe on the Data page, and ended up with 49,603 distinct points.

Earlier this year, our techniques solved the 24,727-pubs problem, but that computation didn't quite prepare us for the larger challenge. Solving the 49,603-point problem stretched current knowledge of the mathematics of the TSP nearly to its breaking point. The final piece of the computation ran from March through November on a computer cluster at the University of Waterloo, using every one of its 310 processors that was not otherwise busy with day-to-day projects. The total computing time, adding together the contributions of all the processors, was 178.9 years. (In comparison, the UK pubs problem took a total of 0.8 computing years.)

Let me be more precise about the problem we have solved. Our data base has the geographic coordinates for 49,603 points from the National Register. Measuring the distance between two points as the length of the walking route produced by Google Maps, what is the shortest-possible tour that visits all 49,603 historic places and returns to the starting point?

Well, almost. We need to make one final assumption. It sounds like something only a mathematician would consider, but we have to assume that the route Google suggests for walking between any two points A and B is no shorter than the length of a route a smart crow would fly. This makes it conceivable to solve the problem without actually asking Google for the distance to travel between each pair of points, an important consideration since there are 1,230,204,003 pairs and Google puts a cap of 2,500 distance requests per day.

This is the problem we have solved. The optimal tour has length 350,201,525 meters, or slightly more than 217,605 miles. It is a little less than the distance to the moon. Hiking to all of the 49,603 stops would be, as they say, a good step. But to be clear, our main result is that there simply does not exist any tour that is even one meter shorter (measuring the length using the distances we obtained from Google) than the one produced by our computation. It is the solution to a 49,603-city road TSP.

Mathematics+Engineering

The work was carried out over the past two years. In solving the two problems we, of course, did not have in mind to bring everything mathematics and engineering have to bear in order to plot life-defining vacations for history buffs and pub crawlers. Rather, we use the road TSP challenge as a means for developing and testing general-purpose optimization methods. The world has limited resources and our aim is to create tools to help us to use these resources as efficiently as possible. The work falls under the applied mathematics fields called mathematical optimization and operations research.

For general information on mathematical modeling and its impact on industry, commerce, medicine, and the environment, we point you to a number of societies that support mathematics research and education: American Mathematical Society, Mathematical Association of America, Mathematical Optimization Society, INFORMS (operations research), London Mathematical Society, and SIAM (applied mathematics).

Research Team

William Cook, Combinatorics and Optimization, University of Waterloo, Canada
Daniel Espinoza, Gurobi Optimization, USA
Marcos Goycoolea, School of Business, Universidad Adolfo Ibanez, Chile
Keld Helsgaun, Computer Science, Roskilde University, Denmark

The Tour

For a quick view of the optimal tour, click here for a high resolution drawing or have a look at the following 38-second video.

But to really get to know the route, the best bet is to play around with one of the interactive maps we have created with tools provided by Google Maps.

The full 49,603-stop route makes for a large map that may overwhelm your Web browser, particularly if you are viewing the page on a smart phone or tablet. With that word of warning, you can find the map along with viewing tips on the Tour page. For easier to load maps, please have a look at the States page, where we give interactive snapshots of the tour as it passes from state to state.

And if you decide to go for the long walk, be careful boarding the ferry boats. Google directions sometimes have you turning the boat around on a dime when you reach your stop, such as the middle of the Bay Bridge in San Francisco. If you don't have your own boat, just have a look at the site from the shoreline.

Optimality

How do we know the tour is the shortest possible? Clearly we did not check every tour, one by one by one. Indeed, the first thing you learn about the TSP is that it is impossible to solve in this way. If you have N cities, then, starting from any point, you have N-1 possibilities for the second city. Then N-2 possibilities for the third city, and so on. The total number of tours is obtained by multiplying these values: N-1 x (N-2) x (N-3) x . . . x 3 x 2 x 1. Now this is a big number. For the US history problem, it is roughly 3 followed by 211,367 zeroes, as computed by WolframAlpha. That is in an unimaginably large number of possibilities. Even for 50 cities, the world's fastest supercomputer has no hope of going through the full count of tours one by one to pick out the shortest.

But this by itself does not mean we can't possibily solve an example of the TSP. If you have 50 words to put into alphabetical order, you don't worry about the 50 x 49 x 48 x ... x 3 x 2 x 1 possible lists you could create. You just sort the words from first to last and build the one correct list among the huge number of possibilities.

For the TSP we don't know of any simple and fast solution method like we have for sorting words. And, for technical reasons, it is believed that there may be large, nasty TSP examples that no one can ever solve. (If you are interested in this and could use an extra $1,000,000, check out the P vs NP problem.) But if you need to plot a 50-point route for a holiday or to compute the order of 1,000 items on a DNA strand, then mathematics can help, even if you need the absolute shortest-possible solution.

The way to proceed is via a process known as the cutting-plane method. If you have twenty minutes to spare, there is a video explaining the method and how it is used to solve the TSP (in the pleasing voice of Siri). If you are in a hurry, here is how I try to describe the process in a short piece in Scientific American

The idea is to follow Yogi Berra's advice "When you come to a fork in the road, take it." A tool called linear programming allows us to do just this, assigning fractions to roads joining pairs of cities, rather than deciding immediately whether to use a road or not. It is perfectly fine, in this model, to send half a salesman along both branches of the fork.

The process begins with the requirement that, for every city, the fractions assigned to the arriving and departing roads each sum to one. Then, step-by-step, further restrictions are added, each involving sums of fractions assigned to roads. Linear programming eventually points us to the best decision for each road, and thus the shortest-possible route.

Our computations for the US history and UK pubs problems used a beefed-up version of the Concorde implementation of the TSP cutting-plane method. Even if you are in a hurry, you might want to see for yourself how the process solves smaller examples on an iPhone or iPad by downloading the free Concorde App.

In working with road data, we were faced with the additional challenge of finding the correct TSP solution even though we could not possibly ask Google for all 1,230,204,003 point-to-point distances. To handle this, we ran the cutting-plane method in tandem with a beefy variant of Keld Helsgaun's LKH code.

LKH combines a powerful local-search technique with a genetic algorithm to produce a high-quality tour, say of length U. Along the way, LKH discovers pairs of stops that look promising to include in any short tour, so for these pairs we ask Google for the correct walking distances.

While this is going on, Concorde's cutting-plane method finds a fractional tour of value L. From the way this is constructed with linear programming, we know for sure that no TSP tour can have value less than L. During this process, Concorde also discovers pairs of stops that look promising, in this case for fractional solutions, so we ask Google also for these distances.

Any new data obtained from Google is shared between LKH and Concorde, while both codes continue to look for better results. That is, we aim to decrease the value of U by finding better tours, and we aim to increase the value of L by adding further restrictions to the fractional linear-programming model. At any point, we know the optimal tour length is trapped between L and U, that is, we know the difference between the length of our tour and the length of an optimal tour is at most U - L. The name of the game is to reduce this gap U - L as quickly as we can.

Eventually, in the US history computation, the algorithms inside LKH and Concorde became satisfied that they had an adequate collection of Google distances, LKH could find no further improvements in its tour, and the cutting-plane method in Concorde could only produce tiny improvements in the value of its fractional tour. At this point, we had L = 350,176,226 and U = 350,201,525, and thus a gap of 25,299 meters.

To finish off the problem, we then turned to Concorde's branch-and-bound search procedure. In this process, the collection of tours is repeatedly subdivided and the cutting-plane method is applied to the resulting TSP subproblems. The simplest form of the division is to select a pair of locations, A and B, and consider first only tours where the two locations are visited consecutively, then consider only tours where, between the stops at A and B, we visit at least one other stop along the way. This selection divides the set of all tours neatly into two subsets.

In this final phase of the computation, we processed a collection of 830,505 subproblems. That is a very large number, greatly exceeding any of our previous computations. For example, in the solution of the UK pubs problem we processed a collection of 4,231 subproblems. And in solving the largest-ever geometric TSP, having 85,900 points arising in a computer-chip application, the final phase required only 1,239 subproblems. The US history problem was definitely a beast.

Click here to see a drawing of the search tree for the 49,603-point problem, where the position of a subproblem corresponds to the value of its fractional tour.

Data

If you are interested in creating your own US history tour, the best bet for data is to go back to the original sources, the United States National Register of Historic Places listings for locations and Google Maps for up-to-date walking distances. But the information provided by these sources changes over time. Therefore, to document the 49,603-stop TSP instance we have solved, we provide the raw data needed to reproduce the travel distances on the Data page.

Tour-Finding History

Early computational studies focused on the most natural class of salesman problems: select an interesting group of cities, look up the point-to-point distances in a road atlas, and have a go at finding the shortest tour. Record-setting solutions were found by legendary figures in applied mathematics and computer science.

49 USA cities in 1954 by George Dantzig, Ray Fulkerson, and Selmer Johnson.
57 USA cities in 1970 by Michael Held and Richard Karp.
120 German cities in 1977 by Martin Groetschel.

The first reference, in particular, is widely viewed as the most important paper in the history of the broad fields of discrete optimization and integer programming. The links are to technical research papers. For lighter viewing, have a quick look at our Road Trips page.

In the late 1970s, the focus switched to geometric examples of the TSP, where cities are points drawn on a sheet of paper and travel is measured by straight-line distances. The reasons were twofold. First, with over 100 stops it became difficult to obtain driving distances along road networks: printed road atlases included distances only for major cities. Second, there were classes of industrial problems that neatly fit into the geometric TSP setting. Indeed, the next world record, set in 1980 by Harlan Crowder and Manfred Padberg, consisted of locations of 318 holes that had to be drilled into a printed-circuit board.

Geometric TSP instances, arising in applications or from geographic locations, were gathered together in the TSPLIB by Gerhard Reinelt. This collection became the standard testbed for researchers. The largest of the instances is the 85,900-point problem we mentioned earlier. It arose in a VLSI application and was solved by Applegate et al. in 2006.

These geometric data sets are worthy adversaries, but the large industrial instances have points clustered into straight lines. These examples are punching below their weight, likely missing aspects of the complexity of the road TSP problems.

Acknowledgements

Google Maps provided the interface between the real world and the abstract mathematical model of the TSP. The engineers at Google do all of the heavy lifting in dealing with paths, roads, traffic circles, construction sites, closures, detours, and on and on.

The National Park Service of the U.S. Department of the Interior maintains the National Register of Historic Places. The geographic locations for our data set were obtained from the excellent Wikipedia United States National Register of Historic Places listings.

The huge number of linear-programming models that arose in the computation were solved with the IBM CPLEX Optimizer. Many thanks to IBM for making their great software freely available for academic research.

This research was carried out in part during the Combinatorial Optimization trimester program at the Hausdorff Research Institute for Mathematics in Bonn, Germany.