UK Pubs Traveling Salesman Problem

Update: In March 2018 we computed the shortest tour to nearly every pub in the UK. That is 49,687 pints, more than twice the number we had in the 2016 tour.

Nearly everyone in the UK knows by heart the best path to take them over to their favorite public house. But what about jotting down the shortest route to visit every pub in the country and return home safely? That is what we set out to do.

Okay, maybe every pub is overstating the goal. Pubs in the UK are closing shop or starting up, fresh and new, all of the time. Any route would be out-of-date by the time it was created. So we set a more modest goal: find the shortest route to visit some 24,727 stops found on the great Web site Pubs Galore - The UK Pub Guide.

This is a concrete target. But still an overstatement. Only a real local could possibly know every shortcut, slipping between buildings and along dark allies, to find the absolute best way to reach The Fiddler's Elbow or The Bald Faced Stag. This is well out of reach for a humble team of mathematicians.

Here we rely on the fantastic service provided by Google Maps. Ask Google for the shortest way to walk from The Elbow over to The Stag and it will respond with excellent step-by-step directions. The level of detail covered by Google Maps is amazing.

So this is our challenge. Using geographic coordinates of 24,727 pubs provided by Pubs Galore and measuring the distance between any two pubs as the length of the route produced by Google Maps, what is the shortest possible tour that visits all 24,727 and returns to the starting point?

Well, almost. We need to make one final assumption. It sounds like something only a mathematician would consider, but we have to assume that the route Google suggests for walking between The Fiddler's Elbow and The Bald Faced Stag is no shorter than the route a smart crow would fly. This makes it conceivable to solve the problem without actually asking Google for the distance to travel between each pair of pubs, an important consideration since there are 305,699,901 pairs and Google puts a cap of 2,500 distance requests per day.

This is the problem we have solved. The optimal tour has length 45,495,239 meters. To be clear, our main result is that there simply does not exist any pub tour that is even one meter shorter (measuring the length using the distances we obtained from Google) than the one produced by our computation. It is the solution to a 24,727-city traveling salesman problem (TSP).

The UK Pubs tour is easily the largest such road-distance TSP that has been solved to date, having over 100 times more stops than any road-distance example solved previously by other research groups.

The Big Picture

The work was carried out over the past two years. We, of course, did not have in mind to bring everything mathematics has to bear in order to improve the lot of a wandering pub aficionado. Rather, we use the UK pubs problem as a means for developing and testing general-purpose optimization methods. The world has limited resources and the aim of the applied mathematics fields of mathematical optimization and operations research is to create tools to help us to use these resources as efficiently as possible.

For general information on mathematical modeling and its impact on industry, commerce, medicine, and the environment, we point you to a number of societies that support mathematics research and education: American Mathematical Society, Mathematical Association of America, Mathematical Optimization Society, INFORMS (operations research), London Mathematical Society, and SIAM (applied mathematics).

Research Team

William Cook, Combinatorics and Optimization, University of Waterloo, Canada
Daniel Espinoza, Gurobi Optimization, USA
Marcos Goycoolea, School of Business, Universidad Adolfo Ibanez, Chile
Keld Helsgaun, Computer Science, Roskilde University, Denmark

The Tour

It is not easy to convey the structure and complexity of the optimal tour. A list of the 24,727 pubs, one after the other, in the correct order, resembles a good-sized phone book. Perhaps the best way to get a quick view is to look at a line drawing, where the solution is displayed without indicating the many destinations.

You see that we obviously cannot walk several of the indicated routes: to reach the Isle of Man, Northern Ireland, and the islands of Scotland, the tour uses scheduled passenger-ferry routes provided by Google's direction services.

To show a detailed view, we make use of the Google Maps drawing tools to display an interactive version of the tour, where you can zoom in and pan from one region to another. The link is given below, but first a word of warning: the map contains a great deal of information and it can take a minute or so to load. We provide tips for using the map on the tour page.

If the map refuses to load for you, please have a look at the tour page for high-resolution screen shots, as well as further information about the route.

Optimality

How do we know the tour is the shortest possible? Clearly we did not check every tour, one by one by one. Indeed, the first thing you learn about the TSP is that it is impossible to solve in this way. If you have N cities, then, starting from any point, you have N-1 possibilities for the second city. Then N-2 possibilities for the third city, and so on. The total number of tours is obtained by multiplying these values: N-1 x (N-2) x (N-3) x . . . x 3 x 2 x 1. Now this is a big number. For the pubs problem, it is roughly 1 followed by 100,000 zeroes. That is in an unimaginably large number of possibilities. Even for 50 cities, the world's fastest supercomputer has no hope of going through the full count of tours one by one to pick out the shortest.

But this by itself does not mean we can't possibily solve an example of the TSP. If you have 50 words to put into alphabetical order, you don't worry about the 50 x 49 x 48 x ... x 3 x 2 x 1 possible lists you could create. You just sort the words from first to last and build the one correct list among the huge number of possibilities.

For the TSP we don't know of any simple and fast solution method like we have for sorting words. And, for technical reasons, it is believed that there may be large, nasty TSP examples that no one can ever solve. (If you are interested in this and could use an extra $1,000,000, check out the P vs NP problem.) But if you need to plot a 50-point route for a holiday or to compute the order of 1,000 items on a DNA strand, then mathematics can help, even if you need the absolute shortest-possible solution.

The way to proceed is via a process known as the cutting-plane method. If you have twenty minutes to spare, there is a video explaining the method and how it is used to solve the TSP (in the pleasing voice of Siri). If you are in a hurry, here is how I try to describe the process in a short piece in Scientific American

The idea is to follow Yogi Berra's advice "When you come to a fork in the road, take it." A tool called linear programming allows us to do just this, assigning fractions to roads joining pairs of cities, rather than deciding immediately whether to use a road or not. It is perfectly fine, in this model, to send half a salesman along both branches of the fork.

The process begins with the requirement that, for every city, the fractions assigned to the arriving and departing roads each sum to one. Then, step-by-step, further restrictions are added, each involving sums of fractions assigned to roads. Linear programming eventually points us to the best decision for each road, and thus the shortest possible route.

Our pubs computation used a beefed-up version of the Concorde implementation of the TSP cutting-plane method. Even if you are in a hurry, you might want to see for yourself how the process solves smaller examples on an iPhone or iPad by downloading the free Concorde App.

In working with road data, we were faced with the additional challenge of finding the correct TSP solution even though we could not possibly ask Google for all 305,699,901 pub-to-pub distances. To handle this, we ran the cutting-plane method in tandem with a beefy variant of Keld Helsgaun's LKH code.

LKH combines a powerful local-search technique with a genetic algorithm to produce a high-quality tour, say of length U. Along the way, LKH discovers pairs of pubs that look promising to include in any short tour, so for these pairs we ask Google for the correct walking distances.

While this is going on, Concorde's cutting-plane method finds a fractional tour of value L. From the way this is constructed with linear programming, we know for sure that no TSP tour can have value less than L. During this process, Concorde also discovers pairs of pubs that look promising, in this case for fractional solutions, so we ask Google also for these distances.

Any new data obtained from Google is shared between LKH and Concorde, while both codes continue to look for better results. That is, we aim to decrease the value of U by finding better tours, and we aim to increase the value of L by adding further restrictions to the fractional linear-programming model. At any point, we know the optimal tour length is trapped between L and U, that is, we know the difference between the length of our tour and the length of an optimal tour is at most U - L. The name of the game is to reduce this gap U - L as quickly as we can.

Eventually, in the pubs computation, the algorithms inside LKH and Concorde became satisfied that they had an adequate collection of Google distances, LKH could find no further improvements in its tour, and the cutting-plane method in Concorde could only produce tiny improvements in the value of its fractional tour. At this point, we had L = 45,492,247 and U = 45,495,239, and thus a gap of 2,992 meters.

To finish off the problem, we then turned to Concorde's branch-and-bound search procedure. In this process, the collection of tours is repeatedly subdivided and the cutting-plane method is applied to the resulting TSP subproblems. The simplest form of the division is to select a pair of pubs, say The Black Dog and The Duke of Cornwall in Weymouth, and consider first only tours where the two pubs are visited consecutively, then consider only tours where, between the stops at The Dog and The Duke, we drop in on at least one other pub along the way. This selection divides the set of all tours neatly into two subsets.

In this this final phase of the computation, we processed a large, but manageable, collection of 4,231 subproblems. The total amount of computer time was 305.2 days on a single processor core of a Linux server. We didn't actually have to wait the full 10 months for the results, since we ran the search in parallel on up to 48 cores.

Click here to see a drawing of the search tree, where the position of a subproblem corresponds to the value of its fractional tour. For a closer look, here is a pdf file for the tree.

The computing time was reasonable enough to allow us to run the branch-and-bound phase a second time, using different settings to test the selection rule for the subdivisions. This second run processed a total of 5,687 subproblems in a total of 1381.1 days. Of course, it again produced the same optimal value of 45,495,239 meters.

Data

If you are interested in creating your own local pub tour, the best bet for data is to go back to the original sources, Pubs Galore for locations and Google Maps for up-to-date walking distances. But the information provided by these sources changes over time. Therefore, to document the 24,727-stop TSP instance we have solved, we provide the raw data needed to reproduce the travel distances on the data page.

Tour-Finding History

Early computational studies focused on the most natural class of salesman problems: select an interesting group of cities, look up the point-to-point distances in a road atlas, and have a go at finding the shortest tour. Record-setting solutions were found by legendary figures in applied mathematics and computer science.

49 USA cities in 1954 by George Dantzig, Ray Fulkerson, and Selmer Johnson.
57 USA cities in 1970 by Michael Held and Richard Karp.
120 German cities in 1977 by Martin Groetschel.

The first reference, in particular, is widely viewed as the most important paper in the history of the broad fields of discrete optimization and integer programming. The links are to technical research papers. For lighter viewing, have a quick look at a slide show of these record-breaking results.

In the late 1970s, the focus switched to geometric examples of the TSP, where cities are points drawn on a sheet of paper and travel is measured by straight-line distances. The reasons were twofold. First, with over 100 stops it became difficult to obtain driving distances along road networks: printed road atlases included distances only for major cities. Second, there were classes of industrial problems that neatly fit into the geometric TSP setting. Indeed, the next world record, set in 1980 by Harlan Crowder and Manfred Padberg, consisted of locations of 318 holes that had to be drilled into a printed-circuit board.

Geometric TSP instances, arising in applications or from geographic locations, were gathered together in the TSPLIB by Gerhard Reinelt. This collection became the standard testbed for researchers. The largest of the instances, having 85,900 points arising in a VLSI application, was solved by Applegate et al. in 2006.

The geometric data sets are worthy adversaries, but the large industrial instances have points clustered into straight lines. These examples are punching below their weight, likely missing aspects of the complexity of the road TSP problems.

So, why not take advantage of the modern day atlas provided by Google Maps? One of the first people to put together a TSP challenge instance with Google data was Randal Olson, who found a good tour to visit 50 USA landmarks. His tour was the subject of one of the most heavily reported math stories in 2015, including an article in the Washington Post and a radio interview on NPR's Weekend Edition.

Following Olson's work, a number of people created similar touring problems in the US and abroad. The largest of these was a route through the 200 Tesla Superchargers in the United States. When I wrote that the UK pubs problem was a factor of 100 larger than previously solved road-TSP examples, it was in reference to this work by Mortada Mehyar.

Okay, the factor of 100 is not really true. While building up the expertise, algorithms, and software to tackle the UK pubs example, we solved a number of smaller instances along the way. The largest of these has 3,100 points in the US. But these were solved with the bigger target in mind.

Our final goal is even larger: a shortest-possible walking tour through 49,603 stops from the US National Register of Historic Places. This problem is quite a beast. We currently have a tour of length 350,201,525 meters. That is a little less than the distance to the moon. But we don't know if this is actually the shortest tour. All we can say at this point (October 19, 2016) is that there you definitely have to walk at least 350,201,329 meters to reach all 49,603 stops. So there might possibly be a tour that is 196 meters shorter than our tour. Ouch! Close is just not good enough.

Update: It took 178 years of total computing time, but have finally solved the 49,603-stop US History Tour.

Acknowledgements

Google Maps provided the interface between the real world and the abstract mathematical model of the TSP. The engineers at Google do all of the heavy lifting in dealing with paths, roads, traffic circles, construction sites, closures, detours, and on and on.

Pubs Galore - The UK Pub Guide is the source for the locations of the stops on our TSP tour. No matter where you are in the UK, the Pubs Galore site will help you find a cozy place for a meal and a drink.

The huge number of linear-programming models that arose in the computation were solved with the IBM CPLEX Optimizer. Many thanks to IBM for making their great software freely available for academic research.

This research was carried out in part during the Combinatorial Optimization trimester program at the Hausdorff Research Institute for Mathematics in Bonn, Germany.