# 10,000,000 Stars

To gain experience with large TSP instances, on the way towards the 1.33-billion-point Gaia DR2 monster, we pulled out as separate challenges the 10,000,000 and 100,000,000 stars nearest to Earth.

Our best tour for the 10,000,000-point example has length 13,020,706.2 parsecs. You can find images of the route on the tour page, including interactive views (if you are willing to wait a couple of minutes to download the large data file).

The tour is likely not the shortest-possible route through the 10 million stars, but it's close. Using linear-programming duality, we proved there is no tour of length less than 13,020,337.6 parsecs. So the tour can be shortened by at most 368.6 parsecs. That's less than a factor of 0.00003.

Those with a keen eye will notice that although 0.00003 is a small number, it's four times larger than the 0.0000074 guarantee we achieved for the Gaia DR1. Well, size matters here. An immediate difficulty in this new computation is that the LP models are too large to handle with current implementations of the simplex algorithm. So we were not able to deploy a traditional cutting-plane method, involving the solution of a long sequence of LP models, obtained by gradually adding more and more linear constraints. (For example, the computations used to obtain the Gaia DR1 bound include the solution of north of 34,000 LP models with the simplex algorithm.)

For the 10,000,000-point example, we employed a barrier LP solver, implemented in the commercial codes CPLEX and Gurobi. These solvers are exceptionally good at handling large-scale LP models with parallel computation. Although barrier methods are not well-suited for the long sequences of closely related models that arise in the cutting-plane method (for those, the simplex algorithm is the clear champion), they are great for a single LP model, or, in our case, a small number of models.

To obtain the 0.00003 optimality guarantee, we solved a sequence of 8 LP models on the full star10m data set. The inequalities added in each iteration were obtained by running, in parallel, the cutting-plane method (with the simplex algorithm) on 300 or more subproblems with up to 500,000 points each, gathering the inequalities together, and handing them over to the barrier solver.