Slice of StarHorse Tour


265,637,087 Stars

StarHorse


A teaser poster provides quick answers to "How many stars will be in the second Gaia data release?" The central figure, from the TSP perspective, is 1,331,909,727. That's the amazing count of Gaia DR2 stars that come equipped with parallax data. With some fancy math and statistics, Bailer-Jones et al. used this parallax information to infer distances out to nearly all of them. Their data set is the source of our Gaia DR2 TSP instance.

In the abstract to their research paper, Bailer-Jones et al. write the following.

"Although more precise distances can be estimated for a subset of the stars using additional data (such as photometry), our goal is to provide purely geometric distance estimates ..."

This nicely sets up the work of the StarHorse 2019 team of F. Anders et al. Their project did just as Bailer-Jones et al. suggest, using several photometric catalogues, together with the Gaia DR2 parallaxes, to produce 3-dimensional position data for 265,637,087 of the Gaia DR2 stars. Brilliant. Gives another great challenge for TSP solvers.

You can find details of the StarHorse point set on the data page.

As you can guess, we have a tour. The heuristic-search algorithm we are employing continues to find shortcuts every day, gradually decreasing the total length. As of December 20, 2023, that length is 3,687,253,489.0 parsecs. You can be confident this is not a best-possible result. But how close is it to optimal?

To answer this we created a linear-programming model to produce a lower bound on the length of any tour. Best to point out that it is a large model. Maybe not the largest ever constructed, but likely on the podium for the most challenging LP model to be tackled by a general-purpose solver. It's statistics are frightening.
      • 3,496,444,221 variables
      • 476,514,827 constraints
      • 50,895,395,843 non-zeros in the constraint matrix
The only reason we hoped for a solution is the work of Google Reseach on a new LP solver called PDLP. It was a long wait for Dave and the Google team to make the finishing touches on the solver interface, but on April 3, 2023 we received the news that PDLP produced a (dual) solution. The computation at Google ran on a network of 1,000 worker computers, each handling parts of the constraint matrix. Dave wrote that this is the largest model ever solved by PDLP.

With the provided (dual) solution, we were able to prove that no tour through the 265,637,087 stars can have length less than 3,686,403,414.5 parsecs. So our tour is no more than 0.00023 times longer than a short-possible route. Very satisfying. Of course, it remains a dream to improve the tour and LP model to push the quality guarantee under 0.0001. Maybe one day.

Research team

  • David Applegate, Google Research
  • William Cook, University of Waterloo and Johns Hopkins University
  • Keld Helsgaun, Roskilde University

Notes

Links to lots of information on the mathematical techniques we have employed can be found on the TSP home page and on the LKH web site.

Details on the point set for the star TSP instance are given on the Data page; images can be found on the Tour page.

The PLDP solver is described in the research paper Practical large-scale linear programming using primal-dual hybrid gradient by D. Applegate, M. Díaz, O. Hinder, H. Lu, M. Lubin, B. O'Donoghue, W. Schudy.

Computations were carried out on systems located at Google Research, Roskilde University, and the University of Waterloo. We thank our institutions for their support.