Many thanks to the ESA Gaia mission and to the Max Planck Institute for Astronomy team for making their Gaia DR2 positioning estimates readily available. You can find links to the information on the "Estimating distances from parallaxes. IV. Distances to 1.33 billion stars in the Gaia data release 2" page, run by Coryn Bailer-Jones.
To build the Gaia DR2 TSP instance, we downloaded from the Bailer-Jones site their entire catalogue, extracting the source_id and rest point distance estimate for each of the 1,331,909,727 entries. Their research paper provides the following description.
The "essentially all" phrase refers to the fact that for 3,278 of the stars rest is labeled as NaN (not-a-number); they are included in the data base for completeness, but obviously cannot be part of the TSP instance. Excluding these, we are left with a whopping 1,331,906,450 points, including an entry for the Sun.
Using the source_id for each star, we paired the distance estimates with Gaia DR2 positioning data, described in Section 14.1.1 of the DR2 documentation.
The two values place the star at a point on a sphere centered at Earth. (Clear explanations are given on the Right ascension and Declination Wikipedia pages.) To obtain xyz coordinates, move on a straight line, starting at Earth, in the direction of the star's spherical location and stop when you reach the Bailer-Jones et al. distance estimate.
If you are replicating the Gaia DR2 TSP data, as a check here is a file with the xyz coordinates for the first 1,000 points, each with the corresponding DR2 source_id. Note that we scaled the coordinates to put them in units of 1/10th parsecs, due to the TSPLIB rounding, described below.
Keep in mind that the 3D positions of the stars are rough estimates. For example, the data set has 21 stars with estimated distance less than 1 parsec from Earth, even though our nearest neighbor, Proxima Centauri, is out there at 1.3 parsecs. Concerning this, Bailer-Jones et al. write "The smallest values of rest are almost certainly the result of spurious parallaxes." So if you are planning to go star hopping, better first contact The Culture for an up-to-date 3D map. But for the TSP, the data set gives an exciting optimization challenge at an almost unimaginable scale.
The Gaia DR2 point set contains a number of conical structures that seem unlikely to be real features of the galaxy. You can see several of these in the images displayed below.
In the following quote from Section 3.4 of the Bailer-Jones et al. paper, the authors explain that these points arise from observing stars in the Large and Small Magellanic Clouds, two dwarf galaxies that are actually much further away than any of the point estimates given by their mathematical model.
This might cause confusion for a future star traveler, but, for the TSP challenge, it adds extra spice to the point set.
To create an instance of the TSP, we need to specify precisely the point-to-point distances we use. For this we adopt the standard TSPLIB norm for 3D Euclidean data. This norm takes the straight-line distance between two points and rounds the resulting value to the nearest integer. In our case, the star-to-star distance is therefore measured to the nearest 1/10th parsec. Here is a simplified version of the computer code used in Concorde for the distance calculation.