In their paper "Photo-astrometric distances, extinctions, and astrophysical parameters for Gaia DR2 stars brighter than G = 18", Anders et al. provide the following quick description of the StarHorse 2019 data set.
A great touch is that their database contains explicit xyz-coordinates for all 265,637,087 stars. Perfect for a 3D TSP challenge!
The full StarHorse 2019 catalogue is available for download on the AIP data service page (doi:10.17876/data/2019_1). It contains 36 fields of data for each of its stars, including lots of information other than the 3D positions we need for the TSP instance. Those positions are stored in fields 32, 33, and 34, labeled XGal, YGal, and ZGal. The values are given in kiloparsecs, so you will need to multiply them by 10,000 to change the units to 1/10th parsecs (1 kiloparsec = 1,000 parsecs = 10,000 tenths of a parsec). As a check, here is a file with the resulting coordinates for the first 1,000 points.
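The unit conversion can be sketched in a few lines of Python. This is only an illustration: the function name star_position, the idea of holding one star as a plain sequence of 36 values, and the sample numbers are assumptions made for the example, not part of the catalogue's tooling, and reading the downloaded files is left out entirely.

```python
# Minimal sketch: pull XGal, YGal, ZGal out of one catalogue record and
# convert kiloparsecs to 1/10th parsecs.
KPC_TO_TENTH_PARSEC = 10_000  # 1 kpc = 1,000 pc = 10,000 tenth-parsecs

# Field numbers 32, 33, 34 from the text are 1-based; Python indices are 0-based.
XGAL, YGAL, ZGAL = 32 - 1, 33 - 1, 34 - 1


def star_position(record):
    """Return (x, y, z) in 1/10th parsecs for one 36-field record.

    `record` is assumed to be a sequence of 36 values in catalogue order
    (a hypothetical in-memory layout, not a documented file format).
    """
    return tuple(float(record[i]) * KPC_TO_TENTH_PARSEC
                 for i in (XGAL, YGAL, ZGAL))


# Made-up record: only the three position fields carry meaningful values.
example = [0.0] * 36
example[XGAL], example[YGAL], example[ZGAL] = -8.25, 0.5, 0.125
print(star_position(example))
```

Comparing the output of such a routine against the 1,000-point check file is a quick way to confirm that the right fields and scale factor are being used.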
Note that the coordinates are Galactocentric, that is, the (0,0,0) point is the center of the Milky Way. This is in contrast to the other Star TSP instances, where the coordinate system places the Sun at the (0,0,0) point. This shift does not impact the TSP solution, since moving every point by the same offset leaves all point-to-point distances unchanged.
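A small Python check makes the point about the origin concrete. The coordinates and the offset below are made-up values chosen for illustration (the offset is not the Sun's actual Galactocentric position), and the helper names are hypothetical.

```python
import math


def rounded_distance(p, q):
    """Straight-line distance between two 3D points, rounded to the nearest integer."""
    return int(math.dist(p, q) + 0.5)


def shifted(p, offset):
    """Translate point p so that `offset` becomes the new origin."""
    return tuple(a - b for a, b in zip(p, offset))


# Two made-up stars in 1/10th parsecs, in a Galactocentric-style frame.
star_a = (-82500, 5000, 1250)
star_b = (-81000, 4000, 1000)

# Hypothetical new origin; any common offset behaves the same way.
new_origin = (-82000, 0, 200)

before = rounded_distance(star_a, star_b)
after = rounded_distance(shifted(star_a, new_origin), shifted(star_b, new_origin))
print(before == after)
```

Because every pairwise distance is preserved, the length of every tour is preserved as well, so an optimal tour in one frame is optimal in the other.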
Like the Gaia DR2 point set, the StarHorse collection contains several conical structures that are unlikely to be real features of the galaxy. The middle image in the Snapshots section below gives a dramatic view of two cones shooting out from the main grouping of stars. The authors of the study explain these artifacts in the following passage from Section 4.4 of their research paper.
The cones are just cool views of the Large and Small Magellanic Clouds, two dwarf galaxies that are actually much farther away than any of the point estimates given by the models and available data.
To create an instance of the TSP, we need to specify precisely the point-to-point distances we use. For this, we adopt the standard TSPLIB norm for 3D Euclidean data. This norm takes the straight-line distance between two points and rounds the resulting value to the nearest integer. In our case, the star-to-star distance is therefore measured to the nearest 1/10th parsec. Here is a simplified version of the computer code used in Concorde for the distance calculation.
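The rounding rule described above can be sketched in Python as follows. This is an illustration of the TSPLIB 3D Euclidean norm under stated assumptions, not Concorde's actual C source; the names euc_3d and tour_length are chosen for this example only.

```python
import math


def euc_3d(p, q):
    """TSPLIB-style 3D Euclidean distance: straight-line length, rounded
    to the nearest integer by adding 0.5 and truncating."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dz = p[2] - q[2]
    return int(math.sqrt(dx * dx + dy * dy + dz * dz) + 0.5)


def tour_length(points, order):
    """Total length of the closed tour that visits `points` in `order`
    and returns to the start (hypothetical helper for illustration)."""
    return sum(euc_3d(points[order[i - 1]], points[order[i]])
               for i in range(len(order)))


# Made-up points in units of 1/10th parsec.
points = [(0, 0, 0), (3, 4, 0), (3, 4, 12)]
print(euc_3d(points[0], points[1]))
print(tour_length(points, [0, 1, 2]))
```

Adding 0.5 and truncating is used here instead of Python's built-in round, which rounds exact halves to the nearest even integer and so would not match the usual nearest-integer convention in every case.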
The video zooms out through the points, giving you a feeling for the size of the data set.