Slice of StarHorse tour

sh265637087 data

In their paper "Photo-astrometric distances, extinctions, and astrophysical parameters for Gaia DR2 stars brighter than G = 18", Anders et al. provide the following quick description of the StarHorse 2019 data set.

"Combining the precise parallaxes and optical photometry delivered by Gaia's second data release with the photometric catalogues of Pan-STARRS1, 2MASS, and AllWISE, we derived Bayesian stellar parameters, distances, and extinctions for 265 million of the 285 million objects brighter than G = 18."

A great touch is that their database contains explicit xyz-coordinates for all 265,637,087 stars. Perfect for a 3D TSP challenge!

Getting the Data

The full StarHorse 2019 catalogue is available for download on the AIP data service page (doi:10.17876/data/2019_1). It contains 36 fields of data for each of its stars, including lots of information other than the 3D positions we need for the TSP instance. Those positions are stored in fields 32, 33, and 34, labeled XGal, YGal, and ZGal. The values are given in kiloparsecs, so you will need to multiply them by 10,000 to change the units to 1/10th parsecs (1 kiloparsec = 1,000 parsecs = 10,000 tenths of a parsec). As a check, here is a file with the resulting coordinates for the first 1,000 points.
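The unit change can be sketched in a few lines of C. This is only an illustration: the `star_point` type and the `kpc_to_deciparsec` function are hypothetical names chosen for this example, not part of the StarHorse catalogue or of Concorde, and the sketch assumes the three values have already been read from fields 32, 33, and 34 as doubles.

```c
/* Scale factor from kiloparsecs to tenths of a parsec:
   1 kpc = 1000 pc = 10000 tenth-parsecs. */
#define KPC_TO_DECIPARSEC 10000.0

/* Hypothetical container for one star's position
   (not a StarHorse or Concorde type). */
typedef struct {
    double x;
    double y;
    double z;
} star_point;

/* Convert one (XGal, YGal, ZGal) triple, given in kiloparsecs,
   to coordinates in units of 1/10th parsec. */
star_point kpc_to_deciparsec(double xgal, double ygal, double zgal)
{
    star_point p;
    p.x = xgal * KPC_TO_DECIPARSEC;
    p.y = ygal * KPC_TO_DECIPARSEC;
    p.z = zgal * KPC_TO_DECIPARSEC;
    return p;
}
```

For instance, a star listed at XGal = 1.0, YGal = -0.5, ZGal = 0.25 would come out at (10000, -5000, 2500) in the new units.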

Note that the coordinates are Galactocentric, that is, the (0,0,0) point is the center of the Milky Way. This is in contrast to the other Star TSP instances, where the coordinate system places the Sun at the (0,0,0) point. This shift does not impact the TSP solution, since moving every point by the same offset leaves all point-to-point distances unchanged.

Magellanic Clouds

Like the Gaia DR2 point set, the StarHorse collection contains several conical structures that are unlikely to be real features of the galaxy. The middle image in the Snapshots section below gives a dramatic view of two cones shooting out from the main grouping of stars. The authors of the study explain these artifacts in the following passage from Section 4.4 of their research paper.

"Especially in this panel we note two overdensities in the direction of the Magellanic Clouds. These are mostly composed of stars belonging to the Clouds that have been forced to smaller distances by our Milky Way prior (which does not contain any extragalactic stellar population, only a smooth halo with a power-law density). The results for these stars have not been excluded from our analysis, but should be used with caution. The same is true for other nearby galaxies with resolved stellar populations, such as the Sagittarius dSph, Fornax, etc."

The cones are just cool views of the Large and Small Magellanic Clouds, two dwarf galaxies that are actually much farther away than any of the point estimates given by the models and available data.

Computing Distances

To create an instance of the TSP, we need to specify precisely the point-to-point distances we use. For this, we adopt the standard TSPLIB norm for 3D Euclidean data. This norm takes the straight-line distance between two points and rounds the resulting value to the nearest integer. In our case, the star-to-star distance is therefore measured to the nearest 1/10th parsec. Here is a simplified version of the computer code used in Concorde for the distance calculation.


StarHorse points full view
StarHorse points zoom 1
StarHorse points zoom 2

MP4 Video

The video zooms out through the points, giving you a feeling for the size of the data set.


The point set is rendered with the three.js JavaScript 3D library.