Slice of Gaia DR2 tour segment

gaia1331906450 data

Many thanks to the ESA Gaia mission and to the Max Planck Institute for Astronomy team for making their Gaia DR2 positioning estimates readily available. You can find links to the information on the "Estimating distances from parallaxes. IV. Distances to 1.33 billion stars in the Gaia data release 2" page, run by Coryn Bailer-Jones.

To build the Gaia DR2 TSP instance, we downloaded from the Bailer-Jones site their entire catalogue, extracting the source_id and rest point distance estimate for each of the 1,331,909,727 entries. Their research paper provides the following description.

"Here we infer distances to essentially all 1.33 billion stars with parallaxes published in the second Gaia data release. This is done using a weak distance prior that varies smoothly as a function of Galactic longitude and latitude according to a Galaxy model. The irreducible uncertainty in the distance estimate is characterized by the lower and upper bounds of an asymmetric confidence interval. Although more precise distances can be estimated for a subset of the stars using additional data (such as photometry), our goal is to provide purely geometric distance estimates, independent of assumptions about the physical properties of, or interstellar extinction towards, individual stars."

The "essentially all" phrase refers to the fact that for 3,278 of the stars rest is labeled as NaN (not-a-number); they are included in the data base for completeness, but obviously cannot be part of the TSP instance. Excluding these, we are left with a whopping 1,331,906,450 points, including an entry for the Sun.

Using the source_id for each star, we paired the distance estimates with Gaia DR2 positioning data, described in Section 14.1.1 of the DR2 documentation.

RA: Right ascension (double, Angle[deg])
Barycentric right ascension α of the source in ICRS at the reference epoch ref_epoch

DEC: Declination (double, Angle[deg])
Barycentric declination δ of the source in ICRS at the reference epoch ref_epoch

The two values place the star at a point on a sphere centered at Earth. (Clear explanations are given on the Right ascension and Declination Wikipedia pages.) To obtain xyz coordinates, move on a straight line, starting at Earth, in the direction of the star's spherical location and stop when you reach the Bailer-Jones et al. distance estimate.

If you are replicating the Gaia DR2 TSP data, as a check here is a file with the xyz coordinates for the first 1,000 points, each with the corresponding DR2 source_id. Note that we scaled the coordinates to put them in units of 1/10th parsecs, due to the TSPLIB rounding, described below.

Keep in mind that the 3D positions of the stars are rough estimates. For example, the data set has 21 stars with estimated distance less than 1 parsec from Earth, even though our nearest neighbor, Proxima Centauri, is out there at 1.3 parsecs. Concerning this, Bailer-Jones et al. write "The smallest values of rest are almost certainly the result of spurious parallaxes." So if you are planning to go star hopping, better first contact The Culture for an up-to-date 3D map. But for the TSP, the data set gives an exciting optimization challenge at an almost unimaginable scale.

Magellanic Clouds

The Gaia DR2 point set contains a number of conical structures that seem unlikely to be real features of the galaxy. You can see several of these in the images displayed below.

In the following quote from Section 3.4 of the Bailer-Jones et al. paper, the authors explain that these points arise from observing stars in the Large and Small Magellanic Clouds, two dwarf galaxies that are actually much further away than any of the point estimates given by their mathematical model.

"Yet we clearly see features that are not present in the prior map, such as the Large and Small Magellanic Clouds. Our mean distances to these satellite galaxies are of course underestimated, because the stars have very large fractional parallax uncertainties, so our estimates are prior-dominated. The largest length scales in the prior model are only about 2.6 kp -- corresponding to prior modes of 5.2 kpc -- whereas these galaxies are in fact about 50-60 kpc away. The same problem applies to distant giants, because the prior will normally be dominated by the nearer dwarfs in the model."

This might cause confusion for a future star traveler, but, for the TSP challenge, it adds extra spice to the point set.

Computing Distances

To create an instance of the TSP, we need to specify precisely the point-to-point distances we use. For this we adopt the standard TSPLIB norm for 3D Euclidean data. This norm takes the straight-line distance between two points and rounds the resulting value to the nearest integer. In our case, the star-to-star distance is therefore measured to the nearest 1/10th parsec. Here is a simplified version of the computer code used in Concorde for the distance calculation.

Snapshots of the Point Set

Gaia DR2 points
Gaia DR2 points zoom
Gaia DR2 points zoom 2


The point set is rendered with the three.js JavaScript 3D library.