Slice of Gaia DR1 tour


To begin the data extraction, we downloaded the tgas_source data files from the Gaia Archive. The fields in these files are specified in Section 7.5 of the Gaia DR1 documentation. Of direct interest to us are Fields 5 and 7, labeled "Right ascension" and "Declination", which Section 1.1 of the documentation describes as follows.

ra: Right ascension (double, Angle[deg])
Barycentric right ascension α of the source in ICRS at the reference epoch ref_epoch

dec: Declination (double, Angle[deg])
Barycentric declination δ of the source in ICRS at the reference epoch ref_epoch

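Together, the two values place the star at a point on a sphere centered at the solar system barycenter. At stellar distances this is, for all practical purposes, a sphere centered at Earth, which is just what we need. (Clear explanations are given on the Right ascension and Declination Wikipedia pages.)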
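As a minimal sketch of pulling out these two fields, the Python below reads ra and dec by column name rather than by position. It assumes the tgas_source files are CSV with a header row whose column names match the documented field names ra and dec. The function name read_positions and the sample rows are made up for illustration.

```python
import csv
import io

def read_positions(csv_text):
    """Return a list of (ra_deg, dec_deg) pairs from tgas_source-style CSV text.

    Assumption: the CSV has a header row that includes columns named "ra" and
    "dec" (the field names in the Gaia DR1 documentation). Looking columns up
    by name avoids depending on their numeric position in the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    positions = []
    for row in reader:
        ra = float(row["ra"])    # degrees, ICRS, at ref_epoch
        dec = float(row["dec"])  # degrees, ICRS, at ref_epoch
        positions.append((ra, dec))
    return positions

# Hypothetical two-row sample; the real files have many more columns and rows.
sample = """source_id,ref_epoch,ra,ra_error,dec
1,2015.0,45.0,0.2,-30.0
2,2015.0,180.5,0.3,10.25
"""
print(read_positions(sample))  # [(45.0, -30.0), (180.5, 10.25)]
```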

Now for the distance estimates. In their research paper "Estimating distances from parallaxes. III. Distances of two million stars in the Gaia DR1 catalogue", Astraatmadja and Bailer-Jones write the following.

"We validate our distance estimates using more precise distances for Cepheid stars in TGAS taken from Groenewegen (2013). We found that for distances closer than 2000pc, the Milky Way prior performs better than the exponentially decreasing space density prior. Beyond 2000pc, the Milky Way prior performs worse for this sample (which are intrinsically bright and distant stars) because it assumes that stars are more likely to be closer in the disc than further away. Our exponentially decreasing space density prior has a longer scale length and thus performs better on this sample when faced with the same poor measurements. But overall the Milky Way prior performs better."
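To make "median of the posterior" concrete, here is a sketch for the simpler of the two priors named in the quote, the exponentially decreasing space density prior. This is not the Milky Way prior we actually used. It assumes the prior form P(r) proportional to r^2 exp(-r/L), a Gaussian parallax likelihood centered on 1000/r mas for a star at r pc, and a scale length L = 1350 pc picked for illustration. The function name, grid limits, and step size are our own arbitrary choices.

```python
import math

def edsd_posterior_median(parallax_mas, sigma_mas, length_pc=1350.0,
                          r_max_pc=10000.0, step_pc=0.05):
    """Median distance (pc) of the posterior under an exponentially
    decreasing space density prior (illustrative sketch, assumed form).

    Unnormalized posterior on a uniform grid:
        r^2 * exp(-r / L) * exp(-(parallax - 1000 / r)^2 / (2 * sigma^2))
    with parallax in mas and r in pc. The median is the grid point where the
    cumulative posterior mass first reaches half of the total.
    """
    n = int(r_max_pc / step_pc)
    rs = [(i + 1) * step_pc for i in range(n)]  # start above r = 0, where 1000/r blows up
    post = [
        r * r * math.exp(-r / length_pc)
        * math.exp(-((parallax_mas - 1000.0 / r) ** 2) / (2.0 * sigma_mas ** 2))
        for r in rs
    ]
    total = sum(post)
    running = 0.0
    for r, p in zip(rs, post):
        running += p
        if running >= 0.5 * total:
            return r
    return rs[-1]

# A 10 mas parallax measured to 1% should give a median close to 1000/10 = 100 pc.
print(edsd_posterior_median(10.0, 0.1))
```

With a precise parallax the likelihood dominates and the prior barely matters; as sigma_mas grows, the prior's scale length pulls the median outward, which is the behavior the quote describes for distant, poorly measured stars.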

We therefore selected the estimates obtained with the Milky Way prior. We grabbed these from the Astraatmadja and Bailer-Jones site, downloading the raw data file tgas_dist_all_v01.csv.gz. Their README directs us to the 21st field (bytes 302-315), labeled "r50[pc]" with the description "50th percentile (i.e. the median) of the posterior, using Milky Way Prior". This is the value we used to place each star in three-dimensional Euclidean space.
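A minimal sketch of this last step, assuming each record keeps the fixed byte positions the README lists and that a star's ra, dec, and r50 values have already been matched up: slice out bytes 302-315 (Python slice [301:315], since the README numbers bytes from 1), then convert the spherical coordinates to Cartesian x, y, z in parsecs. The function names and the axis convention (x toward ra = 0, dec = 0; z toward the north celestial pole) are our own choices, not taken from either data source.

```python
import math

def parse_r50(record):
    """Extract r50[pc] from one fixed-width record.

    Assumption: bytes 302-315 (1-based, inclusive) hold the value, which is
    the 14-character Python slice [301:315]. float() ignores padding spaces.
    """
    return float(record[301:315])

def to_cartesian(ra_deg, dec_deg, r_pc):
    """Convert (ra, dec, distance) to (x, y, z) in parsecs.

    x points toward ra = 0, dec = 0; y toward ra = 90 deg, dec = 0;
    z toward the north celestial pole (dec = +90 deg).
    """
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    cos_dec = math.cos(dec)
    return (r_pc * cos_dec * math.cos(ra),
            r_pc * cos_dec * math.sin(ra),
            r_pc * math.sin(dec))

# Hypothetical record: 301 filler bytes, then r50 right-aligned in 14 bytes.
record = " " * 301 + f"{250.0:14.4f}"
r = parse_r50(record)
x, y, z = to_cartesian(90.0, 0.0, r)  # a star on the +y axis, about 250 pc away
```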

We don't make any claims about the data set, other than that it provides an exciting large-scale challenge for the traveling salesman problem.
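For the traveling salesman problem, the quantity to minimize is the length of a closed tour through all of the star positions. Here is a small Python sketch of that objective. It assumes plain floating-point Euclidean distance in parsecs with no rounding; how the actual challenge rounds distances is not specified here. The function name and the four-point example are hypothetical.

```python
import math

def tour_length(points, order):
    """Length of the closed tour points[order[0]] -> points[order[1]] -> ... -> back to start.

    points: list of (x, y, z) tuples in parsecs.
    order:  a permutation of range(len(points)) giving the visiting sequence.
    """
    total = 0.0
    for i in range(len(order)):
        a = points[order[i]]
        b = points[order[(i + 1) % len(order)]]  # wrap around to close the tour
        total += math.dist(a, b)                 # straight-line distance (Python 3.8+)
    return total

# Four hypothetical stars at the corners of a 1 pc square in the z = 0 plane.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
print(tour_length(pts, [0, 1, 2, 3]))  # 4.0, going around the square
print(tour_length(pts, [0, 2, 1, 3]))  # 2 + 2*sqrt(2), crossing both diagonals
```

Solving the TSP means finding the order that minimizes this sum; for the full star catalogue that search, not evaluating a single tour, is the hard part.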