Minimal spanning tree based data configurations – construction and distributions

 

Adam Rahman & Wayne Oldford

 

Interesting low dimensional data structure can be determined from the geometric configuration of minimal spanning trees of projected data, as proposed by the ``scagnostic’’ measures of Wilkinson and Wills (2008).   We explore the `null’ distributions of these measures and, for non-null comparisons, propose and illustrate an algorithm that generates data configurations embedded in 2d having a given set of minimal spanning tree distances (and optionally constrained to exhibit other visually interesting structure).  These configurations are solutions of a more general constrained Euclidean Distance Matrix Completion Problem.  Applications and the problem of embedding in a higher dimension will also be discussed.

 

Building off the work of Wilkinson & Wills in the field of scagnostics, we pay particular attention to those scagnostics that are a product of the Minimum Spanning Tree of the point scatter. After a brief exploration of the distributions of these scagnostics, we propose an important modification to the skewed scagnostic, and use this as a springboard to explore the more general problem of Euclidean Distance Matrix Completion Problems. We propose a constructive visualisation method to solve the EDMCP in two dimensions, when the only specified distances are that of the Minimum Spanning Tree.  Utilizing this method, we are able to obtain many different completions of the EDM for comparison, as well as create structure in the point scatter based on certain scagnostic values.  We then extend this method to higher dimensions, and explore the added complexities of doing so.