Minimal spanning tree based data configurations – construction and distributions
Adam Rahman & Wayne
Oldford
Interesting low
dimensional data structure can be determined from the geometric configuration
of minimal spanning trees of projected data, as proposed by the ``scagnostic’’ measures of Wilkinson and Wills (2008). We explore the `null’ distributions of these
measures and, for non-null comparisons, propose and illustrate an algorithm
that generates data configurations embedded in 2d having a given set of minimal
spanning tree distances (and optionally constrained to exhibit other visually interesting
structure). These configurations are
solutions of a more general constrained Euclidean Distance Matrix Completion
Problem. Applications and the problem of
embedding in a higher dimension will also be discussed.
Building off the work of Wilkinson & Wills in the field of scagnostics, we pay particular attention to those scagnostics that are a product of the Minimum Spanning Tree of the point scatter. After a brief exploration of the distributions of these scagnostics, we propose an important modification to the skewed scagnostic, and use this as a springboard to explore the more general problem of Euclidean Distance Matrix Completion Problems. We propose a constructive visualisation method to solve the EDMCP in two dimensions, when the only specified distances are that of the Minimum Spanning Tree. Utilizing this method, we are able to obtain many different completions of the EDM for comparison, as well as create structure in the point scatter based on certain scagnostic values. We then extend this method to higher dimensions, and explore the added complexities of doing so.