This notebook contains the text, figures, and source code for my Scientific American blog post.
Note that I have hidden much of the code for this post at the end of the notebook. To execute the code in this notebook, please start by running the functions defined at the end of the text, then return to the beginning.
The content of this notebook is BSD licensed, ©2014, Jake VanderPlas
One of the largest treasure troves of astronomical data comes from the Sloan Digital Sky Survey (SDSS), an ongoing scan of the firmament that began 15 years ago. Its catalogue covers 35 percent of the sky and contains multicolor observations of hundreds of millions of distinct galaxies, stars and quasars. If a person were to attempt to individually inspect each of these objects at a rate of one per second through the workday, it would be a full-time job lasting over 60 years!
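As a rough check on that figure, the arithmetic can be sketched in a few lines. The object count, the eight-hour workday, and the 250 working days per year are all illustrative assumptions of mine, not numbers taken from the survey.

```python
# Back-of-the-envelope estimate of the time needed to inspect every object.
# All three inputs are assumptions chosen for illustration.
n_objects = 450_000_000          # assumed: "hundreds of millions" of objects
seconds_per_workday = 8 * 3600   # assumed: eight-hour workday, one object per second
workdays_per_year = 250          # assumed: roughly 50 five-day weeks

objects_per_year = seconds_per_workday * workdays_per_year
years_needed = n_objects / objects_per_year
print(f"{years_needed:.1f} years")
```

With these inputs a single inspector gets through about 7.2 million objects a year, so any catalogue of several hundred million objects lands in the range of many decades.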
Fortunately, such individual inspection is not how astronomers work. Instead, we use various specialized algorithms to automatically sift through and categorize this vast data set, and we dream up novel visualization schemes to make clear at a glance the relationships between thousands or millions of individual objects.
One of my favorite examples of this type of visualization involves a subset of the above data, the SDSS Moving Object Catalog, which my colleague Zeljko Ivezic has been instrumental in producing and collating. This catalogue contains detailed data on nearly half a million small asteroids orbiting our sun, allowing us not only to track the orbital path of individual asteroids but also to gain insight into the chemical composition and formation history of individual objects and the solar system as a whole. The following video simulation from Alex Parker gives a glimpse into the orbital characteristics of this data:
from IPython.display import YouTubeVideo

YouTubeVideo('Xo7A-tKITlo')
The SDSS data gives us much more than just the orbital dynamics. Multiband imaging gives us detailed measurements of the color of reflected sunlight off each of these asteroids. Just as on Earth our eyes can distinguish white quartz from dark basalt based on how they each reflect sunlight, the SDSS telescope can distinguish among different chemical compositions of asteroids based on how their surfaces reflect sunlight.
We can summarize this chemical information with two "color" measurements: color in the optical range and color in the near-infrared range. Combining this with the semimajor axis (a measure of the size of the orbit around the sun) and the inclination (a measure of how "tilted" the orbit is compared with Earth’s orbital plane) gives us a four-dimensional data set: four properties of each asteroid that contain information about its orbit and chemical composition.
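To make the shape of such a data set concrete, here is a minimal sketch of how the four properties might be held in a single NumPy structured array. The field names, the units, and the random values are hypothetical stand-ins; this is not the catalog's actual schema, and no real data is loaded.

```python
import numpy as np

# Synthetic stand-in values; the real catalog is not loaded here.
rng = np.random.default_rng(0)
n_asteroids = 1000

# Hypothetical field names for the four properties described in the text.
asteroids = np.zeros(n_asteroids, dtype=[
    ("optical_color", float),
    ("nir_color", float),
    ("semimajor_axis", float),   # AU
    ("inclination", float),      # degrees (unit assumed)
])
asteroids["optical_color"] = rng.normal(0.0, 0.1, n_asteroids)
asteroids["nir_color"] = rng.normal(0.0, 0.1, n_asteroids)
asteroids["semimajor_axis"] = rng.uniform(2.0, 3.3, n_asteroids)
asteroids["inclination"] = rng.uniform(0.0, 20.0, n_asteroids)

# Each row is one asteroid; each named column is one of the four dimensions.
print(asteroids.dtype.names)
print(asteroids.shape)
```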
With this four-dimensional data set decided, we can now think about how to best visualize it. One-dimensional data fits on a number line; two-dimensional data can be plotted on a flat page or screen; three-dimensional data can be conceived as a 3-D plot, perhaps rotating on a computer screen; but how do you effectively plot four-dimensional data?
We can start simple, by splitting the data into chemical indicators on one hand and orbital indicators on the other:
This visualization is full of information. The left panel is known as a color–color diagram and distinguishes between broad classes of asteroid chemistries. The left-most clump in this panel is primarily carbonaceous (C-type) asteroids whereas the right-most clump is primarily silicaceous (S-type) asteroids. The faint downward extension of the right-most clump is V-type asteroids, known to be associated with the asteroid Vesta.
The right panel showing the orbital characteristics offers even more insight. Immediately we see that there is some intriguing structure to the data: clumps and clusters as well as specific regions that are devoid of any asteroids at all. These clumps are known as asteroid families, and the vertical voids near 2.5 astronomical units (AU) and 2.8 AU are areas of orbital resonance with Jupiter: In this particular region of the solar system, these resonance effects quickly kick any asteroids out of their orbits. (An astronomical unit is the mean Earth–sun distance, or about 149.6 million kilometers.)
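The locations of these voids can be estimated from Kepler's third law, under which the square of the orbital period scales as the cube of the semimajor axis. The sketch below assumes the two voids correspond to the 3:1 and 5:2 resonances (the asteroid completes three orbits for each one of Jupiter's, or five for each two) and takes Jupiter's semimajor axis to be about 5.2 AU; neither of those specifics is stated in the text above.

```python
# Kepler's third law: period**2 is proportional to semimajor_axis**3, so an
# asteroid that completes p orbits for every q orbits of Jupiter sits at
#     a = a_jupiter * (q / p) ** (2 / 3)
A_JUPITER_AU = 5.204  # assumed value for Jupiter's semimajor axis, in AU


def resonance_semimajor_axis(p, q, a_jupiter=A_JUPITER_AU):
    """Semimajor axis (AU) at which an asteroid orbits p times per q Jupiter orbits."""
    return a_jupiter * (q / p) ** (2 / 3)


# Assumed resonances: 3:1 and 5:2.
for p, q in [(3, 1), (5, 2)]:
    print(f"{p}:{q} resonance near {resonance_semimajor_axis(p, q):.2f} AU")
```

Both values fall close to the 2.5 AU and 2.8 AU voids visible in the orbital panel.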
These are all interesting insights, but what we'd really like is to see intuitively how the chemistry reflected in the left panel is related to the orbital dynamics reflected in the right panel. Such a four-dimensional relationship is very difficult to capture.
One common way to visualize high-dimensional data is to use a grid of multiple two-dimensional plots. In this way we can plot each pair of features against one another and look at the correlations. Of course, the two panels from above are just a subset of the six distinct plots (each with a mirror image) created by this method:
This plot conveys a lot of information, and there are some intriguing pieces. For example, in the panel comparing near-infrared color to orbital inclination (top row, second from the left) we see a distinct clump of data: These are points that are clustered both in color and inclination. Further investigation shows this clump reflects the Vesta family, a chemically similar group of asteroids that also shares the same orbital inclination. We'll return to these below.
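A grid of this kind can be sketched with plain matplotlib. The example below is a minimal illustration on random stand-in data, not the code behind the figure described above; the `plot_pair_grid` helper and the column labels are hypothetical names introduced here.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data with four columns; the labels are illustrative only.
rng = np.random.default_rng(0)
labels = ["optical color", "near-IR color", "semimajor axis", "inclination"]
data = rng.normal(size=(500, len(labels)))

# Every unordered pair of the four features: six distinct panels.
pairs = list(combinations(range(len(labels)), 2))


def plot_pair_grid(data, labels):
    """Scatter every feature against every other one in an n-by-n grid."""
    n = len(labels)
    fig, axes = plt.subplots(n, n, figsize=(8, 8))
    for row in range(n):
        for col in range(n):
            ax = axes[row, col]
            if row == col:
                # Diagonal: name the feature instead of plotting it against itself.
                ax.text(0.5, 0.5, labels[row], ha="center", va="center",
                        transform=ax.transAxes)
                ax.set_xticks([])
                ax.set_yticks([])
            else:
                # Panel (row, col) is the mirror image of panel (col, row).
                ax.scatter(data[:, col], data[:, row], s=2, alpha=0.5)
    fig.tight_layout()
    return fig


fig = plot_pair_grid(data, labels)
```

The twelve off-diagonal panels are the six distinct pairs plus their mirror images, which is why only half of the grid carries new information.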
Another common high-dimensional visualization technique is to treat color as an added dimension. This way a standard two-dimensional plot can reflect three dimensions of information. Let's try visualizing the four dimensions in this way: We'll do two versions of the orbital inclination plot, using a different color scale in each plot:
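A minimal sketch of the idea, again on random stand-in data rather than the catalog itself: the same orbital scatter plot is drawn twice, and the points are colored by a third quantity in each panel. I am reading "a different color scale in each plot" as coloring by optical color in one panel and by near-infrared color in the other, which is an assumption; the variable names and units are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in values for the four properties.
rng = np.random.default_rng(0)
n = 500
semimajor_axis = rng.uniform(2.0, 3.3, n)   # AU
inclination = rng.uniform(0.0, 20.0, n)     # degrees (unit assumed)
optical_color = rng.normal(0.0, 0.1, n)
nir_color = rng.normal(0.0, 0.1, n)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
color_sources = [(optical_color, "optical color"),
                 (nir_color, "near-infrared color")]

for ax, (values, name) in zip(axes, color_sources):
    # Position encodes two dimensions; the c= argument encodes a third.
    points = ax.scatter(semimajor_axis, inclination, c=values, s=4,
                        cmap="viridis")
    fig.colorbar(points, ax=ax, label=name)
    ax.set_xlabel("semimajor axis (AU)")

axes[0].set_ylabel("inclination")
```

Each panel shows three of the four dimensions at once, so the pair of panels together covers all four.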