This section is not about feature engineering for an ML algorithm. Instead, it enriches the asteroid dataframe with additional information.
# Import standard libraries
import os
import pathlib
# Import installed libraries
import pandas as pd
# Let's mount the Google Drive, where we store files and models (if applicable, otherwise work
# locally)
try:
    from google.colab import drive
    drive.mount('/gdrive')
    core_path = "/gdrive/MyDrive/Colab/asteroid_taxonomy/"
except ModuleNotFoundError:
    core_path = ""
# Read the level 1 dataframe
asteroids_df = pd.read_pickle(os.path.join(core_path, "data/lvl1/", "asteroids_merged.pkl"))
A great summary of asteroid classification schemas, the science behind them, and some historical context can be found here. One of its flow charts shows how the various classification schemas are linked. On the right side, the flow chart merges into a few general "main groups". These groups are C, S, X, and Other, as encoded in the mapping that follows.
# Create a dictionary that maps the Bus Classification with the main group
bus_to_main_dict = {
'A': 'Other',
'B': 'C',
'C': 'C',
'Cb': 'C',
'Cg': 'C',
'Cgh': 'C',
'Ch': 'C',
'D': 'Other',
'K': 'Other',
'L': 'Other',
'Ld': 'Other',
'O': 'Other',
'R': 'Other',
'S': 'S',
'Sa': 'S',
'Sk': 'S',
'Sl': 'S',
'Sq': 'S',
'Sr': 'S',
'T': 'Other',
'V': 'Other',
'X': 'X',
'Xc': 'X',
'Xe': 'X',
'Xk': 'X'
}
# Create a new "main group class"
asteroids_df.loc[:, "Main_Group"] = asteroids_df["Bus_Class"].apply(
    lambda x: bus_to_main_dict.get(x, "None")
)
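As a quick sanity check on the mapping, one could count how many asteroids fall into each main group and list any Bus classes that the dictionary does not cover. The sketch below is self-contained and therefore uses a small stand-in dataframe (`demo_df`) and a shortened dictionary (`demo_map`). Both names are hypothetical, as is the made-up class `"Zz"`, which only serves to trigger the fallback. With the real data, `asteroids_df` and `bus_to_main_dict` would take their place.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the asteroid dataframe;
# "Demo" / "Zz" is an invented entry that is NOT part of the Bus taxonomy
demo_df = pd.DataFrame({
    "Name": ["1 Ceres", "2 Pallas", "3 Juno", "4 Vesta", "Demo"],
    "Bus_Class": ["C", "B", "Sk", "V", "Zz"],
})

# Shortened, illustrative excerpt of the Bus-to-main-group mapping
demo_map = {"B": "C", "C": "C", "Sk": "S", "V": "Other"}

# Same logic as above: unknown classes fall back to the string "None"
demo_df["Main_Group"] = demo_df["Bus_Class"].map(lambda x: demo_map.get(x, "None"))

# Number of asteroids per main group
group_counts = demo_df["Main_Group"].value_counts()

# Bus classes that the dictionary did not cover
unmapped = sorted(demo_df.loc[demo_df["Main_Group"] == "None", "Bus_Class"].unique())

print(group_counts)
print("Unmapped Bus classes:", unmapped)
```

Note that the fallback is the string `"None"`, not Python's `None`, so unmapped rows have to be found by string comparison rather than with `isna()`.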
# Remove the file path and Designation Number
asteroids_df.drop(columns=["DesNr", "FilePath"], inplace=True)
# Show the final data set for anyone who is interested ...
asteroids_df
| | Name | Bus_Class | SpectrumDF | Main_Group |
---|---|---|---|---|
0 | 1 Ceres | C | Wavelength_in_microm Reflectance_norm550n... | C |
1 | 2 Pallas | B | Wavelength_in_microm Reflectance_norm550n... | C |
2 | 3 Juno | Sk | Wavelength_in_microm Reflectance_norm550n... | S |
3 | 4 Vesta | V | Wavelength_in_microm Reflectance_norm550n... | Other |
4 | 5 Astraea | S | Wavelength_in_microm Reflectance_norm550n... | S |
... | ... | ... | ... | ... |
1334 | 1996 UK | Sq | Wavelength_in_microm Reflectance_norm550n... | S |
1335 | 1996 VC | S | Wavelength_in_microm Reflectance_norm550n... | S |
1336 | 1997 CZ5 | S | Wavelength_in_microm Reflectance_norm550n... | S |
1337 | 1997 RD1 | Sq | Wavelength_in_microm Reflectance_norm550n... | S |
1338 | 1998 WS | Sr | Wavelength_in_microm Reflectance_norm550n... | S |
1339 rows × 4 columns
# ... and also the spectrum of Ceres
# Select row and column in a single .loc call and take the first match by position
asteroids_df.loc[asteroids_df["Name"] == "1 Ceres", "SpectrumDF"].iloc[0]
| | Wavelength_in_microm | Reflectance_norm550nm |
---|---|---|
0 | 0.44 | 0.9281 |
1 | 0.45 | 0.9388 |
2 | 0.46 | 0.9488 |
3 | 0.47 | 0.9572 |
4 | 0.48 | 0.9643 |
5 | 0.49 | 0.9716 |
6 | 0.50 | 0.9788 |
7 | 0.51 | 0.9859 |
8 | 0.52 | 0.9923 |
9 | 0.53 | 0.9955 |
10 | 0.54 | 0.9969 |
11 | 0.55 | 1.0000 |
12 | 0.56 | 1.0040 |
13 | 0.57 | 1.0056 |
14 | 0.58 | 1.0037 |
15 | 0.59 | 1.0036 |
16 | 0.60 | 1.0044 |
17 | 0.61 | 1.0071 |
18 | 0.62 | 1.0107 |
19 | 0.63 | 1.0113 |
20 | 0.64 | 1.0117 |
21 | 0.65 | 1.0127 |
22 | 0.66 | 1.0128 |
23 | 0.67 | 1.0124 |
24 | 0.68 | 1.0151 |
25 | 0.69 | 1.0160 |
26 | 0.70 | 1.0146 |
27 | 0.71 | 1.0178 |
28 | 0.72 | 1.0222 |
29 | 0.73 | 1.0216 |
30 | 0.74 | 1.0191 |
31 | 0.75 | 1.0179 |
32 | 0.76 | 1.0167 |
33 | 0.77 | 1.0149 |
34 | 0.78 | 1.0161 |
35 | 0.79 | 1.0176 |
36 | 0.80 | 1.0178 |
37 | 0.81 | 1.0196 |
38 | 0.82 | 1.0200 |
39 | 0.83 | 1.0164 |
40 | 0.84 | 1.0135 |
41 | 0.85 | 1.0140 |
42 | 0.86 | 1.0147 |
43 | 0.87 | 1.0151 |
44 | 0.88 | 1.0142 |
45 | 0.89 | 1.0146 |
46 | 0.90 | 1.0165 |
47 | 0.91 | 1.0181 |
48 | 0.92 | 1.0200 |
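The column name `Reflectance_norm550nm` suggests that each spectrum is normalized to a reflectance of 1 at 550 nm (0.55 µm), which matches the Ceres table above. A minimal sketch of how this could be checked is shown here, using a handful of rows copied from that table. The variable names `spectrum_df`, `idx_550` and `reflectance_550` are illustrative only.

```python
import pandas as pd

# A few rows copied from the Ceres spectrum shown above
spectrum_df = pd.DataFrame({
    "Wavelength_in_microm": [0.44, 0.54, 0.55, 0.56, 0.92],
    "Reflectance_norm550nm": [0.9281, 0.9969, 1.0000, 1.0040, 1.0200],
})

# Find the row whose wavelength is closest to 0.55 micrometers (550 nm)
idx_550 = (spectrum_df["Wavelength_in_microm"] - 0.55).abs().idxmin()

# Reflectance at that wavelength; for a normalized spectrum this should be 1
reflectance_550 = spectrum_df.loc[idx_550, "Reflectance_norm550nm"]

print(reflectance_550)
```

Searching for the closest wavelength rather than testing for exact equality avoids floating-point surprises if a spectrum is sampled on a slightly different grid.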
# Create Level 2 directory and save the dataframe
pathlib.Path(os.path.join(core_path, "data/lvl2")).mkdir(parents=True, exist_ok=True)
# Save the dataframe as a pickle file
asteroids_df.to_pickle(os.path.join(core_path, "data/lvl2/", "asteroids.pkl"), protocol=4)
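Because the dataframe stores a nested dataframe in every `SpectrumDF` cell, it can be worth confirming that the pickle file restores this structure. The following sketch writes a miniature stand-in dataframe (`demo_df`, hypothetical) to a temporary directory and reads it back. It assumes nothing about the real files on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature dataframe that mimics the real structure:
# one row whose "SpectrumDF" cell holds a nested dataframe
demo_spectrum = pd.DataFrame({
    "Wavelength_in_microm": [0.44, 0.55],
    "Reflectance_norm550nm": [0.9281, 1.0000],
})
demo_df = pd.DataFrame({
    "Name": ["1 Ceres"],
    "Bus_Class": ["C"],
    "SpectrumDF": [demo_spectrum],
    "Main_Group": ["C"],
})

# Write the dataframe to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "asteroids_demo.pkl")
    demo_df.to_pickle(pickle_path, protocol=4)
    loaded_df = pd.read_pickle(pickle_path)

# The nested spectrum should survive the round trip unchanged
loaded_spectrum = loaded_df["SpectrumDF"].iloc[0]
print(loaded_spectrum.equals(demo_spectrum))
```

The nested spectra are compared individually, since comparing the outer dataframes directly would require element-wise comparison of dataframe objects.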