The goal here is to predict whether the Falcon 9 first stage will achieve a successful landing. SpaceX markets Falcon 9 rocket launches at a cost of 62 million USD, significantly lower than the 165 million USD or more charged by other providers. This cost advantage largely stems from SpaceX's ability to reuse the first stage of the rocket. By accurately predicting the likelihood of a successful first-stage landing, it becomes possible to estimate the overall cost of a launch. Such insights could be valuable for competing companies seeking to bid against SpaceX for rocket launch contracts.
The following cell will download launch data for SpaceX's Falcon 9, covering the period from 2010 to 2020.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data_raw = pd.read_csv("https://jonghank.github.io/ase3001/files/falcon9_launch_data_raw.csv")
data_raw
 | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2010-06-04 | Falcon 9 | 6104.959412 | LEO | CCAFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0003 | -80.577366 | 28.561857 | 0 |
1 | 2 | 2012-05-22 | Falcon 9 | 525.000000 | LEO | CCAFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0005 | -80.577366 | 28.561857 | 0 |
2 | 3 | 2013-03-01 | Falcon 9 | 677.000000 | ISS | CCAFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0007 | -80.577366 | 28.561857 | 0 |
3 | 4 | 2013-09-29 | Falcon 9 | 500.000000 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | NaN | 1.0 | 0 | B1003 | -120.610829 | 34.632093 | 0 |
4 | 5 | 2013-12-03 | Falcon 9 | 3170.000000 | GTO | CCAFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B1004 | -80.577366 | 28.561857 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
85 | 86 | 2020-09-03 | Falcon 9 | 15400.000000 | VLEO | KSC LC 39A | True ASDS | 2 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 2 | B1060 | -80.603956 | 28.608058 | 1 |
86 | 87 | 2020-10-06 | Falcon 9 | 15400.000000 | VLEO | KSC LC 39A | True ASDS | 3 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 2 | B1058 | -80.603956 | 28.608058 | 1 |
87 | 88 | 2020-10-18 | Falcon 9 | 15400.000000 | VLEO | KSC LC 39A | True ASDS | 6 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 5 | B1051 | -80.603956 | 28.608058 | 1 |
88 | 89 | 2020-10-24 | Falcon 9 | 15400.000000 | VLEO | CCAFS SLC 40 | True ASDS | 3 | True | True | True | 5e9e3033383ecbb9e534e7cc | 5.0 | 2 | B1060 | -80.577366 | 28.561857 | 1 |
89 | 90 | 2020-11-05 | Falcon 9 | 3681.000000 | MEO | CCAFS SLC 40 | True ASDS | 1 | True | False | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 0 | B1062 | -80.577366 | 28.561857 | 1 |
90 rows × 18 columns
data_raw['LaunchSite'].value_counts()
LaunchSite | count |
---|---|
CCAFS SLC 40 | 55 |
KSC LC 39A | 22 |
VAFB SLC 4E | 13 |
Each launch targets a dedicated orbit, and here are some common orbit types:
LEO: A low Earth orbit (LEO) is an Earth-centered orbit with an altitude of 2,000 km (1,200 mi) or less (approximately one-third of the radius of Earth), or with at least 11.25 periods per day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25. Most of the man-made objects in outer space are in LEO.
VLEO: Very low Earth orbits (VLEO) can be defined as orbits with a mean altitude below 450 km. Operating in these orbits can provide a number of benefits to Earth observation spacecraft, since the spacecraft operates closer to what it observes.
GTO: A geostationary transfer orbit (GTO) is an elliptical orbit used to deliver satellites toward geosynchronous orbit, a high Earth orbit that allows satellites to match Earth's rotation. Located 22,236 miles (35,786 kilometers) above Earth's equator, that position is a valuable spot for monitoring weather, communications, and surveillance: because the satellite orbits at the same speed that the Earth is turning, it seems to stay in place over a single longitude, though it may drift north to south, as NASA notes on its Earth Observatory website.
SSO (or SO): A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly polar orbit around a planet in which the satellite passes over any given point of the planet's surface at the same local mean solar time.
ES-L1: At the Lagrange points, the gravitational forces of the two large bodies cancel out in such a way that a small object placed in orbit there is in equilibrium relative to the center of mass of the large bodies. L1 is one such point, located between the Sun and the Earth.
HEO: A highly elliptical orbit is an elliptic orbit with high eccentricity, usually referring to one around Earth.
ISS: A modular space station (habitable artificial satellite) in low Earth orbit. It is a multinational collaborative project between five participating space agencies: NASA (United States), Roscosmos (Russia), JAXA (Japan), ESA (Europe), and CSA (Canada).
MEO: Geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below geosynchronous orbit at 35,786 km (22,236 mi), also known as intermediate circular orbits. These are most commonly at 20,200 km (12,600 mi) or 20,650 km (12,830 mi), with an orbital period of about 12 hours.
HEO (high Earth orbit): Geocentric orbits above the altitude of geosynchronous orbit (35,786 km or 22,236 mi).
GEO: A circular geosynchronous orbit 35,786 km (22,236 mi) above Earth's equator, following the direction of Earth's rotation.
PO: A polar orbit is one in which the satellite passes above or nearly above both poles of the body being orbited (usually a planet such as the Earth).
data_raw['Orbit'].value_counts()
Orbit | count |
---|---|
GTO | 27 |
ISS | 21 |
VLEO | 14 |
PO | 9 |
LEO | 7 |
SSO | 5 |
MEO | 3 |
ES-L1 | 1 |
HEO | 1 |
SO | 1 |
GEO | 1 |
In the dataset, there are multiple scenarios where the booster failed to land successfully. These outcomes are categorized as follows:
False Ocean: the booster unsuccessfully attempted a soft landing in a designated region of the ocean.
False RTLS: the booster unsuccessfully attempted a landing on a ground pad (return to launch site).
False ASDS: the booster unsuccessfully attempted a landing on an autonomous spaceport drone ship.
None ASDS and None None: no landing was attempted.
These distinctions provide detailed insight into the landing outcomes and help analyze the performance of SpaceX's landing strategies.
landing_outcomes = data_raw['Outcome'].value_counts()
landing_outcomes
Outcome | count |
---|---|
True ASDS | 41 |
None None | 19 |
True RTLS | 14 |
False ASDS | 6 |
True Ocean | 5 |
False Ocean | 2 |
None ASDS | 2 |
False RTLS | 1 |
Among the landing outcomes listed, the following are considered successful soft landings: True Ocean, True RTLS, and True ASDS.
These outcomes indicate that the booster landed safely and could potentially be recovered or reused, contributing to the cost-efficiency of SpaceX's operations.
The final column, Class, serves as the classification variable that indicates the outcome of each launch. A value of 0 means the first stage did not land successfully, and a value of 1 means the first stage landed successfully. This classification provides a binary outcome for evaluating and predicting the success of the first-stage landings.
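For illustration, here is a minimal sketch of how such a binary label could be reconstructed from the Outcome column, assuming the convention above that outcomes beginning with True count as successful; the Class_check column name is ours and not part of the dataset.
# hypothetical reconstruction of the binary label from the Outcome strings:
# "True ..." outcomes count as successful soft landings (1), everything else
# ("False ..." and "None ...") counts as unsuccessful (0)
data_raw['Class_check'] = data_raw['Outcome'].str.startswith('True').astype(int)
# if the hosted data used the same convention, this should print True
print((data_raw['Class_check'] == data_raw['Class']).all())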
sns.catplot(y="PayloadMass", x="FlightNumber", hue="Class", \
data=data_raw, aspect = 5)
plt.xlabel("Flight Number",fontsize=20)
plt.ylabel("Pay load Mass (kg)",fontsize=20)
plt.show()
sns.catplot(y='LaunchSite', x='FlightNumber', hue='Class', \
data=data_raw, aspect=5)
plt.xlabel("Flight Number",fontsize=20)
plt.ylabel("LaunchSite",fontsize=20)
plt.show()
sns.catplot(y='LaunchSite', x='PayloadMass', hue='Class', \
data=data_raw, aspect=5)
plt.xlabel("Pauload Mass(Kg)",fontsize=20)
plt.ylabel("LaunchSite",fontsize=20)
plt.show()
df_success=data_raw.groupby('Orbit')['Class'].mean()*100
df_success.plot(kind='bar', figsize=(10,6))
plt.xlabel('Orbit')
plt.ylabel('Success Rate (%)')
plt.title('Relationship between Success Rate and Orbit')
plt.show()
sns.catplot(y='Orbit', x='FlightNumber', hue='Class', \
data=data_raw, aspect=5)
plt.xlabel("Flight Number",fontsize=20)
plt.ylabel("Orbit",fontsize=20)
plt.show()
sns.catplot(y='Orbit', x='PayloadMass', hue='Class', \
data=data_raw, aspect=5)
plt.xlabel("Pauload Mass(Kg)",fontsize=20)
plt.ylabel("Orbit",fontsize=20)
plt.show()
def extract_year(df):
    # return the launch year parsed from each entry of the Date column (format: YYYY-MM-DD)
    return [d.split("-")[0] for d in df["Date"].astype(str)]

data_raw['Year'] = extract_year(data_raw)
df_success1 = data_raw.groupby('Year')['Class'].mean()*100
df_success1.plot(kind='line', figsize=(10,6))
plt.xlabel('Year')
plt.ylabel('Success Rate (%)')
plt.title('Relationship between Success Rate and Year')
plt.grid()
plt.show()
We selected the following features from the dataset to be used for predicting the success of the launch:
FlightNumber: A unique identifier for each launch.
PayloadMass: The mass of the payload (in kilograms) being carried by the rocket.
Orbit: The target orbit for the payload, useful for understanding mission objectives.
LaunchSite: The location from which the rocket was launched.
Flights: The total number of flights completed by the rocket booster.
GridFins: Whether grid fins were used during the flight (aerodynamic surfaces for control).
Reused: Whether the rocket booster had been reused in a prior mission.
Legs: Whether the rocket was equipped with landing legs for recovery.
LandingPad: The designated landing site for the booster.
Block: The version of the Falcon 9 rocket used for the launch.
ReusedCount: The number of times the booster has been reused.
Serial: The unique serial number of the rocket booster.
These features encompass both numerical and categorical data, providing a comprehensive input set for predicting the success of Falcon 9 first-stage landings.
features = data_raw[['FlightNumber', 'PayloadMass', 'Orbit', \
'LaunchSite', 'Flights', 'GridFins', \
'Reused', 'Legs', 'LandingPad', 'Block', \
'ReusedCount', 'Serial']]
features
 | FlightNumber | PayloadMass | Orbit | LaunchSite | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 6104.959412 | LEO | CCAFS SLC 40 | 1 | False | False | False | NaN | 1.0 | 0 | B0003 |
1 | 2 | 525.000000 | LEO | CCAFS SLC 40 | 1 | False | False | False | NaN | 1.0 | 0 | B0005 |
2 | 3 | 677.000000 | ISS | CCAFS SLC 40 | 1 | False | False | False | NaN | 1.0 | 0 | B0007 |
3 | 4 | 500.000000 | PO | VAFB SLC 4E | 1 | False | False | False | NaN | 1.0 | 0 | B1003 |
4 | 5 | 3170.000000 | GTO | CCAFS SLC 40 | 1 | False | False | False | NaN | 1.0 | 0 | B1004 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
85 | 86 | 15400.000000 | VLEO | KSC LC 39A | 2 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 2 | B1060 |
86 | 87 | 15400.000000 | VLEO | KSC LC 39A | 3 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 2 | B1058 |
87 | 88 | 15400.000000 | VLEO | KSC LC 39A | 6 | True | True | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 5 | B1051 |
88 | 89 | 15400.000000 | VLEO | CCAFS SLC 40 | 3 | True | True | True | 5e9e3033383ecbb9e534e7cc | 5.0 | 2 | B1060 |
89 | 90 | 3681.000000 | MEO | CCAFS SLC 40 | 1 | True | False | True | 5e9e3032383ecb6bb234e7ca | 5.0 | 0 | B1062 |
90 rows × 12 columns
In addition, one-hot encoding was applied to transform the categorical features Orbit, LaunchSite, LandingPad, and Serial. This encoding represents each category as a binary indicator column, so these features become numerical inputs suitable for machine learning models.
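For reference, here is a minimal sketch of how such an encoding could be produced with pandas' pd.get_dummies (an illustration only; the hosted CSV below was prepared separately and may differ in column naming or ordering, and its column list shows that the boolean features GridFins, Reused, and Legs were expanded in the same way).
# one-hot encode the categorical and boolean columns of the selected features;
# the remaining numerical columns are kept as-is
features_onehot = pd.get_dummies(features, columns=['Orbit', 'LaunchSite', 'LandingPad',
                                                    'Serial', 'GridFins', 'Reused', 'Legs'])
features_onehot = features_onehot.astype(float)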
You can download the dataframe containing the engineered features using the cell provided below. This dataframe includes the original data along with the one-hot encoded representations of the specified categorical variables.
data = pd.read_csv("https://jonghank.github.io/ase3001/files/falcon9_dataset.csv")
data
 | FlightNumber | PayloadMass | Flights | Block | ReusedCount | Orbit_ES-L1 | Orbit_GEO | Orbit_GTO | Orbit_HEO | Orbit_ISS | ... | Serial_B1059 | Serial_B1060 | Serial_B1062 | GridFins_False | GridFins_True | Reused_False | Reused_True | Legs_False | Legs_True | Landing_Successful |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 6104.959412 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
1 | 2 | 525.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
2 | 3 | 677.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
3 | 4 | 500.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
4 | 5 | 3170.000000 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
85 | 86 | 15400.000000 | 2 | 5 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
86 | 87 | 15400.000000 | 3 | 5 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
87 | 88 | 15400.000000 | 6 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
88 | 89 | 15400.000000 | 3 | 5 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
89 | 90 | 3681.000000 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
90 rows × 84 columns
X = data.iloc[:, :-1].to_numpy()        # features: all columns except the last
y = data.iloc[:, -1].to_numpy()*2 - 1   # labels: map {0, 1} to {-1, +1}
N = len(y)

# random 80/20 train/test split, with a fixed seed for reproducibility
rand_ind = list(range(N))
np.random.seed(3001)
np.random.shuffle(rand_ind)
N_train = int(N*0.8)
X_train = X[rand_ind[:N_train]]
y_train = y[rand_ind[:N_train]]
X_test = X[rand_ind[N_train:]]
y_test = y[rand_ind[N_train:]]
(Problem 1) Develop a Support Vector Machine (SVM) classifier using the training dataset. Select an appropriate regularization parameter ($\lambda$) to balance the trade-off between margin maximization and misclassification, ensuring the model achieves an acceptable test error. Evaluate the model by reporting the confusion matrix and calculating the misclassification error.
def solve_svm(X, y, lam=1):
    # soft-margin SVM: minimize the hinge loss plus lam * ||w||^2
    import cvxpy as cp
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    obj = cp.sum(cp.pos(1 - cp.multiply(y, (X@w - b))))  # total hinge loss
    reg = cp.sum_squares(w)                              # l2 regularization term
    problem = cp.Problem(cp.Minimize(obj + lam*reg))
    problem.solve(solver=cp.CLARABEL)
    return w.value, b.value, obj.value, reg.value
# your code here
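For reference, a minimal sketch of how the fitted parameters could be evaluated on the test set (the value of $\lambda$ below is arbitrary, not a tuned choice; selecting it and reporting the full analysis is the exercise). The classifier predicts +1 whenever $x^T w - b \ge 0$ and -1 otherwise.
w, b, _, _ = solve_svm(X_train, y_train, lam=1)   # lam=1 is only a placeholder value
y_pred = np.where(X_test @ w - b >= 0, 1, -1)     # predicted labels in {-1, +1}

# 2x2 confusion matrix and misclassification error on the test set
tp = np.sum((y_test == 1) & (y_pred == 1))     # true positives
tn = np.sum((y_test == -1) & (y_pred == -1))   # true negatives
fp = np.sum((y_test == -1) & (y_pred == 1))    # false positives
fn = np.sum((y_test == 1) & (y_pred == -1))    # false negatives
print(np.array([[tn, fp], [fn, tp]]))
print("misclassification error:", (fp + fn) / len(y_test))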
(Problem 2) Develop a Logistic Regression classifier using the training dataset. Select an appropriate regularization parameter ($\lambda$) to balance sensitivity and misclassification, ensuring the model achieves an acceptable test error. Evaluate the model by reporting the confusion matrix and calculating the misclassification error.
def solve_logistic_regression(X, y, lam=1):
    # regularized logistic regression: minimize the logistic loss plus lam * ||w||^2
    import cvxpy as cp
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    obj = cp.sum(cp.logistic(-cp.multiply(y, (X@w - b))))  # logistic loss for labels in {-1, +1}
    reg = cp.sum_squares(w)                                # l2 regularization term
    problem = cp.Problem(cp.Minimize(obj + lam*reg))
    problem.solve(solver=cp.CLARABEL)
    return w.value, b.value, obj.value, reg.value
# your code here
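Similarly, a minimal sketch of how the logistic regression output could be used (again with an arbitrary, untuned $\lambda$): the model assigns probability $\sigma(x^T w - b)$ to a successful landing, and the hard prediction is the sign of $x^T w - b$.
w, b, _, _ = solve_logistic_regression(X_train, y_train, lam=1)  # lam=1 is only a placeholder
p_test = 1 / (1 + np.exp(-(X_test @ w - b)))   # predicted probability of a successful landing
y_pred = np.where(p_test >= 0.5, 1, -1)        # hard predictions in {-1, +1}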