Author: Yury Kashnitsky. This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.
Fill in the missing code ("You code here"). No need to select answers in a webform.
Kaggle competition "Titanic: Machine Learning from Disaster".
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
import matplotlib.pyplot as plt
Read the data.
train_df = pd.read_csv("../../data/titanic_train.csv", index_col="PassengerId")
train_df.head(2)
train_df.describe(include="all")
train_df.info()
Let's drop the Cabin column, and then all rows with missing values.
train_df = train_df.drop("Cabin", axis=1).dropna()
train_df.shape
1. Build a single figure showing scatter plots for every pair of the features Age, Fare, SibSp, Parch and Survived (use scatter_matrix from Pandas or pairplot from Seaborn).
# You code here
2. How does the ticket price (Fare) depend on Pclass? Build a boxplot.
# You code here
3. Let's build the same plot, but restrict Fare to values below the 95% quantile of the original column (to drop outliers that make the plot less clear).
# You code here
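The filtering step in task 3 can be sketched separately from the plot. The snippet below is only an illustration under stated assumptions: toy_df and its values are made up as a stand-in for train_df, and fare_cap and trimmed_df are hypothetical names, not part of the assignment.

```python
import pandas as pd

# Toy stand-in for train_df (made-up values, NOT the Titanic data).
toy_df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3, 3, 3, 1, 3],
        "Fare": [80.0, 512.0, 26.0, 13.0, 7.9, 8.05, 7.25, 15.5, 71.3, 7.75],
    }
)

# 95% quantile of the full Fare column (linear interpolation by default).
fare_cap = toy_df["Fare"].quantile(0.95)

# Keep only rows with Fare strictly below the cap; the extreme fare is dropped.
trimmed_df = toy_df[toy_df["Fare"] < fare_cap]

print(f"Fare cap: {fare_cap:.2f}, rows kept: {len(trimmed_df)} of {len(toy_df)}")
```

With the real data, the filtered frame could then be passed to the same boxplot call as in task 2, for example sns.boxplot(x="Pclass", y="Fare", data=trimmed_df).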
4. How does the share of surviving passengers depend on gender? Show it with Seaborn's countplot using the hue argument.
# You code here
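A countplot shows raw counts, so it can help to check the underlying numbers as well. The following is a minimal sketch on made-up data (toy_df is a hypothetical stand-in for train_df), showing the counts a countplot with hue would draw and the per-gender survival share.

```python
import pandas as pd

# Toy stand-in for train_df (made-up values, NOT the Titanic data).
toy_df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
        "Survived": [0, 1, 1, 0, 1, 1, 0, 0],
    }
)

# Counts per (Sex, Survived) pair: the bar heights of a countplot with hue.
counts = pd.crosstab(toy_df["Sex"], toy_df["Survived"])

# Share of survivors within each gender (mean of a 0/1 column).
survival_rate = toy_df.groupby("Sex")["Survived"].mean()

print(counts)
print(survival_rate)
```

On the real data, the corresponding plot would be something like sns.countplot(x="Sex", hue="Survived", data=train_df).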
5. How does the distribution of ticket prices differ between those who survived and those who didn't? Show it with Seaborn's boxplot.
# You code here
6. How does survival depend on passengers' age? Check graphically the assumption that young passengers (< 30 y.o.) survived more frequently than old passengers (> 55 y.o.).
# You code here
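One possible way to prepare the comparison in task 6 is to label each passenger with an age group first. This is a sketch on made-up data: toy_df, the age_group helper and the AgeGroup column are all hypothetical names introduced for illustration, and the toy numbers say nothing about the actual Titanic result.

```python
import pandas as pd

# Toy stand-in for train_df (made-up values, NOT the Titanic data).
toy_df = pd.DataFrame(
    {
        "Age": [4, 22, 29, 35, 41, 58, 61, 70, 18, 56],
        "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    }
)


def age_group(age):
    """Map an age to the groups named in the task; everything else is 'middle'."""
    if age < 30:
        return "young"
    if age > 55:
        return "old"
    return "middle"


# Hypothetical helper column used only for grouping and plotting.
toy_df["AgeGroup"] = toy_df["Age"].apply(age_group)

# Share of survivors in each age group.
rate_by_group = toy_df.groupby("AgeGroup")["Survived"].mean()

print(rate_by_group)
```

The same column could feed a plot on the real data, for example sns.countplot(x="AgeGroup", hue="Survived", data=train_df) after adding AgeGroup to train_df.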