This mini project was done with the help of the website listed below, which helped me understand and analyse data scraped from the internet.
Getting started with Data Analysis with Python Pandas
More importantly, the "Titanic" dataset was retrieved from Kaggle, and the link can be found below:
The following two lines of code upload the file from your local drive. This is needed because Google Colab runs in the cloud and cannot read files on your computer directly. Make sure that you read the article linked below before you begin typing the code, so that you do not get stuck staring at the screen when an error appears.
from google.colab import files
uploaded = files.upload()
Saving train.csv to train.csv
After running the two lines above, wait until you see the 100% upload confirmation. Once you have it, run the next lines, which wrap the uploaded bytes in an in-memory buffer so that pandas can read them as a CSV file.
import io
import pandas as pd  # must be imported before pd.read_csv is called
import csv
import matplotlib.pyplot as plt
df2 = pd.read_csv(io.BytesIO(uploaded['train.csv']))
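To see what the `io.BytesIO` step does without needing Colab, here is a minimal sketch. It assumes that `files.upload()` returns a dictionary mapping each file name to its raw bytes; `uploaded_demo` and its two sample rows are made up for illustration and are not part of the Titanic data.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload() in Colab:
# file name -> raw bytes of the file (made-up sample rows).
uploaded_demo = {"train.csv": b"PassengerId,Survived,Age\n1,0,22\n2,1,38\n"}

# io.BytesIO wraps the bytes in a file-like object that read_csv can consume.
buffer = io.BytesIO(uploaded_demo["train.csv"])
demo_df = pd.read_csv(buffer)

print(demo_df.shape)
```

In other words, the buffer lets pandas treat the uploaded bytes as if they were a file on disk.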
df = pd.read_csv("train.csv")
df
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S |
12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S |
14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NaN | S |
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NaN | S |
19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
20 | 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35.0 | 0 | 0 | 239865 | 26.0000 | NaN | S |
21 | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0000 | D56 | S |
22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S |
24 | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
25 | 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38.0 | 1 | 5 | 347077 | 31.3875 | NaN | S |
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
861 | 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21.0 | 1 | 0 | 28134 | 11.5000 | NaN | S |
862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S |
863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S |
866 | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27.0 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C |
867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
869 | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4.0 | 1 | 1 | 347742 | 11.1333 | NaN | S |
870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S |
871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |
872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
873 | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47.0 | 0 | 0 | 345765 | 9.0000 | NaN | S |
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
875 | 876 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15.0 | 0 | 0 | 2667 | 7.2250 | NaN | C |
876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NaN | S |
877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NaN | S |
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C |
880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S |
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S |
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S |
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S |
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
df.head(5) # Display the first 5 rows
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.tail(5) # Display the last 5 rows
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.00 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.00 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.45 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.00 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.75 | NaN | Q |
df = pd.read_csv("train.csv", usecols= ["PassengerId", "Survived", "Pclass", "Name", "Sex","Age"])
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 |
df.describe()
| | PassengerId | Survived | Pclass | Age |
---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 |
df.sort_values("Age")  # returns a sorted copy; df itself is not modified
df.head()  # so this still shows the original order
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 |
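The output above is still in the original order because `sort_values` returns a new DataFrame instead of changing `df`. A small sketch with made-up rows (the names `demo` and `demo_sorted` are only for illustration) shows the difference between discarding and keeping the result:

```python
import pandas as pd

# Made-up rows, only for illustration.
demo = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [35.0, 22.0, 54.0]})

demo.sort_values("Age")  # the sorted copy is discarded; demo is unchanged
unchanged_first = demo.iloc[0]["Name"]

demo_sorted = demo.sort_values("Age")  # keep the sorted copy in a variable
sorted_first = demo_sorted.iloc[0]["Name"]

print(unchanged_first, sorted_first)
```

Assigning the result back, as in `df = df.sort_values(...)`, is what makes the new order stick.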
df = df.sort_values("Age", ascending = False)
df.head(5)
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 |
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 |
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 |
96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 |
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 |
result = df[df['Name'] == 'Svensson, Mr. Johan' ]
result
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 |
df["Sex"].value_counts()
male      577
female    314
Name: Sex, dtype: int64
df.nunique()
PassengerId    891
Survived         2
Pclass           3
Name           891
Sex              2
Age             88
dtype: int64
AND operator
df_age = df["Age"] < 50
df_sex_mask = df["Sex"] == "female"
df[df_age & df_sex_mask]
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
52 | 53 | 1 | 1 | Harper, Mrs. Henry Sleeper (Myna Haxtun) | female | 49.00 |
796 | 797 | 1 | 1 | Leader, Dr. Alice (Farnham) | female | 49.00 |
754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.00 |
556 | 557 | 1 | 1 | Duff Gordon, Lady. (Lucille Christiana Sutherl... | female | 48.00 |
736 | 737 | 0 | 3 | Ford, Mrs. Edward (Margaret Ann Watson) | female | 48.00 |
862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.00 |
132 | 133 | 0 | 3 | Robins, Mrs. Alexander A (Grace Charity Laury) | female | 47.00 |
871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.00 |
706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.00 |
276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.00 |
167 | 168 | 0 | 3 | Skoog, Mrs. William (Anna Bernhardina Karlsson) | female | 45.00 |
362 | 363 | 0 | 3 | Barbara, Mrs. (Catherine David) | female | 45.00 |
856 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45.00 |
440 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.00 |
523 | 524 | 1 | 1 | Hippach, Mrs. Louis Albert (Ida Sophia Fischer) | female | 44.00 |
194 | 195 | 1 | 1 | Brown, Mrs. James Joseph (Margaret Tobin) | female | 44.00 |
854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.00 |
779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton Mc... | female | 43.00 |
678 | 679 | 0 | 3 | Goodwin, Mrs. Frederick (Augusta Tyler) | female | 43.00 |
432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide ... | female | 42.00 |
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.00 |
380 | 381 | 1 | 1 | Bidois, Miss. Rosalie | female | 42.00 |
272 | 273 | 1 | 2 | Mellinger, Mrs. (Elizabeth Anne Maidment) | female | 41.00 |
337 | 338 | 1 | 1 | Burns, Miss. Elizabeth Margaret | female | 41.00 |
254 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.00 |
638 | 639 | 0 | 3 | Panula, Mrs. Juha (Maria Emilia Ojala) | female | 41.00 |
609 | 610 | 1 | 1 | Shutes, Miss. Elizabeth W | female | 40.00 |
319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corn... | female | 40.00 |
161 | 162 | 1 | 2 | Watt, Mrs. James (Elizabeth "Bessie" Inglis Mi... | female | 40.00 |
346 | 347 | 1 | 2 | Smith, Miss. Marion Elsie | female | 40.00 |
... | ... | ... | ... | ... | ... | ... |
634 | 635 | 0 | 3 | Skoog, Miss. Mabel | female | 9.00 |
852 | 853 | 0 | 3 | Boulos, Miss. Nourelain | female | 9.00 |
541 | 542 | 0 | 3 | Andersson, Miss. Ingeborg Constanzia | female | 9.00 |
147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.00 |
24 | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8.00 |
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 |
535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.00 |
720 | 721 | 1 | 2 | Harper, Miss. Annie Jessie "Nina" | female | 6.00 |
813 | 814 | 0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.00 |
777 | 778 | 1 | 3 | Emanuel, Miss. Virginia Ethel | female | 5.00 |
233 | 234 | 1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5.00 |
58 | 59 | 1 | 2 | West, Miss. Constance Mirium | female | 5.00 |
448 | 449 | 1 | 3 | Baclini, Miss. Marie Catherine | female | 5.00 |
691 | 692 | 1 | 3 | Karun, Miss. Manca | female | 4.00 |
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.00 |
750 | 751 | 1 | 2 | Wells, Miss. Joan | female | 4.00 |
184 | 185 | 1 | 3 | Kink-Heilmann, Miss. Luise Gretchen | female | 4.00 |
618 | 619 | 1 | 2 | Becker, Miss. Marion Louise | female | 4.00 |
43 | 44 | 1 | 2 | Laroche, Miss. Simonne Marie Anne Andree | female | 3.00 |
374 | 375 | 0 | 3 | Palsson, Miss. Stina Viola | female | 3.00 |
642 | 643 | 0 | 3 | Skoog, Miss. Margit Elizabeth | female | 2.00 |
205 | 206 | 0 | 3 | Strom, Miss. Telma Matilda | female | 2.00 |
530 | 531 | 1 | 2 | Quick, Miss. Phyllis May | female | 2.00 |
479 | 480 | 1 | 3 | Hirvonen, Miss. Hildur E | female | 2.00 |
297 | 298 | 0 | 1 | Allison, Miss. Helen Loraine | female | 2.00 |
119 | 120 | 0 | 3 | Andersson, Miss. Ellis Anna Maria | female | 2.00 |
381 | 382 | 1 | 3 | Nakid, Miss. Maria ("Mary") | female | 1.00 |
172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.00 |
644 | 645 | 1 | 3 | Baclini, Miss. Eugenie | female | 0.75 |
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 |
239 rows × 6 columns
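The same AND filter can be sketched on a tiny made-up table. The frame `people` and its values are hypothetical; the point is that `&` combines two boolean masks row by row, and that a missing age never satisfies `Age < 50`, so rows without an age are left out.

```python
import pandas as pd

# Made-up rows; the last one has a missing age (stored as NaN).
people = pd.DataFrame({
    "Sex": ["female", "male", "female", "female"],
    "Age": [49.0, 30.0, 58.0, None],
})

under_50 = people["Age"] < 50          # NaN < 50 evaluates to False
is_female = people["Sex"] == "female"

both = people[under_50 & is_female]    # rows where both masks are True

# Written inline, each condition needs its own parentheses.
same = people[(people["Age"] < 50) & (people["Sex"] == "female")]

print(both)
```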
OR operator
df_sex = df["Sex"] == "Male"  # case-sensitive: the column holds "male", so this mask is all False
df_age_mask = df["Age"] > 70
df[df_sex | df_age_mask]
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 |
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 |
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 |
96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 |
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 |
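Only five rows come back above because string comparison in pandas is case-sensitive: the `Sex` column stores "male", so comparing against "Male" matches nothing and the OR reduces to the age condition alone. A minimal sketch on made-up rows (the frame `people` is hypothetical) contrasts the two spellings:

```python
import pandas as pd

# Made-up rows, only for illustration.
people = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [80.0, 30.0, 25.0],
})

wrong_case = people["Sex"] == "Male"   # matches nothing: values are lowercase
right_case = people["Sex"] == "male"
over_70 = people["Age"] > 70

only_age = people[wrong_case | over_70]     # behaves like the age filter alone
male_or_old = people[right_case | over_70]  # every male, plus anyone over 70

print(len(only_age), len(male_or_old))
```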
null_mask = df["Age"].isnull()
df[null_mask]
| | PassengerId | Survived | Pclass | Name | Sex | Age |
---|---|---|---|---|---|---|
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN |
17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN |
19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN |
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN |
28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN |
29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN |
31 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN |
32 | 33 | 1 | 3 | Glynn, Miss. Mary Agatha | female | NaN |
36 | 37 | 1 | 3 | Mamee, Mr. Hanna | male | NaN |
42 | 43 | 0 | 3 | Kraeff, Mr. Theodor | male | NaN |
45 | 46 | 0 | 3 | Rogers, Mr. William John | male | NaN |
46 | 47 | 0 | 3 | Lennon, Mr. Denis | male | NaN |
47 | 48 | 1 | 3 | O'Driscoll, Miss. Bridget | female | NaN |
48 | 49 | 0 | 3 | Samaan, Mr. Youssef | male | NaN |
55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN |
64 | 65 | 0 | 1 | Stewart, Mr. Albert A | male | NaN |
65 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | NaN |
76 | 77 | 0 | 3 | Staneff, Mr. Ivan | male | NaN |
77 | 78 | 0 | 3 | Moutal, Mr. Rahamin Haim | male | NaN |
82 | 83 | 1 | 3 | McDermott, Miss. Brigdet Delia | female | NaN |
87 | 88 | 0 | 3 | Slocovski, Mr. Selman Francis | male | NaN |
95 | 96 | 0 | 3 | Shorney, Mr. Charles Joseph | male | NaN |
101 | 102 | 0 | 3 | Petroff, Mr. Pastcho ("Pentcho") | male | NaN |
107 | 108 | 1 | 3 | Moss, Mr. Albert Johan | male | NaN |
109 | 110 | 1 | 3 | Moran, Miss. Bertha | female | NaN |
121 | 122 | 0 | 3 | Moore, Mr. Leonard Charles | male | NaN |
126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN |
128 | 129 | 1 | 3 | Peter, Miss. Anna | female | NaN |
140 | 141 | 0 | 3 | Boulos, Mrs. Joseph (Sultana) | female | NaN |
154 | 155 | 0 | 3 | Olsen, Mr. Ole Martin | male | NaN |
... | ... | ... | ... | ... | ... | ... |
718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN |
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN |
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN |
738 | 739 | 0 | 3 | Ivanoff, Mr. Kanio | male | NaN |
739 | 740 | 0 | 3 | Nankoff, Mr. Minko | male | NaN |
740 | 741 | 1 | 1 | Hawksford, Mr. Walter James | male | NaN |
760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN |
766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN |
768 | 769 | 0 | 3 | Moran, Mr. Daniel J | male | NaN |
773 | 774 | 0 | 3 | Elias, Mr. Dibo | male | NaN |
776 | 777 | 0 | 3 | Tobin, Mr. Roger | male | NaN |
778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN |
783 | 784 | 0 | 3 | Johnston, Mr. Andrew G | male | NaN |
790 | 791 | 0 | 3 | Keane, Mr. Andrew "Andy" | male | NaN |
792 | 793 | 0 | 3 | Sage, Miss. Stella Anna | female | NaN |
793 | 794 | 0 | 1 | Hoyt, Mr. William Fisher | male | NaN |
815 | 816 | 0 | 1 | Fry, Mr. Richard | male | NaN |
825 | 826 | 0 | 3 | Flynn, Mr. John | male | NaN |
826 | 827 | 0 | 3 | Lam, Mr. Len | male | NaN |
828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN |
832 | 833 | 0 | 3 | Saad, Mr. Amin | male | NaN |
837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN |
839 | 840 | 1 | 1 | Marechal, Mr. Pierre | male | NaN |
846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN |
849 | 850 | 1 | 1 | Goldenberg, Mrs. Samuel L (Edwiga Grabowska) | female | NaN |
859 | 860 | 0 | 3 | Razi, Mr. Raihed | male | NaN |
863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN |
868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN |
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN |
177 rows × 6 columns
df.isnull().sum()
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
dtype: int64
df.drop(labels = ["Pclass"], axis=1).head()
| | PassengerId | Survived | Name | Sex | Age |
---|---|---|---|---|---|
630 | 631 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 |
851 | 852 | 0 | Svensson, Mr. Johan | male | 74.0 |
493 | 494 | 0 | Artagaveytia, Mr. Ramon | male | 71.0 |
96 | 97 | 0 | Goldschmidt, Mr. George B | male | 71.0 |
116 | 117 | 0 | Connors, Mr. Patrick | male | 70.5 |
Replacing the values by using the replace method
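# Note: "Nan" is a literal string, so this call does not touch missing (NaN) values.
# replace() also returns a new DataFrame; df itself is not modified.
df.replace("Nan",df["Age"].median())
df.replace("Masselmani, Mrs. Fatima", "Tanu")
| | PassengerId | Survived | Pclass | Name | Sex | Age |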
---|---|---|---|---|---|---|
630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 |
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 |
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 |
96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 |
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 |
672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 |
745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 |
33 | 34 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 |
54 | 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 |
280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 |
456 | 457 | 0 | 1 | Millet, Mr. Francis Davis | male | 65.0 |
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 |
545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.0 |
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 |
483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 |
570 | 571 | 1 | 2 | Harris, Mr. George | male | 62.0 |
252 | 253 | 0 | 1 | Stead, Mr. William Thomas | male | 62.0 |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 |
555 | 556 | 0 | 1 | Wright, Mr. George | male | 62.0 |
625 | 626 | 0 | 1 | Sutton, Mr. Frederick | male | 61.0 |
326 | 327 | 0 | 3 | Nysveen, Mr. Johan Hansen | male | 61.0 |
170 | 171 | 0 | 1 | Van der hoef, Mr. Wyckoff | male | 61.0 |
684 | 685 | 0 | 2 | Brown, Mr. Thomas William Solomon | male | 60.0 |
694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 |
587 | 588 | 1 | 1 | Frolicher-Stehli, Mr. Maxmillian | male | 60.0 |
366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 |
94 | 95 | 0 | 3 | Coxon, Mr. Daniel | male | 59.0 |
232 | 233 | 0 | 2 | Sjostedt, Mr. Ernst Adolf | male | 59.0 |
268 | 269 | 1 | 1 | Graham, Mrs. William Thompson (Edith Junkins) | female | 58.0 |
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 |
... | ... | ... | ... | ... | ... | ... |
718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN |
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN |
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN |
738 | 739 | 0 | 3 | Ivanoff, Mr. Kanio | male | NaN |
739 | 740 | 0 | 3 | Nankoff, Mr. Minko | male | NaN |
740 | 741 | 1 | 1 | Hawksford, Mr. Walter James | male | NaN |
760 | 761 | 0 | 3 | Garfirth, Mr. John | male | NaN |
766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN |
768 | 769 | 0 | 3 | Moran, Mr. Daniel J | male | NaN |
773 | 774 | 0 | 3 | Elias, Mr. Dibo | male | NaN |
776 | 777 | 0 | 3 | Tobin, Mr. Roger | male | NaN |
778 | 779 | 0 | 3 | Kilgannon, Mr. Thomas J | male | NaN |
783 | 784 | 0 | 3 | Johnston, Mr. Andrew G | male | NaN |
790 | 791 | 0 | 3 | Keane, Mr. Andrew "Andy" | male | NaN |
792 | 793 | 0 | 3 | Sage, Miss. Stella Anna | female | NaN |
793 | 794 | 0 | 1 | Hoyt, Mr. William Fisher | male | NaN |
815 | 816 | 0 | 1 | Fry, Mr. Richard | male | NaN |
825 | 826 | 0 | 3 | Flynn, Mr. John | male | NaN |
826 | 827 | 0 | 3 | Lam, Mr. Len | male | NaN |
828 | 829 | 1 | 3 | McCormack, Mr. Thomas Joseph | male | NaN |
832 | 833 | 0 | 3 | Saad, Mr. Amin | male | NaN |
837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN |
839 | 840 | 1 | 1 | Marechal, Mr. Pierre | male | NaN |
846 | 847 | 0 | 3 | Sage, Mr. Douglas Bullen | male | NaN |
849 | 850 | 1 | 1 | Goldenberg, Mrs. Samuel L (Edwiga Grabowska) | female | NaN |
859 | 860 | 0 | 3 | Razi, Mr. Raihed | male | NaN |
863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN |
868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN |
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN |
891 rows × 6 columns
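The `NaN` values are still present in the output above because `replace("Nan", ...)` looks for the literal string "Nan", not for missing values. One common alternative, sketched here on made-up rows rather than the author's data, is `fillna`; the names `ages`, `replaced`, and `filled` are hypothetical.

```python
import pandas as pd

# Made-up rows with two missing ages.
ages = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [20.0, None, 40.0, None],
})

median_age = ages["Age"].median()  # median() skips NaN by default

# The string "Nan" matches nothing, so the missing values stay missing.
replaced = ages.replace("Nan", median_age)

# fillna targets the missing values directly; assign returns a new DataFrame.
filled = ages.assign(Age=ages["Age"].fillna(median_age))

print(filled)
```

As with `sort_values`, the result has to be assigned to a variable for the change to be kept.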
In the Survived column, 1 means the passenger survived and 0 means the passenger did not survive.
count = df['Survived'].value_counts()
print(count)
# Let us see that in percentage.
percentage = df['Survived'].value_counts() * 100 / len(df)
print(percentage)
0    549
1    342
Name: Survived, dtype: int64
0    61.616162
1    38.383838
Name: Survived, dtype: float64
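The percentage can also be computed with the `normalize=True` option of `value_counts`, which returns fractions instead of counts. This is a small sketch on a made-up series, not the Titanic column itself:

```python
import pandas as pd

# Made-up outcomes: three passengers did not survive (0), one did (1).
survived = pd.Series([0, 0, 0, 1], name="Survived")

# Manual version: counts scaled by the number of rows.
manual = survived.value_counts() * 100 / len(survived)

# Built-in version: normalize=True gives fractions that sum to 1.
normalized = survived.value_counts(normalize=True) * 100

print(normalized)
```

Both approaches give the same numbers; the second simply saves the division by `len(...)`.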
%matplotlib inline
color = 0.5
df['Survived'].value_counts().plot(kind = 'bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7f0fcd0e0710>
I think this is more than enough for a good start in data analysis. Later, I will add more concepts, snippets, and examples in class to make things clearer.