In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from IPython.display import Image
import warnings
warnings.filterwarnings('ignore') # ignores annoying warnings

Lecture 12:

  • Tricks with pandas
  • Filtering the pandas way
  • Concatenating and merging pandas DataFrames

Locating and Editing Data

In the past few lectures we have learned about several different ways to filter data, from masking in NumPy arrays to filter( ) in the last lecture. Today we are going to learn about filtering pandas DataFrames to accomplish some of the same tasks, but with more flexibility and ease. Pandas provides a powerful way to tease out useful information.

We will be looking at data on Holocene Eruptions from the Smithsonian Holocene Volcano Database: http://volcano.si.edu/list_volcano_holocene.cfm (Global Volcanism Program, 2013. Volcanoes of the World, v. 4.8.7. Venzke, E (ed.). Smithsonian Institution. Downloaded 14 Mar 2020. https://doi.org/10.5479/si.GVP.VOTW4-2013). This link will download an XML file, which Excel can read, but for Pandas to work on it, you have to convert it to a regular .xls file from within Excel. I have already done that for you for what follows.

We will see how to filter these data to pull out interesting information on Holocene Eruptions. Let's read in these data and first look at their length. To read an Excel spreadsheet, we can use a new function pd.read_excel( ). If you look in the file, you will see that the header is in the second line (not the first), so we put header=1 in the argument list, after the path to the file.

In [2]:
EruptionData=pd.read_excel('Datasets/GVP_Volcano_List_Holocene.xls',header=1)
print('Number of Eruptions:',len(EruptionData))
EruptionData.head()
Number of Eruptions: 1424
Out[2]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
0 210010 West Eifel Volcanic Field Germany Maar(s) Eruption Dated 8300 BCE Mediterranean and Western Asia Western Europe 50.170 6.85 600 Foidite Rift zone / Continental crust (>25 km)
1 210020 Chaine des Puys France Lava dome(s) Eruption Dated 4040 BCE Mediterranean and Western Asia Western Europe 45.775 2.97 1464 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
2 210030 Olot Volcanic Field Spain Pyroclastic cone(s) Evidence Credible Unknown Mediterranean and Western Asia Western Europe 42.170 2.53 893 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
3 210040 Calatrava Volcanic Field Spain Pyroclastic cone(s) Eruption Dated 3600 BCE Mediterranean and Western Asia Western Europe 38.870 -4.02 1117 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
4 211003 Vulsini Italy Caldera Eruption Observed 104 BCE Mediterranean and Western Asia Italy 42.600 11.93 800 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)

Wow, that's a lot of Eruptions! However, the DataFrame has a lot of information we really aren't interested in. For example, there are many eruptions in this data for which the evidence isn't strong ('Evidence Uncertain'). You can verify this by printing out the Series "Activity Evidence".

In [3]:
EruptionData['Activity Evidence']
Out[3]:
0           Eruption Dated
1           Eruption Dated
2        Evidence Credible
3           Eruption Dated
4        Eruption Observed
5       Evidence Uncertain
6        Eruption Observed
7        Eruption Observed
8        Eruption Observed
9           Eruption Dated
10       Eruption Observed
11      Evidence Uncertain
12          Eruption Dated
13       Eruption Observed
14       Eruption Observed
15       Eruption Observed
16       Eruption Observed
17          Eruption Dated
18       Eruption Observed
19          Eruption Dated
20       Eruption Observed
21       Eruption Observed
22       Evidence Credible
23       Evidence Credible
24       Evidence Credible
25       Evidence Credible
26      Evidence Uncertain
27          Eruption Dated
28          Eruption Dated
29       Evidence Credible
               ...        
1394        Eruption Dated
1395        Eruption Dated
1396    Evidence Uncertain
1397    Evidence Uncertain
1398     Eruption Observed
1399    Evidence Uncertain
1400        Eruption Dated
1401    Evidence Uncertain
1402    Evidence Uncertain
1403    Evidence Uncertain
1404    Evidence Uncertain
1405        Eruption Dated
1406        Eruption Dated
1407     Evidence Credible
1408     Eruption Observed
1409     Eruption Observed
1410    Evidence Uncertain
1411     Evidence Credible
1412     Evidence Credible
1413     Eruption Observed
1414     Eruption Observed
1415     Eruption Observed
1416     Eruption Observed
1417     Eruption Observed
1418     Evidence Credible
1419     Unrest / Holocene
1420     Eruption Observed
1421     Eruption Observed
1422     Evidence Credible
1423    Evidence Uncertain
Name: Activity Evidence, Length: 1424, dtype: object

In Lecture 9, we learned how to filter a DataFrame by putting what we wanted in a conditional statement enclosed in square brackets. Remembering from Lecture 4 that the conditional for "equal to" is "==", we can retrieve all the rows that contain 'Eruption Observed' in the column 'Activity Evidence' like this:

In [4]:
#notice the conditional '==' which means 'equals to' from Lecture 4
ObservedEruptions=EruptionData[EruptionData['Activity Evidence']=='Eruption Observed']
ObservedEruptions.head()
Out[4]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
4 211003 Vulsini Italy Caldera Eruption Observed 104 BCE Mediterranean and Western Asia Italy 42.600 11.930 800 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
6 211010 Campi Flegrei Italy Caldera Eruption Observed 1538 CE Mediterranean and Western Asia Italy 40.827 14.139 458 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
7 211020 Vesuvius Italy Stratovolcano Eruption Observed 1944 CE Mediterranean and Western Asia Italy 40.821 14.426 1281 Phono-tephrite / Tephri-phonolite Subduction zone / Continental crust (>25 km)
8 211030 Ischia Italy Complex Eruption Observed 1302 CE Mediterranean and Western Asia Italy 40.730 13.897 789 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
10 211040 Stromboli Italy Stratovolcano Eruption Observed 2020 CE Mediterranean and Western Asia Italy 38.789 15.213 924 Trachyandesite / Basaltic Trachyandesite Subduction zone / Continental crust (>25 km)
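To see what the conditional inside the square brackets is actually doing, here is a minimal sketch on a tiny made-up DataFrame (the volcano names and the variable names toy, mask and observed are invented for illustration, not taken from the dataset). The comparison by itself produces a Series of True/False values, one per row, and indexing with that Series keeps only the True rows.

```python
import pandas as pd

# made-up stand-in for EruptionData (hypothetical volcano names)
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B', 'Volcano C'],
    'Activity Evidence': ['Eruption Observed', 'Evidence Uncertain', 'Eruption Observed'],
})

# the comparison alone gives one True/False per row
mask = toy['Activity Evidence'] == 'Eruption Observed'
print(mask.tolist())

# indexing with the mask keeps only the rows marked True
observed = toy[mask]
print(observed['Volcano Name'].tolist())
```

Notice that the surviving rows keep their original index labels (0 and 2 here), just as rows 4, 6, 7, 8 and 10 did in the output above.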

Pandas DataFrames also have an indexer called .loc (used with square brackets) that allows for filtering of DataFrames in a similar way to the familiar conditional above.

In [6]:
EruptionData.loc[EruptionData['Activity Evidence']=='Eruption Observed'].head()
Out[6]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
4 211003 Vulsini Italy Caldera Eruption Observed 104 BCE Mediterranean and Western Asia Italy 42.600 11.930 800 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
6 211010 Campi Flegrei Italy Caldera Eruption Observed 1538 CE Mediterranean and Western Asia Italy 40.827 14.139 458 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
7 211020 Vesuvius Italy Stratovolcano Eruption Observed 1944 CE Mediterranean and Western Asia Italy 40.821 14.426 1281 Phono-tephrite / Tephri-phonolite Subduction zone / Continental crust (>25 km)
8 211030 Ischia Italy Complex Eruption Observed 1302 CE Mediterranean and Western Asia Italy 40.730 13.897 789 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
10 211040 Stromboli Italy Stratovolcano Eruption Observed 2020 CE Mediterranean and Western Asia Italy 38.789 15.213 924 Trachyandesite / Basaltic Trachyandesite Subduction zone / Continental crust (>25 km)

This statement does exactly the same thing as the conditional. The syntax of a .loc statement might look trickier, but trust us, it will make your life easier as things get more complicated. It can be computationally faster and has more tricks up its sleeve, as we shall see soon. :)
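If you want to convince yourself that the two forms select the same rows, one way is to compare the results directly. This is only a sketch on made-up data (toy, mask, with_brackets and with_loc are names invented here); it uses the DataFrame method .equals( ) to test whether two DataFrames hold the same values.

```python
import pandas as pd

# made-up stand-in for EruptionData (hypothetical volcano names)
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B', 'Volcano C'],
    'Activity Evidence': ['Eruption Observed', 'Evidence Uncertain', 'Eruption Observed'],
})
mask = toy['Activity Evidence'] == 'Eruption Observed'

with_brackets = toy[mask]   # plain conditional in square brackets
with_loc = toy.loc[mask]    # the same conditional handed to .loc

# .equals() is True when both DataFrames have the same shape and values
same = with_brackets.equals(with_loc)
print(same)
```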

Now let's look at some big eruptions we might be interested in (and who wouldn't be?). One of the most famous Volcanic eruptions is the 1980 Eruption of Mount St. Helens (Washington State). To find it, let's search for Holocene Eruptions of Mount St. Helens.

In [7]:
Image(filename='Figures/StHelens.jpg')
Out[7]:

Image from: Global Volcanism Program, 2013. St. Helens (321050) in Volcanoes of the World, v. 4.7.5. Venzke, E (ed.). Smithsonian Institution. Downloaded 31 Dec 2018 (https://volcano.si.edu/volcano.cfm?vn=321050)

In [33]:
EruptionData.loc[EruptionData['Volcano Name']=='St. Helens']
Out[33]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
937 321050 St. Helens United States Stratovolcano Eruption Observed 2008 CE Canada and Western USA USA (Washington) 46.2 -122.18 2549 Dacite Subduction zone / Continental crust (>25 km)

As we can see, simple conditional statements like this enable us to filter large datasets for the small amount of information we're interested in.

Although the above statement would work equally well without .loc, we can add some bells and whistles. The .loc syntax allows you to pick out a particular column (Series) from the filtered rows by putting a comma after your conditional statement, followed by the column name. Say we wanted the 'Last Known Eruption' of all stratovolcanoes. We could do this:

In [45]:
stratos=EruptionData.loc[EruptionData['Primary Volcano Type'].str.contains('Stratovolcano'),'Last Known Eruption']
stratos.head()
Out[45]:
7     1944 CE
10    2020 CE
11    Unknown
12    1230 CE
13    1890 CE
Name: Last Known Eruption, dtype: object

Here we have used the syntax DataFrame.Series.str.contains( ). This allows us to get not only the type "Stratovolcano", but also "Stratovolcano(es)" and anything that has "Stratovolcano" in it.
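One thing to watch out for: by default .str.contains( ) treats the search string as a regular expression, so characters like parentheses have a special meaning. The sketch below, on a small made-up Series (types is an invented name), shows the optional arguments regex=False, which makes the search string literal, and na=False, which counts missing entries as "no match". Treat it as an illustration of those two arguments rather than something the volcano data necessarily needs.

```python
import pandas as pd

# made-up column of volcano types, including one missing entry
types = pd.Series(['Stratovolcano', 'Stratovolcano(es)', 'Caldera', None])

# plain substring search; na=False turns the missing entry into False
has_strato = types.str.contains('Stratovolcano', na=False)
print(has_strato.tolist())

# regex=False makes the parentheses ordinary characters,
# so only the literal text 'Stratovolcano(es)' matches
literal_plural = types.str.contains('Stratovolcano(es)', regex=False, na=False)
print(literal_plural.tolist())
```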

It is worth pointing out another way to accomplish the same thing using the method isin(). You can create a list of things you want (or don't want), then test if the string is in the list. Here is how it would work for this example to select things in the list:

In [12]:
volcano_types=['Stratovolcano','Stratovolcano(es)']
stratos=EruptionData[EruptionData['Primary Volcano Type'].isin(volcano_types)]
stratos.head()
Out[12]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
7 211020 Vesuvius Italy Stratovolcano Eruption Observed 1944 CE Mediterranean and Western Asia Italy 40.821 14.426 1281 Phono-tephrite / Tephri-phonolite Subduction zone / Continental crust (>25 km)
10 211040 Stromboli Italy Stratovolcano Eruption Observed 2020 CE Mediterranean and Western Asia Italy 38.789 15.213 924 Trachyandesite / Basaltic Trachyandesite Subduction zone / Continental crust (>25 km)
11 211041 Panarea Italy Stratovolcano Evidence Uncertain Unknown Mediterranean and Western Asia Italy 38.638 15.064 399 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
12 211042 Lipari Italy Stratovolcano(es) Eruption Dated 1230 CE Mediterranean and Western Asia Italy 38.490 14.933 590 Rhyolite Subduction zone / Continental crust (>25 km)
13 211050 Vulcano Italy Stratovolcano(es) Eruption Observed 1890 CE Mediterranean and Western Asia Italy 38.404 14.962 500 Trachybasalt / Tephrite Basanite Subduction zone / Continental crust (>25 km)

And here's how it works if you don't want things in the list:

In [13]:
volcano_types=['Stratovolcano','Stratovolcano(es)']
not_stratos=EruptionData[~EruptionData['Primary Volcano Type'].isin(volcano_types)] # ~ flips True and False in the mask
not_stratos.head()
Out[13]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
0 210010 West Eifel Volcanic Field Germany Maar(s) Eruption Dated 8300 BCE Mediterranean and Western Asia Western Europe 50.170 6.85 600 Foidite Rift zone / Continental crust (>25 km)
1 210020 Chaine des Puys France Lava dome(s) Eruption Dated 4040 BCE Mediterranean and Western Asia Western Europe 45.775 2.97 1464 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
2 210030 Olot Volcanic Field Spain Pyroclastic cone(s) Evidence Credible Unknown Mediterranean and Western Asia Western Europe 42.170 2.53 893 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
3 210040 Calatrava Volcanic Field Spain Pyroclastic cone(s) Eruption Dated 3600 BCE Mediterranean and Western Asia Western Europe 38.870 -4.02 1117 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
4 211003 Vulsini Italy Caldera Eruption Observed 104 BCE Mediterranean and Western Asia Italy 42.600 11.93 800 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
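The "not in the list" test can be written either by comparing the mask to False or with the ~ operator, which flips every True/False in a boolean Series. Here is a small sketch on made-up data (toy, in_list, not_stratos_a and not_stratos_b are invented names) showing that both spellings pick out the same rows.

```python
import pandas as pd

# made-up stand-in for EruptionData (hypothetical volcano names)
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B', 'Volcano C', 'Volcano D'],
    'Primary Volcano Type': ['Stratovolcano', 'Caldera', 'Stratovolcano(es)', 'Maar(s)'],
})
volcano_types = ['Stratovolcano', 'Stratovolcano(es)']

# True where the type is one of the entries in the list
in_list = toy['Primary Volcano Type'].isin(volcano_types)

not_stratos_a = toy[in_list == False]  # comparison form
not_stratos_b = toy[~in_list]          # ~ flips every True/False in the mask

print(not_stratos_b['Volcano Name'].tolist())
```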

Moving on.

Now we can do stuff to this filtered DataFrame stratos. The .loc syntax also allows you to take a slice through the columns list to select a specific range of column headers:

In [46]:
ColumnSlice=EruptionData.loc[EruptionData['Primary Volcano Type'].str.contains('Stratovolcano'),
                             'Volcano Name':'Last Known Eruption']
ColumnSlice.head()
Out[46]:
Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption
7 Vesuvius Italy Stratovolcano Eruption Observed 1944 CE
10 Stromboli Italy Stratovolcano Eruption Observed 2020 CE
11 Panarea Italy Stratovolcano Evidence Uncertain Unknown
12 Lipari Italy Stratovolcano(es) Eruption Dated 1230 CE
13 Vulcano Italy Stratovolcano(es) Eruption Observed 1890 CE
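Notice that the column slice above included 'Last Known Eruption' itself. Unlike ordinary Python slices, slices by label in .loc include both end points. A minimal sketch on made-up data (toy, label_slice and column_names are invented names) compares the two behaviors:

```python
import pandas as pd

# made-up stand-in for EruptionData (hypothetical values)
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B'],
    'Country': ['Italy', 'Japan'],
    'Primary Volcano Type': ['Stratovolcano', 'Caldera'],
    'Latitude': [40.0, 35.0],
})

# label slice: the end label 'Primary Volcano Type' is included
label_slice = toy.loc[:, 'Volcano Name':'Primary Volcano Type']
print(label_slice.columns.tolist())

# ordinary Python slice on a list: the end position 2 is excluded
column_names = list(toy.columns)
print(column_names[0:2])
```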

Something else .loc can do is change values in place in a DataFrame easily. Let's say we found a historical document that told us that the Unknown last eruption at Panarea was in 5000 BCE. We want to update the information in the DataFrame and can do it this way:

In [50]:
print('before modifying:\n',EruptionData.loc[EruptionData['Volcano Name']=='Panarea','Last Known Eruption'])
EruptionData.loc[EruptionData['Volcano Name']=='Panarea','Last Known Eruption']='5000 BCE'

# and let's take a look: 
print ('after modifying:\n',EruptionData.loc[EruptionData['Volcano Name']=='Panarea'])
before modifying:
 11    Unknown
Name: Last Known Eruption, dtype: object
after modifying:
     Volcano Number Volcano Name Country Primary Volcano Type  \
11          211041      Panarea   Italy        Stratovolcano   

     Activity Evidence Last Known Eruption                          Region  \
11  Evidence Uncertain            5000 BCE  Mediterranean and Western Asia   

   Subregion  Latitude  Longitude  Elevation (m)  \
11     Italy    38.638     15.064            399   

              Dominant Rock Type                              Tectonic Setting  
11  Andesite / Basaltic Andesite  Subduction zone / Continental crust (>25 km)  

As we can see, the syntax for this can get complicated quickly, but we can retrieve and/or modify lots of data using a few lines of code.
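Stripped down to its essentials, the update pattern is: build a mask for the rows, name the column after the comma, and assign. The sketch below uses a tiny made-up DataFrame (toy and mask are invented names, and the values are hypothetical) so you can see that only the matching row changes.

```python
import pandas as pd

# made-up stand-in for EruptionData (hypothetical values)
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B'],
    'Last Known Eruption': ['Unknown', '1944 CE'],
})

# rows to change
mask = toy['Volcano Name'] == 'Volcano A'

# one .loc call with [rows, column] on the left-hand side of the assignment
toy.loc[mask, 'Last Known Eruption'] = '5000 BCE'

print(toy['Last Known Eruption'].tolist())
```

Doing the row and column selection in a single .loc call, as here, means the assignment is made on the original DataFrame rather than on an intermediate selection.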

Sorting and Indexing

What if we wanted to sort our dataset so the most northerly eruptions come out on top? Pandas DataFrames have a method for this called sort_values. Normally, this will sort from lowest to highest (an "ascending" sort), but we can use the argument ascending=False to tell it to sort from highest to lowest.

In [137]:
# First read in our DataFrame again:
EruptionData=pd.read_excel('Datasets/GVP_Volcano_List_Holocene.xls',header=1)
NorthernToSouthern=EruptionData.sort_values(by='Latitude',ascending=False)
NorthernToSouthern.head()
Out[137]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
1361 377020 East Gakkel Ridge at 85°E Undersea Features Submarine Evidence Credible 1999 CE Iceland and Arctic Ocean Arctic Ocean 85.608 85.250 -3800 No Data (checked) Rift zone / Oceanic crust (< 15 km)
1360 376010 Jan Mayen Norway Stratovolcano Eruption Observed 1985 CE Iceland and Arctic Ocean Atlantic Ocean (Jan Mayen) 71.082 -8.155 2197 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1359 375010 Kolbeinsey Ridge Iceland Submarine Eruption Observed 1755 CE Iceland and Arctic Ocean North of Iceland 66.670 -18.500 5 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1355 373100 Tjornes Fracture Zone Iceland Submarine Eruption Observed 1868 CE Iceland and Arctic Ocean Iceland (northeastern) 66.309 -17.118 -75 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1354 373090 Theistareykir Iceland Shield Eruption Dated 900 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.833 -17.166 540 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)

Looks like the most northerly eruptions during the Holocene were at the East Gakkel Ridge and that the most northerly above sea-level eruption was on Jan Mayen in 1985. I bet I'm the only person you know who has actually been there!

Now let's try to get the first 10 rows in this DataFrame. We can do this using .loc, right?

In [138]:
NorthernToSouthern.loc[0:10]
Out[138]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
0 210010 West Eifel Volcanic Field Germany Maar(s) Eruption Dated 8300 BCE Mediterranean and Western Asia Western Europe 50.170 6.850 600 Foidite Rift zone / Continental crust (>25 km)
692 290350 Karpinsky Group Russia Cone(s) Eruption Observed 1952 CE Kuril Islands Kuril Islands 50.148 155.373 1326 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
932 320811 Cayley Volcanic Field Canada Volcanic field Evidence Credible Unknown Canada and Western USA Canada 50.120 -123.280 2375 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
930 320190 Garibaldi Lake Canada Volcanic field Evidence Credible Unknown Canada and Western USA Canada 49.933 -123.000 2316 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
931 320200 Garibaldi Canada Stratovolcano Eruption Dated 8060 BCE Canada and Western USA Canada 49.850 -123.000 2678 Dacite Subduction zone / Continental crust (>25 km)
689 290320 Nemo Peak Russia Caldera Eruption Observed 1938 CE Kuril Islands Kuril Islands 49.570 154.808 1018 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
817 305020 Keluo Group China Pyroclastic cone(s) Evidence Credible Unknown Kamchatka and Mainland Asia China (eastern) 49.370 125.920 670 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
688 290310 Tao-Rusyr Caldera Russia Stratovolcano Eruption Observed 1952 CE Kuril Islands Kuril Islands 49.350 154.700 1325 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
687 290300 Kharimkotan Russia Stratovolcano Eruption Observed 1933 CE Kuril Islands Kuril Islands 49.120 154.508 1145 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
684 290260 Chirinkotan Russia Stratovolcano Eruption Observed 2017 CE Kuril Islands Kuril Islands 48.980 153.480 724 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
685 290270 Ekarma Russia Stratovolcano Eruption Observed 2010 CE Kuril Islands Kuril Islands 48.958 153.930 1170 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
686 290290 Sinarka Russia Stratovolcano Eruption Observed 1878 CE Kuril Islands Kuril Islands 48.873 154.182 911 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
978 331005 West Valley Segment Canada Submarine Unrest / Holocene Unknown Hawaii and Pacific Ocean Pacific Ocean (northern) 48.780 -128.640 -2550 NaN Rift zone / Intermediate crust (15-25 km)
933 321010 Baker United States Stratovolcano(es) Eruption Observed 1880 CE Canada and Western USA USA (Washington) 48.777 -121.813 3285 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
818 305030 Wudalianchi China Volcanic field Eruption Observed 1776 CE Kamchatka and Mainland Asia China (eastern) 48.720 126.120 597 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
812 303020 Khanuy Gol Mongolia Volcanic field Evidence Credible Unknown Kamchatka and Mainland Asia Mongolia 48.670 102.750 1886 Trachyandesite / Basaltic Trachyandesite Intraplate / Continental crust (>25 km)
683 290250 Raikoke Russia Stratovolcano Eruption Observed 2019 CE Kuril Islands Kuril Islands 48.292 153.250 551 Basalt / Picro-Basalt Subduction zone / Intermediate crust (15-25 km)
811 303010 Taryatu-Chulutu Mongolia Volcanic field Eruption Dated 2980 BCE Kamchatka and Mainland Asia Mongolia 48.133 99.950 2326 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
934 321020 Glacier Peak United States Stratovolcano Eruption Dated 1700 CE Canada and Western USA USA (Washington) 48.112 -121.113 3213 Dacite Subduction zone / Continental crust (>25 km)
682 290240 Sarychev Peak Russia Stratovolcano Eruption Observed 2019 CE Kuril Islands Kuril Islands 48.092 153.200 1496 Andesite / Basaltic Andesite Subduction zone / Intermediate crust (15-25 km)
979 331010 Endeavour Segment Canada Submarine Eruption Dated 3490 BCE Hawaii and Pacific Ocean Pacific Ocean (northern) 47.950 -129.100 -2050 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
681 290220 Rasshua Russia Stratovolcano Eruption Observed 1957 CE Kuril Islands Kuril Islands 47.770 153.020 956 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km)
680 290211 Srednii Russia Submarine Evidence Credible Unknown Kuril Islands Kuril Islands 47.600 152.920 36 No Data (checked) Subduction zone / Oceanic crust (< 15 km)
679 290210 Ushishur Russia Caldera Eruption Observed 1884 CE Kuril Islands Kuril Islands 47.520 152.800 401 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km)
816 305011 Arxan-Chaihe China Pyroclastic cone(s) Eruption Dated 0 CE Kamchatka and Mainland Asia China (eastern) 47.450 120.800 1677 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
678 290200 Ketoi Russia Stratovolcano Eruption Observed 1960 CE Kuril Islands Kuril Islands 47.350 152.475 1172 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km)
677 290191 Uratman Russia Stratovolcano Evidence Credible Unknown Kuril Islands Kuril Islands 47.120 152.250 678 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km)
676 290190 Prevo Peak Russia Stratovolcano Eruption Observed 1825 CE Kuril Islands Kuril Islands 47.020 152.120 1360 Basalt / Picro-Basalt Subduction zone / Oceanic crust (< 15 km)
675 290180 Zavaritzki Caldera Russia Caldera Eruption Observed 1957 CE Kuril Islands Kuril Islands 46.925 151.950 624 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km)
980 331011 Cobb Segment Canada Submarine Eruption Dated 1180 BCE Hawaii and Pacific Ocean Pacific Ocean (northern) 46.880 -129.330 -2100 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
... ... ... ... ... ... ... ... ... ... ... ... ... ...
8 211030 Ischia Italy Complex Eruption Observed 1302 CE Mediterranean and Western Asia Italy 40.730 13.897 789 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
581 283280 Hakkodasan Japan Stratovolcano(es) Eruption Dated 1550 CE Japan, Taiwan, Marianas Honshu 40.659 140.877 1585 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
579 283270 Iwakisan Japan Stratovolcano Eruption Observed 1863 CE Japan, Taiwan, Marianas Honshu 40.656 140.303 1625 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
36 214060 Aragats Armenia Stratovolcano Evidence Credible Unknown Mediterranean and Western Asia Western Asia 40.530 44.200 4095 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
580 283271 Towada Japan Caldera Eruption Observed 915 CE Japan, Taiwan, Marianas Honshu 40.510 140.880 1011 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
956 323080 Lassen Volcanic Center United States Stratovolcano Eruption Observed 1917 CE Canada and Western USA USA (California) 40.492 -121.508 3187 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
37 214070 Ghegham Volcanic Ridge Armenia Volcanic field Eruption Dated 1900 BCE Mediterranean and Western Asia Western Asia 40.283 45.000 3597 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
39 214090 Porak Armenia-Azerbaijan Stratovolcano Eruption Dated 778 BCE Mediterranean and Western Asia Western Asia 40.028 45.740 3029 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
577 283260 Akita-Yakeyama Japan Stratovolcano Eruption Observed 1997 CE Japan, Taiwan, Marianas Honshu 39.964 140.757 1366 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
576 283250 Hachimantai Japan Stratovolcano Eruption Dated 5350 BCE Japan, Taiwan, Marianas Honshu 39.958 140.854 1613 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
578 283262 Megata Japan Maar(s) Eruption Dated 2050 BCE Japan, Taiwan, Marianas Honshu 39.950 139.730 291 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
575 283240 Iwatesan Japan Complex Eruption Observed 1919 CE Japan, Taiwan, Marianas Honshu 39.853 141.001 2038 Basalt / Picro-Basalt Subduction zone / Continental crust (>25 km)
38 214080 Vaiyots-Sar Armenia Pyroclastic cone(s) Eruption Dated 2000 BCE Mediterranean and Western Asia Western Asia 39.797 45.497 2575 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
574 283230 Akita-Komagatake Japan Stratovolcano(es) Eruption Observed 1971 CE Japan, Taiwan, Marianas Honshu 39.761 140.799 1637 Basalt / Picro-Basalt Subduction zone / Continental crust (>25 km)
40 214100 Tskhouk-Karckar Armenia-Azerbaijan Pyroclastic cone(s) Eruption Dated 3000 BCE Mediterranean and Western Asia Western Asia 39.742 45.992 3139 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
32 213040 Ararat Turkey Stratovolcano Eruption Observed 1840 CE Mediterranean and Western Asia Turkey 39.700 44.300 5165 Andesite / Basaltic Andesite Intraplate / Continental crust (>25 km)
1364 382002 Corvo Portugal Stratovolcano Evidence Credible Unknown Atlantic Ocean Azores 39.699 -31.111 718 Basalt / Picro-Basalt Intraplate / Oceanic crust (< 15 km)
975 328010 Dotsero United States Maar Eruption Dated 2200 BCE Canada and Western USA USA (Colorado) 39.661 -107.035 2230 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
969 326010 Soda Lakes United States Maar(s) Evidence Credible Unknown Canada and Western USA USA (Nevada) 39.530 -118.870 1251 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
9 211031 Palinuro Italy Submarine Eruption Dated 8040 BCE Mediterranean and Western Asia Italy 39.480 14.830 -70 Phonolite Subduction zone / Continental crust (>25 km)
1363 382001 Flores Portugal Stratovolcano(es) Eruption Dated 950 BCE Atlantic Ocean Azores 39.462 -31.216 914 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
31 213030 Tenduruk Dagi Turkey Shield Eruption Observed 1855 CE Mediterranean and Western Asia Turkey 39.356 43.874 3514 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
17 211080 Marsili Italy Submarine Eruption Dated 1050 BCE Mediterranean and Western Asia Italy 39.284 14.399 -779 NaN NaN
573 283220 Chokaisan Japan Stratovolcano(es) Eruption Observed 1974 CE Japan, Taiwan, Marianas Honshu 39.099 140.049 2236 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
1368 382040 Graciosa Portugal Stratovolcano Eruption Dated 1950 BCE Atlantic Ocean Azores 39.020 -27.970 402 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
957 323100 Clear Lake United States Volcanic field Evidence Credible Unknown Canada and Western USA USA (California) 38.970 -122.770 1439 Dacite Subduction zone / Continental crust (>25 km)
971 327050 Black Rock Desert United States Volcanic field Eruption Dated 1290 CE Canada and Western USA USA (Utah) 38.970 -112.500 1800 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
572 283210 Kurikomayama Japan Stratovolcano Eruption Observed 1950 CE Japan, Taiwan, Marianas Honshu 38.961 140.788 1627 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km)
3 210040 Calatrava Volcanic Field Spain Pyroclastic cone(s) Eruption Dated 3600 BCE Mediterranean and Western Asia Western Europe 38.870 -4.020 1117 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
10 211040 Stromboli Italy Stratovolcano Eruption Observed 2020 CE Mediterranean and Western Asia Italy 38.789 15.213 924 Trachyandesite / Basaltic Trachyandesite Subduction zone / Continental crust (>25 km)

149 rows × 13 columns

Oops! This didn't work as expected, did it? Instead of the first 10 rows, we got all the rows that lie between index label 0 and index label 10 in the sorted DataFrame (149 of them). When we sorted by Latitude, Pandas kept each row's original index label rather than assigning new ones, so the labels are no longer in numerical order, and .loc always selects by label, not by position. This is a "feature" of sorting functions. So... to get what we really wanted, which was the first 10 records in the NorthernToSouthern DataFrame, we can use .iloc, which selects by position, instead of .loc.

In [139]:
NorthernToSouthern.iloc[0:10]
Out[139]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
1361 377020 East Gakkel Ridge at 85°E Undersea Features Submarine Evidence Credible 1999 CE Iceland and Arctic Ocean Arctic Ocean 85.608 85.250 -3800 No Data (checked) Rift zone / Oceanic crust (< 15 km)
1360 376010 Jan Mayen Norway Stratovolcano Eruption Observed 1985 CE Iceland and Arctic Ocean Atlantic Ocean (Jan Mayen) 71.082 -8.155 2197 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1359 375010 Kolbeinsey Ridge Iceland Submarine Eruption Observed 1755 CE Iceland and Arctic Ocean North of Iceland 66.670 -18.500 5 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1355 373100 Tjornes Fracture Zone Iceland Submarine Eruption Observed 1868 CE Iceland and Arctic Ocean Iceland (northeastern) 66.309 -17.118 -75 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1354 373090 Theistareykir Iceland Shield Eruption Dated 900 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.833 -17.166 540 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1352 373080 Krafla Iceland Caldera Eruption Observed 1984 CE Iceland and Arctic Ocean Iceland (northeastern) 65.715 -16.728 800 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1353 373082 Heidarspordar Iceland Fissure vent Confirmed Eruption 300 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.583 -16.817 490 NaN Rift zone / Oceanic crust (< 15 km)
904 314060 Imuruk Lake United States Shield(s) Eruption Dated 300 CE Alaska Alaska (western) 65.517 -163.450 610 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
1351 373070 Fremrinamar Iceland Stratovolcano Eruption Dated 1200 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.416 -16.666 970 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
1350 373060 Askja Iceland Stratovolcano Eruption Observed 1961 CE Iceland and Arctic Ocean Iceland (northeastern) 65.033 -16.783 1080 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)

Much better. Now we can see that there were lots of eruptions during the Holocene on Iceland. Can you think of how to get names of all of the Icelandic volcanoes that erupted during the Holocene?
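The difference between .loc (labels) and .iloc (positions) is easier to see on a DataFrame small enough to check by eye. This is a sketch on made-up data (toy, by_lat and the latitude values are invented for illustration):

```python
import pandas as pd

# made-up stand-in for EruptionData; index labels are 0, 1, 2, 3
toy = pd.DataFrame({
    'Volcano Name': ['Volcano A', 'Volcano B', 'Volcano C', 'Volcano D'],
    'Latitude': [50.0, 10.0, 40.0, 80.0],
})

# sorting moves the rows but each row keeps its original label
by_lat = toy.sort_values(by='Latitude', ascending=False)
print(by_lat.index.tolist())

# .iloc counts positions: the first two rows of the sorted DataFrame
first_two = by_lat.iloc[0:2]
print(first_two['Volcano Name'].tolist())

# .loc looks up labels: everything from label 0 through label 1, inclusive,
# in the order the rows now appear
label_0_to_1 = by_lat.loc[0:1]
print(label_0_to_1['Volcano Name'].tolist())
```

Here .iloc[0:2] returns the two most northerly rows, while .loc[0:1] returns three rows, because label 0 and label 1 are no longer next to each other after sorting.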

But we can solve our indexing problem with .loc[ ] by re-indexing our sorted DataFrame. To re-index a Pandas DataFrame, we use the .set_index( ) method.

This will set the index to a list of integers from 0 up to (but not including) the length of the DataFrame.

In [140]:
# make a list of integers between zero up to (but not including) the length of the DataFrame
newIndexValues=list(range(len(NorthernToSouthern))) 
# reset the indices to this list
NorthernToSouthern=NorthernToSouthern.set_index([newIndexValues])
NorthernToSouthern.head()
Out[140]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
0 377020 East Gakkel Ridge at 85°E Undersea Features Submarine Evidence Credible 1999 CE Iceland and Arctic Ocean Arctic Ocean 85.608 85.250 -3800 No Data (checked) Rift zone / Oceanic crust (< 15 km)
1 376010 Jan Mayen Norway Stratovolcano Eruption Observed 1985 CE Iceland and Arctic Ocean Atlantic Ocean (Jan Mayen) 71.082 -8.155 2197 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
2 375010 Kolbeinsey Ridge Iceland Submarine Eruption Observed 1755 CE Iceland and Arctic Ocean North of Iceland 66.670 -18.500 5 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
3 373100 Tjornes Fracture Zone Iceland Submarine Eruption Observed 1868 CE Iceland and Arctic Ocean Iceland (northeastern) 66.309 -17.118 -75 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
4 373090 Theistareykir Iceland Shield Eruption Dated 900 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.833 -17.166 540 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
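Pandas also has a shortcut that should give the same renumbering without building the list by hand: .reset_index(drop=True). Here is a minimal sketch on a three-row toy DataFrame (the names toy, sorted_toy and renumbered are made up for this illustration; the values are copied from the table above):

```python
import pandas as pd

# Toy stand-in for the volcano table (names are illustrative only)
toy = pd.DataFrame(
    {"Volcano Name": ["Krafla", "Jan Mayen", "Askja"],
     "Latitude": [65.715, 71.082, 65.033]}
)

# Sorting keeps the original index labels, so they end up out of order
sorted_toy = toy.sort_values(by="Latitude", ascending=False)
print(list(sorted_toy.index))

# drop=True throws the old index away instead of keeping it as a new column
renumbered = sorted_toy.reset_index(drop=True)
print(list(renumbered.index))
```

Without drop=True, the old index would be kept as an extra column named 'index'.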

Another thing about indices: We can set the indices to one of the other column names, for example the "Volcano Name".

In [141]:
NorthernToSouthern=NorthernToSouthern.set_index('Volcano Name')
NorthernToSouthern.head()
Out[141]:
Volcano Number Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
Volcano Name
East Gakkel Ridge at 85°E 377020 Undersea Features Submarine Evidence Credible 1999 CE Iceland and Arctic Ocean Arctic Ocean 85.608 85.250 -3800 No Data (checked) Rift zone / Oceanic crust (< 15 km)
Jan Mayen 376010 Norway Stratovolcano Eruption Observed 1985 CE Iceland and Arctic Ocean Atlantic Ocean (Jan Mayen) 71.082 -8.155 2197 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
Kolbeinsey Ridge 375010 Iceland Submarine Eruption Observed 1755 CE Iceland and Arctic Ocean North of Iceland 66.670 -18.500 5 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
Tjornes Fracture Zone 373100 Iceland Submarine Eruption Observed 1868 CE Iceland and Arctic Ocean Iceland (northeastern) 66.309 -17.118 -75 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
Theistareykir 373090 Iceland Shield Eruption Dated 900 BCE Iceland and Arctic Ocean Iceland (northeastern) 65.833 -17.166 540 Basalt / Picro-Basalt Rift zone / Oceanic crust (< 15 km)
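One reason to index by name is that .loc can then look up a row by its label. A small sketch, assuming a toy DataFrame with values copied from the tables above (toy, by_name and restored are names invented for this example):

```python
import pandas as pd

toy = pd.DataFrame(
    {"Volcano Name": ["Jan Mayen", "Krafla", "Askja"],
     "Country": ["Norway", "Iceland", "Iceland"],
     "Elevation (m)": [2197, 800, 1080]}
)
by_name = toy.set_index("Volcano Name")

# Look up a single row by its index label; the result is a Series
krafla = by_name.loc["Krafla"]
print(krafla["Elevation (m)"])

# reset_index() moves the index back into an ordinary column
restored = by_name.reset_index()
print(list(restored.columns))
```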

Sorting by awkward strings

In this example data set, the 'Last Known Eruption' column holds strings such as '8300 BCE', '1538 CE' or 'unknown', so sorting by that column directly would not put the eruptions in chronological order. But we can first drop all the rows where the 'Last Known Eruption' is 'unknown', then split the dates on the space, multiply all the dates with 'BCE' in them by -1, and sort by the resulting column.

Let's do that step by step:

  • to drop all the 'unknown' eruption dates, we can use the filtering statement:
In [169]:
EruptionData=pd.read_excel('Datasets/GVP_Volcano_List_Holocene.xls',header=1) # read this in again
KnownEruptionDates=EruptionData[EruptionData['Last Known Eruption'].str.contains('unknown')==False]

To see if this worked, we can filter for any rows that still contain 'unknown' and see if anything comes back:

In [170]:
KnownEruptionDates[KnownEruptionDates['Last Known Eruption'].str.contains('unknown')]
Out[170]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting

Nope. So now step 2:

  • To split a string on a particular key, in this case a space, we can use the syntax:

EruptionData['Last Known Eruption'].str.split()

In [171]:
KnownEruptionDates['Last Known Eruption'].str.split().head()
Out[171]:
0    [8300, BCE]
1    [4040, BCE]
2      [Unknown]
3    [3600, BCE]
4     [104, BCE]
Name: Last Known Eruption, dtype: object

Whoa! Some of the entries are 'Unknown' and not 'unknown'. Sloppy! But we can handle that:

In [172]:
KnownEruptionDates=KnownEruptionDates[KnownEruptionDates['Last Known Eruption'].str.contains('Unknown')==False]
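Both spellings could probably be caught in one pass, because .str.contains( ) accepts a case=False argument. A minimal sketch on made-up rows (toy, is_unknown and known are illustrative names, not part of the lecture data):

```python
import pandas as pd

toy = pd.DataFrame(
    {"Last Known Eruption": ["8300 BCE", "Unknown", "unknown", "1538 CE"]}
)

# case=False matches 'Unknown' and 'unknown' alike
is_unknown = toy["Last Known Eruption"].str.contains("unknown", case=False)

# ~ inverts the Boolean mask, so we keep only the rows with a known date
known = toy[~is_unknown]
print(known["Last Known Eruption"].tolist())
```

The ~ operator does the same job as comparing the mask with False.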

More on splitting: It turns out that a space is the default for split. We don't really want to do this, but if we DID, we could split on the 'C' like so:

In [173]:
KnownEruptionDates['Last Known Eruption'].str.split('C').head()
Out[173]:
0    [8300 B, E]
1    [4040 B, E]
3    [3600 B, E]
4     [104 B, E]
6     [1538 , E]
Name: Last Known Eruption, dtype: object

See how the .str.split() method returns a list with the stuff before the split key as the first element and the stuff after the key as the second.

We can make two arrays, one for the date and one for the 'CE' or 'BCE' tag. We first split with expand=True, which returns a DataFrame with one column per piece, and transpose it. Then, using the DataFrame attribute .values, we assign the first row of the resulting array to a DataFrame column named 'date' and the second row to a column named 'CE/BCE'.

In [174]:
dates=KnownEruptionDates['Last Known Eruption'].str.split(' ',expand=True).transpose()
print (dates.values)
[['8300' '4040' '3600' ... '1911' '2016' '1962']
 ['BCE' 'BCE' 'BCE' ... 'CE' 'CE' 'CE']]
In [175]:
# put in the 'date' column:
KnownEruptionDates['date']=dates.values[0].astype('int')
# put in the 'CE/BCE' column:
KnownEruptionDates['CE/BCE']=dates.values[1].astype('str')

Because Pandas read in the 'Last Known Eruption' as a string, we need to convert the first part to an integer (that is the .astype('int') bit above). This means we can multiply the date in records with 'BCE' by -1.

In [176]:
KnownEruptionDates.loc[KnownEruptionDates['CE/BCE'].str.contains('BCE'),'date']=-1*KnownEruptionDates['date']
KnownEruptionDates.sort_values(by='date').head()
Out[176]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting date CE/BCE
127 222161 Igwisi Hills Tanzania Pyroclastic cone(s) Eruption Dated 10450 BCE Africa and Red Sea Africa (eastern) -4.889 31.933 1146 Foidite Rift zone / Continental crust (>25 km) -10450 BCE
561 283141 Nantaisan Japan Stratovolcano Eruption Dated 9540 BCE Japan, Taiwan, Marianas Honshu 36.765 139.491 2486 Andesite / Basaltic Andesite Subduction zone / Continental crust (>25 km) -9540 BCE
974 327812 Red Hill United States Volcanic field Eruption Dated 9450 BCE Canada and Western USA USA (New Mexico) 34.250 -108.830 2300 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km) -9450 BCE
964 324010 Black Butte Crater Lava Field United States Shield Eruption Dated 8400 BCE Canada and Western USA USA (Idaho) 43.183 -114.352 1478 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km) -8400 BCE
1400 390022 Berlin Antarctica Shield(s) Eruption Dated 8350 BCE Antarctica Antarctica and South Sandwich Islands -76.050 -136.000 3478 Trachyte / Trachydacite Intraplate / Continental crust (>25 km) -8350 BCE
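The whole recipe can be tried on a tiny, self-contained example. This is only a sketch under the assumption that every remaining entry looks like 'number CE' or 'number BCE'; toy, parts and is_bce are names invented here:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Last Known Eruption": ["8300 BCE", "1538 CE", "104 BCE", "2016 CE"]}
)

# expand=True returns a DataFrame with one column per piece of the split
parts = toy["Last Known Eruption"].str.split(" ", expand=True)
toy["date"] = parts[0].astype(int)   # the year, converted from string to integer
toy["CE/BCE"] = parts[1]             # the era tag

# flip the sign of the BCE years only
is_bce = toy["CE/BCE"] == "BCE"
toy.loc[is_bce, "date"] = -toy.loc[is_bce, "date"]

# oldest eruption first
print(toy.sort_values(by="date")["date"].tolist())
```

Assigning the columns of parts directly (instead of going through .values) lets Pandas line the rows up by index label.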

Concatenation

We've been working on a data set that only had Holocene data in it, but the same Smithsonian website has a data set for Pleistocene volcanic eruptions too. We can concatenate both data sets into a single DataFrame (as long as they have the same columns) like this:

In [238]:
HoloceneEruptionData=pd.read_excel('Datasets/GVP_Volcano_List_Holocene.xls',header=1) # read this in again
PleistoceneEruptionData=pd.read_excel('Datasets/GVP_Volcano_List_Pleistocene.xls',header=1) # read in the Pleistocene list
# stack the two DataFrames, one on top of the other
RecentEruptionData=pd.concat([HoloceneEruptionData,PleistoceneEruptionData])
RecentEruptionData.head()
Out[238]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
0 210010 West Eifel Volcanic Field Germany Maar(s) Eruption Dated 8300 BCE Mediterranean and Western Asia Western Europe 50.170 6.85 600 Foidite Rift zone / Continental crust (>25 km)
1 210020 Chaine des Puys France Lava dome(s) Eruption Dated 4040 BCE Mediterranean and Western Asia Western Europe 45.775 2.97 1464 Basalt / Picro-Basalt Rift zone / Continental crust (>25 km)
2 210030 Olot Volcanic Field Spain Pyroclastic cone(s) Evidence Credible Unknown Mediterranean and Western Asia Western Europe 42.170 2.53 893 Trachybasalt / Tephrite Basanite Intraplate / Continental crust (>25 km)
3 210040 Calatrava Volcanic Field Spain Pyroclastic cone(s) Eruption Dated 3600 BCE Mediterranean and Western Asia Western Europe 38.870 -4.02 1117 Basalt / Picro-Basalt Intraplate / Continental crust (>25 km)
4 211003 Vulsini Italy Caldera Eruption Observed 104 BCE Mediterranean and Western Asia Italy 42.600 11.93 800 Trachyte / Trachydacite Subduction zone / Continental crust (>25 km)
In [185]:
RecentEruptionData.tail()
Out[185]:
Volcano Number Volcano Name Country Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting
1235 461838 Raja-Sabanda Indonesia Unknown NaN NaN Indonesia Sumatra -2.228 101.430 2527 NaN Unknown
1236 461839 Hulunilo Indonesia Stratovolcano NaN NaN Indonesia Sumatra -2.418 101.776 2469 NaN Unknown
1237 461840 Tungkat Indonesia Unknown NaN NaN Indonesia Sumatra -2.480 102.026 1576 NaN Unknown
1238 464807 Ramu-Labumbu Indonesia Unknown NaN NaN Indonesia Lesser Sunda Islands -8.416 118.207 1086 NaN Unknown
1239 590835 Discovery Antarctica Stratovolcano NaN NaN Antarctica Antarctica and South Sandwich Islands -78.374 165.015 2578 NaN Intraplate / Continental crust (>25 km)

So a lot less is known about the Pleistocene eruptions than the Holocene ones.
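Notice that the index labels in the tail are small numbers again: pd.concat( ) keeps each DataFrame's own index, so labels repeat. The sketch below, on made-up two-row DataFrames, shows how ignore_index=True renumbers the rows; the 'Epoch' column is a hypothetical addition that records which list each row came from:

```python
import pandas as pd

holocene = pd.DataFrame({"Volcano Name": ["Vulsini", "Krafla"]})
pleistocene = pd.DataFrame({"Volcano Name": ["Discovery", "Tungkat"]})

# default behavior: each DataFrame keeps its own labels, so 0 and 1 appear twice
stacked = pd.concat([holocene, pleistocene])
print(list(stacked.index))

# ignore_index=True builds a fresh 0..n-1 index for the combined DataFrame
renumbered = pd.concat([holocene, pleistocene], ignore_index=True)
print(list(renumbered.index))

# tag each row with its source before stacking ('Epoch' is an illustrative name)
tagged = pd.concat(
    [holocene.assign(Epoch="Holocene"), pleistocene.assign(Epoch="Pleistocene")],
    ignore_index=True,
)
print(tagged)
```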

There is another data set on the Smithsonian Website that is interesting. It has a list of all the currently active volcanoes.

In [206]:
ActiveVolcanoes=pd.read_excel('Datasets/ActiveVolcanoes.xlsx')
ActiveVolcanoes.head()
Out[206]:
Volcano Country Eruption Start Date Eruption Stop Date Max VEI
0 Kuchinoerabujima Japan 2020 Jan 11 2020 Feb 13 (continuing) NaN
1 Semisopochnoi United States 2019 Dec 7 2020 Feb 16 (continuing) NaN
2 Nishinoshima Japan 2019 Dec 5 2020 Feb 15 (continuing) 1.0
3 Kikai Japan 2019 Nov 2 2020 Feb 11 (continuing) NaN
4 Klyuchevskoy Russia 2019 Oct 24 2020 Feb 19 (continuing) NaN

There's a lot more information about each volcano in our HoloceneEruptionData DataFrame, which we can attach to ActiveVolcanoes using the Pandas merge( ) method. There are many ways to use merge, but the idea is to identify what kind of join you want:

In [233]:
Image(filename='Figures/join-types.jpg')
Out[233]:

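We want to keep every row of the ActiveVolcanoes data and pair it with the matching information from the Holocene volcanoes database, which is a 'left' join. Here we merge with KnownEruptionDates, the filtered Holocene DataFrame from above that also has the 'date' and 'CE/BCE' columns. Some columns ('Country' for example) are in both DataFrames.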

In [249]:
MergedVolcanoes=ActiveVolcanoes.merge(KnownEruptionDates,how='left',left_on='Volcano',right_on='Volcano Name')
MergedVolcanoes.head()
Out[249]:
Volcano Country_x Eruption Start Date Eruption Stop Date Max VEI Volcano Number Volcano Name Country_y Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting date CE/BCE
0 Kuchinoerabujima Japan 2020 Jan 11 2020 Feb 13 (continuing) NaN 282050.0 Kuchinoerabujima Japan Stratovolcano(es) Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.443 130.217 657.0 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
1 Semisopochnoi United States 2019 Dec 7 2020 Feb 16 (continuing) NaN 311060.0 Semisopochnoi United States Stratovolcano Eruption Observed 2020 CE Alaska Aleutian Islands 51.930 179.580 1221.0 Basalt / Picro-Basalt Subduction zone / Intermediate crust (15-25 km) 2020.0 CE
2 Nishinoshima Japan 2019 Dec 5 2020 Feb 15 (continuing) 1.0 284096.0 Nishinoshima Japan Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Izu, Volcano, and Mariana Islands 27.247 140.874 25.0 Andesite / Basaltic Andesite Subduction zone / Crustal thickness unknown 2020.0 CE
3 Kikai Japan 2019 Nov 2 2020 Feb 11 (continuing) NaN 282060.0 Kikai Japan Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.793 130.305 704.0 Rhyolite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
4 Klyuchevskoy Russia 2019 Oct 24 2020 Feb 19 (continuing) NaN 300260.0 Klyuchevskoy Russia Stratovolcano Eruption Observed 2020 CE Kamchatka and Mainland Asia Kamchatka Peninsula 56.056 160.642 4754.0 Basalt / Picro-Basalt Subduction zone / Continental crust (>25 km) 2020.0 CE
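To see what a left join does with a volcano that has no match, here is a small sketch on made-up DataFrames. 'Made-Up Peak' is a hypothetical entry, and indicator=True is an optional merge argument that adds a '_merge' column saying where each row came from:

```python
import pandas as pd

active = pd.DataFrame(
    {"Volcano": ["Kikai", "Nishinoshima", "Made-Up Peak"],   # last one is fictional
     "Country": ["Japan", "Japan", "Nowhere"]}
)
catalog = pd.DataFrame(
    {"Volcano Name": ["Kikai", "Nishinoshima", "Krafla"],
     "Country": ["Japan", "Japan", "Iceland"],
     "Primary Volcano Type": ["Caldera", "Caldera", "Caldera"]}
)

# how='left' keeps every row of `active`, matched or not
merged = active.merge(
    catalog, how="left", left_on="Volcano", right_on="Volcano Name", indicator=True
)
print(merged[["Volcano", "Country_x", "Country_y", "_merge"]])
```

The unmatched row is kept, with NaN in every column that came from catalog, and 'Krafla' is left out because it is not in the left DataFrame.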

You see that if a particular column (say 'Country') is in both DataFrames, the first one gets renamed 'Country_x' and the other one 'Country_y'. To clean that up we can rename the first to 'Country', then delete (drop) the second one out of the DataFrame. We also want to delete the column 'Volcano Name' because it is redundant.

In [250]:
MergedVolcanoes.rename(columns={'Country_x':'Country'},inplace=True)
MergedVolcanoes.drop(['Country_y','Volcano Name'],axis=1,inplace=True)
MergedVolcanoes.head()
Out[250]:
Volcano Country Eruption Start Date Eruption Stop Date Max VEI Volcano Number Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting date CE/BCE
0 Kuchinoerabujima Japan 2020 Jan 11 2020 Feb 13 (continuing) NaN 282050.0 Stratovolcano(es) Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.443 130.217 657.0 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
1 Semisopochnoi United States 2019 Dec 7 2020 Feb 16 (continuing) NaN 311060.0 Stratovolcano Eruption Observed 2020 CE Alaska Aleutian Islands 51.930 179.580 1221.0 Basalt / Picro-Basalt Subduction zone / Intermediate crust (15-25 km) 2020.0 CE
2 Nishinoshima Japan 2019 Dec 5 2020 Feb 15 (continuing) 1.0 284096.0 Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Izu, Volcano, and Mariana Islands 27.247 140.874 25.0 Andesite / Basaltic Andesite Subduction zone / Crustal thickness unknown 2020.0 CE
3 Kikai Japan 2019 Nov 2 2020 Feb 11 (continuing) NaN 282060.0 Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.793 130.305 704.0 Rhyolite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
4 Klyuchevskoy Russia 2019 Oct 24 2020 Feb 19 (continuing) NaN 300260.0 Stratovolcano Eruption Observed 2020 CE Kamchatka and Mainland Asia Kamchatka Peninsula 56.056 160.642 4754.0 Basalt / Picro-Basalt Subduction zone / Continental crust (>25 km) 2020.0 CE

Looks like it worked.

Using .unique( ) to find a list of categories, and string operations

Now that we have more information, we can start classifying these eruptions by type. For example, what tectonic settings are represented in this dataset? Pandas has a method called .unique( ) that allows us to find all the unique values in a column.

In [251]:
list(MergedVolcanoes['Tectonic Setting'].unique())
Out[251]:
['Subduction zone / Oceanic crust (< 15 km)',
 'Subduction zone / Intermediate crust (15-25 km)',
 'Subduction zone / Crustal thickness unknown',
 'Subduction zone / Continental crust (>25 km)',
 'Intraplate / Oceanic crust (< 15 km)',
 'Rift zone / Continental crust (>25 km)',
 nan,
 'Intraplate / Continental crust (>25 km)',
 'Rift zone / Intermediate crust (15-25 km)']

This tells us some useful information, including that some of the values are not a number (or 'nan' in Pandish). We can get rid of these using the method .dropna( ). Rows with no 'Volcanic Explosivity Index' data (Max VEI) could be dropped the same way by adding 'Max VEI' to the subset list; here we only drop the rows with no 'Tectonic Setting'.

In [252]:
# inplace=True does the method 'in place' so we don't have to assign it to a new DataFrame
MergedVolcanoes.dropna(subset=['Tectonic Setting'],inplace=True)
MergedVolcanoes.head()
Out[252]:
Volcano Country Eruption Start Date Eruption Stop Date Max VEI Volcano Number Primary Volcano Type Activity Evidence Last Known Eruption Region Subregion Latitude Longitude Elevation (m) Dominant Rock Type Tectonic Setting date CE/BCE
0 Kuchinoerabujima Japan 2020 Jan 11 2020 Feb 13 (continuing) NaN 282050.0 Stratovolcano(es) Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.443 130.217 657.0 Andesite / Basaltic Andesite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
1 Semisopochnoi United States 2019 Dec 7 2020 Feb 16 (continuing) NaN 311060.0 Stratovolcano Eruption Observed 2020 CE Alaska Aleutian Islands 51.930 179.580 1221.0 Basalt / Picro-Basalt Subduction zone / Intermediate crust (15-25 km) 2020.0 CE
2 Nishinoshima Japan 2019 Dec 5 2020 Feb 15 (continuing) 1.0 284096.0 Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Izu, Volcano, and Mariana Islands 27.247 140.874 25.0 Andesite / Basaltic Andesite Subduction zone / Crustal thickness unknown 2020.0 CE
3 Kikai Japan 2019 Nov 2 2020 Feb 11 (continuing) NaN 282060.0 Caldera Eruption Observed 2020 CE Japan, Taiwan, Marianas Ryukyu Islands and Kyushu 30.793 130.305 704.0 Rhyolite Subduction zone / Oceanic crust (< 15 km) 2020.0 CE
4 Klyuchevskoy Russia 2019 Oct 24 2020 Feb 19 (continuing) NaN 300260.0 Stratovolcano Eruption Observed 2020 CE Kamchatka and Mainland Asia Kamchatka Peninsula 56.056 160.642 4754.0 Basalt / Picro-Basalt Subduction zone / Continental crust (>25 km) 2020.0 CE
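The subset argument controls which columns .dropna( ) checks. A minimal sketch with made-up rows (the volcano names 'A' to 'D' and the variable names are placeholders):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"Volcano": ["A", "B", "C", "D"],
     "Tectonic Setting": ["Rift zone", np.nan, "Intraplate", "Rift zone"],
     "Max VEI": [1.0, 2.0, np.nan, np.nan]}
)

# drop rows only where 'Tectonic Setting' is missing
setting_known = toy.dropna(subset=["Tectonic Setting"])
print(setting_known["Volcano"].tolist())

# drop rows where either listed column is missing
both_known = toy.dropna(subset=["Tectonic Setting", "Max VEI"])
print(both_known["Volcano"].tolist())
```

Without inplace=True, .dropna( ) returns a new DataFrame and leaves the original untouched.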

.groupby( ) and .describe( )

Pandas has a couple more methods that might be useful for looking at the distribution of these data. These are the .groupby( ) and .describe( ) methods. We can use these methods to look at the typical volcano type for each tectonic setting in our dataset.

.groupby( ) groups things in your DataFrame by unique values in a Series, for example grouping everything by 'Tectonic Setting'. .describe( ) summarizes some useful statistics. So if we wanted to know basic statistics for each tectonic setting (and who wouldn't?), we would do:

In [255]:
MergedVolcanoes.groupby('Tectonic Setting')['Primary Volcano Type'].describe()
Out[255]:
count unique top freq
Tectonic Setting
Intraplate / Continental crust (>25 km) 1 1 Stratovolcano 1
Intraplate / Oceanic crust (< 15 km) 2 2 Stratovolcano 1
Rift zone / Continental crust (>25 km) 2 2 Stratovolcano 1
Rift zone / Intermediate crust (15-25 km) 1 1 Shield 1
Subduction zone / Continental crust (>25 km) 25 4 Stratovolcano 15
Subduction zone / Crustal thickness unknown 3 3 Stratovolcano 1
Subduction zone / Intermediate crust (15-25 km) 2 1 Stratovolcano 2
Subduction zone / Oceanic crust (< 15 km) 7 4 Stratovolcano 3

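This tells us, for example, that of the 25 active volcanoes in subduction zones on continental crust (>25 km), 15 are stratovolcanoes. For a column of strings, 'count' is the number of rows in the group, 'unique' is the number of different values, 'top' is the most common value and 'freq' is how many times it occurs.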
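Here is a small sketch of the same idea on made-up data, so the four columns can be checked by eye. It also shows .value_counts( ) on the grouped column, which should give the full tally of types per setting rather than just the most common one:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Tectonic Setting": ["Subduction zone"] * 4 + ["Rift zone"] * 2,
     "Primary Volcano Type": ["Stratovolcano", "Stratovolcano", "Caldera",
                              "Stratovolcano", "Shield", "Shield"]}
)

# one row of summary statistics per tectonic setting
summary = toy.groupby("Tectonic Setting")["Primary Volcano Type"].describe()
print(summary)

# the full breakdown of volcano types within each setting
counts = toy.groupby("Tectonic Setting")["Primary Volcano Type"].value_counts()
print(counts)
```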

Assignment #4

Create a new notebook with the name format: Lastname_Initial_HomeworkNumber. For example, Cych_B_HW_4

Create a markdown block in which you will describe what the notebook does.

1.

  • Write a lambda function that returns the square of an input parameter $x$
    • Use map( ) to generate a list of squares for a sequence with 10 values
    • Use filter( ) and the lambda function to generate a list of numbers whose squares are between 5 and 50.
    • Use a list comprehension to generate the same list with only one line of code.

2.

  • Create a Class, it should include at least 3 attributes and 3 methods. Be creative! Here are a few possibilities- Card, Deck, Planet, Phone Contact, Ocean, Student, Cellphone, Dog, Car
    • Save your class in a module, import the module into your notebook
    • Create 3 instances of your class, change the value of at least one attribute for one instance of your class
    • Call all three of your methods- you can use any of the instances of your class

3.

  • Read in the ActiveVolcanoes.xlsx and GVP_Volcano_List_Holocene.xls files.
    • Merge the two on the name of the volcano.
    • Delete all the rows with no Max VEI field. (VEI stands for Volcanic Explosivity Index.)
    • Group by tectonic setting and describe by the 'Max VEI' field. Which tectonic setting has the largest explosivity index?
In [ ]: