In [2]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import Image
import pandas as pd
```

- Learn how to filter data with **Pandas**.
- Write a program to calculate the great circle distance between two known points.
- Learn how to generate formatted strings for output.

Returning to our original seismogram from Lecture 8 (which we cleverly saved and I moved to the Figures folder):

In [3]:

```
Image(filename='Figures/seismogram.png') # let's pull this back up.
```

Out[3]:

And we can pick up where we left off by reading in the data which we already converted to minutes and saved in Lecture 8.

In [4]:

```
EQ=pd.read_csv('Datasets/minutes_velocity.csv')
EQ.head()
```

Out[4]:

Considering our seismic record, you know that the P wave arrives in the first few minutes of the record. To find the exact time, we can filter the data to look for the maximum (positive or negative) velocity between two time intervals, say 1 and 1.37 minutes, and take this as the P-wave arrival time.

To look only at the interval between two times, we first need to understand some more about **Pandas** data structures. Remember that a **DataFrame** is made up of columns of data (plus the index), and each column is itself a **Series**.

We can access a particular **Series** in two ways. One is to use the **Series** name as a key (like in Dictionaries).

In [5]:

```
print (EQ['Minutes'].head())
# I'm using head, because this is a really long series!
```

OR, we can use the **Series** name as an attribute of the **DataFrame**:

In [6]:

```
print (EQ.Minutes.head())
```

As you can see below, both of these are identical and are *instances* of a Pandas **Series** class:

In [7]:

```
print (type(EQ['Minutes']))
print (type(EQ.Minutes))
```

Although I said that these two ways of accessing **Series** were identical, they are really only identical if there are no spaces in the column name. If you (like most Excel users) like to put spaces in your column names, then you can only use the first form.
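As a minimal sketch of why this matters (the column names here are made up for illustration), only the bracket form works when the name contains a space:

```python
import pandas as pd

# a small made-up DataFrame with a space in one column name
df = pd.DataFrame({'Station Name': ['PFO', 'ANMO'], 'Minutes': [1.0, 2.0]})

print(df['Station Name'])  # bracket form: works even with a space in the name
print(df.Minutes)          # attribute form: only works for valid Python identifiers
# df.Station Name          # this would be a SyntaxError
```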

Let's take a closer look at one of these **Series**:

In [8]:

```
EQ.Minutes.head()
```

Out[8]:

The EQ.Minutes **Series** contains floating point numbers and has an index (starting with 0), so it is like an array or a list, but not the same thing. No worries - you can turn a **Series** into an array with the **.values** attribute (I'm just printing the first 10 elements...):

In [9]:

```
EQ_array=EQ.Minutes.values
print (EQ_array[0:10])
```

Or we can make it a list:

In [10]:

```
EQ_list=EQ.Minutes.tolist()
print (EQ_list[0:10])
```

You can filter the EQ **DataFrame** by placing conditions (which evaluate to **True** or **False**) on one of the **Series** (Minutes or Velocity).

The simplest syntax of a basic **Pandas** filter is:

**DataFrame[condition]**

where the condition is built from one of the DataFrame's **Series**.
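As a tiny self-contained illustration of this kind of filter (with made-up numbers, not our seismogram):

```python
import pandas as pd

# a toy DataFrame with hypothetical time (t) and velocity (v) columns
toy = pd.DataFrame({'t': [0.5, 1.5, 2.5], 'v': [0.1, -0.3, 0.2]})
print(toy[toy.t < 2])  # keeps only the rows where t is less than 2
```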

The "P wave arrival" is by definition, the first wave recorded that exceeds "noise". So we need to calculate what the "noise" level is by getting the maximum and minimum values in the first part of the record, say the first minute.

To find all the records where the series **EQ.Minutes** is less than 1, we need to set the condition to:

EQ.Minutes<1

Let's just see what that does:

In [11]:

```
EQ.Minutes<1
```

Out[11]:

So the **EQ.Minutes<1** condition returned another Pandas **Series** of booleans, with the condition evaluated for each element of **EQ.Minutes**.

If we then use the returned set of booleans as the "key" in the DataFrame **EQ**, Pandas will return all the indices in the **DataFrame** for which the condition is **True**:

In [12]:

```
EQ[EQ.Minutes<1]
```

Out[12]:

Notice how this is the same as our masking and filtering efforts with **Numpy** arrays that you already saw.

Now we assign this new **DataFrame** to **Noise**:

In [13]:

```
Noise=EQ[EQ.Minutes<1]
print (Noise.head())
print (Noise.tail())
```

Here we created a NEW **DataFrame**, *Noise*, that is a subset of the original *EQ* and contains only the first minute worth of data.

There are many methods for Pandas **Series**. Two of these are **max( )** and **min( )**. To get the maximum and minimum values of the velocity **Series** in the first minute, we just apply **max( )** and **min( )** to the **Series**:

In [14]:

```
NoiseMax=Noise.Velocity.max()
print ('maximum value: ',NoiseMax)
NoiseMin=Noise.Velocity.min()
print ('minimum value: ',NoiseMin)
```

Now we can find the first velocity that exceeds (in the absolute sense) **NoiseMax** in the **EQ** DataFrame.

To do this, we can find the first peaks or troughs in the Velocity Series that are outside the bounds of **NoiseMax** and **NoiseMin** respectively. So, here are all the peaks and troughs in the data series that exceed the noise bounds:

In [15]:

```
Peaks=EQ[EQ.Velocity>NoiseMax]
print (Peaks.head())
Troughs=EQ[EQ.Velocity<NoiseMin]
print (Troughs.head())
```

We can get these both in one go by using the Pandas "or" trick, which has the general syntax of:

**DataFrame[(condition_1) | (condition_2)]**

where the '|' stands for 'or'.

While we are on the topic, there are other types of conditions, including 'and' and 'not'. 'and' is represented by a '&':

**DataFrame[(condition_1) & (condition_2)]**

which requires both conditions to be **True**, and "and not" is a '& ~':

**DataFrame[(condition_1) & ~(condition_2)]**

which requires condition_1 to be **True** and condition_2 to be **False**.
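Here is a minimal sketch of the '& ~' form with a made-up DataFrame (our EQ data works the same way):

```python
import pandas as pd

demo = pd.DataFrame({'Minutes': [0.5, 1.5, 2.5, 3.5],
                     'Velocity': [1.0, -2.0, 3.0, -4.0]})
# rows after 1 minute AND NOT with negative velocity
subset = demo[(demo.Minutes > 1) & ~(demo.Velocity < 0)]
print(subset)
```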

Anyway, we can get a single data frame with all the values exceeding bounds (both negative and positive) this way:

In [16]:

```
PeaksandTroughs=EQ[(EQ.Velocity>NoiseMax)|(EQ.Velocity<NoiseMin)]
print (PeaksandTroughs.head())
```

So the first Velocity that exceeds the noise is a peak that arrives at 1.025 minutes.

We can make a variable named **PwaveArrival** by converting the **Series** *PeaksandTroughs.Minutes* to an array (using the **.values** attribute of the **Series**) and then selecting the first value in this array.

In [17]:

```
PwaveArrival=PeaksandTroughs.Minutes.values[0]
PwaveArrival
```

Out[17]:

In [18]:

```
Image(filename='Figures/seismogram.png') # let's pull this back up.
```

Out[18]:

After the P wave arrival, the earthquake rumbles along for a few minutes, so we need to re-characterize the noise floor, say between 5 and 10 minutes. Let's repeat the noise exercise for the period between 5 and 10 minutes. In this case, we have TWO conditions: the first being Minutes>5 and the second being Minutes<10.

To impose TWO conditions, we can combine each (enclosed in parentheses) with '&':

In [19]:

```
Noise2=EQ[(EQ.Minutes >5) & (EQ.Minutes<10)]
print (Noise2.head()) # see, we only have data after 5 minutes
print (Noise2.tail()) # and before 10 minutes.
```

Following the same logic as before, we can find the bounds for the noise:

In [20]:

```
Noise2Max=Noise2.Velocity.max()
Noise2Min=Noise2.Velocity.min()
print (Noise2Max,Noise2Min)
```

To find the S wave arrival, we look for the peaks and troughs that exceed these new noise bounds. First we need to filter for data after the first 10 minutes.

In [21]:

```
PostP=EQ[EQ.Minutes>10]
```

Now we can use our handy 'or' operator '|' like this:

In [22]:

```
PeaksandTroughs2=PostP[(PostP.Velocity>Noise2Max)|(PostP.Velocity<Noise2Min)]
print(PeaksandTroughs2.head())
```

The S wave arrives as a trough at about 11.7 minutes into the earthquake.

Let's retrieve that value like we did before.

In [23]:

```
SwaveArrival=PeaksandTroughs2.Minutes.values[0]
SwaveArrival
```

Out[23]:

So the time delay between P and S wave arrivals can be calculated by the difference in their arrival times:

In [24]:

```
delayTime= SwaveArrival-PwaveArrival
delayTime
```

Out[24]:

From this we see that there was a 10.7 minute delay.

If you have already taken geophysics, you probably know that seismologists have figured out the velocity of various seismic waves as a function of depth in the Earth. From this model, there is a basic look up table that specifies the delay times as a function of distance (expressed as an angle) between the source and the receiver. This model is in the file **DeltaTimeData.csv** that we already looked at, munched on and saved in Lecture 8.

Let's take another look at this file:

In [25]:

```
DeltaTimeData=pd.read_csv('Datasets/DeltaTimeData.csv') # read this back in
DeltaTimeData.head()
```

Out[25]:

So, how many degrees away from the source is the receiver, based on **DeltaTimeData** from the Reference Earth Model?

Here is the plot from our reference model (again).

In [26]:

```
Image(filename='Figures/TravelTime.png')
```

Out[26]:

We can bracket the angular distance this represents by filtering the **DeltaTimeData** DataFrame for those records within, say, .2 minutes of our delay time:

In [27]:

```
find_delta=DeltaTimeData[
(DeltaTimeData.SP_decimal_minutes>delayTime-.2) &
(DeltaTimeData.SP_decimal_minutes<delayTime+.2)]
find_delta # all the records within 0.2 minutes of our delay time
```

Out[27]:

So according to the model, the earthquake occurred around 89 degrees away from the receiver.

We know the location of both the earthquake and the receiver, so we could calculate the actual separation. Let's do that now.

The earthquake occurred in Chile and was recorded at Pinyon Flats (near us).

Aside: we will learn how to generate this map in the coming lectures!

In [28]:

```
Image(filename='Figures/greatCirc.png')
```

Out[28]:

First, we must calculate the angular distance (great circle distance in arc length - not kilometers) for the two points, B and C.

We'll need to use *spherical trigonometry* which is very useful in Earth Science because we live on a sphere.

In [29]:

```
Image(filename='Figures/strig.png',width=500,height=500)
```

Out[29]:

[Figure from Tauxe et al., 2010, Essentials of Paleomagnetism; https://earthref.org/MagIC/books/Tauxe/Essentials/WebBook3ap1.html#x20-211000A.3 ]

One of the laws of spherical trigonometry is the Law of Cosines:

$\cos a = \cos b \cos c + \sin b \sin c \cos \alpha$.

In our case, $a$ is the great circle distance between points on the globe, $B$ and $C$. Well, well, well, that could come in handy here. So to get $a$ we need to know $b, c$ and $\alpha$. $b$ and $c$ are the co-latitudes (90-latitude) of points B and C, respectively, and $\alpha$ is the difference in the two longitudes.
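As a quick sanity check on the formula (with made-up points, not our stations): take B and C both on the equator (co-latitudes of 90 degrees) and 90 degrees apart in longitude. Then $\cos a = 0$, so $a$ should be 90 degrees:

```python
import numpy as np

# co-latitudes of 90 degrees and a 90 degree longitude difference, in radians
b, c, alpha = np.radians(90.), np.radians(90.), np.radians(90.)
cosa = np.cos(b)*np.cos(c) + np.sin(b)*np.sin(c)*np.cos(alpha)
print(np.degrees(np.arccos(cosa)))  # very nearly 90 degrees
```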

Now we have a formula, we can write a function to calculate the great circle distance from the latitudes and longitudes of the two points (lat_1,lon_1,lat_2,lon_2).

In [1]:

```
def great_circle(lat_1,lon_1,lat_2,lon_2):
    """
    Calculates the great circle distance between two points on a globe.

    Parameters:
        lat_1 : float
            latitude of first point
        lon_1 : float
            longitude of first point
        lat_2 : float
            latitude of second point
        lon_2 : float
            longitude of second point

    Returns:
        a : float
            great circle distance in degrees
    """
    # first we have to convert the latitudes to colatitudes:
    colat_1,colat_2=90.-lat_1,90.-lat_2
    # and alpha is the difference between the two longitudes
    alpha=lon_2-lon_1
    # then let's make life easy on us and convert degrees to radians
    colat_1,colat_2,alpha=np.radians(colat_1),\
        np.radians(colat_2),np.radians(alpha) # continued line from above
    # from spherical trig we know that:
    cosa=np.cos(colat_1)*np.cos(colat_2)+np.sin(colat_1)*np.sin(colat_2)*np.cos(alpha)
    # solve for a by taking the arc cosine of cosa
    a=np.arccos(cosa)
    # remember to convert back to degrees!
    return np.degrees(a) # return the great circle distance in degrees
```

In [31]:

```
PF_lat,PF_lon=33.3,-115.7
EQ_lat,EQ_lon=-43.42,-73.95
Delta=great_circle(PF_lat,PF_lon,EQ_lat,EQ_lon)
print ('Degrees separation between source and receiver: ',Delta)
```

Earlier, we guessed that it would be about 89 degrees. So this is pretty close given our quick and dirty guesstimate.

Now for a word about formatting strings. Notice how the output of the above print statement printed out all the decimal places. We can do better!
To show only the first decimal place we can use *string formatting*.

The structure of a formatting statement is:

**'%FMT'%(DATA)**,

where **FMT** is a 'format string' and **DATA** is the variable name whose value we want to format. Here is an example in which **FMT** is:

**3.1f**.

The first number (3) is the minimum number of characters in the output (the field width). The second number (1) is the number of digits AFTER the decimal point. The 'f' means that **DATA** is a floating point variable.

Other format strings include: %s for a string, %i for an integer, %e for 'scientific notation'.

In [42]:

```
print ('no formatting: ',Delta) # no formatting
print ('formatted: ','%3.1f'%(Delta)) # with formatting
# or can use round(Delta,1)
print ('rounded: ',round(Delta,1))
```

Recall from Lecture 2:

In [43]:

```
number=1 # an integer
Number=1.0 # a floating point
NUMBER='1' # a string
```

we can use the formatted string trick on them like this:

In [44]:

```
print ('%i'%(number))
print ('%f'%(Number))
print ('%3.1f'%(Number))
print ('%s'%(NUMBER))
```

Or for a bigger number:

In [45]:

```
bignum=np.pi*1000
print (bignum)
print ('%6.5e'%(bignum))
print ('%3.2e'%(bignum))
```
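An aside (not required for this lecture): newer versions of Python also support f-strings, which embed the same format codes directly into the string. This is just a sketch showing that they give the same result as the '%' style:

```python
import numpy as np

bignum = np.pi * 1000
print(f'{bignum:3.2e}')  # same result as '%3.2e'%(bignum)
print(f'{bignum:6.5e}')  # same result as '%6.5e'%(bignum)
```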

- Make a notebook and change the name of the notebook to: YourLastName_Initial_HW_03 (for example, Cych_B_HW_03)
- Go to the IRIS website (US's national earthquake database, http://ds.iris.edu/ieb/index.html?format=text&nodata=404&starttime=1970-01-01&endtime=2025-01-01&minmag=0&maxmag=10&mindepth=0&maxdepth=900&orderby=time-desc&limit=1000&maxlat=89.18&minlat=-89.18&maxlon=180.00&minlon=-180.00&zm=1&mt=ter) and search for the last 10 earthquakes (set Max quakes to 10 and Priority to Newest). Click on "Apply". Then click on 'Download as Excel'. Upload the .csv file to the datahub site or to a directory on your computer with your notebook in it.
- Read this file into a Pandas DataFrame.
- Extract the latitudes, longitudes and depths as **NumPy** arrays.
- Make an XY plot with longitude on the horizontal axis and latitude on the vertical axis. Use small red triangles as the symbols.
- Label your axes.
- Save your figure as a .png file
- Write a module that has a great circle function in it that returns the distance rounded to one decimal place. Save your module to be re-used later.

Your code must be fully commented, the notebook must have the correct name and it must be turned in to Canvas before the deadline.

To turn it in, you must include the datafile you downloaded in a zip file with your solution notebook.
