In [4]:
# Pay no attention to this cell
# All will be revealed in due time.
import pandas as pd
import os
from pathlib import Path
from IPython.display import Image
syllabus=pd.read_csv('Datasets/syllabus_2020.csv',header=0)
syllabus=syllabus.fillna("")
syllabus.index = range(1,len(syllabus)+1)

Python Programming for Earth Science Students

Authors: Lisa Tauxe, Hanna Asefaw, & Brendan Cych
Instructor: Lisa Tauxe
TAs: Brendan Cych, Shelby Jones

Computers in Earth Science

Computers are essential to all modern Earth Science research. We use them for compiling and analyzing data, preparing illustrations like maps or data plots, writing manuscripts, and so on. In this class, you will learn to write computer programs with special applications useful to Earth Scientists. We will learn Python, an object-oriented programming language, and use Jupyter notebooks to write our Python programs.

Python

So, why learn Python? Because:

  • It is flexible, freely available, and cross-platform.
  • It is easier to learn than many other languages.
  • It has many numerical, statistical, and visualization packages.
  • It is well supported and has lots of online documentation.
  • The name 'Python' refers to 'Monty Python' - not the snake - and many examples in the Python documentation use jokes from the old Monty Python skits. If you have never heard of Monty Python, look it up on YouTube; you are in for a treat.

Which Python?

  • Python underwent a transition from 2.7 to 3. The notebooks in this class, apart from a few exceptions, are compatible with both but they have only been tested on Python 3, so that is what you should be using.
  • If you decide to use a personal computer, we recommend that you install the most recent version of Anaconda Python for your operating system (https://www.anaconda.com/download/). You will also need a few extra packages (cartopy, version 0.17.0; geopandas, version 0.7.0; and descartes, version 1.1.0), which can be installed with little hassle.
In [2]:
syllabus[['Topic','Date','Assignment']]
Out[2]:
    Topic                                             Date     Assignment
 1  Intro to notebooks, file systems and paths        30-Mar
 2  Variables and Operations                          1-Apr
 3  Data structures                                   3-Apr    #1
 4  Dictionaries, program loops (if, while and for)  6-Apr
 5  functions and modules                             8-Apr
 6  NumPy and matplotlib                              10-Apr   #2
 7  NumPy arrays                                      13-Apr
 8  Pandas, file I/O                                  15-Apr
 9  data wrangling with Pandas                        17-Apr   #3
10  object oriented programming                       20-Apr
11  lambda, map, filter reduce, list comprehension    22-Apr
12  Pandas filtering and exceptions                   24-Apr   #4
13  subplots, bar charts pie charts                   27-Apr
14  histograms and cumulative distribution functions  29-Apr
15  statistics 101                                    1-May    #5
16  line and curve fitting                            4-May
17  visualization with seaborn                        6-May
18  maps                                              8-May    Project Proposal AND #6
19  gridding and contouring                           11-May
20  geopandas                                         13-May
21  rose diagrams and equal area projections          15-May   #7
22  matrix math - dot and cross products              18-May
23  plotting great and small circles                  20-May
24  Machine Learning- Clustering                      22-May   #8
25  Memorial Day                                      25-May
26  Machine Learning- Classification, PCA             27-May
27  3D plots of points and surfaces                   29-May   #9
28  Time series - periodograms                        6/1/20
29  Animations                                        6/3/20
30                                                    6/5/20
31  Final Presentations                               6/11/20

Lecture 1

Now we get down to business. In this lecture we will:

  • Learn to find your command line interface.
  • Learn how to launch a Jupyter notebook from the command line interface.
  • Learn basic notebook anatomy.
  • Learn some basic Python operating system commands.
  • Learn about the concept of PATH.
  • Turn in your first practice problem notebook.

Jupyter notebooks and Jupyter Hub

This class is entirely structured around a special programming environment called Jupyter notebooks. A Jupyter notebook is a development environment where you can write, debug, and execute your programs.

If you are taking this class through UCSD, you will be using the Jupyter Hub site. When working on practice problems, rename the practice problem notebooks by going to File > Rename. Once finished, save and go to File > Download As > Notebook (.ipynb) and upload this notebook to Canvas. If you don't want to install Python on your computer, skip to the 'Jupyter notebook anatomy' section.

Alternatively, you can install Anaconda Python on your machine (see below) and work on the lectures. If you want to be able to open and use notebooks after the class is over, you should do this.

If you are using the version cloned from GitHub, you already have everything. Some of the lectures might be updated in the future, though, so the version you have may not be final.

OPTIONAL: Installing Anaconda Python and Opening Jupyter Notebooks

To install Anaconda Python, go to https://www.anaconda.com/download/ and follow the install instructions for your operating system. To launch notebooks afterwards, you will need to discover the hidden secret of your computer, the terminal window. This little window provides a command line interface in which you can type commands to the operating system. On a Mac, the terminal is the program Terminal: click on the search icon, type terminal.app, and double click on the result. On Windows, use the start menu to search for the program 'Anaconda Prompt'. On many Linux desktops, pressing Ctrl+Alt+T opens the terminal.

In [3]:
Image(filename='Figures/terminal_mac.png')
Out[3]:

Let's open a terminal window and launch Jupyter. On PC, Mac, or Linux, you can do this by typing

jupyter notebook

and hitting return.

In [4]:
Image(filename='Figures/terminal.png',width=500)
Out[4]:

When you fire up a terminal window, you are by default in your home directory (on macOS, that would be /Users/YOURUSERNAME).

To launch a Jupyter notebook, simply type jupyter notebook as shown above. That will open up a browser window. Find your class folder and click on Lecture_01.ipynb. You should now be looking at this notebook!

Make a copy of the lecture

You should not modify this lecture; if you do, your changes will quite likely be overwritten when you update your directory with a new version. Instead, work on a copy. To make one, open the File menu at the top of this page:

In [15]:
Image(filename='Figures/copy.png',width=600)
Out[15]:

Choose 'Make a Copy'. This will protect the original and you can goof around with this one all you like. But first, you need to know a few things about jupyter notebooks.

Jupyter notebook anatomy

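Jupyter notebooks have two basic cell types: 'Markdown' cells, which hold typeset text like this, and 'Code' cells, which hold Python code.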

You can insert a new cell by selecting Insert Cell Below in the drop-down menu:

In [16]:
Image(filename='Figures/insertCell.png',width=600)
Out[16]:

Cell types default to 'Code', but you can change the cell type to 'Markdown' with the box labeled 'Markdown' on the menu bar. Click on the little downward arrow to change this cell to Code. Be sure to change it back!

You "execute" a cell (either typeset the text or run the code) by clicking on the run button (sideways triangle with vertical line) or by selecting Run Cells under the Cell drop-down menu.

In [17]:
Image(filename='Figures/menuBar.png',width=600)
Out[17]:

In a code cell, you can only type valid Python statements EXCEPT after a pound sign (#): everything after that on the same line will be ignored.
That is how you write "comments" in your code to remind yourself or tell others what you were thinking:

In [7]:
# I can type anything here
but not here
  File "<ipython-input-7-bee698e92c8a>", line 2
    but not here
               ^
SyntaxError: invalid syntax

That was an example of a bug, which could be fixed by commenting out the second line or by making it a valid statement:

In [52]:
# I can type anything here
# but not here
print ("but not here")
but not here

Practice Problems

Now open the notebook called Lecture_01_Practice_Problems. To open it, click on "File" and select "Open"; then, if a file called Lecture_01_Practice_Problems.ipynb is visible, just click on it. But if you are using the datahub or GitHub versions of the class (most of you), all the Practice Problems are in a folder called "Practice_Problems". Click on that icon, then open the Lecture_01_Practice_Problems notebook.

Complete the first three tasks. Then come back to this notebook.

Congratulations! You just wrote your first Python program.

Basic operating system commands

Now we will discuss file systems, paths, and the command line. Why? Because whenever you import an image, document, or spreadsheet into the Jupyter notebook, you have to tell Jupyter where in the computer the file is located. Moreover, there are many command line functions that come in handy. For example, you can look at the first few lines of a file before you import it into the notebook. You could also write all of your programs in a text editor and run them from the command line, from anywhere on your computer, instead of from within a Jupyter notebook. We will do that in Lecture 23, for example.

File systems

The organization of computers is based on a file system. The file system is hierarchical: at the top you'll find the root directory (or, as Mac and PC users would call it, a folder). The root directory contains files and other folders, which may in turn contain more files and folders, and so on. The result is a tree of files and folders that makes up the file system. The following figure is an example of a computer's file system:

In [53]:
Image(filename='Figures/FileSystem.jpg')
Out[53]:

You are probably familiar with images like the one on the left. The text on the right shows the exact same thing, but from your computer's viewpoint. Both show you how to access the folder "Desktop". On the left, you access "Desktop" by clicking on icons that represent different folders and sub-folders until you arrive at it. Later in this lecture, we'll show you how to access the same folder using its path (the text on the right).

Survival operating system commands

Macs and PCs both have commands that can be called from a command line for tasks such as listing the contents of a folder or file, creating new folders, changing permissions on files or folders, combining the contents of files, moving files and folders around, and so on. These commands are directed to the operating system instead of the Python interpreter. To make these actions independent of your particular operating system, Python has a built-in module called "os" (for operating system). We imported it in the first cell and will now figure out how to use it.

Let's learn our first operating system command, which lists the contents of a directory, os.listdir(). This returns a list (not in any particular order) of all the things in the directory containing this notebook:

In [12]:
os.listdir()
Out[12]:
['.DS_Store',
 'Datasets',
 'Lecture_01.ipynb',
 'Figures',
 '.ipynb_checkpoints',
 'Lecture_01_Practice_Problems.ipynb']

You can ignore anything with a '.' in front of it (.DS_Store and .ipynb_checkpoints in this example.)
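
If the dot-files clutter the listing, you can filter them out yourself. The following is a minimal sketch, not something built into the os module: the function name visible_entries is made up for this illustration, and it uses tools (functions and loops) that are covered in later lectures.

```python
import os

def visible_entries(directory='.'):
    """Return a sorted list of the names in `directory` that do not start with '.'.

    visible_entries is a hypothetical helper written for this example;
    it is not part of the os module.
    """
    names = os.listdir(directory)          # everything, in arbitrary order
    return sorted(name for name in names if not name.startswith('.'))

listing = visible_entries()                # defaults to the working directory
print(listing)
```

Sorting is optional, but it makes the listing easier to compare from one run to the next, since os.listdir() promises no particular order.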

Another useful command is os.mkdir() which creates a new directory. Please note that directory means the same thing as folder. It is just that in a graphical operating system with icons, the term folder makes sense. They look like folders. Whereas to the operating system, they are traditionally referred to as directories. Never mind!

In [15]:
os.mkdir('MYNEWDIRECTORY')

To see if that worked, list the contents again:

In [16]:
os.listdir()
Out[16]:
['MYNEWDIRECTORY',
 '.DS_Store',
 'Datasets',
 'Lecture_01.ipynb',
 'Figures',
 '.ipynb_checkpoints',
 'Lecture_01_Practice_Problems.ipynb']

And sure enough, there it is. The command os.rmdir() deletes a directory (which must be empty):

In [17]:
os.rmdir('MYNEWDIRECTORY')

Make sure it was removed:

In [18]:
os.listdir()
Out[18]:
['.DS_Store',
 'Datasets',
 'Lecture_01.ipynb',
 'Figures',
 '.ipynb_checkpoints',
 'Lecture_01_Practice_Problems.ipynb']

Yup. It's gone.
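
One caveat: os.mkdir() raises an error if the directory already exists, and os.rmdir() raises one if the directory is missing or not empty. A cautious sketch checks first with os.path.isdir(). The directory name SCRATCH_DEMO is just a placeholder chosen for this example and is assumed not to be in use already.

```python
import os

scratch = 'SCRATCH_DEMO'            # placeholder name, assumed not to be in use

if not os.path.isdir(scratch):      # only create it if it is not there yet
    os.mkdir(scratch)
was_created = os.path.isdir(scratch)

# only remove it if it exists and has nothing in it
if os.path.isdir(scratch) and not os.listdir(scratch):
    os.rmdir(scratch)
was_removed = not os.path.exists(scratch)

print(was_created, was_removed)
```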

Another handy thing is to view the contents of a file. To do this in Python, we use the command open( ).readlines( ), which returns the contents as a list of lines for your viewing pleasure.

In [37]:
open('Datasets/myfile.txt').readlines()
Out[37]:
['Hi there students! Thanks for joining this class!\n']
In [38]:
contents=open('Datasets/myfile.txt').read()
output=open('newfile.txt','w') # open a file for writing
output.write(contents) # write the contents
output.close() # close the file

So what did we create? We created a copy of myfile.txt called newfile.txt. If you repeat the command, you will overwrite the existing output file.

In [39]:
open('newfile.txt').readlines()
Out[39]:
['Hi there students! Thanks for joining this class!\n']

To append to the end of a file, we use the 'a' argument instead of 'w' in the open( ) command

In [40]:
output=open('newfile.txt','a') # open a file for appending
output.write(contents) # write the contents
output.close() # close the file
In [41]:
open('newfile.txt').readlines()
Out[41]:
['Hi there students! Thanks for joining this class!\n',
 'Hi there students! Thanks for joining this class!\n']

To delete a file (analogous to deleting a directory), we use the command os.remove( ).

In [42]:
os.remove('newfile.txt')
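
The cells above close each file by hand with output.close(). A common alternative is Python's with statement, which closes the file automatically when the indented block ends. Here is a small sketch of the same write, append, read, and delete sequence; the file name with_demo.txt and its contents are invented for the example.

```python
import os

demo_file = 'with_demo.txt'              # hypothetical file name for this example

with open(demo_file, 'w') as output:     # 'w' creates (or overwrites) the file
    output.write('first line\n')

with open(demo_file, 'a') as output:     # 'a' appends to the end of the file
    output.write('second line\n')

with open(demo_file) as infile:          # default mode is reading
    lines = infile.readlines()

print(lines)
os.remove(demo_file)                     # clean up
```

Because the file is closed as soon as each with block ends, there is no close() call to forget.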

Concept of path

So far, we have just looked at directories in our working directory (the one with this notebook in it) and subdirectories within the working directory. Earlier in the lecture you were shown a figure with icons on the left and text on the right. The text to the right was a series of directories separated by '/'. These are the paths to those files. A path is the unique location of a file or a directory in a file system of an OS.

Now that you know more about paths, let's take a detour and learn how to embed figures directly into a Jupyter notebook. You saw this in several cells earlier in this lecture, but were told to ignore it. The Image class in the module IPython.display allows us to embed many digital image types (png, jpg...) into a Jupyter notebook. If you take a look at the first cell of this lecture, we have already imported Image from IPython.display.

If you want to display a figure, you will use Image and the path to the figure. The path to the figure we want to display is "Figures/FileSystem.jpg". This tells the operating system to find the folder labeled "Figures" and then grab the file inside that is labeled "FileSystem.jpg". This is a relative path because the location is with respect to the directory that the notebook is in.

In [21]:
Image(filename='Figures/FileSystem.jpg') 
Out[21]:

The paths in this figure are absolute paths, which uniquely define the location of the file or directory from anywhere on the computer. Relative paths are handy shortcuts. For example, we can refer to a directory above the current directory without necessarily knowing what it is called, using these conventions:

./ is the current directory

../ is the one above

../../ is the one above that

and so on.

Instead of using 'relative' directories, it is often desirable to refer to directories in an absolute sense, i.e., relative to the root directory '/'.

To find out the absolute path of your current directory, use os.getcwd( ) to get the current working directory:

In [54]:
os.getcwd()
Out[54]:
'/Users/ltauxe/Dropbox/4Cych/SIO113_2020/Python_for_Earth_Science_Students/Lectures/Lecture_01'
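
To see how relative and absolute paths connect, os.path.abspath() converts a relative path into the absolute path it stands for, measured from the current working directory. This sketch assumes nothing about your folder layout (Figures/FileSystem.jpg is the relative path used earlier, and the file does not need to exist for the path to be built), so your printed results will differ from anyone else's.

```python
import os

here = os.getcwd()                       # absolute path of the working directory
same_place = os.path.abspath('./')       # './' is the current directory
parent = os.path.abspath('../')          # '../' is the directory above
grandparent = os.path.abspath('../../')  # '../../' is the one above that

# os.path.join glues path pieces together with the right separator for your OS
figure_path = os.path.abspath(os.path.join('Figures', 'FileSystem.jpg'))

for p in (here, same_place, parent, grandparent, figure_path):
    print(p)
```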

To find the path to your home directory, we use another Python command, Path.home(). To use this, we sneakily already imported Path (from the pathlib module) in the first cell of this notebook, so we are all set (your results will vary).

In [56]:
Path.home()
Out[56]:
PosixPath('/Users/ltauxe')

And use that in the os.listdir( ) command to get a listing of our home directory:

In [57]:
os.listdir(Path.home())
Out[57]:
['TeXShop',
 'tmp.txt',
 '.config',
 'Music',
 '.meteor',
 'Cubit-13.0',
 '.parallels',
 '.condarc',
 'ccoptions.cfg',
 '.tmptmp.swp',
 '.tcshrc.macports-saved_2013-01-26_at_13:39:56',
 '.canopy_runtimes.json',
 '.anyconnect',
 'Untitled1.ipynb',
 '.vim',
 '.inkscape-etc',
 '.DS_Store',
 '.serverauth.6895',
 '.JxBrowser',
 '.pydistutils.cfg',
 'VirtualBox VMs',
 '.gmtcommands',
 '.CFUserTextEncoding',
 '.hgignore_global',
 '.meteorsession',
 'bin',
 'MagIC',
 '.profile-anaconda3.bak',
 'Python',
 '.profile-anaconda.bak',
 '.iprint',
 '.subversion',
 '.serverauth.757',
 '.bashrc',
 'Meetings_2020',
 'Untitled.ipynb',
 '.Sites',
 'fwoptions.cfg',
 '.adobe',
 '.ltauxe_HD_Quality.txt',
 '.gdb_history',
 '.plotly',
 '.local',
 'Creative Cloud Files',
 'Pictures',
 '.profile.save',
 '.python_history-02466.tmp',
 '.fontconfig',
 '.tcshrc.macports-saved_2011-05-16_at_17:34:03',
 '.texlive2008',
 '.fonts.cache-1',
 'webpasswords',
 '.tcshrc.macports-saved_2011-05-16_at_17:44:34',
 'thellier_GUI.log',
 '.gnome2',
 'tmp.py',
 '.recently-used.xbel',
 '.ipython',
 'Desktop',
 'Library',
 '.matplotlib',
 '.tgpskey',
 '.lesshst',
 '.GMT_bb_info',
 '.datathief',
 '.tcshrc.pysave',
 '.novell',
 '.enstaller4rc',
 'seaborn-data',
 '.claro',
 '.emacs.d',
 '.geomapapp-home',
 '.SIOExplorer',
 '.gitignore_global',
 '.pybld_start',
 '.python_history-13602.tmp',
 '.APKey.plist',
 '.cups',
 'Sites',
 '.bash_sessions',
 'Programs',
 'PmagPy',
 'Google Drive',
 'MultiDrive',
 '.matlab',
 'enthought',
 'Public',
 'logs',
 '.dropbox',
 '.idlerc',
 '.tcshrc',
 '.cisco',
 '.serverauth.29680',
 '.wapi',
 '.sh_history',
 'personal_stuff',
 '.cshrc',
 'profiles.bin',
 'tmp1',
 'gha.py',
 '.anaconda',
 '.GMA',
 '.theano',
 '.serverauth.37770',
 '.ssh',
 'Movies',
 'Applications',
 '.profile',
 'Dropbox',
 '.serverauth.24792',
 'Pdfs',
 '.gmtcommands4',
 '.Trash',
 '.zoomus',
 '.ipynb_checkpoints',
 '.jupyter',
 '.serverauth.81559',
 'SpareRoom',
 'Documents',
 '.parallels_settings',
 '.Xcode',
 '.profile.pysave',
 '.ttffont.cache',
 '.TemporaryItems',
 'log4j',
 '.bash_profile',
 '.Xauthority',
 'anaconda3',
 'Downloads',
 '.python_history',
 'reprints',
 'tmp',
 '.continuum',
 '.cache',
 '.serverauth.8051',
 '.gitconfig',
 '.pypirc',
 '.python_history-37074.tmp',
 '.serverauth.10853',
 '.putty',
 '.bash_history',
 '.viminfo',
 '.enthought-old',
 '.dodsrc',
 '.tmp.swo',
 '.python_history-85820.tmp',
 '.conda',
 '.canopy',
 '.matplotlib.bak',
 'AnacondaProjects',
 '.enthought',
 'src']

I guess I should clean up my home directory!
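
Path objects can also build longer paths: the / operator joins a Path with a folder or file name. A brief sketch, assuming only that you have a home directory. The 'Desktop' folder is a guess that holds on many Macs and PCs but not on every system, which is why the code checks with exists() rather than taking it for granted.

```python
from pathlib import Path

home = Path.home()               # e.g. /Users/YOURUSERNAME on a Mac
desktop = home / 'Desktop'       # '/' joins path pieces; the folder may not exist

print(desktop)
print(desktop.exists())          # True or False depending on your system
```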

To change the name of a directory, use the command os.rename( ), which takes the old name followed by the new name.

In [63]:
os.mkdir('TEMP')
print (os.listdir())
os.rename('TEMP','NEWTEMP')
print (os.listdir())
os.rmdir('NEWTEMP')
print (os.listdir())
['TEMP', '.DS_Store', 'Datasets', 'Lecture_01.ipynb', 'Figures', '.ipynb_checkpoints', 'Lecture_01_Practice_Problems.ipynb']
['.DS_Store', 'Datasets', 'Lecture_01.ipynb', 'Figures', '.ipynb_checkpoints', 'Lecture_01_Practice_Problems.ipynb', 'NEWTEMP']
['.DS_Store', 'Datasets', 'Lecture_01.ipynb', 'Figures', '.ipynb_checkpoints', 'Lecture_01_Practice_Problems.ipynb']
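
os.rename( ) works on files as well as directories. A small sketch, using throwaway file names (old_name_demo.txt and new_name_demo.txt) invented for this example, with os.path.exists( ) to confirm what happened:

```python
import os

old_name = 'old_name_demo.txt'     # throwaway names made up for this example
new_name = 'new_name_demo.txt'

with open(old_name, 'w') as output:    # create a small file to rename
    output.write('rename me\n')

os.rename(old_name, new_name)      # old name first, then new name
old_exists = os.path.exists(old_name)
new_exists = os.path.exists(new_name)
print(old_exists, new_exists)

os.remove(new_name)                # clean up
```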

Command line python scripts

As mentioned earlier in the lecture, you can run all the little programs you have been (and will be) writing directly from the command line. Here's one way to do that, using one of the many "magic" commands (https://ipython.readthedocs.io/en/stable/interactive/magics.html#cell-magics) that work with Jupyter notebooks. Our first is:

%%writefile PATH_TO_FILE.py

which writes the contents of a cell to the specified text file.

Running the cell below will place its contents (without the magic command) into a file in this directory called hello.py.

In [64]:
%%writefile ./hello.py
print ("Hello World!")
Writing ./hello.py

Now you can run the program from your command line (after navigating to this directory) by typing:

$ python hello.py

or from within this notebook:

In [65]:
!python hello.py
Hello World!

Alternatively, you can use a different magic command: %run to execute an external file:

In [66]:
%run hello.py
Hello World!
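
The ! and %run shortcuts only work inside Jupyter. From an ordinary Python program, one way to do the same thing is the standard subprocess module. The sketch below writes its own small script (hello_demo.py, a name made up here so that it does not clash with hello.py), runs it with the same Python interpreter that is running this code, and captures what it prints. It assumes Python 3.7 or later.

```python
import os
import subprocess
import sys

script = 'hello_demo.py'                 # hypothetical script name for this example

with open(script, 'w') as output:        # write a one-line program to disk
    output.write('print("Hello World!")\n')

# sys.executable is the full path of the Python interpreter currently running
result = subprocess.run([sys.executable, script],
                        capture_output=True, text=True)
print(result.stdout)

os.remove(script)                        # clean up
```

Using sys.executable instead of the bare word python avoids depending on which python, if any, the operating system would find first.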

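The last thing you have to worry about is PATH. We have been talking about paths (all lower case), but PATH is an "environment variable": a list of directories that the operating system searches when you type the name of a program on the command line. So to run a program by name alone, the directory containing it must be in your PATH. Similarly, for Python to import a module from anywhere, the directory containing it must be on Python's module search path, which you can extend with the PYTHONPATH environment variable.

You can find out what your PATH is by looking up 'PATH' in os.environ: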

In [78]:
os.environ['PATH'] # your results will vary!
Out[78]:
'/Users/ltauxe/anaconda3/bin:/Users/ltauxe/anaconda3/bin:/usr/local/Cellar/cmake/3.9.0/bin:/Users/ltauxe/Programs:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/texbin:/opt/X11/bin::/Users/ltauxe/PmagPy/programs/__pycache__/:/Users/ltauxe/PmagPy/programs/conversion_scripts/:/Users/ltauxe/PmagPy/programs/conversion_scripts2/:/Users/ltauxe/PmagPy/programs/deprecated/:/Users/ltauxe/PmagPy/programs/images/:/Users/ltauxe/PmagPy/programs/:/Applications/GMT-5.3.1.app/Contents/Resources/bin'
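
That string is hard to read because it is many directories glued together with a separator (':' on Mac and Linux, ';' on Windows). Here is a short sketch for taking it apart: os.pathsep holds the separator for your system, and shutil.which() reports where on your PATH a program would be found. The program name 'python' is only an example and may not be found on every system, in which case which() returns None.

```python
import os
import shutil

path_string = os.environ.get('PATH', '')     # '' if PATH is not set at all
path_dirs = path_string.split(os.pathsep)    # one list entry per directory

for directory in path_dirs:
    print(directory)

python_location = shutil.which('python')     # full path, or None if not found
print(python_location)
```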

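By default, your working directory is not in your PATH (for security reasons), so to run a script that is in your working directory, you must either put that directory in your PATH (not recommended) or use the full path name or the relative path name, e.g.,

./hello.py

(On Mac and Linux, running a script directly like this also requires that the file be marked executable and begin with a line naming the Python interpreter; typing python ./hello.py avoids both requirements.)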

Changing your PATH depends a lot on your particular operating system and is beyond the scope of this lecture.

In [80]:
#clean up a bit
os.remove('hello.py')