This is part of Python for Geosciences notes.
================
set_printoptions(precision=3, suppress=True)  # this is just to make the output look better
I am going to use some real data as an example of array manipulation: the Arctic Oscillation (AO) index, downloaded with wget through a system call (this requires wget, which is installed by default on most Linux systems):
!wget www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
This is what the data in the file look like (we again use a system call, this time for the head command):
!head monthly.ao.index.b50.current.ascii
1950 1 -0.60310E-01
1950 2 0.62681E+00
1950 3 -0.81275E-02
1950 4 0.55510E+00
1950 5 0.71577E-01
1950 6 0.53857E+00
1950 7 -0.80248E+00
1950 8 -0.85101E+00
1950 9 0.35797E+00
1950 10 -0.37890E+00
Load the data into a variable:
ao = loadtxt('monthly.ao.index.b50.current.ascii')
ao
array([[ 1950. , 1. , -0.06 ],
       [ 1950. , 2. , 0.627],
       [ 1950. , 3. , -0.008],
       ...,
       [ 2013. , 7. , -0.011],
       [ 2013. , 8. , 0.154],
       [ 2013. , 9. , -0.461]])
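If the download is not possible, the loading step can be tried on a few lines of text instead. The following is a minimal sketch, assuming plain `numpy` imported as `np` rather than the pylab namespace; the names `sample_text` and `sample` are illustrative, and the three rows are copied from the head output above.

```python
import io
import numpy as np

# First three rows copied from the `head` output above.
sample_text = """1950 1 -0.60310E-01
1950 2 0.62681E+00
1950 3 -0.81275E-02
"""

# loadtxt accepts a file name or any file-like object.
sample = np.loadtxt(io.StringIO(sample_text))
print(sample.shape)   # (3, 3)

# unpack=True returns one array per column instead of a single 2-D array.
year, month, value = np.loadtxt(io.StringIO(sample_text), unpack=True)
print(month)
```

With `unpack=True` each column gets its own name, which can be easier to read than indexing a 2-D array by column number.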
ao.shape
(765, 3)
The shape is given as (rows, columns). By default NumPy stores arrays in row-major (C) order, while Matlab and Fortran use column-major order.
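Memory order is a property of how the array is stored, not of its shape, and it can be inspected directly. A small sketch, assuming plain `numpy` imported as `np`; the array `a` is a hypothetical stand-in for `ao`.

```python
import numpy as np

# Hypothetical stand-in for `ao`: 4 rows, 3 columns of float64.
a = np.arange(12, dtype=float).reshape(4, 3)

print(a.shape)                   # (4, 3): rows first, then columns
print(a.flags['C_CONTIGUOUS'])   # True: row-major (C) layout by default
print(a.strides)                 # (24, 8): next row is 24 bytes away, next column 8

# A column-major (Fortran-order) copy has the same shape and the same values.
f = np.asfortranarray(a)
print(f.flags['F_CONTIGUOUS'])   # True
print(f.strides)                 # (8, 32): next row is 8 bytes away, next column 32
```

Indexing works the same way for both layouts; the order mostly matters for performance and when exchanging raw data with Fortran or Matlab code.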
type(ao)
numpy.ndarray
NumPy arrays are homogeneous: every element has the same type (the dtype), which allows faster operations:
ao.dtype
dtype('float64')
You can't assign a value that cannot be converted to the array's dtype to an element of a NumPy array:
ao[0,0] = 'Year'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-5a47ddfa9232> in <module>()
----> 1 ao[0,0] = 'Year'

ValueError: could not convert string to float: Year
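Values that can be converted are accepted and cast to the array's dtype, sometimes silently. A minimal sketch, assuming plain `numpy` imported as `np`; the arrays `a` and `i` are illustrative.

```python
import numpy as np

a = np.zeros(3)          # float64 by default
a[0] = 7                 # an int is converted to float
print(a)                 # [7. 0. 0.]

# A string that is not a number cannot be converted.
try:
    a[1] = 'Year'
except ValueError as err:
    print('ValueError:', err)

# Conversion in the other direction silently drops the fractional part.
i = np.zeros(3, dtype=int)
i[0] = 2.9
print(i)                 # [2 0 0]
```

The last case is worth remembering: assigning a float to an integer array raises no error, so data can be truncated without warning.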
Slicing works similarly to Matlab:
ao[0:5,:]
array([[ 1950. , 1. , -0.06 ],
       [ 1950. , 2. , 0.627],
       [ 1950. , 3. , -0.008],
       [ 1950. , 4. , 0.555],
       [ 1950. , 5. , 0.072]])
We can plot the data. Plotting is done by the matplotlib module; start IPython with the --pylab inline option to make the plot appear in the notebook:
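There are a few differences from Matlab slicing worth checking: indices start at 0, the stop index is excluded, and a basic slice is a view on the original data rather than a copy. A short sketch, assuming plain `numpy` imported as `np`; the array `a` is a hypothetical stand-in for `ao`.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(4, 3)

first_two = a[0:2, :]      # rows 0 and 1; the stop index 2 is excluded
print(first_two.shape)     # (2, 3)

# A basic slice is a view: writing to it changes the original array.
first_two[0, 0] = -1.0
print(a[0, 0])             # -1.0

# Use .copy() when an independent array is needed.
independent = a[0:2, :].copy()
independent[0, 0] = 99.0
print(a[0, 0])             # still -1.0
```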
plot(ao[:,2])
[<matplotlib.lines.Line2D at 0xa5d684c>]
In general, plotting is similar to Matlab.
First 12 elements of the second column (months). Remember that indexing starts at 0:
ao[0:12,1]
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
First row:
ao[0,:]
array([ 1950. , 1. , -0.06])
We can create a mask that selects all rows where the value in the second column (month) equals 10 (October):
mask = (ao[:,1]==10)
Here we apply this mask and show only the first 5 rows of the result:
ao[mask][:5,:]
array([[ 1950. , 10. , -0.379],
       [ 1951. , 10. , -0.213],
       [ 1952. , 10. , -0.437],
       [ 1953. , 10. , -0.194],
       [ 1954. , 10. , 0.513]])
You don't have to create a separate variable for the mask; you can apply it directly. Here, instead of the first five rows, I show the last five:
ao[ao[:,1]==10][-5:,:]
array([[ 2008. , 10. , 1.676],
       [ 2009. , 10. , -1.54 ],
       [ 2010. , 10. , -0.467],
       [ 2011. , 10. , 0.8 ],
       [ 2012. , 10. , -1.514]])
You can combine conditions. In this case we select October-December data (only the first 10 rows are shown):
ao[(ao[:,1]>=10)&(ao[:,1]<=12)][0:10,:]
array([[ 1950. , 10. , -0.379],
       [ 1950. , 11. , -0.515],
       [ 1950. , 12. , -1.928],
       [ 1951. , 10. , -0.213],
       [ 1951. , 11. , -0.069],
       [ 1951. , 12. , 1.987],
       [ 1952. , 10. , -0.437],
       [ 1952. , 11. , -1.891],
       [ 1952. , 12. , -1.827],
       [ 1953. , 10. , -0.194]])
Create an example array from the first 12 values of the second column and perform some basic operations:
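A mask is just a boolean array, so it helps to look at one on a small example. The sketch below assumes plain `numpy` imported as `np`; `data` is a five-row miniature of `ao` built from the 1950 values shown in the outputs above. Masks are combined with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>=` or `==`.

```python
import numpy as np

# Miniature of `ao` (year, month, value): August-December 1950.
data = np.array([[1950.,  8., -0.851],
                 [1950.,  9.,  0.358],
                 [1950., 10., -0.379],
                 [1950., 11., -0.515],
                 [1950., 12., -1.928]])
month = data[:, 1]

mask = (month >= 10) & (month <= 12)     # "and"
print(mask)                              # [False False  True  True  True]
print(mask.sum())                        # 3: True counts as 1

same = np.isin(month, [10, 11, 12])      # alternative for a list of values
dec_to_feb = (month == 12) | (month <= 2)   # "or"
not_october = ~(month == 10)             # "not"

print(data[mask])                        # only the October-December rows
```

`np.isin` is often easier to read than a chain of comparisons when the selected months are not contiguous.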
months = ao[0:12,1]
months
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
months+10
array([ 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.])
months*20
array([ 20., 40., 60., 80., 100., 120., 140., 160., 180., 200., 220., 240.])
months*months
array([ 1., 4., 9., 16., 25., 36., 49., 64., 81., 100., 121., 144.])
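All of these operations act element by element, which differs from Matlab, where `*` is a matrix product. A small sketch of the distinction, assuming plain `numpy` imported as `np`; `months` is rebuilt here with `np.arange` so the block runs on its own.

```python
import numpy as np

months = np.arange(1.0, 13.0)        # 1., 2., ..., 12.

squared = months * months            # element-wise product, 12 values
print(np.array_equal(squared, months ** 2))   # True

# The inner (dot) product needs an explicit call and returns one number.
total = np.dot(months, months)
print(total)                         # 650.0, the sum of the squares
```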
Create ao_values, which will contain only the data values:
ao_values = ao[:,2]
Simple statistics:
ao_values.min()
-4.2656999999999998
ao_values.max()
3.4952999999999999
ao_values.mean()
-0.13462109949019607
ao_values.std()
1.0054168027600721
ao_values.sum()
-102.98514111
You can also use the sum function:
sum(ao_values)
-102.98514111
You can perform operations on subsets:
mean(ao[ao[:,1]==1,2]) # January monthly mean
-0.40406150000000002
The result is the same if we call the method on the selected data:
ao[ao[:,1]==1,2].mean()
-0.40406150000000002
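The same selection can be repeated for every month to build a monthly climatology. The following is a sketch on synthetic data, assuming plain `numpy` imported as `np`; `data` is a hypothetical two-year array with the same column layout as `ao`, and its values are made up so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical two-year series laid out like `ao`: year, month, value.
years = np.repeat([2000.0, 2001.0], 12)
months = np.tile(np.arange(1.0, 13.0), 2)
values = np.arange(24, dtype=float)          # made-up values 0..23
data = np.column_stack([years, months, values])

# One mean per calendar month, using the same mask as for January above.
monthly_mean = np.array([data[data[:, 1] == m, 2].mean()
                         for m in range(1, 13)])
print(monthly_mean[0])    # 6.0: mean of the two Januaries (0 and 12)

# For complete years the same result follows from a reshape.
by_year = data[:, 2].reshape(-1, 12)         # one row per year
print(np.allclose(monthly_mean, by_year.mean(axis=0)))   # True
```

The reshape shortcut only works when the series starts in January and contains whole years; the mask version has no such restriction.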
You can save your data as a text file:
savetxt('ao_only_values.csv',ao[:, 2], fmt='%.4f')
Head of the resulting file:
!head ao_only_values.csv
-0.0603
0.6268
-0.0081
0.5551
0.0716
0.5386
-0.8025
-0.8510
0.3580
-0.3789
You can also save it as a binary file:
f = open('ao_only_values.bin', 'wb')  # 'wb': binary mode, needed for raw bytes
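A text file written this way can be read back with loadtxt, but only to the precision given in `fmt`. A minimal round-trip sketch, assuming plain `numpy` imported as `np`; the temporary path and the name `values.csv` are hypothetical, and the three values are the first AO values from the file.

```python
import os
import tempfile
import numpy as np

# First three AO values, as they appear in the downloaded file.
values = np.array([-0.060310, 0.62681, -0.0081275])

path = os.path.join(tempfile.mkdtemp(), 'values.csv')   # hypothetical file name
np.savetxt(path, values, fmt='%.4f')

restored = np.loadtxt(path)
print(restored)                                   # values rounded to 4 decimals
print(np.array_equal(values, restored))           # False: digits were dropped
print(np.allclose(values, restored, atol=5e-5))   # True: equal within rounding
```

If the full precision matters, a wider format such as `fmt='%.8e'` or a binary format is a safer choice.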
ao[:,2].tofile(f)
f.close()
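A file written with tofile contains only the raw bytes: no dtype, no shape and no byte-order information, so the reader has to supply them. A round-trip sketch, assuming plain `numpy` imported as `np`; the temporary paths are hypothetical. It also shows `np.save`, which stores the metadata alongside the data.

```python
import os
import tempfile
import numpy as np

values = np.array([-0.060310, 0.62681, -0.0081275])

path = os.path.join(tempfile.mkdtemp(), 'values.bin')   # hypothetical file name
with open(path, 'wb') as f:          # binary mode; closed automatically
    values.tofile(f)

print(os.path.getsize(path))         # 24: three float64 values, 8 bytes each

# The dtype must be given explicitly when reading raw bytes back.
restored = np.fromfile(path, dtype=np.float64)
print(np.array_equal(values, restored))   # True

# np.save writes a .npy file that remembers dtype and shape.
npy_path = path.replace('.bin', '.npy')
np.save(npy_path, values.reshape(3, 1))
print(np.load(npy_path).shape)       # (3, 1)
```

Raw files are convenient for exchanging data with other programs, while `.npy` is usually the simpler option when the data stay within NumPy.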