Indexing is how we pull individual data items out of an array. Slicing extends this process to pulling out a regular set of the items.
# Convention for import to get shortened namespace
import numpy as np
# Create an array for testing
a = np.arange(12).reshape(3, 4)
a
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
Indexing in Python is 0-based, so the command below looks for the 2nd item along the first dimension (row) and the 3rd along the second dimension (column).
a[1, 2]
6
We can also index along just the first dimension, which returns an entire row:
a[2]
array([ 8, 9, 10, 11])
Negative indices are also allowed, which permit indexing relative to the end of the array.
a[0, -1]
3
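To make the relationship between negative and positive indices concrete, here is a small sketch. The names `grid`, `n_rows`, and `n_cols` are chosen here for illustration and are not part of the original example. It checks that index `-1` along an axis of length `n` refers to the same element as `n - 1`.

```python
import numpy as np

# Same 3x4 test array as above, under a separate illustrative name
grid = np.arange(12).reshape(3, 4)
n_rows, n_cols = grid.shape

# -1 is the last position along an axis, -2 the one before it, and so on
last_item_first_row = grid[0, -1]
same_item = grid[0, n_cols - 1]

# A negative index also works on its own to select a whole row
last_row = grid[-1]
```

Counting from the end this way avoids having to look up the array's shape first.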
Slicing syntax is written as start:stop[:step], where all numbers are optional. Note that stop is one past the last item selected; the slice can be thought of as the half-open interval [start, stop).
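A quick sketch of the half-open rule on a 1-D array may help. The array `values` and the other variable names are illustrative only.

```python
import numpy as np

values = np.arange(10)

# stop is exclusive: indices 2, 3, and 4 are selected, index 5 is not
middle = values[2:5]

# Omitted numbers fall back to defaults: start of the axis, end of the axis, step 1
first_three = values[:3]
from_seven = values[7:]
every_third = values[::3]

# With a step of 1, the result has stop - start items
n_items = len(middle)
```

One handy consequence of the half-open convention is that `values[:k]` and `values[k:]` split an array into two pieces with no overlap and nothing left out.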
# Get the 2nd and 3rd rows
a[1:3]
array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]])
# All rows and 3rd column
a[:, 2]
array([ 2, 6, 10])
# ... can be used to replace one or more full slices
a[..., 2]
array([ 2, 6, 10])
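The benefit of `...` is easier to see with more than two dimensions. The sketch below uses a hypothetical 3-D array named `cube` (not part of the original example) and compares the ellipsis form with writing out every full slice.

```python
import numpy as np

# Hypothetical 3-D array, e.g. laid out as (time, row, column)
cube = np.arange(24).reshape(2, 3, 4)

# ... expands to as many full slices as needed to cover the remaining axes
with_ellipsis = cube[..., 2]
with_slices = cube[:, :, 2]

same_result = np.array_equal(with_ellipsis, with_slices)
```

Both forms pick the third column from every (time, row) pair, giving a result of shape `(2, 3)`.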
# Slice every other row
a[::2]
array([[ 0, 1, 2, 3], [ 8, 9, 10, 11]])
# Slice out every other column
a[:, ::2]
array([[ 0, 2], [ 4, 6], [ 8, 10]])
# Slice every other item along each dimension -- how would we do this
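One possible answer to the question above is to give a step of 2 on both axes in the same index expression. The array is rebuilt here so the snippet runs on its own; `every_other` is an illustrative name.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A step of 2 on each axis keeps rows 0 and 2 and columns 0 and 2
every_other = a[::2, ::2]
print(every_other)
```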
# Import MetPy's units registry
from metpy.units import units
length = 8 * units.feet
print(length * length)
64 foot ** 2
distance = 10 * units.mile
time = 15 * units.minute
avg_speed = distance / time
print(avg_speed)
print(avg_speed.to_base_units())
print(avg_speed.to('mph'))
0.6666666666666666 mile / minute
17.8816 meter / second
40.0 mph
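As a sanity check on the conversions above, the same arithmetic can be done by hand in plain Python. This is only a sketch of the bookkeeping that the units library does for us; the constant and variable names are made up for this example.

```python
# The international mile is defined as exactly 1609.344 m
METERS_PER_MILE = 1609.344
SECONDS_PER_MINUTE = 60.0
MINUTES_PER_HOUR = 60.0

distance_miles = 10.0
time_minutes = 15.0

miles_per_minute = distance_miles / time_minutes
meters_per_second = (distance_miles * METERS_PER_MILE) / (time_minutes * SECONDS_PER_MINUTE)
miles_per_hour = miles_per_minute * MINUTES_PER_HOUR

print(miles_per_minute, meters_per_second, miles_per_hour)
```

Doing this by hand means tracking every conversion factor ourselves, which is exactly the kind of mistake attaching units to the data is meant to prevent.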
np.random.seed(19990503) # So we all have the same data
u = np.random.randint(0, 45, 10) * units('m/s')
v = np.random.randint(0, 45, 10) * units('m/s')
print(u)
print(v)
[ 14. 2. 44. 37. 35. 37. 8. 25. 22. 10.] meter / second
[ 23. 27. 5. 0. 38. 23. 27. 8. 8. 40.] meter / second
import metpy.calc as mpcalc
speed = mpcalc.get_wind_speed(u, v)
direction = mpcalc.get_wind_dir(u, v)
print(speed)
print(np.rad2deg(direction))
[ 26.92582404 27.07397274 44.28317965 37. 51.66236541 43.56604182 28.16025568 26.2488095 23.40939982 41.23105626] meter / second
[ 211.32869287 184.2363948 263.51692631 270. 222.64670313 238.13402231 196.50436138 252.25532837 250.01689348 194.03624347] degree
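For intuition about what these two calculations do, here is a plain NumPy sketch with no units attached. It assumes the usual meteorological convention (direction is where the wind blows from, measured clockwise from north) and is not necessarily how MetPy implements it. The inputs are the first, second, and fourth u and v values printed above.

```python
import numpy as np

# First, second, and fourth wind components from the arrays above, in m/s
u = np.array([14.0, 2.0, 37.0])
v = np.array([23.0, 27.0, 0.0])

# Speed is the magnitude of the (u, v) vector
speed = np.hypot(u, v)

# Direction the wind blows from, clockwise from north, wrapped into [0, 360)
direction_deg = np.degrees(np.arctan2(-u, -v)) % 360.0

print(speed)
print(direction_deg)
```

The results agree with the corresponding entries in the output above. A calm wind (u and v both zero) has no meaningful direction and would need special handling that this sketch leaves out.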
print(np.mean(speed))
34.95609049170319 meter / second
print(np.mean(np.rad2deg(direction)))
print(np.std(np.rad2deg(direction)))
228.2675566116978 degree
29.27089778050201 degree
Let's use MetPy to calculate the dewpoint from the current temperature and relative humidity:
import metpy.calc as mpcalc
mpcalc.dewpoint_rh(25 * units.degC, 0.75)
Thanks to units, this can work with Fahrenheit as well:
td = mpcalc.dewpoint_rh(77 * units.degF, 0.75)
td
And you can get it back in Fahrenheit as:
td.to('degF')
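To see roughly what goes on inside a dewpoint calculation, and why the Celsius and Fahrenheit calls should agree, here is a unit-free sketch. It assumes the Bolton (1980) approximation for saturation vapor pressure; the helper names `dewpoint_from_rh` and `fahrenheit_to_celsius` are hypothetical, and MetPy's own implementation may differ in detail.

```python
import math


def dewpoint_from_rh(temp_c, rh):
    """Approximate dewpoint (deg C) from temperature (deg C) and RH as a fraction (0-1)."""
    # Bolton (1980) saturation vapor pressure over liquid water, in hPa
    sat_vapor_pressure = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    vapor_pressure = rh * sat_vapor_pressure
    # Invert the same formula: find the temperature at which this vapor pressure saturates
    log_ratio = math.log(vapor_pressure / 6.112)
    return 243.5 * log_ratio / (17.67 - log_ratio)


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


td_from_celsius = dewpoint_from_rh(25.0, 0.75)
td_from_fahrenheit = dewpoint_from_rh(fahrenheit_to_celsius(77.0), 0.75)
print(td_from_celsius, td_from_fahrenheit)
```

Without units, the caller has to remember to convert 77 °F to 25 °C before calling the function. With units attached, that conversion happens automatically.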
MetPy also has a library of useful constants, similar to those in scipy.constants, that are important for meteorology and have appropriate dimensionality included:
import metpy.constants as consts
We can look at the docstring for the module (or go to the web documentation) to see a list of the available constants:
consts?
So for the density of liquid water (nominally at 0 °C), we can use:
consts.density_water
Or for a more symbolic and shorter notation, you can also use:
consts.rho_l
# Create some synthetic data representing temperature and wind speed data
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
50 + 2 * np.random.randn(100)) * units.degC
spd = (np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) +
10 + 5 * np.random.randn(100))) * units('m/s')
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temp.m, 'tab:red')
plt.plot(spd.m, 'tab:blue');
By doing a comparison between a NumPy array and a value, we get an array of boolean values representing the result of comparing each element with that value:
temp > 45 * units.degC
array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], dtype=bool)
We can then use the resulting boolean array to index into the original array, retrieving only the values where the comparison was True:
print(temp[temp > 45 * units.degC])
[ 69.89825854 71.52313905 69.90028363 66.73828667 66.77980233 72.91468564 69.34603239 69.09533591 68.27350814 64.33916721 67.49497791 67.05282372 63.51829518 63.54034678 65.46576463 62.99683836 59.27662304 61.29361272 60.51641586 57.46048995 55.19793004 53.07572989 54.47998158 53.09552107 54.59037269 47.84272747 49.1435589 45.87151534 45.11976794 45.009292 46.36021141 46.87557425 47.25668992 50.09599544 53.77789358 50.24073197 54.07629059 51.95065202 55.84827794 57.56967086 57.19572063 61.67658285 56.51474577 59.72166924 62.99403256 63.57569453 64.05984232 60.88258643 65.37759899 63.94115754 65.53070256 67.15175649 66.26468701 67.03811793 69.17773618 69.83571708 70.99586742 66.34971928 67.49905207 69.83593609] degC
So long as the size of the boolean array matches the data, the boolean array can come from anywhere, including a comparison on a different array:
print(temp[spd > 10 * units('m/s')])
[ 66.73828667 66.77980233 69.34603239 69.09533591 68.27350814 64.33916721 67.49497791 67.05282372 63.51829518 63.54034678 65.46576463 62.99683836 59.27662304 61.29361272 60.51641586 57.46048995 55.19793004 53.07572989 54.47998158 53.09552107 54.59037269 47.84272747 49.1435589 45.87151534 43.95971516 42.72814762 42.45316175 39.2797517 40.23351938 36.77179678 34.43329229 31.42277612 38.97505745 34.10549575 35.70826448 29.01276068 30.31180935 29.31602671 32.84580454 30.76695309 29.11344716 30.16652571 29.91513049 39.51784389 69.17773618 69.83571708 69.83593609] degC
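With 100 values it is hard to check the result by eye, so here is the same idea on a tiny pair of plain NumPy arrays. The names and values are made up for illustration and carry no units.

```python
import numpy as np

readings = np.array([12.0, 48.5, 45.0, 51.2, 30.1])
gusts = np.array([3.0, 11.0, 14.0, 2.0, 20.0])

# A comparison yields one boolean per element
mask = readings > 45.0

# Indexing with the mask keeps only the True positions
selected = readings[mask]

# True counts as 1, so summing a mask counts the matches
n_matches = mask.sum()

# The mask can come from a different array of the same length
readings_when_gusty = readings[gusts > 10.0]
```

Here `mask` is `[False, True, False, True, False]`, so `selected` holds the second and fourth readings, while `readings_when_gusty` holds the second, third, and fifth.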
# Make a copy so we don't modify the original data
temp2 = temp.copy()
# Replace all places where spd is <10 with NaN (not a number) so matplotlib skips it
temp2[spd < 10 * units('m/s')] = np.nan * units.degC
plt.plot(temp2, 'tab:red')
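The same masking-by-assignment pattern can be seen on a small scale with plain NumPy arrays. The names and values below are illustrative. Note that the array must have a floating-point dtype, since NaN cannot be stored in an integer array.

```python
import numpy as np

temps = np.array([20.0, 22.0, 25.0, 23.0, 21.0])
speeds = np.array([12.0, 8.0, 15.0, 9.0, 11.0])

# Work on a copy so the original data stay intact
masked = temps.copy()

# Assigning through a boolean mask changes only the True positions
masked[speeds < 10.0] = np.nan

# NaN-aware reductions such as nanmean skip the gaps
mean_kept = np.nanmean(masked)
```

After this runs, `masked` has NaN in the second and fourth positions, `temps` is unchanged, and `mean_kept` is the mean of the three remaining values.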
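We can also combine multiple boolean arrays using the bitwise operators (& for and, | for or, ~ for not). Each comparison must be wrapped in parentheses, because the bitwise operators have higher precedence than the comparison operators.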
print(temp[(temp < 45 * units.degC) & (spd > 10 * units('m/s'))])
[ 43.95971516 42.72814762 42.45316175 39.2797517 40.23351938 36.77179678 34.43329229 31.42277612 38.97505745 34.10549575 35.70826448 29.01276068 30.31180935 29.31602671 32.84580454 30.76695309 29.11344716 30.16652571 29.91513049 39.51784389] degC
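The sketch below shows the three bitwise operators on small illustrative arrays (plain NumPy, no units), and what happens when the parentheses are left out.

```python
import numpy as np

temps = np.array([30.0, 50.0, 40.0, 60.0])
speeds = np.array([12.0, 15.0, 5.0, 8.0])

both = (temps < 45.0) & (speeds > 10.0)     # element-wise AND
either = (temps < 45.0) | (speeds > 10.0)   # element-wise OR
not_cold = ~(temps < 45.0)                  # element-wise NOT

# Without parentheses, & is evaluated first, so Python attempts
# temps < (45.0 & speeds) > 10.0, which raises an error for these arrays
try:
    temps < 45.0 & speeds > 10.0
    unparenthesized_failed = False
except (TypeError, ValueError):
    unparenthesized_failed = True
```

The Python keywords `and`, `or`, and `not` are not substitutes here: they try to reduce each whole array to a single truth value instead of working element by element.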
You can also use a list or array of indices to extract particular values--this is a natural extension of the regular indexing. For instance, just as we can select the first element:
print(temp[0])
69.89825854468695 degC
We can also extract the first, fifth, and tenth elements:
print(temp[[0, 4, 9]])
[ 69.89825854 66.77980233 64.33916721] degC
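A few properties of indexing with a list are worth seeing on a small example. The array `data` is illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# The result follows the order of the index list
picked = data[[0, 4, 2]]

# Indices may repeat and may be negative
repeated = data[[1, 1, -1]]

# Indexing with a list returns a copy, so editing it leaves data alone
picked[0] = 999
```

This differs from slicing, which returns a view onto the original array's memory.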
One of the ways this comes into play is sorting NumPy arrays using argsort. This function returns the indices that would put the array's items in sorted order. So for our temp "data":
inds = np.argsort(temp)
print(inds)
[52 57 42 48 54 44 56 51 49 43 50 46 58 55 53 40 37 61 47 45 59 39 36 60 41 34 66 63 35 38 32 62 64 33 31 67 29 28 68 69 65 30 27 70 71 72 25 26 73 75 77 21 23 74 76 22 24 20 78 82 80 19 79 16 83 18 87 17 81 84 15 12 13 85 89 86 9 88 14 90 92 97 3 4 93 11 91 10 98 8 7 94 6 95 99 0 2 96 1 5]
We can use this array of indices to pass into temp to get it in sorted order:
print(temp[inds])
[ 28.71828204 28.85269149 29.01276068 29.11344716 29.25186164 29.31602671 29.42796381 29.91513049 30.16652571 30.31180935 30.48608715 30.76695309 30.93380275 30.95814392 31.07199963 31.1341411 31.42277612 32.27369636 32.44927684 32.84580454 33.37573713 34.10549575 34.43329229 34.95696914 35.70826448 36.77179678 37.06954335 37.39853293 37.7453367 38.97505745 39.2797517 39.34620461 39.51784389 40.23351938 42.45316175 42.69583703 42.72814762 43.95971516 44.03576453 44.45775806 45.009292 45.11976794 45.87151534 46.36021141 46.87557425 47.25668992 47.84272747 49.1435589 50.09599544 50.24073197 51.95065202 53.07572989 53.09552107 53.77789358 54.07629059 54.47998158 54.59037269 55.19793004 55.84827794 56.51474577 57.19572063 57.46048995 57.56967086 59.27662304 59.72166924 60.51641586 60.88258643 61.29361272 61.67658285 62.99403256 62.99683836 63.51829518 63.54034678 63.57569453 63.94115754 64.05984232 64.33916721 65.37759899 65.46576463 65.53070256 66.26468701 66.34971928 66.73828667 66.77980233 67.03811793 67.05282372 67.15175649 67.49497791 67.49905207 68.27350814 69.09533591 69.17773618 69.34603239 69.83571708 69.83593609 69.89825854 69.90028363 70.99586742 71.52313905 72.91468564] degC
Or we can slice inds to only give the 10 highest temperatures:
ten_highest = inds[-10:]
print(temp[ten_highest])
[ 69.09533591 69.17773618 69.34603239 69.83571708 69.83593609 69.89825854 69.90028363 70.99586742 71.52313905 72.91468564] degC
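One reason to prefer argsort over sorting the values directly is that the same indices can reorder a second, related array. A small sketch with made-up values and names:

```python
import numpy as np

temps = np.array([15.0, 30.0, 22.0, 8.0])
speeds = np.array([3.0, 7.0, 5.0, 1.0])

# Indices that put temps in ascending order
order = np.argsort(temps)
sorted_temps = temps[order]

# Reorder a second array the same way, keeping the pairs aligned
speeds_by_temp = speeds[order]

# Reverse the indices for descending order, or slice them for the top few
descending = order[::-1]
two_highest = temps[order[-2:]]
```

Here `order` is `[3, 0, 2, 1]`, so each entry of `speeds_by_temp` stays matched with the temperature it was measured alongside.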
There are other NumPy "arg" functions that return indices rather than values; in IPython, a wildcard search lists them:
np.*arg*?
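Three commonly used members of this family are `np.argmax`, `np.argmin`, and `np.argwhere`. A short sketch on an illustrative array:

```python
import numpy as np

temps = np.array([15.0, 30.0, 22.0, 8.0])

# Position of the largest and smallest values
warmest = np.argmax(temps)
coldest = np.argmin(temps)

# Positions where a condition holds; one row per match
above_ten = np.argwhere(temps > 10.0)
```

For a 1-D input, `np.argwhere` returns an array of shape `(n_matches, 1)`, so it may need flattening before being used as an index.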