Another good use for classes is when you have multiple functions with similar, complicated signatures. A function's signature is the set of parameters it accepts, that is, the arguments you pass when you call it. If you have a single function with many required arguments, you might want to consider splitting it into multiple functions that are smaller and less complicated to call. If you have many functions, each with the same required arguments, you might want to consider grouping those functions into a class and turning the arguments they share into class data.
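As a minimal sketch of that last idea, consider two functions that share the same required arguments. The names here (`area`, `perimeter`, `Rectangle`) are hypothetical and chosen only for illustration; they are not part of the instrument example that follows.

```python
# Hypothetical example: two functions with the same required arguments.
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)

# The same computations grouped into a class.
# The shared arguments become class data, set once in __init__.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)

rect = Rectangle(3, 4)
print(rect.area())
print(rect.perimeter())
```

Once the object is created, each call needs no arguments at all, so there is nothing to mistype or pass in the wrong order.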
We can see this start to happen using the same example we looked at in the previous section on Complicated Data Structures, but imagining it from a different perspective. Imagine we chose to represent the instrument data with many small data structures.
Imagine a similar instrument data file to what we had before, but now with variables A, B, C, and D.
%%writefile instrument2.csv
A,0.612,44.978
B,0.891,-0.0178
A,1.132,45.741
C,1.251,21.385
C,1.542,23.723
B,1.852,-0.6319
D,1.988,101.123
D,2.187,100.852
A,2.376,42.178
B,3.017,-2.7863
A,3.861,41.389
C,4.345,27.013
D,4.687,98.678
A,5.142,42.687
D,6.187,102.752
We can define a similar function to what we defined before, but this time we will be explicit about our knowledge that the data file only contains information about variables A, B, C, and D.
def read_instrument_data(filename):
    values = {}
    times = {}
    with open(filename) as f:
        for record in f:
            var, time, value = record.split(',')
            if var not in values:
                values[var] = []
                times[var] = []
            values[var].append(float(value))
            times[var].append(float(time))
    # Return the times and values separately for each of the four known variables
    return times['A'], values['A'], times['B'], values['B'], times['C'], values['C'], times['D'], values['D']
Now we have to explicitly unpack the times and values into separate lists for each variable.
a_times, a_values, b_times, b_values, c_times, c_values, d_times, d_values = read_instrument_data('instrument2.csv')
Now, the print(data) test doesn't have any problem! Everything is simple and explicit.
print(a_times)
print(a_values)
print(b_times)
print(b_values)
And so on.
Everything is clearly labeled and the data is "self-explanatory." So, no problem, right?
Imagine the myriad things you can compute from the above variables! Many of the computations may depend on any number of these variables. Some may need the time data, and some may not. Some may require only one variable's data, and some may require them all!
Since I haven't used variables that correspond to anything real in our world, I can't create actual computations that you can recognize, but I hope you can see where this is going. As an example, imagine the following functions.
def integrate_trapezoid(x, y):
    return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

def compute_v1(a_t, a_v, b_t, b_v):
    return integrate_trapezoid(a_t, a_v) - integrate_trapezoid(b_t, b_v)

def compute_v2(a_v, b_v, c_v, d_v):
    all_v = a_v + b_v + c_v + d_v
    return sum(all_v) / len(all_v)

def compute_v3(a_t, a_v, b_t, b_v, c_t, c_v, d_t, d_v):
    a_b = integrate_trapezoid(a_t, a_v) - integrate_trapezoid(b_t, b_v)
    c_d = integrate_trapezoid(c_t, c_v) + integrate_trapezoid(d_t, d_v)
    return a_b / c_d

def compute_v4(a_t, b_t, c_t, d_t):
    return sorted(a_t + b_t + c_t + d_t)
And so on...
You should be able to imagine how more functions can be added to the list, and that the number of functions can grow quickly and dramatically.
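Before relying on these functions, a quick sanity check of the trapezoid rule may be helpful. The sketch below repeats integrate_trapezoid so that it runs on its own, and applies it to made-up points on the line y = x (not instrument data), whose exact integral from 0 to 2 is 2.

```python
def integrate_trapezoid(x, y):
    # Sum the areas of the trapezoids between consecutive points
    return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

# Made-up points on the line y = x, for illustration only
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 2.0]

result = integrate_trapezoid(x, y)
print(result)
```

Because the trapezoid rule is exact for straight lines, the printed value should match the exact integral.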
Let's try out what we've got.
compute_v1(a_times, a_values, b_times, b_values)
compute_v2(a_values, b_values, c_values, d_values)
compute_v3(a_times, a_values, b_times, b_values, c_times, c_values, d_times, d_values)
compute_v4(a_times, b_times, c_times, d_times)
Just writing out those function calls takes time and care to avoid errors!
What kinds of potential problems do you see with this approach? Are there any?
While the data is explicitly labeled and easy to understand (i.e., it passes the print(data) test), there is a lot to manage.
Because the number of variables is large, the signatures of the functions can be quite lengthy. Long signatures can lead to errors because of simple typing mistakes, and those errors can be hard to spot, especially since long signatures tend to encourage abbreviated variable names to save typing!
Many of the functions have similar signatures, but not the same. Can you remember which function needs which variables as arguments? Or would you need to look it up every time?
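To make the risk concrete, here is a small sketch of how an argument-order mistake can slip through unnoticed. It repeats integrate_trapezoid and compute_v1 so that it runs on its own, and it uses short, made-up lists rather than the instrument data.

```python
def integrate_trapezoid(x, y):
    return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

def compute_v1(a_t, a_v, b_t, b_v):
    return integrate_trapezoid(a_t, a_v) - integrate_trapezoid(b_t, b_v)

# Made-up data, for illustration only
a_times, a_values = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
b_times, b_values = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]

# Intended call
correct = compute_v1(a_times, a_values, b_times, b_values)

# First two arguments swapped by mistake: no exception is raised
swapped = compute_v1(a_values, a_times, b_times, b_values)

print(correct, swapped)
```

Both calls run without complaint, but they return different numbers. Nothing in the function can tell a list of times from a list of values, so the mistake only shows up if you already know what the answer should be.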
What other problems can you think of?
The instrument that is providing the data is measuring different variables related to the same thing. Maybe it's measuring properties of the atmosphere, ocean, land, or surface ice. Maybe it's just measuring local properties of some "environment," whatever you want to call it. And regardless of what that "environment" is, the computations you perform must also describe the same environment. Hence, conceptually, the data and the computations (i.e., functions) are (again) intrinsically related to one another!
You could relieve a lot of the headaches of using all of these functions by encapsulating the data in a single object and simplifying the signatures of the functions.
*How might you do that?*
# Try writing a class that groups the data and functions above