One good use for classes and objects is when you have complicated data structures. What makes a data structure complicated? Well, think about it from the user's point of view. If the user can't simply `print(data)` and understand what the data is and how to use it from the output, then it's complicated. Obviously, there are varying degrees of complicated, but the data should be simple to access and use.
Let's consider an example. Imagine that you have a large number of instruments that record many different kinds of data at irregular time intervals. Imagine further that each instrument records each variable at different times, so there is no time data common to all variables.
Because the data is potentially numerous and varied, there are usually many ways of representing it. One way of representing this kind of data is with nested lists, tuples, and dictionaries. Since the recording frequencies are irregular, and each variable has its own time data, it makes sense to group the time and variable data together in a `tuple`, and then store multiple time-variable tuples in a `list`. Then, we can store each list of time-variable tuples in a `dict`, with the name of each variable as the key in the dictionary. This might look like the following.
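As a small sketch of that layout, the nested structure could be written out by hand as below. The variable names `'A'` and `'B'` and the numbers are illustrative placeholders, not output from a real instrument.

```python
# Illustrative only: a dict mapping each variable name to a list of
# (time, value) tuples. Names and numbers are placeholders.
data = {
    'A': [(1.132, 45.741), (2.376, 42.178)],
    'B': [(0.891, -0.0178), (1.852, -0.6319)],
}

# Each entry is a (time, value) pair; the order is a convention the
# reader has to know, since nothing in the structure states it.
first_time, first_value = data['A'][0]
```

Notice that nothing in the structure itself records which tuple element is the time and which is the measured value.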
Let's assume that the instrument writes its data to a CSV (comma-separated values) file with the following format.
%%writefile instrument1.csv
B,0.891,-0.0178
A,1.132,45.741
B,1.852,-0.6319
A,2.376,42.178
B,3.017,-2.7863
A,3.861,41.389
A,5.142,42.687
We can read this data into our format described above with the following function:
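def read_instrument_data(filename):
    """Read 'variable,time,value' records into a dict of (time, value) lists."""
    instrument_data = {}
    with open(filename) as f:
        for record in f:
            if not record.strip():
                continue  # skip blank lines
            var, time, value = record.strip().split(',')
            if var not in instrument_data:
                instrument_data[var] = []
            # Store each record as a (time, value) tuple under its variable name
            instrument_data[var].append((float(time), float(value)))
    return instrument_data

instrument_data = read_instrument_data('instrument1.csv')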
Now, let's try the `print(data)` test...
print(instrument_data)
Can you figure out what this data represents just from the output above? If not, what kinds of problems might you expect someone to have with a data structure like this?
If a user simply looked at this output, how would they know what the numbers are? The variables are named, but there are two numbers for each variable. Perhaps they can figure out that one is time, but perhaps they cannot. This means there is information stored with the data that is not explicit. And that is when problems occur!
Now, what kinds of operations do we want to have for this data? Probably lots of different functions, and we won't cover all of them here. Let's just think about a couple of examples.
For example, when plotting a variable's data, you probably want to extract just the time data into its own list and just the variable data into its own list, instead of having the data mixed together.
For example, you might want to compute an integral of a variable using the trapezoid rule.
Let's just consider these examples for now.
def get_time(data, var):
    # Element 0 of each tuple is the time
    return [tpl[0] for tpl in data[var]]

def get_variable(data, var):
    # Element 1 of each tuple is the recorded value
    return [tpl[1] for tpl in data[var]]
x = get_time(instrument_data, 'A')
x
y = get_variable(instrument_data, 'A')
y
We've encoded information about the data structure (i.e., which tuple element is the time and which is the variable) into the functions themselves. Now, the user of the data structure doesn't have to know the details of the data itself to access the data.
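def integrate_trapezoid(data, var):
    # Split the (time, value) tuples into separate time (x) and value (y) lists
    x, y = list(map(list, zip(*data[var])))
    # Sum the area of the trapezoid between each pair of consecutive samples
    return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

integrate_trapezoid(instrument_data, 'A')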
Is this reasonable? How could you check if this is correct?
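One possible check, sketched here under the assumption that the function is tested on made-up data rather than the instrument file: the trapezoid rule is exact for a straight line, so integrating samples of y = 2t + 1 taken at irregular times should reproduce the analytic integral. The variable name `'line'` and the sample times are hypothetical.

```python
def integrate_trapezoid(data, var):
    # Same function as above, repeated so this check runs on its own
    x, y = list(map(list, zip(*data[var])))
    return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

# Hypothetical test data: y = 2*t + 1 sampled at irregular times on [0, 3]
times = [0.0, 0.5, 2.0, 3.0]
test_data = {'line': [(t, 2*t + 1) for t in times]}

# Analytic integral of 2*t + 1 from 0 to 3 is t**2 + t evaluated at 3
expected = 3.0**2 + 3.0
result = integrate_trapezoid(test_data, 'line')
check_passed = abs(result - expected) < 1e-12
```

A check like this only confirms the arithmetic. It says nothing about whether the trapezoid rule is a good approximation for the real, irregularly sampled instrument data.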
What kinds of potential problems do you see with this approach?
What happens if someone tries to "add data" to the `instrument_data` structure or modify the `instrument_data` structure themselves? How do they do that? Where do they put new data? If anyone needs to modify the original data, they need to know the format for that data or all of the functions will break.
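To make the risk concrete, here is a minimal sketch of one way this could go wrong. It assumes a hypothetical user who appends a new record with the tuple elements in the wrong order, (value, time) instead of (time, value).

```python
def get_time(data, var):
    # Element 0 of each tuple is assumed to be the time
    return [tpl[0] for tpl in data[var]]

data = {'A': [(1.132, 45.741), (2.376, 42.178)]}

# Hypothetical mistake: the new record is appended as (value, time)
data['A'].append((41.389, 3.861))

# No error is raised; the last "time" is really a measured value
times = get_time(data, 'A')
```

Because every element is just a float, Python has no way to notice the mistake, and any function built on the structure quietly returns wrong results.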
How does someone who is looking at the functions know what the `data` argument is or how it should look? They have to read the functions to figure out the structure of the `data` argument (and infer the structure of `instrument_data`).
Are there any other problems?
This means that the implementation of the functions and the structure of the data are intrinsically related to each other. To be safe, and to avoid the problems suggested above, it is usually a good idea to group these functions and data together into a `class` and to hide the structure of the data from the user entirely.
*How might you do that?*
# Try writing a class that groups the data and functions above
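If you want something to compare your attempt against, here is one possible sketch, not the only reasonable design. The class name `InstrumentData` and its method names are hypothetical choices made for this illustration.

```python
class InstrumentData:
    """Hypothetical wrapper that hides the dict-of-tuple-lists layout."""

    def __init__(self):
        # Internal layout: variable name -> list of (time, value) tuples
        self._records = {}

    def add(self, var, time, value):
        # The only place that needs to know the tuple order
        self._records.setdefault(var, []).append((float(time), float(value)))

    def times(self, var):
        return [t for t, _ in self._records[var]]

    def values(self, var):
        return [v for _, v in self._records[var]]

    def integrate_trapezoid(self, var):
        x, y = self.times(var), self.values(var)
        return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))

    def __str__(self):
        # Make print(data) describe the contents instead of dumping raw tuples
        return "\n".join(
            f"variable {var!r}: {len(recs)} (time, value) records"
            for var, recs in self._records.items()
        )


inst = InstrumentData()
inst.add('A', 1.132, 45.741)
inst.add('A', 2.376, 42.178)
summary = str(inst)
print(inst)
```

With this design, new data goes in through `add`, so users never need to know whether time or value comes first in the stored tuples.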