This page is available as an interactive iPython Notebook as part of the following bitbucket repo:
https://bitbucket.org/oruebel/brainformat (see brain/examples/brainformat_brief_introduction.ipynb)
Local Installation:
The BrainFormat library is available on bitbucket at https://bitbucket.org/oruebel/brainformat. Clone the repository and add its root directory to your Python path (see the sys.path.append call below).
Installation at NERSC
The library is installed at NERSC at /project/projectdirs/m2043/brainformat. A module file for using the library is available; simply execute:
`module use /global/project/projectdirs/m2043/brainformat/modulefiles`
`module load brainformat`
Alternatively, you can call:
`source /project/projectdirs/m2043/brainformat/setup_environment`
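To verify the environment is set up correctly, a quick import check may be used (a minimal sanity check; it assumes the module or setup script puts the library on your PYTHONPATH):
`python -c "import brain; print brain.__file__"`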
# Import basic packages needed for this notebook
import sys
import numpy as np
from IPython.display import Image
# If the brain lib is not installed then set the path to it
sys.path.append("/Users/oruebel/Devel/BrainFormat")
from brain.dataformat.brainformat import *
filename = '/Users/oruebel/Devel/BrainFormat/my_first_brainfile.h5'
multi_filename = '/Users/oruebel/Devel/BrainFormat/my_first_multi_brainfile.h5' # Used later
my_file = BrainDataFile.create(filename)
We now have a new HDF5 file that is initialized with all mandatory elements of the format. The figure below shows an overview of the file's structure. The file contains the base groups for storing data and metadata. Groups in HDF5 are similar to folders on your computer.
Data recordings and analysis are stored in the data group, which itself contains two subgroups: i) data/internal for storage of data internal to the brain and ii) data/external for external data, e.g., observations recorded as part of an experiment.
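As a quick check we can walk the fresh file and list these mandatory groups. The sketch below assumes that get_h5py() on the file manager (introduced later in this notebook) returns the underlying h5py.File object:
# Sketch: list all objects that BrainDataFile.create(...) added to the new file.
# Assumption: get_h5py() on the file manager returns the underlying h5py.File.
def print_name(name):
    print name
my_file.get_h5py().visit(print_name)  # expect, e.g., 'data', 'data/internal', 'data/external'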
Image(filename="brainformat_brief_introduction/my_first_brainfile_base.png", width=600)
The above figure illustrates the core structure of the HDF5 file. We see a number of HDF5 Groups (folders). In the BrainData API, semantic (high-level) objects in the file are associated with corresponding classes in the API, which are responsible for creating, managing, and providing access to the content of that object. We refer to any HDF5 object (e.g., Group, Dataset, File) that has an associated manager class as a managed object. This concept is defined in the brain.dataformat.base module, most importantly via the ManagedObject base class, from which all manager classes inherit.
The module brain.dataformat.brainformat builds on top of the managed object concept and defines the classes for managing the content of BrainData files. Among the main classes defined by the brainformat API are BrainDataFile, BrainDataData, BrainDataDescriptors, BrainDataInternalData, and BrainDataEphys, all of which appear in this notebook.
Detailed documentation of the full API is available in the developer Sphinx documentation in the docs folder. Pre-built versions of the developer documentation in HTML (recommended) and PDF form are also available as part of the bitbucket repo here: https://bitbucket.org/oruebel/brainformat/downloads
All objects in the BrainFormat have a well-defined specification. The API provides basic functionality to verify that specific objects or complete files follow the BrainFormat specification.
print my_file.check_format_compliance()
True
We can retrieve the specification for a given type by itself or ask the library to recursively resolve the specification, i.e., also include the specification of all managed objects contained in the spec.
# Specification for just the BrainDataFile object
object_spec = BrainDataFile.get_format_specification()
# Specification for the BrainDataFile and all contained objects
full_spec = BrainDataFile.get_format_specification_recursive()
# Note: The specifications are class-bound, i.e., calling the function on my_file would yield the same result.
# Convert the spec to JSON if desired
import json
json.dumps(object_spec)
'{"prefix": "entry_", "datasets": {}, "group": null, "managed_objects": [{"optional": false, "format_type": "BrainDataData"}, {"optional": false, "format_type": "BrainDataDescriptors"}], "groups": {}, "attributes": [], "file_extension": ".h5", "optional": false, "file_prefix": null, "description": "Managed BRAIN file."}'
The above looks at the specification of the format as defined by the BrainFormat library. However, format specifications may change and evolve over time. As part of the core file format, the specification of a managed object is always saved as a JSON string in the format_specification attribute of the managed object in the file. This allows us to easily check whether the specification of an object as described in the file is consistent with the current spec given by the library.
print "Use the API to check if the format has changed: (expected False)"
print my_file.format_specification_changed()
print "Manually check if the specifications as given in the file and by the library are the same: (expected: True)"
temp1 = my_file.get_format_specification_from_file() # Get the format spec as indicated in the file.
# The spec is automatically converted to a python dict.
temp2 = my_file.get_format_specification() # Get the current format spec as defined by the library
# We can compare the two specs manually or we can simply use the helper function provided by the API
print temp1 == temp2 # Compare the two specs manually. We expect True, i.e., the specs match.
Use the API to check if the format has changed: (expected False)
False
Manually check if the specifications as given in the file and by the library are the same: (expected: True)
True
Retrieval of managed objects from a file is generally supported by the manager classes via two mechanisms:
# 1) Access objects using getter functions
internal = my_file.data().internal()
print internal
<brain.dataformat.brainformat.BrainDataInternalData object at 0x105cfb450>
# 2) Access objects using dictionary-like key slicing
internal = my_file['data']['internal']
print internal
<brain.dataformat.brainformat.BrainDataInternalData object at 0x10c24b410>
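Both access paths resolve to the same HDF5 group, which we can confirm by comparing the paths of the underlying h5py objects (a quick sanity check using the get_h5py function introduced below):
# Both mechanisms point to the same object in the file
print my_file.data().internal().get_h5py().name == my_file['data']['internal'].get_h5py().name  # expected: True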
Note, if an object in the HDF5 file is managed (i.e., if the object has a manager class associated with it), then the API will return a corresponding instance of the respective class. If the requested object is not managed---e.g., a dataset that is contained in a managed group but that does not have its own API class---then the corresponding h5py object will be returned. For managed objects we can always retrieve the corresponding h5py object via the get_h5py(...) function:
i5 = internal.get_h5py()
print type(i5)
BrainDataFile.get_managed_object(i5)
<class 'h5py._hl.group.Group'>
<brain.dataformat.brainformat.BrainDataInternalData at 0x10c24b350>
In addition to the format specification, validation, and get_h5py access, the base ManagedObject class---from which all manager classes inherit---provides a number of additional helper functions:
print "Example1: Get all objects of a given type contained in a given parent group"
# Get all h5py objects managed by BrainDataInternal
temp1 = BrainDataInternalData.get_all(parent_group=my_file.data(), get_h5py=True)
# Get all BrainDataInternal objects for the parent
temp2 = BrainDataInternalData.get_all(parent_group=my_file.data(), get_h5py=False)
print temp1
print temp2
print ""
print "Example 2: Check if an h5py object is managed."
temp1 = BrainDataFile.is_managed(internal.get_h5py()) # Works also for ManagedObject instances
print temp1
print ""
print "Example 3: Check if an object is managed by a particular class"
# NOTE: Here we need to call the is_managed_by function from the class we want to check the object against
temp1 = BrainDataInternalData.is_managed_by(internal.get_h5py()) # Works also for ManagedObject instances
print temp1
print ""
print "Example 4: Given an h5py object, get an instance of the corresponding manager class"
# NOTE: Here it is recommended to call the get_managed_object function from any of the core BrainData API classes.
#       Calling ManagedObject.get_managed_object(...) will work as well; however, brain.dataformat.base does not
#       import any derived file APIs and as such does not know the classes associated with the BrainData API.
#       (The brain.dataformat.base module may be used to define many different formats.)
temp1 = BrainDataFile.get_managed_object(internal.get_h5py()) # Works also for ManagedObject instances
print temp1
print ""
print "Example 5: Inspect the user-defined, optional id of the object. This feature may be used, e.g., to asign a DOE number."
temp1 = internal.has_object_id()
temp2 = internal.get_object_id()
print temp1
print temp2
print ""
print "Example 6: Adding an object id and deleting the object id"
# Adding the object id
internal.set_object_id(10)
temp1 = internal.get_object_id()
print temp1
# Deleting the object id. By setting the object id to None it will be deleted
internal.set_object_id(None)
temp2 = internal.get_object_id()
print temp2
print ""
print "Example 7: Get name of the file where a managed object is stored."
print internal.get_filename(absolute_path=True) # Option 1: Use the helper function provided by ManagedObject
print internal.get_h5py().file.filename # Option 2: Get the h5py object and use its logic
print internal.file.filename # Option 3: The attributes of the managed h5py object are mapped to the
                             # manager object, so we can also shortcut Option 2
Example 1: Get all objects of a given type contained in a given parent group
[<HDF5 group "/data/internal" (0 members)>]
[<brain.dataformat.brainformat.BrainDataInternalData object at 0x10c3bc590>]

Example 2: Check if an h5py object is managed.
True

Example 3: Check if an object is managed by a particular class
True

Example 4: Given an h5py object, get an instance of the corresponding manager class
<brain.dataformat.brainformat.BrainDataInternalData object at 0x10c22f5d0>

Example 5: Inspect the user-defined, optional id of the object. This feature may be used, e.g., to assign a DOE number.
False
None

Example 6: Adding an object id and deleting the object id
10
None

Example 7: Get name of the file where a managed object is stored.
/Users/oruebel/Devel/BrainFormat/my_first_brainfile.h5
/Users/oruebel/Devel/BrainFormat/my_first_brainfile.h5
/Users/oruebel/Devel/BrainFormat/my_first_brainfile.h5
Other potentially useful functions include:
>>> internal.close() # Close the file that contains the managed object
>>> internal.get_managed_object_type() # One of 'file', 'dataset', or 'group'
# Creating a dummy list of anatomy names
import random
import string
import numpy as np
def random_string(length):
char_set = string.ascii_uppercase + string.digits
return ''.join(random.sample(char_set*length, length))
my_ana_names = np.repeat( np.asarray([random_string(3) for x in range(16)]) , 16 )
# Creating a dummy layout
my_layout = np.arange(256).reshape(16,16)
# Creating dummy ephys data
my_ephys = np.arange(256*10000).reshape(256,10000)
# Create a basic Ephys data container, without specifying any data
ephys_data_0 = BrainDataEphys.create(parent_object=internal,
                                     ephys_data_shape=(256, 10000),
                                     ephys_data_type='float32',
                                     sampling_rate=16)
# Create a full dataset populated with data
ephys_data_1 = BrainDataEphys.create(parent_object=internal,
                                     ephys_data=my_ephys,
                                     anatomy_names=my_ana_names,  # We can omit anatomy_ids as they will be autogenerated
                                     layout=my_layout,
                                     start_time=0,  # Default value is 0
                                     sampling_rate=16)
The above code illustrates two ways to generate an ECoG dataset: i) by specifying only the type and shape of the dataset we generated an empty dataset ephys_data_0 (i.e., empty values are filled with numpy NaN values upon read) and ii) by providing an explicit data array we generated ephys_data_1, which contains our user-defined data.
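As a quick illustration, we can read a few values back from both datasets. This sketch assumes that, like the underlying h5py datasets, the manager objects support array-style slicing:
# Sketch: read back values from the empty and the populated dataset
a = ephys_data_0[0, 0:10]  # uninitialized values are expected to read as NaN
b = ephys_data_1[0, 0:10]  # the first 10 samples of the first electrode
print np.all(np.isnan(a))  # expected: True
print b                    # expected: [0 1 2 3 4 5 6 7 8 9]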
Optionally, we may also specify a number of additional items, such as the anatomy_names, layout, start_time, and sampling_rate shown above. We will discuss layout and anatomy in more detail later.
The figure below shows the structure of the file after we have added the two dummy ECoG datasets to our example file. The base ECoG data group here contains datasets for the raw_data and sampling_rate, as well as two datasets, time_axis and electrode_id, which describe the axes of the 2D raw_data dataset (space $\times$ time). In addition to the electrode_id dataset, we may also have additional datasets to further describe the anatomy of the spatial dimension of the raw data. NOTE: the electrode_id and time_axis datasets are directly linked to the raw_data via the concept of DimensionScales in HDF5 (more on this topic later; see also the sketch after the figure). In addition to all these components, we also find a Group (folder) which contains the data structures for storing annotations of the raw data. An introduction to the concept of data annotations will be provided later. Next, let's see how we can interact with the ECoG data through reading and writing data.
Image(filename="brainformat_brief_introduction/my_first_brainfile_base_2.png", width=600)