In this notebook I will show you how you can use a study design configuration in JSON format, as produced by datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how you can then serialise it in JSON and tabular (i.e. CSV) format.
Our study design configuration consists of:
Let's import all the required libraries
from time import time
import os
import json
## ISA-API related imports
from isatools.model import Investigation, Study
## ISA-API create mode related imports
from isatools.create.model import StudyDesign
from isatools.create.connectors import generate_study_design
# serializer from ISA Investigation to JSON
from isatools.isajson import ISAJSONEncoder
# ISA-Tab serialisation
from isatools import isatab
## ISA-API create mode related imports
from isatools.create import model
from isatools import isajson
First of all we load the study design configuration with all the specs defined above
with open(os.path.abspath(os.path.join(
    "isa-study-design-as-json", "datascriptor", "crossover-study-human.json"
)), "r") as config_file:
    study_design_config = json.load(config_file)
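Before handing the configuration to the generator, it can help to check what was actually loaded. Below is a minimal sketch using only the standard library. The helper name `load_json_config` and the keys of the toy configuration are hypothetical; they are not part of ISA-API or datascriptor.

```python
import json
import os
import tempfile


def load_json_config(path):
    """Load a JSON file and make sure its top level is an object (dict)."""
    with open(path, "r") as config_file:
        config = json.load(config_file)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level of {}".format(path))
    return config


# Toy stand-in for a design configuration (keys are made up for illustration).
toy_config = {"name": "toy design", "arms": []}
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy-config.json")
    with open(toy_path, "w") as out_fp:
        json.dump(toy_config, out_fp)
    loaded = load_json_config(toy_path)

print(sorted(loaded.keys()))
```

In the notebook, the same helper could be pointed at the path of the datascriptor file used above.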
To perform the conversion we just need to use the function generate_study_design()
(the name is possibly subject to change)
study_design = generate_study_design(study_design_config)
assert isinstance(study_design, StudyDesign)
The StudyDesign.generate_isa_study() method returns the complete ISA-API Study object.
start = time()
study = study_design.generate_isa_study()
end = time()
print('The generation of the study design took {:.2f} s.'.format(end - start))
assert isinstance(study, Study)
investigation = Investigation(identifier='inv01', studies=[study])
The generation of the study design took 2.02 s.
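The start/end timing pattern is repeated for every serialisation step in this notebook. One way to avoid the repetition is a small context manager. This is only a sketch; the name `timed` and its `results` argument are my own and not part of ISA-API.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally store it in `results`."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print('{} took {:.2f} s.'.format(label, elapsed))


timings = {}
with timed('Summing a range', timings):
    total = sum(range(100000))
```

In this notebook the body of the `with` block would be, for example, the call to `study_design.generate_isa_study()`.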
start = time()
inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))
end = time()
print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))
The JSON serialisation of the ISA investigation took 0.55 s.
directory = os.path.abspath(os.path.join('output', 'crossover-2-treatments-mice'))
os.makedirs(directory, exist_ok=True)
with open(os.path.abspath(os.path.join(directory, 'isa-investigation-crossover-2-treatments-mice.json')), 'w') as out_fp:
    json.dump(json.loads(inv_json), out_fp)
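The cells above serialise the investigation to a string, parse it back and dump it again. Since `json.dump` accepts the same `cls` argument as `json.dumps`, the object can presumably be written to a file in one step. Below is a minimal sketch with a stand-in encoder. The `Point` class and `PointEncoder` are hypothetical; in the notebook the encoder would be `ISAJSONEncoder`, and I have not verified that the resulting file is identical to the one written above.

```python
import io
import json


class Point:
    """Hypothetical object that the default JSON encoder cannot handle."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointEncoder(json.JSONEncoder):
    """Stand-in for a custom encoder such as ISAJSONEncoder."""

    def default(self, o):
        if isinstance(o, Point):
            return {'x': o.x, 'y': o.y}
        return super().default(o)


buffer = io.StringIO()  # stands in for the open output file
json.dump(Point(1, 2), buffer, cls=PointEncoder, sort_keys=True)
written = buffer.getvalue()
print(written)
```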
start = time()
isatab.dump(investigation, directory)
end = time()
print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))
The Tab serialisation of the ISA investigation took 21.58 s.
To use them in the notebook we can also dump the tables to pandas DataFrames, using the dump_tables_to_dataframes function rather than dump
dataframes = isatab.dump_tables_to_dataframes(investigation)
len(dataframes)
3
We have 1 study file and 2 assay files (one for DNA methylation profiling by nucleic acid sequencing and one for a clinical chemistry marker panel). Let's check the names:
for key in dataframes.keys():
    display(key)
's_study_01.txt'
'a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'
'a_AT11_clinical-chemistry_marker-panel.txt'
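ISA-Tab file names carry their role in the prefix: `s_` for study tables and `a_` for assay tables. The sketch below groups the keys on that basis. The function name `split_isatab_tables` is hypothetical, and the list of names is copied from the output above.

```python
def split_isatab_tables(filenames):
    """Group ISA-Tab table names into study ('s_') and assay ('a_') files."""
    groups = {'study': [], 'assay': [], 'other': []}
    for name in filenames:
        if name.startswith('s_'):
            groups['study'].append(name)
        elif name.startswith('a_'):
            groups['assay'].append(name)
        else:
            groups['other'].append(name)
    return groups


# File names copied from the output above.
names = [
    's_study_01.txt',
    'a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt',
    'a_AT11_clinical-chemistry_marker-panel.txt',
]
groups = split_isatab_tables(names)
print(len(groups['study']), len(groups['assay']))
```

In the notebook, `dataframes.keys()` could be passed in place of `names`.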
We have 10 subjects in each of the 4 arms, for a total of 40 subjects.
We collect:
Across the 4 study arms a total of 280 samples are collected (70 samples per arm)
study_frame = dataframes['s_study_01.txt']
count_arm0_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP0' in el)])
count_arm1_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP1' in el)])
count_arm2_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP2' in el)])
count_arm3_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP3' in el)])
print("There are {} samples in the GRP0 arm (i.e. group)".format(count_arm0_samples))
print("There are {} samples in the GRP1 arm (i.e. group)".format(count_arm1_samples))
print("There are {} samples in the GRP2 arm (i.e. group)".format(count_arm2_samples))
print("There are {} samples in the GRP3 arm (i.e. group)".format(count_arm3_samples))
There are 70 samples in the GRP0 arm (i.e. group)
There are 70 samples in the GRP1 arm (i.e. group)
There are 70 samples in the GRP2 arm (i.e. group)
There are 70 samples in the GRP3 arm (i.e. group)
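The four near-identical counting lines can be collapsed into a single tally. Below is a minimal sketch using `collections.Counter`. It assumes the arm label is the part of the source name before the first underscore, and the example values are made up in the same pattern as the study table.

```python
from collections import Counter


def count_by_arm(source_names):
    """Tally rows per arm, taking the arm label as the text before the first '_'."""
    return Counter(name.split('_', 1)[0] for name in source_names)


# A few made-up Source Name values in the same pattern as the study table.
example_sources = ['GRP0_SBJ01', 'GRP0_SBJ01', 'GRP1_SBJ03', 'GRP3_SBJ10']
arm_counts = count_by_arm(example_sources)
for arm, count in sorted(arm_counts.items()):
    print("There are {} samples in the {} arm (i.e. group)".format(count, arm))
```

Passing `study_frame['Source Name']` instead of the example list should work as well, since iterating over a pandas Series yields its values.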
The study table provides an overview of the subjects (sources) and samples
study_frame
| | Source Name | Characteristics[Study Subject] | Term Accession Number | Characteristics[status] | Protocol REF | Parameter Value[Sampling order] | Parameter Value[Study cell] | Date | Performer | Sample Name | Characteristics[organism part] | Term Accession Number.1 | Comment[study step with treatment] | Factor Value[Sequence Order] | Factor Value[DURATION] | Unit | Factor Value[AGENT] | Factor Value[INTENSITY] | Unit.1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GRP0_SBJ01 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | healthy | sample collection | 031 | A0E3 | 2021-06-30 | Unknown | GRP0_SBJ01_A0E3_SMP-Blood-Sample-1 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | YES | 3 | 14 | days | placebo | 20.0 | mg/day |
1 | GRP0_SBJ01 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | healthy | sample collection | 001 | A0E1 | 2021-06-30 | Unknown | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | Saliva Sample | http://purl.obolibrary.org/obo/NCIT_C174119 | YES | 1 | 14 | days | hypertena | 20.0 | mg/day |
2 | GRP0_SBJ01 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | healthy | sample collection | 042 | A0E4 | 2021-06-30 | Unknown | GRP0_SBJ01_A0E4_SMP-Blood-Sample-2 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | NO | 4 | 180 | days | |||
3 | GRP0_SBJ01 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | healthy | sample collection | 011 | A0E1 | 2021-06-30 | Unknown | GRP0_SBJ01_A0E1_SMP-Blood-Sample-1 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | YES | 1 | 14 | days | hypertena | 20.0 | mg/day |
4 | GRP0_SBJ01 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | healthy | sample collection | 043 | A0E4 | 2021-06-30 | Unknown | GRP0_SBJ01_A0E4_SMP-Blood-Sample-3 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | NO | 4 | 180 | days | |||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
275 | GRP3_SBJ10 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | diseased | sample collection | 242 | A3E3 | 2021-06-30 | Unknown | GRP3_SBJ10_A3E3_SMP-Blood-Sample-1 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | YES | 3 | 14 | days | hypertena | 20.0 | mg/day |
276 | GRP3_SBJ10 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | diseased | sample collection | 256 | A3E4 | 2021-06-30 | Unknown | GRP3_SBJ10_A3E4_SMP-Blood-Sample-3 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | NO | 4 | 180 | days | |||
277 | GRP3_SBJ10 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | diseased | sample collection | 254 | A3E4 | 2021-06-30 | Unknown | GRP3_SBJ10_A3E4_SMP-Blood-Sample-1 | Blood Sample | http://purl.obolibrary.org/obo/NCIT_C17610 | NO | 4 | 180 | days | |||
278 | GRP3_SBJ10 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | diseased | sample collection | 232 | A3E3 | 2021-06-30 | Unknown | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | Saliva Sample | http://purl.obolibrary.org/obo/NCIT_C174119 | YES | 3 | 14 | days | hypertena | 20.0 | mg/day |
279 | GRP3_SBJ10 | Homo sapiens | http://purl.obolibrary.org/obo/NCBITaxon_9606 | diseased | sample collection | 212 | A3E1 | 2021-06-30 | Unknown | GRP3_SBJ10_A3E1_SMP-Saliva-Sample-1 | Saliva Sample | http://purl.obolibrary.org/obo/NCIT_C174119 | YES | 1 | 14 | days | placebo | 20.0 | mg/day |
280 rows × 19 columns
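The sample names in the table appear to encode the group, the subject, the study cell and the sample type. Below is a sketch of how they could be split with a regular expression. The pattern is inferred from the rows shown above, not taken from the ISA-API documentation, and the names `SAMPLE_NAME_PATTERN` and `parse_sample_name` are hypothetical.

```python
import re

# Pattern inferred from names such as 'GRP0_SBJ01_A0E3_SMP-Blood-Sample-1'.
SAMPLE_NAME_PATTERN = re.compile(
    r'^(?P<group>GRP\d+)_(?P<subject>SBJ\d+)_(?P<cell>A\d+E\d+)'
    r'_SMP-(?P<sample_type>.+)-(?P<index>\d+)$'
)


def parse_sample_name(sample_name):
    """Split a sample name into its parts, or return None if it does not match."""
    match = SAMPLE_NAME_PATTERN.match(sample_name)
    if match is None:
        return None
    parts = match.groupdict()
    parts['index'] = int(parts['index'])
    return parts


parsed = parse_sample_name('GRP0_SBJ01_A0E3_SMP-Blood-Sample-1')
print(parsed)
```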
This assay takes saliva samples as input
dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt']
| | Sample Name | Comment[study step with treatment] | Protocol REF | Parameter Value[cross linking] | Parameter Value[DNA fragmentation] | Parameter Value[DNA fragment size] | Parameter Value[immunoprecipitation antibody] | Performer | Extract Name | Characteristics[extract type] | Protocol REF.1 | Parameter Value[instrument] | Parameter Value[library_orientation] | Parameter Value[library_strategy] | Parameter Value[library_selection] | Parameter Value[multiplex identifier] | Performer.1 | Raw Data File |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S81-Extract-R1 | gDNA | library_preparation | GridION | single | MBD-Seq | MF | f | Unknown | AT5-S81-raw_data_file-R2.raw |
1 | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | YES | extraction | uv-light | nebulization | a | d | Unknown | AT5-S1-Extract-R1 | DNA | library_preparation | GridION | paired | MBD-Seq | MF | f | Unknown | AT5-S1-raw_data_file-R1.raw |
2 | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S81-Extract-R1 | gDNA | library_preparation | GridION | paired | MBD-Seq | MF | f | Unknown | AT5-S81-raw_data_file-R3.raw |
3 | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | YES | extraction | uv-light | nebulization | a | d | Unknown | AT5-S1-Extract-R1 | DNA | library_preparation | GridION | paired | MBD-Seq | MF | f | Unknown | AT5-S1-raw_data_file-R2.raw |
4 | GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S81-Extract-R1 | gDNA | library_preparation | GridION | paired | MBD-Seq | MF | f | Unknown | AT5-S81-raw_data_file-R4.raw |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1275 | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S152-Extract-R1 | gDNA | library_preparation | GridION | single | MBD-Seq | MF | f | Unknown | AT5-S152-raw_data_file-R1.raw |
1276 | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S152-Extract-R2 | DNA | library_preparation | GridION | single | MBD-Seq | MF | f | Unknown | AT5-S152-raw_data_file-R5.raw |
1277 | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S152-Extract-R2 | DNA | library_preparation | GridION | single | MBD-Seq | MF | f | Unknown | AT5-S152-raw_data_file-R6.raw |
1278 | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | YES | extraction | di-tert-butyl peroxide | nebulization | a | d | Unknown | AT5-S152-Extract-R1 | gDNA | library_preparation | GridION | paired | MBD-Seq | MF | f | Unknown | AT5-S152-raw_data_file-R3.raw |
1279 | GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 | YES | extraction | uv-light | nebulization | a | d | Unknown | AT5-S72-Extract-R2 | gDNA | library_preparation | GridION | single | MBD-Seq | MF | f | Unknown | AT5-S72-raw_data_file-R6.raw |
1280 rows × 18 columns
For this assay we have 80 saliva samples, from which 320 DNA extracts are obtained (4 per sample on average). Each extract then goes through library preparation and nucleic acid sequencing (on a GridION instrument), for a total of 1280 raw data files (4 per extract on average).
dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'].nunique(axis=0, dropna=True)
Sample Name    80
Comment[study step with treatment]    1
Protocol REF    1
Parameter Value[cross linking]    2
Parameter Value[DNA fragmentation]    1
Parameter Value[DNA fragment size]    1
Parameter Value[immunoprecipitation antibody]    1
Performer    1
Extract Name    320
Characteristics[extract type]    2
Protocol REF.1    1
Parameter Value[instrument]    1
Parameter Value[library_orientation]    2
Parameter Value[library_strategy]    1
Parameter Value[library_selection]    1
Parameter Value[multiplex identifier]    1
Performer.1    1
Raw Data File    1280
dtype: int64
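The unique counts reported by `nunique()` give a quick view of how the assay graph fans out from samples to extracts to raw data files. Below is a small worked calculation using the three counts from the output above. The ratios are averages; they do not show that every node has exactly that many children.

```python
# Unique counts copied from the nunique() output above.
unique_counts = {'Sample Name': 80, 'Extract Name': 320, 'Raw Data File': 1280}

levels = ['Sample Name', 'Extract Name', 'Raw Data File']
fan_out = {}
for parent, child in zip(levels, levels[1:]):
    # Average number of child nodes per parent node at this step.
    fan_out[(parent, child)] = unique_counts[child] / unique_counts[parent]

for (parent, child), ratio in fan_out.items():
    print('{} -> {}: {:.1f} on average'.format(parent, child, ratio))
```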
This assay takes blood samples as input
dataframes['a_AT11_clinical-chemistry_marker-panel.txt']
| | Sample Name | Comment[study step with treatment] | Protocol REF | Performer | Raw Data File |
---|---|---|---|---|---|
0 | GRP0_SBJ01_A0E1_SMP-Blood-Sample-1 | YES | sample preparation | Unknown | AT11-S1-raw_data_file-R1 |
1 | GRP0_SBJ01_A0E3_SMP-Blood-Sample-1 | YES | sample preparation | Unknown | AT11-S11-raw_data_file-R1 |
2 | GRP0_SBJ01_A0E4_SMP-Blood-Sample-1 | NO | sample preparation | Unknown | AT11-S21-raw_data_file-R1 |
3 | GRP0_SBJ01_A0E4_SMP-Blood-Sample-2 | NO | sample preparation | Unknown | AT11-S22-raw_data_file-R1 |
4 | GRP0_SBJ01_A0E4_SMP-Blood-Sample-3 | NO | sample preparation | Unknown | AT11-S23-raw_data_file-R1 |
... | ... | ... | ... | ... | ... |
195 | GRP3_SBJ10_A3E1_SMP-Blood-Sample-1 | YES | sample preparation | Unknown | AT11-S152-raw_data_file-R1 |
196 | GRP3_SBJ10_A3E3_SMP-Blood-Sample-1 | YES | sample preparation | Unknown | AT11-S162-raw_data_file-R1 |
197 | GRP3_SBJ10_A3E4_SMP-Blood-Sample-1 | NO | sample preparation | Unknown | AT11-S174-raw_data_file-R1 |
198 | GRP3_SBJ10_A3E4_SMP-Blood-Sample-2 | NO | sample preparation | Unknown | AT11-S175-raw_data_file-R1 |
199 | GRP3_SBJ10_A3E4_SMP-Blood-Sample-3 | NO | sample preparation | Unknown | AT11-S176-raw_data_file-R1 |
200 rows × 5 columns
For this assay we use 200 blood samples. Each sample goes through one sample preparation process, producing a total of 200 raw data files
dataframes['a_AT11_clinical-chemistry_marker-panel.txt'].nunique(axis=0, dropna=True)
Sample Name    200
Comment[study step with treatment]    2
Protocol REF    1
Performer    1
Raw Data File    200
dtype: int64
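As a sanity check, the counts reported across the three tables can be reconciled with simple arithmetic. All numbers below are copied from the outputs in this notebook; only the variable names are mine.

```python
arms = 4
subjects_per_arm = 10
samples_in_study_table = 280   # rows in s_study_01.txt
saliva_samples_in_at5 = 80     # unique Sample Name values in the AT5 assay table
blood_samples_in_at11 = 200    # unique Sample Name values in the AT11 assay table

subjects = arms * subjects_per_arm
samples_per_subject = samples_in_study_table / subjects
samples_used_in_assays = saliva_samples_in_at5 + blood_samples_in_at11

print(subjects, samples_per_subject, samples_used_in_assays)
```

The saliva and blood samples consumed by the two assays add up to the 280 samples listed in the study table, i.e. 7 samples per subject.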