Loading files into LIMIX

Internally, LIMIX uses the HDF5 file format (http://www.hdfgroup.org) to handle genotype and phenotype data. This file format is flexible and supported by a number of data analysis tools, including R (e.g. rhdf5), Python (e.g. h5py, pandas) and Perl (HDF5 bindings).
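
As a minimal sketch of how HDF5 data can be accessed from Python with h5py, the snippet below writes a small genotype-like matrix to a temporary file and reads it back. The group and dataset names used here (`genotype`, `matrix`, `sample_ID`) are illustrative assumptions and not necessarily the layout that LIMIX writes.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy data: 3 samples x 4 variants, coded as 0/1/2 allele counts.
genotypes = np.array([[0, 1, 2, 0],
                      [1, 1, 0, 2],
                      [2, 0, 0, 1]], dtype=np.int8)
sample_ids = np.array([b"s1", b"s2", b"s3"])

path = os.path.join(tempfile.mkdtemp(), "example.hdf5")

# Write the arrays. Group and dataset names are hypothetical.
with h5py.File(path, "w") as f:
    group = f.create_group("genotype")
    group.create_dataset("matrix", data=genotypes)
    group.create_dataset("sample_ID", data=sample_ids)

# Read the arrays back into memory.
with h5py.File(path, "r") as f:
    loaded = f["genotype/matrix"][:]
    loaded_ids = [s.decode() for s in f["genotype/sample_ID"][:]]

print(loaded.shape, loaded_ids)
```

The same file could be opened from R or any other tool with HDF5 support, which is the main reason for using a shared container format.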

There is also a growing list of bioinformatics tools and pipelines that build on HDF5.

LIMIX file converter

LIMIX offers a simple conversion tool, limix_converter, which can convert PLINK binary files (.bed), CSV files and 0/1/2 genotype files (G012), which can be generated using VCFtools.

Importing of genotype data

limix_converter --outfile=./my_file.hdf5 --plink=./my_file

Note that the .bed ending is omitted. If the file my_file.hdf5 already exists, the genotype group (not the phenotypes) is deleted before the new data are written. An example PLINK file set is included in the tutorial folder in "data/importer/genotype.(bed/fam/bim)".
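
Because the converter is given a prefix rather than a file name, it can help to check that the whole file set is present before converting. The following is a minimal sketch, assuming the standard PLINK binary file set (.bed, .bim, .fam); the helper name `missing_plink_files` is hypothetical and not part of LIMIX.

```python
from pathlib import Path

# Standard endings of a PLINK binary file set.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the PLINK files expected for `prefix` that do not exist."""
    # The suffix is appended to the full prefix string, so prefixes that
    # contain dots (e.g. "cohort.v2") are handled correctly.
    return [prefix + suffix for suffix in PLINK_SUFFIXES
            if not Path(prefix + suffix).is_file()]


missing = missing_plink_files("./my_file")
if missing:
    print("Missing PLINK files:", ", ".join(missing))
else:
    print("All PLINK files found; ./my_file can be passed to --plink")
```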

Reading a VCF file:

VCF files first need to be converted into a 0/1/2 genotype file (G012). This can be done with VCFtools:

vcftools --vcf INFILE --012 --out OUTFILE

If the VCF file is gzip-compressed (.gz), use --gzvcf instead of --vcf:

vcftools --gzvcf INFILE --012 --out OUTFILE

Subsequently, the file can be imported into a LIMIX HDF5 file using:

limix_converter --outfile=./my_file.hdf5 --g012=./OUTFILE

Note again that the file endings are omitted. VCFtools writes several files that share the OUTFILE prefix (OUTFILE.012, OUTFILE.012.indv and OUTFILE.012.pos), and both vcftools (for --out) and limix_converter (for --g012) take this prefix without any file ending. An example VCF file is included in the tutorial folder in "data/importer/vcf_sample.vcf.gz".
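
To check the exported genotypes before importing them, the files can be read with plain Python. The sketch below assumes the layout described in the VCFtools documentation: one row per sample in the .012 file, with a leading sample index followed by 0/1/2 genotypes (-1 for missing), one sample ID per line in .012.indv, and chromosome and position per line in .012.pos. The function name `read_012` and the toy data are hypothetical.

```python
import os
import tempfile


def read_012(prefix):
    """Read the .012, .012.indv and .012.pos files sharing `prefix`."""
    with open(prefix + ".012.indv") as fh:
        samples = [line.strip() for line in fh if line.strip()]
    with open(prefix + ".012.pos") as fh:
        positions = [tuple(line.split()) for line in fh if line.strip()]
    genotypes = []
    with open(prefix + ".012") as fh:
        for line in fh:
            fields = line.split()
            if fields:
                # Assumption: the first column is a running sample index.
                genotypes.append([int(x) for x in fields[1:]])
    return samples, positions, genotypes


# Tiny hand-written example (2 samples x 3 sites), not real data.
prefix = os.path.join(tempfile.mkdtemp(), "OUTFILE")
with open(prefix + ".012.indv", "w") as fh:
    fh.write("sampleA\nsampleB\n")
with open(prefix + ".012.pos", "w") as fh:
    fh.write("1\t1000\n1\t2000\n2\t500\n")
with open(prefix + ".012", "w") as fh:
    fh.write("0\t0\t1\t2\n1\t2\t-1\t0\n")

samples, positions, genotypes = read_012(prefix)

# Basic consistency checks: one row per sample, one column per site.
assert len(genotypes) == len(samples)
assert all(len(row) == len(positions) for row in genotypes)
print(samples, positions, genotypes)
```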

Importing of phenotype data

Reading a phenotype CSV file:

limix_converter --outfile=./my_file.hdf5 --csv=./phenotype_sample.csv

Note that the phenotype file is expected to be in the format [samples (rows) x phenotypes (columns)], including column headers (phenotype IDs) and row headers (sample IDs). An example CSV file is included in the tutorial folder in "data/importer/phenotype.csv".
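
The expected layout can be checked before conversion with a few lines of Python. This is a sketch under stated assumptions: the file is comma-delimited, the first header cell labels the sample-ID column, and empty cells or "NA" mark missing values. The function name `check_phenotype_csv` and the example values are hypothetical, and the converter itself may be stricter or more lenient.

```python
import csv
import io


def check_phenotype_csv(handle):
    """Return (sample_ids, phenotype_ids, values) or raise ValueError."""
    rows = [row for row in csv.reader(handle) if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one sample row")
    header, body = rows[0], rows[1:]
    # Assumption: the first header cell belongs to the sample-ID column.
    phenotype_ids = header[1:]
    sample_ids, values = [], []
    for row in body:
        if len(row) != len(header):
            raise ValueError("row %r has %d fields, expected %d"
                             % (row[0], len(row), len(header)))
        sample_ids.append(row[0])
        # Assumption: empty cells and "NA" denote missing values.
        values.append([float("nan") if x in ("", "NA") else float(x)
                       for x in row[1:]])
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample IDs")
    return sample_ids, phenotype_ids, values


# Made-up example: 2 samples x 2 phenotypes, one missing value.
example = io.StringIO("sample_id,height,weight\ns1,1.70,65.2\ns2,1.82,\n")
sample_ids, phenotype_ids, values = check_phenotype_csv(example)
print(sample_ids, phenotype_ids)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory example.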
