import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pheno = pd.read_csv('phenoTable.csv')
geno = pd.read_csv('genoTable.csv')
There are 1000 mouse femur bones which have been measured at high resolution, and a number of shape analyses have been run on each sample.
Phenotypical Information (phenoTable.csv)
- Each column represents a metric that was assessed in the images; for example, CORT_DTO__C_TH is the mean thickness of the cortical bone.
pheno.head(5)
|   | BMD | MECHANICS_STIFFNESS | CORT_DTO__C_TH | CORT_DTO__C_TH_SD | CORT_MOM__J | CT_TH_RAD | CT_TH_RAD_STD | CANAL_VOLUME | CANAL_COUNT | CANAL_DENSITY | ... | CANAL_THETA | CANAL_THETA_CV | CANAL_PCA1 | CANAL_PCA1_CV | CANAL_PCA2 | CANAL_PCA2_CV | CANAL_PCA3 | CANAL_PCA3_CV | FEMALE | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.030221 | 57.163181 | 0.186455 | 0.019785 | 0.103288 | 78.558303 | 17.440679 | 18351.469264 | 31.0 | 72.458800 | ... | 59.576428 | 0.281042 | 443.537228 | 1.326217 | 120.150958 | 1.677884 | 30.294477 | 0.700402 | 0 | 351 |
| 1 | 0.032788 | 54.972011 | 0.183007 | 0.015696 | 0.126947 | 88.691516 | 22.238608 | 27002.217716 | 137.0 | 206.113056 | ... | 54.487601 | 0.401896 | 293.627859 | 1.272190 | 84.416139 | 1.541258 | 34.940901 | 0.804821 | 0 | 356 |
| 2 | 0.036075 | 73.590881 | 0.216930 | 0.028019 | 0.171012 | 79.973567 | 8.862339 | 18464.688139 | 128.0 | 177.921019 | ... | 56.120693 | 0.356876 | 326.470697 | 1.155693 | 87.714578 | 1.051160 | 32.911487 | 0.754326 | 0 | 357 |
| 3 | 0.031145 | 49.854823 | 0.193758 | 0.024087 | 0.099639 | 88.215056 | 23.288367 | 42840.614369 | 147.0 | 247.019809 | ... | 50.206993 | 0.445938 | 243.130372 | 1.014527 | 81.448541 | 1.162161 | 37.690527 | 0.944862 | 0 | 359 |
| 4 | 0.034226 | 66.578296 | 0.175598 | 0.018144 | 0.176490 | 79.330125 | 15.968669 | 25474.883270 | 271.0 | 349.344731 | ... | 53.561597 | 0.441762 | 243.212520 | 1.041145 | 80.598173 | 1.394151 | 39.716728 | 1.075045 | 1 | 360 |
5 rows × 35 columns
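Because each row is one bone and each column one metric, per-group summaries are a single groupby away. The sketch below uses only the five rows and a few of the columns printed above, so it runs without phenoTable.csv; it assumes, from the values shown, that FEMALE is a 0/1 indicator. On the full data the same groupby would be called on pheno directly.

```python
import pandas as pd

# The first five rows of the pheno table shown above, restricted to a few
# columns so the example runs without phenoTable.csv.
pheno_head = pd.DataFrame({
    "ID": [351, 356, 357, 359, 360],
    "FEMALE": [0, 0, 0, 0, 1],  # assumed 0/1 sex indicator
    "BMD": [0.030221, 0.032788, 0.036075, 0.031145, 0.034226],
    "CORT_DTO__C_TH": [0.186455, 0.183007, 0.216930, 0.193758, 0.175598],
})

# Group the bones by the FEMALE flag and summarise each metric per group.
summary = pheno_head.groupby("FEMALE")[["BMD", "CORT_DTO__C_TH"]].agg(["count", "mean"])
print(summary)
```

With only one female in these five rows the numbers are not meaningful; the point is the shape of the call.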
Genetic Information (genoTable.csv)
- Each animal has been tagged at a number of different regions of the genome, called markers (e.g. D1Mit236).
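The marker names appear to follow the usual mouse microsatellite scheme D<chromosome>Mit<number>, so the chromosome can be read off the name. A minimal sketch under that assumption; the helper chromosome_of is hypothetical. A few names in the table header below use MIT in capitals, so the regular expression ignores case.

```python
import re

import pandas as pd

# A few marker column names copied from the genotype table header.
markers = ["D1Mit64", "D1Mit236", "D2Mit365", "D18Mit4", "D19Mit68", "D19MIT88"]

# Assumed naming convention: D<chromosome>Mit<number>, case-insensitive.
MARKER_RE = re.compile(r"^D(\d+)Mit(\d+)$", re.IGNORECASE)


def chromosome_of(marker: str) -> int:
    """Return the chromosome number encoded in a marker name (hypothetical helper)."""
    match = MARKER_RE.match(marker)
    if match is None:
        raise ValueError(f"unexpected marker name: {marker}")
    return int(match.group(1))


# Map every marker to its chromosome and count markers per chromosome.
chrom = pd.Series({name: chromosome_of(name) for name in markers}, name="chromosome")
print(chrom.value_counts().sort_index())
```

On the real data, chromosome_of can be applied to every column of geno except ID to group markers by chromosome.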
geno.head(5)
|   | ID | D1Mit64 | D1Mit236 | D1Mit7 | D1Mit386 | D1Mit14 | D1Mit540 | D1Mit17 | D2Mit365 | D2Mit323 | ... | D18Mit64 | D18Mit147 | D18Mit123 | D18Mit9 | D18Mit4 | D19Mit68 | D19Mit40 | D19MIT88 | D19MIT17 | D19MIT108 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 351 | H | H | H | H | H | H | H | H | H | ... | H | H | H | H | H | H | H | H | H | H |
| 1 | 353 | B | B | B | B | H | H | H | H | A | ... | H | A | A | A | H | H | H | H | H | H |
| 2 | 354 | H | A | A | A | A | H | H | H | H | ... | H | - | H | H | A | H | H | A | H | H |
| 3 | 355 | A | A | H | H | - | H | H | A | A | ... | H | H | A | A | A | A | A | - | A | A |
| 4 | 356 | H | A | A | A | A | A | H | H | - | ... | B | H | B | B | H | B | B | B | - | H |
5 rows × 99 columns
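To relate markers to phenotypes, the genotype calls need a numeric coding and the two tables have to be joined on their shared ID column. A minimal sketch, assuming the F2-intercross convention used, for example, by R/qtl: A and B are the two homozygous genotypes, H is heterozygous and "-" is a missing call. The 0/1/2 coding (number of B alleles) and the name GENO_CODE are our choices, not part of the original data.

```python
import numpy as np
import pandas as pd

# First five rows of genoTable.csv as shown above, for three of the markers.
geno_head = pd.DataFrame({
    "ID": [351, 353, 354, 355, 356],
    "D1Mit64": ["H", "B", "H", "A", "H"],
    "D1Mit236": ["H", "B", "A", "A", "A"],
    "D1Mit14": ["H", "H", "A", "-", "A"],
})

# ID and BMD of the first five rows of phenoTable.csv.
pheno_head = pd.DataFrame({
    "ID": [351, 356, 357, 359, 360],
    "BMD": [0.030221, 0.032788, 0.036075, 0.031145, 0.034226],
})

# Assumed coding: count of B alleles. Values missing from the dictionary
# (here "-") are turned into NaN by Series.map.
GENO_CODE = {"A": 0, "H": 1, "B": 2}
geno_num = geno_head.set_index("ID").apply(lambda col: col.map(GENO_CODE))

# Inner join on ID keeps only animals that appear in both tables.
merged = pheno_head.merge(geno_num.reset_index(), on="ID", how="inner")
print(merged)
```

Only IDs 351 and 356 occur in both excerpts, so the merged frame has two rows; on the full tables the same merge pairs every animal's phenotype with its genotype.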
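# DataFrame.hist draws one subplot per column and creates its own figure,
# so set the size here instead of passing a single pre-made axis
pheno.hist(figsize=(15, 15), bins=50);
plt.tight_layout()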
These are far too many variables to work with, at least for a start, so we will focus on a few of them, e.g. BMD, CORT_DTO__C_TH and CORT_DTO__C_TH_SD.
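The column names share prefixes (CORT_, CT_, CANAL_), which makes it easy to pull out a related group of metrics with DataFrame.filter. The sketch runs on an empty frame carrying some of the column names shown above; on the real data the same calls work on pheno directly, and the result can be passed to .hist() or sns.pairplot. That the prefixes group related metrics is our assumption, read off the names only.

```python
import pandas as pd

# Some of the column names from the pheno table header above
# (the elided middle columns are not reproduced).
columns = ["BMD", "MECHANICS_STIFFNESS", "CORT_DTO__C_TH", "CORT_DTO__C_TH_SD",
           "CORT_MOM__J", "CT_TH_RAD", "CT_TH_RAD_STD", "CANAL_VOLUME",
           "CANAL_COUNT", "CANAL_DENSITY", "FEMALE", "ID"]
pheno_cols = pd.DataFrame(columns=columns)

# Keep only the columns whose name contains "CORT_" (presumably cortical metrics).
cortical = pheno_cols.filter(like="CORT_")
# Or select by a regular expression anchored at the start of the name.
canal = pheno_cols.filter(regex=r"^CANAL_")

print(list(cortical.columns))
print(list(canal.columns))
```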
Explore the data and the correlations between the various metrics using seaborn's ‘pairplot’ function. Change the variables passed in vars to examine different combinations.
sns.pairplot(pheno, vars = ['BMD', 'CORT_DTO__C_TH', 'CORT_DTO__C_TH_SD'])
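The pairplot shows the correlations visually; DataFrame.corr gives them as numbers. The sketch uses only the five rows printed in the pheno table, so the coefficients themselves carry no meaning; on the full table the equivalent is pheno[['BMD', 'CORT_DTO__C_TH', 'CORT_DTO__C_TH_SD']].corr().

```python
import pandas as pd

# BMD and cortical thickness metrics of the five rows shown in the pheno table.
sub = pd.DataFrame({
    "BMD": [0.030221, 0.032788, 0.036075, 0.031145, 0.034226],
    "CORT_DTO__C_TH": [0.186455, 0.183007, 0.216930, 0.193758, 0.175598],
    "CORT_DTO__C_TH_SD": [0.019785, 0.015696, 0.028019, 0.024087, 0.018144],
})

# Pearson (the default) measures linear association; Spearman works on ranks,
# so it is less sensitive to outliers and captures any monotonic trend.
pearson = sub.corr()
spearman = sub.corr(method="spearman")
print(pearson.round(2))
print(spearman.round(2))
```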
For this example we will compare two real cortical bone samples taken from mice. The data will be downloaded in KNIME from the course website (KNIME can also download from and upload to FTP servers, which makes sharing results and data easier).
- If you are using your own computer, you will need to change the Target Folder in both of the “Download” nodes to a sensible location (just click Browse).
To keep the data sizes small for the analysis, we will again use Kevin’s Crazy Camera to simulate the noisy detection process. This assignment is meant to be more integrative: you will combine material from a number of different lectures to get to the final answer.
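Kevin’s Crazy Camera itself is a KNIME component and is not reproduced here. As a hypothetical stand-in, the sketch below models one acquisition as additive Gaussian noise on an ideal binary image followed by a fixed threshold, and counts the foreground pixels; the disk shape, noise level and threshold are all assumptions. Repeating the acquisition shows how the measured volume scatters from one measurement to the next.

```python
import numpy as np


def make_disk(shape=(64, 64), radius=20):
    """Hypothetical ground truth: a filled disk standing in for a bone cross-section."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def noisy_volume(truth, rng, noise_sd=0.4, threshold=0.5):
    """One simulated acquisition: add Gaussian noise to the ideal 0/1 signal,
    threshold it and count the foreground pixels."""
    signal = truth.astype(float) + rng.normal(0.0, noise_sd, size=truth.shape)
    return int((signal > threshold).sum())


rng = np.random.default_rng(seed=0)
truth = make_disk()
volumes = np.array([noisy_volume(truth, rng) for _ in range(20)])
print("true volume:", int(truth.sum()))
print("measured: mean %.1f, sd %.1f" % (volumes.mean(), volumes.std(ddof=1)))
```

Because the background is larger than the disk, noise-induced false positives outnumber false negatives at this threshold, so the mean measured volume comes out above the true one; the spread across repeats is the measurement noise.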
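library(ggplot2)
# knime.in is the input table that the KNIME R node passes to the script
cur.df <- data.frame(
  sample = as.factor(knime.in$"Image Number"),  # which bone sample
  measurement = knime.in$"Measurement_Number",  # index of the repeated measurement
  volume = knime.in$"Num Pix"                   # volume as a number of pixels
)
# One density curve per (sample, measurement) pair, coloured by sample;
# interaction() keeps curves apart even if measurement numbers restart per sample
ggplot(cur.df, aes(x = volume)) +
  geom_density(aes(color = sample, group = interaction(sample, measurement))) +
  theme_bw(25)  # base font size 25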
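The density plot can be complemented by a numeric summary. The sketch below assumes, as the per-measurement grouping in the R code suggests, that each row of the KNIME table is one detected object with its size in Num Pix; the values are made up purely to show the layout, using the same column names. It computes the object count and mean volume per (sample, measurement) pair, then how much the count varies between repeated measurements of each sample.

```python
import pandas as pd

# Made-up example in the layout of the KNIME table: one row per detected
# object (assumed), with the sample, the measurement repeat and its size.
objects = pd.DataFrame({
    "Image Number":       [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Measurement_Number": [1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
    "Num Pix":            [10, 12, 14, 11, 13, 20, 22, 24, 26, 21, 25],
})

# Object count and mean object volume for every (sample, measurement) pair,
# i.e. one row per density curve in the plot.
per_measurement = (objects
                   .groupby(["Image Number", "Measurement_Number"])["Num Pix"]
                   .agg(["count", "mean"]))

# Spread of the object count across repeated measurements of each sample.
count_spread = per_measurement.groupby(level="Image Number")["count"].agg(["mean", "std"])
print(per_measurement)
print(count_spread)
```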