Integrating transcriptome profile with KEGG pathway

by Kozo Nishida (Riken, Japan)

This example demonstrates how to integrate transcriptome data (preprocessed with Bioconductor packages) with a KEGG pathway and view the result in Cytoscape.

Software Requirements

For data pre-processing

Input and Output

Input - Bioconductor ecoliLeucine package
Output - Cytoscape session file containing KEGG pathway with differentially expressed genes

Importing a KEGG Pathway

Glycine, serine and threonine metabolism - Escherichia coli K-12 MG1655

In [1]:

import requests
import json

# Basic Setup
PORT_NUMBER = 1234
BASE_URL = "http://localhost:" + str(PORT_NUMBER) + "/v1/"

# Header for posting data to the server as JSON
HEADERS = {'Content-Type': 'application/json'}

# Delete all networks in current session
requests.delete(BASE_URL + 'session')

Out[1]:

<Response [200]>

In [2]:

pathway_location = "http://rest.kegg.jp/get/eco00260/kgml"
res1 = requests.post(BASE_URL + "networks?source=url", data=json.dumps([pathway_location]), headers=HEADERS)
result = json.loads(res1.content)
pathway_suid = result[0]["networkSUID"][0]
print("Pathway SUID = " + str(pathway_suid))

Pathway SUID = 70708
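
The cell above reads the new network's SUID from the first element of the JSON response, under `networkSUID`. The following is a minimal sketch of a more defensive version of that lookup. It runs on a hard-coded sample string rather than a live Cytoscape response. The helper name `first_network_suid` is hypothetical, and the sample only mirrors the fields used above.

```python
import json


def first_network_suid(response_text):
    """Return the first SUID from a 'networks?source=url' response body.

    Assumes the shape used above: a list of objects, each with a
    'networkSUID' list. Raises ValueError if that shape is not found.
    """
    result = json.loads(response_text)
    try:
        return result[0]["networkSUID"][0]
    except (IndexError, KeyError, TypeError) as err:
        raise ValueError("unexpected response: " + response_text) from err


# Hard-coded sample standing in for res1.text (not real server output)
sample = '[{"networkSUID": [70708]}]'
print("Pathway SUID = " + str(first_network_suid(sample)))
```

With a live session, the same helper could be called on `res1.text`, so that an import failure surfaces as a clear error instead of an IndexError or KeyError further down.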

Pre-processing transcriptome data and testing for differentially expressed genes with Bioconductor

You need to run the following code in R

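# On current R/Bioconductor releases, biocLite is replaced by
# BiocManager::install(c("affy", "genefilter", "ecoliLeucine")).
source("http://bioconductor.org/biocLite.R")
biocLite(c("affy", "genefilter", "ecoliLeucine"))
library("affy")          # provides rma()
library("ecoliLeucine")
library("genefilter")    # provides rowttests()
data("ecoliLeucine")
eset = rma(ecoliLeucine)                 # RMA normalization
r = rowttests(eset, eset$strain)         # per-probe t-test between strains
filtered = r[r$p.value < 0.05,]          # keep probes with p < 0.05
write.csv(filtered, file="ttest.csv")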

Loading ttest.csv as a pandas DataFrame

In [3]:

import pandas as pd

ttest_df = pd.read_csv('ttest.csv')
ttest_df.head()

Out[3]:

	Unnamed: 0	statistic	dm	p.value
0	IG_1070_1689385_1697378_fwd_f_st	2.459792	0.082383	0.049133
1	IG_10_10495_10642_rev_st	-3.009316	-0.046399	0.023721
2	IG_1110_1744617_1744723_fwd_st	-2.515037	-0.169626	0.045592
3	IG_1145_1805715_1805819_fwd_st	3.556263	0.368773	0.011981
4	IG_1189_1874879_1874911_fwd_st	-2.875842	-0.276748	0.028211

Getting the node table from Cytoscape and merging it with ttest.csv

In [4]:

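# Download the default node table as TSV, reusing BASE_URL from the setup cell
deftable = requests.get(BASE_URL + 'networks/' + str(pathway_suid) + '/tables/defaultnode.tsv')
with open('defaultnode.tsv', 'w') as handle:
    handle.write(deftable.text)  # .text is str, so this also works on Python 3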

deftable_df = pd.read_table('defaultnode.tsv')
deftable_df.head()

Out[4]:

	SUID	shared name	name	selected	KEGG_NODE_X	KEGG_NODE_Y	KEGG_NODE_WIDTH	KEGG_NODE_HEIGHT	KEGG_NODE_LABEL	KEGG_NODE_LABEL_LIST_FIRST	KEGG_NODE_LABEL_LIST	KEGG_ID	KEGG_NODE_LABEL_COLOR	KEGG_NODE_FILL_COLOR	KEGG_NODE_REACTIONID	KEGG_NODE_TYPE	KEGG_NODE_SHAPE	KEGG_LINK
0	70718	path:eco00260:46	path:eco00260:46	False	162	547	46	17	K17755	K17755	K17755	ko:K17755	#000000	#FFFFFF	rn:R08211	ortholog	rectangle	http://www.kegg.jp/dbget-bin/www_bget?K17755
1	70719	path:eco00260:47	path:eco00260:47	False	688	222	46	17	K12235	K12235	K12235	ko:K12235	#000000	#FFFFFF	rn:R00589	ortholog	rectangle	http://www.kegg.jp/dbget-bin/www_bget?K12235
2	70720	path:eco00260:48	path:eco00260:48	False	1079	930	8	8	C16432	5-Hydroxyectoine	C16432	cpd:C16432	#000000	#FFFFFF	NaN	compound	circle	http://www.kegg.jp/dbget-bin/www_bget?C16432
3	70721	path:eco00260:49	path:eco00260:49	False	1023	930	46	17	K10674	K10674	K10674	ko:K10674	#000000	#FFFFFF	rn:R08050	ortholog	rectangle	http://www.kegg.jp/dbget-bin/www_bget?K10674
4	70722	path:eco00260:50	path:eco00260:50	False	99	464	46	17	K00499	K00499	K00499	ko:K00499	#000000	#FFFFFF	rn:R07409	ortholog	rectangle	http://www.kegg.jp/dbget-bin/www_bget?K00499

In [5]:

import re
bnum_re = re.compile('b[0-9]{4}')  # E. coli b-number: 'b' followed by four digits

keggids = []
keggnode_labels = []
for probe in ttest_df['Unnamed: 0']:
    keggid_hit, label_hit = None, None
    m = bnum_re.search(probe)
    if m:
        # Keep the last pathway node whose KEGG_ID contains this b-number
        for i, keggid in deftable_df['KEGG_ID'].items():
            if m.group(0) in str(keggid):
                keggid_hit = keggid
                label_hit = deftable_df['KEGG_NODE_LABEL'][i]
    # Probes without a b-number, or without a pathway match, stay None
    keggids.append(keggid_hit)
    keggnode_labels.append(label_hit)

s1 = pd.Series(keggids, name='KEGG_ID_INPATHWAY')
s2 = pd.Series(keggnode_labels, name='KEGG_NODE_LABEL_INPATHWAY')

merged_df = pd.concat([ttest_df, s1, s2], axis=1)
merged_df.head()

Out[5]:

	Unnamed: 0	statistic	dm	p.value	KEGG_ID_INPATHWAY	KEGG_NODE_LABEL_INPATHWAY
0	IG_1070_1689385_1697378_fwd_f_st	2.459792	0.082383	0.049133	None	None
1	IG_10_10495_10642_rev_st	-3.009316	-0.046399	0.023721	None	None
2	IG_1110_1744617_1744723_fwd_st	-2.515037	-0.169626	0.045592	None	None
3	IG_1145_1805715_1805819_fwd_st	3.556263	0.368773	0.011981	None	None
4	IG_1189_1874879_1874911_fwd_st	-2.875842	-0.276748	0.028211	None	None
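
The five rows shown above all have `None` in the new columns because their probe IDs contain no b-number for the pattern to match. The following small sketch shows how the pattern behaves on its own. The helper `extract_bnumber` is hypothetical, and the second probe ID is a made-up example, not a value taken from ttest.csv.

```python
import re

# Same pattern as above: an E. coli b-number is 'b' followed by four digits
bnum_re = re.compile('b[0-9]{4}')


def extract_bnumber(probe_id):
    """Return the first b-number found in a probe ID, or None."""
    m = bnum_re.search(probe_id)
    return m.group(0) if m else None


print(extract_bnumber('IG_1070_1689385_1697378_fwd_f_st'))  # None
print(extract_bnumber('example_b0002_st'))  # b0002 (hypothetical probe ID)
```

Because the pattern is unanchored, it would also match the first four digits of a longer run such as `b12345`.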

In [6]:

ttestjson = json.loads(merged_df.to_json(orient="records"))

new_table_data = {
    "key": "KEGG_NODE_LABEL",
    "dataKey": "KEGG_NODE_LABEL_INPATHWAY",
    "data" : ttestjson
}

update_table_url =  BASE_URL + "networks/" + str(pathway_suid) + "/tables/defaultnode"

print(update_table_url)

requests.put(update_table_url, data=json.dumps(new_table_data), headers=HEADERS)

http://localhost:1234/v1/networks/70708/tables/defaultnode

Out[6]:

<Response [200]>

You can now see the t-test results in the default node table in Cytoscape!
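
Rows in `merged_df` without a pathway match have a null `KEGG_NODE_LABEL_INPATHWAY` and so carry no value to join on. The following sketch, assuming it is acceptable to send only the matched rows, filters the records before building the payload. The helper `build_table_payload` and the second sample record are illustrative and not part of the original workflow.

```python
import json


def build_table_payload(records, key="KEGG_NODE_LABEL",
                        data_key="KEGG_NODE_LABEL_INPATHWAY"):
    """Build the table-update body, keeping only rows that have a join value."""
    matched = [r for r in records if r.get(data_key) is not None]
    return {"key": key, "dataKey": data_key, "data": matched}


# Sample records in the shape produced by merged_df.to_json(orient="records");
# the second record is made up for illustration.
records = [
    {"Unnamed: 0": "IG_10_10495_10642_rev_st", "statistic": -3.009316,
     "p.value": 0.023721, "KEGG_NODE_LABEL_INPATHWAY": None},
    {"Unnamed: 0": "example_b0002_st", "statistic": 2.5,
     "p.value": 0.04, "KEGG_NODE_LABEL_INPATHWAY": "thrA"},
]

payload = build_table_payload(records)
print(json.dumps(payload, indent=2))
```

The resulting dict could be passed to `requests.put` in the same way as `new_table_data` above.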

Discussion

This workflow integrates the t-test results into the pathway's node table, but the visualization step is not yet fully automated. Automating it remains a TODO item.