Notebook

Performing differential expression analysis on PTM data¶

Here, we will take phospho data processed with Spectronaut and

perform phosphosite inference and perform differential expression analysis on the phospho sites
take proteome data and normalize the differential expression results

Defining input files¶

As with the standard differential expression analysis, we need:

an input file from a proteomics search engine. We currently only support Spectronaut, DIA-NN will come soon
a sample mapping file that maps each sample to a condition (e.g. sample 'brain_replicate_1' is mapped to condition 'brain'). In the GUI, there is some functionality to help create such a file
(optional) a results directory can be defined on where to save the data
(optional) a list where we specify, which conditions we compare

Simple specifications on how to export the Spectronaut file for PTM analysis can be found in the README.

In [ ]:

PHOSPHO_FILE = "./data/phospho/phospho_subset.tsv"
SAMPLEMAP_PHOSPHO = "./data/phospho/samplemap_phospho.tsv"
RESULTS_DIR_PHOSPHO = "./data/phospho/results_phospho"

PROTEOME_FILE = "./data/phospho/proteome_subset.tsv"
SAMPLEMAP_PROTEOME = "./data/phospho/samplemap_proteome.tsv"
RESULTS_DIR_PROTEOME = "./data/phospho/results_proteome"

CONDPAIRS_LIST = [("egf_treated", "untreated")] #this means each fc is egf_treated - untreated

Let's quickly check what the phospho tables look like:

In [ ]:

import pandas as pd

phospho_df = pd.read_csv(PHOSPHO_FILE, sep="\t")
samplemap_phospho_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep="\t")
display(phospho_df.head())
#check the ptm columns
display([x for x in phospho_df.columns if "EG.PTM" in x])
#show the samplemap
display(samplemap_phospho_df.head())

importantly, here are site probability columns for different types of variable modifications including phospho

Running AlphaQuant on phospho¶

Calling AlphaQuant on ptm data, we additionally have to specify the modification we are interested in. In our case it is [Phospho (STY)] as listed in the headers above. And we have to set perform_ptm_mapping=True to perform the phosphosite inference.

In [ ]:

import alphaquant.run_pipeline as aq_pipeline

aq_pipeline.run_pipeline(input_file=PHOSPHO_FILE, samplemap_file=SAMPLEMAP_PHOSPHO, results_dir=RESULTS_DIR_PHOSPHO,
                        condpairs_list=CONDPAIRS_LIST, perform_ptm_mapping=True,modification_type="[Phospho (STY)]",organism="human", valid_values_filter_mode="both") #counting statistics together with PTM mapping is currently an experimental feature, so we set valid_values_filter_mode to "both"

Inspecting and visualizing phospho results¶

Let's check out the results table located in the results directory:

Volcano plot¶

In [ ]:

import pandas as pd
import alphaquant.plotting.pairwise as aq_plotting_pairwise

results_file_phospho = RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.results.tsv"
df_phospho = pd.read_csv(results_file_phospho, sep="\t")
display(df_phospho.head())

aq_plotting_pairwise.volcano_plot(df_phospho)

Normalization check¶

Here we see the distribution of each sample against the median across all samples. These distributions should be centered around 0. Samples from the same condition should have similar distributions.

In [ ]:

normalized_df = pd.read_csv(RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.normed.tsv", sep='\t')
samplemap_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep='\t')
aq_plotting_pairwise.plot_normalization_overview(normalized_df, samplemap_df)

Run AlphaQuant on proteome data¶

In [ ]:

import alphaquant.run_pipeline as aq_pipeline

aq_pipeline.run_pipeline(input_file=PROTEOME_FILE, samplemap_file=SAMPLEMAP_PROTEOME,
                         results_dir=RESULTS_DIR_PROTEOME, condpairs_list=CONDPAIRS_LIST)

Volcano plot¶

In [ ]:

results_file_proteome = RESULTS_DIR_PROTEOME + "/egf_treated_VS_untreated.results.tsv"

In [ ]:

import alphaquant.plotting.pairwise as aq_plotting_pairwise

df_proteome = pd.read_csv(results_file_proteome, sep="\t")
aq_plotting_pairwise.volcano_plot(df_proteome)

There is very little regulation on the protein level.

Normalization plot¶

In [ ]:

normalized_df = pd.read_csv(RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.normed.tsv", sep='\t')
samplemap_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep='\t')
aq_plotting_pairwise.plot_normalization_overview(normalized_df, samplemap_df)

Combining phospho and proteome data by proteome-normalization¶

The following command writes out the proteome normed files into a new results directory with the ending "_protnormed"

In [ ]:

import alphaquant.ptm.protein_ptm_normalization as aq_ptm_normalization

aq_ptm_normalization.PTMResultsNormalizer(results_dir_ptm=RESULTS_DIR_PHOSPHO,
                                          results_dir_proteome=RESULTS_DIR_PROTEOME, organism="human")

Volcano plot¶

In [ ]:

import alphaquant.plotting.pairwise as aq_plotting_pairwise
results_file_protnormed = f"{RESULTS_DIR_PHOSPHO}_protnormed/egf_treated_VS_untreated.results.tsv"
df_protnormed = pd.read_csv(results_file_protnormed, sep="\t")
aq_plotting_pairwise.volcano_plot(df_protnormed)

As expected, there is little qualitative change to this plot, because the protein regulation is low. Let's investigate this a bit futher:

In [ ]:

import matplotlib.pyplot as plt
merged_df = df_phospho.merge(df_protnormed, on="protein", how="inner", suffixes=("_phospho", "_proteome"))
plt.scatter(merged_df["log2fc_phospho"], merged_df["log2fc_proteome"])
plt.xlabel("log2fc_phospho")
plt.ylabel("log2fc_proteome")

Indeed we see few changes per phospo site, but some difference due to some protein fold changes, as expected.