Here, we will take phospho data processed with Spectronaut and
As with the standard differential expression analysis, we need:
Simple specifications on how to export the Spectronaut file for PTM analysis can be found in the README.
PHOSPHO_FILE = "./data/phospho/phospho_subset.tsv"
SAMPLEMAP_PHOSPHO = "./data/phospho/samplemap_phospho.tsv"
RESULTS_DIR_PHOSPHO = "./data/phospho/results_phospho"
PROTEOME_FILE = "./data/phospho/proteome_subset.tsv"
SAMPLEMAP_PROTEOME = "./data/phospho/samplemap_proteome.tsv"
RESULTS_DIR_PROTEOME = "./data/phospho/results_proteome"
CONDPAIRS_LIST = [("egf_treated", "untreated")] #this means each fc is egf_treated - untreated
Let's quickly check what the phospho tables look like:
import pandas as pd
phospho_df = pd.read_csv(PHOSPHO_FILE, sep="\t")
samplemap_phospho_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep="\t")
display(phospho_df.head())
#check the ptm columns
display([x for x in phospho_df.columns if "EG.PTM" in x])
#show the samplemap
display(samplemap_phospho_df.head())
importantly, here are site probability columns for different types of variable modifications including phospho
Calling AlphaQuant on ptm data, we additionally have to specify the modification we are interested in. In our case it is [Phospho (STY)]
as listed in the headers above. And we have to set perform_ptm_mapping=True
to perform the phosphosite inference.
import alphaquant.run_pipeline as aq_pipeline
aq_pipeline.run_pipeline(input_file=PHOSPHO_FILE, samplemap_file=SAMPLEMAP_PHOSPHO, results_dir=RESULTS_DIR_PHOSPHO,
condpairs_list=CONDPAIRS_LIST, perform_ptm_mapping=True,modification_type="[Phospho (STY)]",organism="human", valid_values_filter_mode="both") #counting statistics together with PTM mapping is currently an experimental feature, so we set valid_values_filter_mode to "both"
Let's check out the results table located in the results directory:
import pandas as pd
import alphaquant.plotting.pairwise as aq_plotting_pairwise
results_file_phospho = RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.results.tsv"
df_phospho = pd.read_csv(results_file_phospho, sep="\t")
display(df_phospho.head())
aq_plotting_pairwise.volcano_plot(df_phospho)
Here we see the distribution of each sample against the median across all samples. These distributions should be centered around 0. Samples from the same condition should have similar distributions.
normalized_df = pd.read_csv(RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.normed.tsv", sep='\t')
samplemap_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep='\t')
aq_plotting_pairwise.plot_normalization_overview(normalized_df, samplemap_df)
import alphaquant.run_pipeline as aq_pipeline
aq_pipeline.run_pipeline(input_file=PROTEOME_FILE, samplemap_file=SAMPLEMAP_PROTEOME,
results_dir=RESULTS_DIR_PROTEOME, condpairs_list=CONDPAIRS_LIST)
results_file_proteome = RESULTS_DIR_PROTEOME + "/egf_treated_VS_untreated.results.tsv"
import alphaquant.plotting.pairwise as aq_plotting_pairwise
df_proteome = pd.read_csv(results_file_proteome, sep="\t")
aq_plotting_pairwise.volcano_plot(df_proteome)
There is very little regulation on the protein level.
normalized_df = pd.read_csv(RESULTS_DIR_PHOSPHO + "/egf_treated_VS_untreated.normed.tsv", sep='\t')
samplemap_df = pd.read_csv(SAMPLEMAP_PHOSPHO, sep='\t')
aq_plotting_pairwise.plot_normalization_overview(normalized_df, samplemap_df)
The following command writes out the proteome normed files into a new results directory with the ending "_protnormed"
import alphaquant.ptm.protein_ptm_normalization as aq_ptm_normalization
aq_ptm_normalization.PTMResultsNormalizer(results_dir_ptm=RESULTS_DIR_PHOSPHO,
results_dir_proteome=RESULTS_DIR_PROTEOME, organism="human")
import alphaquant.plotting.pairwise as aq_plotting_pairwise
results_file_protnormed = f"{RESULTS_DIR_PHOSPHO}_protnormed/egf_treated_VS_untreated.results.tsv"
df_protnormed = pd.read_csv(results_file_protnormed, sep="\t")
aq_plotting_pairwise.volcano_plot(df_protnormed)
As expected, there is little qualitative change to this plot, because the protein regulation is low. Let's investigate this a bit futher:
import matplotlib.pyplot as plt
merged_df = df_phospho.merge(df_protnormed, on="protein", how="inner", suffixes=("_phospho", "_proteome"))
plt.scatter(merged_df["log2fc_phospho"], merged_df["log2fc_proteome"])
plt.xlabel("log2fc_phospho")
plt.ylabel("log2fc_proteome")
Indeed we see few changes per phospo site, but some difference due to some protein fold changes, as expected.