This is the tutorial about using how to use R package singscore
. We use the built-in data for this tutorial.
.
R package, singscore
, needs to be installed and loaded in the R session, this can be done easily with the following chunk of code:
install.packages("BiocManager")
BiocManager::install("singscore")
BiocManager::install("GSEABase")
suppressPackageStartupMessages(library(singscore))
suppressPackageStartupMessages(library(GSEABase))
The datasets used in this vignette have been built within the package. The tgfb_expr_10_se
dataset was obtained from (Foroutan et al. 2017) and it is a ten-sample subset of the original dataset. We are going to score the integrated TGFb-treated gene expression dataset (4 cases and 6 controls) against a TGFb gene signature with an up-regulated and down-regulated gene-set pair (Foroutan et al. 2017).
# The example expression dataset and gene signatures are included in the package
# distribution, one can directly access them using the variable names
# To see the description of 'tgfb_expr_10_se','tgfb_gs_up','tgfb_gs_dn', look at
# Have a look at the object `tgfb_expr_10_se` containing gene expression data for 10 samples
tgfb_expr_10_se
# View what tgfb_gs_up/dn contains
tgfb_gs_up
tgfb_gs_dn
# Get the size of the gene sets
length(GSEABase::geneIds(tgfb_gs_up))
length(GSEABase::geneIds(tgfb_gs_dn))
class: SummarizedExperiment dim: 11900 10 metadata(0): assays(1): counts rownames(11900): 2 9 ... 729164 752014 rowData names(0): colnames(10): D_Ctrl_R1 D_TGFb_R1 ... Hil_Ctrl_R1 Hil_Ctrl_R2 colData names(1): Treatment
setName: NA geneIds: 19, 87, ..., 402055 (total: 193) geneIdType: Null collectionType: Null details: use 'details(object)'
setName: NA geneIds: 136, 220, ..., 161291 (total: 108) geneIdType: Null collectionType: Null details: use 'details(object)'
To get the sample scores, rank the expression data via rankGenes()
at first. And then use simpleScore()
to compute.
# The recommended method for dealing with ties in ranking is 'min',
# you can change by specifying 'tiesMethod' parameter for rankGenes function.
rankData = rankGenes(tgfb_expr_10_se)
head(rankData)
# Given the ranked data and gene signature, simpleScore returns the scores and
# dispersions for each sample
scoredf = simpleScore(rankData, upSet = tgfb_gs_up, downSet = tgfb_gs_dn)
head(scoredf)
# To view more details of the simpleScore, use ?simpleScore
# Note that, when only one gene set is available in a gene signature, one can
# only input values for the upSet argument. In addition, a knownDirection
# argument can be set to FALSE if the direction of the gene set is unknown.
# simpleScore(rankData, upSet = tgfb_gs_up, knownDirection = FALSE)
D_Ctrl_R1 | D_TGFb_R1 | D_Ctrl_R2 | D_TGFb_R2 | Hes_Ctrl_R1 | Hes_TGFb_R1 | Hes_Ctrl_R2 | Hes_TGFb_R2 | Hil_Ctrl_R1 | Hil_Ctrl_R2 | |
---|---|---|---|---|---|---|---|---|---|---|
2 | 1065 | 1255 | 1428 | 1269 | 1252 | 1570 | 1188 | 1122 | 1055 | 1227 |
9 | 6688 | 7611 | 6454 | 6975 | 6244 | 7330 | 7110 | 7052 | 6247 | 7315 |
10 | 1741 | 1599 | 1541 | 1686 | 1777 | 1638 | 1713 | 1418 | 1558 | 1849 |
12 | 5441 | 3682 | 6495 | 5105 | 3906 | 5533 | 4344 | 7036 | 4845 | 4830 |
13 | 3352 | 3599 | 3276 | 3459 | 3337 | 3042 | 3288 | 3906 | 3480 | 3062 |
14 | 10442 | 10013 | 10146 | 9917 | 10382 | 10085 | 10061 | 10029 | 9731 | 10268 |
TotalScore | TotalDispersion | UpScore | UpDispersion | DownScore | DownDispersion | |
---|---|---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
D_Ctrl_R1 | -0.088097993 | 2867.348 | 0.06096415 | 3119.390 | -0.14906214 | 2615.306 |
D_TGFb_R1 | 0.286994210 | 2217.970 | 0.24931565 | 2352.886 | 0.03767856 | 2083.053 |
D_Ctrl_R2 | -0.098964086 | 2861.418 | 0.06841242 | 3129.769 | -0.16737650 | 2593.067 |
D_TGFb_R2 | 0.270721958 | 2378.832 | 0.25035661 | 2470.012 | 0.02036534 | 2287.652 |
Hes_Ctrl_R1 | -0.002084788 | 2746.146 | 0.08046490 | 3134.216 | -0.08254969 | 2358.075 |
Hes_TGFb_R1 | 0.176122839 | 2597.515 | 0.22894035 | 2416.638 | -0.05281751 | 2778.392 |
plotRankDensity()
plots the ranks of genes in the gene-sets for a specific sample.
# You can provide the upSet alone when working with unpaired gene-sets
# We plot the second sample in rankData, view it by
head(rankData[, 2, drop = FALSE])
head(rankData[2, , drop = FALSE])
plotRankDensity(rankData[,2,drop = FALSE], upSet = tgfb_gs_up,
downSet = tgfb_gs_dn, isInteractive = FALSE)
## Setting `isInteractive = TRUE` generates an interactive plot using the `plotly` package.
## Hovering over the bars in the interactive plot allows you to get information
## such as the normalised rank (between 0 and 1) and ID of the gene represented by the bar.
## For the rest of the plotting functions, the isInteractive = TRUE argument has the same behavior.
D_TGFb_R1 | |
---|---|
2 | 1255 |
9 | 7611 |
10 | 1599 |
12 | 3682 |
13 | 3599 |
14 | 10013 |
D_Ctrl_R1 | D_TGFb_R1 | D_Ctrl_R2 | D_TGFb_R2 | Hes_Ctrl_R1 | Hes_TGFb_R1 | Hes_Ctrl_R2 | Hes_TGFb_R2 | Hil_Ctrl_R1 | Hil_Ctrl_R2 | |
---|---|---|---|---|---|---|---|---|---|---|
9 | 6688 | 7611 | 6454 | 6975 | 6244 | 7330 | 7110 | 7052 | 6247 | 7315 |
Warning message: "Ignoring unknown aesthetics: text"
plotDispersion()
generates the scatter plots of the score VS. dispersions
for the total scores, the up scores and the down score of samples.
# Get the annotations of samples by their sample names
tgfbAnnot = data.frame(SampleID = colnames(tgfb_expr_10_se),
Type = NA)
tgfbAnnot$Type[grepl("Ctrl", tgfbAnnot$SampleID)] = "Control"
tgfbAnnot$Type[grepl("TGFb", tgfbAnnot$SampleID)] = "TGFb"
# Sample annotations
tgfbAnnot$Type
plotDispersion(scoredf, annot = tgfbAnnot$Type, isInteractive = FALSE)
# To see an interactive version powered by 'plotly', simply set the
# 'isInteractive' = TRUE, i.e :
#
# plotDispersion(scoredf,annot = tgfbAnnot$Type,isInteractive = TRUE)
plotScoreLandscape()
plots the scores of the samples against two different gene signatures in a landscape for exploring their relationships.
plotScoreLandscape(scoredf_ccle_epi, scoredf_ccle_mes,
scorenames = c('ccle-EPI','ccle-MES'),hexMin = 10)
Similarly, pre-computed scores for the TCGA breast cancer RNA-seq dataset against epithelial and mesenchymal gene signatures are stored in scoredf_tcga_epi and scoredf_tcga_mes respectively (Tan et al. 2014). The utility of this function is enhanced when the number of samples is large.
tcgaLandscape = plotScoreLandscape(scoredf_tcga_epi, scoredf_tcga_mes,
scorenames = c('tcga_EPI','tcga_MES'), isInteractive = FALSE)
tcgaLandscape
You can also project new data points onto the landscape plot generated above by using the projectScoreLandscape function. For example, the plot below overlays 3 CCLE samples onto the TCGA epithelial-mesenchymal landscape. Points are labeled with their sample names by default
# Project on the above generated 'tcgaLandscape' plot
projectScoreLandscape(plotObj = tcgaLandscape, scoredf_ccle_epi,
scoredf_ccle_mes,
subSamples = rownames(scoredf_ccle_epi)[c(1,4,5)],
annot = rownames(scoredf_ccle_epi)[c(1,4,5)],
sampleLabels = NULL,
isInteractive = FALSE)
Warning message: "Ignoring unknown aesthetics: text"
# Custom labels can be provided by passing a character vector to the sampleLabels argument.
projectScoreLandscape(plotObj = tcgaLandscape, scoredf_ccle_epi, scoredf_ccle_mes,
subSamples = rownames(scoredf_ccle_epi)[c(1,4,5,8,9)],
sampleLabels = c('l1','l2','l3','l4','l5'),
annot = rownames(scoredf_ccle_epi)[c(1,4,5,8,9)],
isInteractive = FALSE)
Warning message: "Ignoring unknown aesthetics: text"
sessionInfo()
R version 4.0.0 (2020-04-24) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18363) Matrix products: default locale: [1] LC_COLLATE=Chinese (Simplified)_China.936 [2] LC_CTYPE=Chinese (Simplified)_China.936 [3] LC_MONETARY=Chinese (Simplified)_China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_China.936 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] singscore_1.8.0 loaded via a namespace (and not attached): [1] ggrepel_0.8.2 locfit_1.5-9.4 [3] Rcpp_1.0.4.6 lattice_0.20-41 [5] tidyr_1.1.0 assertthat_0.2.1 [7] digest_0.6.25 IRdisplay_0.7.0 [9] R6_2.4.1 GenomeInfoDb_1.24.0 [11] plyr_1.8.6 repr_1.1.0 [13] stats4_4.0.0 RSQLite_2.2.0 [15] evaluate_0.14 ggplot2_3.3.0 [17] pillar_1.4.4 zlibbioc_1.34.0 [19] rlang_0.4.6 uuid_0.1-4 [21] annotate_1.66.0 hexbin_1.28.1 [23] blob_1.2.1 S4Vectors_0.26.1 [25] Matrix_1.2-18 labeling_0.3 [27] stringr_1.4.0 RCurl_1.98-1.2 [29] bit_1.1-15.2 munsell_0.5.0 [31] DelayedArray_0.14.0 compiler_4.0.0 [33] pkgconfig_2.0.3 BiocGenerics_0.34.0 [35] base64enc_0.1-3 htmltools_0.4.0 [37] tidyselect_1.1.0 SummarizedExperiment_1.18.1 [39] tibble_3.0.1 GenomeInfoDbData_1.2.3 [41] edgeR_3.30.0 IRanges_2.22.2 [43] matrixStats_0.56.0 XML_3.99-0.3 [45] crayon_1.3.4 dplyr_0.8.5 [47] bitops_1.0-6 grid_4.0.0 [49] jsonlite_1.6.1 xtable_1.8-4 [51] GSEABase_1.50.0 gtable_0.3.0 [53] lifecycle_0.2.0 DBI_1.1.0 [55] magrittr_1.5 scales_1.1.1 [57] graph_1.66.0 stringi_1.4.6 [59] farver_2.0.3 XVector_0.28.0 [61] reshape2_1.4.4 limma_3.44.1 [63] ellipsis_0.3.1 vctrs_0.3.0 [65] IRkernel_1.1 RColorBrewer_1.1-2 [67] tools_4.0.0 bit64_0.9-7 [69] Biobase_2.48.0 glue_1.4.1 [71] purrr_0.3.4 parallel_4.0.0 [73] AnnotationDbi_1.50.0 colorspace_1.4-1 [75] GenomicRanges_1.40.0 memoise_1.1.0 [77] pbdZMQ_0.3-3