In this post, we will learn how to make scatter plots using R and the package ggplot2. This is the notebook for the scatter plot in R tutorial: https://www.marsja.se/how-to-make-a-scatter-plot-in-r-with-ggplot2/
You need to install the packages used in this tutorial before continuing.
You install packages with the install.packages()
function. Make sure to uncomment (remove the '#') if you actually need to install the packages!
# install.packages(c("tidyverse", "GGally", "carData"))
Here are the individual packages used in the tutorial, if you only want to install those:
# to.install <- c("magrittr", "purrr", "tidyr",
#                 "ggplot2", "dplyr", "broom", "GGally", "carData")
# install.packages(to.install)
Time to learn how to produce a scatter plot using the R statistical programming environment. We start by using the mtcars dataset.
require(ggplot2)
head(mtcars)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Data can also be stored in Excel files.
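As a rough sketch of reading such a file, the block below assumes the readxl package is installed and that a file named mtcars.xlsx sits in the working directory. Both the file name and the object name cars_xl are hypothetical and not taken from the tutorial.

```r
# Hypothetical file name; replace it with the path to your own Excel file.
excel_path <- "mtcars.xlsx"

# Only attempt the import when readxl is installed and the file exists.
if (requireNamespace("readxl", quietly = TRUE) && file.exists(excel_path)) {
  cars_xl <- readxl::read_excel(excel_path)  # returns a tibble
  print(head(cars_xl))
} else {
  message("readxl or ", excel_path, " not available; skipping the import.")
}
```

The guard keeps the snippet runnable even when the file is missing, which is convenient in a notebook.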
In this section we will learn how to make scattergraphs in R using ggplot2.
We will start by visualizing the variables wt (x-axis) and mpg (y-axis).
Before going on and creating the first scatter plot in R, we will briefly cover ggplot2 and the plot functions we are going to use. First, we use ggplot() to create a plot object.
Inside the ggplot() function, we call the aes() function, which describes how variables in our data are mapped to visual properties. In this simple scatter plot in R example, we only use the x and y arguments to put our variable wt on the x-axis and mpg on the y-axis. Adding geom_point() then draws the points.
require(ggplot2)
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars)
gp + geom_point()
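To make the idea of a plot object plus layers more concrete, here is a small sketch that inspects the object before and after adding geom_point(). The names p and p_points are illustrative only, and the sketch assumes the layers and mapping elements can be read with `$`, as in current ggplot2 releases.

```r
library(ggplot2)

# Base plot object: data plus aesthetic mappings, no layers yet.
p <- ggplot(mtcars, aes(x = wt, y = mpg))
length(p$layers)         # number of layers before adding a geom

# Adding geom_point() returns a new object that contains one layer.
p_points <- p + geom_point()
length(p_points$layers)

# The aesthetics mapped in aes() are stored in the mapping element.
names(p_points$mapping)
```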
Here we'll change the size of the markers using the size argument.
gp + geom_point(size = 4)
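Note that we did not use aes() here: the size argument was passed directly to the geom_point() function, so all markers get the same size. If we put size inside aes() instead, the marker size is mapped to a variable (here wt):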
gp + geom_point(aes(size = wt))
To change the x-axis we use the function scale_x_continuous, and to change the y-axis we use the function scale_y_continuous. Both take a limits argument, a numeric vector of length two giving the minimum and maximum of the axis.
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars) +
geom_point()
gp + scale_y_continuous(limits=c(1, 40)) +
scale_x_continuous(limits=c(0, 6))
Next, we also change the tick positions by adding the breaks argument to the functions above. We use the seq function to create the numeric vector of break positions.
gp + scale_y_continuous(limits=c(1, 35),
breaks=seq(1, 35, 5)) +
scale_x_continuous(limits=c(1.5, 5.5),
breaks=seq(1.5, 5.5, 1))
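To see what the breaks argument actually receives, the vectors can be printed on their own. Because points outside the limits are dropped from the plot (with a warning), it can also help to compare the limits with the range of the data. This is a small base R sketch; the names y_breaks, x_breaks, and outside are illustrative.

```r
# Break positions produced by seq(from, to, by)
y_breaks <- seq(1, 35, 5)
x_breaks <- seq(1.5, 5.5, 1)
y_breaks  # 1 6 11 16 21 26 31
x_breaks  # 1.5 2.5 3.5 4.5 5.5

# Compare the chosen limits with the observed range of each variable.
range(mtcars$mpg)
range(mtcars$wt)

# Count the cars that would fall outside the limits used above.
outside <- mtcars$mpg < 1 | mtcars$mpg > 35 |
  mtcars$wt < 1.5 | mtcars$wt > 5.5
sum(outside)  # 0, so no points are removed
```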
Here we group by using the color argument and the factor function to convert the variable vs to a factor.
gp <- ggplot(aes(x=wt, y=mpg, color=factor(vs)),
data=mtcars)
gp + geom_point()
Another option is to use the as.factor function to change vs to a factor in the dataframe object itself.
mtcars$vs <- as.factor(mtcars$vs)
gp <-ggplot(aes(x=wt, y=mpg, color=vs),
data=mtcars)
gp + geom_point()
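If the legend should show readable labels rather than 0 and 1, the levels can be labelled when the factor is created. The sketch below relies on the coding documented for mtcars (vs: 0 = V-shaped engine, 1 = straight engine). The copy named cars and the new column engine are illustrative names, not part of the original tutorial.

```r
library(ggplot2)

# Work on a fresh copy so the built-in dataset stays untouched.
cars <- datasets::mtcars

# Create a labelled factor from the 0/1 coded vs variable.
cars$engine <- factor(cars$vs, levels = c(0, 1),
                      labels = c("V-shaped", "Straight"))

# The legend now shows the labels instead of 0 and 1.
ggplot(cars, aes(x = wt, y = mpg, color = engine)) +
  geom_point()
```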
Here we add the aes() function inside the geom_point() function. In the aes() function we set the color and shape arguments to the class column (the categorical variable).
data(Burt, package = 'carData')
Burt$class <- as.factor(Burt$class)
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class,
shape = class))
We use the geom_smooth()
function and the method “lm” to add a regression line.
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class,
shape = class)) +
geom_smooth(method = "lm", se = FALSE)
In the next scatter plot example, we also add a regression line for each level of the factor (category). As before, we add the color and shape arguments to the geom_point() function, and we now also map color to class inside geom_smooth():
gp + geom_point(aes(color = class,
shape = class)) +
geom_smooth(aes(color = class), method = "lm", se = FALSE)
Here we overlay the contours of a 2D (bivariate) density estimate on the scatter plot in R using the geom_density2d() function.
gp <- ggplot(aes(x=wt, y=mpg),
data=mtcars)
gp + geom_point() + geom_density2d()
Let's carry out a correlation analysis in R, extract the r- and p-values, and then learn how to add them as text to our scatter plot.
require(magrittr)  # provides the %$% exposition pipe used below
require(dplyr)
require(broom)
corr <- mtcars %$%
  cor.test(mpg, wt) %>%
  tidy %>%
  mutate_if(is.numeric, round, 4)
corr
estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
-0.8677 | -9.559 | 0 | 30 | -0.9338 | -0.7441 | Pearson's product-moment correlation | two.sided |
# Build the label; report "p < 0.05" when the test is significant at that level
text <- paste0('r = ', corr$estimate, ', ',
               ifelse(corr$p.value < 0.05,
                      'p < 0.05',
                      paste0('p = ', corr$p.value))
)
text
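The same kind of label can also be built with base R only, which may be handy when broom and magrittr are not available. This is a minimal sketch, not the approach used in the tutorial. The names ct, r_txt, p_txt, and label_txt are illustrative, and the 0.001 reporting threshold is an assumption chosen for the example.

```r
# cor.test() returns a list that contains the estimate and the p-value.
ct <- cor.test(mtcars$mpg, mtcars$wt)

# Format r with four decimals, as in the table above.
r_txt <- sprintf("r = %.4f", ct$estimate)

# Report very small p-values as an inequality instead of a rounded 0.
p_txt <- if (ct$p.value < 0.001) {
  "p < 0.001"
} else {
  sprintf("p = %.3f", ct$p.value)
}

label_txt <- paste(r_txt, p_txt, sep = ", ")
label_txt  # "r = -0.8677, p < 0.001"
```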
We add text using the annotate function.
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars)
gp + geom_point() + geom_smooth(method = "lm", se = FALSE) +
annotate('text', x = 4.5, y = 35, label=text)
require(tidyr)
require(purrr)
data(Burt, package = 'carData')
corr <- Burt %>% group_by(class) %>%
nest() %>%
mutate(Cor = map(data, ~ cor.test(.$IQbio, .$IQfoster)),
p = map_dbl(Cor, 'p.value'),
est = map_dbl(Cor, 'estimate')
) %>%
mutate_if(is.numeric, round, 4) %>%
select(class, p, est, Cor)
text <- corr %>%
  mutate(
    # Report "p < 0.05" when the test is significant at that level
    text = paste0('r = ', est, ', ',
                  ifelse(p < 0.05,
                         'p < 0.05',
                         paste0('p = ', p))))
Burt$class <- as.factor(Burt$class)
gp <- ggplot(aes(x = IQbio, y = IQfoster),
data = Burt)
corrp <- gp + geom_point(aes(color = class,
shape=class)) +
geom_smooth(aes(color = class), method = "lm", se = FALSE) +
geom_text(aes(x = 120, y = 137, color="high",
label=subset(text, class == "high")$text)) +
geom_text(aes(x = 118, y = 109, color="medium",
label=subset(text, class == "medium")$text)) +
geom_text(aes(x = 124, y = 103, color="low",
label=subset(text, class == "low")$text))
corrp
Here's how to rotate the axis labels. First, we create a scatter plot with many ticks on the x-axis:
data(Salaries, package = "carData")
Salaries$rank <- as.factor(Salaries$rank)
gp <- ggplot(aes(x = salary, y = yrs.since.phd),
data = Salaries) +
geom_point(aes(color = rank,
shape = rank)) +
geom_smooth(method = "lm") +
scale_y_continuous(limits = c(0, 60)) +
scale_x_continuous(limits = c(50000, 240000),
breaks = seq(50000, 240000, by = 10000))
To rotate the x-axis tick labels by 90 degrees, do this:
gp + theme(axis.text.x =
element_text(angle = 90, hjust = 1))
Here we use the theme_bw() function to get a black-and-white themed plot. Then, we make the scatter plot in black and grey colors using the scale_colour_grey() function. Finally, we add a theme layer using the function theme().
The function element_blank() draws nothing for that particular theme element. For instance, plot.background = element_blank() removes the plot background, and panel.grid.major = element_blank() removes the major grid lines.
corrp + theme_bw() + scale_colour_grey() +
theme(axis.line = element_line(colour = "black")
,plot.background = element_blank()
,panel.grid.major = element_blank()
,panel.grid.minor = element_blank()
,strip.background = element_blank()
,panel.border = element_blank()
,legend.title=element_blank()
,legend.key = element_blank())
Let's create a pair plot (a scatter plot matrix) using the package GGally.
require(GGally)
cols = c('mpg', 'wt', 'hp', 'qsec')
ggpairs(mtcars, columns = cols)
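By default, ggpairs() prints correlation coefficients in the upper panels for continuous variables. As a numeric companion to the pair plot, the same coefficients can be computed with base R. A small sketch follows; the name cor_mat is illustrative.

```r
# Same four columns as in the pair plot above.
cols <- c("mpg", "wt", "hp", "qsec")

# Pearson correlation matrix, rounded to two decimals for readability.
cor_mat <- round(cor(datasets::mtcars[, cols]), 2)
cor_mat
```

Using datasets::mtcars here reads the original dataset, even if a modified copy of mtcars exists in the workspace.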
In this section, we are going to learn how to save ggplot2 plots as PDF and TIFF files.
data(Salaries, package = "carData")
gp <- ggplot(aes(x=yrs.since.phd, y=salary),
data=Salaries) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, colour="gray") +
theme_bw() +
theme(axis.line = element_line(colour = "black")
,plot.background = element_blank()
,panel.grid.major = element_blank()
,panel.grid.minor = element_blank()
,strip.background = element_blank()
,panel.border = element_blank()
,legend.title=element_blank()
,legend.key = element_blank()) +
xlab('Years since Ph.D.') +
ylab('Salary')
Now we can use the ggsave()
function to save the scatter plot.
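Let's save a PDF! Because gp was only assigned and not displayed, we pass it explicitly with the plot argument; otherwise ggsave() saves the last plot that was displayed.
ggsave("salaries_by_year_scatterplot.pdf", plot = gp, device = "pdf",
       width = 12, height = 8,
       units = "cm", dpi = 300)
Let's save a TIFF!
ggsave("salaries_by_year_scatterplot.tiff", plot = gp, device = "tiff",
       width = 12, height = 8,
       units = "cm", dpi = 300)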
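The same pattern works for other formats. The sketch below saves a PNG to a temporary directory and checks that the file was written. The file name scatter_sketch.png and the object names p and out_file are illustrative, and the temporary location is used only so that the example leaves nothing behind in the working directory.

```r
library(ggplot2)

# A simple plot to save.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Temporary output path; the device is inferred from the .png extension.
out_file <- file.path(tempdir(), "scatter_sketch.png")

# Passing plot = p avoids relying on the last displayed plot.
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm", dpi = 300)

# Confirm that the file exists and is not empty.
file.exists(out_file)
file.size(out_file) > 0
```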