In this post, we will learn how to make scatter plots in R using the ggplot2 package. This is the notebook for the scatter plot in R tutorial: https://www.marsja.se/how-to-make-a-scatter-plot-in-r-with-ggplot2/

- Required R Packages
- How to Install R Packages
- How to Make a Scatter Plot in R
- How to Use ggplot2 to Produce Scatter Plots in R
- How to Change the Size of the Dots in a Scatter Plot
- How to Add a Trend Line to a Scatter Plot in R
- How to Add Text to Scatter Plot in R
- How to Rotate the Axis using Ggplot2
- How to Style a Scatter plot in R
- Saving a High Resolution Plot in R
- How to Save a Scatter Plot to PDF in R
- How to Save a Scatter Plot to TIFF in R
- Reproducible Data Visualization
- Conclusion

You need to install the packages used in this tutorial before continuing.

- The easiest method to get all of the packages is to install the tidyverse packages.

You install packages with the `install.packages()` function. Make sure to uncomment the line (remove the `#`) if you actually need to install the packages!

In [1]:

```
# carData provides the Burt and Salaries datasets used later on
# install.packages(c("tidyverse", "GGally", "carData"))
```

Here are the individual packages used in the tutorial, if you only want to install those:

In [2]:

```
# to.install <- c("magrittr", "purrr", "tidyr",
#                 "ggplot2", "dplyr", "broom", "GGally", "carData")
# install.packages(to.install)
```

Time to learn how to produce a scatter plot in the R statistical programming environment. We start by using the mtcars dataset.

In [3]:

```
require(ggplot2)
head(mtcars)
```

Data can also be stored in Excel files and imported into R before plotting.
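As a minimal sketch of reading Excel data, the block below uses the readxl package (an assumption, since it is not among the packages listed above) together with the example workbook that ships with it, so no file of your own is needed. The names `xlsx_path` and `cars_xl` are made up for illustration.

```r
library(readxl)
library(ggplot2)

# Path to an example workbook bundled with readxl
# (it stands in for your own .xlsx file)
xlsx_path <- readxl_example("datasets.xlsx")

# Read the sheet named "mtcars" into a tibble
cars_xl <- read_excel(xlsx_path, sheet = "mtcars")

# The imported data can be plotted like any other data frame
p_excel <- ggplot(cars_xl, aes(x = wt, y = mpg)) +
  geom_point()
p_excel
```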

In this section we will learn how to make scattergraphs in R using ggplot2.

We will start by visualizing the variables **wt** (x-axis) and **mpg** (y-axis).

Before going on and creating the first scatter plot in R we will briefly cover ggplot2 and the plot functions we are going to use. First, we start by using *ggplot* to create a plot object.

Inside the `ggplot()` function, we call the `aes()` function, which describes how variables in our data are mapped to visual properties. In this simple scatter plot example, we only use the x and y arguments, telling ggplot2 to put the variable wt on the x-axis and mpg on the y-axis.

In [4]:

```
require(ggplot2)
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars)
gp + geom_point()
```

Here we change the size of the markers using the *size* argument.

In [5]:

```
gp + geom_point(size = 4)
```

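Note that we set the size argument directly in the `geom_point()` function, outside `aes()`, so every point gets the same size. To map the point size to a variable instead, we put size inside `aes()`: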

In [6]:

```
gp + geom_point(aes(size = wt))
```

To change the x-axis we use the function `scale_x_continuous()`, and to change the y-axis we use the function `scale_y_continuous()`. Both take the argument limits, a vector of length two that sets the minimum and maximum of the axis.

In [7]:

```
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars) +
geom_point()
gp + scale_y_continuous(limits=c(1, 40)) +
scale_x_continuous(limits=c(0, 6))
```
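One detail worth knowing when setting limits: in ggplot2, limits set through a scale remove the observations outside the range before any statistics are computed, whereas `coord_cartesian()` only zooms the view. The sketch below is not part of the original notebook. It contrasts the two on a plot with a regression line, and the object names are made up for illustration.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Scale limits: cars outside 2 <= wt <= 4 are dropped (with a warning),
# so the regression line is fitted to the remaining cars only
p_scale <- base_plot + scale_x_continuous(limits = c(2, 4))

# Coordinate limits: all cars are used for the fit, the view is just zoomed
p_coord <- base_plot + coord_cartesian(xlim = c(2, 4))

p_scale
p_coord
```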

Next, we also change the number of ticks by adding the breaks argument to the above functions. We use the `seq()` function to create the numeric vector of tick positions.

In [8]:

```
gp + scale_y_continuous(limits=c(1, 35),
breaks=seq(1, 35, 5)) +
scale_x_continuous(limits=c(1.5, 5.5),
breaks=seq(1.5, 5.5, 1))
```

Here we group the points by category using the *color* argument and the `factor()` function to change the variable vs to a factor.

In [9]:

```
gp <- ggplot(aes(x=wt, y=mpg, color=factor(vs)),
data=mtcars)
gp + geom_point()
```

Another option is to use the `as.factor()` function and change *vs* to a factor in the dataframe object.

In [10]:

```
mtcars$vs <- as.factor(mtcars$vs)
gp <- ggplot(aes(x=wt, y=mpg, color=vs),
             data=mtcars)
gp + geom_point()
```
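Because the legend then shows the raw codes 0 and 1, it can help to give the factor levels readable labels. The sketch below works on a copy of the data (`cars`, a made-up name, as is the new column `engine`) and relies on the mtcars documentation, where vs is coded 0 = V-shaped and 1 = straight engine.

```r
library(ggplot2)

cars <- datasets::mtcars  # work on a fresh copy so mtcars itself is untouched

# Recode vs as a factor with descriptive labels
cars$engine <- factor(cars$vs,
                      levels = c(0, 1),
                      labels = c("V-shaped", "Straight"))

p_engine <- ggplot(cars, aes(x = wt, y = mpg, color = engine)) +
  geom_point()
p_engine
```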

Here we add the `aes()` function inside the `geom_point()` function. In `aes()`, we map both the *color* and *shape* arguments to the class column (the categorical variable).

In [11]:

```
data(Burt, package = 'carData')
Burt$class <- as.factor(Burt$class)
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class,
shape = class))
```

We use the `geom_smooth()` function with `method = "lm"` to add a regression line.

In [12]:

```
gp <- ggplot(aes(x = IQbio, y = IQfoster), data = Burt)
gp + geom_point(aes(color = class,
shape = class)) +
geom_smooth(method = "lm", se = FALSE)
```
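The line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of y on x. As a rough sketch (not part of the original notebook), the same fit can be obtained with `lm()` to inspect the intercept and slope. The name `fit` is made up for illustration.

```r
data(Burt, package = "carData")

# Same model that geom_smooth(method = "lm") fits for this plot
fit <- lm(IQfoster ~ IQbio, data = Burt)

# Intercept and slope of the regression line
coef(fit)
```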

In the next scatter plot example, we also add a regression line for each factor level (category). As before, we add the color and shape arguments to the `geom_point()` function, and now we also map color to class inside `geom_smooth()`:

In [13]:

```
gp + geom_point(aes(color = class,
shape = class)) +
geom_smooth(aes(color = class), method = "lm", se = FALSE)
```

We add a bivariate density estimate (drawn as contour lines) to the scatter plot in R using the `geom_density2d()` function.

In [14]:

```
gp <- ggplot(aes(x=wt, y=mpg),
data=mtcars)
gp + geom_point() + geom_density2d()
```

Let's carry out a correlation analysis using R, extract the *r*- and *p*-values, and later learn how to add these as text to our scatter plot.

In [15]:

```
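require(dplyr)
require(broom)
require(magrittr)  # provides the %$% exposition pipe used below
# Run the correlation test, tidy the result, and round to 4 decimals
corr <- mtcars %$%
  cor.test(mpg, wt) %>%
  tidy() %>%
  mutate_if(is.numeric, round, 4)
corr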
```
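For readers less used to pipes, the same values can be pulled straight from the object that `cor.test()` returns. This is a base R sketch of the step above. The names `ct`, `r_value`, and `p_value` are made up for illustration.

```r
# Pearson correlation test between mpg and wt
ct <- cor.test(mtcars$mpg, mtcars$wt)

r_value <- round(unname(ct$estimate), 4)  # correlation coefficient
p_value <- ct$p.value                     # unrounded p-value

r_value
p_value
```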

In [16]:

```
# p.value was rounded to 4 decimals above, so 0 means p < 0.0001
text <- paste0('r = ', corr$estimate, ', ',
               ifelse(corr$p.value <= 0,
                      'p < 0.05',
                      paste0('p = ', corr$p.value)))
text
```
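Since a very small p-value prints as 0 after rounding, a small helper can report it in the conventional "p < 0.001" style instead. `format_p()` below is a hypothetical helper, not a function from any package, and the 0.001 cut-off is an assumption.

```r
# Hypothetical helper: report very small p-values as "p < eps"
format_p <- function(p, eps = 0.001) {
  ifelse(p < eps,
         paste0("p < ", eps),
         paste0("p = ", round(p, 3)))
}

format_p(1.3e-10)
format_p(0.0312)
```

It should be given the unrounded p-value, for example the `p.value` element of the `cor.test()` result.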

We add text using the `annotate()` function.

In [17]:

```
gp <- ggplot(aes(x = wt, y = mpg),
data = mtcars)
gp + geom_point() + geom_smooth(method = "lm", se = FALSE) +
annotate('text', x = 4.5, y = 35, label=text)
```
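Hard-coded coordinates such as x = 4.5 and y = 35 only suit this dataset. One common alternative, sketched here with a placeholder label, is to anchor the text at a panel corner with `Inf` and nudge it inwards using hjust and vjust. The names `label_text` and `p_corner` are made up for illustration.

```r
library(ggplot2)

label_text <- "r = -0.8677"  # placeholder; use the string built earlier

p_corner <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Inf puts the label at the top-right corner of the panel;
  # hjust and vjust values above 1 push it back inside
  annotate("text", x = Inf, y = Inf, label = label_text,
           hjust = 1.1, vjust = 1.5)
p_corner
```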

In [18]:

```
require(tidyr)
require(purrr)
data(Burt, package = 'carData')
corr <- Burt %>% group_by(class) %>%
nest() %>%
mutate(Cor = map(data, ~ cor.test(.$IQbio, .$IQfoster)),
p = map_dbl(Cor, 'p.value'),
est = map_dbl(Cor, 'estimate')
) %>%
mutate_if(is.numeric, round, 4) %>%
select(class, p, est, Cor)
text <- corr %>%
  mutate(
    text = paste0('r = ', est, ', ',
                  ifelse(p <= 0.01,
                         'p < 0.05',
                         paste0('p = ', p))))
Burt$class <- as.factor(Burt$class)
gp <- ggplot(aes(x = IQbio, y = IQfoster),
data = Burt)
corrp <- gp + geom_point(aes(color = class,
shape=class)) +
geom_smooth(aes(color = class), method = "lm", se = FALSE) +
geom_text(aes(x = 120, y = 137, color="high",
label=subset(text, class == "high")$text)) +
geom_text(aes(x = 118, y = 109, color="medium",
label=subset(text, class == "medium")$text)) +
geom_text(aes(x = 124, y = 103, color="low",
label=subset(text, class == "low")$text))
corrp
```
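In the code above, each `geom_text()` call inherits the full Burt data, so every label is drawn once per row on top of itself. A possible alternative, sketched under the assumption that the same label positions are wanted, is to put one row per label into a small data frame and pass it to a single `geom_text()` call. The names `r_by_class`, `labels_df`, and `p_labels` are made up for illustration.

```r
library(ggplot2)
data(Burt, package = "carData")

# Correlation between IQbio and IQfoster within each class
r_by_class <- sapply(split(Burt, Burt$class),
                     function(d) cor(d$IQbio, d$IQfoster))

# One row per label: class, position (taken from the plot above), and text
labels_df <- data.frame(
  class = c("high", "medium", "low"),
  x = c(120, 118, 124),
  y = c(137, 109, 103),
  stringsAsFactors = FALSE
)
labels_df$label <- paste0("r = ", round(r_by_class[labels_df$class], 4))

p_labels <- ggplot(Burt, aes(x = IQbio, y = IQfoster, color = class)) +
  geom_point(aes(shape = class)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Each label is drawn exactly once, colored by its class
  geom_text(data = labels_df, aes(x = x, y = y, label = label),
            show.legend = FALSE)
p_labels
```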

Here's how to rotate the axis labels. First, we create a scatter plot with so many x-axis ticks that the labels overlap:

In [19]:

```
data(Salaries, package = "carData")
Salaries$rank <- as.factor(Salaries$rank)
gp <- ggplot(aes(x = salary, y = yrs.since.phd),
data = Salaries) +
geom_point(aes(color = rank,
shape = rank)) +
geom_smooth(method = "lm") +
scale_y_continuous(limits = c(0, 60)) +
scale_x_continuous(limits = c(50000, 240000),
breaks = seq(50000, 240000, by = 10000))
```

To rotate the x-axis tick labels by 90 degrees, add a `theme()` layer:

In [20]:

```
gp + theme(axis.text.x =
element_text(angle = 90, hjust = 1))
```
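Newer ggplot2 releases also offer `guide_axis()` for the same job (to the best of our knowledge it was added in version 3.3.0, so treat the version as an assumption). A sketch on a smaller plot, with `p_rot` as a made-up name:

```r
library(ggplot2)

p_rot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Rotate the x-axis tick labels by 90 degrees via the axis guide
  guides(x = guide_axis(angle = 90))
p_rot
```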

Here we use the `theme_bw()` function to get a dark-on-light themed plot. Then, we make the scatter plot in black and grey colors using the `scale_colour_grey()` function. Finally, we add a theme layer using the function `theme()`.

The function `element_blank()` draws nothing for that particular theme element. For instance, `plot.background = element_blank()` removes the plot background, leaving it blank.

In [21]:

```
corrp + theme_bw() + scale_colour_grey() +
theme(axis.line = element_line(colour = "black")
,plot.background = element_blank()
,panel.grid.major = element_blank()
,panel.grid.minor = element_blank()
,strip.background = element_blank()
,panel.border = element_blank()
,legend.title=element_blank()
,legend.key = element_blank())
```
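Much of the look produced by the `theme()` call above (axis lines, no grid lines, no panel border) is also available as the built-in `theme_classic()`. The sketch below is an approximate shortcut rather than an exact replica, shown on mtcars for brevity, with `p_classic` as a made-up name.

```r
library(ggplot2)

p_classic <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(vs))) +
  geom_point() +
  scale_colour_grey() +   # grey-scale colors for the groups
  theme_classic() +       # axis lines without grid lines or panel border
  theme(legend.title = element_blank())
p_classic
```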

Let's create a *pair plot* (a scatter plot matrix) using the package GGally.

In [22]:

```
require(GGally)
cols = c('mpg', 'wt', 'hp', 'qsec')
ggpairs(mtcars, columns = cols)
```

In this section, we are going to learn how to save ggplot2 plots as PDF and TIFF files.

In [23]:

```
data(Salaries, package = "carData")
gp <- ggplot(aes(x=yrs.since.phd, y=salary),
data=Salaries) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, colour="gray") +
theme_bw() +
theme(axis.line = element_line(colour = "black")
,plot.background = element_blank()
,panel.grid.major = element_blank()
,panel.grid.minor = element_blank()
,strip.background = element_blank()
,panel.border = element_blank()
,legend.title=element_blank()
,legend.key = element_blank()) +
xlab('Years since Ph.D.') +
ylab('Salary')
```

Now we can use the `ggsave()` function to save the scatter plot.

Let's save a PDF!

In [24]:

```
# Pass the plot explicitly instead of relying on the last plot created
ggsave("salaries_by_year_scatterplot.pdf", plot = gp, device = "pdf",
       width = 12, height = 8,
       units = "cm", dpi = 300)
```

Let's save a TIFF!

In [25]:

```
# Pass the plot explicitly instead of relying on the last plot created
ggsave("salaries_by_year_scatterplot.tiff", plot = gp, device = "tiff",
       width = 12, height = 8,
       units = "cm", dpi = 300)
```
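The dpi argument matters for raster formats such as TIFF and PNG, where the pixel size is the physical size times the resolution: 12 cm is about 4.72 inches, so at 300 dpi the image is roughly 1417 pixels wide, and 8 cm gives roughly 945 pixels in height. The sketch below saves a PNG to a temporary directory to illustrate this. The file name and object names are made up.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Expected pixel dimensions: centimetres -> inches -> pixels
width_px  <- 12 / 2.54 * 300
height_px <- 8 / 2.54 * 300

# Save to a temporary directory so nothing is written to the project folder
out_file <- file.path(tempdir(), "scatter_example.png")
ggsave(out_file, plot = p, width = 12, height = 8,
       units = "cm", dpi = 300)

file.exists(out_file)
```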