In this brief tutorial, we are going to see how to make use of the different probability distributions in R.
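There is one simple rule here: no matter which distribution we're talking about, there will ALWAYS be a d function, a p function, a q function and an r function, representing the following: d gives the density (or the probability mass, for discrete distributions), p gives the cumulative probability, q gives the quantile (the inverse of p), and r generates random numbers from the distribution.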
We'll understand all these better now with several examples, which you guys will complete in class.
The binomial distribution models the number of successes obtained in a fixed number of independent trials, each with the same probability of success.
In R, for this distribution we will have:
dbinom
pbinom
rbinom
qbinom
args(dbinom)
args(pbinom)
args(rbinom)
args(qbinom)
function (x, size, prob, log = FALSE)
NULL
function (q, size, prob, lower.tail = TRUE, log.p = FALSE)
NULL
function (n, size, prob)
NULL
function (p, size, prob, lower.tail = TRUE, log.p = FALSE)
NULL
Here, size (the number of trials) and prob (the probability of success in each trial) are the parameters of the binomial distribution. Let's check the documentation for the meaning of the remaining arguments.
?dbinom
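The r and q functions are not used in the coin example below, so here is a minimal sketch of both. The seed, the number of simulated experiments (10) and the 0.8 probability level are arbitrary choices made for illustration only.

```r
# Assumption: the seed is arbitrary, used only to make the draws reproducible
set.seed(42)

# r function: simulate 10 experiments, each one being 3 tosses of a fair coin.
# Every value is the number of heads observed in one experiment (0 to 3).
heads <- rbinom(10, size = 3, prob = 0.5)
heads

# q function: smallest number of heads k such that P(X <= k) >= 0.8
q80 <- qbinom(0.8, size = 3, prob = 0.5)
q80
```

Note that the first argument means something different in each function: a count of simulated values for rbinom, and a probability for qbinom.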
For example, say that we have a coin that we toss three times. Then, the probability of getting two heads in these tosses would be:
(0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5)
This is the same as using dbinom as follows:
# i.e. 3 * 0.5^3 = 0.375, as expected
dbinom(2, size=3, prob=0.5)
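The sum of three identical terms above reflects the three different orderings in which two heads can appear in three tosses. As a small sketch, the same number can be obtained from the binomial formula, choose(n, k) * p^k * (1 - p)^(n - k); the variable names n, k, p and manual are only illustrative.

```r
n <- 3    # number of tosses
k <- 2    # number of heads we are interested in
p <- 0.5  # probability of heads in a single toss

# choose(n, k) counts the orderings with exactly k heads (3 in this case)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
manual

# Should agree with dbinom up to floating-point error
all.equal(manual, dbinom(k, size = n, prob = p))
```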
For example, say we want to calculate the probability of getting two heads or fewer. For this, in addition to the previous probability, we need to add the probabilities of getting no heads and of getting one head.
p.0<- (0.5 * 0.5 * 0.5) # probability of no heads
p.1<- (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) # probability of one head
p.2<- (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) # probability of two heads
p.0 + p.1 + p.2
But the above is the same as computing the cumulative probability, so we should be able to get it with pbinom.
pbinom(2, size=3, prob=0.5)
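To make the link between the d and p functions explicit, the sketch below rebuilds the cumulative probability by summing dbinom over 0, 1 and 2 heads, and then uses lower.tail = FALSE to get the complementary probability of three heads. The variable names are only illustrative.

```r
# Cumulative probability as a sum of point probabilities P(X = 0), P(X = 1), P(X = 2)
cum_by_sum <- sum(dbinom(0:2, size = 3, prob = 0.5))

# The same quantity computed directly
cum_direct <- pbinom(2, size = 3, prob = 0.5)
all.equal(cum_by_sum, cum_direct)

# Upper tail: P(X > 2), i.e. the probability of getting three heads
p_three <- pbinom(2, size = 3, prob = 0.5, lower.tail = FALSE)
p_three
```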
In the following, we are going to put all this into practice using a die instead of a coin, and later a Gaussian distribution, but the logic behind the use of the above functions will always be the same.
dnorm
pnorm
rnorm
qnorm
args(dnorm)
args(pnorm)
args(rnorm)
args(qnorm)
function (x, mean = 0, sd = 1, log = FALSE)
NULL
function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
NULL
function (n, mean = 0, sd = 1)
NULL
function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
NULL
m<-0 # mean
s<-5 # standard deviation
# Just to visualize what we are saying
hist(rnorm(1000, mean = m, sd = s))
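The histogram only uses rnorm, so here is a minimal sketch of the other three functions for the same mean and standard deviation. The evaluation points (the mean, one standard deviation on either side, and the 0.975 probability level) are arbitrary choices for illustration.

```r
m <- 0  # mean
s <- 5  # standard deviation

# d function: height of the density curve at the mean, equal to 1 / (s * sqrt(2 * pi))
d_at_mean <- dnorm(m, mean = m, sd = s)

# p function: probability of falling within one standard deviation of the mean
p_within <- pnorm(m + s, mean = m, sd = s) - pnorm(m - s, mean = m, sd = s)
p_within  # roughly 0.68

# q function: value below which 97.5% of the distribution lies
q975 <- qnorm(0.975, mean = m, sd = s)
q975  # roughly 1.96 * s

# q is the inverse of p, so this takes us back to 0.975
pnorm(q975, mean = m, sd = s)
```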
Finally, we said in the lectures that Student's t-distribution looks like a Gaussian distribution but with heavier tails. Let's see this by generating 1000 random numbers under both distributions and plotting their density curves together, using the geom_density function we've just learned.
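One possible way to set this up is sketched below. Several things here are assumptions and not part of the exercise statement: the seed, the standard Gaussian (mean 0, sd 1), the 3 degrees of freedom for the t-distribution, the column names, and the x-axis limits. The plot is only drawn if the ggplot2 package is installed.

```r
set.seed(123)  # assumption: arbitrary seed for reproducibility
n <- 1000
df_t <- 3      # assumption: few degrees of freedom make the heavy tails easy to see

# Long-format data frame: one row per simulated value, labelled by distribution
samples <- data.frame(
  value = c(rnorm(n), rt(n, df = df_t)),
  distribution = rep(c("gaussian", "student_t"), each = n)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  density_plot <- ggplot(samples, aes(x = value, colour = distribution)) +
    geom_density() +
    coord_cartesian(xlim = c(-6, 6))  # zoom in; extreme t draws stretch the axis
  print(density_plot)
}

# Numerical check of the heavier tails: probability of a value below -3
tail_gaussian <- pnorm(-3)
tail_t <- pt(-3, df = df_t)
c(gaussian = tail_gaussian, student_t = tail_t)
```

As the degrees of freedom increase, the t-distribution gets closer to the Gaussian, so trying a larger value of df_t should make the two curves nearly overlap.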