This notebook summarises the data available via the rnbn
package. The vignette provided with the package provides a better overview of the package --- this notebook just demonstrates what needs to be done to get started with it.
rnbn
is straight forward to get; we just need to install it from CRAN.
library(ggplot2)
library(ggmap)
library(dplyr)
install.packages('rnbn')
library(rnbn)
The NBN data service needs a username and password. The vignette contains instructions on how to get one. It's not strictly necessary to run nbnLogin
; all the functions in the package will call nbnLogin
automatically if you aren't already logged in.
In this case I'm hiding my details in my environment to stop them getting printed out to the console.
#Change these if running.
nbn.user <- Sys.getenv("NBN_USER")
nbn.password <- Sys.getenv("NBN_PASSWORD")
nbnLogin(nbn.user,nbn.password)
The NBN set is principally designed to log observations of species in different locations at different times. This is an abstraction over the underlying data generated from studies and data sets.
Data are principally organised by TVKs (Taxon Version Keys). A Taxon is simply a thing that can have observations logged about it. The package includes a function to get the TVKs for names of species to make the mapping easier.
This shows one of several views of the NBN data we can access; the taxon-level information which includes more details on the species and some aggregate stats.
head(getTVKQuery(query="badger"))
Lists are OK, but usually we just want the top result:
getTVKQuery(query="badger", top=T)
Armed with a Taxon Version Key, you can then interrogate the NBN database for observations of that species.
mushroom <- getTVKQuery(query="mushroom", top=T)
occurrences <- getOccurrences(gridRef='SK38',silent=T,acceptTandC=T)
nrow(occurrences)
We can plot this a little nicer using ggmap. Before we do, we need densities (there is a lot of overlap between points). We'll make a 10x10 grid and count the observations in each grid. To start with, compute the centres of each grid rectangle:
bins <- 10
x <- seq(from=min(occurrences$longitude), to=max(occurrences$longitude), length.out=bins)
y <- seq(from=min(occurrences$latitude), to=max(occurrences$latitude), length.out=bins)
binwidth.x <- (max(occurrences$longitude) - min(occurrences$longitude) ) / bins
binwidth.y <- (max(occurrences$latitude) - min(occurrences$latitude) ) / bins
binwidth <- c(binwidth.x, binwidth.y)
grid <- expand.grid(x=x,y=y)
head(grid)
Functions to give us the upper and lower bound of a bin given its centre-point:
bin.centre <- function(coord,d){ binwidth * floor(coord/binwidth[d]) }
bin.lower <- function(coord,d){ coord - binwidth[d]/2 }
bin.upper <- function(coord,d){ coord + binwidth[d]/2 }
bin.lower(x,1)
Perform the count for each square. We do this by subsetting occurrences by each region and taking the size of it (using sum
on the logical is faster than actually subsetting and calling length
).
density <- mapply(function(x,y){
with(occurrences,
list(x=x,y=y,z=sum(
longitude > bin.lower(x,1) &
longitude <= bin.upper(x,1) &
latitude > bin.lower(y,2) &
latitude <= bin.upper(y,2))
))}, x=grid$x,y=grid$y) %>% t
density <- as.data.frame(cbind(grid, z=unlist(density[,3])))
# density <- within(density, {
# lx = bin.lower(x)
# ly = bin.lower(y)
# ux = bin.upper(x)
# uy = bin.upper(y)
# }
# )
head(density)
How does that look?
ggplot(density, aes(fill=z)) +
geom_rect(aes(xmin=bin.lower(x,1),ymin=bin.lower(y,2),xmax=bin.upper(x,1),ymax=bin.upper(y,2))) +
scale_fill_distiller(palette = "Spectral")
Now we have our counts for each rectangle, we can plot them over a map. We'll use ggmap
for this:
location <- c(median(x),median(y))
map <- get_map(location, zoom=12, source="google")
ggmap(map) +
geom_rect(data=density,
inherit.aes=F,
aes(xmin=bin.lower(x,1),ymin=bin.lower(y,2),xmax=bin.upper(x,1),ymax=bin.upper(y,2),fill=z,alpha=z)) +
scale_fill_distiller(palette = "Spectral")