This notebook assumes that you've already created the file roman.db
in part one of the [intro to sql](intro to sql.ipynb) notebook. It shows how to load a sqlite database into R.
# first we get the packages we need:
library(DBI)
library(RSQLite)
# now we open the connection:
con = dbConnect(SQLite(), dbname="roman.db")
# we can see what's inside, eg, what tables are in the database?
alltables = dbListTables(con)
alltables
# write the query to get the information you want
myQuery <- dbSendQuery(con, "SELECT * FROM amphi WHERE chronogroup = 'flavian'")
# pass that information to an R object. The n = -1 bit means grab everything until there's nothing left to grab. Otherwise, you can specify how many rows etc.
my_data <- dbFetch(myQuery, n = -1)
# You will get some warning messages as R makes sure that the data in each column is of the same type.
# Now to view your dataframe, we just ask:
my_data
# now that we're done, clear cache
# so that we don't eat up all the memory
dbClearResult(myQuery)
# now carry on and begin manipulating my_data
# for more information see
# http://tiffanytimbers.com/querying-sqlite-databases-from-r/
# also perhaps this https://www.r-bloggers.com/using-sqlite-in-r/
# let's make a new dataframe with just the elevation and capacity columns
# you could have done this with an SQL query from our original database.
# here we're just going to extract from the dataframe we created above
amphi_stats <- subset(my_data, select=c("id","elevation","capacity"))
#"create new datafram called amph_stats and pass into it a subset of my_data, returning the array id, elevation, capacity"
amphi_stats
id | elevation | capacity |
---|---|---|
arlesAmphitheater | 21 | 20000 |
romeFlavianAmphitheater | 22 | 50000 |
newsteadAmphitheater | 83 | 1500 |
pozzuoliFlavianAmphitheater | 41 | 35700 |
segobrigaAmphitheater | 817 | 7500 |
parisAmphitheater | 46 | 15000 |
nimesAmphitheater | 50 | 24000 |
chesterAmphitheater | 23 | 0 |
arezzoAmphitheater | 263 | 8000 |
londonAmphitheater | 29 | 6500 |
urbisagliaAmphitheater | 219 | 5150 |
vindonissaAmphitheater | 361 | 15142 |
frejusAmphitheater | 12 | 12000 |
tebessaAmphitheater | 877 | 9400 |
aquileiaAmphitheater | 2 | 0 |
terminiImereseAmphitheater | 76 | 0 |
bolsenaAmphitheater | 417 | 0 |
narbonneAmphitheater | 14 | 0 |
capuaImperialAmphitheater | 34 | 37000 |
dorchesterAmphitheater | 78 | 0 |
autunAmphitheater | 317 | 0 |
bobadelaAmphitheater | 383 | 0 |
chichesterAmphitheater | 14 | 0 |
At this point, now that you know how to get your data out of the database, you could do a number of other kinds of data manipulations; it really depends on the kind of investigation you wish to do. Consult Marwick et al How to Do Archaeological Science Using R.
We'll end this notebook with a simple visual plot of the data.
# a simple x,y plot
# we can call for the columns in the dataframe by refering to the name of the frame and using the $ to specify the column therein.
plot(amphi_stats$elevation, amphi_stats$capacity)
# but that's ugly. let's make it pretty
# and write the result to file.
# Get the input values.
input <- amphi_stats[,c('elevation','capacity')]
# Give the chart file a name.
png(file = "R-scatterplot.png")
# Plot the chart for cars with weight between 2.5 to 5 and mileage between 15 and 30.
plot(x = input$elevation,y = input$capacity,
xlab = "Elevation",
ylab = "Capacity",
xlim = c(0,1000),
ylim = c(500,60000),
main = "Elevation vs Capacity in Flavian Amphitheatres"
)
# Save the file.
dev.off()
The scatterplot is now available if you right-click on 'jupyter' at the top of the window and open in a new browser window; you'll see it in the list of files.