%matplotlib inline
from IPython.display import HTML
import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np
import seaborn as sns
sns.set_style('white')
sns.set_context('talk', font_scale=1.2)
HTML('<blockquote class="twitter-tweet" lang="en"><p lang="en" dir="ltr">Hmm, I don\'t know about this caterpillar rearing manual. I thought P.rapae had an obligate association w/ Brassica. <a href="http://t.co/M10dqbOYlN">pic.twitter.com/M10dqbOYlN</a></p>— Christie Bahlai (@cbahlai) <a href="https://twitter.com/cbahlai/status/597462491166150656">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
Hmm, I don't know about this caterpillar rearing manual. I thought P.rapae had an obligate association w/ Brassica. pic.twitter.com/M10dqbOYlN
— Christie Bahlai (@cbahlai) May 10, 2015
HTML('<blockquote class="twitter-tweet" lang="en"><p lang="en" dir="ltr">This is a terrible dataset about caterpillar diet. How did it got published? <a href="http://t.co/XkAq51HxEP">pic.twitter.com/XkAq51HxEP</a></p>— Timothée Poisot (@tpoi) <a href="https://twitter.com/tpoi/status/591041490618552320">April 23, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
This is a terrible dataset about caterpillar diet. How did it got published? pic.twitter.com/XkAq51HxEP
— Timothée Poisot (@tpoi) April 23, 2015
HTML('<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="und" dir="ltr"><a href="https://twitter.com/tpoi">@tpoi</a> <a href="https://twitter.com/kara_woo">@kara_woo</a> <a href="https://twitter.com/cbahlai">@cbahlai</a> <a href="http://t.co/5lj9EzuKjW">pic.twitter.com/5lj9EzuKjW</a></p>— Yoav Ram (@yoavram) <a href="https://twitter.com/yoavram/status/597518650082365440">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
@tpoi @kara_woo @cbahlai pic.twitter.com/5lj9EzuKjW
— Yoav Ram (@yoavram) May 10, 2015
HTML('<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="en" dir="ltr">[blog] How hungry are caterpillars anyway? <a href="http://t.co/SvImkHYHhR">http://t.co/SvImkHYHhR</a> <a href="https://twitter.com/hashtag/opendata?src=hash">#opendata</a></p>— Timothée Poisot (@tpoi) <a href="https://twitter.com/tpoi/status/597518409203589122">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
[blog] How hungry are caterpillars anyway? http://t.co/SvImkHYHhR #opendata
— Timothée Poisot (@tpoi) May 10, 2015
We will learn how to use the Global Biotic Interactions (GloBI) API from Python to answer, sort of, the question: how hungry are caterpillars anyway?
First, have a look at the API and the API docs. It is a RESTful API that returns responses in JSON format over HTTP.
Let's try it, following Poisot's lead on The Very Hungry Caterpillar.
We will use `requests`, the "HTTP for Humans" library for Python.
response = requests.get("http://api.globalbioticinteractions.org/interaction?sourceTaxon=Pieris&interactionType=eats")
print("OK:", response.ok)
OK: True
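Instead of hand-writing the query string, the parameters can be kept in a dict and encoded for us. The minimal sketch below uses the standard library's `urllib.parse.urlencode` to show the URL that results; passing the same dict as `requests.get(BASE_URL, params=params)` lets `requests` do the encoding. The helper name `build_interaction_url` is hypothetical, and only the two query parameters used above (`sourceTaxon`, `interactionType`) are assumed.

```python
from urllib.parse import urlencode

# Endpoint used in the request above
BASE_URL = "http://api.globalbioticinteractions.org/interaction"


def build_interaction_url(source_taxon: str, interaction_type: str) -> str:
    """Hypothetical helper: build a GloBI /interaction query URL.

    Only the two query parameters used in this post are assumed.
    """
    params = {"sourceTaxon": source_taxon, "interactionType": interaction_type}
    # urlencode keeps the dict's insertion order and escapes the values
    return BASE_URL + "?" + urlencode(params)


url = build_interaction_url("Pieris", "eats")
print(url)
```

Encoding matters once a taxon name has a space: `build_interaction_url("Pieris rapae", "eats")` ends in `sourceTaxon=Pieris+rapae&interactionType=eats`.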
The response payload is in JSON format. Calling the `json` method returns the payload as a `dict`:
payload = response.json()
print(len(payload))
print(payload.keys())
2
dict_keys(['columns', 'data'])
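`response.json()` is roughly `json.loads(response.text)`: a top-level JSON object becomes a Python `dict`, and JSON arrays become lists. The sketch below runs that decoding on a small hand-written string that mimics the shape of the GloBI reply. It is not a real API response: only three of the real columns are kept, and the two rows are abbreviated from output shown in this post.

```python
import json

# Hand-written stand-in for response.text: same two top-level fields,
# but only three of the real columns and two abbreviated rows.
text = """
{
  "columns": ["source_taxon_name", "interaction_type", "target_taxon_name"],
  "data": [
    ["Pieris marginalis", "eats", "Rubus"],
    ["Pieris rapae", "eats", "Centaurea melitensis"]
  ]
}
"""

payload = json.loads(text)  # JSON object -> dict, JSON arrays -> lists
print(type(payload).__name__, len(payload))
print(payload.keys())
print(payload["data"][0])
```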
The response has two fields, `columns` and `data`, corresponding to the data frame's column names and rows. That's great, because we can push them right into a `pandas.DataFrame`:
print(payload['columns'])
['source_taxon_external_id', 'source_taxon_name', 'source_taxon_path', 'source_specimen_life_stage', 'source_specimen_basis_of_record', 'interaction_type', 'target_taxon_external_id', 'target_taxon_name', 'target_taxon_path', 'target_specimen_life_stage', 'target_specimen_basis_of_record', 'latitude', 'longitude', 'study_title']
print(payload['data'][0])
['EOL:174006', 'Pieris marginalis', 'Animalia | Bilateria | Protostomia | Ecdysozoa | Arthropoda | Hexapoda | Insecta | Pterygota | Neoptera | Holometabola | Lepidoptera | Papilionoidea | Pieridae | Pierinae | Pierini | Pierina | Pieris | Pieris marginalis', None, None, 'eats', 'EOL:29914', 'Rubus', 'Plantae | Tracheophyta | Magnoliopsida | Rosales | Rosaceae | Rubus | Rubus status', None, None, None, None, None]
df = pd.DataFrame(payload['data'], columns=payload['columns'])
print(df.shape)
df.head()
(232, 14)
| | source_taxon_external_id | source_taxon_name | source_taxon_path | source_specimen_life_stage | source_specimen_basis_of_record | interaction_type | target_taxon_external_id | target_taxon_name | target_taxon_path | target_specimen_life_stage | target_specimen_basis_of_record | latitude | longitude | study_title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EOL:174006 | Pieris marginalis | Animalia \| Bilateria \| Protostomia \| Ecdysozoa... | None | None | eats | EOL:29914 | Rubus | Plantae \| Tracheophyta \| Magnoliopsida \| Rosal... | None | None | None | None | None |
| 1 | EOL:174006 | Pieris marginalis | Animalia \| Bilateria \| Protostomia \| Ecdysozoa... | None | None | eats | EOL:37457 | Arabis | Plantae \| Tracheophyta \| Magnoliopsida \| Brass... | None | None | None | None | None |
| 2 | EOL:174006 | Pieris marginalis | Animalia \| Bilateria \| Protostomia \| Ecdysozoa... | None | None | eats | EOL:37718 | Rorippa | Plantae \| Tracheophyta \| Magnoliopsida \| Brass... | None | None | None | None | None |
| 3 | EOL:174006 | Pieris marginalis | Animalia \| Bilateria \| Protostomia \| Ecdysozoa... | None | None | eats | EOL:37667 | Cardamine | Plantae \| Tracheophyta \| Magnoliopsida \| Brass... | None | None | None | None | None |
| 4 | EOL:176683 | Pieris rapae | Animalia \| Arthropoda \| Insecta \| Lepidoptera ... | None | None | eats | EOL:467679 | Centaurea melitensis | Plantae \| Tracheophyta \| Magnoliopsida \| Aster... | None | None | None | None | None |
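Each row in `data` lines up position by position with the names in `columns`, which is why `pd.DataFrame(payload['data'], columns=payload['columns'])` works. The sketch below makes that pairing explicit without pandas by zipping every row with the column names. It uses a small hand-made stand-in for the decoded payload (three of the real columns, two rows abbreviated from the table above), not real API output.

```python
# Hand-made stand-in for the decoded GloBI payload (three of the real
# columns, two rows abbreviated from the df.head() output above).
payload = {
    "columns": ["source_taxon_name", "interaction_type", "target_taxon_name"],
    "data": [
        ["Pieris marginalis", "eats", "Rubus"],
        ["Pieris rapae", "eats", "Centaurea melitensis"],
    ],
}

# Pair every row with the column names: one dict per interaction record
records = [dict(zip(payload["columns"], row)) for row in payload["data"]]

for record in records:
    print(record["source_taxon_name"], "->", record["target_taxon_name"])
```

pandas also accepts such a list of dicts directly (`pd.DataFrame(records)`), taking the column names from the keys.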
Let's see what each caterpillar eats. Since we only requested `eats` interactions, we can keep just the source and target taxon names and drop every other column:
cols = df.columns.tolist()
cols.remove('source_taxon_name')
cols.remove('target_taxon_name')
print(cols)
['source_taxon_external_id', 'source_taxon_path', 'source_specimen_life_stage', 'source_specimen_basis_of_record', 'interaction_type', 'target_taxon_external_id', 'target_taxon_path', 'target_specimen_life_stage', 'target_specimen_basis_of_record', 'latitude', 'longitude', 'study_title']
df.drop(labels=cols, axis=1, inplace=True)
df.head()
| | source_taxon_name | target_taxon_name |
|---|---|---|
| 0 | Pieris marginalis | Rubus |
| 1 | Pieris marginalis | Arabis |
| 2 | Pieris marginalis | Rorippa |
| 3 | Pieris marginalis | Cardamine |
| 4 | Pieris rapae | Centaurea melitensis |
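Dropping the unwanted columns works, but when we already know which columns to keep, selecting them directly is shorter and doesn't require listing the rest. A minimal sketch on a tiny hand-made frame that has only a few of the real column names (the values are abbreviated from the output above):

```python
import pandas as pd

# Small stand-in for the GloBI DataFrame, with only a few of its columns
df = pd.DataFrame(
    {
        "source_taxon_external_id": ["EOL:174006", "EOL:176683"],
        "source_taxon_name": ["Pieris marginalis", "Pieris rapae"],
        "interaction_type": ["eats", "eats"],
        "target_taxon_name": ["Rubus", "Centaurea melitensis"],
    }
)

# Keep only the two columns of interest; .copy() gives an independent frame
pairs = df[["source_taxon_name", "target_taxon_name"]].copy()
print(pairs)
```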
Next, we count how many target taxa occur for each source taxon. For that, we group by source and aggregate with `len` (I checked beforehand that each source-target pair appears only once. How would you check that?).
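One way to answer that "how": `DataFrame.duplicated` flags every row that repeats an earlier one, optionally looking only at a `subset` of columns, and `drop_duplicates` removes those rows. The sketch uses a small made-up frame in which the repeated Pieris marginalis / Rubus row is deliberate; it is not taken from the GloBI data.

```python
import pandas as pd

# Made-up example: the second "Pieris marginalis" / "Rubus" row is a deliberate duplicate
df = pd.DataFrame(
    {
        "source_taxon_name": ["Pieris marginalis", "Pieris marginalis", "Pieris marginalis", "Pieris rapae"],
        "target_taxon_name": ["Rubus", "Arabis", "Rubus", "Centaurea melitensis"],
    }
)

pair = ["source_taxon_name", "target_taxon_name"]

# True for each row that repeats an earlier (source, target) pair
dupes = df.duplicated(subset=pair)
print("duplicated pairs:", dupes.sum())

# Keep only the first occurrence of each pair before counting
df_unique = df.drop_duplicates(subset=pair)
print(len(df), "->", len(df_unique))
```

If `dupes.sum()` is zero, aggregating with `len` counts distinct targets; otherwise, deduplicate first.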
The `groupby` makes `source_taxon_name` the index rather than a column, which is why we call `reset_index`.
table = df.groupby(by='source_taxon_name').aggregate(len).reset_index()
table.head()
| | source_taxon_name | target_taxon_name |
|---|---|---|
| 0 | Pieris brassicae | 55 |
| 1 | Pieris brassicoides | 3 |
| 2 | Pieris canidia | 10 |
| 3 | Pieris cheiranthi | 1 |
| 4 | Pieris deota | 1 |
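Aggregating with `len` counts rows per group. pandas also has dedicated methods: `size()` counts rows, duplicates included, while `nunique()` on the target column counts *distinct* targets, so it stays correct even if a source-target pair were repeated. A sketch on a small made-up frame; `n_targets` is a hypothetical column name, and `reset_index(name=...)` turns the resulting Series back into a two-column table.

```python
import pandas as pd

# Made-up frame: "Rubus" appears twice for Pieris marginalis on purpose
df = pd.DataFrame(
    {
        "source_taxon_name": ["Pieris marginalis"] * 3 + ["Pieris rapae"],
        "target_taxon_name": ["Rubus", "Arabis", "Rubus", "Centaurea melitensis"],
    }
)

grouped = df.groupby("source_taxon_name")["target_taxon_name"]

rows_per_source = grouped.size()         # number of rows, duplicates included
distinct_per_source = grouped.nunique()  # number of distinct target taxa

# Series -> two-column DataFrame, naming the count column
table = distinct_per_source.reset_index(name="n_targets")
print(rows_per_source.to_dict())
print(table)
```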
Finally, we rename the columns to make them more meaningful and sort the table by the number of target taxa. Then we print and plot:
table = table.rename(columns={'source_taxon_name':'Pieris species', 'target_taxon_name': 'Number of known items in diet'})
# DataFrame.sort was removed from pandas; sort_values is its replacement
table = table.sort_values('Number of known items in diet', ascending=False)
table
| | Pieris species | Number of known items in diet |
|---|---|---|
| 12 | Pieris rapae | 91 |
| 0 | Pieris brassicae | 55 |
| 11 | Pieris napi | 51 |
| 2 | Pieris canidia | 10 |
| 13 | Pieris virginiensis | 6 |
| 8 | Pieris marginalis | 4 |
| 1 | Pieris brassicoides | 3 |
| 6 | Pieris krueperi | 3 |
| 5 | Pieris ergane | 2 |
| 7 | Pieris mannii | 2 |
| 10 | Pieris naganum | 2 |
| 3 | Pieris cheiranthi | 1 |
| 4 | Pieris deota | 1 |
| 9 | Pieris melete | 1 |
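The group, count, rename and sort steps can also be collapsed into one call: `Series.value_counts()` counts each distinct value and returns the counts sorted in descending order. A sketch on a small made-up set of (source, target) rows, assuming the pairs have already been deduplicated; `rename_axis` and `reset_index(name=...)` reproduce the column names of the table above.

```python
import pandas as pd

# Made-up, already-deduplicated (source, target) pairs
df = pd.DataFrame(
    {
        "source_taxon_name": ["Pieris rapae", "Pieris rapae", "Pieris napi", "Pieris rapae"],
        "target_taxon_name": ["Arabis", "Rorippa", "Cardamine", "Rubus"],
    }
)

# Rows per species, sorted from most to fewest
counts = df["source_taxon_name"].value_counts()

# Back to a two-column table with the same headers as in this post
table = counts.rename_axis("Pieris species").reset_index(name="Number of known items in diet")
print(table)
```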
table.plot(x="Pieris species", y="Number of known items in diet", kind="barh", legend=False)
plt.xlabel('Number of known items in diet')  # with kind="barh" the counts run along the x-axis
plt.grid(False)
sns.despine()
(Figure: horizontal bar chart of the number of known items in diet for each Pieris species.)
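For comparison, here is a minimal sketch of the same kind of chart drawn directly with matplotlib's `Axes.barh`, using the three largest counts from the table above. It makes the axis roles explicit: in a horizontal bar chart the species sit on the y-axis and the counts on the x-axis. The `Agg` backend is only there so the sketch runs without a display; in the notebook, `%matplotlib inline` handles that.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Three largest counts, copied from the table above
species = ["Pieris rapae", "Pieris brassicae", "Pieris napi"]
n_items = [91, 55, 51]

fig, ax = plt.subplots()
# barh: one horizontal bar per category, categories on the y-axis
ax.barh(species, n_items)
ax.invert_yaxis()  # first (largest) entry at the top, matching the sorted table
ax.set_xlabel("Number of known items in diet")
ax.set_ylabel("Pieris species")
fig.tight_layout()
```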