This is the first in a series of notebooks designed to show you how to download social media data. I assume you have already acquired your API keys from the Twitter developer site, as shown in this tutorial:
If you are new to Python, you may wish to work through, in order, the series of tutorials I have created. If you don't wish to do all the tutorials, you should at least make sure you have your Twitter API keys and have set up Python on your computer as shown in the tutorial Setting up Your Computer to Use My Python Code.
In this notebook I will show you how to use the API keys you've acquired. I'll also show you the difference between OAuth1 and OAuth2 authentication.
I use the twython package as my Python interface with the Twitter API: https://twython.readthedocs.io/en/latest/usage/starting_out.html
Let's import the twython package, which we installed earlier using pip install twython from the command line.
from twython import Twython
Twitter and other social media platforms use a form of authentication known as OAuth. There are two types. On Twitter, OAuth1 authentication is user-level authentication. For this you'll use the four-part password (APP_KEY, APP_SECRET, OAUTH_TOKEN, and OAUTH_TOKEN_SECRET) you generated in the tutorial Setting up Access to the Twitter API.
Your APP_KEY and APP_SECRET will not change.
APP_KEY = 'YOUR APP KEY'
APP_SECRET = 'YOUR APP SECRET'
Your OAUTH_TOKEN and OAUTH_TOKEN_SECRET change whenever you 'regenerate' them. You can use the values you generated in the prior tutorial noted above.
OAUTH_TOKEN = 'YOUR OAUTH TOKEN'
OAUTH_TOKEN_SECRET = 'YOUR OAUTH SECRET'
twitter = Twython(APP_KEY, APP_SECRET,
                  OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
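Why does OAuth1 need all four pieces? Looking at how a request gets signed makes this clearer. Twython does the signing for you behind the scenes, so you never need this code. The Python 3 sketch below follows the OAuth 1.0a HMAC-SHA1 signing steps (RFC 5849).

The two non-secret values, APP_KEY and OAUTH_TOKEN, are sent with the request. APP_SECRET and OAUTH_TOKEN_SECRET are never sent. Instead they are combined into the key that signs each request.

The helper names percent_encode and oauth1_signature, as well as the nonce and timestamp values, are hypothetical and only for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode per RFC 3986, leaving only A-Z a-z 0-9 - . _ ~ unescaped."""
    return quote(str(value), safe='-._~')


def oauth1_signature(method, url, params, app_secret, oauth_token_secret):
    """Return the base64-encoded HMAC-SHA1 signature for one OAuth1 request."""
    # 1. Encode every parameter name and value, then sort (by name, then value)
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = '&'.join('%s=%s' % pair for pair in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string
    base_string = '&'.join([method.upper(), percent_encode(url),
                            percent_encode(param_string)])
    # 3. Signing key: the two secrets joined by '&' (the secrets are never sent)
    signing_key = percent_encode(app_secret) + '&' + percent_encode(oauth_token_secret)
    digest = hmac.new(signing_key.encode('utf-8'), base_string.encode('utf-8'),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode('ascii')


# Usage with placeholder credentials; the nonce and timestamp are hypothetical
# (a real client generates a fresh random nonce and the current Unix time per request).
params = {
    'q': '#arnova16',
    'oauth_consumer_key': 'YOUR APP KEY',   # APP_KEY travels with the request
    'oauth_token': 'YOUR OAUTH TOKEN',      # OAUTH_TOKEN travels with the request
    'oauth_nonce': 'hypotheticalnonce123',
    'oauth_timestamp': '1479421000',
    'oauth_signature_method': 'HMAC-SHA1',
    'oauth_version': '1.0',
}
signature = oauth1_signature('GET', 'https://api.twitter.com/1.1/search/tweets.json',
                             params, 'YOUR APP SECRET', 'YOUR OAUTH SECRET')
print(signature)  # a 28-character base64 string
```

This signing work is what makes OAuth1 "user-level": every request is tied to both the app and a specific user's token.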
We can now use our API keys to access the Twitter API. We'll start with a simple example: checking our API rate limit. Most APIs apply a window-based rate limit, a cap on how many calls you can make (and therefore how much data you can download) in a given time span. On Twitter, these windows are 15 minutes long.
In the following code block I combine the twitter variable set above (which holds our four-part password) with the get_application_rate_limit_status() method to see how many API calls we have left in the current window. The API has many different parts. Here I look at just one of them, the 'search' API, which you would use, for example, to download all tweets containing a specific search term or hashtag. Don't worry about learning all of this for now; just go with the flow. For details on what twython is doing, see: https://twython.readthedocs.io/en/latest/usage/starting_out.html
Note the rate limit of 180 API calls. This means we can make 180 different calls to the API within the current 15-minute window. The search API returns up to 100 tweets per call. So if we were downloading tweets with a specific hashtag, such as #arnova16, we could download up to 180 $\times$ 100 = 18,000 tweets per window. We will cover this in a later tutorial.
twitter.get_application_rate_limit_status()['resources']['search']
{u'/search/tweets': {u'limit': 180, u'remaining': 180, u'reset': 1479421815}}
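The arithmetic above can be read straight off the rate-limit output. The Python 3 sketch below assumes the search API's maximum of 100 tweets per call and turns the status dictionary into a tweet budget. The helper name tweets_per_window is hypothetical.

- limit gives the capacity of a full window.
- remaining gives what is left in the current window.

```python
# Rate-limit status for the search API, copied from the OAuth1 output above
search_status = {u'/search/tweets': {u'limit': 180, u'remaining': 180, u'reset': 1479421815}}

TWEETS_PER_CALL = 100  # maximum number of tweets the search API returns per call


def tweets_per_window(status, endpoint='/search/tweets', per_call=TWEETS_PER_CALL):
    """Return (max tweets in a full window, tweets still downloadable in this window)."""
    info = status[endpoint]
    return info['limit'] * per_call, info['remaining'] * per_call


full, left = tweets_per_window(search_status)
print(full, left)  # 18000 18000
```

Checking remaining rather than limit matters once you start downloading, because each call uses up part of the current window.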
The other authentication method is OAuth2, which is app-level authentication. It is simpler and will generally also give you higher rate limits on Twitter. First, though, you'll have to generate an ACCESS_TOKEN.
APP_KEY = 'YOUR APP KEY' #SAME AS ABOVE
APP_SECRET = 'YOUR APP SECRET' #SAME AS ABOVE
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
I don't show it here, but running the next line prints your ACCESS_TOKEN. Copy it and paste it in place of 'YOUR ACCESS TOKEN' in the code block below.
print(ACCESS_TOKEN)  # parenthesized form works in both Python 2 and Python 3
APP_KEY = 'YOUR APP KEY' #SAME AS ABOVE
ACCESS_TOKEN = 'YOUR ACCESS TOKEN' #COPY AND PASTE FROM OUTPUT FROM ABOVE COMMAND
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)
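Twython's obtain_access_token() hides the details of the OAuth2 exchange. As I understand Twitter's documented application-only flow, the exchange is a single POST to https://api.twitter.com/oauth2/token:

- The URL-encoded APP_KEY and APP_SECRET are sent as HTTP Basic credentials.
- The body contains grant_type=client_credentials.
- The reply contains the access token.

The Python 3 sketch below builds that request without sending it. The helper names are hypothetical, and this is not twython's actual code.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

TOKEN_URL = 'https://api.twitter.com/oauth2/token'


def bearer_credentials(app_key, app_secret):
    """URL-encode key and secret, join them with ':', and base64-encode the result."""
    raw = quote(app_key, safe='') + ':' + quote(app_secret, safe='')
    return base64.b64encode(raw.encode('utf-8')).decode('ascii')


def build_token_request(app_key, app_secret):
    """Build (but do not send) the POST that trades app credentials for an access token."""
    body = urlencode({'grant_type': 'client_credentials'}).encode('ascii')
    headers = {
        'Authorization': 'Basic ' + bearer_credentials(app_key, app_secret),
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    }
    return Request(TOKEN_URL, data=body, headers=headers, method='POST')


def bearer_header(access_token):
    """Header attached to every later app-level call; no per-request signing is needed."""
    return {'Authorization': 'Bearer ' + access_token}


req = build_token_request('YOUR APP KEY', 'YOUR APP SECRET')
print(req.get_method(), req.full_url)  # POST https://api.twitter.com/oauth2/token
```

This is the key design contrast with OAuth1. Once you have the token, each call carries only an "Authorization: Bearer ..." header and no per-request signature. That is why the Twython call in the code block above needs only APP_KEY and ACCESS_TOKEN.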
OAuth1 gives you user access to the API, whereas OAuth2 gives you app access. For academic use, the rate limits are generally better with OAuth2 (app) authentication, with a few exceptions. For a chart comparing user and app rate limits across the various parts of the Twitter API, see: https://dev.twitter.com/rest/public/rate-limits
Running the code block below shows that we now have a rate limit of 450 API calls. This means we can make 450 different calls to the API within the current 15-minute window. At up to 100 tweets per search call, we could download up to 450 $\times$ 100 = 45,000 tweets per window when collecting tweets with a hashtag such as #arnova16. This is much better than the 18,000 tweets we could get using OAuth1 (user) authentication.
twitter.get_application_rate_limit_status()['resources']['search']
{u'/search/tweets': {u'limit': 450, u'remaining': 450, u'reset': 1479692706}}
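The reset value in both outputs is, as far as I know, a Unix timestamp (seconds since 1 January 1970, UTC) marking when the current 15-minute window ends. Here is a small Python 3 sketch, with hypothetical helper names, for turning it into something readable.

```python
import time
from datetime import datetime, timezone

# Rate-limit status for the search API, copied from the OAuth2 output above
search_status = {u'/search/tweets': {u'limit': 450, u'remaining': 450, u'reset': 1479692706}}


def seconds_until_reset(status, endpoint='/search/tweets', now=None):
    """Seconds left before the current window resets (0 if it already has)."""
    reset = status[endpoint]['reset']  # Unix epoch seconds, UTC
    if now is None:
        now = time.time()
    return max(0, reset - now)


def reset_time_utc(status, endpoint='/search/tweets'):
    """The reset moment as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(status[endpoint]['reset'], tz=timezone.utc)


print(reset_time_utc(search_status))
print(seconds_until_reset(search_status, now=1479692706 - 300))  # 300
```

During a long download, when remaining reaches 0 you could pass seconds_until_reset(search_status) to time.sleep() before making the next call.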
For more notebooks, as well as additional Python and Big Data tutorials, please visit http://social-metrics.org or follow me on Twitter @gregorysaxton.