Exploring White House Visitor Data

In this project I will provide a brief analysis of 2015 White House Visitor Access Records. I will focus on the date and time data to answer the following questions:

How many visitors did the White house have per month?

What is the earliest and latest time of appointments?

What are the minimum and maximum length of appointments?

Why is this information important and what else can be analyzed from this data?

To accomplish this I will do the following:

  1. Explore the Data
  2. Clean the Data
  3. Analyze the Data
  4. Make Conclusions from my analysis

1. Explore the Data

In [1]:
from csv import reader
opened_file = open('potus_2015.csv')
read_file = reader(opened_file)
potus = list(read_file)
potus_h = potus[0]
potus = potus[1:]
In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(potus_h)
print('\n')
explore_data(potus, 0, 5, True)
['name', 'appt_made_date', 'appt_start_date', 'appt_end_date', 'visitee_namelast', 'visitee_namefirst', 'meeting_room', 'description']


['Joshua T. Blanton', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard']


['Jack T. Gutting', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard']


['Bradley T. Guiles', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard']


['Loryn F. Grieb', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard']


['Travis D. Gordon', '2014-12-18T00:00:00', '1/6/15 9:30', '1/6/15 23:59', '', 'potus', 'west wing', 'JointService Military Honor Guard']


Number of rows: 47954
Number of columns: 8

2. Clean the Data

In [3]:
def process_time(start):
    if ':' not in start:
        start_2 = start + ' 12:00'
        start = start_2
    else:
        return start
    return start

for row in potus:
    start = row[2]
    start = process_time(start)
    row[2] = start

start = []
for row in potus:
    start.append(row[2])
    
print(start[:5])
['1/6/15 9:30', '1/6/15 9:30', '1/6/15 9:30', '1/6/15 9:30', '1/6/15 9:30']
In [4]:
def process_time(end):
    if ':' not in end:
        end_2 = end + ' 12:00'
        end = end_2
    else:
        return end
    return end

for row in potus:
    end = row[3]
    end = process_time(end)
    row[3] = end

end = []
for row in potus:
    end.append(row[3])
    
print(end[:5])
['1/6/15 23:59', '1/6/15 23:59', '1/6/15 23:59', '1/6/15 23:59', '1/6/15 23:59']
In [5]:
for row in potus:
    start = row[2]
    start = start.replace("2015-10-01","10/1/2015")
    start = start.replace(":00:00",":00")
    row[2] = start
    end = row[3]
    end = end.replace("2015-10-01","10/1/2015")
    end = end.replace(":00:00",":00")
    row[3] = end
In [6]:
import datetime as dt
In [7]:
date_format = "%m/%d/%y %H:%M"
for row in potus:
    appt_start_date = row[2]
    appt_start_date = dt.datetime.strptime(appt_start_date, date_format)
    row[2] = appt_start_date
print(type(appt_start_date))

date_format = "%m/%d/%y %H:%M"
for row in potus:
    appt_end_date = row[3]
    appt_end_date = dt.datetime.strptime(appt_end_date, date_format)
    row[3] = appt_end_date
print(type(appt_end_date))
<class 'datetime.datetime'>
<class 'datetime.datetime'>

3. Analyze the Data

How many visitors did the White house have per month?

In [8]:
visitors_per_month = {}
for row in potus:
    appt_start_date_obj = row[2]
    appt_start_date_str = appt_start_date_obj.strftime("%B, %y")
    if appt_start_date_str not in visitors_per_month:
         visitors_per_month[appt_start_date_str] = 1
    else:
        visitors_per_month[appt_start_date_str] += 1
print(visitors_per_month)
{'January, 15': 1248, 'February, 15': 2165, 'March, 15': 2262, 'April, 15': 4996, 'May, 15': 3013, 'June, 15': 7743, 'July, 15': 2930, 'August, 15': 1350, 'September, 15': 4416, 'October, 15': 3669, 'November, 15': 1133, 'December, 15': 13029}
In [9]:
appt_times = []
for row in potus:
    appt_dt = row[2]
    app_t = appt_dt.time()
    appt_times.append(app_t)
print(appt_times[:3])
[datetime.time(9, 30), datetime.time(9, 30), datetime.time(9, 30)]

What is the earliest and latest time of appointments?

In [10]:
min_time = min(appt_times)
print(min_time)
max_time = max(appt_times)
print(max_time)
06:00:00
21:30:00

What are the minimum and maximum length of appointments?

In [11]:
for row in potus:
    start_date = str(row[2])
    start_date = dt.datetime.strptime(start_date, "%m/%d/%y %H:%M")
    row[2] = start_date
    
for row in potus:
    end_date = str(row[3])
    end_date = dt.datetime.strptime(end_date, "%m/%d/%y %H:%M")
    row[3] = end_date

appt_lengths = {}

for row in potus:
    start_date = row[2]
    end_date = row[3]
    length = end_date - start_date
    if length not in appt_lengths:
        appt_lengths[length] = 1
    else:
        appt_lengths[length] += 1

min_length = min(appt_lengths)
max_length = max(appt_lengths)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-5fa6c4ae9aae> in <module>
      1 for row in potus:
      2     start_date = str(row[2])
----> 3     start_date = dt.datetime.strptime(start_date, "%m/%d/%y %H:%M")
      4     row[2] = start_date
      5 

~/anaconda3/lib/python3.7/_strptime.py in _strptime_datetime(cls, data_string, format)
    575     """Return a class cls instance based on the input string and the
    576     format string."""
--> 577     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    578     tzname, gmtoff = tt[-2:]
    579     args = tt[:6] + (fraction,)

~/anaconda3/lib/python3.7/_strptime.py in _strptime(data_string, format)
    357     if not found:
    358         raise ValueError("time data %r does not match format %r" %
--> 359                          (data_string, format))
    360     if len(data_string) != found.end():
    361         raise ValueError("unconverted data remains: %s" %

ValueError: time data '2015-01-06 09:30:00' does not match format '%m/%d/%y %H:%M'

4. Conclusion