Exploring Hacker News Posts

This project explores data from Hacker News, a site started by the Y Combinator startup incubator. On Hacker News, people post tech-related submissions and receive feedback in the form of votes and comments. This analysis focuses on two types of posts: Ask HN and Show HN. It identifies which type receives more comments on average and examines how the hour a post is created relates to the number of comments it receives.

Initialization

Convert the raw Hacker News .csv file into a list of lists. Print the first five rows, including the header row.

In [1]:
from csv import reader

# Read the whole file into a list of rows; the with block closes the file.
with open("hacker_news.csv") as f:
    hn = list(reader(f))

for row in hn[0:5]:
    print(row,"\n")
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'] 

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'] 

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'] 

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'] 

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'] 

Display column headers.

In [2]:
headers = hn[0]
print(headers)
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

Remove the column headers from the dataset.

In [3]:
hn = hn[1:]
print(hn[0:5])
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
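The loading and header-splitting steps above can be sketched as one small helper. This is a minimal sketch, not the notebook's own code: the name load_rows is hypothetical, and an in-memory string holding two rows copied from the output above stands in for hacker_news.csv so the example runs without the file.

```python
import csv
import io

# In-memory stand-in for hacker_news.csv (header plus two rows from the output above).
SAMPLE_CSV = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12\n"
)


def load_rows(handle):
    """Return (headers, rows) from an open CSV file handle (hypothetical helper)."""
    rows = list(csv.reader(handle))
    return rows[0], rows[1:]


with io.StringIO(SAMPLE_CSV) as handle:
    sample_headers, sample_rows = load_rows(handle)

print(sample_headers)
print(len(sample_rows))
```

With the real file, the same helper could be called inside a with open("hacker_news.csv") block instead of the StringIO wrapper.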

Data Filtration

Separate posts whose titles start with Ask HN or Show HN from all other posts, matching case-insensitively. Store them in the ask_posts, show_posts, and other_posts lists.

In [5]:
ask_posts = []
show_posts = []
other_posts = []

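for row in hn:
    # Lowercase a copy of the title so the match is case-insensitive
    # without overwriting the original title in the dataset.
    title = row[1].lower()

    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)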
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
1744
1162
17194
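The prefix check used for categorization can be isolated into a small function, which makes it easy to try on individual titles. This is only a sketch: classify_title is a hypothetical name, and the first two sample titles are invented for illustration rather than taken from the dataset.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()  # case-insensitive match, original title untouched
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles, for illustration only.
sample_titles = [
    "Ask HN: How do you learn a new codebase?",
    "SHOW HN: A tiny static site generator",
    "Interactive Dynamic Video",
]
labels = [classify_title(title) for title in sample_titles]
print(labels)
```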

Data Analysis

Compare the average number of comments on ask posts and show posts, using comments as the measure of popularity.

In [7]:
def avg_finder(list_n):
    total_comments = 0
    for row in list_n:
        total_comments += int(row[4])
        
    return total_comments / len(list_n)

ask_avg = avg_finder(ask_posts)
show_avg = avg_finder(show_posts)

print(ask_avg)
print(show_avg)
14.038417431192661
10.31669535283993

On average, ask posts receive about 3.7 more comments than show posts (14.04 versus 10.32).
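The averaging step can also be written with sum() and a guard for an empty list, which the loop version would turn into a division by zero. This is a minimal sketch under assumptions: average_comments and its comments_index parameter are hypothetical names, and the three sample rows are invented. The last lines recompute the gap between the two averages reported above.

```python
def average_comments(rows, comments_index=4):
    """Average the comment counts in rows; return 0.0 for an empty list."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Hypothetical rows in the dataset's column order; only num_comments (index 4) matters.
sample_rows = [
    ["1", "ask hn: a", "", "5", "10", "user_a", "1/1/2016 9:00"],
    ["2", "ask hn: b", "", "3", "20", "user_b", "1/1/2016 10:00"],
    ["3", "ask hn: c", "", "1", "0", "user_c", "1/1/2016 11:00"],
]
sample_avg = average_comments(sample_rows)
print(sample_avg)

# Gap between the two averages printed in the notebook output.
gap = round(14.038417431192661 - 10.31669535283993, 2)
print(gap)
```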

Determine whether ask posts created at certain hours of the day receive more comments on average.

In [20]:
import datetime as dt

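# Despite its name, ask_hour_freq holds total comments per hour,
# which the second loop below converts to averages.
ask_hour_freq = {}
# ask_hour_num counts the ask posts created in each hour.
ask_hour_num = {}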
for row in ask_posts:
    dt_obj = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
    hour = dt_obj.hour
    
    if hour in ask_hour_freq:
        ask_hour_freq[hour] += int(row[4])
        ask_hour_num[hour] += 1
    else:
        ask_hour_freq[hour] = int(row[4])
        ask_hour_num[hour] = 1
        
for key in ask_hour_freq:
    ask_hour_freq[key] = ask_hour_freq[key] / ask_hour_num[key]
    
print(ask_hour_freq)
{9: 5.5777777777777775, 13: 14.741176470588234, 10: 13.440677966101696, 14: 13.233644859813085, 16: 16.796296296296298, 23: 7.985294117647059, 12: 9.41095890410959, 17: 11.46, 15: 38.5948275862069, 21: 16.009174311926607, 20: 21.525, 2: 23.810344827586206, 18: 13.20183486238532, 3: 7.796296296296297, 5: 10.08695652173913, 19: 10.8, 1: 11.383333333333333, 22: 6.746478873239437, 8: 10.25, 4: 7.170212765957447, 0: 8.127272727272727, 6: 9.022727272727273, 7: 7.852941176470588, 11: 11.051724137931034}

Sort the table of average comments per hour in descending order.

In [21]:
ask_hour_list = []
for key in ask_hour_freq:
    ask_hour_list.append((ask_hour_freq[key], key))
    
ask_hour_list = sorted(ask_hour_list, reverse = True)
for entry in ask_hour_list:
    print(entry[1], ":", entry[0])
15 : 38.5948275862069
2 : 23.810344827586206
20 : 21.525
16 : 16.796296296296298
21 : 16.009174311926607
13 : 14.741176470588234
10 : 13.440677966101696
14 : 13.233644859813085
18 : 13.20183486238532
17 : 11.46
1 : 11.383333333333333
11 : 11.051724137931034
19 : 10.8
8 : 10.25
5 : 10.08695652173913
12 : 9.41095890410959
6 : 9.022727272727273
0 : 8.127272727272727
23 : 7.985294117647059
7 : 7.852941176470588
3 : 7.796296296296297
4 : 7.170212765957447
22 : 6.746478873239437
9 : 5.5777777777777775
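The grouping and sorting steps above could also be expressed with collections.defaultdict and a sort key, which avoids the if/else branch and the swapped (value, key) tuples. This is an alternative sketch rather than the notebook's method, and the sample pairs are hypothetical values in the dataset's timestamp format.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs, for illustration only.
sample = [
    ("8/4/2016 11:52", 52),
    ("1/26/2016 19:30", 10),
    ("6/23/2016 19:05", 4),
    ("9/30/2015 4:12", 2),
]

totals = defaultdict(int)  # total comments per hour
counts = defaultdict(int)  # number of posts per hour
for created_at, num_comments in sample:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    totals[hour] += num_comments
    counts[hour] += 1

# Average comments per post for each hour, then rank hours by that average.
averages = {hour: totals[hour] / counts[hour] for hour in totals}
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

for hour, avg in ranked:
    print("{:02d}:00 {:.2f}".format(hour, avg))
```

Sorting on item[1] keeps each entry as (hour, average), so later code can unpack it in the natural order.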

Identify the five hours with the highest average number of comments per Ask HN post.

In [23]:
for entry in ask_hour_list[:5]:
    hour = dt.time(entry[1])
    hour_string = hour.strftime("%H")
    print("{}:00: {:.2f} average comments per post".format(hour_string, entry[0]))
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post

The timestamps in the dataset are in US Eastern Time. Convert the top hours to Denver time.

List all time zones to find the identifier string for Denver.

In [24]:
import pytz
for timezone in pytz.all_timezones_set:
    print(timezone)
America/Whitehorse
WET
Asia/Tbilisi
Greenwich
America/Indiana/Winamac
Iceland
Antarctica/Vostok
Asia/Choibalsan
Europe/Uzhgorod
Antarctica/Palmer
Africa/Bamako
America/Argentina/Rio_Gallegos
Atlantic/Azores
Asia/Kabul
America/Iqaluit
Africa/Johannesburg
America/Kralendijk
Atlantic/Cape_Verde
Turkey
Africa/Malabo
America/Punta_Arenas
Africa/Kinshasa
Asia/Bangkok
Asia/Kuching
Etc/GMT+7
Pacific/Nauru
Atlantic/Jan_Mayen
Asia/Aqtobe
America/La_Paz
Chile/EasterIsland
Asia/Ashgabat
GMT
Kwajalein
America/Grenada
Australia/West
Africa/Addis_Ababa
Etc/GMT+9
Eire
US/Indiana-Starke
Africa/Asmara
PST8PDT
Pacific/Pago_Pago
Asia/Dushanbe
America/Toronto
Atlantic/Bermuda
Africa/Porto-Novo
Etc/GMT-6
America/Swift_Current
America/Matamoros
America/Fort_Nelson
Asia/Tokyo
Europe/Vienna
Asia/Nicosia
America/Cuiaba
Europe/Guernsey
America/Guadeloupe
Asia/Vientiane
Africa/Dar_es_Salaam
Asia/Dili
Europe/Bratislava
America/Indiana/Vincennes
Canada/Saskatchewan
Asia/Anadyr
Pacific/Wake
Poland
America/Yakutat
America/Jamaica
Europe/Isle_of_Man
America/Coral_Harbour
Africa/Banjul
America/Mazatlan
Pacific/Kosrae
Asia/Jakarta
Europe/Simferopol
Africa/El_Aaiun
Asia/Harbin
Asia/Saigon
Indian/Mayotte
Asia/Qatar
America/Nassau
Etc/GMT+1
Etc/UTC
America/Mendoza
Atlantic/Faeroe
Pacific/Wallis
America/Thule
Africa/Luanda
Africa/Djibouti
America/St_Lucia
ROK
Asia/Kolkata
America/Regina
Africa/Freetown
Pacific/Guam
Asia/Kuwait
Asia/Barnaul
Etc/Universal
Pacific/Noumea
Pacific/Ponape
Africa/Conakry
America/Belem
America/Denver
Asia/Yakutsk
Asia/Ulaanbaatar
Asia/Aden
Navajo
America/Argentina/Buenos_Aires
Antarctica/Mawson
Asia/Pontianak
EET
America/Nipigon
America/Danmarkshavn
Canada/Atlantic
Asia/Katmandu
Etc/GMT-9
America/Argentina/Catamarca
Europe/Sarajevo
America/Buenos_Aires
America/St_Barthelemy
Indian/Mahe
NZ
America/Montevideo
Pacific/Saipan
Asia/Urumqi
America/Bahia_Banderas
Australia/Adelaide
US/East-Indiana
US/Aleutian
Pacific/Majuro
Asia/Hebron
Arctic/Longyearbyen
Asia/Beirut
Pacific/Funafuti
Africa/Ouagadougou
Asia/Baku
Asia/Singapore
Australia/Melbourne
Africa/Tripoli
Pacific/Bougainville
America/Jujuy
Europe/Moscow
US/Hawaii
America/Argentina/Mendoza
CET
Pacific/Tongatapu
America/Rio_Branco
Europe/Belfast
America/Indianapolis
America/Havana
Asia/Chita
Etc/GMT-14
America/Tegucigalpa
America/Bahia
America/Araguaina
Etc/UCT
America/Montreal
Europe/Gibraltar
Asia/Tashkent
America/Pangnirtung
America/Lower_Princes
Europe/Budapest
Antarctica/Syowa
Indian/Mauritius
Africa/Maputo
Europe/Zagreb
Pacific/Samoa
US/Eastern
America/Cambridge_Bay
Etc/GMT+4
Africa/Nairobi
Africa/Timbuktu
Africa/Lagos
Africa/Lubumbashi
Asia/Rangoon
America/Nome
Africa/Ceuta
Atlantic/Canary
Pacific/Apia
Europe/London
Indian/Antananarivo
Australia/Queensland
Antarctica/McMurdo
Canada/Newfoundland
Asia/Thimbu
Asia/Ashkhabad
America/Argentina/Salta
Asia/Novosibirsk
Pacific/Chatham
America/Inuvik
Australia/Sydney
HST
America/Edmonton
Europe/Tallinn
Pacific/Yap
Etc/GMT+6
America/Nuuk
Antarctica/Rothera
Atlantic/Madeira
America/Antigua
Europe/Zurich
Pacific/Norfolk
US/Alaska
Brazil/East
America/Moncton
CST6CDT
Pacific/Rarotonga
Pacific/Chuuk
America/Asuncion
Europe/Ulyanovsk
Asia/Sakhalin
Asia/Macau
Africa/Khartoum
America/Cancun
Antarctica/DumontDUrville
Antarctica/Macquarie
Africa/Algiers
MST7MDT
America/Noronha
Asia/Riyadh
Europe/Kirov
Africa/Bissau
Asia/Ho_Chi_Minh
Etc/GMT-11
America/Barbados
America/Thunder_Bay
GMT0
America/Grand_Turk
Asia/Gaza
Asia/Calcutta
Europe/Kaliningrad
Universal
Australia/South
America/Santarem
America/Paramaribo
Canada/Central
Japan
Asia/Dhaka
Europe/Copenhagen
America/Ensenada
America/Anguilla
Europe/Tirane
Etc/Greenwich
Asia/Amman
America/Sitka
America/Shiprock
Europe/Kiev
America/Catamarca
Asia/Tel_Aviv
Europe/Warsaw
America/Guayaquil
Cuba
America/St_Vincent
America/St_Kitts
Australia/LHI
Africa/Gaborone
Africa/Lome
America/Creston
Canada/Yukon
America/Sao_Paulo
America/Managua
America/Tortola
Africa/Accra
Europe/Dublin
Europe/Volgograd
W-SU
Asia/Istanbul
Asia/Hong_Kong
Pacific/Fiji
Chile/Continental
GB-Eire
Indian/Kerguelen
Pacific/Niue
Africa/Blantyre
Europe/Belgrade
America/Porto_Acre
US/Samoa
Africa/Brazzaville
Australia/Currie
America/Fortaleza
America/Santiago
America/Phoenix
Europe/Mariehamn
UTC
Pacific/Easter
Pacific/Marquesas
America/Halifax
Europe/Stockholm
Canada/Pacific
America/Fort_Wayne
America/Rankin_Inlet
Asia/Jerusalem
Europe/Madrid
America/Argentina/San_Luis
Africa/Ndjamena
Brazil/West
Europe/Vatican
America/Blanc-Sablon
Australia/NSW
Africa/Mbabane
America/Cayman
Etc/GMT-13
America/Indiana/Tell_City
Asia/Qostanay
America/Curacao
America/Glace_Bay
America/Argentina/Ushuaia
US/Pacific
America/Eirunepe
Atlantic/Reykjavik
NZ-CHAT
Asia/Kathmandu
Australia/Lord_Howe
America/Rosario
Asia/Seoul
America/Guyana
America/Manaus
Asia/Jayapura
Etc/Zulu
Asia/Atyrau
Indian/Chagos
Europe/San_Marino
Africa/Juba
Africa/Maseru
Pacific/Midway
Antarctica/Casey
Canada/Mountain
Europe/Athens
ROC
America/Martinique
Indian/Reunion
Asia/Magadan
Portugal
Europe/Saratov
Asia/Chongqing
America/Argentina/ComodRivadavia
Africa/Libreville
Asia/Aqtau
Europe/Bucharest
Asia/Bahrain
Europe/Luxembourg
Australia/Lindeman
Asia/Pyongyang
Etc/GMT
Europe/Nicosia
Pacific/Tarawa
Etc/GMT-4
Europe/Amsterdam
America/Argentina/La_Rioja
MET
Africa/Tunis
Asia/Ust-Nera
America/Miquelon
Asia/Shanghai
Asia/Almaty
Asia/Tehran
Libya
Etc/GMT-2
America/Belize
America/Puerto_Rico
Pacific/Gambier
Asia/Vladivostok
America/Metlakatla
Europe/Astrakhan
Asia/Irkutsk
Asia/Yekaterinburg
Australia/Darwin
America/Atikokan
Pacific/Enderbury
Asia/Dacca
Pacific/Fakaofo
America/Atka
Asia/Makassar
Pacific/Guadalcanal
Asia/Phnom_Penh
Asia/Ulan_Bator
Asia/Yerevan
America/Mexico_City
Europe/Zaporozhye
Europe/Andorra
Pacific/Auckland
Europe/Brussels
Asia/Macao
Asia/Colombo
Etc/GMT+2
Pacific/Johnston
Asia/Hovd
America/Scoresbysund
Egypt
Europe/Istanbul
Europe/Podgorica
Europe/Ljubljana
America/Dominica
Hongkong
America/Vancouver
America/Aruba
America/Resolute
Mexico/BajaSur
Asia/Samarkand
Europe/Chisinau
Asia/Brunei
US/Mountain
Asia/Qyzylorda
America/Santa_Isabel
America/Campo_Grande
Pacific/Pitcairn
America/Virgin
Etc/GMT+8
Asia/Taipei
America/Indiana/Vevay
Iran
Asia/Novokuznetsk
Zulu
America/Juneau
America/Boa_Vista
America/Montserrat
Asia/Tomsk
Europe/Helsinki
Europe/Jersey
Asia/Famagusta
America/Port-au-Prince
Australia/Perth
Singapore
Pacific/Port_Moresby
Asia/Kamchatka
Mexico/General
America/Ojinaga
America/El_Salvador
Europe/Busingen
Europe/Samara
Etc/GMT-7
America/North_Dakota/Beulah
America/Goose_Bay
Australia/Eucla
Indian/Cocos
PRC
Antarctica/Troll
America/Detroit
Asia/Ujung_Pandang
Asia/Dubai
MST
Australia/Hobart
Etc/GMT+0
Africa/Dakar
America/Argentina/Tucuman
Brazil/Acre
America/Marigot
America/North_Dakota/Center
Pacific/Truk
Asia/Yangon
Australia/Yancowinna
Etc/GMT-10
Africa/Sao_Tome
Australia/Canberra
Pacific/Honolulu
UCT
Europe/Prague
America/North_Dakota/New_Salem
Australia/North
GB
Asia/Karachi
Africa/Lusaka
America/St_Thomas
America/Chihuahua
Indian/Maldives
Africa/Cairo
America/Costa_Rica
Africa/Kampala
America/Chicago
Israel
Africa/Monrovia
America/Dawson
America/Godthab
America/Cayenne
America/Los_Angeles
Asia/Muscat
Asia/Khandyga
Asia/Bishkek
Atlantic/St_Helena
Etc/GMT-0
Europe/Berlin
America/Dawson_Creek
Africa/Mogadishu
Europe/Paris
Atlantic/Stanley
America/Winnipeg
America/Menominee
Europe/Monaco
America/Indiana/Knox
Asia/Oral
Pacific/Kwajalein
Pacific/Pohnpei
Etc/GMT+11
Africa/Abidjan
Mexico/BajaNorte
Africa/Windhoek
Etc/GMT-8
Europe/Tiraspol
America/Boise
America/Porto_Velho
Antarctica/Davis
America/Maceio
Europe/Vaduz
Etc/GMT-1
EST5EDT
Pacific/Kiritimati
US/Arizona
Atlantic/South_Georgia
Etc/GMT+12
America/Argentina/Cordoba
America/Cordoba
Europe/Minsk
America/Recife
GMT+0
Jamaica
America/Bogota
Brazil/DeNoronha
Asia/Manila
America/Indiana/Petersburg
Africa/Douala
America/Argentina/Jujuy
Etc/GMT+10
Etc/GMT-12
Asia/Thimphu
Canada/Eastern
US/Michigan
Africa/Bujumbura
Atlantic/Faroe
Europe/Lisbon
Asia/Baghdad
Australia/Victoria
Europe/Malta
Asia/Kashgar
America/Monterrey
America/Yellowknife
Asia/Chungking
Etc/GMT-3
America/Kentucky/Monticello
America/Merida
Europe/Oslo
Pacific/Galapagos
Europe/Riga
Africa/Bangui
Africa/Asmera
Etc/GMT-5
EST
Africa/Niamey
America/Kentucky/Louisville
America/Indiana/Indianapolis
Asia/Kuala_Lumpur
America/Knox_IN
America/Louisville
Asia/Omsk
America/Adak
Europe/Vilnius
America/New_York
US/Central
America/Rainy_River
Indian/Comoro
America/Santo_Domingo
Asia/Krasnoyarsk
Etc/GMT+5
America/Caracas
America/Lima
Australia/Brisbane
America/Hermosillo
America/Port_of_Spain
Africa/Nouakchott
Europe/Sofia
Asia/Damascus
Africa/Casablanca
Antarctica/South_Pole
Pacific/Tahiti
America/Panama
America/Argentina/San_Juan
Africa/Harare
GMT-0
America/Indiana/Marengo
Indian/Christmas
Africa/Kigali
Etc/GMT+3
Etc/GMT0
Australia/ACT
Pacific/Efate
Asia/Srednekolymsk
Australia/Broken_Hill
Europe/Rome
America/Guatemala
America/Anchorage
America/Tijuana
Europe/Skopje
Pacific/Palau
Australia/Tasmania
America/St_Johns

Find the time difference between Denver and the East Coast of the United States.

In [34]:
d_tz_str = "America/Denver"
ny_tz_str = "America/New_York"

d_tz_obj = pytz.timezone(d_tz_str)
ny_tz_obj = pytz.timezone(ny_tz_str)

d_dt_obj = dt.datetime.now(d_tz_obj)
ny_dt_obj = dt.datetime.now(ny_tz_obj)

print(d_dt_obj)
print(ny_dt_obj)
2022-06-25 16:19:24.000125-06:00
2022-06-25 18:19:24.000235-04:00

The East Coast of the United States is two hours ahead of Denver.

Adjust the top five posting hours by this two-hour gap.

In [41]:
for entry in ask_hour_list[:5]:
    # The date is an arbitrary placeholder; only the hour is used.
    hour = dt.datetime(2000, 1, 12, entry[1])
    hour = hour - dt.timedelta(hours=2)
    hour_string = hour.strftime("%H")
    print("{}:00: {:.2f} average comments per post".format(hour_string, entry[0]))
13:00: 38.59 average comments per post
00:00: 23.81 average comments per post
18:00: 21.52 average comments per post
14:00: 16.80 average comments per post
19:00: 16.01 average comments per post
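Because only the hour of day is needed, the same shift can be sketched with modular arithmetic instead of a placeholder date. The function name shift_hour is hypothetical, and the fixed offset of -2 assumes the two-hour gap found above holds for every post, which is an assumption rather than something checked against each timestamp.

```python
def shift_hour(hour, offset_hours):
    """Shift an hour of day by offset_hours, wrapping around midnight."""
    return (hour + offset_hours) % 24


# Top five Eastern Time hours from the ranking above.
eastern_hours = [15, 2, 20, 16, 21]
denver_hours = [shift_hour(hour, -2) for hour in eastern_hours]
print(denver_hours)
```

The modulo keeps the result in the 0 to 23 range, so an hour such as 1 in Eastern Time would map to 23 on the previous day in Denver.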

Conclusion

In Denver time, the best hours to create an Ask HN post, in descending order of average comments, are 1 p.m., 12 a.m., 6 p.m., 2 p.m., and 7 p.m.

Thank you for taking the time to look through my Hacker News Guided Project. Feedback is much appreciated!
