This project explores data from Hacker News, a site built by the Y Combinator startup incubator. On Hacker News, people post tech-related submissions and receive feedback in the form of votes and comments. This analysis focuses on two types of posts: Ask HN and Show HN. It identifies which of the two types is more popular, measured by the number of comments, and examines the relationship between time of day and post popularity.
Convert the raw Hacker News .csv file into a list of lists and print the first five rows.
from csv import reader
# Read every row of the CSV into a list of lists; the context manager closes the file.
with open("hacker_news.csv") as f:
    hn = list(reader(f))
for row in hn[0:5]:
    print(row, "\n")
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
Display column headers.
headers = hn[0]
print(headers)
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
Remove the column headers from the dataset.
hn = hn[1:]
print(hn[0:5])
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
Separate posts whose titles start with Ask HN or Show HN from all other posts by sorting every row into the ask_posts, show_posts, or other_posts list.
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    # Compare a lowercased copy so the match is case-insensitive and the stored title is unchanged.
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
1744
1162
17194
Compare the average number of comments on ask and show posts, using comments as the measure of popularity.
def avg_finder(list_n):
    # Sum the num_comments column (index 4) and divide by the number of posts.
    total_comments = 0
    for row in list_n:
        total_comments += int(row[4])
    return total_comments / len(list_n)
ask_avg = avg_finder(ask_posts)
show_avg = avg_finder(show_posts)
print(ask_avg)
print(show_avg)
14.038417431192661
10.31669535283993
On average, ask posts receive about 3.7 more comments than show posts (14.04 versus 10.32).
Determine whether ask posts created at certain hours of the day tend to receive more comments.
import datetime as dt
ask_hour_freq = {}  # hour -> total comments, later converted to the average per post
ask_hour_num = {}   # hour -> number of ask posts
for row in ask_posts:
    dt_obj = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
    hour = dt_obj.hour
    if hour in ask_hour_freq:
        ask_hour_freq[hour] += int(row[4])
        ask_hour_num[hour] += 1
    else:
        ask_hour_freq[hour] = int(row[4])
        ask_hour_num[hour] = 1
# Divide each hour's comment total by its post count to get the average.
for key in ask_hour_freq:
    ask_hour_freq[key] = ask_hour_freq[key] / ask_hour_num[key]
print(ask_hour_freq)
{9: 5.5777777777777775, 13: 14.741176470588234, 10: 13.440677966101696, 14: 13.233644859813085, 16: 16.796296296296298, 23: 7.985294117647059, 12: 9.41095890410959, 17: 11.46, 15: 38.5948275862069, 21: 16.009174311926607, 20: 21.525, 2: 23.810344827586206, 18: 13.20183486238532, 3: 7.796296296296297, 5: 10.08695652173913, 19: 10.8, 1: 11.383333333333333, 22: 6.746478873239437, 8: 10.25, 4: 7.170212765957447, 0: 8.127272727272727, 6: 9.022727272727273, 7: 7.852941176470588, 11: 11.051724137931034}
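The per-hour averaging above can also be written with `collections.defaultdict`, which removes the need for the `if`/`else` branch. The following is only a sketch: the function name `avg_comments_by_hour` and the three sample rows are hypothetical, made up for illustration in the same column layout as the dataset.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical sample rows (not from the dataset), in the same column layout:
# [id, title, url, num_points, num_comments, author, created_at]
sample_rows = [
    ["1", "ask hn: a", "", "5", "10", "u1", "8/4/2016 15:52"],
    ["2", "ask hn: b", "", "3", "20", "u2", "1/26/2016 15:05"],
    ["3", "ask hn: c", "", "1", "4", "u3", "6/23/2016 2:20"],
]

def avg_comments_by_hour(rows):
    """Return {hour: average number of comments} for the given rows."""
    totals = defaultdict(int)  # hour -> summed comments
    counts = defaultdict(int)  # hour -> number of posts
    for row in rows:
        hour = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M").hour
        totals[hour] += int(row[4])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

print(avg_comments_by_hour(sample_rows))
```

Keeping the totals and the averages in separate dictionaries also avoids reusing one name for two different quantities.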
Sort the hourly averages in descending order.
ask_hour_list = []
# Store (average, hour) tuples so that sorting orders them by average.
for key in ask_hour_freq:
    ask_hour_list.append((ask_hour_freq[key], key))
ask_hour_list = sorted(ask_hour_list, reverse=True)
for entry in ask_hour_list:
    print(entry[1], ":", entry[0])
15 : 38.5948275862069
2 : 23.810344827586206
20 : 21.525
16 : 16.796296296296298
21 : 16.009174311926607
13 : 14.741176470588234
10 : 13.440677966101696
14 : 13.233644859813085
18 : 13.20183486238532
17 : 11.46
1 : 11.383333333333333
11 : 11.051724137931034
19 : 10.8
8 : 10.25
5 : 10.08695652173913
12 : 9.41095890410959
6 : 9.022727272727273
0 : 8.127272727272727
23 : 7.985294117647059
7 : 7.852941176470588
3 : 7.796296296296297
4 : 7.170212765957447
22 : 6.746478873239437
9 : 5.5777777777777775
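As an alternative to building a list of swapped tuples, `sorted` accepts a `key` function, so the dictionary's items can be ranked directly. This is a minimal sketch: `rank_hours` is a hypothetical helper name, and `avg_by_hour` holds a hand-picked, rounded subset of the averages shown above.

```python
# Hypothetical subset of the hourly averages, rounded for readability.
avg_by_hour = {9: 5.58, 15: 38.59, 2: 23.81, 20: 21.525}

def rank_hours(averages):
    """Return (hour, average) pairs sorted by average, highest first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

for hour, avg in rank_hours(avg_by_hour):
    print(hour, ":", avg)
```

Here each entry comes back as `(hour, average)`, so the hour is at index 0 rather than index 1 as in `ask_hour_list`.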
Identify the five hours with the highest average number of comments per Ask HN post.
for entry in ask_hour_list[:5]:
    hour = dt.time(entry[1])
    hour_string = hour.strftime("%H")
    print("{}:00: {:.2f} average comments per post".format(hour_string, entry[0]))
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
The timestamps in the dataset are in the US Eastern time zone. Shift them to the Denver time zone.
List all available time zones to find the string that identifies Denver.
import pytz
for timezone in pytz.all_timezones_set:
    print(timezone)
America/Whitehorse WET Asia/Tbilisi Greenwich America/Indiana/Winamac Iceland Antarctica/Vostok Asia/Choibalsan Europe/Uzhgorod Antarctica/Palmer Africa/Bamako America/Argentina/Rio_Gallegos Atlantic/Azores Asia/Kabul America/Iqaluit Africa/Johannesburg America/Kralendijk Atlantic/Cape_Verde Turkey Africa/Malabo America/Punta_Arenas Africa/Kinshasa Asia/Bangkok Asia/Kuching Etc/GMT+7 Pacific/Nauru Atlantic/Jan_Mayen Asia/Aqtobe America/La_Paz Chile/EasterIsland Asia/Ashgabat GMT Kwajalein America/Grenada Australia/West Africa/Addis_Ababa Etc/GMT+9 Eire US/Indiana-Starke Africa/Asmara PST8PDT Pacific/Pago_Pago Asia/Dushanbe America/Toronto Atlantic/Bermuda Africa/Porto-Novo Etc/GMT-6 America/Swift_Current America/Matamoros America/Fort_Nelson Asia/Tokyo Europe/Vienna Asia/Nicosia America/Cuiaba Europe/Guernsey America/Guadeloupe Asia/Vientiane Africa/Dar_es_Salaam Asia/Dili Europe/Bratislava America/Indiana/Vincennes Canada/Saskatchewan Asia/Anadyr Pacific/Wake Poland America/Yakutat America/Jamaica Europe/Isle_of_Man America/Coral_Harbour Africa/Banjul America/Mazatlan Pacific/Kosrae Asia/Jakarta Europe/Simferopol Africa/El_Aaiun Asia/Harbin Asia/Saigon Indian/Mayotte Asia/Qatar America/Nassau Etc/GMT+1 Etc/UTC America/Mendoza Atlantic/Faeroe Pacific/Wallis America/Thule Africa/Luanda Africa/Djibouti America/St_Lucia ROK Asia/Kolkata America/Regina Africa/Freetown Pacific/Guam Asia/Kuwait Asia/Barnaul Etc/Universal Pacific/Noumea Pacific/Ponape Africa/Conakry America/Belem America/Denver Asia/Yakutsk Asia/Ulaanbaatar Asia/Aden Navajo America/Argentina/Buenos_Aires Antarctica/Mawson Asia/Pontianak EET America/Nipigon America/Danmarkshavn Canada/Atlantic Asia/Katmandu Etc/GMT-9 America/Argentina/Catamarca Europe/Sarajevo America/Buenos_Aires America/St_Barthelemy Indian/Mahe NZ America/Montevideo Pacific/Saipan Asia/Urumqi America/Bahia_Banderas Australia/Adelaide US/East-Indiana US/Aleutian Pacific/Majuro Asia/Hebron Arctic/Longyearbyen Asia/Beirut Pacific/Funafuti 
Africa/Ouagadougou Asia/Baku Asia/Singapore Australia/Melbourne Africa/Tripoli Pacific/Bougainville America/Jujuy Europe/Moscow US/Hawaii America/Argentina/Mendoza CET Pacific/Tongatapu America/Rio_Branco Europe/Belfast America/Indianapolis America/Havana Asia/Chita Etc/GMT-14 America/Tegucigalpa America/Bahia America/Araguaina Etc/UCT America/Montreal Europe/Gibraltar Asia/Tashkent America/Pangnirtung America/Lower_Princes Europe/Budapest Antarctica/Syowa Indian/Mauritius Africa/Maputo Europe/Zagreb Pacific/Samoa US/Eastern America/Cambridge_Bay Etc/GMT+4 Africa/Nairobi Africa/Timbuktu Africa/Lagos Africa/Lubumbashi Asia/Rangoon America/Nome Africa/Ceuta Atlantic/Canary Pacific/Apia Europe/London Indian/Antananarivo Australia/Queensland Antarctica/McMurdo Canada/Newfoundland Asia/Thimbu Asia/Ashkhabad America/Argentina/Salta Asia/Novosibirsk Pacific/Chatham America/Inuvik Australia/Sydney HST America/Edmonton Europe/Tallinn Pacific/Yap Etc/GMT+6 America/Nuuk Antarctica/Rothera Atlantic/Madeira America/Antigua Europe/Zurich Pacific/Norfolk US/Alaska Brazil/East America/Moncton CST6CDT Pacific/Rarotonga Pacific/Chuuk America/Asuncion Europe/Ulyanovsk Asia/Sakhalin Asia/Macau Africa/Khartoum America/Cancun Antarctica/DumontDUrville Antarctica/Macquarie Africa/Algiers MST7MDT America/Noronha Asia/Riyadh Europe/Kirov Africa/Bissau Asia/Ho_Chi_Minh Etc/GMT-11 America/Barbados America/Thunder_Bay GMT0 America/Grand_Turk Asia/Gaza Asia/Calcutta Europe/Kaliningrad Universal Australia/South America/Santarem America/Paramaribo Canada/Central Japan Asia/Dhaka Europe/Copenhagen America/Ensenada America/Anguilla Europe/Tirane Etc/Greenwich Asia/Amman America/Sitka America/Shiprock Europe/Kiev America/Catamarca Asia/Tel_Aviv Europe/Warsaw America/Guayaquil Cuba America/St_Vincent America/St_Kitts Australia/LHI Africa/Gaborone Africa/Lome America/Creston Canada/Yukon America/Sao_Paulo America/Managua America/Tortola Africa/Accra Europe/Dublin Europe/Volgograd W-SU Asia/Istanbul 
Asia/Hong_Kong Pacific/Fiji Chile/Continental GB-Eire Indian/Kerguelen Pacific/Niue Africa/Blantyre Europe/Belgrade America/Porto_Acre US/Samoa Africa/Brazzaville Australia/Currie America/Fortaleza America/Santiago America/Phoenix Europe/Mariehamn UTC Pacific/Easter Pacific/Marquesas America/Halifax Europe/Stockholm Canada/Pacific America/Fort_Wayne America/Rankin_Inlet Asia/Jerusalem Europe/Madrid America/Argentina/San_Luis Africa/Ndjamena Brazil/West Europe/Vatican America/Blanc-Sablon Australia/NSW Africa/Mbabane America/Cayman Etc/GMT-13 America/Indiana/Tell_City Asia/Qostanay America/Curacao America/Glace_Bay America/Argentina/Ushuaia US/Pacific America/Eirunepe Atlantic/Reykjavik NZ-CHAT Asia/Kathmandu Australia/Lord_Howe America/Rosario Asia/Seoul America/Guyana America/Manaus Asia/Jayapura Etc/Zulu Asia/Atyrau Indian/Chagos Europe/San_Marino Africa/Juba Africa/Maseru Pacific/Midway Antarctica/Casey Canada/Mountain Europe/Athens ROC America/Martinique Indian/Reunion Asia/Magadan Portugal Europe/Saratov Asia/Chongqing America/Argentina/ComodRivadavia Africa/Libreville Asia/Aqtau Europe/Bucharest Asia/Bahrain Europe/Luxembourg Australia/Lindeman Asia/Pyongyang Etc/GMT Europe/Nicosia Pacific/Tarawa Etc/GMT-4 Europe/Amsterdam America/Argentina/La_Rioja MET Africa/Tunis Asia/Ust-Nera America/Miquelon Asia/Shanghai Asia/Almaty Asia/Tehran Libya Etc/GMT-2 America/Belize America/Puerto_Rico Pacific/Gambier Asia/Vladivostok America/Metlakatla Europe/Astrakhan Asia/Irkutsk Asia/Yekaterinburg Australia/Darwin America/Atikokan Pacific/Enderbury Asia/Dacca Pacific/Fakaofo America/Atka Asia/Makassar Pacific/Guadalcanal Asia/Phnom_Penh Asia/Ulan_Bator Asia/Yerevan America/Mexico_City Europe/Zaporozhye Europe/Andorra Pacific/Auckland Europe/Brussels Asia/Macao Asia/Colombo Etc/GMT+2 Pacific/Johnston Asia/Hovd America/Scoresbysund Egypt Europe/Istanbul Europe/Podgorica Europe/Ljubljana America/Dominica Hongkong America/Vancouver America/Aruba America/Resolute Mexico/BajaSur 
Asia/Samarkand Europe/Chisinau Asia/Brunei US/Mountain Asia/Qyzylorda America/Santa_Isabel America/Campo_Grande Pacific/Pitcairn America/Virgin Etc/GMT+8 Asia/Taipei America/Indiana/Vevay Iran Asia/Novokuznetsk Zulu America/Juneau America/Boa_Vista America/Montserrat Asia/Tomsk Europe/Helsinki Europe/Jersey Asia/Famagusta America/Port-au-Prince Australia/Perth Singapore Pacific/Port_Moresby Asia/Kamchatka Mexico/General America/Ojinaga America/El_Salvador Europe/Busingen Europe/Samara Etc/GMT-7 America/North_Dakota/Beulah America/Goose_Bay Australia/Eucla Indian/Cocos PRC Antarctica/Troll America/Detroit Asia/Ujung_Pandang Asia/Dubai MST Australia/Hobart Etc/GMT+0 Africa/Dakar America/Argentina/Tucuman Brazil/Acre America/Marigot America/North_Dakota/Center Pacific/Truk Asia/Yangon Australia/Yancowinna Etc/GMT-10 Africa/Sao_Tome Australia/Canberra Pacific/Honolulu UCT Europe/Prague America/North_Dakota/New_Salem Australia/North GB Asia/Karachi Africa/Lusaka America/St_Thomas America/Chihuahua Indian/Maldives Africa/Cairo America/Costa_Rica Africa/Kampala America/Chicago Israel Africa/Monrovia America/Dawson America/Godthab America/Cayenne America/Los_Angeles Asia/Muscat Asia/Khandyga Asia/Bishkek Atlantic/St_Helena Etc/GMT-0 Europe/Berlin America/Dawson_Creek Africa/Mogadishu Europe/Paris Atlantic/Stanley America/Winnipeg America/Menominee Europe/Monaco America/Indiana/Knox Asia/Oral Pacific/Kwajalein Pacific/Pohnpei Etc/GMT+11 Africa/Abidjan Mexico/BajaNorte Africa/Windhoek Etc/GMT-8 Europe/Tiraspol America/Boise America/Porto_Velho Antarctica/Davis America/Maceio Europe/Vaduz Etc/GMT-1 EST5EDT Pacific/Kiritimati US/Arizona Atlantic/South_Georgia Etc/GMT+12 America/Argentina/Cordoba America/Cordoba Europe/Minsk America/Recife GMT+0 Jamaica America/Bogota Brazil/DeNoronha Asia/Manila America/Indiana/Petersburg Africa/Douala America/Argentina/Jujuy Etc/GMT+10 Etc/GMT-12 Asia/Thimphu Canada/Eastern US/Michigan Africa/Bujumbura Atlantic/Faroe Europe/Lisbon 
Asia/Baghdad Australia/Victoria Europe/Malta Asia/Kashgar America/Monterrey America/Yellowknife Asia/Chungking Etc/GMT-3 America/Kentucky/Monticello America/Merida Europe/Oslo Pacific/Galapagos Europe/Riga Africa/Bangui Africa/Asmera Etc/GMT-5 EST Africa/Niamey America/Kentucky/Louisville America/Indiana/Indianapolis Asia/Kuala_Lumpur America/Knox_IN America/Louisville Asia/Omsk America/Adak Europe/Vilnius America/New_York US/Central America/Rainy_River Indian/Comoro America/Santo_Domingo Asia/Krasnoyarsk Etc/GMT+5 America/Caracas America/Lima Australia/Brisbane America/Hermosillo America/Port_of_Spain Africa/Nouakchott Europe/Sofia Asia/Damascus Africa/Casablanca Antarctica/South_Pole Pacific/Tahiti America/Panama America/Argentina/San_Juan Africa/Harare GMT-0 America/Indiana/Marengo Indian/Christmas Africa/Kigali Etc/GMT+3 Etc/GMT0 Australia/ACT Pacific/Efate Asia/Srednekolymsk Australia/Broken_Hill Europe/Rome America/Guatemala America/Anchorage America/Tijuana Europe/Skopje Pacific/Palau Australia/Tasmania America/St_Johns
Find time zone difference between Denver and East Coast of the United States.
d_tz_str = "America/Denver"
ny_tz_str = "America/New_York"
d_tz_obj = pytz.timezone(d_tz_str)
ny_tz_obj = pytz.timezone(ny_tz_str)
d_dt_obj = dt.datetime.now(d_tz_obj)
ny_dt_obj = dt.datetime.now(ny_tz_obj)
print(d_dt_obj)
print(ny_dt_obj)
2022-06-25 16:19:24.000125-06:00
2022-06-25 18:19:24.000235-04:00
The East Coast of the United States is two hours ahead of Denver.
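The gap can also be computed rather than read off the printed timestamps. The sketch below is an assumption-laden alternative, not the notebook's method: it uses the standard-library `zoneinfo` module (Python 3.9+, and it needs time zone data available on the system) instead of `pytz`, and `hours_ahead` is a hypothetical helper name. The date is arbitrary, chosen to fall within the period the dataset covers.

```python
import datetime as dt
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

def hours_ahead(tz_a, tz_b, when):
    """Return how many hours tz_a's clock is ahead of tz_b's at the instant `when`."""
    offset_a = when.astimezone(ZoneInfo(tz_a)).utcoffset()
    offset_b = when.astimezone(ZoneInfo(tz_b)).utcoffset()
    # Dividing one timedelta by another yields a plain float.
    return (offset_a - offset_b) / dt.timedelta(hours=1)

# An arbitrary, timezone-aware instant.
moment = dt.datetime(2016, 8, 4, 12, 0, tzinfo=dt.timezone.utc)
print(hours_ahead("America/New_York", "America/Denver", moment))
```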
Shift the five best posting hours by this two-hour gap.
for entry in ask_hour_list[:5]:
    # The date is arbitrary; a full datetime is needed so that a timedelta can be subtracted.
    hour = dt.datetime(2000, 1, 12, entry[1])
    hour = hour - dt.timedelta(hours=2)
    hour_string = hour.strftime("%H")
    print("{}:00: {:.2f} average comments per post".format(hour_string, entry[0]))
13:00: 38.59 average comments per post
00:00: 23.81 average comments per post
18:00: 21.52 average comments per post
14:00: 16.80 average comments per post
19:00: 16.01 average comments per post
In Denver time, the best hours to post, in descending order of average comments, are 1 p.m., 12 a.m. (midnight), 6 p.m., 2 p.m., and 7 p.m.
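Instead of subtracting a hard-coded two hours, the same shift can be done by attaching the Eastern time zone to each hour and converting it. This is a sketch under assumptions: it uses the standard-library `zoneinfo` module (Python 3.9+) rather than `pytz`, `eastern_hour_to_denver` is a hypothetical helper, and the default date is arbitrary.

```python
import datetime as dt
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

EASTERN = ZoneInfo("America/New_York")
DENVER = ZoneInfo("America/Denver")

def eastern_hour_to_denver(hour, on_date=dt.date(2016, 8, 4)):
    """Convert an hour of the day in Eastern Time to the matching hour in Denver."""
    eastern = dt.datetime(on_date.year, on_date.month, on_date.day, hour, tzinfo=EASTERN)
    return eastern.astimezone(DENVER).hour

# The five top hours from the analysis, in Eastern Time.
for hour in [15, 2, 20, 16, 21]:
    print("{:02d}:00 Eastern -> {:02d}:00 Denver".format(hour, eastern_hour_to_denver(hour)))
```

Letting the library do the conversion keeps the result tied to the time zone rules instead of a manually observed offset.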
Thank you for taking the time to look through my Hacker News Guided Project. Feedback is much appreciated!