Hacker News Posts

In this project, we'll work with a data set of submissions to the popular technology site Hacker News.

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that reach the top of its listings can get hundreds of thousands of visitors as a result.

In [1]:
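from csv import reader
# newline='' is what the csv module documentation recommends for files read with reader()
opened_file = open('hacker_news.csv', newline='')
# The reader parses each line of the file into a list of column values
read_file = reader(opened_file)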
# Build the list from the csv reader, not the raw file object: iterating over
# opened_file yields whole lines as strings, whereas read_file yields each row
# as a list of fields, so row[1] is the title column
hn = list(read_file)
print(hn[:5])
[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
In [2]:
# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]

Next, let's separate the posts whose titles begin with "Ask HN" or "Show HN" (in any letter case) into two lists, collecting all remaining posts in a third list.

In [8]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]  # the title is the second column
    # Lowercase the title so "Ask HN", "ask hn", "ASK HN", etc. all match
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
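The difference between iterating over a file object and iterating over a `csv.reader` is easy to miss, and it decides whether `row[1]` is a title or a single character. The sketch below is a self-contained illustration, assuming a small made-up sample (`SAMPLE_CSV`) in place of `hacker_news.csv`; the helpers `load_rows` and `split_by_prefix` are hypothetical names for this example only. It parses the text with `csv.reader`, splits the rows by case-insensitive title prefix, and shows what indexing a raw line returns instead.

```python
import csv
import io

# Hypothetical sample rows standing in for hacker_news.csv; they are made up
# for illustration and are not taken from the real data set.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
1,Ask HN: How do you back up your laptop?,,5,3,alice,8/4/2016 11:52
2,Show HN: A tiny CSV viewer,http://example.com/,12,4,bob,8/4/2016 12:10
3,"Parsing CSV, the hard way",http://example.org/,7,1,carol,8/5/2016 9:00
4,ask hn: does a lowercase prefix count?,,1,0,dave,8/5/2016 10:30
"""


def load_rows(text):
    """Parse CSV text into (header, data_rows) using csv.reader."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def split_by_prefix(rows, title_index=1):
    """Split rows into Ask HN, Show HN and other posts, ignoring letter case."""
    ask, show, other = [], [], []
    for row in rows:
        title = row[title_index].lower()
        if title.startswith("ask hn"):
            ask.append(row)
        elif title.startswith("show hn"):
            show.append(row)
        else:
            other.append(row)
    return ask, show, other


header, rows = load_rows(SAMPLE_CSV)
ask, show, other = split_by_prefix(rows)
print(len(ask), len(show), len(other))  # 2 1 1
print(rows[2][1])  # Parsing CSV, the hard way

# Iterating over the raw text yields whole lines (strings), so index 1 is a
# single character (here the comma after the id), never the title column.
raw_lines = io.StringIO(SAMPLE_CSV).readlines()
print(repr(raw_lines[1][1]))  # ','
```

Because `csv.reader` honours quoted fields, a title such as "Parsing CSV, the hard way" stays in one column, which a plain `line.split(',')` would break apart.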