%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = pd.read_csv("articles_2017-10-01_2017-10-31.csv", index_col="id",
                   parse_dates=["published", "discovered"])
data.head()
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
df8cf9e5ed31a1bfb34c9b73d9e5bce5ab98b439 | https://www.economist.com/news/europe/21729855... | An unconstitutional vote on independence turns... | 2017-10-01 16:09:23.688 | 2017-10-01 00:00:00 | 125 | 0.508058 | 2017-10-01T21:22:11.563Z | 6 | 63 | 56 | The Economist | economist_com | 1299 | 10322 | 35.0 | True | 8312258.0 | 2017-10-02T17:09:04.000Z | 1825 |
ff73a23349976db8c3e5ae3d4a64242b156f14a7 | https://www.washingtonpost.com/sports/colleges... | Rourke leads Ohio to shootout win over Massach... | 2017-10-01 00:07:21.546 | 2017-10-01 00:00:07 | 0 | 0.000000 | 2017-10-01T06:23:10.290Z | 0 | 0 | 0 | The Washington Post | washingtonpost_com | 0 | 0 | NaN | False | NaN | NaN | 191 |
107efa77273acbbe8fd60c139306e7fcd09fdd98 | https://www.nytimes.com/2017/09/30/sports/base... | Luis Severino and the Yankees Are Straddling a... | 2017-10-01 00:01:10.412 | 2017-10-01 00:00:13 | 38 | 0.459418 | 2017-10-01T00:13:03.108Z | 5 | 4 | 29 | New York Times | nytimes_com | 0 | 780 | 125.0 | False | NaN | NaN | 120 |
5c4db3ff5ac83175ba202cb9327d7e7b52f79f46 | https://www.washingtonpost.com/sports/colleges... | Kentucky rebounds to outlast Eastern Michigan ... | 2017-10-01 00:07:22.846 | 2017-10-01 00:00:30 | 0 | 0.000000 | 2017-10-01T06:23:10.293Z | 0 | 0 | 0 | The Washington Post | washingtonpost_com | 0 | 0 | NaN | False | NaN | NaN | 191 |
faa6a6d39f269bf6a40af0f52f02a41cb5b38601 | https://www.washingtonpost.com/sports/mystics/... | USA Basketball hosts a Women in the Game seminar | 2017-10-01 00:07:22.215 | 2017-10-01 00:01:13 | 0 | 0.000000 | 2017-10-01T06:23:10.291Z | 0 | 0 | 0 | The Washington Post | washingtonpost_com | 0 | 0 | NaN | False | NaN | NaN | 191 |
The response score is a number between 0 and 50 that indicates the level of response to an article.
Perhaps in the future we may choose to include other factors, but for now we just include engagements on Facebook. The maximum score of 50 should be achieved by an article that does really well compared with others.
pd.options.display.float_format = '{:.2f}'.format
data.fb_engagements.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    158885.00
mean       1290.91
std        9886.87
min           0.00
50%          26.00
75%         263.00
90%        1656.00
95%        4629.00
99%       24130.32
99.5%     40363.06
99.9%    106861.46
max     1680741.00
Name: fb_engagements, dtype: float64
The maximum suggests at least one article with more than 1 million engagements; let's double check that.
data[data.fb_engagements > 1000000]
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
3d58d5fdd5b5649ac1e92f4853a80fe7d2b244d3 | http://www.foxnews.com/entertainment/2017/10/0... | Top CBS lawyer: No sympathy for Vegas vics, 'p... | 2017-10-02 16:54:13.613 | 2017-10-02 16:48:32 | 1680741 | 3545.41 | 2017-10-02T20:33:04.125Z | 469349 | 1031278 | 180114 | Fox News | foxnews_com | 0 | 2601 | 174.00 | True | 15888170.00 | 2017-10-02T20:15:01.000Z | 285 |
data.fb_engagements.mode()
0 0 dtype: int64
Going back to the engagement counts, we see the mean is 1,290, mode is zero, median is 26, 90th percentile is 1,656, 99th percentile is 24,130, and 99.5th percentile is 40,363. The standard deviation is 9,886, far higher than the mean, and the mean is roughly 50 times the median, so this is a heavily right-skewed distribution, not a normal one.
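To make the skew concrete, here is a minimal sketch on synthetic data (not the article dataset). It draws a log-normal sample and compares mean, median and standard deviation in the same way as above. The distribution parameters are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic, heavily right-skewed sample; the parameters are illustrative assumptions.
rng = np.random.default_rng(0)
sample = pd.Series(rng.lognormal(mean=3.0, sigma=2.5, size=100_000))

summary = {
    "mean": sample.mean(),
    "median": sample.median(),
    "std": sample.std(),
    "skew": sample.skew(),
}
print(summary)

# For a right-skewed distribution the mean sits well above the median.
mean_over_median = summary["mean"] / summary["median"]
print(f"mean / median = {mean_over_median:.1f}")
```

A large mean-to-median ratio like this is a hint that a log transform may be worth trying before scaling.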
Key publishers stats
data.groupby("publisher_id").agg({'url': 'count', 'fb_engagements': ['sum', 'median', 'mean']})
url | fb_engagements | |||
---|---|---|---|---|
count | sum | median | mean | |
publisher_id | ||||
anotherangryvoice_blogspot_co_uk | 38 | 72806 | 1434.50 | 1915.95 |
bbc_co_uk | 11973 | 10440781 | 40.00 | 872.03 |
breitbart_com | 2707 | 13276781 | 372.00 | 4904.61 |
brexitcentral_com | 53 | 39843 | 271.00 | 751.75 |
buzzfeed_com | 2010 | 6444963 | 153.50 | 3206.45 |
cnn_com | 3521 | 19965843 | 639.00 | 5670.50 |
dailymail_co_uk | 24900 | 18894463 | 26.00 | 758.81 |
economist_com | 517 | 183881 | 49.00 | 355.67 |
evolvepolitics_com | 74 | 226865 | 1288.00 | 3065.74 |
foxnews_com | 6844 | 23214169 | 42.00 | 3391.90 |
ft_com | 3792 | 330615 | 6.00 | 87.19 |
huffingtonpost_com | 11213 | 16768347 | 8.00 | 1495.44 |
independent_co_uk | 6578 | 10132679 | 42.00 | 1540.39 |
indy100_com | 512 | 463823 | 127.00 | 905.90 |
lemonde_fr | 3918 | 2323126 | 88.00 | 592.94 |
libdemvoice_org | 177 | 1880 | 6.00 | 10.62 |
mirror_co_uk | 10525 | 7532990 | 48.00 | 715.72 |
nbcnews_com | 2207 | 7893296 | 389.00 | 3576.48 |
newstatesman_com | 567 | 251219 | 23.00 | 443.07 |
npr_org | 2020 | 8157583 | 208.00 | 4038.41 |
nytimes_com | 5185 | 19339366 | 200.00 | 3729.87 |
order-order_com | 296 | 96614 | 148.00 | 326.40 |
propublica_org | 52 | 239626 | 495.00 | 4608.19 |
reuters_com | 6125 | 1939056 | 21.00 | 316.58 |
rt_com | 2678 | 2417004 | 217.00 | 902.54 |
skwawkbox_org | 127 | 108501 | 234.00 | 854.34 |
telegraph_co_uk | 7477 | 3406193 | 25.00 | 455.56 |
thecanary_co | 243 | 348388 | 806.00 | 1433.70 |
theguardian_com | 8680 | 11711135 | 145.00 | 1349.21 |
thetimes_co_uk | 8868 | 262582 | 1.00 | 29.61 |
washingtonpost_com | 24227 | 16152850 | 0.00 | 666.73 |
westmonster_com | 371 | 352208 | 28.00 | 949.35 |
yournewswire_com | 410 | 2117388 | 297.00 | 5164.36 |
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
non_zero_fb_enagagements = data.fb_engagements[data.fb_engagements > 0]
That's a bit better, but still way too clustered at the low end. Let's look at a log normal distribution.
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(12,4.5))
plt.hist(np.log(non_zero_fb_enagagements + median), bins=50)
plt.axvline(np.log(mean), linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(np.log(median), label=f'Median ({median:,.0f})', color='green')
plt.axvline(np.log(ninety), linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(np.log(ninetyfive), linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
plt.axvline(np.log(ninetynine), linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
eng = data.fb_engagements[(data.fb_engagements < 5000)]
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(15,7))
plt.hist(eng, bins=50)
plt.title("Article count by engagements")
plt.axvline(median, label=f'Median ({median:,.0f})', color='green')
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(ninety, linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(ninetyfive, linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
# plt.axvline(ninetynine, linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
# Series.clip_upper was removed in pandas 1.0; clip(upper=...) is the equivalent call.
log_engagements = (non_zero_fb_enagagements
    .clip(upper=data.fb_engagements.quantile(.999))
    .apply(lambda x: np.log(x + median))
)
log_engagements.describe()
count    128733.00
mean          5.07
std           1.78
min           3.30
25%           3.58
50%           4.48
75%           6.13
max          11.58
Name: fb_engagements, dtype: float64
Use standard feature scaling to bring that to a 1 to 50 range
def scale_log_engagements(engagements_logged):
    return np.ceil(
        50 * (engagements_logged - log_engagements.min()) / (log_engagements.max() - log_engagements.min())
    )
def scale_engagements(engagements):
    return scale_log_engagements(np.log(engagements + median))
scaled_non_zero_engagements = scale_log_engagements(log_engagements)
scaled_non_zero_engagements.describe()
count    128733.00
mean         11.18
std          10.78
min           0.00
25%           2.00
50%           8.00
75%          18.00
max          50.00
Name: fb_engagements, dtype: float64
# add in the zeros, as zero
scaled_engagements = pd.concat([scaled_non_zero_engagements, data.fb_engagements[data.fb_engagements == 0]])
proposed = pd.DataFrame({"fb_engagements": data.fb_engagements, "response_score": scaled_engagements})
proposed.response_score.plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x116568710>
Looks good to me, let's save that.
data["response_score"] = proposed.response_score
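The response scoring steps above can be collected into one function. This is a minimal sketch rather than the notebook's own code: the function name `response_score`, the `upper_quantile` parameter and the example counts are assumptions made for illustration. It does not guard against the degenerate case where all non-zero values are equal.

```python
import numpy as np
import pandas as pd

def response_score(engagements: pd.Series, upper_quantile: float = 0.999) -> pd.Series:
    """Scale engagement counts to 0-50 using a capped, median-shifted log."""
    median = engagements.median()
    limit = engagements.quantile(upper_quantile)
    non_zero = engagements[engagements > 0]
    # Cap outliers at the limit, then shift by the median before taking the log.
    logged = np.log(non_zero.clip(upper=limit) + median)
    span = logged.max() - logged.min()
    scaled = np.ceil(50 * (logged - logged.min()) / span)
    # Articles with no engagements are not in `scaled`; they score zero.
    return scaled.reindex(engagements.index, fill_value=0.0)

# Made-up engagement counts, for illustration only.
example = pd.Series([0, 1, 26, 263, 1656, 24130, 1680741])
scores = response_score(example)
print(scores)
```

Because the minimum of the logged values maps to exactly zero before `np.ceil`, the smallest non-zero engagement count also scores 0, which matches the `min 0.00` in the scaled summary above.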
The maximum of 50 points is awarded when the engagements exceed the 99.9th percentile, calculated on a rolling basis over the last month.
That is, where $limit$ is the 99.9th percentile of engagements calculated over the previous month, the response score for article $a$ is:
\begin{align}
basicScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ \log(\min(engagements_a,limit) + median(engagements)) & \text{if } engagements_a > 0 \end{cases} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{basicScore_a - \min(basicScore)}{\max(basicScore) - \min(basicScore)} & \text{if } engagements_a > 0 \end{cases} \\
\\
\text{The latter equation can be expanded to:} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a,limit) + median(engagements)) - \log(1 + median(engagements))} {\log(limit + median(engagements)) - \log(1 + median(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}
The aim of the promotion score is to indicate how important the article was to the publisher, by tracking where they chose to promote it. This is a number between 0 and 50 comprised of three parts: time as lead on the home page (up to 20 points), time on the front page (up to 15 points), and promotion on the Facebook brand page (up to 15 points).
The first two should be scaled by the popularity/reach of the home page, for which we use the alexa page rank as a proxy.
The last should be scaled by the popularity/reach of the brand page, for which we use the number of likes the brand page has.
data.mins_as_lead.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    158885.00
mean          9.21
std          92.90
min           0.00
50%           0.00
75%           0.00
90%           0.00
95%           0.00
99%         269.00
99.5%       584.00
99.9%      1199.12
max       11563.00
Name: mins_as_lead, dtype: float64
As expected, the vast majority of articles don't make it as lead. Let's explore how long publishers typically keep something as lead.
lead_articles = data[data.mins_as_lead > 0]
lead_articles.mins_as_lead.describe([0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count     4515.00
mean       324.11
std        449.13
min          4.00
25%         84.00
50%        174.00
75%        406.50
90%        834.00
95%       1074.00
99%       1648.30
99.5%     2187.97
99.9%     5453.86
max      11563.00
Name: mins_as_lead, dtype: float64
lead_articles.mins_as_lead.plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x111226fd0>
For lead, it's a significant thing for an article to be lead at all, so although we want to penalise articles that were lead for a very short time, mostly we want to score the maximum even if it wasn't lead for ages. So we'll give maximum points when something has been lead for an hour.
lead_articles.mins_as_lead.clip(upper=60).plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x11181e5f8>
We also want to scale this by the Alexa rank, such that the maximum score of 20 points goes to an article that was lead for an hour on the most popular site.
So let's explore the Alexa numbers.
alexa_ranks = data.groupby(by="publisher_id").alexa_rank.mean().sort_values()
alexa_ranks
publisher_id bbc_co_uk 96 cnn_com 105 nytimes_com 120 theguardian_com 142 buzzfeed_com 147 dailymail_co_uk 158 washingtonpost_com 191 huffingtonpost_com 215 foxnews_com 285 rt_com 365 telegraph_co_uk 370 independent_co_uk 386 reuters_com 497 npr_org 594 lemonde_fr 618 mirror_co_uk 706 nbcnews_com 826 breitbart_com 994 ft_com 1596 economist_com 1825 indy100_com 5014 thetimes_co_uk 6435 newstatesman_com 12769 thecanary_co 15686 propublica_org 16066 yournewswire_com 22568 order-order_com 32515 anotherangryvoice_blogspot_co_uk 77827 westmonster_com 97775 evolvepolitics_com 119412 skwawkbox_org 152475 libdemvoice_org 344992 brexitcentral_com 469149 Name: alexa_rank, dtype: int64
alexa_ranks.plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x111b9c9b0>
Let's try the simple option first: just divide the number of minutes as lead by the Alexa rank. What scale of numbers do we get then?
lead_proposal_1 = lead_articles.mins_as_lead.clip(upper=60) / lead_articles.alexa_rank
lead_proposal_1.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10e73bf28>
Looks like there's too much of a cluster around 0. Have we massively over penalised the publishers with a high alexa rank?
lead_proposal_1.groupby(data.publisher_id).mean().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x114699e80>
Yes. Let's try taking the log of the alexa rank and see if that looks better.
lead_proposal_2 = (lead_articles.mins_as_lead.clip(upper=60) / np.log(lead_articles.alexa_rank))
lead_proposal_2.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x112b434e0>
lead_proposal_2.groupby(data.publisher_id).describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
publisher_id | ||||||||
anotherangryvoice_blogspot_co_uk | 35.00 | 5.31 | 0.11 | 4.71 | 5.33 | 5.33 | 5.33 | 5.33 |
bbc_co_uk | 101.00 | 12.90 | 1.46 | 1.10 | 13.15 | 13.15 | 13.15 | 13.15 |
breitbart_com | 202.00 | 8.32 | 1.45 | 0.58 | 8.69 | 8.69 | 8.69 | 8.69 |
brexitcentral_com | 47.00 | 4.38 | 0.84 | 0.69 | 4.59 | 4.59 | 4.59 | 4.59 |
buzzfeed_com | 302.00 | 11.89 | 0.90 | 1.80 | 12.02 | 12.02 | 12.02 | 12.02 |
cnn_com | 198.00 | 12.26 | 2.14 | 0.86 | 12.89 | 12.89 | 12.89 | 12.89 |
dailymail_co_uk | 169.00 | 11.54 | 1.46 | 0.99 | 11.85 | 11.85 | 11.85 | 11.85 |
economist_com | 42.00 | 7.22 | 2.19 | 0.53 | 7.99 | 7.99 | 7.99 | 7.99 |
evolvepolitics_com | 27.00 | 5.13 | 0.02 | 5.05 | 5.13 | 5.13 | 5.13 | 5.13 |
foxnews_com | 115.00 | 10.58 | 0.33 | 7.08 | 10.61 | 10.61 | 10.61 | 10.61 |
ft_com | 103.00 | 7.55 | 1.75 | 0.54 | 8.14 | 8.14 | 8.14 | 8.14 |
huffingtonpost_com | 176.00 | 11.02 | 0.79 | 3.72 | 11.17 | 11.17 | 11.17 | 11.17 |
independent_co_uk | 135.00 | 9.83 | 1.12 | 0.84 | 10.07 | 10.07 | 10.07 | 10.07 |
indy100_com | 231.00 | 5.35 | 2.06 | 0.47 | 3.52 | 6.92 | 7.04 | 7.04 |
lemonde_fr | 189.00 | 8.46 | 2.19 | 0.62 | 9.34 | 9.34 | 9.34 | 9.34 |
libdemvoice_org | 142.00 | 4.65 | 0.35 | 1.18 | 4.71 | 4.71 | 4.71 | 4.71 |
mirror_co_uk | 327.00 | 8.62 | 1.52 | 0.61 | 9.15 | 9.15 | 9.15 | 9.15 |
nbcnews_com | 116.00 | 8.74 | 1.10 | 0.74 | 8.93 | 8.93 | 8.93 | 8.93 |
newstatesman_com | 76.00 | 6.02 | 1.12 | 1.06 | 6.35 | 6.35 | 6.35 | 6.35 |
npr_org | 159.00 | 9.08 | 1.11 | 3.13 | 9.39 | 9.39 | 9.39 | 9.39 |
nytimes_com | 54.00 | 12.53 | 0.00 | 12.53 | 12.53 | 12.53 | 12.53 | 12.53 |
order-order_com | 292.00 | 4.26 | 1.55 | 0.39 | 2.89 | 4.33 | 5.78 | 5.78 |
propublica_org | 22.00 | 6.20 | 0.00 | 6.20 | 6.20 | 6.20 | 6.20 | 6.20 |
reuters_com | 99.00 | 9.26 | 1.40 | 3.06 | 9.66 | 9.66 | 9.66 | 9.66 |
rt_com | 151.00 | 9.60 | 1.82 | 0.85 | 10.17 | 10.17 | 10.17 | 10.17 |
skwawkbox_org | 127.00 | 4.70 | 0.89 | 0.42 | 5.03 | 5.03 | 5.03 | 5.03 |
telegraph_co_uk | 102.00 | 9.83 | 1.50 | 0.85 | 10.15 | 10.15 | 10.15 | 10.15 |
thecanary_co | 232.00 | 4.85 | 1.67 | 0.93 | 3.62 | 6.11 | 6.21 | 6.21 |
theguardian_com | 156.00 | 11.14 | 2.55 | 1.01 | 12.11 | 12.11 | 12.11 | 12.11 |
thetimes_co_uk | 63.00 | 6.65 | 0.88 | 2.17 | 6.84 | 6.84 | 6.84 | 6.84 |
washingtonpost_com | 81.00 | 10.93 | 1.85 | 0.76 | 11.42 | 11.42 | 11.42 | 11.42 |
westmonster_com | 79.00 | 4.96 | 0.83 | 1.31 | 5.22 | 5.22 | 5.22 | 5.22 |
yournewswire_com | 165.00 | 5.78 | 0.82 | 0.40 | 5.99 | 5.99 | 5.99 | 5.99 |
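A quick worked comparison shows why the log helps. Using two Alexa ranks from the listing above (96 for bbc_co_uk and 469149 for brexitcentral_com), this sketch compares how much a full 60 minutes as lead is worth under each divisor. The variable names are illustrative.

```python
import numpy as np

top_rank, bottom_rank = 96, 469149  # Alexa ranks taken from the listing above
minutes = 60

raw_top, raw_bottom = minutes / top_rank, minutes / bottom_rank
log_top, log_bottom = minutes / np.log(top_rank), minutes / np.log(bottom_rank)

# Ratio between the best- and worst-ranked publisher under each scheme.
raw_ratio = raw_top / raw_bottom
log_ratio = log_top / log_bottom
print(f"divide by rank:      {raw_ratio:,.0f}x")
print(f"divide by log(rank): {log_ratio:.2f}x")
```

Dividing by the raw rank separates the two publishers by a factor of several thousand, while dividing by the log keeps them within a factor of about three, consistent with the `max` column of the table above (13.15 against 4.59).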
lead_proposal_2.groupby(data.publisher_id).min().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x1137b1a90>
That looks about right, as long as the smaller publishers were closer to zero. So let's apply feature scaling to this, to give a number between 1 and 20. (Anything not as lead will pass through as zero.)
def rescale(series):
    return (series - series.min()) / (series.max() - series.min())
lead_proposal_3 = np.ceil(20 * rescale(lead_proposal_2))
lead_proposal_2.min(), lead_proposal_2.max()
(0.38500569152790032, 13.145359968846892)
lead_proposal_3.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x113223390>
lead_proposal_3.groupby(data.publisher_id).median().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x113ace0b8>
data["lead_score"] = pd.concat([lead_proposal_3, data.mins_as_lead[data.mins_as_lead==0]])
data.lead_score.value_counts().sort_index()
0.00 154372 1.00 40 2.00 57 3.00 63 4.00 79 5.00 88 6.00 79 7.00 281 8.00 299 9.00 317 10.00 218 11.00 196 12.00 62 13.00 112 14.00 585 15.00 398 16.00 362 17.00 284 18.00 242 19.00 424 20.00 327 Name: lead_score, dtype: int64
data.lead_score.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 20.00 breitbart_com 14.00 brexitcentral_com 7.00 buzzfeed_com 19.00 cnn_com 20.00 dailymail_co_uk 18.00 economist_com 12.00 evolvepolitics_com 8.00 foxnews_com 17.00 ft_com 13.00 huffingtonpost_com 17.00 independent_co_uk 16.00 indy100_com 11.00 lemonde_fr 15.00 libdemvoice_org 7.00 mirror_co_uk 14.00 nbcnews_com 14.00 newstatesman_com 10.00 npr_org 15.00 nytimes_com 20.00 order-order_com 9.00 propublica_org 10.00 reuters_com 15.00 rt_com 16.00 skwawkbox_org 8.00 telegraph_co_uk 16.00 thecanary_co 10.00 theguardian_com 19.00 thetimes_co_uk 11.00 washingtonpost_com 18.00 westmonster_com 8.00 yournewswire_com 9.00 Name: lead_score, dtype: float64
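The lead scoring steps above (cap at 60 minutes, divide by the log of the Alexa rank, rescale, round up) can be sketched as a single function. The function name `lead_score`, its parameters and the example values are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def lead_score(mins_as_lead: pd.Series, alexa_rank: pd.Series, cap: int = 60) -> pd.Series:
    """Score time as lead from 0 to 20, weighted by the log of the Alexa rank."""
    is_lead = mins_as_lead > 0
    unscaled = mins_as_lead[is_lead].clip(upper=cap) / np.log(alexa_rank[is_lead])
    span = unscaled.max() - unscaled.min()
    scaled = np.ceil(20 * (unscaled - unscaled.min()) / span)
    # Articles that were never lead pass through as zero.
    return scaled.reindex(mins_as_lead.index, fill_value=0.0)

# Hypothetical articles: minutes as lead and the publisher's Alexa rank.
mins = pd.Series([0, 5, 60, 600, 60])
rank = pd.Series([96, 96, 96, 96, 469149])
lead_scores = lead_score(mins, rank)
print(lead_scores.tolist())
```

Note that the article with the smallest unscaled value also scores 0 under this rescaling, which is why the lowest non-zero bucket in the value counts above is small.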
In summary then, the lead score for article $a$ is:
$$
unscaledLeadScore_a = \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)}\\
leadScore_a = 19 \cdot \frac{unscaledLeadScore_a - \min(unscaledLeadScore)} {\max(unscaledLeadScore) - \min(unscaledLeadScore)} + 1
$$
Since the minimum value of $minsAsLead$ is 1, $\min(unscaledLeadScore)$ is pretty insignificant. So we can simplify this to:
$$
leadScore_a = 20 \cdot \frac{unscaledLeadScore_a } {\max(unscaledLeadScore)}
$$
or, since $unscaledLeadScore$ is largest for a full 60 minutes as lead at the lowest Alexa rank:
$$
leadScore_a = 20 \cdot \frac{\frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} } {\frac{60}{\log(\min(alexaRank))}}
$$
$$
leadScore_a = \left( 20 \cdot \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} \cdot {\frac{\log(\min(alexaRank))}{60}} \right)
$$
Time on the front page is similar to time as lead, so let's try doing the same calculation, except we also want to factor in the number of slots on the front:
$$
frontScore_a = 15 \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \left( \frac{\min(alexaRank \cdot numArticlesOnFront)}{1440} \right)
$$
(data.alexa_rank * data.num_articles_on_front).min() / 1440
2.4500000000000002
time_on_front_proposal_1 = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15)
time_on_front_proposal_1.plot.hist(figsize=(15, 7), bins=15)
<matplotlib.axes._subplots.AxesSubplot at 0x114dc0f28>
time_on_front_proposal_1.value_counts().sort_index()
1.00 75844 2.00 7589 3.00 4551 4.00 4163 5.00 791 6.00 581 7.00 569 8.00 890 9.00 586 10.00 223 11.00 308 12.00 348 13.00 120 14.00 66 15.00 35 dtype: int64
time_on_front_proposal_1.groupby(data.publisher_id).sum()
publisher_id anotherangryvoice_blogspot_co_uk 38.00 bbc_co_uk 15612.00 breitbart_com 2624.00 brexitcentral_com 53.00 buzzfeed_com 9480.00 cnn_com 12207.00 dailymail_co_uk 14732.00 economist_com 302.00 evolvepolitics_com 74.00 foxnews_com 7983.00 ft_com 3528.00 huffingtonpost_com 7713.00 independent_co_uk 4548.00 indy100_com 491.00 lemonde_fr 3904.00 libdemvoice_org 177.00 mirror_co_uk 9900.00 nbcnews_com 1981.00 newstatesman_com 557.00 npr_org 2492.00 nytimes_com 9902.00 order-order_com 295.00 propublica_org 52.00 reuters_com 7124.00 rt_com 4531.00 skwawkbox_org 127.00 telegraph_co_uk 5567.00 thecanary_co 243.00 theguardian_com 12853.00 thetimes_co_uk 8850.00 washingtonpost_com 9250.00 westmonster_com 348.00 yournewswire_com 410.00 dtype: float64
That looks good to me.
data["front_score"] = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15).fillna(0)
data.front_score
id df8cf9e5ed31a1bfb34c9b73d9e5bce5ab98b439 1.00 ff73a23349976db8c3e5ae3d4a64242b156f14a7 0.00 107efa77273acbbe8fd60c139306e7fcd09fdd98 2.00 5c4db3ff5ac83175ba202cb9327d7e7b52f79f46 0.00 faa6a6d39f269bf6a40af0f52f02a41cb5b38601 0.00 d8fdb38812bfa4ebf5096eff5c3836c7bfea2650 0.00 5b5056265c75d8f3c84a209365a3463b47aba3e1 0.00 c2376359478d79aec41cc87c60ce96a07200315e 0.00 58d25dd29d2cc5510e6f2aecfb19dfbad7d8072f 0.00 3f6ab2c7c68a2b1a231a2483781690e8208b504c 0.00 0c710c25d58126384a7a0b2e9a44ff1e1c84760f 0.00 b62c7a7e048c9c0b43894f1af09412fd03f526c4 1.00 da66ff6b2857b28d802ac43d51739b91ea537513 1.00 6acc7b4c632c0dd5783da419bd82f9c20500bce3 0.00 273a63e59763d0b4b8ddcc0cc0ea6890e70d3612 0.00 29630b4931d9094406c7da71a92654ff4aa3a804 0.00 b8b37dc30b7ad207bd93ae3cdb595a5ea634a5d1 0.00 95174653c48ade1a96c74d7eeb8daab72ad6438d 0.00 84401f2ca38991c4ccdf1518377132f3fd029157 1.00 48911d9ba1a31b061bc4c831bbc63670cc757aab 0.00 5d6fc4a824831cf8daf5c6d06c6649b72a9ae32e 1.00 9b2ead55398bd9af21adcfb60ef2ecd477243ff4 1.00 bbda5b39aff202e1396ef50549d8dd82246dd3b7 1.00 05532ac3109b3f97efc5cf01b1287c2a9eebb12e 1.00 211dea07520407f9916cf0431839e3492dbecaf1 0.00 0d63758483a6a7eea9c609a29b9a33930484b5dd 1.00 546940fde649b45d214734663d9932cc825a0e33 0.00 469ba85ad83defa6bb1baab22b889ef333f585c0 0.00 d2072a2146f76f319304ebe728eee8a106171416 1.00 545ed82638e2b1facb7e2d696237bd7e846a5744 1.00 ... 
d0dd0c23333756ccd50495ab96cc98c7946a91c7 1.00 bb27ecb8d6f3b9d536137616293963a89a7a0a31 1.00 f0ec131dd80598a2df96a5323e5e1409e6e4c4ae 1.00 123ae6f4c00b1837c811fc8e8e52f2ba22f69493 1.00 4d051a39263f284239615c6810a89673dafaec69 0.00 7cdd27d441fa5d8e23a00562f6d865f74ec51be0 0.00 3506c633ed4cdc5a6206272a703fe4f0b21b08d5 2.00 bc475aff1476ca08707c57a4b12ee156b06da4bc 1.00 f34a5885b114fe4b586e17ca3431ba992319a432 9.00 db9fd62c225c848050967275e23d643691be3175 1.00 a0c7f6edd1a3e2450646715aa0a037ff88db1fc8 1.00 9357448c1c298dbe1df8a2cbdb6485233c0e6417 0.00 4f3c42941ae3a8b2b6c69b65fbaf18b28f8c5958 0.00 1a7ded246da17f7c6c2aa4727cd49b871cea4cbe 0.00 ed625081fcc17e5ee7db2b4e6873e6fba309a303 1.00 10144a48ef63b862fab6201edc5c2daec02397cc 1.00 bfca8d6d582df2172f337712378d21a4b3107d02 1.00 6cd23490cefdb9f8d7bf9c867c011787c727dcfe 0.00 ff1740a2026615c843a834190d778d87f180dc9a 1.00 fc92edc544029b512dee3e2662650306e8cc60ac 0.00 a9b58a311d8f1b979900e9a379509e44cbbd73d8 0.00 764782b334e0a4595592b66c546ad9b1e9ec222d 1.00 2eb86c05f6b4a321e7ada7117739391b10866092 1.00 d980f90f472268089a410b39cbe8633d22da482a 0.00 0f8573f9e9c8a20294fd56f6a9ef5f0f1d26c25b 1.00 4390dec1fda6917d9461659311c329ae3298bd43 2.00 3aace91a23bd1281c696e8ad2d1ff678fd1ee279 0.00 405e04e258a3610e9308d64588410b80c01644a2 1.00 80944fe2bc75f2b72e33865d50f3b3326c5d11dd 1.00 925c2db58567841316695f21643a8c362e2a1b85 1.00 Name: front_score, Length: 158885, dtype: float64
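The front score calculation above can be sketched as a function that derives the normalising constant from the data instead of hard-coding 2.45. The function name `front_score`, its parameters and the example values are assumptions for illustration. The arithmetic is regrouped as (capped minutes / cap) times (smallest weight / weight), which is algebraically the same as the expression above.

```python
import numpy as np
import pandas as pd

def front_score(mins_on_front, alexa_rank, num_articles_on_front, cap=1440):
    """Score time on the front page from 0 to 15."""
    # Weight combines site popularity and how many slots share the front page.
    weight = alexa_rank * num_articles_on_front
    time_share = mins_on_front.clip(upper=cap) / cap
    reach_share = weight.min() / weight
    score = np.ceil(15 * time_share * reach_share)
    # Articles never on the front have no slot count (NaN) and score zero.
    return score.fillna(0)

# Hypothetical values for three articles.
mins = pd.Series([1440, 720, 0])
rank = pd.Series([96, 96, 191])
slots = pd.Series([35.0, 35.0, np.nan])
front_scores = front_score(mins, rank, slots)
print(front_scores.tolist())
```

Computing the two ratios separately keeps each in the range 0 to 1, so an article with a full day on the best-weighted front page lands on exactly 15 before rounding up.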
One way a publisher has of promoting content is to post to their brand page. The significance of doing so is stronger when the brand page has more followers (likes).
$$
facebookPromotionProposed1_a = 15 \left( \frac {brandPageLikes_a} {\max(brandPageLikes)} \right)
$$
Now let's explore the data to see if that makes sense. tl;dr: the formula above is incorrect.
data.fb_brand_page_likes.max()
44693975.0
facebook_promotion_proposed_1 = np.ceil((15 * (data.fb_brand_page_likes / data.fb_brand_page_likes.max())).fillna(0))
facebook_promotion_proposed_1.value_counts().sort_index().plot.bar()
<matplotlib.axes._subplots.AxesSubplot at 0x11391d2b0>
facebook_promotion_proposed_1.groupby(data.publisher_id).describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
publisher_id | ||||||||
anotherangryvoice_blogspot_co_uk | 38.00 | 0.84 | 0.37 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
bbc_co_uk | 11973.00 | 0.63 | 3.02 | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 |
breitbart_com | 2707.00 | 0.90 | 0.99 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
brexitcentral_com | 53.00 | 0.92 | 0.27 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
buzzfeed_com | 2010.00 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
cnn_com | 3521.00 | 2.74 | 4.46 | 0.00 | 0.00 | 0.00 | 10.00 | 10.00 |
dailymail_co_uk | 24900.00 | 0.56 | 1.58 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 |
economist_com | 517.00 | 2.22 | 1.32 | 0.00 | 0.00 | 3.00 | 3.00 | 3.00 |
evolvepolitics_com | 74.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.50 | 1.00 | 1.00 |
foxnews_com | 6844.00 | 0.59 | 1.79 | 0.00 | 0.00 | 0.00 | 0.00 | 6.00 |
ft_com | 3792.00 | 0.50 | 0.87 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
huffingtonpost_com | 11213.00 | 0.45 | 1.26 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
independent_co_uk | 6578.00 | 0.60 | 1.20 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
indy100_com | 512.00 | 0.63 | 0.48 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
lemonde_fr | 3918.00 | 0.84 | 0.99 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
libdemvoice_org | 177.00 | 0.81 | 0.40 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
mirror_co_uk | 10525.00 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
nbcnews_com | 2207.00 | 1.99 | 2.00 | 0.00 | 0.00 | 0.00 | 4.00 | 4.00 |
newstatesman_com | 567.00 | 0.74 | 0.44 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
npr_org | 2020.00 | 1.39 | 1.50 | 0.00 | 0.00 | 0.00 | 3.00 | 3.00 |
nytimes_com | 5185.00 | 1.47 | 2.28 | 0.00 | 0.00 | 0.00 | 5.00 | 5.00 |
order-order_com | 296.00 | 0.81 | 0.39 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
propublica_org | 52.00 | 0.83 | 0.38 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
reuters_com | 6125.00 | 0.58 | 0.91 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
rt_com | 2678.00 | 0.96 | 1.00 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
skwawkbox_org | 127.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
telegraph_co_uk | 7477.00 | 0.51 | 0.87 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
thecanary_co | 243.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
theguardian_com | 8680.00 | 0.51 | 1.12 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
thetimes_co_uk | 8868.00 | 0.06 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
washingtonpost_com | 24227.00 | 0.19 | 0.73 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
westmonster_com | 371.00 | 0.23 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
yournewswire_com | 410.00 | 0.22 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
That's too much variation: sites like the Guardian, which have a respectable 7.5m likes, should not be scoring a 3. Let's try applying a log to it, and then standard feature scaling again.
data.fb_brand_page_likes.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 330074.00 bbc_co_uk 44693975.00 breitbart_com 3720252.00 brexitcentral_com 11212.00 buzzfeed_com 2753297.00 cnn_com 29136270.00 dailymail_co_uk 13152590.00 economist_com 8351246.00 evolvepolitics_com 113625.00 foxnews_com 16012614.00 ft_com 3699305.00 huffingtonpost_com 9795839.00 independent_co_uk 7729827.00 indy100_com 228462.00 lemonde_fr 3930501.00 libdemvoice_org 8591.00 mirror_co_uk 2904439.00 nbcnews_com 9346923.00 newstatesman_com 154712.00 npr_org 6234045.00 nytimes_com 14853130.00 order-order_com 44768.00 propublica_org 368773.00 reuters_com 3896512.00 rt_com 4624363.00 skwawkbox_org 5808.00 telegraph_co_uk 4379813.00 thecanary_co 156042.00 theguardian_com 7777911.00 thetimes_co_uk 705739.00 washingtonpost_com 6072234.00 westmonster_com 15504.00 yournewswire_com 27023.00 Name: fb_brand_page_likes, dtype: float64
np.log(2149)
7.6727578966425103
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max())
publisher_id anotherangryvoice_blogspot_co_uk 12.71 bbc_co_uk 17.62 breitbart_com 15.13 brexitcentral_com 9.32 buzzfeed_com 14.83 cnn_com 17.19 dailymail_co_uk 16.39 economist_com 15.94 evolvepolitics_com 11.64 foxnews_com 16.59 ft_com 15.12 huffingtonpost_com 16.10 independent_co_uk 15.86 indy100_com 12.34 lemonde_fr 15.18 libdemvoice_org 9.06 mirror_co_uk 14.88 nbcnews_com 16.05 newstatesman_com 11.95 npr_org 15.65 nytimes_com 16.51 order-order_com 10.71 propublica_org 12.82 reuters_com 15.18 rt_com 15.35 skwawkbox_org 8.67 telegraph_co_uk 15.29 thecanary_co 11.96 theguardian_com 15.87 thetimes_co_uk 13.47 washingtonpost_com 15.62 westmonster_com 9.65 yournewswire_com 10.20 Name: fb_brand_page_likes, dtype: float64
That's more like it, but the lower numbers should be smaller.
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max() / 1000)
publisher_id anotherangryvoice_blogspot_co_uk 5.80 bbc_co_uk 10.71 breitbart_com 8.22 brexitcentral_com 2.42 buzzfeed_com 7.92 cnn_com 10.28 dailymail_co_uk 9.48 economist_com 9.03 evolvepolitics_com 4.73 foxnews_com 9.68 ft_com 8.22 huffingtonpost_com 9.19 independent_co_uk 8.95 indy100_com 5.43 lemonde_fr 8.28 libdemvoice_org 2.15 mirror_co_uk 7.97 nbcnews_com 9.14 newstatesman_com 5.04 npr_org 8.74 nytimes_com 9.61 order-order_com 3.80 propublica_org 5.91 reuters_com 8.27 rt_com 8.44 skwawkbox_org 1.76 telegraph_co_uk 8.38 thecanary_co 5.05 theguardian_com 8.96 thetimes_co_uk 6.56 washingtonpost_com 8.71 westmonster_com 2.74 yournewswire_com 3.30 Name: fb_brand_page_likes, dtype: float64
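Dividing by 1,000 before taking the log is the same as subtracting $\log(1000) \approx 6.91$ from every value, which is why the smaller pages drop proportionally more than the larger ones. A short check, using two like counts from the listing above (the variable names are illustrative):

```python
import numpy as np

# Brand page likes for bbc_co_uk and skwawkbox_org, from the listing above.
largest, smallest = 44693975.0, 5808.0

# log(x / 1000) equals log(x) - log(1000).
shift_holds = np.isclose(np.log(smallest / 1000), np.log(smallest) - np.log(1000))

# Share of the maximum before and after the shift.
share_plain = np.log(smallest) / np.log(largest)
share_shifted = np.log(smallest / 1000) / np.log(largest / 1000)
print(shift_holds, f"{share_plain:.2f} -> {share_shifted:.2f}")
```

The smallest page goes from roughly half of the maximum to well under a quarter of it, which is the effect wanted here.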
scaled_fb_brand_page_likes = (data.fb_brand_page_likes / 1000)
facebook_promotion_proposed_2 = np.ceil(
    15 * (np.log(scaled_fb_brand_page_likes) / np.log(scaled_fb_brand_page_likes.max()))
).fillna(0)
facebook_promotion_proposed_2.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 9.00 bbc_co_uk 15.00 breitbart_com 12.00 brexitcentral_com 4.00 buzzfeed_com 12.00 cnn_com 15.00 dailymail_co_uk 14.00 economist_com 13.00 evolvepolitics_com 7.00 foxnews_com 14.00 ft_com 12.00 huffingtonpost_com 13.00 independent_co_uk 13.00 indy100_com 8.00 lemonde_fr 12.00 libdemvoice_org 4.00 mirror_co_uk 12.00 nbcnews_com 13.00 newstatesman_com 8.00 npr_org 13.00 nytimes_com 14.00 order-order_com 6.00 propublica_org 9.00 reuters_com 12.00 rt_com 12.00 skwawkbox_org 3.00 telegraph_co_uk 12.00 thecanary_co 8.00 theguardian_com 13.00 thetimes_co_uk 10.00 washingtonpost_com 13.00 westmonster_com 4.00 yournewswire_com 5.00 Name: fb_brand_page_likes, dtype: float64
LGTM. So the equation is
$$
facebookPromotion_a = 15 \left( \frac {\log(\frac {brandPageLikes_a}{1000})} {\log(\frac {\max(brandPageLikes)}{1000})} \right)
$$
Now, let's try applying the standard feature scaling approach to this, rather than using a magic number of 1,000. That equation would be:
\begin{align}
unscaledFacebookPromotion_a &= \log(brandPageLikes_a) \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \min(unscaledFacebookPromotion)}{\max(unscaledFacebookPromotion) - \min(unscaledFacebookPromotion)} \\
\\
\text{The scaling can be simplified to:} \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} \\
\\
\text{Meaning the overall equation becomes:} \\
facebookPromotion_a &= 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))}
\end{align}
facebook_promotion_proposed_3 = np.ceil(
    (14 *
        (
            (np.log(data.fb_brand_page_likes) - np.log(data.fb_brand_page_likes.min())) /
            (np.log(data.fb_brand_page_likes.max()) - np.log(data.fb_brand_page_likes.min()))
        )
    ) + 1
)
facebook_promotion_proposed_3.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 15.00 breitbart_com 12.00 brexitcentral_com 3.00 buzzfeed_com 11.00 cnn_com 15.00 dailymail_co_uk 14.00 economist_com 13.00 evolvepolitics_com 6.00 foxnews_com 14.00 ft_com 12.00 huffingtonpost_com 13.00 independent_co_uk 13.00 indy100_com 7.00 lemonde_fr 12.00 libdemvoice_org 2.00 mirror_co_uk 11.00 nbcnews_com 13.00 newstatesman_com 7.00 npr_org 12.00 nytimes_com 14.00 order-order_com 5.00 propublica_org 8.00 reuters_com 12.00 rt_com 12.00 skwawkbox_org 2.00 telegraph_co_uk 12.00 thecanary_co 7.00 theguardian_com 13.00 thetimes_co_uk 9.00 washingtonpost_com 12.00 westmonster_com 3.00 yournewswire_com 4.00 Name: fb_brand_page_likes, dtype: float64
data["facebook_promotion_score"] = facebook_promotion_proposed_3.fillna(0.0)
data["promotion_score"] = (data.lead_score + data.front_score + data.facebook_promotion_score)
data["attention_index"] = (data.promotion_score + data.response_score)
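To make the composition of the scores concrete, here is a small sketch on a toy frame. The two rows reuse component values that appear in the output tables of this notebook (20/13/15/50 and 0/0/missing/50); the frame itself, the row labels, and the column `facebook_promotion_raw` are hypothetical stand-ins. It also shows why the `fillna(0.0)` matters: without it, an article with no brand page post would end up with a NaN promotion score.

```python
import numpy as np
import pandas as pd

# Toy frame; facebook_promotion_raw stands in for the unfilled proposal-3 scores
toy = pd.DataFrame(
    {
        "lead_score": [20.0, 0.0],
        "front_score": [13.0, 0.0],
        "response_score": [50.0, 50.0],
        "facebook_promotion_raw": [15.0, np.nan],
    },
    index=["promoted_article", "unpromoted_article"],
)

# Summing without filling propagates NaN for the article with no brand page post
unfilled_sum = toy.lead_score + toy.front_score + toy.facebook_promotion_raw

# Same steps as the cell above: fill missing scores with 0, then add up
toy["facebook_promotion_score"] = toy.facebook_promotion_raw.fillna(0.0)
toy["promotion_score"] = toy.lead_score + toy.front_score + toy.facebook_promotion_score
toy["attention_index"] = toy.promotion_score + toy.response_score
print(toy[["promotion_score", "attention_index"]])
```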
data.promotion_score.plot.hist(bins=np.arange(50), figsize=(15,6))
data.attention_index.plot.hist(bins=np.arange(100), figsize=(15,6))
data.attention_index.value_counts().sort_index()
0.00 25301 1.00 19223 2.00 13018 3.00 9206 4.00 7096 5.00 5298 6.00 4565 7.00 4207 8.00 3798 9.00 3294 10.00 3141 11.00 2918 12.00 2747 13.00 2446 14.00 2401 15.00 2309 16.00 2109 17.00 1949 18.00 1853 19.00 1811 20.00 1774 21.00 1788 22.00 1726 23.00 1636 24.00 1589 25.00 1552 26.00 1594 27.00 1445 28.00 1437 29.00 1384 ... 67.00 94 68.00 92 69.00 62 70.00 60 71.00 48 72.00 54 73.00 44 74.00 35 75.00 41 76.00 39 77.00 32 78.00 38 79.00 30 80.00 26 81.00 22 82.00 27 83.00 27 84.00 12 85.00 9 86.00 16 87.00 10 88.00 18 89.00 11 90.00 3 91.00 7 92.00 2 93.00 5 94.00 3 95.00 1 98.00 1 Name: attention_index, Length: 97, dtype: int64
# and let's see the articles with the biggest attention index
data.sort_values("attention_index", ascending=False)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
82f641a891db565edfcef82af5dac6e60f18dc29 | http://www.bbc.co.uk/news/world-europe-41780116 | Catalans declare independence from Spain | 2017-10-27 13:28:14.944 | 2017-10-27 13:26:28.000 | 122017 | 735.45 | 2017-10-27T13:49:10.398Z | 25016 | 82794 | 14207 | ... | True | 44557554.00 | 2017-10-27T13:36:37.000Z | 96 | 50.00 | 20.00 | 13.00 | 15.00 | 48.00 | 98.00 |
fb7f2c7fc2b3d5d441f187c2cc298eb68481caca | http://www.bbc.co.uk/news/world-us-canada-4146... | 'Active shooter' near Las Vegas casino | 2017-10-02 05:58:13.020 | 2017-10-02 05:55:50.000 | 155740 | 282.82 | 2017-10-02T12:02:05.884Z | 36936 | 96254 | 22550 | ... | True | 44231064.00 | 2017-10-02T10:52:53.000Z | 96 | 50.00 | 20.00 | 10.00 | 15.00 | 45.00 | 95.00 |
55a653279260f78e8c3d8fdae5ac3ff000ff1a63 | http://www.cnn.com/2017/10/02/us/las-vegas-sho... | Portraits of the victims of the Las Vegas shoo... | 2017-10-02 22:13:24.653 | 2017-10-02 22:08:01.000 | 121221 | 208.76 | 2017-10-03T00:26:03.622Z | 10651 | 96865 | 13705 | ... | True | 28779427.00 | 2017-10-02T23:30:10.000Z | 105 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
67d0bda0783da4c5d2ce176c1d3144d21b226c7a | http://www.cnn.com/2017/10/30/politics/paul-ma... | Manafort to turn himself in to Mueller, source... | 2017-10-30 11:58:15.661 | 2017-10-30 11:55:36.000 | 163516 | 678.74 | 2017-10-30T12:10:03.568Z | 47714 | 92136 | 23666 | ... | True | 29101437.00 | 2017-10-30T11:57:37.000Z | 105 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
f9ec6620251671f7010b07793542152080b97f63 | http://www.bbc.co.uk/news/world-europe-41463719 | Catalonia has 'won right to statehood' | 2017-10-01 20:49:10.902 | 2017-10-01 20:46:53.000 | 92269 | 283.28 | 2017-10-02T02:12:11.055Z | 11378 | 71867 | 9024 | ... | True | 44223789.00 | 2017-10-02T01:52:48.000Z | 96 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
cec51f99475b0936b80e7e1f07661fd769850060 | http://www.cnn.com/2017/10/08/politics/vice-pr... | Mike Pence leaves Colts game after anthem protest | 2017-10-08 17:58:20.413 | 2017-10-08 17:54:50.000 | 154384 | 515.68 | 2017-10-08T18:40:09.200Z | 64195 | 77072 | 13117 | ... | True | 28829396.00 | 2017-10-08T18:21:38.000Z | 105 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
d6c669a2a511a179c138c95cd00ecd21f67786ca | http://www.cnn.com/2017/10/16/health/puerto-ri... | Floating hospital sits near Puerto Rico after ... | 2017-10-16 22:31:22.572 | 2017-10-16 22:29:14.000 | 95905 | 225.62 | 2017-10-17T18:04:09.923Z | 16034 | 63684 | 16187 | ... | True | 29040951.00 | 2017-10-17T02:29:12.000Z | 105 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
a40b5a3781bcdbb51023839d1e504623829509f7 | https://www.buzzfeed.com/stephaniemcneal/lular... | Women Say They're Stuck With $20,000 Of Worthl... | 2017-10-25 20:40:31.730 | 2017-10-25 20:38:10.000 | 140824 | 764.35 | 2017-10-26T15:09:08.794Z | 71842 | 53756 | 15226 | ... | True | 2746668.00 | 2017-10-25T23:34:00.000Z | 147 | 50.00 | 19.00 | 13.00 | 11.00 | 43.00 | 93.00 |
6f31aba0d9c375c9b6658dba325de9ba6aab5472 | http://www.cnn.com/2017/10/27/politics/first-c... | Exclusive: First charges filed in Mueller inve... | 2017-10-28 00:37:18.546 | 2017-10-28 00:29:59.000 | 676010 | 4464.02 | 2017-10-28T01:09:09.274Z | 173337 | 425374 | 77299 | ... | True | 29089089.00 | 2017-10-28T00:46:40.000Z | 105 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
6a0116039bbc12a38eed333af50ac8ccbb14e530 | http://www.bbc.co.uk/news/world-us-canada-4152... | Trump rolls back access to free birth control | 2017-10-06 16:07:06.751 | 2017-10-06 16:05:48.000 | 47522 | 156.57 | 2017-10-06T18:39:06.354Z | 14732 | 28303 | 4487 | ... | True | 44281131.00 | 2017-10-06T18:28:26.000Z | 96 | 46.00 | 20.00 | 12.00 | 15.00 | 47.00 | 93.00 |
95a1f2bb7086f89ff2bf066faece9605d3843cee | http://www.cnn.com/2017/10/02/us/las-vegas-att... | Here's what we know about Stephen Paddock, the... | 2017-10-02 12:37:23.217 | 2017-10-02 12:31:34.000 | 67404 | 171.83 | 2017-10-02T13:59:07.658Z | 23031 | 34768 | 9605 | ... | True | 28768865.00 | 2017-10-02T13:44:02.000Z | 105 | 48.00 | 20.00 | 9.00 | 15.00 | 44.00 | 92.00 |
b0a7e0c7b10ee97b2901390f0ce1e495cc7608e3 | https://www.buzzfeed.com/josephbernstein/heres... | Here's How Breitbart And Milo Smuggled White N... | 2017-10-05 20:28:27.591 | 2017-10-04 20:22:03.000 | 120844 | 735.92 | 2017-10-06T05:05:10.751Z | 36491 | 58391 | 25962 | ... | True | 2713110.00 | 2017-10-05T21:11:19.000Z | 147 | 50.00 | 19.00 | 12.00 | 11.00 | 42.00 | 92.00 |
f075885c7f633014983f3a4dfa8a2568e3f30820 | https://www.buzzfeed.com/adambvary/anthony-rap... | Actor Anthony Rapp: Kevin Spacey Made A Sexual... | 2017-10-30 01:34:21.416 | 2017-10-30 01:32:42.000 | 63217 | 277.41 | 2017-10-30T03:07:10.078Z | 28302 | 28576 | 6339 | ... | True | 2749565.00 | 2017-10-30T02:13:58.000Z | 147 | 47.00 | 19.00 | 14.00 | 11.00 | 44.00 | 91.00 |
bb779a59d0768a95e90a7064a4cc96d385802150 | http://www.cnn.com/2017/10/02/us/las-vegas-sho... | Las Vegas shooting: Live updates | 2017-10-02 06:43:21.668 | 2017-10-02 06:40:33.000 | 250916 | 577.52 | 2017-10-02T12:56:06.116Z | 41731 | 178605 | 30580 | ... | True | 28767961.00 | 2017-10-02T12:45:14.000Z | 105 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
fa25e376e29f7eff2f858ef404c04ccb1764fd96 | http://www.cnn.com/2017/10/30/politics/donald-... | Trump 'seething' as Mueller probe reaches form... | 2017-10-30 23:43:17.351 | 2017-10-30 23:37:29.000 | 79114 | 300.46 | 2017-10-31T01:47:07.847Z | 13042 | 61458 | 4614 | ... | True | 29102689.00 | 2017-10-31T01:31:10.000Z | 105 | 49.00 | 20.00 | 7.00 | 15.00 | 42.00 | 91.00 |
83f4006ad4366078fe5620632914e763dfed5d24 | http://www.cnn.com/2017/10/13/us/california-fi... | Woman dies in husband's arms while hiding in s... | 2017-10-13 19:31:28.678 | 2017-10-13 19:29:10.000 | 61063 | 131.60 | 2017-10-14T06:21:11.198Z | 3836 | 53852 | 3375 | ... | True | 29025532.00 | 2017-10-14T06:00:28.000Z | 105 | 47.00 | 20.00 | 9.00 | 15.00 | 44.00 | 91.00 |
bf9df1d59d28db1dc8a586a19bddec58c9dfcd56 | http://www.cnn.com/2017/10/17/politics/trump-j... | Trump warns John McCain: 'Be careful ... I fig... | 2017-10-17 15:31:27.783 | 2017-10-17 15:24:46.000 | 71046 | 241.69 | 2017-10-17T16:45:09.863Z | 24910 | 40669 | 5467 | ... | True | 29044379.00 | 2017-10-17T16:30:24.000Z | 105 | 48.00 | 20.00 | 8.00 | 15.00 | 43.00 | 91.00 |
10f7302737a4fe288003babdfa2e9350e36556ff | https://www.buzzfeed.com/coralewis/these-are-t... | These Are The Victims Of The Las Vegas Shooting | 2017-10-02 15:43:19.061 | 2017-10-02 15:30:45.000 | 246234 | 1436.61 | 2017-10-03T17:06:12.166Z | 15576 | 193415 | 37243 | ... | True | 2710339.00 | 2017-10-02T23:08:00.000Z | 147 | 50.00 | 19.00 | 11.00 | 11.00 | 41.00 | 91.00 |
daff5a6de1600351e1301260c729478c9dc2e36a | http://www.bbc.co.uk/news/entertainment-arts-4... | Rock and roll legend Fats Domino dies | 2017-10-25 14:43:15.954 | 2017-10-25 14:39:46.000 | 82618 | 507.52 | 2017-10-25T15:15:11.832Z | 10435 | 56692 | 15491 | ... | True | 44530330.00 | 2017-10-25T14:42:20.000Z | 96 | 49.00 | 20.00 | 7.00 | 15.00 | 42.00 | 91.00 |
beab15b7777cc9d822f21d549f76bf12fcdb53a2 | http://www.cnn.com/2017/10/31/us/new-york-shot... | Shots fired in Manhattan; one person in custody | 2017-10-31 19:43:20.020 | 2017-10-31 19:38:30.000 | 179977 | 545.86 | 2017-10-31T21:07:09.102Z | 38732 | 105983 | 35262 | ... | True | 29107811.00 | 2017-10-31T20:11:44.000Z | 105 | 50.00 | 20.00 | 5.00 | 15.00 | 40.00 | 90.00 |
d221e354b313ce4e5536c4a0dee1dfd7cca379f3 | http://www.cnn.com/2017/10/09/us/california-fi... | Wildfires rage in swath of California's wine c... | 2017-10-09 11:52:23.501 | 2017-10-09 11:49:24.000 | 79676 | 153.43 | 2017-10-10T01:16:09.079Z | 20985 | 48458 | 10233 | ... | True | 28832711.00 | 2017-10-10T01:02:09.000Z | 105 | 49.00 | 20.00 | 6.00 | 15.00 | 41.00 | 90.00 |
80756bf5f41ea79c83dddf95acfea413cedc47a6 | https://www.buzzfeed.com/laurageiser/las-vegas... | Photos Show The Terrifying Aftermath Of Las Ve... | 2017-10-02 10:34:10.208 | 2017-10-02 10:18:42.000 | 147698 | 420.10 | 2017-10-03T07:02:10.022Z | 12795 | 113076 | 21827 | ... | True | 2709326.00 | 2017-10-02T12:56:40.000Z | 147 | 50.00 | 19.00 | 10.00 | 11.00 | 40.00 | 90.00 |
3c40e0ffdf776c5bfef61de5ef7152eb00dc68d1 | http://www.cnn.com/2017/10/12/politics/obamaca... | Trump will end health care cost-sharing subsidies | 2017-10-13 03:01:31.361 | 2017-10-13 02:56:41.000 | 82624 | 200.70 | 2017-10-13T03:33:11.986Z | 31282 | 41245 | 10097 | ... | True | 29018649.00 | 2017-10-13T03:16:50.000Z | 105 | 49.00 | 20.00 | 5.00 | 15.00 | 40.00 | 89.00 |
21baf75c0cee1455134ba03272544695a9632d43 | http://www.bbc.co.uk/news/world-us-canada-4182... | Casualties reported after New York 'shooting' | 2017-10-31 19:37:10.279 | 2017-10-31 19:34:57.000 | 56934 | 336.38 | 2017-10-31T19:48:11.900Z | 10368 | 33744 | 12822 | ... | True | 44596174.00 | 2017-10-31T19:35:52.000Z | 96 | 47.00 | 20.00 | 7.00 | 15.00 | 42.00 | 89.00 |
ee3157d189708e28111bca2672b17c29164c3536 | https://www.buzzfeed.com/nidhiprakash/puerto-r... | Puerto Rico's Government Just Admitted 911 Peo... | 2017-10-27 22:34:23.096 | 2017-10-27 21:53:21.000 | 43600 | 38.37 | 2017-10-28T20:34:06.012Z | 4928 | 27727 | 10945 | ... | True | 2748557.00 | 2017-10-27T23:44:01.000Z | 147 | 45.00 | 19.00 | 14.00 | 11.00 | 44.00 | 89.00 |
4eac773f3ebe0d94a0fa4a6b6670ddf917c2731a | http://www.huffingtonpost.com/entry/las-vegas-... | Las Vegas Police Investigate Reports Of Active... | 2017-10-02 05:59:14.138 | 2017-10-02 05:55:25.000 | 369539 | 945.57 | 2017-10-02T12:44:12.235Z | 76304 | 232393 | 60842 | ... | True | 9751564.00 | 2017-10-02T06:01:09.000Z | 215 | 50.00 | 17.00 | 9.00 | 13.00 | 39.00 | 89.00 |
516109aeae85946b20f0815443503941a9377f91 | http://www.cnn.com/2017/10/19/politics/bush-sp... | George W. Bush just laid the smackdown on Trum... | 2017-10-19 18:53:18.568 | 2017-10-19 18:49:39.000 | 92682 | 261.19 | 2017-10-19T23:43:06.164Z | 14364 | 69760 | 8558 | ... | True | 29056712.00 | 2017-10-19T23:30:15.000Z | 105 | 50.00 | 20.00 | 4.00 | 15.00 | 39.00 | 89.00 |
cac2ab0eb67721d299e607b8f0fffd7231ea75d1 | http://www.cnn.com/2017/10/05/politics/special... | Mueller's team met with Russia dossier author | 2017-10-05 22:07:21.402 | 2017-10-05 22:03:50.000 | 41500 | 197.71 | 2017-10-05T22:28:13.927Z | 7478 | 27644 | 6378 | ... | True | 28813554.00 | 2017-10-05T22:15:08.000Z | 105 | 45.00 | 20.00 | 9.00 | 15.00 | 44.00 | 89.00 |
379ec53f3e91218fdedbc34f3c1353d022721d34 | http://www.cnn.com/2017/10/01/politics/donald-... | Trump: Tillerson 'wasting his time' negotiatin... | 2017-10-01 14:52:21.832 | 2017-10-01 14:49:24.000 | 44896 | 145.64 | 2017-10-01T16:14:03.920Z | 15331 | 22588 | 6977 | ... | True | 28745597.00 | 2017-10-01T15:19:18.000Z | 105 | 45.00 | 20.00 | 9.00 | 15.00 | 44.00 | 89.00 |
ca2cdf87d5f8fa84da0437cfa9274cd1df3cb1c5 | http://www.cnn.com/2017/10/03/politics/russian... | Exclusive: Russian-linked Facebook ads targete... | 2017-10-04 01:34:27.207 | 2017-10-04 01:30:07.000 | 56373 | 276.02 | 2017-10-04T01:45:10.551Z | 12626 | 33241 | 10506 | ... | True | 28795396.00 | 2017-10-04T01:37:10.000Z | 105 | 47.00 | 20.00 | 7.00 | 15.00 | 42.00 | 89.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
02c0972c41194fb4cd15f19804aeb894cc2c5bee | https://www.washingtonpost.com/world/asia_paci... | AP PHOTOS: Portraits of Rohingya survivors of ... | 2017-10-27 04:58:15.224 | 2017-10-27 04:53:31.000 | 0 | 0.00 | 2017-10-27T05:09:12.445Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5c9a06cf457cd2602d5cc3f89b773cc77c6ac968 | https://www.theguardian.com/australia-news/vid... | Whistleblowers allege Crown casino tampered wi... | 2017-10-18 05:40:03.973 | 2017-10-18 05:39:46.000 | 1 | 0.02 | 2017-10-18T06:51:09.099Z | 0 | 0 | 1 | ... | False | nan | NaN | 142 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3c5cde42b98c1493c6339db25cde00b3c22db267 | https://www.nbcnews.com/card/carrie-barnette-n... | nbcnews:card_text | 2017-10-05 13:43:19.752 | 2017-10-05 13:41:53.000 | 0 | 0.00 | 2017-10-05T14:55:09.510Z | 0 | 0 | 0 | ... | False | nan | NaN | 826 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
91482986a8d1514c8e4dfe4ba5421e9ed4a4e7ed | http://www.bbc.co.uk/news/uk-wales-politics-41... | Welsh Development Bank chief will not be based... | 2017-10-18 05:40:06.627 | 2017-10-18 05:36:57.000 | 1 | 0.02 | 2017-10-18T07:51:11.799Z | 0 | 0 | 1 | ... | False | nan | NaN | 96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
eefc2e0e12b67df110332240509e65e95651ca4b | https://www.washingtonpost.com/sports/national... | After 3K, Beltre wants 2018 in Texas to anothe... | 2017-10-02 19:55:19.710 | 2017-10-02 19:50:22.000 | 0 | 0.00 | 2017-10-03T01:11:10.085Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
80aca1202079bde5eb32962f06515b0f0bc4dca0 | http://www.bbc.co.uk/news/av/uk-41771399/hallo... | Halloween at the zoo | 2017-10-27 05:10:14.420 | 2017-10-27 05:06:33.000 | 0 | 0.00 | 2017-10-27T05:22:06.633Z | 0 | 0 | 0 | ... | False | nan | NaN | 96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1779d2c71fcf3e462aac8fe1c628f02d6b789f5d | http://www.huffingtonpost.com/entry/privileged... | Privileged | 2017-10-02 20:04:25.114 | 2017-10-02 19:50:50.424 | 0 | 0.00 | 2017-10-03T01:18:11.551Z | 0 | 0 | 0 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
654e1e4f92b9f174cad4bbba83cae806bd24f3de | https://www.washingtonpost.com/national/market... | Markets Right Now: US stocks edged higher in e... | 2017-10-05 13:52:17.547 | 2017-10-05 13:43:22.000 | 0 | 0.00 | 2017-10-05T15:04:10.354Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
55e64897d16d13eae4f53b3b56b647f60ce1ab6c | http://www.bbc.co.uk/news/av/world-us-canada-4... | Top-secret JFK files | 2017-10-27 05:07:13.301 | 2017-10-27 05:03:47.000 | 0 | 0.00 | 2017-10-27T05:19:05.694Z | 0 | 0 | 0 | ... | False | nan | NaN | 96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
af63fd3495409c98948ab9364f8f411f909af1ad | https://www.nbcnews.com/card/nicol-kimura-n807696 | nbcnews:card_text | 2017-10-05 13:46:14.450 | 2017-10-05 13:43:20.000 | 0 | 0.00 | 2017-10-05T14:58:07.471Z | 0 | 0 | 0 | ... | False | nan | NaN | 826 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
bd3c218f72e56d810d95c0e73d30eb7de098d7e5 | https://www.washingtonpost.com/business/austra... | Australian casino denies lawmaker’s criminal a... | 2017-10-18 05:13:22.000 | 2017-10-18 05:08:14.000 | 0 | 0.00 | 2017-10-18T05:24:09.122Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3812f835fdba0ae032cefa8741151163a99605ee | https://www.washingtonpost.com/sports/wizards/... | Griffin’s 3-pointer beats Blazers, keeps Clipp... | 2017-10-27 05:07:17.833 | 2017-10-27 05:01:09.000 | 0 | 0.00 | 2017-10-27T05:19:05.696Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
86924e65577ee3d3b7b815016b08e3e75538b4e3 | http://www.huffingtonpost.com/entry/the-man-fr... | "The Man from Mesquite Was Not Discreet’ By Re... | 2017-10-02 20:04:30.044 | 2017-10-02 19:51:44.779 | 0 | 0.00 | 2017-10-03T01:18:11.561Z | 0 | 0 | 0 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c8121b18ed4fec985495254e8aee6db7e6d50df5 | https://www.washingtonpost.com/national/housto... | Houston-area “Tourniquet Killer” set to die | 2017-10-18 05:13:21.269 | 2017-10-18 05:08:21.000 | 0 | 0.00 | 2017-10-18T05:24:09.121Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
7a15a79697d14b7ad795d3ed74700d1cb5ce91b1 | https://www.washingtonpost.com/sports/national... | A year after 1st title since 1908, Cubs trail ... | 2017-10-18 05:13:19.308 | 2017-10-18 05:09:27.000 | 0 | 0.00 | 2017-10-18T05:24:09.120Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c75ed341bea900137000eaa60445cd10df57cede | https://www.washingtonpost.com/sports/capitals... | Perron leads Vegas past the Sabres in overtime... | 2017-10-18 05:13:19.036 | 2017-10-18 05:10:34.000 | 0 | 0.00 | 2017-10-18T05:24:09.118Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f46fa5010355b1586993f4b8c485876f645b388b | http://www.dailymail.co.uk/tvshowbiz/article-4... | Katy Perry shares bizarre video of cupping pro... | 2017-10-18 05:16:21.443 | 2017-10-18 05:12:56.000 | 1 | 0.02 | 2017-10-18T18:33:09.907Z | 0 | 0 | 1 | ... | False | nan | NaN | 158 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
19a847cd1839b45086f13a4da14f18b4bbabb034 | https://www.washingtonpost.com/sports/capitals... | Couture scores twice in Sharks’ 5-2 victory ov... | 2017-10-18 05:22:16.440 | 2017-10-18 05:16:35.000 | 0 | 0.00 | 2017-10-18T05:33:10.876Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0c480b0c0a89ec79422232747f480131b92f0d2f | https://www.washingtonpost.com/local/southern-... | Home sales for Calvert, Charles and St. Mary’s... | 2017-10-05 13:52:15.357 | 2017-10-05 13:42:52.000 | 0 | 0.00 | 2017-10-05T15:04:10.351Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2a10f2a01a7fba6d11999773e61cdcf944ca2c3c | https://www.nbcnews.com/card/denise-cohen-n807651 | nbcnews:card_text | 2017-10-05 13:43:18.597 | 2017-10-05 13:42:32.000 | 0 | 0.00 | 2017-10-05T14:55:09.509Z | 0 | 0 | 0 | ... | False | nan | NaN | 826 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
58227a10730242e3d915ac2b3388fafdaeb52e56 | https://www.washingtonpost.com/local/alexandri... | Alexandria launches campaign to promote Metror... | 2017-10-18 05:34:20.280 | 2017-10-18 05:22:10.000 | 0 | 0.00 | 2017-10-18T05:46:06.441Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5b462c9f5a37417714123cbc18822bd6564e6a5e | https://www.washingtonpost.com/sports/wizards/... | 76ers’ Fultz gets to make NBA debut at Wizards... | 2017-10-18 05:34:22.200 | 2017-10-18 05:22:20.000 | 1 | 0.02 | 2017-10-18T08:47:11.413Z | 0 | 0 | 1 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a01768152856e76a0ed39277e38d86a693fbe7bc | https://www.washingtonpost.com/national/women-... | Women in California Capitol speak out against ... | 2017-10-18 05:31:18.648 | 2017-10-18 05:23:16.000 | 1 | 0.02 | 2017-10-18T16:46:11.233Z | 0 | 0 | 1 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
cc465581129290ac85bde303ee8569636cdf9e96 | https://www.washingtonpost.com/world/asia_paci... | Car bomb kills 4 police, 2 civilians in southw... | 2017-10-18 05:34:23.107 | 2017-10-18 05:25:15.000 | 0 | 0.00 | 2017-10-18T05:46:06.445Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b5665edf2f09e3f0c53eb90958a81464bbc9b698 | https://www.washingtonpost.com/business/techno... | China’s Xi calls for more technology development | 2017-10-18 05:34:23.355 | 2017-10-18 05:25:23.000 | 0 | 0.00 | 2017-10-18T05:46:06.446Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c2abd2a6fa5b760f9c3456efdc1ca9d635ce1ed9 | https://www.washingtonpost.com/sports/colleges... | Go for 2 or play for OT? How coaches make that... | 2017-10-18 05:34:21.155 | 2017-10-18 05:27:18.000 | 0 | 0.00 | 2017-10-18T05:46:06.443Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2cb52cf6cf9b5f81cea9166fd15b9916ce5532fa | https://www.washingtonpost.com/sports/colleges... | No. 13 Notre Dame dreams alive and well at sea... | 2017-10-18 05:43:15.630 | 2017-10-18 05:31:20.000 | 0 | 0.00 | 2017-10-18T05:54:07.811Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a09d2a501db72426813643eb873fe2caa0a81848 | https://www.huffingtonpost.com/entry/are-custo... | Are Customers Finally Ready to Adopt Smart Hom... | 2017-10-18 05:46:22.312 | 2017-10-18 05:33:28.519 | 1 | 0.02 | 2017-10-18T08:58:03.893Z | 0 | 0 | 1 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
dc2eda44b3595ed320b2bc8adc38db32cf1a8c10 | https://www.washingtonpost.com/business/subaru... | Subaru investigates its own inspections after ... | 2017-10-27 05:07:20.004 | 2017-10-27 04:59:49.000 | 0 | 0.00 | 2017-10-27T05:19:05.698Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
563c25c593fb24594bbb4c00defa2c2122fea1dc | https://www.washingtonpost.com/politics/ap-sou... | AP Source: Wyo. Senate race might see insurgen... | 2017-10-09 08:01:12.645 | 2017-10-09 07:48:30.000 | 0 | 0.00 | 2017-10-09T19:16:11.987Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
158885 rows × 25 columns
data["score_diff"] = data.promotion_score - data.response_score
# promoted but low response
data.sort_values("score_diff", ascending=False).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
e43a9b30900aab9765ce74c00665e24da573a56c | https://www.buzzfeed.com/aishagani/people-are-... | People Are Pointing Out A Black Woman Started ... | 2017-10-19 11:43:24.546 | 2017-10-17 13:26:51 | 0 | 0.00 | 2017-10-19T11:55:04.504Z | 0 | 0 | 0 | ... | 2738311.00 | 2017-10-19T14:45:00.000Z | 147 | 0.00 | 19.00 | 13.00 | 11.00 | 43.00 | 43.00 | 43.00 |
2b852dff5e7a3e4352681245f98178a56a52689b | https://www.buzzfeed.com/monicamark/meet-the-b... | Meet The Badass Women Wrestlers Of Senegal | 2017-10-27 12:58:12.984 | 2017-10-23 14:54:04 | 3 | 0.27 | 2017-10-27T13:10:07.398Z | 0 | 0 | 3 | ... | 2748890.00 | 2017-10-28T17:45:00.000Z | 147 | 1.00 | 19.00 | 13.00 | 11.00 | 43.00 | 44.00 | 42.00 |
74f30801f07e697b1804a420c37cddc54563401c | https://www.buzzfeed.com/danvergano/how-a-us-r... | How A US Raid On An Afghan Village Went Wrong | 2017-10-17 16:01:29.821 | 2017-10-16 16:57:47 | 18 | 0.28 | 2017-10-17T17:12:09.350Z | 2 | 3 | 13 | ... | 2736252.00 | 2017-10-17T23:04:00.000Z | 147 | 3.00 | 19.00 | 14.00 | 11.00 | 44.00 | 47.00 | 41.00 |
68c78eedcf54c1421649c1287bda9cd7cdc624ef | https://www.buzzfeed.com/arianelange/fbi-in-ho... | Inside The FBI's Half-Secret Relationship With... | 2017-10-09 16:34:18.783 | 2017-10-07 17:17:50 | 2 | 0.18 | 2017-10-09T16:46:05.185Z | 0 | 0 | 2 | ... | 2714329.00 | 2017-10-09T19:21:00.000Z | 147 | 1.00 | 19.00 | 12.00 | 11.00 | 42.00 | 43.00 | 41.00 |
c616bb3ad332266cfa9dfc11b2d1cc2518b466d0 | https://www.buzzfeed.com/talalansari/trumps-an... | Trump’s Anti-Islam Rhetoric Convinced These Mu... | 2017-10-31 13:04:18.596 | 2017-10-26 18:18:02 | 1 | 0.10 | 2017-10-31T13:15:06.299Z | 0 | 0 | 1 | ... | 2750122.00 | 2017-10-31T14:32:31.000Z | 147 | 0.00 | 19.00 | 7.00 | 11.00 | 37.00 | 37.00 | 37.00 |
e30407d0a2a955d86fd7299f4df704bd8f9d7f7e | https://www.buzzfeed.com/kateaurthur/harvey-we... | Harvey Weinstein's Leave May Or May Not Be Per... | 2017-10-07 00:24:08.404 | 2017-10-06 23:39:18 | 32 | 0.20 | 2017-10-07T00:35:07.428Z | 2 | 13 | 17 | ... | 2714137.00 | 2017-10-09T00:28:00.000Z | 147 | 5.00 | 19.00 | 11.00 | 11.00 | 41.00 | 46.00 | 36.00 |
13fbafcfe4f592d7705b509fa4156fb1f30704d2 | https://www.buzzfeed.com/verabergengruen/for-t... | For These Veterans, Growing Pot Isn't Just A J... | 2017-10-18 14:01:22.471 | 2017-10-17 16:44:09 | 25 | 0.25 | 2017-10-18T17:14:11.226Z | 0 | 9 | 16 | ... | 2737415.00 | 2017-10-19T00:44:00.000Z | 147 | 4.00 | 19.00 | 10.00 | 11.00 | 40.00 | 44.00 | 36.00 |
10d029b2ac847112ad6c076081c70041fcfe5fe3 | https://www.buzzfeed.com/claudiarosenbaum/harv... | Harvey Weinstein Is Suing His Old Company To O... | 2017-10-26 21:22:23.665 | 2017-10-26 21:20:36 | 67 | 0.11 | 2017-10-26T22:35:03.393Z | 8 | 36 | 23 | ... | 2747080.00 | 2017-10-27T03:27:00.000Z | 147 | 8.00 | 19.00 | 12.00 | 11.00 | 42.00 | 50.00 | 34.00 |
ac204bbe8809292e367036ba771b54ca74182593 | https://www.buzzfeed.com/karlazabludovsky/thes... | These Women Tried To Take Hashtag Activism Int... | 2017-10-29 14:22:21.368 | 2017-10-28 13:28:16 | 0 | 0.00 | 2017-10-29T14:34:05.941Z | 0 | 0 | 0 | ... | nan | NaN | 147 | 0.00 | 19.00 | 15.00 | 0.00 | 34.00 | 34.00 | 34.00 |
7491dfb6c163d82d7010eb957bef64dd17ceabd2 | https://www.buzzfeed.com/craigsilverman/rememb... | Myspace Looked Like It Was Back. Actually, It ... | 2017-10-27 15:37:16.042 | 2017-10-27 15:16:17 | 126 | 0.39 | 2017-10-27T16:49:07.055Z | 14 | 41 | 71 | ... | 2747403.00 | 2017-10-27T16:20:27.000Z | 147 | 11.00 | 19.00 | 15.00 | 11.00 | 45.00 | 56.00 | 34.00 |
b60e623d5f4a6d1295d7415b784122f18fa8e871 | https://www.buzzfeed.com/adriancarrasquillo/th... | The Trump (Alternate) Reality Show | 2017-10-09 22:13:24.662 | 2017-10-09 21:49:37 | 119 | 0.62 | 2017-10-10T00:25:09.375Z | 36 | 60 | 23 | ... | 2714352.00 | 2017-10-09T23:32:00.000Z | 147 | 11.00 | 19.00 | 14.00 | 11.00 | 44.00 | 55.00 | 33.00 |
de1121cdb0ba8b77f91e9a68c7b5c4009054a70d | https://www.buzzfeed.com/hillarycrosleycoker/w... | Untitled Draft 10/06/2017 5:43 PM | 2017-10-08 15:29:11.458 | 2017-10-07 00:29:42 | 1 | 0.00 | 2017-10-08T15:40:07.699Z | 0 | 0 | 1 | ... | nan | NaN | 147 | 0.00 | 19.00 | 13.00 | 0.00 | 32.00 | 32.00 | 32.00 |
5b9ee08340d9d6f6c0d6dca6042220bf4bcb2721 | https://www.buzzfeed.com/johnstanton/so-many-f... | So Many Father-Led Families Are Crossing The U... | 2017-10-23 15:19:26.080 | 2017-10-23 15:16:49 | 185 | 0.60 | 2017-10-23T17:31:12.251Z | 44 | 89 | 52 | ... | 2745048.00 | 2017-10-23T16:21:42.000Z | 147 | 13.00 | 19.00 | 15.00 | 11.00 | 45.00 | 58.00 | 32.00 |
320c913f5a57cb5b013ec7be22d48390164dd1b4 | https://www.buzzfeed.com/paulmcleod/5-ways-pre... | 5 Ways President Trump Could Undermine Obamaca... | 2017-10-04 21:01:25.176 | 2017-10-04 20:57:53 | 73 | 0.59 | 2017-10-05T03:16:03.440Z | 19 | 43 | 11 | ... | 2712352.00 | 2017-10-05T02:44:00.000Z | 147 | 8.00 | 19.00 | 9.00 | 11.00 | 39.00 | 47.00 | 31.00 |
b47452ed5f2cb6318349dfb6a3236caa0cbcdec5 | https://www.buzzfeed.com/maryanngeorgantopoulo... | New York City Police Are Investigating Whether... | 2017-10-12 14:37:25.142 | 2017-10-12 14:34:47 | 100 | 0.43 | 2017-10-12T17:51:06.310Z | 21 | 53 | 26 | ... | 2725035.00 | 2017-10-12T17:24:22.000Z | 147 | 10.00 | 19.00 | 11.00 | 11.00 | 41.00 | 51.00 | 31.00 |
c4b8b37c6ceab237e6a8a93e2e04a843eb09918a | https://www.buzzfeed.com/venessawong/sweet-hyp... | Tom Brady Is A Health Nut. He's Also An Invest... | 2017-10-18 19:46:30.979 | 2017-10-18 16:58:10 | 9 | 0.10 | 2017-10-18T19:57:08.176Z | 1 | 0 | 8 | ... | nan | NaN | 147 | 2.00 | 19.00 | 14.00 | 0.00 | 33.00 | 35.00 | 31.00 |
18ced93ccf538cac448b4fb306877cb0947f2256 | https://www.buzzfeed.com/kelseymckinney/missin... | Kelly Clarkson Isn't Afraid To Get Political | 2017-10-28 01:49:07.796 | 2017-10-27 18:55:31 | 186 | 0.30 | 2017-10-28T02:00:06.264Z | 15 | 145 | 26 | ... | 2749535.00 | 2017-10-30T00:44:00.000Z | 147 | 13.00 | 19.00 | 14.00 | 11.00 | 44.00 | 57.00 | 31.00 |
05982decb78deb49ed11c760acaef4e178c5c4d9 | https://www.buzzfeed.com/nicolenguyen/iphone-x... | Life Without A Home Button: The iPhone X Review | 2017-10-31 10:01:13.571 | 2017-10-28 18:38:14 | 1 | 0.10 | 2017-10-31T10:12:07.580Z | 0 | 0 | 1 | ... | nan | NaN | 147 | 0.00 | 19.00 | 12.00 | 0.00 | 31.00 | 31.00 | 31.00 |
38b3de48740205c6e5fcdbdc8be85b48acc19711 | https://www.buzzfeed.com/nathanieljanowitz/nar... | Narco Rap Is Hip-Hop’s Most Dangerous Game | 2017-10-08 20:24:13.762 | 2017-10-06 02:31:51 | 2 | 0.10 | 2017-10-08T20:35:08.810Z | 0 | 0 | 2 | ... | nan | NaN | 147 | 1.00 | 19.00 | 13.00 | 0.00 | 32.00 | 33.00 | 31.00 |
8d7362720fea9158facbee073354948f3b846c28 | https://www.buzzfeed.com/borzoudaragahi/us-pla... | US Plans For Victory In Afghanistan Could End ... | 2017-10-28 14:46:18.737 | 2017-10-26 18:42:07 | 2 | 0.20 | 2017-10-28T14:57:07.689Z | 0 | 0 | 2 | ... | nan | NaN | 147 | 1.00 | 19.00 | 13.00 | 0.00 | 32.00 | 33.00 | 31.00 |
0e4ad3fc3a855762723b2438525a82252f93d5d3 | https://www.buzzfeed.com/morganshanahan/workin... | Working For Harvey Weinstein Taught Me What Ra... | 2017-10-20 18:39:11.862 | 2017-10-13 23:23:48 | 6 | 0.00 | 2017-10-20T18:50:10.877Z | 0 | 1 | 5 | ... | nan | NaN | 147 | 2.00 | 19.00 | 14.00 | 0.00 | 33.00 | 35.00 | 31.00 |
fa9ad17dbcd5db37d3dfb331a1426a8a3c7e3116 | https://www.buzzfeed.com/nidhiprakash/lost-his... | Adrian Lost His Home During Maria. He's Back A... | 2017-10-01 20:52:30.257 | 2017-10-01 19:10:05 | 79 | 0.35 | 2017-10-02T05:07:11.362Z | 1 | 64 | 14 | ... | 2709223.00 | 2017-10-02T04:01:42.000Z | 147 | 9.00 | 19.00 | 10.00 | 11.00 | 40.00 | 49.00 | 31.00 |
7c39ccb58f94bdbe57fad67af14692806c347d57 | https://www.buzzfeed.com/franciswhittaker/spai... | Spain Has Said It Will Impose Direct Rule Over... | 2017-10-19 09:01:17.429 | 2017-10-19 08:47:06 | 88 | 0.79 | 2017-10-19T09:12:08.044Z | 7 | 57 | 24 | ... | 2738044.00 | 2017-10-19T09:02:40.000Z | 147 | 9.00 | 19.00 | 9.00 | 11.00 | 39.00 | 48.00 | 30.00 |
28014b7339c6f16c897d4ae416d388a57cec60de | https://www.buzzfeed.com/mollyhensleyclancy/mi... | Mike Pence's Closest Ally Is Helping The Shady... | 2017-10-25 18:10:23.537 | 2017-10-24 22:24:25 | 15 | 0.10 | 2017-10-25T19:23:03.906Z | 0 | 2 | 13 | ... | nan | NaN | 147 | 3.00 | 19.00 | 14.00 | 0.00 | 33.00 | 36.00 | 30.00 |
f5a4f3299d5df22bc47653c1ecf7d036389c835d | http://www.independent.co.uk/news/uk/crime/gro... | Grooming gangs 'are abusing girls across the c... | 2017-10-08 19:34:24.651 | 2017-10-06 17:38:44 | 0 | 0.00 | 2017-10-08T19:45:11.610Z | 0 | 0 | 0 | ... | 7610637.00 | 2017-10-09T09:08:24.000Z | 386 | 0.00 | 16.00 | 1.00 | 13.00 | 30.00 | 30.00 | 30.00 |
25 rows × 26 columns
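As an aside, the two ends of the `score_diff` ranking can also be pulled out with `nlargest` and `nsmallest` instead of a full sort. This is only a sketch on a made-up four-row frame (the labels `a` to `d` are hypothetical), not a change to the analysis.

```python
import pandas as pd

# Toy scores (made up) covering both directions of the gap
toy = pd.DataFrame(
    {
        "promotion_score": [43.0, 0.0, 30.0, 1.0],
        "response_score": [0.0, 50.0, 0.0, 50.0],
    },
    index=["a", "b", "c", "d"],
)
toy["score_diff"] = toy.promotion_score - toy.response_score

# Heavily promoted but little response: largest positive gaps
over_promoted = toy.nlargest(2, "score_diff")

# Strong response with little promotion: most negative gaps
under_promoted = toy.nsmallest(2, "score_diff")

print(over_promoted)
print(under_promoted)
```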
# high response but not promoted
data.sort_values("score_diff", ascending=True).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
71e81f0fcfa22600f98d3cc4ee27936533a4dd6d | https://www.nytimes.com/2017/10/19/opinion/lup... | Lupita Nyong’o: What Harvey Weinstein Did to Me | 2017-10-19 23:10:05.465 | 2017-10-19 23:07:39.000 | 220864 | 348.36 | 2017-10-20T01:13:06.889Z | 17754 | 170298 | 32812 | ... | nan | NaN | 120 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
46291bc86b71eb9568baeacc57bb29da4aa758c4 | http://www.washingtonpost.com/video/politics/t... | Trump says he 'met with the president of the V... | 2017-10-13 15:04:21.372 | 2017-10-13 14:40:20.000 | 108872 | 120.46 | 2017-10-13T21:35:05.768Z | 39459 | 57244 | 12169 | ... | nan | NaN | 191 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
8583d374f97cb80471a866e026c2a2114ac22b11 | https://www.buzzfeed.com/karlazabludovsky/miss... | Em vez de suas medidas, as candidatas a Miss P... | 2017-10-31 12:34:10.273 | 2017-10-31 12:33:01.000 | 85716 | 100.23 | 2017-10-31T20:07:10.376Z | 1614 | 80030 | 4072 | ... | nan | NaN | 147 | 49.00 | 0.00 | 0.00 | 0.00 | 0.00 | 49.00 | -49.00 |
c49994bf9bef90ef3cf7ce862b2d09a51199cba1 | https://www.rt.com/on-air/407672-white-house-n... | White House holds news briefing | 2017-10-24 19:54:26.936 | 2017-10-24 19:54:26.936 | 177442 | 2947.13 | 2017-10-24T23:06:06.316Z | 24436 | 10728 | 142278 | ... | nan | NaN | 365 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
d6eab40c5d349770d620186b1b1bfdad866a6846 | https://www.washingtonpost.com/news/answer-she... | 9 million kids get health insurance under CHIP... | 2017-10-01 17:31:23.893 | 2017-10-01 17:26:15.000 | 111959 | 85.87 | 2017-10-02T16:20:07.039Z | 14799 | 72543 | 24617 | ... | nan | NaN | 191 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
d6bbd1d2c466964efa980539db7814a33b8c2e20 | https://www.rt.com/on-air/407022-candlelit-mar... | Candlelit march held in Barcelona in solidarit... | 2017-10-17 17:54:24.750 | 2017-10-17 17:52:46.000 | 177581 | 2902.54 | 2017-10-17T22:07:04.448Z | 24414 | 10853 | 142314 | ... | nan | NaN | 365 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
93d2e448be2afe095cdf31db88283f7d635eb3c0 | http://www.foxnews.com/us/2017/10/30/miami-art... | Miami art professor turns American flags into ... | 2017-10-30 21:49:14.134 | 2017-10-30 18:16:40.000 | 157318 | 67.36 | 2017-11-02T16:59:10.278Z | 45507 | 91575 | 20236 | ... | nan | NaN | 285 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
f9695aa2c3ffc63dd26bf44f6a0cb6db2f93c28d | http://www.foxnews.com/us/2017/10/08/vice-pres... | Vice President Mike Pence leaves Colts-49ers g... | 2017-10-08 17:39:14.984 | 2017-10-08 17:30:30.000 | 904129 | 3142.79 | 2017-10-08T19:13:11.760Z | 118524 | 749560 | 36045 | ... | nan | NaN | 285 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
0ecdbba916a0f44b8091a64825d318312ceb86e1 | https://www.nbcnews.com/storyline/hurricane-ha... | Former presidents call for unity at hurricane ... | 2017-10-22 02:54:14.794 | 2017-10-22 02:51:38.000 | 83119 | 235.52 | 2017-10-22T05:16:11.007Z | 2938 | 76430 | 3751 | ... | nan | NaN | 826 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
85451623cabf0c4f4b2a628b63c45a9c3d5223be | https://www.buzzfeed.com/juliegerstein/aparent... | Aparentemente muitos homens não estão limpando... | 2017-10-20 17:16:27.237 | 2017-10-20 17:11:08.000 | 70541 | 54.90 | 2017-10-20T19:41:08.448Z | 28801 | 37496 | 4244 | ... | nan | NaN | 147 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
dfdd5c1f0626392c60c7fc785753ac61e23a4a76 | http://www.foxnews.com/politics/2017/10/05/veg... | Vegas survivor: Shot in leg or not, I'm standi... | 2017-10-05 10:04:09.875 | 2017-10-05 09:10:49.000 | 382395 | 1039.10 | 2017-10-05T21:55:10.762Z | 31082 | 335970 | 15343 | ... | nan | NaN | 285 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
506176e75a9998207c4d4391b4a6d891624d9578 | http://www.foxnews.com/entertainment/2017/10/0... | Tom Petty rushed to hospital in full cardiac a... | 2017-10-02 19:29:16.977 | 2017-10-02 19:23:32.000 | 101301 | 870.20 | 2017-10-02T20:12:06.132Z | 26812 | 58352 | 16137 | ... | nan | NaN | 285 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
995586179a2b62c4c187efbb842341c2abbcf2ee | http://www.cnn.com/2014/10/16/health/dying-reg... | What the dying really regret | 2017-10-25 17:43:29.996 | 2017-10-25 17:38:54.000 | 75132 | 0.11 | 2017-10-26T03:04:03.689Z | 10061 | 41716 | 23355 | ... | nan | NaN | 105 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
e6eaeb88e30f277baf8afc2f97e192a3a3d27aa7 | http://www.dailymail.co.uk/health/article-4978... | FDA to make smaller condoms for US men's small... | 2017-10-13 18:49:25.881 | 2017-10-13 18:44:15.000 | 79292 | 52.82 | 2017-10-16T16:24:05.886Z | 34180 | 39561 | 5551 | ... | nan | NaN | 158 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
93a629d08de6c04301eba6fc897114bc1bf289d8 | http://www.foxnews.com/tech/2017/10/18/nurse-f... | Nurse fleeing California wildfires puts horse ... | 2017-10-18 13:29:16.707 | 2017-10-18 13:23:23.000 | 73479 | 42.46 | 2017-10-19T00:48:10.628Z | 5939 | 58527 | 9013 | ... | nan | NaN | 285 | 48.00 | 0.00 | 1.00 | 0.00 | 1.00 | 49.00 | -47.00 |
28af9b5c0c790680105df76276c159c45958a969 | https://www.independent.co.uk/life-style/gadge... | Bill Gates and Steve Jobs raised their kids te... | 2017-10-24 12:37:19.509 | 2017-10-24 12:36:56.000 | 62002 | 24.79 | 2017-10-24T21:53:09.678Z | 6642 | 42168 | 13192 | ... | nan | NaN | 386 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
c35287599493e1e4b27c3361a44f8226cd04cb70 | https://www.theguardian.com/technology/2017/oc... | Facebook moving non-promoted posts out of news... | 2017-10-23 14:16:05.569 | 2017-10-23 14:12:22.000 | 78338 | 726.25 | 2017-10-23T19:31:03.676Z | 24161 | 47065 | 7112 | ... | nan | NaN | 142 | 49.00 | 0.00 | 2.00 | 0.00 | 2.00 | 51.00 | -47.00 |
e352ce2c14cfa3c5ded9f882bb90b9da05402e2e | https://www.buzzfeed.com/tatianafarah/foi-assi... | Foi assim que ficou um terreiro de candomblé a... | 2017-10-04 12:52:17.858 | 2017-10-03 19:46:01.000 | 59119 | 172.94 | 2017-10-04T20:44:08.660Z | 4796 | 51898 | 2425 | ... | nan | NaN | 147 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
ca5664929841d69ccf73b822f25891f97e86076e | http://www.cnn.com/videos/politics/2017/10/18/... | Trump to widow: He knew what he signed up for ... | 2017-10-18 02:59:11.292 | 2017-10-18 02:47:34.000 | 83934 | 104.68 | 2017-10-18T13:22:08.291Z | 31597 | 41668 | 10669 | ... | nan | NaN | 105 | 49.00 | 0.00 | 3.00 | 0.00 | 3.00 | 52.00 | -46.00 |
e0d847f0bb693e793094a0c23a0eaf336825a13e | https://www.independent.co.uk/life-style/gadge... | Tesla is sending hundreds of battery packs to ... | 2017-10-01 12:04:25.173 | 2017-10-01 12:00:00.000 | 52437 | 58.07 | 2017-10-01T21:31:03.254Z | 1621 | 46157 | 4659 | ... | nan | NaN | 386 | 46.00 | 0.00 | 0.00 | 0.00 | 0.00 | 46.00 | -46.00 |
50a7d9a142ba7e8e5658ce49ad9f91547f0d77a9 | https://www.nytimes.com/2017/10/12/opinion/boy... | The Fake Wokeness of the Boy Scouts | 2017-10-12 19:19:10.391 | 2017-10-12 19:15:39.000 | 60892 | 29.45 | 2017-10-13T14:41:09.376Z | 20783 | 33141 | 6968 | ... | nan | NaN | 120 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
da37a2f01f73622f42e5beb27b378243862c0928 | http://www.foxnews.com/entertainment/2017/10/1... | Jane Fonda says she's not proud of America | 2017-10-17 12:04:15.903 | 2017-10-17 12:04:15.903 | 66001 | 65.07 | 2017-10-17T19:06:10.125Z | 38311 | 22952 | 4738 | ... | nan | NaN | 285 | 48.00 | 0.00 | 2.00 | 0.00 | 2.00 | 50.00 | -46.00 |
29d8cd062a9294cb9543c1241969947b11bd0607 | https://www.washingtonpost.com/news/politics/w... | Pence, set to attend today’s Indianapolis Colt... | 2017-10-08 17:29:09.707 | 2017-10-08 17:26:00.000 | 61817 | 324.48 | 2017-10-08T19:41:11.305Z | 28310 | 29868 | 3639 | ... | nan | NaN | 191 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
abbed45b5d1204302548c2fb89448d7a9ee4ed3a | https://www.nytimes.com/2017/10/24/us/politics... | Partial Transcript: Jeff Flake’s Speech on the... | 2017-10-24 19:31:12.448 | 2017-10-24 19:29:29.000 | 97127 | 306.80 | 2017-10-25T11:03:06.559Z | 22181 | 62315 | 12631 | ... | nan | NaN | 120 | 50.00 | 0.00 | 4.00 | 0.00 | 4.00 | 54.00 | -46.00 |
1c4f032c3222a7f9d02aa7ec5406f17503dd4fb8 | http://yournewswire.com/morgan-freeman-jail-hi... | Morgan Freeman: 'Jailing Hillary' Best Way To ... | 2017-10-29 19:19:22.587 | 2017-10-29 17:31:04.000 | 97780 | 41.09 | 2017-10-30T23:28:11.292Z | 18882 | 64837 | 14061 | ... | 27003.00 | 2017-10-30T13:51:01.000Z | 22568 | 50.00 | 0.00 | 1.00 | 4.00 | 5.00 | 55.00 | -45.00 |
25 rows × 26 columns
Write that data to a file. Note that the scores here are provisional for two reasons:
data.to_csv("articles_with_provisional_scores_2017-10-01_2017-10-31.csv")
The attention index of an article is composed of four components: a lead score, a front score, a Facebook promotion score, and a response score.
Or, in other words:
\begin{align}
attentionIndex_a &= leadScore_a + frontScore_a + facebookPromotionScore_a + responseScore_a \\
leadScore_a &= 20 \cdot \left( \frac{\min(minsAsLead_a, 60)}{alexaRank_a} \right) \cdot \left( \frac{\min(alexaRank)}{60} \right) \\
frontScore_a &= 15 \cdot \left( \frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a} \right) \cdot \left( \frac{\min(alexaRank \cdot numArticlesOnFront)}{1440} \right) \\
facebookPromotionScore_a &= \begin{cases}
0 & \text{if not shared on brand page} \\
15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} & \text{otherwise}
\end{cases} \\
responseScore_a &= \begin{cases}
0 & \text{if } engagements_a = 0 \\
50 \cdot \frac{\log(\min(engagements_a, limit) + \text{median}(engagements)) - \log(1 + \text{median}(engagements))}{\log(limit + \text{median}(engagements)) - \log(1 + \text{median}(engagements))} & \text{if } engagements_a > 0
\end{cases}
\end{align}
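As a rough illustration, the formulas above can be sketched in pandas as follows. This is a minimal sketch rather than the code that produced the tables in this notebook: the function name `attention_scores` is hypothetical, the value of `limit` is not stated in this section and so is left as a parameter, and treating missing values (for example `num_articles_on_front` for articles that never reached the front page) as a score of zero is an assumption. The scores are also left unrounded.

```python
import numpy as np
import pandas as pd


def attention_scores(df, limit):
    """Return a copy of df with the four component scores and their sum.

    `limit` caps engagements in the response score. Its value is not given
    in this section, so the caller has to supply it (assumption).
    """
    out = df.copy()
    rank = out["alexa_rank"]

    # Lead score: minutes as lead story (capped at 60), scaled by Alexa rank.
    lead = 20 * (out["mins_as_lead"].clip(upper=60) / rank) * (rank.min() / 60)
    out["lead_score"] = lead.fillna(0.0)

    # Front score: minutes on the front page (capped at 1440), scaled by
    # Alexa rank times the number of articles on the front page.
    rank_x_front = rank * out["num_articles_on_front"]
    front = 15 * (out["mins_on_front"].clip(upper=1440) / rank_x_front)
    front = front * (rank_x_front.min() / 1440)
    out["front_score"] = front.fillna(0.0)  # assumption: missing means zero

    # Facebook promotion score: log-scaled brand page likes, zero if the
    # article was not shared on the brand page.
    likes = out["fb_brand_page_likes"]
    shared = out["fb_brand_page"].eq(True) & likes.notna()
    log_likes = np.log(likes)
    span = log_likes.max() - log_likes.min()
    promo = 15 * (log_likes - log_likes.min()) / span
    out["facebook_promotion_score"] = promo.where(shared).fillna(0.0)

    # Response score: log-scaled engagements (capped at `limit`), offset by
    # the median. Zero-engagement rows are masked first and scored as zero.
    eng = out["fb_engagements"]
    med = eng.median()
    capped = eng.where(eng > 0).clip(upper=limit)
    num = np.log(capped + med) - np.log(1 + med)
    den = np.log(limit + med) - np.log(1 + med)
    out["response_score"] = (50 * num / den).fillna(0.0)

    parts = ["lead_score", "front_score",
             "facebook_promotion_score", "response_score"]
    out["attention_index"] = out[parts].sum(axis=1)
    return out


# Toy data for illustration only; these are not rows from the dataset.
toy = pd.DataFrame({
    "mins_as_lead": [120, 0, 30],
    "mins_on_front": [2000, 0, 720],
    "num_articles_on_front": [10, np.nan, 20],
    "fb_brand_page": [True, False, True],
    "fb_brand_page_likes": [1000.0, np.nan, 100.0],
    "alexa_rank": [100, 200, 100],
    "fb_engagements": [1000, 0, 10],
})
scored = attention_scores(toy, limit=1000)
```

In the toy data, the first article hits every cap and has the best rank, so it receives the maximum in each component (20, 15, 15, and 50), while the second article, which was never promoted and drew no engagements, scores zero throughout.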