The idea of the attention index is to provide a score that indicates the impact of an article, and can easily be aggregated by subject, publisher or other axis.
The index comprises two parts: a promotion score and a response score.
The index will be a number between 0 and 100. 50% is driven by the promotion, and 50% by response:
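The promotion score should take into account how long the article was the lead story on the publisher's home page, how long it was on the front page, and whether it was posted to the publisher's Facebook brand page.
It should be scaled by the value of that promotion, so a popular, well-visited site should score higher than one on the fringes. Similarly, a powerful, well-followed brand page should score higher than one with fewer followers.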
The response score takes into account the number of engagements on Facebook.
The rest of this notebook explores how those numbers could work, starting with the response score because that is easier, I think.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = pd.read_csv("articles_2017-08-01_2017-08-31.csv", index_col="id",
                   parse_dates=["published", "discovered"])
data.head()
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
103d30e1ada75aa031d84f88af50d113184ea4ed | https://www.theguardian.com/lifeandstyle/2017/... | Break down barriers to breastfeeding in the UK | 2017-08-01 00:03:02.887 | 2017-08-01 00:00:02.000 | 294 | 0.822802 | 2017-08-01T08:32:01.637Z | 20 | 220 | 54 | The Guardian | theguardian_com | 0 | 449 | 85.0 | False | NaN | NaN | 142 |
dedd4245cfa237afcd5f543960e04ae90b26ac48 | https://www.thetimes.co.uk/article/breastfeedi... | Breastfeeding ‘should be a school subject’ | 2017-08-01 00:00:03.958 | 2017-08-01 00:00:03.958 | 41 | 0.112903 | 2017-08-01T14:42:00.635Z | 21 | 12 | 8 | The Times | thetimes_co_uk | 0 | 1380 | 238.0 | False | NaN | NaN | 6435 |
18c2ab6d72b11232f9df96b3748b26519368b810 | http://www.huffingtonpost.com/entry/washington... | Washington's Marijuana Legalization: The Kids ... | 2017-08-01 00:03:18.131 | 2017-08-01 00:00:20.436 | 122 | 0.258134 | 2017-08-01T08:32:01.645Z | 7 | 70 | 45 | HuffPost | huffingtonpost_com | 0 | 0 | NaN | False | NaN | NaN | 215 |
e798d89b959cf713fe3658c2ee296af4bc9361e5 | http://www.huffingtonpost.com/entry/an-ounce-o... | An Ounce of Love Creates a Ton of Healing | 2017-08-01 00:03:18.059 | 2017-08-01 00:00:23.485 | 4 | 0.016133 | 2017-08-01T20:56:00.086Z | 0 | 1 | 3 | HuffPost | huffingtonpost_com | 0 | 0 | NaN | False | NaN | NaN | 215 |
092977d2eacec21b414928bf580d5caeb849e9c0 | https://www.washingtonpost.com/news/retropolis... | The only communications director booted faster... | 2017-08-01 00:03:09.289 | 2017-08-01 00:00:28.000 | 1611 | 11.112903 | 2017-08-01T01:18:02.072Z | 198 | 1266 | 147 | The Washington Post | washingtonpost_com | 0 | 1020 | 74.0 | True | 5962099.0 | 2017-08-01T00:15:15.000Z | 191 |
The response score is a number between 0 and 50 that indicates the level of response to an article.
Perhaps in the future we may choose to include other factors, but for now we just include engagements on Facebook. The maximum score of 50 should be achieved by an article that does really well compared with others.
pd.options.display.float_format = '{:.2f}'.format
data.fb_engagements.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    148391.00
mean       1542.71
std       12427.44
min           0.00
50%          29.00
75%         300.00
90%        2085.00
95%        5631.50
99%       27998.00
99.5%     46698.40
99.9%    130402.22
max     2362234.00
Name: fb_engagements, dtype: float64
There are a few articles there with more than 1 million engagements, so let's double-check those.
data[data.fb_engagements > 1000000]
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
478a416c259d9028be1860e8b63518fae553fa8f | http://www.huffingtonpost.com/entry/tina-fey-s... | Tina Fey Absolutely Destroys Nazis, Trump & Pa... | 2017-08-18 07:55:03.814 | 2017-08-18 07:46:45 | 1053024 | 971.27 | 2017-08-18T15:50:00.454Z | 122164 | 815305 | 115555 | HuffPost | huffingtonpost_com | 0 | 1909 | 28.00 | True | 9683181.00 | 2017-08-18T12:30:59.000Z | 215 |
01c376d7653532da2d4251733e10e3f400ca4afa | http://www.bbc.co.uk/news/entertainment-arts-4... | Sir Bruce Forsyth: TV legend dies aged 89 | 2017-08-18 15:36:07.209 | 2017-08-18 15:35:22 | 1033767 | 8477.44 | 2017-08-18T16:02:02.360Z | 125104 | 695601 | 213062 | BBC | bbc_co_uk | 0 | 1134 | 55.00 | True | 43514982.00 | 2017-08-18T15:36:50.000Z | 96 |
3a62bf1003c686e38634844245aeb7a11f1c2be7 | http://www.independent.co.uk/news/world/americ... | Trapped Mexican bakers make 'pan dulce' bread ... | 2017-08-30 20:42:20.971 | 2017-08-30 18:47:44 | 2362234 | 2191.75 | 2017-08-31T03:50:01.773Z | 80669 | 2064103 | 217462 | The Independent | independent_co_uk | 0 | 434 | 221.00 | True | 7368466.00 | 2017-08-30T23:01:00.000Z | 386 |
data.fb_engagements.mode()
0 0 dtype: int64
Going back to the engagement counts, we see the mean is 1,543, the mode is zero, the median is 29, the 90th percentile is 2,085, the 99th percentile is 27,998 and the 99.5th percentile is 46,698. The standard deviation is 12,427, far higher than the mean, so this is not a normal distribution.
We want a sensible way of allocating these counts to the 50 buckets we have available. Let's start with 50 equal-width buckets:
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
plt.figure(figsize=(12,4.5))
plt.hist(data.fb_engagements, bins=50)
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(median, label=f'Median ({median:,.0f})', color='red')
leg = plt.legend()
Well, that's not very useful. Almost everything would land in the lowest bucket if we just did that, which isn't a useful metric.
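To make the problem with equal-width buckets concrete, here is a minimal sketch on synthetic, heavy-tailed counts. The log-normal sample and its parameters are assumptions chosen purely for illustration; they are not the notebook's dataset. It measures what share of values falls into the first of 50 equal-width bins.

```python
import numpy as np

# Synthetic stand-in for engagement counts (an assumption, not the real data):
# a log-normal sample has a small median and a very long right tail.
rng = np.random.default_rng(0)
fake_engagements = np.floor(rng.lognormal(mean=3.4, sigma=2.5, size=100_000))

# 50 equal-width bins between the smallest and largest value.
counts, edges = np.histogram(fake_engagements, bins=50)
first_bin_share = counts[0] / counts.sum()

print(f"Width of each bin: {edges[1] - edges[0]:,.0f}")
print(f"Share of values in the first bin: {first_bin_share:.1%}")
```

With a tail this long, a handful of extreme values sets the bin width, so nearly every article ends up in the first bin.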
Let's start by excluding zeros.
non_zero_fb_enagagements = data.fb_engagements[data.fb_engagements > 0]
plt.figure(figsize=(12,4.5))
plt.hist(non_zero_fb_enagagements, bins=50)
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(median, label=f'Median ({median:,.0f})', color='red')
leg = plt.legend()
That's still a big number at the bottom, and so not a useful score.
Next, we cap the outliers at the 99.9th percentile (about 130,402 in this dataset), so that roughly 0.1% of articles receive the maximum score.
non_zero_fb_enagagements_without_outliers = non_zero_fb_enagagements.clip(upper=data.fb_engagements.quantile(.999))
plt.figure(figsize=(12,4.5))
plt.hist(non_zero_fb_enagagements_without_outliers, bins=50)
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(median, label=f'Median ({median:,.0f})', color='red')
leg = plt.legend()
That's a bit better, but still far too clustered at the low end. Let's look at the engagements on a log scale instead.
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(12,4.5))
plt.hist(np.log(non_zero_fb_enagagements + median), bins=50)
plt.axvline(np.log(mean), linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(np.log(median), label=f'Median ({median:,.0f})', color='green')
plt.axvline(np.log(ninety), linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(np.log(ninetyfive), linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
plt.axvline(np.log(ninetynine), linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
That's looking a bit more interesting.
After some exploration, to avoid too much emphasis on the lower end of the scale, we move the numbers to the right a bit by adding on the median.
log_engagements = (non_zero_fb_enagagements
                   # clip_upper was removed from pandas; clip(upper=...) is equivalent
                   .clip(upper=data.fb_engagements.quantile(.999))
                   .apply(lambda x: np.log(x + median))
                  )
log_engagements.describe()
count   121131.00
mean         5.19
std          1.81
min          3.40
25%          3.66
50%          4.55
75%          6.26
max         11.78
Name: fb_engagements, dtype: float64
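Use standard (min-max) feature scaling to bring that to a 0 to 50 range, rounding up to whole points.
def scale_log_engagements(engagements_logged):
    # min-max scale against the logged, capped, non-zero engagements
    return np.ceil(
        50 * (engagements_logged - log_engagements.min()) / (log_engagements.max() - log_engagements.min())
    )
def scale_engagements(engagements):
    # same shift by the median that was used to build log_engagements
    return scale_log_engagements(np.log(engagements + median))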
scaled_non_zero_engagements = scale_log_engagements(log_engagements)
scaled_non_zero_engagements.describe()
count   121131.00
mean        11.14
std         10.85
min          0.00
25%          2.00
50%          7.00
75%         18.00
max         50.00
Name: fb_engagements, dtype: float64
# add in the zeros, as zero
scaled_engagements = pd.concat([scaled_non_zero_engagements, data.fb_engagements[data.fb_engagements == 0]])
proposed = pd.DataFrame({"fb_engagements": data.fb_engagements, "response_score": scaled_engagements})
proposed.response_score.plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x10faabf98>
Now look at how engagement counts (labelled "shares" in the plot) map to scores:
plt.figure(figsize=(15,8))
shares = np.arange(1, 60000)
plt.plot(shares, scale_engagements(shares))
plt.xlabel("shares")
plt.ylabel("score")
plt.axhline(scale_engagements(mean), linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axhline(scale_engagements(median), label=f'Median ({median:,.0f})', color='green')
plt.axhline(scale_engagements(ninety), linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axhline(scale_engagements(ninetyfive), linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
plt.axhline(scale_engagements(ninetynine), linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
plt.legend(frameon=True, shadow=True)
<matplotlib.legend.Legend at 0x1090fd1d0>
proposed.groupby("response_score").fb_engagements.agg([np.size, np.min, np.max])
size | amin | amax | |
---|---|---|---|
response_score | |||
0.00 | 35077 | 0 | 1 |
1.00 | 16107 | 2 | 6 |
2.00 | 9464 | 7 | 12 |
3.00 | 7642 | 13 | 20 |
4.00 | 5955 | 21 | 29 |
5.00 | 5346 | 30 | 40 |
6.00 | 4387 | 41 | 52 |
7.00 | 4140 | 53 | 67 |
8.00 | 3831 | 68 | 85 |
9.00 | 3495 | 86 | 106 |
10.00 | 3356 | 107 | 131 |
11.00 | 3112 | 132 | 160 |
12.00 | 3092 | 161 | 195 |
13.00 | 2775 | 196 | 235 |
14.00 | 2765 | 236 | 284 |
15.00 | 2550 | 285 | 341 |
16.00 | 2440 | 342 | 408 |
17.00 | 2409 | 409 | 488 |
18.00 | 2317 | 489 | 583 |
19.00 | 1982 | 584 | 694 |
20.00 | 2101 | 695 | 826 |
21.00 | 1858 | 827 | 983 |
22.00 | 1810 | 984 | 1167 |
23.00 | 1715 | 1168 | 1385 |
24.00 | 1645 | 1386 | 1643 |
25.00 | 1576 | 1644 | 1949 |
26.00 | 1487 | 1950 | 2309 |
27.00 | 1400 | 2310 | 2736 |
28.00 | 1371 | 2737 | 3240 |
29.00 | 1199 | 3241 | 3837 |
30.00 | 1195 | 3838 | 4542 |
31.00 | 1076 | 4543 | 5376 |
32.00 | 1075 | 5377 | 6362 |
33.00 | 942 | 6366 | 7528 |
34.00 | 803 | 7531 | 8903 |
35.00 | 601 | 8908 | 10532 |
36.00 | 556 | 10537 | 12463 |
37.00 | 565 | 12466 | 14730 |
38.00 | 563 | 14743 | 17436 |
39.00 | 422 | 17443 | 20610 |
40.00 | 406 | 20629 | 24375 |
41.00 | 346 | 24405 | 28839 |
42.00 | 276 | 28864 | 34097 |
43.00 | 231 | 34126 | 40296 |
44.00 | 211 | 40349 | 47634 |
45.00 | 142 | 47773 | 56179 |
46.00 | 129 | 56432 | 66572 |
47.00 | 99 | 66743 | 78676 |
48.00 | 94 | 78958 | 92860 |
49.00 | 58 | 93911 | 110017 |
50.00 | 197 | 110686 | 2362234 |
Looks good to me, let's save that.
data["response_score"] = proposed.response_score
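The steps used to build the response score can be collected into one function. This is a minimal sketch rather than the notebook's own code: the function name and its `limit` and `median` parameters are hypothetical, and the values passed in the example (a median of 29 and a 99.9th percentile of 130,402.22) are taken from the `describe()` output earlier.

```python
import numpy as np

def response_score(engagements, limit, median):
    """Map raw engagement counts to a 0-50 score (hypothetical helper)."""
    engagements = np.asarray(engagements, dtype=float)
    low = np.log(1 + median)        # logged score of the smallest non-zero count
    high = np.log(limit + median)   # logged score at the cap
    # Lower bound of 1 only keeps the arithmetic uniform for zeros,
    # which are overwritten with 0 below.
    capped = np.clip(engagements, 1, limit)
    scaled = 50 * (np.log(capped + median) - low) / (high - low)
    return np.where(engagements > 0, np.ceil(scaled), 0.0)

scores = response_score([0, 1, 29, 30, 3_000_000], limit=130402.22, median=29)
print(scores)
```

The scores for 29 and 30 engagements come out as 4 and 5, which agrees with the bucket boundaries in the table above.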
The maximum of 50 points is awarded when the engagements are greater than the 99.9th percentile, calculated over a rolling one-month window.
That is, where $limit$ is the 99.9th percentile of engagements calculated over the previous month, the response score for article $a$ is:
\begin{align}
basicScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ \log(\min(engagements_a,limit) + median(engagements)) & \text{if } engagements_a > 0 \end{cases} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{basicScore_a - \min(basicScore)}{\max(basicScore) - \min(basicScore)} & \text{if } engagements_a > 0 \end{cases} \\
\\
\text{The latter equation can be expanded to:} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a,limit) + median(engagements)) - \log(1 + median(engagements))} {\log(limit + median(engagements)) - \log(1 + median(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}

The aim of the promotion score is to indicate how important the article was to the publisher, by tracking where they chose to promote it. This is a number between 0 and 50 made up of three parts: time as lead on the home page (up to 20 points), time on the front page (up to 15 points), and promotion on the publisher's Facebook brand page (up to 15 points).
The first two should be scaled by the popularity/reach of the home page, for which we use the Alexa rank as a proxy.
The last should be scaled by the popularity/reach of the brand page, for which we use the number of likes the brand page has.
data.mins_as_lead.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count   148391.00
mean         9.64
std        126.32
min          0.00
50%          0.00
75%          0.00
90%          0.00
95%          0.00
99%        269.00
99.5%      590.00
99.9%     1312.66
max      21859.00
Name: mins_as_lead, dtype: float64
As expected, the vast majority of articles don't make it as lead. Let's explore how long publishers typically keep an article as lead.
lead_articles = data[data.mins_as_lead > 0]
lead_articles.mins_as_lead.describe([0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    3815.00
mean      374.78
std       695.67
min         4.00
25%        99.00
50%       190.00
75%       465.00
90%       875.00
95%      1194.00
99%      2163.46
99.5%    2918.80
99.9%    6576.36
max     21859.00
Name: mins_as_lead, dtype: float64
lead_articles.mins_as_lead.plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x10906b550>
For lead, it's significant for an article to be lead at all. So although we want to penalise articles that were lead for only a very short time, we mostly want to award the maximum even if the article wasn't lead for ages. So we'll give maximum points once something has been lead for an hour.
lead_articles.mins_as_lead.clip(upper=60).plot.hist(bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x1091091d0>
We also want to scale this by the Alexa rank, such that the maximum score of 20 points goes to an article that was lead for an hour or more on the most popular site.
So let's explore the Alexa numbers.
alexa_ranks = data.groupby(by="publisher_id").alexa_rank.mean().sort_values()
alexa_ranks
publisher_id bbc_co_uk 96 cnn_com 105 nytimes_com 120 theguardian_com 142 buzzfeed_com 147 dailymail_co_uk 158 washingtonpost_com 191 huffingtonpost_com 215 foxnews_com 285 rt_com 365 telegraph_co_uk 370 independent_co_uk 386 reuters_com 497 npr_org 594 mirror_co_uk 706 nbcnews_com 826 breitbart_com 994 ft_com 1596 economist_com 1825 indy100_com 5014 thetimes_co_uk 6435 newstatesman_com 12769 thecanary_co 15686 propublica_org 16066 yournewswire_com 22568 order-order_com 32515 anotherangryvoice_blogspot_co_uk 77827 westmonster_com 97775 evolvepolitics_com 119412 skwawkbox_org 152475 libdemvoice_org 344992 brexitcentral_com 469149 Name: alexa_rank, dtype: int64
alexa_ranks.plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x10c3d1f60>
Let's try the simple option first: just divide the number of minutes as lead by the Alexa rank. What scale of numbers do we get then?
lead_proposal_1 = lead_articles.mins_as_lead.clip(upper=60) / lead_articles.alexa_rank
lead_proposal_1.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10826d358>
Looks like there's too much of a cluster around 0. Have we massively over penalised the publishers with a high alexa rank?
lead_proposal_1.groupby(data.publisher_id).mean().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x10a6837f0>
Yes. Let's try taking the log of the alexa rank and see if that looks better.
lead_proposal_2 = (lead_articles.mins_as_lead.clip(upper=60) / np.log(lead_articles.alexa_rank))
lead_proposal_2.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10b80bfd0>
lead_proposal_2.groupby(data.publisher_id).describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
publisher_id | ||||||||
anotherangryvoice_blogspot_co_uk | 6.00 | 5.33 | 0.00 | 5.33 | 5.33 | 5.33 | 5.33 | 5.33 |
bbc_co_uk | 97.00 | 12.83 | 1.78 | 0.88 | 13.15 | 13.15 | 13.15 | 13.15 |
breitbart_com | 205.00 | 8.36 | 1.27 | 0.72 | 8.69 | 8.69 | 8.69 | 8.69 |
brexitcentral_com | 21.00 | 4.41 | 0.84 | 0.77 | 4.59 | 4.59 | 4.59 | 4.59 |
buzzfeed_com | 259.00 | 11.80 | 1.08 | 2.81 | 12.02 | 12.02 | 12.02 | 12.02 |
cnn_com | 193.00 | 12.38 | 1.79 | 1.07 | 12.89 | 12.89 | 12.89 | 12.89 |
dailymail_co_uk | 168.00 | 11.49 | 1.56 | 0.99 | 11.85 | 11.85 | 11.85 | 11.85 |
economist_com | 39.00 | 7.31 | 2.02 | 0.53 | 7.99 | 7.99 | 7.99 | 7.99 |
evolvepolitics_com | 54.00 | 4.99 | 0.60 | 2.05 | 5.13 | 5.13 | 5.13 | 5.13 |
foxnews_com | 180.00 | 10.33 | 1.35 | 0.71 | 10.61 | 10.61 | 10.61 | 10.61 |
ft_com | 99.00 | 7.43 | 1.97 | 0.54 | 8.14 | 8.14 | 8.14 | 8.14 |
huffingtonpost_com | 171.00 | 10.74 | 1.68 | 0.74 | 11.17 | 11.17 | 11.17 | 11.17 |
independent_co_uk | 140.00 | 9.53 | 1.85 | 0.84 | 10.07 | 10.07 | 10.07 | 10.07 |
indy100_com | 61.00 | 7.04 | 0.00 | 7.04 | 7.04 | 7.04 | 7.04 | 7.04 |
libdemvoice_org | 94.00 | 4.67 | 0.18 | 3.53 | 4.71 | 4.71 | 4.71 | 4.71 |
mirror_co_uk | 359.00 | 8.45 | 1.67 | 1.37 | 9.15 | 9.15 | 9.15 | 9.15 |
nbcnews_com | 101.00 | 8.71 | 1.12 | 0.74 | 8.93 | 8.93 | 8.93 | 8.93 |
newstatesman_com | 76.00 | 6.09 | 0.98 | 1.48 | 6.35 | 6.35 | 6.35 | 6.35 |
npr_org | 156.00 | 8.95 | 1.47 | 0.63 | 9.39 | 9.39 | 9.39 | 9.39 |
nytimes_com | 53.00 | 12.51 | 0.17 | 11.28 | 12.53 | 12.53 | 12.53 | 12.53 |
order-order_com | 153.00 | 5.07 | 1.36 | 0.48 | 5.29 | 5.78 | 5.78 | 5.78 |
propublica_org | 28.00 | 6.20 | 0.00 | 6.20 | 6.20 | 6.20 | 6.20 | 6.20 |
reuters_com | 79.00 | 8.92 | 2.20 | 0.64 | 9.66 | 9.66 | 9.66 | 9.66 |
rt_com | 148.00 | 9.48 | 2.01 | 0.68 | 10.17 | 10.17 | 10.17 | 10.17 |
skwawkbox_org | 97.00 | 4.82 | 0.73 | 0.84 | 5.03 | 5.03 | 5.03 | 5.03 |
telegraph_co_uk | 103.00 | 9.95 | 1.09 | 2.37 | 10.15 | 10.15 | 10.15 | 10.15 |
thecanary_co | 204.00 | 4.84 | 1.80 | 0.93 | 3.62 | 6.21 | 6.21 | 6.21 |
theguardian_com | 148.00 | 11.44 | 1.97 | 1.01 | 12.11 | 12.11 | 12.11 | 12.11 |
thetimes_co_uk | 71.00 | 6.81 | 0.28 | 4.45 | 6.84 | 6.84 | 6.84 | 6.84 |
washingtonpost_com | 81.00 | 10.83 | 2.33 | 0.76 | 11.42 | 11.42 | 11.42 | 11.42 |
westmonster_com | 66.00 | 4.77 | 1.24 | 0.35 | 5.22 | 5.22 | 5.22 | 5.22 |
yournewswire_com | 105.00 | 5.82 | 0.46 | 3.79 | 5.99 | 5.99 | 5.99 | 5.99 |
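A quick worked comparison shows why the log helps. The sketch below uses only the Alexa ranks listed earlier for the best-ranked and worst-ranked publishers (96 and 469149) and an article that was lead for the full 60 minutes on each; the variable names are illustrative.

```python
import math

LEAD_CAP_MINS = 60
bbc_rank = 96                 # best (lowest) Alexa rank in the listing
brexitcentral_rank = 469149   # worst (highest) Alexa rank in the listing

# Proposal 1: divide minutes by the raw rank.
linear_ratio = (LEAD_CAP_MINS / bbc_rank) / (LEAD_CAP_MINS / brexitcentral_rank)

# Proposal 2: divide minutes by the log of the rank.
log_ratio = (LEAD_CAP_MINS / math.log(bbc_rank)) / (LEAD_CAP_MINS / math.log(brexitcentral_rank))

print(f"raw rank: top site scores {linear_ratio:,.0f}x the smallest site")
print(f"log rank: top site scores {log_ratio:.2f}x the smallest site")
```

Dividing by the raw rank separates the two sites by a factor of several thousand, whereas the log keeps the gap to a factor of about three, which matches the per-publisher maxima of 13.15 and 4.59 in the table.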
lead_proposal_2.groupby(data.publisher_id).min().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x10bf9ddd8>
That looks about right, as long as the smaller publishers were closer to zero. So let's apply feature scaling to this, to give a number between 1 and 20. (Anything not as lead will pass through as zero.)
def rescale(series):
    # standard min-max feature scaling to the range 0..1
    return (series - series.min()) / (series.max() - series.min())
lead_proposal_3 = np.ceil(20 * rescale(lead_proposal_2))
lead_proposal_2.min(), lead_proposal_2.max()
(0.34811595555636582, 13.145359968846892)
lead_proposal_3.plot.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x108410128>
lead_proposal_3.groupby(data.publisher_id).median().plot.bar(figsize=[10,5])
<matplotlib.axes._subplots.AxesSubplot at 0x10d82bf60>
data["lead_score"] = pd.concat([lead_proposal_3, data.mins_as_lead[data.mins_as_lead==0]])
data.lead_score.value_counts().sort_index()
0.00 144577 1.00 34 2.00 38 3.00 28 4.00 38 5.00 33 6.00 44 7.00 150 8.00 246 9.00 240 10.00 230 11.00 148 12.00 62 13.00 107 14.00 577 15.00 215 16.00 360 17.00 335 18.00 237 19.00 373 20.00 319 Name: lead_score, dtype: int64
data.lead_score.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 20.00 breitbart_com 14.00 brexitcentral_com 7.00 buzzfeed_com 19.00 cnn_com 20.00 dailymail_co_uk 18.00 economist_com 12.00 evolvepolitics_com 8.00 foxnews_com 17.00 ft_com 13.00 huffingtonpost_com 17.00 independent_co_uk 16.00 indy100_com 11.00 libdemvoice_org 7.00 mirror_co_uk 14.00 nbcnews_com 14.00 newstatesman_com 10.00 npr_org 15.00 nytimes_com 20.00 order-order_com 9.00 propublica_org 10.00 reuters_com 15.00 rt_com 16.00 skwawkbox_org 8.00 telegraph_co_uk 16.00 thecanary_co 10.00 theguardian_com 19.00 thetimes_co_uk 11.00 washingtonpost_com 18.00 westmonster_com 8.00 yournewswire_com 9.00 Name: lead_score, dtype: float64
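The lead scoring explored above can be sketched as a single function. This is an illustration under stated assumptions, not the notebook's code: the function and parameter names are hypothetical, it uses the simplified form that scales against the best possible unscaled score instead of min-max rescaling, and it assumes every Alexa rank is greater than 1 so that the log is positive.

```python
import math

def lead_score(mins_as_lead, alexa_rank, best_alexa_rank, cap_mins=60, max_points=20):
    """Score time as lead, weighted by site reach (hypothetical helper)."""
    if mins_as_lead <= 0:
        return 0  # never lead: no points
    unscaled = min(mins_as_lead, cap_mins) / math.log(alexa_rank)
    # The best possible unscaled score: a full hour on the best-ranked site.
    max_unscaled = cap_mins / math.log(best_alexa_rank)
    return math.ceil(max_points * (unscaled / max_unscaled))

# 96 is the best (lowest) Alexa rank in the listing above.
print(lead_score(60, alexa_rank=96, best_alexa_rank=96))       # BBC, lead for an hour
print(lead_score(60, alexa_rank=142, best_alexa_rank=96))      # The Guardian
print(lead_score(60, alexa_rank=469149, best_alexa_rank=96))   # BrexitCentral
```

For these three publishers the sketch gives 20, 19 and 7, the same maxima as the per-publisher listing above.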
In summary then, score for article $a$ is:
$$ unscaledLeadScore_a = \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} $$

$$ leadScore_a = 19 \cdot \frac{unscaledLeadScore_a - \min(unscaledLeadScore)} {\max(unscaledLeadScore) - \min(unscaledLeadScore)} + 1 $$

Since the minimum value of $minsAsLead$ is 1, $\min(unscaledLeadScore)$ is pretty insignificant. So we can simplify this to:

$$ leadScore_a = 20 \cdot \frac{unscaledLeadScore_a } {\max(unscaledLeadScore)} $$

or, because the largest unscaled score belongs to an article that was lead for 60 minutes on the site with the lowest (best) Alexa rank:

$$ leadScore_a = 20 \cdot \frac{\frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} } {\frac{60}{\log(\min(alexaRank))}} $$

$$ leadScore_a = 20 \cdot \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} \cdot \frac{\log(\min(alexaRank))}{60} $$

Time on the front page is similar to time as lead, so let's try doing the same calculation, except we also want to factor in the number of slots on the front:
$$ frontScore_a = 15 \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \left( \frac{\min(alexaRank \cdot numArticlesOnFront)}{1440} \right) $$

(data.alexa_rank * data.num_articles_on_front).min() / 1440
2.4500000000000002
time_on_front_proposal_1 = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15)
time_on_front_proposal_1.plot.hist(figsize=(15, 7), bins=15)
<matplotlib.axes._subplots.AxesSubplot at 0x10e9307f0>
time_on_front_proposal_1.value_counts().sort_index()
1.00 65394 2.00 8125 3.00 5301 4.00 4167 5.00 1027 6.00 646 7.00 498 8.00 486 9.00 818 10.00 296 11.00 287 12.00 385 13.00 209 14.00 70 15.00 14 dtype: int64
time_on_front_proposal_1.groupby(data.publisher_id).sum()
publisher_id anotherangryvoice_blogspot_co_uk 6.00 bbc_co_uk 15580.00 breitbart_com 2711.00 brexitcentral_com 23.00 buzzfeed_com 10173.00 cnn_com 12205.00 dailymail_co_uk 13680.00 economist_com 270.00 evolvepolitics_com 68.00 foxnews_com 8774.00 ft_com 2823.00 huffingtonpost_com 8296.00 independent_co_uk 4425.00 indy100_com 528.00 libdemvoice_org 94.00 mirror_co_uk 9846.00 nbcnews_com 1819.00 newstatesman_com 468.00 npr_org 2727.00 nytimes_com 9632.00 order-order_com 157.00 propublica_org 44.00 reuters_com 6395.00 rt_com 4161.00 skwawkbox_org 99.00 telegraph_co_uk 6219.00 thecanary_co 221.00 theguardian_com 12530.00 thetimes_co_uk 8663.00 washingtonpost_com 9457.00 westmonster_com 290.00 yournewswire_com 222.00 dtype: float64
That looks good to me.
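As a sanity check, the front-page formula can be written as a small function and applied to the first rows shown by `data.head()`. This is a sketch with hypothetical names; the default `min_rank_times_slots=3528.0` is derived from the 2.45 computed above (2.45 × 1440) and would need recalculating for other data.

```python
import math

def front_score(mins_on_front, alexa_rank, num_articles_on_front,
                min_rank_times_slots=3528.0, cap_mins=1440, max_points=15):
    """Score time on the front page, weighted by reach and slot count (hypothetical helper)."""
    # Articles that never appeared on the front have no slot count (NaN) and score 0.
    if (mins_on_front <= 0 or num_articles_on_front is None
            or math.isnan(num_articles_on_front) or num_articles_on_front <= 0):
        return 0
    time_share = min(mins_on_front, cap_mins) / cap_mins
    reach = min_rank_times_slots / (alexa_rank * num_articles_on_front)
    return math.ceil(max_points * time_share * reach)

print(front_score(449, 142, 85))            # The Guardian row
print(front_score(1380, 6435, 238))         # The Times row
print(front_score(1020, 191, 74))           # The Washington Post row
print(front_score(0, 215, float("nan")))    # HuffPost row, never on the front
```

These four rows score 2, 1, 3 and 0 respectively.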
data["front_score"] = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15).fillna(0)
data.front_score
id 103d30e1ada75aa031d84f88af50d113184ea4ed 2.00 dedd4245cfa237afcd5f543960e04ae90b26ac48 1.00 18c2ab6d72b11232f9df96b3748b26519368b810 0.00 e798d89b959cf713fe3658c2ee296af4bc9361e5 0.00 092977d2eacec21b414928bf580d5caeb849e9c0 3.00 cbe715858a280b4bec6f036f1adc25d189583ca5 1.00 a7205ddbe6b5245c1299437ff9e1bb886793d3f5 1.00 70dd8ba597cd8967bfee062350c97bc0290f9101 3.00 b0675b893643adfe537b7fdf9cf1a97dccdecbcb 3.00 8408c1851ad0eb71d942fae8826833d358feadf0 1.00 4ef338521090ae5c38420602464ef508fd02669f 0.00 089fffad87b278a4475608a6a592fc2740d9de1b 0.00 dbad7af8248f83bb8cd22260a1e1a4663cb077f3 0.00 be6cdcf31d2030055884d90920f3ccd74737448f 1.00 559c71962376c3643bf92498241c33087b01a9af 0.00 fcf8c33726c529f20fa4df003114f63741aa7f1a 0.00 ba91b5f7e9f278b1e9dc41e8a779d3ef3503f5b8 1.00 741174f3b8d6ad5b55171293f21896571d39d65b 3.00 3b3c880b86419cad71dd3984d78f74dff8b0e3c2 0.00 5542902dc091087011443a4bfe830fd4ab350972 1.00 c7571e74bfc10e822d686e0bc5cc253245e6376f 3.00 f96a4be826d39298ec3d9de2aa353128cf174c81 1.00 c155979d2df3437ceb476c35827c7ac74c82c212 0.00 310d932c897ff805d05bfd82ccbae4958e283154 0.00 93f6cf71dfd5b7695977a1f0678eb46b5e23c458 0.00 c31462dbc917793c2a8b207f2bdb03ab86b5f559 0.00 e7defa945afafadf4b07170c4699bcb23084b68e 1.00 d6da6736e5cfb6f188fecb96896d442338f9357d 2.00 7ff739ee4f3b267e57fda44693cfec76cfe3c60f 0.00 4d399c12dc6991996090884fe94dafc17942bacd 0.00 ... 
88088f52ea4a0335c60f014cbfffb3014b57c39d 1.00 77be73c2d6d79d373d621405cc809cd8d91aa537 0.00 ba83628125d0abfeb37479dd3400ade6aa70a6d3 1.00 cd8784b24f847f8aeb24df9343b8251db38587ca 1.00 770d10c8666f5951a87651d6c5ac5ab20fd289bb 1.00 b96b7b0ac8401a3bf864d3475ef1f5bf1d059ccb 4.00 5a525512b43737f09e672ec188dc2a3fe84d4da0 1.00 bdbec64c39b129d37e36a63acd3df574bab8ad13 2.00 3ee9f77b11ee9875dc648fa6921a68233560afee 0.00 3a3f02c7865c743ab8535ea3900776c320d4f763 0.00 3d8cfec0612656eca397dc6a065e4b2fa24474ce 0.00 e1b81f9f159a50ee87b54b144289d518f6c0d7d9 11.00 b597762e798241ea1a5b3539cf67c33ec7036227 0.00 e950ed8e42c9afc3f36164c18e9e1103e4a3639b 3.00 2edb3930c982349109970498a465f51688b47440 6.00 ffc8942d80078db665c60d34c4fcca0d45bcb35c 1.00 61d2c71d22072cf56e5131e65c60bc7a29421561 0.00 15b6d9d0497ac4c931925be571400036693bab40 0.00 c8bd5bc0e73962ff84cd7a1f4e2b2a3d85453cbd 1.00 560980f8907258fabf6c2ae2dfba116a4970c629 1.00 e5687aeb20b4f4310aa0c9d89a7ba37bf362f63c 1.00 2f6933177b8ad230eff56a85bd0aceb65f27e6d0 4.00 4744dee9e500e0981c1dce7cdb26d79bfaf5de60 1.00 f7bcf3e1e1f833643709a7c36c020b3e3c3fe3d8 2.00 744aca0e14a5d36a2d8441ce96c98fb71ea947ed 0.00 867e9f70464bdf4f428dcfda83c8782629403461 0.00 033414864ddc21c06b87d3515e01b89e66a1b0c1 1.00 a452e40c6f5ed9e609f35bbad66aaf7da60cc389 3.00 d26c5e600e9a5983fd72dfe66b648f9c4b196af9 0.00 8bc9b8238e93fc592e1e3be8911c284b0f536c4c 0.00 Name: front_score, Length: 148391, dtype: float64
One way a publisher has of promoting content is to post to their brand page. The significance of doing so is stronger when the brand page has more followers (likes).
$$ facebookPromotionProposed1_a = 15 \left( \frac {brandPageLikes_a} {\max(brandPageLikes)} \right) $$

Now let's explore the data to see if that makes sense. tl;dr: the formula above is incorrect.
data.fb_brand_page_likes.max()
43749031.0
facebook_promotion_proposed_1 = np.ceil((15 * (data.fb_brand_page_likes / data.fb_brand_page_likes.max())).fillna(0))
facebook_promotion_proposed_1.value_counts().sort_index().plot.bar()
<matplotlib.axes._subplots.AxesSubplot at 0x10eacad68>
facebook_promotion_proposed_1.groupby(data.publisher_id).describe()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
publisher_id | ||||||||
anotherangryvoice_blogspot_co_uk | 6.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
bbc_co_uk | 12344.00 | 0.65 | 3.07 | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 |
breitbart_com | 2778.00 | 0.79 | 0.98 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
brexitcentral_com | 23.00 | 0.78 | 0.42 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
buzzfeed_com | 2006.00 | 0.22 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
cnn_com | 3733.00 | 2.54 | 4.35 | 0.00 | 0.00 | 0.00 | 10.00 | 10.00 |
dailymail_co_uk | 24170.00 | 0.58 | 1.60 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 |
economist_com | 533.00 | 1.86 | 1.46 | 0.00 | 0.00 | 3.00 | 3.00 | 3.00 |
evolvepolitics_com | 68.00 | 0.94 | 0.24 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
foxnews_com | 6935.00 | 0.64 | 1.86 | 0.00 | 0.00 | 0.00 | 0.00 | 6.00 |
ft_com | 3002.00 | 0.66 | 0.94 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
huffingtonpost_com | 11210.00 | 0.48 | 1.30 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
independent_co_uk | 6236.00 | 0.67 | 1.25 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
indy100_com | 528.00 | 0.46 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
libdemvoice_org | 95.00 | 0.99 | 0.10 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
mirror_co_uk | 10541.00 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
nbcnews_com | 1891.00 | 2.08 | 2.00 | 0.00 | 0.00 | 4.00 | 4.00 | 4.00 |
newstatesman_com | 472.00 | 0.74 | 0.44 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
npr_org | 2049.00 | 1.46 | 1.50 | 0.00 | 0.00 | 0.00 | 3.00 | 3.00 |
nytimes_com | 4857.00 | 1.54 | 2.31 | 0.00 | 0.00 | 0.00 | 5.00 | 5.00 |
order-order_com | 158.00 | 0.78 | 0.41 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
propublica_org | 46.00 | 0.85 | 0.36 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
reuters_com | 5228.00 | 0.65 | 0.94 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
rt_com | 2160.00 | 1.06 | 1.00 | 0.00 | 0.00 | 2.00 | 2.00 | 2.00 |
skwawkbox_org | 99.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
telegraph_co_uk | 7340.00 | 0.56 | 0.90 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
thecanary_co | 225.00 | 0.98 | 0.15 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
theguardian_com | 8376.00 | 0.53 | 1.14 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
thetimes_co_uk | 8693.00 | 0.06 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
washingtonpost_com | 22061.00 | 0.21 | 0.77 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
westmonster_com | 306.00 | 0.27 | 0.45 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
yournewswire_com | 222.00 | 0.06 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
That's too much variation: sites like the Guardian, which have a respectable 7.5m likes, should not be scoring a 3. Let's try applying a log to it, and then standard feature scaling again.
data.fb_brand_page_likes.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 330139.00 bbc_co_uk 43749031.00 breitbart_com 3603063.00 brexitcentral_com 8877.00 buzzfeed_com 2664517.00 cnn_com 28192524.00 dailymail_co_uk 12290016.00 economist_com 8257357.00 evolvepolitics_com 111989.00 foxnews_com 15707378.00 ft_com 3630731.00 huffingtonpost_com 9712213.00 independent_co_uk 7387560.00 indy100_com 219441.00 libdemvoice_org 8590.00 mirror_co_uk 2836406.00 nbcnews_com 9198972.00 newstatesman_com 154139.00 npr_org 6146183.00 nytimes_com 14517296.00 order-order_com 44064.00 propublica_org 348034.00 reuters_com 3848086.00 rt_com 4563650.00 skwawkbox_org 5007.00 telegraph_co_uk 4329376.00 thecanary_co 154553.00 theguardian_com 7669393.00 thetimes_co_uk 687493.00 washingtonpost_com 5999523.00 westmonster_com 13110.00 yournewswire_com 26183.00 Name: fb_brand_page_likes, dtype: float64
np.log(2149)
7.6727578966425103
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max())
publisher_id anotherangryvoice_blogspot_co_uk 12.71 bbc_co_uk 17.59 breitbart_com 15.10 brexitcentral_com 9.09 buzzfeed_com 14.80 cnn_com 17.15 dailymail_co_uk 16.32 economist_com 15.93 evolvepolitics_com 11.63 foxnews_com 16.57 ft_com 15.10 huffingtonpost_com 16.09 independent_co_uk 15.82 indy100_com 12.30 libdemvoice_org 9.06 mirror_co_uk 14.86 nbcnews_com 16.03 newstatesman_com 11.95 npr_org 15.63 nytimes_com 16.49 order-order_com 10.69 propublica_org 12.76 reuters_com 15.16 rt_com 15.33 skwawkbox_org 8.52 telegraph_co_uk 15.28 thecanary_co 11.95 theguardian_com 15.85 thetimes_co_uk 13.44 washingtonpost_com 15.61 westmonster_com 9.48 yournewswire_com 10.17 Name: fb_brand_page_likes, dtype: float64
That's more like it, but the lower numbers should be smaller.
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max() / 1000)
publisher_id anotherangryvoice_blogspot_co_uk 5.80 bbc_co_uk 10.69 breitbart_com 8.19 brexitcentral_com 2.18 buzzfeed_com 7.89 cnn_com 10.25 dailymail_co_uk 9.42 economist_com 9.02 evolvepolitics_com 4.72 foxnews_com 9.66 ft_com 8.20 huffingtonpost_com 9.18 independent_co_uk 8.91 indy100_com 5.39 libdemvoice_org 2.15 mirror_co_uk 7.95 nbcnews_com 9.13 newstatesman_com 5.04 npr_org 8.72 nytimes_com 9.58 order-order_com 3.79 propublica_org 5.85 reuters_com 8.26 rt_com 8.43 skwawkbox_org 1.61 telegraph_co_uk 8.37 thecanary_co 5.04 theguardian_com 8.94 thetimes_co_uk 6.53 washingtonpost_com 8.70 westmonster_com 2.57 yournewswire_com 3.27 Name: fb_brand_page_likes, dtype: float64
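# express likes in thousands so that small pages get a low log value
scaled_fb_brand_page_likes = data.fb_brand_page_likes / 1000
# scale to 0-15 relative to the largest page, round up, and give 0 to articles with no brand page post
facebook_promotion_proposed_2 = np.ceil(
    15 * (np.log(scaled_fb_brand_page_likes) / np.log(scaled_fb_brand_page_likes.max()))
).fillna(0)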
facebook_promotion_proposed_2.groupby(data.publisher_id).max()
publisher_id
anotherangryvoice_blogspot_co_uk    9.00
bbc_co_uk                           15.00
breitbart_com                       12.00
brexitcentral_com                   4.00
buzzfeed_com                        12.00
cnn_com                             15.00
dailymail_co_uk                     14.00
economist_com                       13.00
evolvepolitics_com                  7.00
foxnews_com                         14.00
ft_com                              12.00
huffingtonpost_com                  13.00
independent_co_uk                   13.00
indy100_com                         8.00
libdemvoice_org                     4.00
mirror_co_uk                        12.00
nbcnews_com                         13.00
newstatesman_com                    8.00
npr_org                             13.00
nytimes_com                         14.00
order-order_com                     6.00
propublica_org                      9.00
reuters_com                         12.00
rt_com                              12.00
skwawkbox_org                       3.00
telegraph_co_uk                     12.00
thecanary_co                        8.00
theguardian_com                     13.00
thetimes_co_uk                      10.00
washingtonpost_com                  13.00
westmonster_com                     4.00
yournewswire_com                    5.00
Name: fb_brand_page_likes, dtype: float64
Looks good to me. So the equation is:
$$ facebookPromotion_a = 15 \left( \frac {\log(\frac {brandPageLikes_a}{1000})} {\log(\frac {\max(brandPageLikes)}{1000})} \right) $$
Now, let's try applying a standard feature scaling approach (min-max scaling) to this, rather than using a magic number of 1,000. That equation would be:
\begin{align}
unscaledFacebookPromotion_a &= \log(brandPageLikes_a) \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \min(unscaledFacebookPromotion)}{\max(unscaledFacebookPromotion) - \min(unscaledFacebookPromotion)} \\
\\
\text{The scaling can be simplified to:} \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} \\
\\
\text{Meaning the overall equation becomes:} \\
facebookPromotion_a &= 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))}
\end{align}
Note that the code scales the range to 14 and adds 1 before rounding up, rather than multiplying by 15, so an article posted to even the smallest brand page scores at least 1. Articles with no brand page post are set to 0 afterwards.
facebook_promotion_proposed_3 = np.ceil(
(14 *
(
(np.log(data.fb_brand_page_likes) - np.log(data.fb_brand_page_likes.min()) ) /
(np.log(data.fb_brand_page_likes.max()) - np.log(data.fb_brand_page_likes.min()))
)
) + 1
)
facebook_promotion_proposed_3.groupby(data.publisher_id).max()
publisher_id
anotherangryvoice_blogspot_co_uk    8.00
bbc_co_uk                           15.00
breitbart_com                       12.00
brexitcentral_com                   2.00
buzzfeed_com                        11.00
cnn_com                             15.00
dailymail_co_uk                     14.00
economist_com                       13.00
evolvepolitics_com                  6.00
foxnews_com                         14.00
ft_com                              12.00
huffingtonpost_com                  13.00
independent_co_uk                   13.00
indy100_com                         7.00
libdemvoice_org                     2.00
mirror_co_uk                        11.00
nbcnews_com                         13.00
newstatesman_com                    7.00
npr_org                             12.00
nytimes_com                         14.00
order-order_com                     5.00
propublica_org                      8.00
reuters_com                         12.00
rt_com                              12.00
skwawkbox_org                       2.00
telegraph_co_uk                     12.00
thecanary_co                        7.00
theguardian_com                     13.00
thetimes_co_uk                      9.00
washingtonpost_com                  12.00
westmonster_com                     3.00
yournewswire_com                    4.00
Name: fb_brand_page_likes, dtype: float64
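The min-max version can also be written as a small standalone function, which makes the edge cases easier to see. This is only a sketch of the calculation in the cell above, not a replacement for it: the function name `facebook_promotion`, its `top` parameter and the like counts in the usage lines are made up for illustration. It takes the minimum and maximum from the values it is given, scales the log of the likes to 1-15, rounds up, and gives 0 where there are no likes (no brand page post).

```python
import numpy as np

def facebook_promotion(likes, top=15):
    """Score brand page likes from 1 to `top` on a log scale; NaN scores 0."""
    logs = np.log(np.asarray(likes, dtype=float))
    # Smallest and largest log value, ignoring articles with no brand page post.
    lo, hi = np.nanmin(logs), np.nanmax(logs)
    scaled = (logs - lo) / (hi - lo)         # 0 for the smallest page, 1 for the largest
    score = np.ceil((top - 1) * scaled + 1)  # 1..top, rounded up
    return np.nan_to_num(score, nan=0.0)     # no brand page post -> 0

# Hypothetical like counts: smallest page, mid-sized page, largest page, no post.
example_scores = facebook_promotion([1_000, 50_000, 1_000_000, np.nan])
print(example_scores)
```

The smallest page lands on 1 and the largest on 15, so a score of 0 is reserved for articles that were never posted to a brand page.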
data["facebook_promotion_score"] = facebook_promotion_proposed_3.fillna(0.0)
data["promotion_score"] = (data.lead_score + data.front_score + data.facebook_promotion_score)
data["attention_index"] = (data.promotion_score + data.response_score)
data.promotion_score.plot.hist(bins=np.arange(50), figsize=(15,6))
[Figure: histogram of promotion_score]
data.attention_index.plot.hist(bins=np.arange(100), figsize=(15,6))
[Figure: histogram of attention_index]
data.attention_index.value_counts().sort_index()
0.00     22256
1.00     19523
2.00     12051
3.00     8175
4.00     6257
5.00     5327
6.00     4421
7.00     3790
8.00     3578
9.00     3139
10.00    2826
11.00    2662
12.00    2523
13.00    2470
14.00    2380
15.00    2117
16.00    2063
17.00    1951
18.00    1878
19.00    1656
20.00    1591
21.00    1587
22.00    1487
23.00    1591
24.00    1406
25.00    1490
26.00    1423
27.00    1381
28.00    1286
29.00    1350
...
65.00    118
66.00    130
67.00    76
68.00    82
69.00    72
70.00    68
71.00    50
72.00    52
73.00    39
74.00    42
75.00    26
76.00    28
77.00    25
78.00    30
79.00    27
80.00    20
81.00    21
82.00    25
83.00    16
84.00    13
85.00    10
86.00    18
87.00    14
88.00    13
89.00    4
90.00    7
91.00    7
92.00    2
93.00    4
94.00    4
Name: attention_index, Length: 95, dtype: int64
# and let's see the articles with the highest attention index
data.sort_values("attention_index", ascending=False)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
5bc8df6cf7ba6fda075fdec7a8a27e5b00a054fb | http://www.bbc.co.uk/news/world-europe-40965581 | Injuries as van hits crowds in Barcelona | 2017-08-17 15:21:04.046 | 2017-08-17 15:19:47.000 | 173441 | 675.67 | 2017-08-17T17:10:00.551Z | 35777 | 107789 | 29875 | ... | True | 43501654.00 | 2017-08-17T21:03:24.000Z | 96 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
f9a431a718aea4611db8680cb9a2b32935932221 | https://www.buzzfeed.com/anupkaphle/south-asia... | South Asia Is Also Experiencing The Worst Floo... | 2017-08-29 10:21:13.664 | 2017-08-29 09:38:35.000 | 405094 | 182.59 | 2017-08-30T01:03:59.596Z | 29194 | 303992 | 71908 | ... | True | 2663443.00 | 2017-09-03T00:10:00.000Z | 147 | 50.00 | 19.00 | 14.00 | 11.00 | 44.00 | 94.00 |
9d3d837c118469c6be614d62b79ccc67d4222ed6 | http://www.cnn.com/2017/08/20/asia/us-navy-des... | US destroyer collides with merchant ship near ... | 2017-08-21 00:03:14.430 | 2017-08-20 23:58:37.000 | 137608 | 267.58 | 2017-08-21T02:27:59.624Z | 42828 | 72468 | 22312 | ... | True | 28047682.00 | 2017-08-21T00:30:29.000Z | 105 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
2f7b62a0d2a4675bfaa7b32ec314ce5acf001ac9 | http://www.cnn.com/2017/08/12/us/charlottesvil... | Torch-bearing white nationalists march ahead o... | 2017-08-12 06:27:27.189 | 2017-08-12 06:22:15.000 | 422742 | 704.31 | 2017-08-12T16:41:59.688Z | 117325 | 250533 | 54884 | ... | True | 28007316.00 | 2017-08-12T16:32:58.000Z | 105 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
5c22bca605c658ebd6d1afbadd26884feb2f2fb4 | https://www.buzzfeed.com/davidmack/ehs-cheerle... | Police Are Investigating This Video Of A Teen ... | 2017-08-24 15:30:12.427 | 2017-08-24 15:04:18.000 | 117849 | 170.40 | 2017-08-24T17:33:59.839Z | 35187 | 70329 | 12333 | ... | True | 2649391.00 | 2017-08-24T23:34:00.000Z | 147 | 50.00 | 19.00 | 13.00 | 11.00 | 43.00 | 93.00 |
8e3d4660760facbf3164d02034b7b726ffd7716b | https://www.buzzfeed.com/kristinharris/taylor-... | Taylor Swift Just Announced Her New Album And ... | 2017-08-23 18:05:10.321 | 2017-08-23 17:02:32.000 | 112303 | 878.50 | 2017-08-23T19:06:02.548Z | 21292 | 82021 | 8990 | ... | True | 2646799.00 | 2017-08-23T18:02:40.000Z | 147 | 50.00 | 19.00 | 13.00 | 11.00 | 43.00 | 93.00 |
2a7d9f5e137fe37f993840df08c956a63ea8a9f1 | http://www.cnn.com/2017/08/25/us/hurricane-har... | Hurricane Harvey strengthens to Category 2 | 2017-08-25 06:18:13.423 | 2017-08-25 06:15:26.000 | 168217 | 305.26 | 2017-08-25T16:54:00.927Z | 38640 | 111205 | 18372 | ... | True | 28100799.00 | 2017-08-25T06:45:02.000Z | 105 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
f5f4bbb52b32df81f30e9f83cb79fd090c90b048 | http://www.cnn.com/2017/08/31/us/hurricane-irm... | Powerful Hurricane Irma could be next weather ... | 2017-08-31 23:09:18.131 | 2017-08-31 23:04:08.000 | 155332 | 295.08 | 2017-09-01T03:14:00.674Z | 30702 | 95261 | 29369 | ... | True | 28173209.00 | 2017-09-01T03:00:46.000Z | 105 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
513f414d516e4d00495c7d9bf28359787432cbd5 | http://www.cnn.com/2017/08/03/politics/mueller... | One year into the FBI's Russia investigation, ... | 2017-08-03 20:03:27.148 | 2017-08-03 19:59:07.000 | 96561 | 427.50 | 2017-08-03T21:16:03.290Z | 19602 | 66142 | 10817 | ... | True | 27957228.00 | 2017-08-03T20:07:04.000Z | 105 | 49.00 | 20.00 | 8.00 | 15.00 | 43.00 | 92.00 |
112452b06d5c5258740297738f9e06d00aa9c9b5 | http://www.cnn.com/2017/08/20/entertainment/je... | Comedian Jerry Lewis dies at 91, publicist says | 2017-08-20 18:36:11.103 | 2017-08-20 18:33:48.000 | 79243 | 145.03 | 2017-08-20T19:14:01.425Z | 9078 | 52326 | 17839 | ... | True | 28046550.00 | 2017-08-20T18:36:25.000Z | 105 | 48.00 | 20.00 | 9.00 | 15.00 | 44.00 | 92.00 |
792373740c48788621261190c2aa513bcbb30584 | http://www.cnn.com/2017/08/18/politics/heather... | Mother of Charlottesville victim says she won'... | 2017-08-18 12:30:19.614 | 2017-08-18 12:23:01.000 | 89112 | 280.08 | 2017-08-18T13:46:01.819Z | 16659 | 65132 | 7321 | ... | True | 28038684.00 | 2017-08-18T13:00:13.000Z | 105 | 48.00 | 20.00 | 8.00 | 15.00 | 43.00 | 91.00 |
ff28c43982b9fbace7a7037fedee1ee77de2c288 | http://www.cnn.com/2017/08/23/politics/donald-... | Exclusive: Top Trump aide's email draws new sc... | 2017-08-23 23:08:52.155 | 2017-08-23 23:00:30.000 | 70927 | 727.59 | 2017-08-24T01:10:00.187Z | 11787 | 48412 | 10728 | ... | True | 28096369.00 | 2017-08-23T23:17:40.000Z | 105 | 47.00 | 20.00 | 9.00 | 15.00 | 44.00 | 91.00 |
7e11b41f3b01e4600644f96180e797a0ee4d8fa2 | http://www.cnn.com/2017/08/25/politics/sheriff... | Trump pardons former Sheriff Joe Arpaio | 2017-08-26 00:12:13.114 | 2017-08-26 00:08:53.000 | 367691 | 1471.54 | 2017-08-26T00:38:01.696Z | 109735 | 227193 | 30763 | ... | True | 28106048.00 | 2017-08-26T00:17:16.000Z | 105 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
814b1b017e3be7459408e36be2a5206f4c348631 | http://www.bbc.co.uk/news/world-us-canada-4094... | Trump stance on Charlottesville violence anger... | 2017-08-16 10:03:07.016 | 2017-08-16 10:00:31.000 | 62964 | 149.17 | 2017-08-17T01:58:00.704Z | 3159 | 56289 | 3516 | ... | True | 43487706.00 | 2017-08-17T01:00:32.000Z | 96 | 46.00 | 20.00 | 10.00 | 15.00 | 45.00 | 91.00 |
9157b50690c17e478aae8c0cbbfbb9c3db6b038c | https://www.buzzfeed.com/tahliapritchard/wtf-l... | Anna Faris And Chris Pratt Have Separated And ... | 2017-08-07 04:15:07.564 | 2017-08-07 03:51:14.000 | 394990 | 1458.83 | 2017-08-07T04:28:02.423Z | 96493 | 268928 | 29569 | ... | True | 2628374.00 | 2017-08-07T04:12:23.000Z | 147 | 50.00 | 19.00 | 11.00 | 11.00 | 41.00 | 91.00 |
f52d21c6a22899376b57b365c99a4523b7cad841 | http://www.cnn.com/2017/08/19/us/uss-indianapo... | USS Indianapolis wreckage found after 72 years | 2017-08-19 21:48:54.983 | 2017-08-19 21:42:53.000 | 74720 | 143.53 | 2017-08-20T09:04:00.053Z | 8148 | 54663 | 11909 | ... | True | 28043784.00 | 2017-08-19T23:00:09.000Z | 105 | 47.00 | 20.00 | 9.00 | 15.00 | 44.00 | 91.00 |
01df60472742a7c78b459655420a8853276dc27d | http://www.cnn.com/2017/08/15/politics/trump-n... | A Trump meltdown for the ages | 2017-08-16 00:09:12.292 | 2017-08-16 00:04:13.000 | 91711 | 122.42 | 2017-08-16T02:36:01.474Z | 25210 | 57318 | 9183 | ... | True | 28025271.00 | 2017-08-16T07:31:02.000Z | 105 | 48.00 | 20.00 | 8.00 | 15.00 | 43.00 | 91.00 |
4ec50c8407848cb16f316e3e0c2e14aaa304d179 | http://www.cnn.com/2017/08/12/us/charlottesvil... | Charlottesville Car Crash Suspect ID'd | 2017-08-13 00:48:16.687 | 2017-08-13 00:44:39.000 | 58241 | 386.00 | 2017-08-13T01:26:00.541Z | 18589 | 33135 | 6517 | ... | True | 28009351.00 | 2017-08-13T01:13:53.000Z | 105 | 46.00 | 20.00 | 9.00 | 15.00 | 44.00 | 90.00 |
b22459b17924232209165651ca966817e62bae0d | http://www.huffingtonpost.com/entry/mother-of-... | Mother Of Charlottesville Victim Heather Heyer... | 2017-08-13 17:18:10.345 | 2017-08-13 17:13:55.007 | 362776 | 1219.61 | 2017-08-13T18:31:59.604Z | 18095 | 319256 | 25425 | ... | True | 9667599.00 | 2017-08-13T18:15:02.000Z | 215 | 50.00 | 17.00 | 10.00 | 13.00 | 40.00 | 90.00 |
eb1ea0977d72adf0202b9b664a4a654518e1ce9d | http://www.huffingtonpost.com/entry/boston-ral... | Provocative ‘Free Speech’ Rally In Boston Rail... | 2017-08-19 15:18:14.650 | 2017-08-19 15:12:13.499 | 179591 | 984.25 | 2017-08-19T18:46:01.039Z | 13594 | 156017 | 9980 | ... | True | 9688346.00 | 2017-08-19T15:30:14.000Z | 215 | 50.00 | 17.00 | 10.00 | 13.00 | 40.00 | 90.00 |
6e2393cf8bac39690bf4b94992ccfba6db274864 | http://money.cnn.com/2017/08/01/media/rod-whee... | Lawsuit: Fox News concocted Seth Rich story wi... | 2017-08-01 15:05:04.309 | 2017-08-01 14:57:38.000 | 81594 | 6806.29 | 2017-08-25T05:16:00.976Z | 23323 | 42758 | 15513 | ... | True | 27943487.00 | 2017-08-01T15:14:04.000Z | 105 | 48.00 | 20.00 | 7.00 | 15.00 | 42.00 | 90.00 |
a5a37e29d0b2e7640e19f583868777c1493555e3 | http://www.cnn.com/2017/08/17/europe/barcelona... | Barcelona: Van hits crowd near Las Ramblas tou... | 2017-08-17 15:30:14.333 | 2017-08-17 15:25:03.000 | 111971 | 318.36 | 2017-08-18T15:08:00.299Z | 19298 | 75050 | 17623 | ... | True | 28034233.00 | 2017-08-17T15:54:05.000Z | 105 | 50.00 | 20.00 | 5.00 | 15.00 | 40.00 | 90.00 |
e2e204c9d6597400ca2e5de9814a6fda4c4bfe70 | http://www.bbc.co.uk/news/world-latin-america-... | Brazil opens vast Amazon reserve to mining | 2017-08-24 01:00:02.079 | 2017-08-24 00:58:16.000 | 59106 | 63.31 | 2017-08-24T14:41:14.857Z | 5037 | 48396 | 5673 | ... | True | 43604821.00 | 2017-08-24T10:31:05.000Z | 96 | 46.00 | 20.00 | 9.00 | 15.00 | 44.00 | 90.00 |
f828a5f05babd9e0649d18d84b772da0e7ef8401 | https://www.buzzfeed.com/katebubacz/17-photos-... | 17 Photos Show Just How Bad The Flooding In Ho... | 2017-08-29 13:36:12.544 | 2017-08-29 13:35:39.000 | 517210 | 379.25 | 2017-08-31T09:05:59.786Z | 78481 | 363072 | 75657 | ... | True | 2658046.00 | 2017-08-29T19:47:00.000Z | 147 | 50.00 | 19.00 | 10.00 | 11.00 | 40.00 | 90.00 |
098aede0a3d2b2599c5b811ccd9753b831bad17e | http://www.bbc.co.uk/news/world-us-canada-4098... | Top White House aide Bannon out | 2017-08-18 16:57:02.011 | 2017-08-18 16:55:20.000 | 71003 | 710.07 | 2017-08-18T17:10:02.373Z | 14694 | 48131 | 8178 | ... | True | 43515983.00 | 2017-08-18T16:59:10.000Z | 96 | 47.00 | 20.00 | 7.00 | 15.00 | 42.00 | 89.00 |
2fce23ab50ee1745425b42e66c742e9825e48a23 | http://www.cnn.com/2017/08/23/politics/james-c... | James Clapper questions Trump's fitness for of... | 2017-08-23 05:12:20.942 | 2017-08-23 05:09:07.000 | 78220 | 246.93 | 2017-08-23T13:26:01.041Z | 11993 | 55529 | 10698 | ... | True | 28095696.00 | 2017-08-23T10:00:43.000Z | 105 | 47.00 | 20.00 | 7.00 | 15.00 | 42.00 | 89.00 |
a0bdb495002919e09d1914f3bb24dcf4d5cbc08d | https://www.buzzfeed.com/franciswhittaker/hund... | Hundreds Of Torch-Wielding White Nationalists ... | 2017-08-12 11:24:16.334 | 2017-08-12 11:13:01.000 | 188331 | 558.53 | 2017-08-12T17:41:59.747Z | 31255 | 144074 | 13002 | ... | True | 2637450.00 | 2017-08-12T12:21:00.000Z | 147 | 50.00 | 19.00 | 9.00 | 11.00 | 39.00 | 89.00 |
31ef7cec6f7147227efec4dd20fc6b8a41a10335 | http://www.cnn.com/2017/08/08/politics/trump-r... | Trump retweets Fox News story containing class... | 2017-08-08 14:45:16.587 | 2017-08-08 14:43:33.000 | 60213 | 207.37 | 2017-08-08T16:48:00.411Z | 17783 | 32555 | 9875 | ... | True | 27979804.00 | 2017-08-08T16:30:54.000Z | 105 | 46.00 | 20.00 | 8.00 | 15.00 | 43.00 | 89.00 |
2d0ddfd661c23998c276541a016b256730b8ca1b | http://www.cnn.com/2017/08/09/politics/north-k... | North Korea 'seriously examining' a strike dir... | 2017-08-09 22:51:15.957 | 2017-08-09 22:46:41.000 | 46202 | 69.01 | 2017-08-10T00:18:01.784Z | 15566 | 23075 | 7561 | ... | True | 27991392.00 | 2017-08-10T07:00:02.000Z | 105 | 44.00 | 20.00 | 9.00 | 15.00 | 44.00 | 88.00 |
be905b7cba28714bc3ad434ada164772afed06de | https://www.nytimes.com/2017/08/01/us/politics... | Justice Dept. to Take On Affirmative Action in... | 2017-08-02 00:09:00.463 | 2017-08-02 00:06:27.000 | 292124 | 663.59 | 2017-08-02T01:10:00.882Z | 71964 | 173472 | 46688 | ... | True | 14358478.00 | 2017-08-02T00:13:17.000Z | 120 | 50.00 | 20.00 | 4.00 | 14.00 | 38.00 | 88.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ea230b14069c1fe526e7f42d4f4ba7c38bbc88cf | http://www.dailymail.co.uk/tvshowbiz/article-4... | Roxy Jacenko makes light of her rough year | 2017-08-28 05:30:16.907 | 2017-08-28 05:27:23.000 | 0 | 0.00 | 2017-08-28T05:44:01.142Z | 0 | 0 | 0 | ... | False | nan | NaN | 158 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
feff1a6e1719b32589f732d78f0674b16eda7362 | http://www.dailymail.co.uk/news/article-480178... | Footage emerges police pursuing three teens in... | 2017-08-18 08:57:15.138 | 2017-08-18 08:52:22.000 | 1 | 0.02 | 2017-08-18T14:20:02.194Z | 0 | 0 | 1 | ... | False | nan | NaN | 158 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
09adeb0c3ea57cca3be66948b344ae5f11d9fd9c | https://www.washingtonpost.com/sports/wizards/... | AP source: Pelicans get Clark on 1-year deal a... | 2017-08-02 14:30:10.098 | 2017-08-02 14:27:18.000 | 0 | 0.00 | 2017-08-03T00:02:01.056Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2449b63cc77bb3703652f186d50c61471d5e5413 | http://www.bbc.co.uk/news/av/world-middle-east... | Syria: Fighters and families leave Lebanese bo... | 2017-08-02 14:27:04.281 | 2017-08-02 14:24:42.000 | 0 | 0.00 | 2017-08-03T02:02:00.328Z | 0 | 0 | 0 | ... | False | nan | NaN | 96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2871dd160a1d8304572bd7252fbc453570d8e6c4 | http://www.dailymail.co.uk/tvshowbiz/article-4... | Bachelor Leah Costa spruiking whitening on Ins... | 2017-08-18 08:39:16.278 | 2017-08-18 08:36:48.000 | 0 | 0.00 | 2017-08-18T10:56:01.931Z | 0 | 0 | 0 | ... | False | nan | NaN | 158 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
e7bff28fd2679057599d7ed22b60c38c4180a063 | https://www.washingtonpost.com/sports/colleges... | Stadium Formerly Known as Turner Field begins ... | 2017-08-28 06:03:08.409 | 2017-08-28 05:55:27.000 | 0 | 0.00 | 2017-08-28T06:16:01.250Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
e0e12393c45c788a21b219e7f11514e3259d6516 | https://www.washingtonpost.com/politics/estima... | Estimates of North Korea’s nuclear weapons har... | 2017-08-18 08:30:02.897 | 2017-08-18 08:24:12.000 | 0 | 0.00 | 2017-08-18T11:50:01.962Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
41585d376ffd7791f8c499de2dfdff11d10c3c60 | https://www.theguardian.com/education/2017/aug... | Clearing: how to book a campus visit | 2017-08-18 08:03:00.966 | 2017-08-18 08:00:18.000 | 0 | 0.00 | 2017-08-18T11:22:02.489Z | 0 | 0 | 0 | ... | False | nan | NaN | 142 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
922ae172676d3ac344c7271f1f474babc439fb4b | https://www.washingtonpost.com/world/asia_paci... | IS cleric granted early release in Indonesia i... | 2017-08-18 08:09:10.806 | 2017-08-18 08:02:17.000 | 1 | 0.08 | 2017-08-18T10:38:01.961Z | 0 | 0 | 1 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3da538b715310642f0052392dd4a2dcbb1970a28 | https://www.washingtonpost.com/world/europe/2-... | 2 bears, not 1, killed Swedish wildlife park e... | 2017-08-18 08:09:10.617 | 2017-08-18 08:04:14.000 | 0 | 0.00 | 2017-08-18T11:28:03.101Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
94d289cea79893321d4e461b6b6f3dad2ff0c55b | http://www.huffingtonpost.com/entry/when-the-c... | When the City Went Dark | 2017-08-18 08:18:15.815 | 2017-08-18 08:06:36.831 | 0 | 0.00 | 2017-08-18T11:38:02.348Z | 0 | 0 | 0 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5ec7b3f8d818d6fd77eb5cef16779a709f7e2792 | https://www.washingtonpost.com/national/ap-pho... | AP PHOTOS: Editor selections from the past wee... | 2017-08-28 06:48:09.793 | 2017-08-28 06:04:09.000 | 0 | 0.00 | 2017-08-28T07:02:01.176Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
534aceb294e07e723db4511fe18de4e24010daef | https://www.washingtonpost.com/sports/dcunited... | Jackson-Hamel scores late to rally Impact past... | 2017-08-06 04:18:06.971 | 2017-08-06 04:15:11.000 | 0 | 0.00 | 2017-08-07T03:16:00.321Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
bae1995a13356f0253db5122dd5677a29737b7b3 | https://www.washingtonpost.com/sports/redskins... | As he so often did on the field, LT stole the ... | 2017-08-06 04:18:07.131 | 2017-08-06 04:10:52.000 | 0 | 0.00 | 2017-08-07T03:16:00.322Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5876875e2c6c0b653ce4b13d120d5aa1c4b260d9 | https://www.washingtonpost.com/world/europe/au... | Austrians first in Europe to make bacon on ind... | 2017-08-18 08:21:06.251 | 2017-08-18 08:16:11.000 | 0 | 0.00 | 2017-08-18T11:40:02.479Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
848a780fc988417184a63cf9fae5fec626700494 | https://www.washingtonpost.com/national/health... | APNewsBreak: Judge retires after leave for alc... | 2017-08-02 14:30:11.544 | 2017-08-02 14:23:16.000 | 1 | 0.02 | 2017-08-02T18:52:01.387Z | 0 | 0 | 1 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f1878be211ca091760999efaa1376302dff74b4c | http://www.huffingtonpost.com/entry/keeping-st... | Keeping Student Loan Debt Manageable | 2017-08-18 08:33:19.258 | 2017-08-18 08:24:09.443 | 0 | 0.00 | 2017-08-18T13:56:01.615Z | 0 | 0 | 0 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0c59cf32da069488867a72869d4b0aa6715b12e6 | https://www.washingtonpost.com/world/africa/si... | Sierra Leone mudslides death toll now above 40... | 2017-08-18 08:30:09.886 | 2017-08-18 08:28:19.000 | 0 | 0.00 | 2017-08-18T11:50:01.976Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1656ad7fc35b3762dd6e8ed3df3fc28d5505984d | http://www.huffingtonpost.com/entry/8-common-c... | 8 Common Cover Letter Mistakes To Avoid | 2017-08-18 08:48:21.492 | 2017-08-18 08:36:12.612 | 1 | 0.00 | 2017-08-18T14:12:00.928Z | 0 | 0 | 1 | ... | False | nan | NaN | 215 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8e142110e7b53e0939ddd4fee7c6cfd77887fc9f | https://www.washingtonpost.com/world/asia_paci... | The Latest: Disagreements in ASEAN delay joint... | 2017-08-06 04:09:09.650 | 2017-08-06 04:06:28.000 | 1 | 0.02 | 2017-08-06T19:52:00.178Z | 0 | 0 | 1 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
ec63335baa0d359e0a21ec3d739f60b12cbb3744 | http://www.telegraph.co.uk/travel/destinations... | Taj Jai Mahal Palace | 2017-08-18 08:33:00.274 | 2017-08-18 08:30:11.000 | 0 | 0.00 | 2017-08-18T13:56:00.979Z | 0 | 0 | 0 | ... | False | nan | NaN | 370 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
9bbfbfa88f510e893f9cd9351feeb2e804ed653b | https://www.washingtonpost.com/world/europe/it... | Italian authorities sequester German group’s r... | 2017-08-02 14:33:09.928 | 2017-08-02 14:24:18.000 | 0 | 0.00 | 2017-08-03T00:03:59.812Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
d691879d8e93adaac69382b77139e947e78d2900 | https://www.theguardian.com/artanddesign/2017/... | Giacometti and Rashid Johnson: this week’s bes... | 2017-08-18 08:33:01.161 | 2017-08-18 08:30:18.000 | 1 | 0.08 | 2017-08-18T08:46:01.439Z | 0 | 0 | 1 | ... | False | nan | NaN | 142 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
916d02e97272090b68aa4a7f174db860e599f8f0 | https://www.washingtonpost.com/world/africa/si... | Sierra Leone mudslides death toll now above 40... | 2017-08-18 08:42:12.821 | 2017-08-18 08:30:50.000 | 0 | 0.00 | 2017-08-18T11:00:02.361Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
144b7032d6d5f4c5bf515120000b20c449076201 | http://www.telegraph.co.uk/films/2017/08/18/br... | Brad Pitt loses court case after failing to pa... | 2017-08-18 08:36:00.369 | 2017-08-18 08:31:26.000 | 0 | 0.00 | 2017-08-18T12:58:00.255Z | 0 | 0 | 0 | ... | False | nan | NaN | 370 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
87dcccefc3866303de4d1ce37bcf1752cbaa6393 | http://www.bbc.co.uk/news/uk-england-dorset-40... | Bournemouth flats 'fake bomb': Man charged | 2017-08-18 08:36:02.405 | 2017-08-18 08:33:39.000 | 0 | 0.00 | 2017-08-18T12:58:01.643Z | 0 | 0 | 0 | ... | False | nan | NaN | 96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
895ce04b567747b53e4019158f2eeb2096ec5084 | https://www.washingtonpost.com/world/asia_paci... | Vietnam battles dengue outbreaks with 42 perce... | 2017-08-18 08:42:11.430 | 2017-08-18 08:34:13.000 | 0 | 0.00 | 2017-08-18T11:00:02.358Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
363e164d7ef68963ee54da261b3b40bcfabadbea | https://www.washingtonpost.com/sports/redskins... | FANTASY PLAYS: Prospects for old faces in new ... | 2017-08-28 06:03:15.513 | 2017-08-28 05:57:18.000 | 0 | 0.00 | 2017-08-28T06:16:01.257Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
e62c5553a5f0b912f2467075cb7c21e266f85e9f | https://www.washingtonpost.com/world/europe/ca... | Catalan regional president: at least one more ... | 2017-08-18 08:42:12.091 | 2017-08-18 08:36:09.000 | 0 | 0.00 | 2017-08-18T11:00:02.360Z | 0 | 0 | 0 | ... | False | nan | NaN | 191 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
084a7ac1c07cb222a7aa283f5b6d84d484cb54ec | https://www.theguardian.com/crosswords/crosswo... | Crossword blog: Cracking the Cryptics in real-... | 2017-08-21 09:51:00.501 | 2017-08-21 09:48:42.000 | 1 | 0.02 | 2017-08-21T18:20:00.629Z | 0 | 0 | 1 | ... | False | nan | NaN | 142 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
148391 rows × 25 columns
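Since the index is meant to be easy to aggregate by publisher or another axis, here is a minimal sketch of what that could look like. It uses seven of the top rows from the table above, with the publisher ids inferred from the URLs (an assumption, since that column is elided in the display). It first checks that the attention index is the sum of the promotion and response scores, then groups by publisher. The choice of count, mean and max is illustrative, and a mean over top-ranked rows only says nothing about a publisher's typical article.

```python
import pandas as pd

# A few top-ranked rows copied from the table above; publisher ids inferred from the URLs.
rows = pd.DataFrame({
    "publisher_id": ["bbc_co_uk", "buzzfeed_com", "cnn_com", "cnn_com",
                     "buzzfeed_com", "huffingtonpost_com", "nytimes_com"],
    "promotion_score": [44.0, 44.0, 44.0, 44.0, 43.0, 40.0, 38.0],
    "response_score": [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0],
})

# The index is the sum of the two halves, each worth up to 50.
rows["attention_index"] = rows.promotion_score + rows.response_score

# Aggregate per publisher: number of articles, mean index, highest index.
by_publisher = rows.groupby("publisher_id").attention_index.agg(["count", "mean", "max"])
print(by_publisher)
```

On the full dataset the same `groupby` could be applied to `data` directly, or to any other column that identifies a subject or section.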
data["score_diff"] = data.promotion_score - data.response_score
# promoted but low response
data.sort_values("score_diff", ascending=False).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
a96c040af2ed0a2561a1bf0f4c19297f271ae3ed | http://www.bbc.co.uk/news/world-europe-41057304 | Man shot after attacking Brussels troops | 2017-08-25 19:33:00.838 | 2017-08-25 19:30:37 | 0 | 147.96 | 2017-08-25T21:10:02.357Z | 0 | 0 | 0 | ... | 43641284.00 | 2017-08-25T19:37:22.000Z | 96 | 0.00 | 20.00 | 10.00 | 15.00 | 45.00 | 45.00 | 45.00 |
5dd70fc9ea42b2d127033bcd64660f1940030c92 | http://www.bbc.co.uk/news/world-europe-40982926 | Key Barcelona attack suspect confirmed dead | 2017-08-18 20:30:03.116 | 2017-08-18 20:29:14 | 0 | 1668.82 | 2017-08-19T21:52:00.484Z | 0 | 0 | 0 | ... | 43518815.00 | 2017-08-18T20:35:08.000Z | 96 | 0.00 | 20.00 | 10.00 | 15.00 | 45.00 | 45.00 | 45.00 |
a891d138c2bcf31d948afcfb856a59f991d78e4f | http://www.bbc.co.uk/news/world-us-canada-4090... | Trump: US 'locked and loaded' on N Korea | 2017-08-11 11:42:01.493 | 2017-08-11 11:39:05 | 0 | 2549.83 | 2017-08-12T19:19:59.811Z | 0 | 0 | 0 | ... | 43416675.00 | 2017-08-11T11:48:41.000Z | 96 | 0.00 | 20.00 | 8.00 | 15.00 | 43.00 | 43.00 | 43.00 |
7871898fe8e346056041debe5afffde02118dd33 | https://www.buzzfeed.com/melissasegura/sixteen... | Sixteen Years Ago He Was Granted A New Hearing... | 2017-08-25 09:51:14.099 | 2017-08-23 01:44:46 | 0 | 0.00 | 2017-08-25T10:04:01.518Z | 0 | 0 | 0 | ... | 2650108.00 | 2017-08-25T14:48:20.000Z | 147 | 0.00 | 19.00 | 13.00 | 11.00 | 43.00 | 43.00 | 43.00 |
2eee49e0cf12b7f87a2cdafb4cabfd17bdb53965 | https://www.buzzfeed.com/borzoudaragahi/these-... | These Are The Western Fighters Who Volunteered... | 2017-08-13 14:06:14.093 | 2017-08-09 14:00:57 | 1 | 0.08 | 2017-08-13T14:20:01.132Z | 0 | 0 | 1 | ... | 2638724.00 | 2017-08-13T18:33:00.000Z | 147 | 0.00 | 19.00 | 13.00 | 11.00 | 43.00 | 43.00 | 43.00 |
de58848c3fba3550be9fa164194024dc17baacb3 | https://www.buzzfeed.com/claudiarosenbaum/tayl... | Taylor Swift's Trial Over A DJ Allegedly Grabb... | 2017-08-05 12:21:17.776 | 2017-08-01 18:54:02 | 3 | 0.25 | 2017-08-05T12:34:01.753Z | 0 | 0 | 3 | ... | 2626421.00 | 2017-08-05T17:39:00.000Z | 147 | 1.00 | 19.00 | 14.00 | 11.00 | 44.00 | 45.00 | 43.00 |
e0e54d8a3ca097df78e14a8ca6d30abb26e5ecb9 | https://www.buzzfeed.com/leticiamiranda/scams-... | People Are Losing A Bunch Of Money From These ... | 2017-08-17 15:42:18.068 | 2017-08-16 15:28:55 | 2 | 0.17 | 2017-08-17T15:56:01.851Z | 0 | 0 | 2 | ... | 2643708.00 | 2017-08-17T18:09:00.000Z | 147 | 1.00 | 19.00 | 14.00 | 11.00 | 44.00 | 45.00 | 43.00 |
6666e26a4ec8a7875c73b728cad8f353e2045bc4 | https://www.buzzfeed.com/borzoudaragahi/the-wa... | The US Is Far More Deeply Involved In Syria Th... | 2017-08-06 15:12:15.694 | 2017-08-05 15:12:13 | 3 | 0.25 | 2017-08-06T15:26:00.707Z | 0 | 0 | 3 | ... | 2627238.00 | 2017-08-06T22:40:00.000Z | 147 | 1.00 | 19.00 | 14.00 | 11.00 | 44.00 | 45.00 | 43.00 |
927c56158391f50dfc3b1549195deebf175e2478 | https://www.buzzfeed.com/tylerkingkade/want-to... | Want To Fire A Professor For Sexual Harassment... | 2017-08-17 14:09:15.759 | 2017-08-15 15:41:29 | 1 | 0.08 | 2017-08-17T14:22:01.807Z | 0 | 0 | 1 | ... | 2643633.00 | 2017-08-17T16:39:00.000Z | 147 | 0.00 | 19.00 | 12.00 | 11.00 | 42.00 | 42.00 | 42.00 |
d6047b1d2d2a850fa9053b6bd9c0aeef7df8fdae | http://www.bbc.co.uk/news/world-asia-40957725 | US-South Korea set for divisive military drills | 2017-08-21 00:33:02.151 | 2017-08-21 00:30:00 | 0 | 53.16 | 2017-08-21T03:02:00.081Z | 0 | 0 | 0 | ... | 43546090.00 | 2017-08-21T02:37:12.000Z | 96 | 0.00 | 20.00 | 5.00 | 15.00 | 40.00 | 40.00 | 40.00 |
a611b0ede73ab49dc8a7b82ef7a47fb5b0c8e0d8 | https://www.buzzfeed.com/buzzfeednews/taylor-s... | Taylor Swift Prepares To Battle DJ In Court Ov... | 2017-08-07 18:57:18.050 | 2017-08-07 18:54:12 | 37 | 0.11 | 2017-08-08T18:55:59.837Z | 6 | 9 | 22 | ... | 2636375.00 | 2017-08-12T00:13:39.000Z | 147 | 5.00 | 19.00 | 13.00 | 11.00 | 43.00 | 48.00 | 38.00 |
7c61a13b28222804ce718ff15fc6c5349e246369 | https://www.theguardian.com/us-news/2017/aug/1... | Charlottesville: violent clashes break out bet... | 2017-08-12 15:51:00.487 | 2017-08-12 15:48:52 | 0 | 2478.92 | 2017-08-14T10:24:00.752Z | 0 | 0 | 0 | ... | 7643415.00 | 2017-08-12T15:59:29.000Z | 142 | 0.00 | 19.00 | 4.00 | 13.00 | 36.00 | 36.00 | 36.00 |
2711716851862d1880226d271c35931b00411551 | https://www.buzzfeed.com/shannonrosenberg/do-y... | Do You Need To Give Up Alcohol If You're Tryin... | 2017-08-24 18:48:19.569 | 2017-08-24 00:59:23 | 63 | 0.37 | 2017-08-25T00:12:01.927Z | 16 | 27 | 20 | ... | 2650910.00 | 2017-08-26T17:44:00.000Z | 147 | 7.00 | 19.00 | 13.00 | 11.00 | 43.00 | 50.00 | 36.00 |
3ba536b7e826a1d9d8873c786d733f3e4a49cbc8 | https://www.theguardian.com/us-news/2017/aug/2... | Donald Trump says he'll expand US military int... | 2017-08-22 01:57:00.862 | 2017-08-22 01:55:06 | 0 | 6.50 | 2017-08-22T02:10:01.693Z | 0 | 0 | 0 | ... | 7655513.00 | 2017-08-22T08:20:00.000Z | 142 | 0.00 | 19.00 | 4.00 | 13.00 | 36.00 | 36.00 | 36.00 |
2ebf576335435e03111cda6dca10e98892c8f3d2 | https://www.buzzfeed.com/scaachikoul/hey-ameri... | Hey, America, Now You Have Our Worst People. Y... | 2017-08-19 14:25:03.772 | 2017-08-18 21:12:12 | 60 | 0.25 | 2017-08-19T14:38:01.259Z | 4 | 23 | 33 | ... | 2645223.00 | 2017-08-20T00:30:00.000Z | 147 | 7.00 | 19.00 | 12.00 | 11.00 | 42.00 | 49.00 | 35.00 |
e7222e4d2eb1eea9c874b5f07b5fe82b6167f006 | https://www.buzzfeed.com/mikehayes/police-and-... | Police Around The US Are Bracing For Newly Emb... | 2017-08-18 19:06:16.126 | 2017-08-18 16:26:10 | 64 | 0.50 | 2017-08-18T19:20:03.083Z | 7 | 17 | 40 | ... | 2645358.00 | 2017-08-20T21:03:00.000Z | 147 | 7.00 | 19.00 | 11.00 | 11.00 | 41.00 | 48.00 | 34.00 |
89008fd54697f1ab1c7ce17b39827ff84d5af7a3 | https://www.buzzfeed.com/mitchprothero/the-big... | The Big Question for Investigators: Was The Ch... | 2017-08-15 02:09:13.889 | 2017-08-15 01:05:44 | 70 | 0.24 | 2017-08-15T03:24:01.573Z | 3 | 49 | 18 | ... | 2640309.00 | 2017-08-15T15:00:24.000Z | 147 | 8.00 | 19.00 | 11.00 | 11.00 | 41.00 | 49.00 | 33.00 |
c8feb6222f1e07129625c8562c2ec6b9153c5a08 | https://www.buzzfeed.com/johnstanton/what-its-... | What It’s Like To Be Deported To Mexico After ... | 2017-08-03 14:24:16.235 | 2017-08-01 17:19:53 | 5 | 0.42 | 2017-08-03T14:38:02.873Z | 0 | 0 | 5 | ... | nan | NaN | 147 | 1.00 | 19.00 | 15.00 | 0.00 | 34.00 | 35.00 | 33.00 |
89a7ab6236bad1abd7afc15172b9e8a953eea9d4 | http://www.foxnews.com/politics/2017/08/01/pen... | Pentagon investigators find ‘security risks’ i... | 2017-08-01 22:05:04.897 | 2017-08-01 22:00:00 | 0 | 130.83 | 2017-08-02T01:18:00.243Z | 0 | 0 | 0 | ... | 15585866.00 | 2017-08-02T09:30:00.000Z | 285 | 0.00 | 17.00 | 2.00 | 14.00 | 33.00 | 33.00 | 33.00 |
46b0401e9b8415dc7d7980c1c777748c5807ca72 | http://www.dailymail.co.uk/news/article-476969... | Sinead O'Connor says she is suicidal and livin... | 2017-08-07 23:24:14.335 | 2017-08-07 23:21:28 | 0 | 79.36 | 2017-08-08T00:52:01.245Z | 0 | 0 | 0 | ... | 12014442.00 | 2017-08-08T00:15:10.000Z | 158 | 0.00 | 18.00 | 1.00 | 14.00 | 33.00 | 33.00 | 33.00 |
adf8f08cd9b3c9d8308dfe6c72c4d61cff4fd797 | https://www.buzzfeed.com/dominicholden/6-reaso... | 6 Reasons Why Trump Would Hit A Wall If He Tri... | 2017-08-25 14:03:21.184 | 2017-08-23 15:45:55 | 1 | 0.08 | 2017-08-25T14:16:01.942Z | 0 | 0 | 1 | ... | nan | NaN | 147 | 0.00 | 19.00 | 14.00 | 0.00 | 33.00 | 33.00 | 33.00 |
73580f28cdc22b2cf6f133f5d8d0c76f2f92da92 | http://www.dailymail.co.uk/news/article-482068... | Neighbour calls autistic boy 'it' in horrifyin... | 2017-08-24 17:30:14.417 | 2017-08-24 17:29:55 | 0 | 43.08 | 2017-08-24T20:50:00.443Z | 0 | 0 | 0 | ... | 12173156.00 | 2017-08-24T19:45:28.000Z | 158 | 0.00 | 18.00 | 1.00 | 14.00 | 33.00 | 33.00 | 33.00 |
cbce8eb31a83066977187d0dde3cbe4d29793374 | https://www.theguardian.com/politics/2017/aug/... | UK will 'mirror' much of EU customs system for... | 2017-08-15 11:33:00.655 | 2017-08-15 11:31:05 | 0 | 77.77 | 2017-08-16T09:11:59.702Z | 0 | 0 | 0 | ... | 7646421.00 | 2017-08-15T12:56:36.000Z | 142 | 0.00 | 19.00 | 1.00 | 13.00 | 33.00 | 33.00 | 33.00 |
d896f7176dd863588481711c0eb8e19905b961ca | https://www.theguardian.com/education/2017/aug... | GCSE overhaul means results are ‘incomparable ... | 2017-08-23 23:03:03.669 | 2017-08-23 23:01:03 | 12 | 1.00 | 2017-08-24T22:30:02.773Z | 2 | 4 | 6 | ... | 7657721.00 | 2017-08-24T06:19:07.000Z | 142 | 2.00 | 19.00 | 3.00 | 13.00 | 35.00 | 37.00 | 33.00 |
034dadb1c814df55b110fb56d45f220a0af64ba8 | https://www.buzzfeed.com/rosebuchanan/karate-kid | Meet The Little Syrian Girl Who Wants To Be A ... | 2017-08-15 16:57:17.726 | 2017-08-14 22:17:29 | 13 | 0.08 | 2017-08-15T17:10:02.786Z | 0 | 6 | 7 | ... | 2641800.00 | 2017-08-16T04:54:00.000Z | 147 | 3.00 | 19.00 | 6.00 | 11.00 | 36.00 | 39.00 | 33.00 |
25 rows × 26 columns
# high response but little or no promotion (most negative score_diff first)
data.sort_values("score_diff", ascending=True).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
18189c494eb422d1b0ab26167076175e05a18985 | https://www.washingtonpost.com/local/the-lates... | The Latest: Police identify helicopter, troope... | 2017-08-13 01:06:07.377 | 2017-08-13 01:04:19.000 | 339268 | 4231.07 | 2017-08-13T05:28:00.473Z | 90555 | 207253 | 41460 | ... | nan | NaN | 191 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
bf626f6b535a51ccffbebca41b61027d9dc8bf6d | http://www.cnn.com/2016/01/06/health/adult-col... | Why adult coloring books are good for you | 2017-08-01 17:42:17.336 | 2017-08-01 17:37:24.000 | 255341 | 1.31 | 2017-08-02T16:39:59.574Z | 52448 | 162431 | 40462 | ... | nan | NaN | 105 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
ecd92911b177889b3a902a7ddfde0d1ff086f8c4 | http://www.huffingtonpost.com/entry/remove-him... | Remove Him Now | 2017-08-19 18:03:21.545 | 2017-08-19 17:48:07.484 | 105278 | 213.55 | 2017-08-20T00:40:00.669Z | 10282 | 87549 | 7447 | ... | nan | NaN | 215 | 49.00 | 0.00 | 0.00 | 0.00 | 0.00 | 49.00 | -49.00 |
284b3454ef8d5235469a5c32f9ea20ba029b9ff4 | http://www.foxnews.com/us/2017/08/14/boston-ho... | Boston Holocaust Memorial smashed to pieces | 2017-08-15 00:25:03.656 | 2017-08-15 00:09:00.000 | 130061 | 90.42 | 2017-08-15T20:36:00.218Z | 26095 | 87829 | 16137 | ... | nan | NaN | 285 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
976ee1af9f57e6a45c1a02cc7a05cc6eccfa498b | http://www.huffingtonpost.com/entry/the-proble... | The Problem With Melania Isn't Her Shoes | 2017-08-31 01:03:14.487 | 2017-08-31 00:47:41.413 | 95153 | 207.42 | 2017-09-02T15:08:00.267Z | 18286 | 67948 | 8919 | ... | nan | NaN | 215 | 49.00 | 0.00 | 0.00 | 0.00 | 0.00 | 49.00 | -49.00 |
9d5416e1d850656cec86528c97dfeef0b738d032 | http://yournewswire.com/charlottesville-hillar... | Charlottesville Killer Was Hillary Supporter, ... | 2017-08-13 20:35:16.681 | 2017-08-13 19:04:01.000 | 162946 | 102.64 | 2017-08-14T14:12:00.081Z | 65217 | 66428 | 31301 | ... | nan | NaN | 22568 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
0a5ef01bb7614fe47fe1e2bff6ed8d90b3bf88d1 | https://www.rt.com/on-air/400998-barcelona-ant... | Anti-terrorism march in Barcelona in wake of a... | 2017-08-26 16:20:14.530 | 2017-08-26 16:20:14.530 | 175444 | 2826.47 | 2017-08-26T23:48:01.315Z | 24072 | 10417 | 140955 | ... | nan | NaN | 365 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
558e0d04a7977fafaee4d2c5e7f78d514470d153 | https://www.buzzfeed.com/alexandrearagao/esse-... | Esse é o tamanho da área que Temer liberou par... | 2017-08-24 20:33:16.676 | 2017-08-24 20:32:56.000 | 79416 | 185.92 | 2017-08-25T19:52:00.145Z | 4568 | 65462 | 9386 | ... | nan | NaN | 147 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
597bc58ed2caa997c80434e8db5fa7f37d0167bc | http://www.foxnews.com/us/2017/08/04/planned-p... | Planned Parenthood: Teach your preschoolers 't... | 2017-08-04 15:15:06.479 | 2017-08-04 12:29:00.000 | 275925 | 166.25 | 2017-08-07T16:22:00.043Z | 100038 | 147626 | 28261 | ... | nan | NaN | 285 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
8e862d36fc0c1b6f14b78bb64ddebfe62c9426fd | https://www.washingtonpost.com/news/post-polit... | Trump doubles down on initial Charlottesville ... | 2017-08-15 20:35:02.695 | 2017-08-15 20:31:00.000 | 95511 | 2332.74 | 2017-08-15T23:36:01.987Z | 31677 | 54688 | 9146 | ... | nan | NaN | 191 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
6404530570733ce02f88d193eb64a0597ac829ae | http://www.cnn.com/2017/08/20/entertainment/je... | Jerry Lee Lewis dies at 91 | 2017-08-20 18:24:15.530 | 2017-08-20 18:20:54.000 | 80427 | 322.55 | 2017-08-20T21:42:01.255Z | 9256 | 52707 | 18464 | ... | nan | NaN | 105 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
eb4883d5690803f0708b472e24ff3345b586f2c9 | https://www.nytimes.com/2017/08/17/us/politics... | Trump ‘Sad’ Over Removal of ‘Our Beautiful Sta... | 2017-08-17 13:24:00.082 | 2017-08-17 13:22:51.000 | 92464 | 263.08 | 2017-08-17T15:01:59.775Z | 29592 | 54307 | 8565 | ... | nan | NaN | 120 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
f026f1d592c1ea91ee7a2a69531b74e7fada42cc | http://www.foxnews.com/entertainment/2017/08/2... | ESPN pulls Asian-American announcer from Virgi... | 2017-08-23 01:35:04.690 | 2017-08-23 01:07:00.000 | 287252 | 357.92 | 2017-08-23T13:36:01.174Z | 123252 | 133264 | 30736 | ... | nan | NaN | 285 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
d7300e64ac7f445fd36cf88631473f8a0814a919 | https://www.washingtonpost.com/national/black-... | Black-clad anarchists storm Berkeley rally, as... | 2017-08-28 00:42:14.796 | 2017-08-28 00:34:51.000 | 100159 | 577.62 | 2017-08-28T19:32:00.579Z | 53284 | 34686 | 12189 | ... | nan | NaN | 191 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
8ea9f43ab02d1b7c56cdef6169ecf819fff2bf6c | https://www.buzzfeed.com/beatrizserranomolina/... | 11 actos de pura bondad tras el atentado para ... | 2017-08-17 19:36:12.550 | 2017-08-17 19:31:30.000 | 81083 | 188.33 | 2017-08-18T03:04:00.413Z | 2054 | 66208 | 12821 | ... | nan | NaN | 147 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
d59d637fc9a6a7afc160f03452e26f3fff3eef93 | http://www.foxnews.com/sports/2017/08/23/hank-... | Hank Aaron says he won't watch NFL because of ... | 2017-08-23 16:15:57.261 | 2017-08-23 15:57:00.000 | 106640 | 7311.08 | 2017-08-25T09:16:00.526Z | 9576 | 80768 | 16296 | ... | nan | NaN | 285 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
146b733457836a9790f68079b387d3a64fc9908c | https://www.theguardian.com/sport/2017/aug/30/... | Aaron Rodgers: Racial injustice is 'real' and ... | 2017-08-30 17:03:20.697 | 2017-08-30 16:59:10.000 | 75615 | 186.68 | 2017-09-01T17:04:00.374Z | 9522 | 57462 | 8631 | ... | nan | NaN | 142 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
9c2aefdd909a169662e267ac4674e29cd5ddf360 | http://www.foxnews.com/opinion/2017/08/23/cour... | Court rules high school football coach cannot ... | 2017-08-23 23:30:10.887 | 2017-08-23 23:15:00.000 | 421505 | 466.92 | 2017-08-25T13:04:00.035Z | 90276 | 275289 | 55940 | ... | nan | NaN | 285 | 50.00 | 0.00 | 3.00 | 0.00 | 3.00 | 53.00 | -47.00 |
7364eb9055281cb9954c8444bee1d5366d35a1ad | https://www.nytimes.com/2017/08/18/internation... | Crusader Who Saved Elephants From Poachers Is ... | 2017-08-18 17:48:02.978 | 2017-08-18 17:47:46.000 | 74154 | 130.18 | 2017-08-19T03:43:59.592Z | 4033 | 64964 | 5157 | ... | nan | NaN | 120 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
6fc9f70c910c724431d9f1cfc073e02e95a30a67 | http://www.foxnews.com/us/2017/08/29/florida-p... | Florida professor who tweeted Texans deserved ... | 2017-08-29 17:05:04.018 | 2017-08-29 16:00:00.000 | 90959 | 99.39 | 2017-08-30T02:05:59.449Z | 27757 | 53203 | 9999 | ... | nan | NaN | 285 | 48.00 | 0.00 | 2.00 | 0.00 | 2.00 | 50.00 | -46.00 |
f6f5d20cf87c8e50560130566d95203f4a3bef05 | https://www.washingtonpost.com/news/energy-env... | The Trump administration just disbanded a fede... | 2017-08-20 11:09:09.339 | 2017-08-20 11:00:32.000 | 87692 | 192.43 | 2017-08-21T15:07:59.825Z | 11504 | 66531 | 9657 | ... | nan | NaN | 191 | 48.00 | 0.00 | 2.00 | 0.00 | 2.00 | 50.00 | -46.00 |
8dc913cfa338ad0c6ffe32dc68abcdb9214dcb19 | http://www.foxnews.com/us/2017/08/18/two-polic... | Two police officers slain near Orlando, Fla. | 2017-08-19 04:10:03.038 | 2017-08-19 03:50:00.000 | 68737 | 161.97 | 2017-08-19T13:34:00.065Z | 11217 | 49251 | 8269 | ... | nan | NaN | 285 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
0051d5757667245ff75ba7ffae654e5007c0fb47 | http://www.foxnews.com/us/2017/08/15/college-s... | College student, 22, arrested for her role in ... | 2017-08-15 22:30:03.616 | 2017-08-15 21:58:00.000 | 67150 | 390.96 | 2017-08-16T23:00:01.574Z | 12099 | 49717 | 5334 | ... | nan | NaN | 285 | 47.00 | 0.00 | 2.00 | 0.00 | 2.00 | 49.00 | -45.00 |
e0d90ecadd5a0983f52e7fe1006e87d1dea782be | http://www.dailymail.co.uk/tvshowbiz/article-4... | SPOILER ALERT! Pia Miller’s Home And Away char... | 2017-08-10 07:42:16.320 | 2017-08-10 07:41:08.000 | 49578 | 139.86 | 2017-08-10T10:24:02.068Z | 27929 | 17486 | 4163 | ... | nan | NaN | 158 | 45.00 | 0.00 | 0.00 | 0.00 | 0.00 | 45.00 | -45.00 |
cc2ddab940518388e6fdba6cb66fc182b187eb3d | https://www.nytimes.com/2017/08/16/business/tr... | Trump’s Council of C.E.O.’s in Disarray Follow... | 2017-08-16 16:27:00.512 | 2017-08-16 16:23:56.000 | 52568 | 216.16 | 2017-08-16T17:42:02.788Z | 9229 | 32937 | 10402 | ... | nan | NaN | 120 | 45.00 | 0.00 | 0.00 | 0.00 | 0.00 | 45.00 | -45.00 |
25 rows × 26 columns
Write that data to a file. Note that the scores here are provisional for two reasons:
data.to_csv("articles_with_provisional_scores_2017-08-01_2017-08-31.csv")
The attention index of an article comprises four components: a lead score, a front score, a Facebook promotion score and a response score.
Or, in other words:
\begin{align}
attentionIndex_a &= leadScore_a + frontScore_a + facebookPromotionScore_a + responseScore_a \\
leadScore_a &= 20 \cdot \left(\frac{\min(minsAsLead_a, 60)}{alexaRank_a}\right) \cdot \left(\frac{\min(alexaRank)}{60}\right) \\
frontScore_a &= 15 \cdot \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \cdot \left(\frac{\min(alexaRank \cdot numArticlesOnFront)}{1440}\right) \\
facebookPromotionScore_a &= \begin{cases} 0 & \text{if not shared on brand page} \\ 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} & \text{otherwise} \end{cases} \\
responseScore_a &= \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a, limit) + \operatorname{median}(engagements)) - \log(1 + \operatorname{median}(engagements))}{\log(limit + \operatorname{median}(engagements)) - \log(1 + \operatorname{median}(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}
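To make the formulas concrete, here is a minimal sketch of the four components as plain Python functions. It is not the notebook's own implementation. The function and parameter names are hypothetical, the dataset-wide values (minimum Alexa rank, minimum rank-times-front-size product, minimum and maximum brand page likes, median engagements and `limit`) are passed in by the caller, and the inputs at the bottom are invented for illustration. Two assumptions go beyond the formulas: an article that was never on a front page scores 0 for `front_score`, and "not shared on brand page" is represented by `None`.

```python
import math


def lead_score(mins_as_lead, alexa_rank, min_alexa_rank):
    # 20 points for 60+ minutes as lead on the best-ranked site, scaled down by rank.
    return 20 * (min(mins_as_lead, 60) / alexa_rank) * (min_alexa_rank / 60)


def front_score(mins_on_front, alexa_rank, num_on_front, min_rank_times_front):
    # Assumption: never on a front page means a score of 0 (avoids dividing by
    # a missing or zero article count).
    if mins_on_front == 0:
        return 0.0
    share = min(mins_on_front, 1440) / (alexa_rank * num_on_front)
    return 15 * share * (min_rank_times_front / 1440)


def facebook_promotion_score(brand_page_likes, min_likes, max_likes):
    # Assumption: None stands for "not shared on brand page" (NaN in the data).
    if brand_page_likes is None:
        return 0.0
    span = math.log(max_likes) - math.log(min_likes)
    return 15 * (math.log(brand_page_likes) - math.log(min_likes)) / span


def response_score(engagements, median_engagements, limit):
    if engagements == 0:
        return 0.0
    m = median_engagements
    top = math.log(min(engagements, limit) + m) - math.log(1 + m)
    bottom = math.log(limit + m) - math.log(1 + m)
    return 50 * top / bottom


def attention_index(lead, front, facebook, response):
    # The index is the plain sum of the four components.
    return lead + front + facebook + response


# Invented inputs: an article that maxes out every component.
best = attention_index(
    lead_score(60, 105, 105),
    front_score(1440, 105, 50, 105 * 50),
    facebook_promotion_score(10_000_000, 1_000, 10_000_000),
    response_score(1_000_000, 63, 100_000),
)
print(round(best, 2))
```

With these definitions the component ceilings are 20, 15, 15 and 50, so an article that maxes out all four reaches 100, while an article with a single engagement gets a response score of 0 because the numerator collapses to `log(1 + median) - log(1 + median)`.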