%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
date_filename = "2018-01-01_2018-01-31"
data = pd.read_csv("articles_" + date_filename + ".csv", index_col="id", \
parse_dates=["published", "discovered"])
data.head()
id | url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6709592f25cbe37c8c42157a3e31529a6997d919 | https://www.thetimes.co.uk/article/the-ashes-e... | The Ashes: England need 90mph-plus bowlers, sa... | 2018-01-01 11:44:10.414 | 2018-01-01 | 6 | 0.032787 | 2018-01-01T19:02:07.273Z | 0 | 4 | 2 | The Times | thetimes_co_uk | 0 | 735 | 224.0 | False | NaN | NaN | 6435 | 125.0 |
e11638f9f5eb587404372cc1f1554a828ae6ef90 | https://www.thetimes.co.uk/article/wilfried-za... | Wilfried Zaha did well and should not face ban... | 2018-01-01 11:44:10.834 | 2018-01-01 | 0 | 0.000000 | 2018-01-01T19:02:07.293Z | 0 | 0 | 0 | The Times | thetimes_co_uk | 0 | 270 | 224.0 | False | NaN | NaN | 6435 | 125.0 |
9b64c5e6ba5cfd51dad3619a8ea341eba1a12d01 | https://www.thetimes.co.uk/article/crystal-pal... | Crystal Palace worked hard to stem Manchester ... | 2018-01-01 11:44:12.831 | 2018-01-01 | 0 | 0.000000 | 2018-01-01T19:02:07.303Z | 0 | 0 | 0 | The Times | thetimes_co_uk | 0 | 735 | 224.0 | False | NaN | NaN | 6435 | 125.0 |
bc098ae138bc4659f0d723dd13a6df94bf27de1c | https://www.thetimes.co.uk/article/british-ceo... | British CEO Richard Cousins and family killed ... | 2018-01-01 10:44:08.985 | 2018-01-01 | 2 | 0.032787 | 2018-01-01T13:57:17.529Z | 0 | 0 | 2 | The Times | thetimes_co_uk | 724 | 795 | 224.0 | False | NaN | NaN | 6435 | 125.0 |
295570fc56f70ffec0f5c1c13028640a988f90e1 | https://www.thetimes.co.uk/article/referee-jon... | Referee Jon Moss in wrong place to award penalty | 2018-01-01 11:39:07.331 | 2018-01-01 | 0 | 0.000000 | 2018-01-01T19:57:06.317Z | 0 | 0 | 0 | The Times | thetimes_co_uk | 0 | 740 | 224.0 | False | NaN | NaN | 6435 | 125.0 |
The response score is a number between 0 and 50 that indicates the level of response to an article.
Perhaps in the future we may choose to include other factors, but for now we just include engagements on Facebook. The maximum score of 50 should be achieved by an article that does really well compared with others.
pd.options.display.float_format = '{:.2f}'.format
data.fb_engagements.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    153117.00
mean       1108.49
std        8278.99
min           0.00
50%          26.00
75%         256.00
90%        1621.00
95%        4095.20
99%       19918.88
99.5%     32935.68
99.9%     89239.94
max     1077082.00
Name: fb_engagements, dtype: float64
There's 1 article with more than 1 million engagements this month.
data[data.fb_engagements > 1000000]
id | url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bff3221285a08a6b4a931a08fa55a8dab8dcd83a | https://www.washingtonpost.com/politics/trump-... | Trump attacks protections for immigrants from ... | 2018-01-11 21:43:16.311 | 2018-01-11 21:37:00 | 1077082 | 2928.83 | 2018-01-11T22:16:05.460Z | 362995 | 622561 | 91526 | The Washington Post | washingtonpost_com | 725 | 2158 | 87.00 | True | 6113598.00 | 2018-01-11T21:41:41.000Z | 191 | 313.00 |
data.fb_engagements.mode()
0    0
dtype: int64
November: Going back to the engagement counts, we see the mean is 1,117, the mode is zero, the median is 24, the 90th percentile is 1,453, the 99th percentile is 21,166, and the 99.5th percentile is 33,982. The standard deviation is 8,083, far higher than the mean, and the median sits far below the mean, so this is a heavily right-skewed distribution, not a normal one.
December: Going back to the engagement counts, we see the mean is 1,106, the mode is zero, the median is 24, the 90th percentile is 1,545, the 99th percentile is 20,228, and the 99.5th percentile is 32,446. The standard deviation is 9,852, far higher than the mean, and the median sits far below the mean, so this is a heavily right-skewed distribution, not a normal one.
January 2018: Going back to the engagement counts, we see the mean is 1,108, the mode is zero, the median is 26, the 90th percentile is 1,621, the 99th percentile is 19,918, and the 99.5th percentile is 32,935. The standard deviation is 8,278, far higher than the mean, and the median sits far below the mean, so this is a heavily right-skewed distribution, not a normal one.
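To illustrate why a mean far above the median points towards a log transform, here is a minimal sketch on synthetic data. The sample and its parameters are assumptions chosen for illustration; they are not fitted to the article dataset.

```python
import numpy as np

# Synthetic, illustrative data only: a log-normal sample standing in for
# engagement counts. The parameters are arbitrary, not fitted to the dataset.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=2.0, size=100_000)

raw_mean, raw_median = sample.mean(), np.median(sample)
log_mean, log_median = np.log(sample).mean(), np.median(np.log(sample))

# On the raw scale the mean sits far above the median (long right tail);
# on the log scale the two nearly coincide.
print(f"raw: mean={raw_mean:,.1f} median={raw_median:,.1f}")
print(f"log: mean={log_mean:.2f} median={log_median:.2f}")
```

If real engagement counts behave similarly, working on the log scale should spread out the cluster at the low end.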
Key publisher stats
data.groupby("publisher_id").agg({'url': 'count', 'fb_engagements': ['sum', 'median', 'mean']})
publisher_id | url count | fb_engagements sum | fb_engagements median | fb_engagements mean |
---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 28 | 68554 | 1762.00 | 2448.36 |
bbc_co_uk | 12656 | 8743859 | 29.00 | 690.89 |
breitbart_com | 2661 | 11465343 | 275.00 | 4308.66 |
brexitcentral_com | 49 | 39390 | 232.00 | 803.88 |
buzzfeed_com | 1457 | 4642842 | 215.00 | 3186.58 |
cnn_com | 4435 | 19180181 | 541.00 | 4324.73 |
dailymail_co_uk | 24668 | 16357051 | 28.00 | 663.09 |
economist_com | 484 | 122211 | 36.00 | 252.50 |
evolvepolitics_com | 64 | 200556 | 1325.50 | 3133.69 |
foxnews_com | 6824 | 17698744 | 48.00 | 2593.60 |
ft_com | 4725 | 400374 | 4.00 | 84.74 |
huffingtonpost_com | 6443 | 10846012 | 25.00 | 1683.38 |
independent_co_uk | 6278 | 4906592 | 38.00 | 781.55 |
indy100_com | 525 | 555345 | 94.00 | 1057.80 |
lemonde_fr | 3835 | 1942400 | 66.00 | 506.49 |
libdemvoice_org | 156 | 1952 | 7.00 | 12.51 |
mirror_co_uk | 11168 | 8388679 | 45.00 | 751.14 |
nbcnews_com | 2061 | 6240679 | 432.00 | 3027.99 |
newstatesman_com | 470 | 65882 | 20.00 | 140.17 |
npr_org | 2105 | 7155373 | 258.00 | 3399.23 |
nytimes_com | 4605 | 18833832 | 249.00 | 4089.87 |
order-order_com | 246 | 86229 | 162.50 | 350.52 |
propublica_org | 31 | 46538 | 279.00 | 1501.23 |
reuters_com | 5849 | 1964369 | 21.00 | 335.85 |
rt_com | 2521 | 2049423 | 297.00 | 812.94 |
skwawkbox_org | 121 | 45462 | 241.00 | 375.72 |
telegraph_co_uk | 6619 | 2342935 | 21.00 | 353.97 |
thecanary_co | 175 | 172030 | 636.00 | 983.03 |
theguardian_com | 8208 | 8937561 | 125.00 | 1088.88 |
thetimes_co_uk | 9479 | 342359 | 1.00 | 36.12 |
washingtonpost_com | 23426 | 13972718 | 0.00 | 596.46 |
westmonster_com | 330 | 348369 | 34.00 | 1055.66 |
yournewswire_com | 415 | 1564515 | 137.00 | 3769.92 |
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
non_zero_fb_enagagements = data.fb_engagements[data.fb_engagements > 0]
That's a bit better, but still way too clustered at the low end. Let's look at a log normal distribution.
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(12,4.5))
plt.hist(np.log(non_zero_fb_enagagements + median), bins=50)
plt.axvline(np.log(mean), linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(np.log(median), label=f'Median ({median:,.0f})', color='green')
plt.axvline(np.log(ninety), linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(np.log(ninetyfive), linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
plt.axvline(np.log(ninetynine), linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
eng = data.fb_engagements[(data.fb_engagements < 5000)]
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(15,7))
plt.hist(eng, bins=50)
plt.title("Article count by engagements")
plt.axvline(median, label=f'Median ({median:,.0f})', color='green')
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(ninety, linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(ninetyfive, linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
# plt.axvline(ninetynine, linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
log_engagements = (non_zero_fb_enagagements
    .clip(upper=data.fb_engagements.quantile(.999))  # clip_upper() was removed in pandas 1.0
    .apply(lambda x: np.log(x + median))
)
log_engagements.describe()
count   124439.00
mean         5.04
std          1.75
min          3.30
25%          3.56
50%          4.44
75%          6.10
max         11.40
Name: fb_engagements, dtype: float64
Use standard feature scaling to bring that to a 1 to 50 range
def scale_log_engagements(engagements_logged):
    return np.ceil(
        50 * (engagements_logged - log_engagements.min()) / (log_engagements.max() - log_engagements.min())
    )

def scale_engagements(engagements):
    return scale_log_engagements(np.log(engagements + median))
scaled_non_zero_engagements = scale_log_engagements(log_engagements)
scaled_non_zero_engagements.describe()
count   124439.00
mean        11.22
std         10.81
min          0.00
25%          2.00
50%          8.00
75%         18.00
max         50.00
Name: fb_engagements, dtype: float64
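The clipping, log and scaling steps above can be sketched as one self-contained function. The name `response_score`, its `upper_quantile` parameter and the toy series are assumptions made for this example; the notebook itself keeps the steps in separate cells.

```python
import numpy as np
import pandas as pd

def response_score(engagements: pd.Series, upper_quantile: float = 0.999) -> pd.Series:
    """Sketch of the 0-50 response score; articles with zero engagements stay at zero."""
    median = engagements.median()
    limit = engagements.quantile(upper_quantile)
    non_zero = engagements[engagements > 0]
    # Cap extreme values at the limit, then shift by the median before taking the log.
    logged = np.log(non_zero.clip(upper=limit) + median)
    # Min-max scale the logged values to 0..50 and round up.
    scaled = np.ceil(50 * (logged - logged.min()) / (logged.max() - logged.min()))
    # Articles with no engagements are absent from `scaled`; give them a score of 0.
    return scaled.reindex(engagements.index, fill_value=0.0)

# Toy data, not taken from the dataset.
toy = pd.Series([0, 1, 5, 26, 300, 5000, 100000], name="fb_engagements")
scores = response_score(toy)
print(scores.tolist())
```

Note that the smallest non-zero value also scores 0, which matches the `min 0.00` in the output above. The sketch does not guard against the degenerate case where all non-zero values are equal.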
# add in the zeros, as zero
scaled_engagements = pd.concat([scaled_non_zero_engagements, data.fb_engagements[data.fb_engagements == 0]])
proposed = pd.DataFrame({"fb_engagements": data.fb_engagements, "response_score": scaled_engagements})
proposed.response_score.plot.hist(bins=50)
[Figure: histogram of the proposed response_score, 50 bins]
Looks good to me, let's save that.
data["response_score"] = proposed.response_score
The maximum of 50 points is awarded when an article's engagements exceed the 99.9th percentile of engagements, calculated on a rolling basis over the last month.
That is, where $limit$ is the 99.9th percentile of engagements calculated over the previous month, the response score for article $a$ is:
\begin{align}
basicScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ \log(\min(engagements_a,limit) + median(engagements)) & \text{if } engagements_a > 0 \end{cases} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{basicScore_a - \min(basicScore)}{\max(basicScore) - \min(basicScore)} & \text{if } engagements_a > 0 \end{cases} \\
\\
\text{The latter equation can be expanded to:} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a,limit) + median(engagements)) - \log(1 + median(engagements))} {\log(limit + median(engagements)) - \log(1 + median(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}

The aim of the promotion score is to indicate how important the article was to the publisher, by tracking where they chose to promote it. This is a number between 0 and 50 comprised of:

- up to 20 points for time spent as the lead story on the publisher's home page
- up to 15 points for time spent on the publisher's front page
- up to 15 points for being posted to the publisher's Facebook brand page
The first two should be scaled by the popularity/reach of the home page, for which we use the alexa page rank as a proxy.
The last should be scaled by the popularity/reach of the brand page, for which we use the number of likes the brand page has.
data.mins_as_lead.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count   153117.00
mean         9.31
std         95.51
min          0.00
50%          0.00
75%          0.00
90%          0.00
95%          0.00
99%        274.00
99.5%      565.00
99.9%     1204.88
max      11522.00
Name: mins_as_lead, dtype: float64
As expected, the vast majority of articles don't make it as lead. Let's explore how long typically publishers put something as lead for.
lead_articles = data[data.mins_as_lead > 0]
lead_articles.mins_as_lead.describe([0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    4319.00
mean      329.99
std       466.47
min         4.00
25%        89.00
50%       180.00
75%       399.00
90%       834.00
95%      1065.50
99%      1691.22
99.5%    2546.28
99.9%    5381.40
max     11522.00
Name: mins_as_lead, dtype: float64
lead_articles.mins_as_lead.plot.hist(bins=50)
[Figure: histogram of mins_as_lead for lead articles, 50 bins]
Being lead at all is significant for an article, so although we want to penalise articles that were lead only very briefly, we mostly want to award the maximum even if an article wasn't lead for long. So we'll give maximum points once something has been lead for an hour.
lead_articles.mins_as_lead.clip(upper=60).plot.hist(bins=50)
[Figure: histogram of mins_as_lead capped at 60 minutes, 50 bins]
We also want to scale this by the Alexa page rank, such that the maximum score of 20 points goes to an article that was lead for an hour on the most popular site.
So let's explore the Alexa numbers.
alexa_ranks = data.groupby(by="publisher_id").alexa_rank.mean().sort_values()
alexa_ranks
publisher_id
bbc_co_uk    96
cnn_com    105
nytimes_com    120
theguardian_com    142
buzzfeed_com    147
dailymail_co_uk    158
washingtonpost_com    191
huffingtonpost_com    215
foxnews_com    285
rt_com    365
telegraph_co_uk    370
independent_co_uk    386
reuters_com    497
npr_org    594
lemonde_fr    618
mirror_co_uk    706
nbcnews_com    826
breitbart_com    994
ft_com    1596
economist_com    1825
indy100_com    5014
thetimes_co_uk    6435
newstatesman_com    12769
thecanary_co    15686
propublica_org    16066
yournewswire_com    22568
order-order_com    32515
anotherangryvoice_blogspot_co_uk    77827
westmonster_com    97775
evolvepolitics_com    119412
skwawkbox_org    152475
libdemvoice_org    344992
brexitcentral_com    469149
Name: alexa_rank, dtype: int64
alexa_ranks.plot.bar(figsize=[10,5])
[Figure: bar chart of Alexa rank by publisher]
Let's try the simple option first: just divide the number of minutes as lead by the Alexa rank. What's the scale of the numbers we get then?
lead_proposal_1 = lead_articles.mins_as_lead.clip(upper=60) / lead_articles.alexa_rank
lead_proposal_1.plot.hist()
[Figure: histogram of lead_proposal_1]
Looks like there's too much of a cluster around 0. Have we massively over-penalised the publishers with a numerically high (i.e. less popular) Alexa rank?
lead_proposal_1.groupby(data.publisher_id).mean().plot.bar(figsize=[10,5])
[Figure: bar chart of mean lead_proposal_1 by publisher]
Yes. Let's try taking the log of the alexa rank and see if that looks better.
lead_proposal_2 = (lead_articles.mins_as_lead.clip(upper=60) / np.log(lead_articles.alexa_rank))
lead_proposal_2.plot.hist()
[Figure: histogram of lead_proposal_2]
lead_proposal_2.groupby(data.publisher_id).describe()
publisher_id | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 28.00 | 5.33 | 0.00 | 5.33 | 5.33 | 5.33 | 5.33 | 5.33 |
bbc_co_uk | 111.00 | 12.88 | 1.25 | 5.26 | 13.15 | 13.15 | 13.15 | 13.15 |
breitbart_com | 202.00 | 8.42 | 1.16 | 1.30 | 8.69 | 8.69 | 8.69 | 8.69 |
brexitcentral_com | 40.00 | 4.59 | 0.00 | 4.59 | 4.59 | 4.59 | 4.59 | 4.59 |
buzzfeed_com | 331.00 | 11.55 | 1.69 | 2.00 | 12.02 | 12.02 | 12.02 | 12.02 |
cnn_com | 212.00 | 12.39 | 1.82 | 1.07 | 12.89 | 12.89 | 12.89 | 12.89 |
dailymail_co_uk | 180.00 | 11.63 | 1.11 | 2.96 | 11.85 | 11.85 | 11.85 | 11.85 |
economist_com | 65.00 | 7.54 | 1.67 | 0.67 | 7.99 | 7.99 | 7.99 | 7.99 |
foxnews_com | 106.00 | 10.45 | 1.09 | 0.88 | 10.61 | 10.61 | 10.61 | 10.61 |
ft_com | 98.00 | 7.62 | 1.56 | 0.54 | 8.14 | 8.14 | 8.14 | 8.14 |
huffingtonpost_com | 162.00 | 10.96 | 1.19 | 2.61 | 11.17 | 11.17 | 11.17 | 11.17 |
independent_co_uk | 133.00 | 9.67 | 1.63 | 0.67 | 10.07 | 10.07 | 10.07 | 10.07 |
indy100_com | 132.00 | 5.97 | 1.83 | 0.47 | 5.75 | 7.04 | 7.04 | 7.04 |
lemonde_fr | 203.00 | 8.09 | 2.51 | 0.62 | 8.56 | 9.34 | 9.34 | 9.34 |
libdemvoice_org | 135.00 | 4.60 | 0.47 | 1.49 | 4.71 | 4.71 | 4.71 | 4.71 |
mirror_co_uk | 316.00 | 8.61 | 1.53 | 0.61 | 9.15 | 9.15 | 9.15 | 9.15 |
nbcnews_com | 115.00 | 8.52 | 1.48 | 0.74 | 8.93 | 8.93 | 8.93 | 8.93 |
newstatesman_com | 64.00 | 5.88 | 1.28 | 1.59 | 6.35 | 6.35 | 6.35 | 6.35 |
npr_org | 166.00 | 8.95 | 1.45 | 0.78 | 9.39 | 9.39 | 9.39 | 9.39 |
nytimes_com | 61.00 | 12.32 | 1.29 | 2.92 | 12.53 | 12.53 | 12.53 | 12.53 |
order-order_com | 243.00 | 4.59 | 1.52 | 0.48 | 3.37 | 5.78 | 5.78 | 5.78 |
propublica_org | 15.00 | 6.20 | 0.00 | 6.20 | 6.20 | 6.20 | 6.20 | 6.20 |
reuters_com | 97.00 | 9.52 | 0.99 | 2.42 | 9.66 | 9.66 | 9.66 | 9.66 |
rt_com | 154.00 | 9.56 | 1.98 | 0.68 | 10.17 | 10.17 | 10.17 | 10.17 |
skwawkbox_org | 120.00 | 4.76 | 0.70 | 1.68 | 5.03 | 5.03 | 5.03 | 5.03 |
telegraph_co_uk | 104.00 | 9.79 | 1.67 | 0.85 | 10.15 | 10.15 | 10.15 | 10.15 |
thecanary_co | 162.00 | 4.87 | 1.65 | 0.93 | 4.04 | 5.69 | 6.21 | 6.21 |
theguardian_com | 156.00 | 11.59 | 1.90 | 1.01 | 12.11 | 12.11 | 12.11 | 12.11 |
thetimes_co_uk | 70.00 | 6.80 | 0.35 | 3.88 | 6.84 | 6.84 | 6.84 | 6.84 |
washingtonpost_com | 78.00 | 11.22 | 1.08 | 2.67 | 11.42 | 11.42 | 11.42 | 11.42 |
westmonster_com | 76.00 | 5.01 | 0.72 | 0.87 | 5.22 | 5.22 | 5.22 | 5.22 |
yournewswire_com | 184.00 | 5.68 | 0.64 | 3.29 | 5.99 | 5.99 | 5.99 | 5.99 |
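To make the effect of the log concrete, the following sketch compares the two proposals for an article that was lead for at least an hour on the best-ranked and worst-ranked publishers. The ranks are the ones listed in the Alexa output earlier; the dictionary and variable names are assumptions for this example.

```python
import numpy as np

# Alexa ranks for the most and least popular publishers in the listing above.
ranks = {"bbc_co_uk": 96, "brexitcentral_com": 469149}
capped_minutes = 60  # an article that was lead for at least an hour

by_rank = {name: capped_minutes / rank for name, rank in ranks.items()}
by_log_rank = {name: capped_minutes / np.log(rank) for name, rank in ranks.items()}

# Ratio between the best- and worst-ranked publisher under each proposal.
ratio_rank = by_rank["bbc_co_uk"] / by_rank["brexitcentral_com"]
ratio_log = by_log_rank["bbc_co_uk"] / by_log_rank["brexitcentral_com"]
print(f"divide by rank:      {ratio_rank:,.0f}x")
print(f"divide by log(rank): {ratio_log:.1f}x")
```

Dividing by the raw rank separates the two publishers by a factor of several thousand, while dividing by the log of the rank keeps them within a factor of about three, which is consistent with the per-publisher maxima in the table above.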
lead_proposal_2.groupby(data.publisher_id).min().plot.bar(figsize=[10,5])
[Figure: bar chart of minimum lead_proposal_2 by publisher]
That looks about right, as long as the smaller publishers were closer to zero. So let's apply feature scaling to this, to give a number between 1 and 20. (Anything not as lead will pass through as zero.)
def rescale(series):
    return (series - series.min()) / (series.max() - series.min())
lead_proposal_3 = np.ceil(20 * rescale(lead_proposal_2))
lead_proposal_2.min(), lead_proposal_2.max()
(0.46948415885821004, 13.145359968846892)
lead_proposal_3.plot.hist()
[Figure: histogram of lead_proposal_3]
lead_proposal_3.groupby(data.publisher_id).median().plot.bar(figsize=[10,5])
[Figure: bar chart of median lead_proposal_3 by publisher]
data["lead_score"] = pd.concat([lead_proposal_3, data.mins_as_lead[data.mins_as_lead==0]])
data.lead_score.value_counts().sort_index()
0.00     148799
1.00         37
2.00         46
3.00         40
4.00         59
5.00         58
6.00         79
7.00        242
8.00        249
9.00        310
10.00       171
11.00       176
12.00        87
13.00       307
14.00       531
15.00       254
16.00       359
17.00       266
18.00       250
19.00       441
20.00       356
Name: lead_score, dtype: int64
data.lead_score.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 20.00 breitbart_com 13.00 brexitcentral_com 7.00 buzzfeed_com 19.00 cnn_com 20.00 dailymail_co_uk 18.00 economist_com 12.00 evolvepolitics_com 0.00 foxnews_com 17.00 ft_com 13.00 huffingtonpost_com 17.00 independent_co_uk 16.00 indy100_com 11.00 lemonde_fr 14.00 libdemvoice_org 7.00 mirror_co_uk 14.00 nbcnews_com 14.00 newstatesman_com 10.00 npr_org 15.00 nytimes_com 20.00 order-order_com 9.00 propublica_org 10.00 reuters_com 15.00 rt_com 16.00 skwawkbox_org 8.00 telegraph_co_uk 16.00 thecanary_co 10.00 theguardian_com 19.00 thetimes_co_uk 11.00 washingtonpost_com 18.00 westmonster_com 8.00 yournewswire_com 9.00 Name: lead_score, dtype: float64
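The lead-score steps above (cap at 60 minutes, divide by the log of the Alexa rank, min-max scale to 20 and round up) can be sketched as a single function. The function name, its `cap` parameter and the toy frame are assumptions for this example.

```python
import numpy as np
import pandas as pd

def lead_score(mins_as_lead: pd.Series, alexa_rank: pd.Series, cap: int = 60) -> pd.Series:
    """Sketch of the 0-20 lead score; articles that were never lead score 0."""
    is_lead = mins_as_lead > 0
    # Cap time as lead, then damp by the log of the publisher's Alexa rank.
    unscaled = mins_as_lead[is_lead].clip(upper=cap) / np.log(alexa_rank[is_lead])
    # Min-max scale across all lead articles.
    scaled = (unscaled - unscaled.min()) / (unscaled.max() - unscaled.min())
    # Round up to whole points and fill non-lead articles with 0.
    return np.ceil(20 * scaled).reindex(mins_as_lead.index, fill_value=0.0)

# Toy data: four articles on a rank-96 site and one on a rank-6435 site.
toy = pd.DataFrame({
    "mins_as_lead": [0, 5, 60, 600, 60],
    "alexa_rank": [96, 96, 96, 96, 6435],
})
toy["lead_score"] = lead_score(toy.mins_as_lead, toy.alexa_rank)
print(toy)
```

As in the notebook, the lead article with the smallest unscaled value rounds to 0, which would explain why the zero bucket in the value counts above is one larger than the number of non-lead articles.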
In summary then, the lead score for article $a$ is:

$$ unscaledLeadScore_a = \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)}\\ leadScore_a = 19 \cdot \frac{unscaledLeadScore_a - \min(unscaledLeadScore)} {\max(unscaledLeadScore) - \min(unscaledLeadScore)} + 1 $$

Since the minimum value of $minsAsLead$ is 1, $\min(unscaledLeadScore)$ is pretty insignificant. So we can simplify this to:

$$ leadScore_a = 20 \cdot \frac{unscaledLeadScore_a } {\max(unscaledLeadScore)} $$

or, since the unscaled score is largest for an article capped at 60 minutes on the site with the smallest (best) Alexa rank:

$$ leadScore_a = 20 \cdot \frac{\frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} } {\frac{60}{\log(\min(alexaRank))}} $$

$$ leadScore_a = 20 \cdot \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} \cdot \frac{\log(\min(alexaRank))}{60} $$

This is similar to time as lead, so let's try doing the same calculation, except we also want to factor in the number of slots on the front:
$$frontScore_a = 15 \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \left( \frac{\min(alexaRank \cdot numArticlesOnFront)}{1440} \right)$$

(data.alexa_rank * data.num_articles_on_front).min() / 1440
2.4500000000000002
time_on_front_proposal_1 = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15)
time_on_front_proposal_1.plot.hist(figsize=(15, 7), bins=15)
[Figure: histogram of time_on_front_proposal_1, 15 bins]
time_on_front_proposal_1.value_counts().sort_index()
1.00     75945
2.00      7136
3.00      4789
4.00      4278
5.00       878
6.00       691
7.00       586
8.00       461
9.00       785
10.00      265
11.00      360
12.00      242
13.00      143
14.00       70
15.00       49
dtype: int64
time_on_front_proposal_1.groupby(data.publisher_id).sum()
publisher_id anotherangryvoice_blogspot_co_uk 28.00 bbc_co_uk 15065.00 breitbart_com 2511.00 brexitcentral_com 49.00 buzzfeed_com 10691.00 cnn_com 12825.00 dailymail_co_uk 14849.00 economist_com 306.00 evolvepolitics_com 60.00 foxnews_com 7982.00 ft_com 3173.00 huffingtonpost_com 8120.00 independent_co_uk 4474.00 indy100_com 524.00 lemonde_fr 3831.00 libdemvoice_org 156.00 mirror_co_uk 10157.00 nbcnews_com 1916.00 newstatesman_com 469.00 npr_org 2713.00 nytimes_com 9621.00 order-order_com 244.00 propublica_org 31.00 reuters_com 6819.00 rt_com 4496.00 skwawkbox_org 121.00 telegraph_co_uk 4614.00 thecanary_co 173.00 theguardian_com 12648.00 thetimes_co_uk 9453.00 washingtonpost_com 9329.00 westmonster_com 312.00 yournewswire_com 415.00 dtype: float64
That looks good to me.
data["front_score"] = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15).fillna(0)
data.front_score
id 6709592f25cbe37c8c42157a3e31529a6997d919 1.00 e11638f9f5eb587404372cc1f1554a828ae6ef90 1.00 9b64c5e6ba5cfd51dad3619a8ea341eba1a12d01 1.00 bc098ae138bc4659f0d723dd13a6df94bf27de1c 1.00 295570fc56f70ffec0f5c1c13028640a988f90e1 1.00 ff04d53263605f5efc6c6d76caf017ba8e5b8509 1.00 970471ce0301861cd2b0779962dc3472b046ecea 1.00 4a168ba227abeb2a58f82f30fff5b32a03f29385 1.00 c7597aad01cd3c53687acb8b61508bda6e5d225a 1.00 56f399ce2d7a57364e10d40374cdc97179a081ad 1.00 f4e3b123fc07a47ccf453898247445941fce5698 1.00 cde01d750345c11f090de5878b5670911764023a 1.00 bff2cf0f5dccd112114bee2bef44b9b997614c8f 1.00 80ac264faa1e7d33d259e7fb8fd23e407b206a5f 1.00 e0c6d47ad0eff33ec01de70358e9ac08ac174ceb 1.00 cabb3998d53aad09eccecf39fdff2439f4f5a633 1.00 61203ee49335a3ef45f2ce2569add6ee81218656 1.00 64062be203288cee65392ecf145b251530a6bcae 1.00 bed3040ac7e43058687d705dd6a81e0f4bd68fdc 1.00 bc7d67564c937e73a95973a15df2cf188a048599 1.00 fcda9a52108c426979159fdf116b91633e72e154 1.00 f2e678a5666b127f3c6fda0d3ce314dcd6cec070 1.00 e47f06fdafae541bc8dda3119ab54bc2ca70baa5 1.00 12758258a478a89fa010a9d4da88325261c2f61a 1.00 9677da854a3c51cd71e35be22f360a406fc828a3 1.00 cf7182e390027ca370b7662bcb7f3324219320bc 1.00 09a289c1510acd123077b19d8ab48e2095da68eb 1.00 bfa1de8df0c0d5c7e97d009d4feb92c4de3ca1d2 1.00 cffe241f08e165304a328d0046b6d92065afad36 1.00 4f7e173e04e51edb008f9d4e3a4502ef097fb1ad 1.00 ... 
3fe9dca55849d2b25cdc29941d7e011b2ca955fd 1.00 af56fd11c764c962f6120e27158ecaa3339674da 2.00 8bcef6d882aae991f7d3c80b1ed25465770e7c19 0.00 7c57ac5a380f98e778257ab9dace202c21423097 1.00 de3f92d120372ac0679142290e43c77bfc44e691 1.00 16b149d23d8cd5e353b3fb8f0d0bb3bacf585467 2.00 7c3e0bc18a1e846a754f6ca18aa30651f3b4979b 0.00 4a909f0dac387fb517c56fb38f2f426239846d79 1.00 fe6a25d446813e1b17c191497a4286a8ad60a5a2 1.00 3a872555ceb6978f73bdd9fc5446a79c4fc52d81 1.00 513b1d928f48dcf4c4b3528a85729cd12ef36cd5 1.00 7d69cce8b0296b1cb157c121067d541a222384e6 1.00 eeb47ed17507d1860e21f679d416b8be24b7b2e5 6.00 1a668ae2a2056e73783bbc0e53ced3ec15cd1e5a 1.00 4b9796f822b53388cff5916ff001a27de904546c 6.00 ec3827bc109c277faa6541717f1e33d26d15b503 2.00 89cfcaea8a42e46f1a0f57394dd0cdd5441c123b 3.00 41919868fff4a70e0be44172a93c2bf4064b1232 0.00 20fbb21c7b25ec5bfe0c1c85b958d49850ea5cd9 1.00 911ed79e4f81474fc3d362d1b3c686164a01af2b 0.00 528a9514debb452f9913933c8495be5007f07ce9 9.00 cf03a379bfda577f6e45c2adbc30b356e5ceebe7 1.00 2f11cfa0a6aef8b8662e2a41198a97343a036e12 1.00 fee7917d380be4de51d8c4469839777b4370209b 1.00 cd48ab167cb8db2a22f7d973852124129076487f 2.00 d4f5db1d3ea8b9ec6c71b7f5684e83ad9c18bc93 1.00 e6b89227d71f71a54c69e254378762144ab25afb 1.00 643beda1c6c5fe47f57e957b4cfa8825def2f62a 1.00 c97fcbdf099cbdc1509aadb77ae3458b202f4c1b 0.00 1d3eef43fc0e56f2cb8592fde4ecec7cb7d27264 1.00 Name: front_score, Length: 153117, dtype: float64
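The front-score calculation above hard-codes the 2.45 constant. As a hedged sketch, the same formula can be wrapped in a function that derives the constant from the data instead. The function name, its `cap` parameter and the toy values are assumptions; the factors are also grouped slightly differently from the notebook so that the best case evaluates to exactly 1.0 in floating point before rounding up.

```python
import numpy as np
import pandas as pd

def front_score(mins_on_front: pd.Series, alexa_rank: pd.Series,
                num_articles_on_front: pd.Series, cap: int = 1440) -> pd.Series:
    """Sketch of the 0-15 front score with the normalising constant computed from the data."""
    # Smaller is better: a well-ranked site with few front-page slots.
    reach = alexa_rank * num_articles_on_front
    time_share = mins_on_front.clip(upper=cap) / cap   # 0..1
    reach_share = reach.min() / reach                  # 1 for the best site, smaller otherwise
    # Missing slot counts propagate as NaN and are scored as 0, as in the notebook.
    return np.ceil(15 * time_share * reach_share).fillna(0)

# Toy data, not taken from the dataset; the last row has no slot count.
toy = pd.DataFrame({
    "mins_on_front": [1440, 720, 0, 3000],
    "alexa_rank": [100, 100, 100, 1000],
    "num_articles_on_front": [50, 50, 50, np.nan],
})
toy["front_score"] = front_score(toy.mins_on_front, toy.alexa_rank, toy.num_articles_on_front)
print(toy)
```

Computing `reach.min()` from the data avoids the constant going stale when the set of publishers or their ranks change from month to month.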
One way a publisher has of promoting content is to post to their brand page. The significance of doing so is stronger when the brand page has more followers (likes).
$$ facebookPromotionProposed1_a = 15 \left( \frac {brandPageLikes_a} {\max(brandPageLikes)} \right) $$

Now let's explore the data to see if that makes sense. tl;dr: the formula above is incorrect.
data.fb_brand_page_likes.max()
45711259.0
facebook_promotion_proposed_1 = np.ceil((15 * (data.fb_brand_page_likes / data.fb_brand_page_likes.max())).fillna(0))
facebook_promotion_proposed_1.value_counts().sort_index().plot.bar()
[Figure: bar chart of facebook_promotion_proposed_1 value counts]
facebook_promotion_proposed_1.groupby(data.publisher_id).describe()
publisher_id | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 28.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
bbc_co_uk | 12656.00 | 0.60 | 2.94 | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 |
breitbart_com | 2661.00 | 0.83 | 0.99 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
brexitcentral_com | 49.00 | 0.98 | 0.14 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
buzzfeed_com | 1457.00 | 0.42 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
cnn_com | 4435.00 | 2.33 | 4.23 | 0.00 | 0.00 | 0.00 | 0.00 | 10.00 |
dailymail_co_uk | 24668.00 | 0.57 | 1.59 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 |
economist_com | 484.00 | 2.50 | 1.12 | 0.00 | 3.00 | 3.00 | 3.00 | 3.00 |
evolvepolitics_com | 64.00 | 0.84 | 0.37 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
foxnews_com | 6824.00 | 0.60 | 1.81 | 0.00 | 0.00 | 0.00 | 0.00 | 6.00 |
ft_com | 4725.00 | 0.46 | 0.84 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
huffingtonpost_com | 6443.00 | 0.74 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
independent_co_uk | 6278.00 | 0.57 | 1.18 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
indy100_com | 525.00 | 0.64 | 0.48 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
lemonde_fr | 3835.00 | 0.72 | 0.96 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
libdemvoice_org | 156.00 | 0.87 | 0.34 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
mirror_co_uk | 11168.00 | 0.23 | 0.42 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
nbcnews_com | 2061.00 | 2.34 | 1.97 | 0.00 | 0.00 | 4.00 | 4.00 | 4.00 |
newstatesman_com | 470.00 | 0.79 | 0.41 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
npr_org | 2105.00 | 1.41 | 1.50 | 0.00 | 0.00 | 0.00 | 3.00 | 3.00 |
nytimes_com | 4605.00 | 1.78 | 2.52 | 0.00 | 0.00 | 0.00 | 5.00 | 6.00 |
order-order_com | 246.00 | 0.80 | 0.40 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
propublica_org | 31.00 | 0.87 | 0.34 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
reuters_com | 5849.00 | 0.68 | 0.95 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
rt_com | 2521.00 | 1.07 | 1.00 | 0.00 | 0.00 | 2.00 | 2.00 | 2.00 |
skwawkbox_org | 121.00 | 0.99 | 0.09 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
telegraph_co_uk | 6619.00 | 0.50 | 0.87 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
thecanary_co | 175.00 | 0.97 | 0.18 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
theguardian_com | 8208.00 | 0.53 | 1.14 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
thetimes_co_uk | 9479.00 | 0.06 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
washingtonpost_com | 23426.00 | 0.16 | 0.68 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
westmonster_com | 330.00 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
yournewswire_com | 415.00 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
That's too much variation: sites like the Guardian, which have a respectable 7.5m likes, should not be scoring a 3. Let's try applying a log to it, and then standard feature scaling again.
data.fb_brand_page_likes.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 330551.00 bbc_co_uk 45711259.00 breitbart_com 3803903.00 brexitcentral_com 14522.00 buzzfeed_com 2902337.00 cnn_com 29591651.00 dailymail_co_uk 14227830.00 economist_com 8450845.00 evolvepolitics_com 126771.00 foxnews_com 16183276.00 ft_com 3743010.00 huffingtonpost_com 9846648.00 independent_co_uk 8048146.00 indy100_com 235483.00 lemonde_fr 4019902.00 libdemvoice_org 8629.00 mirror_co_uk 2970189.00 nbcnews_com 9522368.00 newstatesman_com 154752.00 npr_org 6281946.00 nytimes_com 15274145.00 order-order_com 45461.00 propublica_org 376979.00 reuters_com 3951592.00 rt_com 4967624.00 skwawkbox_org 6920.00 telegraph_co_uk 4420609.00 thecanary_co 158053.00 theguardian_com 7869117.00 thetimes_co_uk 741434.00 washingtonpost_com 6129074.00 westmonster_com 17216.00 yournewswire_com 27926.00 Name: fb_brand_page_likes, dtype: float64
np.log(2149)
7.6727578966425103
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max())
publisher_id anotherangryvoice_blogspot_co_uk 12.71 bbc_co_uk 17.64 breitbart_com 15.15 brexitcentral_com 9.58 buzzfeed_com 14.88 cnn_com 17.20 dailymail_co_uk 16.47 economist_com 15.95 evolvepolitics_com 11.75 foxnews_com 16.60 ft_com 15.14 huffingtonpost_com 16.10 independent_co_uk 15.90 indy100_com 12.37 lemonde_fr 15.21 libdemvoice_org 9.06 mirror_co_uk 14.90 nbcnews_com 16.07 newstatesman_com 11.95 npr_org 15.65 nytimes_com 16.54 order-order_com 10.72 propublica_org 12.84 reuters_com 15.19 rt_com 15.42 skwawkbox_org 8.84 telegraph_co_uk 15.30 thecanary_co 11.97 theguardian_com 15.88 thetimes_co_uk 13.52 washingtonpost_com 15.63 westmonster_com 9.75 yournewswire_com 10.24 Name: fb_brand_page_likes, dtype: float64
That's more like it, but the lower numbers should be smaller.
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max() / 1000)
publisher_id anotherangryvoice_blogspot_co_uk 5.80 bbc_co_uk 10.73 breitbart_com 8.24 brexitcentral_com 2.68 buzzfeed_com 7.97 cnn_com 10.30 dailymail_co_uk 9.56 economist_com 9.04 evolvepolitics_com 4.84 foxnews_com 9.69 ft_com 8.23 huffingtonpost_com 9.19 independent_co_uk 8.99 indy100_com 5.46 lemonde_fr 8.30 libdemvoice_org 2.16 mirror_co_uk 8.00 nbcnews_com 9.16 newstatesman_com 5.04 npr_org 8.75 nytimes_com 9.63 order-order_com 3.82 propublica_org 5.93 reuters_com 8.28 rt_com 8.51 skwawkbox_org 1.93 telegraph_co_uk 8.39 thecanary_co 5.06 theguardian_com 8.97 thetimes_co_uk 6.61 washingtonpost_com 8.72 westmonster_com 2.85 yournewswire_com 3.33 Name: fb_brand_page_likes, dtype: float64
scaled_fb_brand_page_likes = (data.fb_brand_page_likes / 1000)
facebook_promotion_proposed_2 = np.ceil(
    15 * (np.log(scaled_fb_brand_page_likes) / np.log(scaled_fb_brand_page_likes.max()))
).fillna(0)
facebook_promotion_proposed_2.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 9.00 bbc_co_uk 15.00 breitbart_com 12.00 brexitcentral_com 4.00 buzzfeed_com 12.00 cnn_com 15.00 dailymail_co_uk 14.00 economist_com 13.00 evolvepolitics_com 7.00 foxnews_com 14.00 ft_com 12.00 huffingtonpost_com 13.00 independent_co_uk 13.00 indy100_com 8.00 lemonde_fr 12.00 libdemvoice_org 4.00 mirror_co_uk 12.00 nbcnews_com 13.00 newstatesman_com 8.00 npr_org 13.00 nytimes_com 14.00 order-order_com 6.00 propublica_org 9.00 reuters_com 12.00 rt_com 12.00 skwawkbox_org 3.00 telegraph_co_uk 12.00 thecanary_co 8.00 theguardian_com 13.00 thetimes_co_uk 10.00 washingtonpost_com 13.00 westmonster_com 4.00 yournewswire_com 5.00 Name: fb_brand_page_likes, dtype: float64
LGTM. So the equation is:

$$ facebookPromotion_a = 15 \left( \frac {\log(\frac {brandPageLikes_a}{1000})} {\log(\frac {\max(brandPageLikes)}{1000})} \right) $$

Now, let's try applying the standard feature scaling approach to this, rather than using a magic number of 1,000. That equation would be:
\begin{align}
unscaledFacebookPromotion_a &= \log(brandPageLikes_a) \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \min(unscaledFacebookPromotion)}{\max(unscaledFacebookPromotion) - \min(unscaledFacebookPromotion)} \\
\\
\text{The scaling can be simplified to:} \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} \\
\\
\text{Meaning the overall equation becomes:} \\
facebookPromotion_a &= 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))}
\end{align}

facebook_promotion_proposed_3 = np.ceil(
    # Note: 14 * scaled + 1 (rather than 15 * scaled as in the equation) keeps every promoted article in the 1 to 15 range
    (14 *
        (
            (np.log(data.fb_brand_page_likes) - np.log(data.fb_brand_page_likes.min())) /
            (np.log(data.fb_brand_page_likes.max()) - np.log(data.fb_brand_page_likes.min()))
        )
    ) + 1
)
facebook_promotion_proposed_3.groupby(data.publisher_id).max()
publisher_id
anotherangryvoice_blogspot_co_uk    8.00
bbc_co_uk                          15.00
breitbart_com                      12.00
brexitcentral_com                   3.00
buzzfeed_com                       11.00
cnn_com                            15.00
dailymail_co_uk                    14.00
economist_com                      13.00
evolvepolitics_com                  6.00
foxnews_com                        14.00
ft_com                             12.00
huffingtonpost_com                 13.00
independent_co_uk                  13.00
indy100_com                         7.00
lemonde_fr                         12.00
libdemvoice_org                     2.00
mirror_co_uk                       11.00
nbcnews_com                        13.00
newstatesman_com                    7.00
npr_org                            12.00
nytimes_com                        14.00
order-order_com                     5.00
propublica_org                      8.00
reuters_com                        12.00
rt_com                             12.00
skwawkbox_org                       2.00
telegraph_co_uk                    12.00
thecanary_co                        7.00
theguardian_com                    13.00
thetimes_co_uk                      9.00
washingtonpost_com                 12.00
westmonster_com                     3.00
yournewswire_com                    4.00
Name: fb_brand_page_likes, dtype: float64
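As a sanity check on the log min-max scaling, here is a minimal sketch on made-up numbers. The helper name `log_min_max_score`, its `max_score` parameter and the `toy_likes` values are illustrative assumptions, not part of the notebook's data; the arithmetic mirrors the `14 * (...) + 1` form used above.

```python
import numpy as np
import pandas as pd


def log_min_max_score(likes: pd.Series, max_score: int = 15) -> pd.Series:
    """Map page likes onto 1..max_score by min-max scaling log(likes).

    Hypothetical helper for illustration; missing likes score 0.
    """
    logged = np.log(likes)
    # pandas min()/max() skip NaN, so missing pages do not affect the range.
    scaled = (logged - logged.min()) / (logged.max() - logged.min())
    return np.ceil((max_score - 1) * scaled + 1).fillna(0.0)


# Made-up like counts, evenly spaced on a log scale, plus one missing value.
toy_likes = pd.Series([1_000.0, 10_000.0, 100_000.0, 1_000_000.0, np.nan])
toy_scores = log_min_max_score(toy_likes)
print(toy_scores.tolist())
```

Each tenfold increase in likes adds roughly the same number of points, which is the reason for taking the log before scaling.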
data["facebook_promotion_score"] = facebook_promotion_proposed_3.fillna(0.0)
data["promotion_score"] = (data.lead_score + data.front_score + data.facebook_promotion_score)
data["attention_index"] = (data.promotion_score + data.response_score)
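To make the composition of the index concrete, the following sketch applies the same sums to three made-up articles. The row labels and component values are hypothetical; only the column names and the two formulas come from the cells above.

```python
import pandas as pd

# Hypothetical component scores for three made-up articles.
toy = pd.DataFrame(
    {
        "lead_score": [20.0, 0.0, 5.0],
        "front_score": [9.0, 0.0, 3.0],
        "facebook_promotion_score": [15.0, 0.0, 11.0],
        "response_score": [50.0, 50.0, 2.0],
    },
    index=["heavily_promoted_hit", "unpromoted_hit", "promoted_miss"],
)

# Same arithmetic as the notebook: promotion is the sum of the three
# publisher-side scores, and the attention index adds the response score.
toy["promotion_score"] = (
    toy.lead_score + toy.front_score + toy.facebook_promotion_score
)
toy["attention_index"] = toy.promotion_score + toy.response_score
print(toy[["promotion_score", "response_score", "attention_index"]])
```

An article can reach a mid-range index either through heavy promotion with little response or through strong response with no promotion at all, so the index alone does not say which.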
data.promotion_score.plot.hist(bins=np.arange(50), figsize=(15,6))
data.attention_index.plot.hist(bins=np.arange(100), figsize=(15,6))
data.attention_index.value_counts().sort_index()
0.00     23633
1.00     18731
2.00     12787
3.00      8984
4.00      6282
5.00      5323
6.00      4475
7.00      3959
8.00      3472
9.00      3364
10.00     2965
11.00     2604
12.00     2532
13.00     2487
14.00     2270
15.00     2156
16.00     1953
17.00     1869
18.00     1865
19.00     1697
20.00     1754
21.00     1655
22.00     1566
23.00     1541
24.00     1598
25.00     1537
26.00     1548
27.00     1507
28.00     1414
29.00     1415
...
65.00      137
66.00      126
67.00       85
68.00       96
69.00       65
70.00       56
71.00       54
72.00       45
73.00       41
74.00       47
75.00       28
76.00       38
77.00       33
78.00       30
79.00       25
80.00       29
81.00       22
82.00       22
83.00       19
84.00       16
85.00       13
86.00       11
87.00       12
88.00       15
89.00        8
90.00        7
91.00        4
92.00        3
93.00        3
94.00        2
Name: attention_index, Length: 95, dtype: int64
# and let's see the articles with the biggest attention index
data.sort_values("attention_index", ascending=False)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
d3e36a9aec548acb3aa6e52fb2960b896ed2bb30 | http://www.cnn.com/2018/01/18/politics/kfile-c... | Trump appointee Carl Higbie made racist, sexis... | 2018-01-18 23:22:22.097 | 2018-01-18 23:19:53.000 | 82049 | 143.27 | 2018-01-19T02:15:17.846Z | 21737 | 44033 | 16279 | ... | 29503636.00 | 2018-01-19T02:00:26.000Z | 105 | 1895.00 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
46dbfbb92005da970c527bb3252280bb9a80a59b | http://www.cnn.com/2018/01/03/politics/bannon-... | Bannon: 2016 Trump Tower meeting was 'treasonous' | 2018-01-03 14:28:22.755 | 2018-01-03 14:23:20.000 | 82456 | 239.51 | 2018-01-03T16:14:12.826Z | 30056 | 40458 | 11942 | ... | 29378732.00 | 2018-01-03T14:39:05.000Z | 105 | 213.00 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
836e90a16e36d69e3bae8639d593540609a492bb | https://www.buzzfeed.com/mollyhensleyclancy/am... | Amazon CEO Says He Will Give $33 Million To DA... | 2018-01-12 18:13:09.758 | 2018-01-12 17:41:10.000 | 87223 | 90.60 | 2018-01-13T03:48:08.456Z | 3014 | 79490 | 4719 | ... | 2832832.00 | 2018-01-12T21:23:51.000Z | 147 | 394.00 | 50.00 | 19.00 | 13.00 | 11.00 | 43.00 | 93.00 |
0c2106548a33f2a7484f41b2837ea474f3f3ba28 | http://www.cnn.com/2018/01/13/politics/hawaii-... | Missile threat alert for Hawaii a false alarm | 2018-01-13 18:49:25.317 | 2018-01-13 18:47:27.000 | 127664 | 423.06 | 2018-01-13T20:14:11.190Z | 46730 | 64532 | 16402 | ... | 29450153.00 | 2018-01-13T19:00:05.000Z | 105 | 151.00 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
031074c129960b10fef8e16248004635dec0c4bc | http://money.cnn.com/2018/01/08/media/oprah-go... | Oprah's Golden Globes speech sounds like the s... | 2018-01-08 14:24:28.728 | 2018-01-08 13:43:13.000 | 270420 | 488.99 | 2018-01-08T16:56:07.305Z | 118451 | 136331 | 15638 | ... | 29403330.00 | 2018-01-08T14:30:12.000Z | 105 | nan | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
5449f01e35cad5f97ef4c67fc9e7426f1c3a9c32 | https://www.buzzfeed.com/darrensands/maxine-wa... | Maxine Waters Is Giving A National Address On ... | 2018-01-26 23:24:14.272 | 2018-01-26 23:17:14.000 | 66115 | 48.89 | 2018-01-27T18:41:10.932Z | 12582 | 46117 | 7416 | ... | 2840906.00 | 2018-01-27T03:31:24.000Z | 147 | 506.00 | 49.00 | 19.00 | 13.00 | 11.00 | 43.00 | 92.00 |
b9d5a53dc38085bebff40395f900b84bfa4ee6e4 | https://www.buzzfeed.com/claudiarosenbaum/glee... | "Glee" Star Mark Salling Has Been Found Dead A... | 2018-01-30 18:43:57.665 | 2018-01-30 18:29:47.000 | 71995 | 241.28 | 2018-01-30T19:06:07.944Z | 29088 | 36326 | 6581 | ... | 2872459.00 | 2018-01-30T18:42:41.000Z | 147 | 340.00 | 49.00 | 19.00 | 13.00 | 11.00 | 43.00 | 92.00 |
a89037c0b486187e70b6f78bcf6a1b107f90a5fc | https://www.buzzfeed.com/claudiarosenbaum/yout... | YouTube Cuts Business Ties With Logan Paul Ami... | 2018-01-11 00:22:17.301 | 2018-01-11 00:17:37.000 | 62951 | 277.34 | 2018-01-11T22:04:14.039Z | 9702 | 50765 | 2484 | ... | 2831171.00 | 2018-01-11T01:15:22.000Z | 147 | 435.00 | 48.00 | 19.00 | 14.00 | 11.00 | 44.00 | 92.00 |
325a9e972176c2e8346af4898db4b02db0533467 | http://www.bbc.co.uk/news/world-europe-42851668 | Ikea founder Kamprad dies at 91 | 2018-01-28 10:43:09.711 | 2018-01-28 10:40:44.000 | 46121 | 691.93 | 2018-01-28T14:04:13.609Z | 6628 | 33679 | 5814 | ... | 45634425.00 | 2018-01-28T11:14:27.000Z | 96 | 63.00 | 46.00 | 20.00 | 10.00 | 15.00 | 45.00 | 91.00 |
bf757059d2ede473ff44491ea8936e6fee6bec7b | https://www.buzzfeed.com/shylawatson/cranberri... | Cranberries Lead Singer Dolores O'Riordan Is D... | 2018-01-15 17:53:14.623 | 2018-01-15 17:41:31.000 | 60778 | 515.74 | 2018-01-15T18:15:09.484Z | 9310 | 45640 | 5828 | ... | 2833812.00 | 2018-01-15T17:51:08.000Z | 147 | 15.00 | 48.00 | 19.00 | 13.00 | 11.00 | 43.00 | 91.00 |
3686ef425ec9504f3a69ca163125389e54f01140 | http://www.cnn.com/2018/01/16/politics/cory-bo... | Booker slams DHS secretary's 'amnesia' on Trum... | 2018-01-16 21:13:29.763 | 2018-01-16 21:07:58.000 | 82362 | 291.65 | 2018-01-16T22:48:10.235Z | 15537 | 60149 | 6676 | ... | 29471640.00 | 2018-01-16T22:30:23.000Z | 105 | 224.00 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
12f0856aa2335764f662826ce5b06f426994c0bc | http://www.cnn.com/2018/01/02/politics/donald-... | Trump tweets about nuclear war with North Korea | 2018-01-03 01:25:16.734 | 2018-01-03 01:22:43.000 | 89667 | 409.64 | 2018-01-03T02:09:08.916Z | 34543 | 45550 | 9574 | ... | 29377421.00 | 2018-01-03T01:31:55.000Z | 105 | 168.00 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
ebf2d44938694f691015401e8ce69ff868ecb3bb | http://www.bbc.co.uk/news/health-42736764 | Cancer blood test ‘enormously exciting’ | 2018-01-19 00:28:05.929 | 2018-01-19 00:25:31.000 | 31304 | 79.54 | 2018-01-19T09:20:11.225Z | 1653 | 24318 | 5333 | ... | 45529689.00 | 2018-01-19T09:07:02.000Z | 96 | 537.00 | 44.00 | 20.00 | 11.00 | 15.00 | 46.00 | 90.00 |
848356f1267dee63b461fa50fe499575953ebba9 | https://www.buzzfeed.com/maryanngeorgantopoulo... | Here Are Powerful Quotes From More Than 100 Yo... | 2018-01-24 17:10:23.558 | 2018-01-24 17:07:26.000 | 50168 | 59.18 | 2018-01-24T18:34:15.709Z | 3417 | 39932 | 6819 | ... | 2838754.00 | 2018-01-24T17:15:10.000Z | 147 | 3410.00 | 47.00 | 19.00 | 13.00 | 11.00 | 43.00 | 90.00 |
72c2fa696ffd709cc532fda038a40ef15a188a52 | http://www.bbc.co.uk/news/world-us-canada-4258... | Trump and Republicans to plot 2018 plans in Ca... | 2018-01-06 11:55:09.306 | 2018-01-06 11:52:24.000 | 26287 | 106.85 | 2018-01-06T17:11:14.528Z | 7174 | 15981 | 3132 | ... | 45396536.00 | 2018-01-06T17:02:47.000Z | 96 | 600.00 | 43.00 | 20.00 | 12.00 | 15.00 | 47.00 | 90.00 |
aedb3f16b21bae6a49c98785f50a5a16f636a990 | http://www.cnn.com/2018/01/11/politics/immigra... | Trump decries 'people from shithole countries'... | 2018-01-11 22:07:21.799 | 2018-01-11 22:03:10.000 | 244631 | 458.05 | 2018-01-12T01:18:14.256Z | 102187 | 122190 | 20254 | ... | 29421925.00 | 2018-01-12T01:01:22.000Z | 105 | 56.00 | 50.00 | 20.00 | 5.00 | 15.00 | 40.00 | 90.00 |
8972d20a6774e5b2c625d0471b2c10e62adf7136 | https://www.buzzfeed.com/delaneystrunk/logan-p... | People Are Calling For Logan Paul To Be Banned... | 2018-01-02 09:09:49.158 | 2018-01-02 04:19:39.000 | 54646 | 76.06 | 2018-01-02T16:10:05.453Z | 16626 | 34273 | 3747 | ... | 2826086.00 | 2018-01-02T12:33:09.000Z | 147 | nan | 47.00 | 19.00 | 13.00 | 11.00 | 43.00 | 90.00 |
6d17ce33359a0a32b86aeed9b48dab6b3352a18b | https://www.buzzfeed.com/michaelblackmon/kylie... | Kylie Jenner Just Gave Birth To A Baby Girl | 2018-02-04 20:44:12.166 | 2018-01-12 21:40:07.000 | 71668 | 414.65 | 2018-02-04T21:27:11.843Z | 12953 | 55680 | 3035 | ... | 2899130.00 | 2018-02-04T21:02:53.000Z | 147 | 52.00 | 49.00 | 19.00 | 11.00 | 11.00 | 41.00 | 90.00 |
00b6faac3688806beb622d7c1d02cf6d3522bfe3 | https://www.cnn.com/2018/01/24/politics/muelle... | Robert Mueller: Bombshells takes probe to crit... | 2018-01-24 06:10:10.962 | 2018-01-24 06:04:24.000 | 41852 | 102.31 | 2018-01-24T12:48:09.226Z | 11790 | 24560 | 5502 | ... | 29534935.00 | 2018-01-24T12:30:08.000Z | 105 | 1356.00 | 46.00 | 20.00 | 9.00 | 15.00 | 44.00 | 90.00 |
13de1765a9c9cf6c5c809840fa8d71a0aefe5a4f | https://www.buzzfeed.com/buzzfeednews/womens-m... | Live Updates: Women Are Marching Around The Wo... | 2018-01-20 11:19:18.412 | 2018-01-20 09:46:24.000 | 48669 | 275.43 | 2018-01-20T22:29:15.546Z | 3039 | 43873 | 1757 | ... | 2836672.00 | 2018-01-20T18:42:49.000Z | 147 | nan | 47.00 | 19.00 | 12.00 | 11.00 | 42.00 | 89.00 |
39d81ac1e023ec82e9ad4bfce2655210d77510c2 | https://www.buzzfeed.com/jasonleopold/newly-un... | Newly Uncovered Russian Payments Are A Focus O... | 2018-01-17 14:49:41.390 | 2018-01-17 02:46:23.000 | 32816 | 90.50 | 2018-01-18T03:59:16.208Z | 3943 | 23138 | 5735 | ... | 2834077.00 | 2018-01-17T14:57:02.000Z | 147 | 2531.00 | 44.00 | 19.00 | 15.00 | 11.00 | 45.00 | 89.00 |
97edc2871fb76d5a8f59dbc8e7be6269d238a3d6 | https://www.huffingtonpost.com/entry/trump-cam... | Trump Says U.S. 'Not Going To Look Foolish As ... | 2018-01-06 20:19:10.834 | 2018-01-06 20:12:33.000 | 85751 | 156.57 | 2018-01-07T01:22:07.736Z | 23443 | 57278 | 5030 | ... | 9832747.00 | 2018-01-06T20:30:04.000Z | 215 | 1020.00 | 50.00 | 17.00 | 9.00 | 13.00 | 39.00 | 89.00 |
d5315965530fabd2333bd46a88c9b5275044df1d | https://www.buzzfeed.com/keelyflaherty/game-of... | Cancel 2018, "Game Of Thrones" Isn't Coming Ba... | 2018-01-04 19:34:16.207 | 2018-01-04 19:23:19.000 | 76651 | 1321.58 | 2018-01-04T22:04:13.144Z | 36484 | 36430 | 3737 | ... | 2827447.00 | 2018-01-05T01:11:58.000Z | 147 | nan | 50.00 | 19.00 | 9.00 | 11.00 | 39.00 | 89.00 |
1ba515b11e5f0d45896160491c7c94bf45bbf148 | https://www.buzzfeed.com/paulmcleod/shithole | Trump Complained That People From "Shithole Co... | 2018-01-11 22:19:39.664 | 2018-01-11 22:18:05.000 | 31425 | 121.99 | 2018-01-11T22:52:08.006Z | 10213 | 19719 | 1493 | ... | 2832149.00 | 2018-01-11T22:21:58.000Z | 147 | 241.00 | 44.00 | 19.00 | 15.00 | 11.00 | 45.00 | 89.00 |
dd992c75fb96d54aca0ba862cd4ea159790a3f10 | http://www.cnn.com/2018/01/07/politics/donald-... | Bannon: 'I regret' delay in responding to book | 2018-01-07 17:22:14.673 | 2018-01-07 17:16:19.000 | 41322 | 78.23 | 2018-01-08T12:56:06.616Z | 8158 | 29504 | 3660 | ... | 29400200.00 | 2018-01-07T17:56:30.000Z | 105 | 311.00 | 46.00 | 20.00 | 8.00 | 15.00 | 43.00 | 89.00 |
756c9e900e156eae2902ec087cf8135146181656 | http://www.bbc.co.uk/news/world-us-canada-4265... | Trump 'in Oval Office foul-mouthed outburst ab... | 2018-01-11 23:01:17.664 | 2018-01-11 22:59:29.000 | 51483 | 162.92 | 2018-01-12T02:33:16.046Z | 17113 | 30336 | 4034 | ... | 45448225.00 | 2018-01-12T02:14:19.000Z | 96 | 72.00 | 47.00 | 20.00 | 7.00 | 15.00 | 42.00 | 89.00 |
ce03d6fd535f50bc38a7fbef158ebe83d2af34bb | https://www.huffingtonpost.com/entry/aziz-ansa... | On Aziz Ansari And Sex That Feels Violating Ev... | 2018-01-16 19:49:10.416 | 2018-01-16 19:37:55.000 | 76796 | 48.56 | 2018-01-16T23:28:11.380Z | 23482 | 45683 | 7631 | ... | 9839670.00 | 2018-01-16T21:30:09.000Z | 215 | 2126.00 | 50.00 | 17.00 | 9.00 | 13.00 | 39.00 | 89.00 |
8fe599949b04acba25005bf768607532b04174e7 | https://www.cnn.com/2018/01/28/entertainment/h... | Cher, Snoop, Hillary Clinton audition for 'Fir... | 2018-01-29 03:37:15.642 | 2018-01-29 03:32:10.000 | 57567 | 320.40 | 2018-01-29T04:49:08.716Z | 11307 | 42601 | 3659 | ... | 29555404.00 | 2018-01-29T04:01:20.000Z | 105 | 207.00 | 48.00 | 20.00 | 5.00 | 15.00 | 40.00 | 88.00 |
d62d6fd2c668fc7327433dab7074e7fd5abf30cf | http://www.cnn.com/2018/01/16/us/california-tu... | California parents accused of torture after th... | 2018-01-16 11:40:30.702 | 2018-01-16 11:34:47.000 | 32614 | 55.16 | 2018-01-16T13:04:11.080Z | 10836 | 18654 | 3124 | ... | 29463455.00 | 2018-01-16T12:30:14.000Z | 105 | 759.00 | 44.00 | 20.00 | 9.00 | 15.00 | 44.00 | 88.00 |
eba48f562cee9726b2fc6cf761bfc8f913013849 | https://www.buzzfeed.com/juliareinstein/judge-... | People Are Praising The Judge Who Sentenced La... | 2018-01-24 19:34:14.937 | 2018-01-24 19:05:41.000 | 36384 | 80.06 | 2018-01-25T14:11:11.473Z | 1611 | 33116 | 1657 | ... | 2838830.00 | 2018-01-24T20:13:48.000Z | 147 | 60.00 | 45.00 | 19.00 | 13.00 | 11.00 | 43.00 | 88.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12d1078d1a64054ab19d4e6c9a20efcade7635ef | https://www.washingtonpost.com/sports/colleges... | Towns 30, Harvard tops Brown 86-77, unbeaten i... | 2018-01-28 01:58:18.029 | 2018-01-28 01:45:11.000 | 0 | 0.00 | 2018-01-28T10:16:06.337Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 179.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2d3c497bc72010912866aafeafe3669385e2d90a | https://www.washingtonpost.com/sports/colleges... | Clark, Evans lead No. 9 Cincinnati past Memphi... | 2018-01-28 01:58:17.411 | 2018-01-28 01:44:23.000 | 0 | 0.00 | 2018-01-28T08:14:06.365Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 354.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0ccfadcf8948ecfad361f40ef2a44f21934d00b6 | http://www.dailymail.co.uk/sport/football/arti... | Chievo 0-2 Juventus: Khedira and Higuain punis... | 2018-01-28 01:46:22.698 | 2018-01-28 01:43:54.000 | 1 | 0.09 | 2018-01-28T01:58:06.441Z | 0 | 0 | 1 | ... | nan | NaN | 158 | 397.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c8ad968acb7a237b7006fc9e9d9df82260a0381e | https://www.washingtonpost.com/national/today-... | Today in History | 2018-01-08 05:19:21.965 | 2018-01-08 05:04:28.000 | 0 | 0.00 | 2018-01-08T05:31:06.306Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 805.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
995650c539d34d66087dcca1ea8028ee49dec9ed | https://www.mirror.co.uk/news/world-news/briti... | British man, 21, "critically injured" after fa... | 2018-01-16 06:59:32.175 | 2018-01-14 22:34:15.000 | 0 | 0.00 | 2018-01-16T07:11:07.434Z | 0 | 0 | 0 | ... | nan | NaN | 706 | 149.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
aab3bd1d10aeffcbe9d5698a3aef7294d8752dda | https://www.washingtonpost.com/sports/colleges... | Barham scores 24, Florida A&M beats Hampton 75-71 | 2018-01-28 02:16:22.229 | 2018-01-28 02:05:13.000 | 0 | 0.00 | 2018-01-28T08:33:16.510Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 194.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
da946e19481606d8888cf11718085c0d607d4206 | https://www.washingtonpost.com/sports/colleges... | Walters leads Middle Tennessee past UTEP in 81... | 2018-01-28 02:37:12.663 | 2018-01-28 02:27:09.000 | 0 | 0.00 | 2018-01-28T08:52:07.587Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 172.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
555fedb56d18686a19dcc56cca6d1cbf22388e65 | https://www.washingtonpost.com/lifestyle/style... | Hints From Heloise: The thief in the lunchroom | 2018-01-08 05:07:13.477 | 2018-01-08 05:00:00.000 | 0 | 0.00 | 2018-01-08T06:19:06.445Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 537.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3ac84db6e700bc67b50546171c9041257be08e51 | https://www.washingtonpost.com/lifestyle/style... | Ask Amy: Abuse survivor can’t handle family qu... | 2018-01-08 05:07:13.567 | 2018-01-08 05:00:00.000 | 0 | 0.00 | 2018-01-08T06:19:06.448Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 778.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
32d66b7075a085a130bbb206bb9a2ce95ed2e86f | https://www.washingtonpost.com/lifestyle/style... | Miss Manners: Parsing the feminine honorifics | 2018-01-08 05:07:15.628 | 2018-01-08 05:00:00.000 | 1 | 0.02 | 2018-01-08T07:19:10.971Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 533.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
10c192fcc6f47a813527cadf281785cd55a0434f | https://www.washingtonpost.com/business/why-af... | Why Africa’s Top Oil Producer Is Low on Gasoli... | 2018-01-08 12:31:22.321 | 2018-01-08 05:00:04.000 | 1 | 0.02 | 2018-01-08T13:44:05.462Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 694.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
7b71ec5d35fd22ef408e862e224059312c9a32fa | https://www.mirror.co.uk/tv/tv-news/celebrity-... | Celebrity Big Brother fans complain as Shane J... | 2018-01-16 06:58:54.111 | 2018-01-14 22:37:16.000 | 0 | 0.00 | 2018-01-16T07:10:13.467Z | 0 | 0 | 0 | ... | nan | NaN | 706 | 259.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
27c068a05917d13ab5cd1d6180c142b710d2e3c9 | http://www.dailymail.co.uk/sport/esports/artic... | FIFA 18 new patch fixes kick off glitch | 2018-01-23 15:19:30.946 | 2018-01-23 15:17:24.000 | 0 | 0.00 | 2018-01-24T01:39:22.348Z | 0 | 0 | 0 | ... | nan | NaN | 158 | 309.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
4b29cf2d2bca67071dd4a5dbb76612aa23332a8b | http://www.dailymail.co.uk/sport/football/arti... | Newport striker Padraig Amond - One to Eleven | 2018-01-14 22:37:17.965 | 2018-01-14 22:35:28.000 | 1 | 0.02 | 2018-01-14T23:49:15.377Z | 0 | 0 | 1 | ... | nan | NaN | 158 | 466.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c4d5ad119e86b0ed065496eb5e111bab03276d95 | https://www.washingtonpost.com/sports/colleges... | William & Mary hangs on to beat UNC Wilmington... | 2018-01-28 02:37:10.362 | 2018-01-28 02:29:28.000 | 0 | 0.00 | 2018-01-28T08:52:07.535Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 169.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c4abfd5f6cc6a25a9f43a52ddf611f044af0d69f | https://www.washingtonpost.com/sports/wizards/... | Wall has recurrence of knee pain, held out aga... | 2018-01-28 02:37:10.425 | 2018-01-28 02:28:30.000 | 0 | 0.00 | 2018-01-28T08:52:07.555Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 238.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25b1599e5e93f4373ef3acbebb3b0e58cdbd514b | https://www.washingtonpost.com/sports/colleges... | Brodeur helps Penn hold off Saint Joseph’s 67-... | 2018-01-28 02:37:11.811 | 2018-01-28 02:28:12.000 | 0 | 0.00 | 2018-01-28T08:52:07.572Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 174.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
e8232dbc2cf75aa01ae8cf315a6acda16eda177c | http://www.dailymail.co.uk/tvshowbiz/article-5... | Jennifer Hawkins cuddles up to husband Jake Wall | 2018-01-28 02:28:17.464 | 2018-01-28 02:25:12.000 | 0 | 0.00 | 2018-01-28T08:44:04.294Z | 0 | 0 | 0 | ... | nan | NaN | 158 | 558.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f5e709aa5794d90ea0e4d01b866abc441e68d1be | https://www.washingtonpost.com/sports/colleges... | Old Dominion beats Charlotte 88-66, wins 4th s... | 2018-01-28 02:16:19.338 | 2018-01-28 02:06:10.000 | 0 | 0.00 | 2018-01-28T08:33:16.494Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 176.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b7a120b7f2681899aa1d43d9f3f2a140a4ba47e9 | http://www.dailymail.co.uk/sport/football/arti... | Daniel Farke eyes free shot at glory against C... | 2018-01-14 22:37:21.897 | 2018-01-14 22:35:11.000 | 1 | 0.02 | 2018-01-14T23:49:15.398Z | 0 | 0 | 1 | ... | nan | NaN | 158 | 638.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
51014e4fb3e49c23b3b3a12496675e00eba9dfae | https://www.washingtonpost.com/sports/colleges... | Casimir has 20 points, Iona turns back Manhatt... | 2018-01-28 02:31:06.992 | 2018-01-28 02:20:22.000 | 0 | 0.00 | 2018-01-28T08:46:05.113Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 179.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
ffbfcdd8dc5c84acd9c50fcdd189acdb65a0fcaa | https://www.washingtonpost.com/sports/colleges... | FGCU closes with 8-0 run to beat Jacksonville ... | 2018-01-28 02:31:07.067 | 2018-01-28 02:20:15.000 | 0 | 0.00 | 2018-01-28T08:46:05.132Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 163.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
16d0a3ec7ed8c13d5aa82566cb9b817b61d04532 | https://www.washingtonpost.com/national/rapper... | Rapper Nelly, fan file competing versions of s... | 2018-01-28 02:28:14.924 | 2018-01-28 02:18:22.000 | 0 | 0.00 | 2018-01-28T08:44:04.276Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 603.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
327faa8adec736a7debe104db7d512c3233e884f | https://www.washingtonpost.com/sports/colleges... | Anderson delivers late, Navy tops Lehigh 77-75 | 2018-01-28 02:25:04.583 | 2018-01-28 02:17:15.000 | 0 | 0.00 | 2018-01-28T08:40:03.187Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 216.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2ef956f187a178dcd6b1817a1e56010f8628a057 | https://www.ft.com/content/abdeb618-f180-11e7-... | We are already suffering the damaging effects ... | 2018-01-08 05:04:28.615 | 2018-01-08 05:00:55.000 | 1 | 0.02 | 2018-01-08T06:17:04.843Z | 0 | 0 | 1 | ... | nan | NaN | 1596 | nan | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f4c3da59ef44df98bec31fc15abdb3973b169ca7 | http://www.washingtonpost.com/video/world/aval... | Avalanche engulfs skiers after Japan volcano e... | 2018-01-23 15:31:16.693 | 2018-01-23 15:18:51.000 | 0 | 0.00 | 2018-01-24T02:52:12.755Z | 0 | 0 | 0 | ... | nan | NaN | 191 | nan | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
9c92d18449aff15dfdc7c655d06f5148161428b9 | https://www.washingtonpost.com/sports/colleges... | Louisiana Tech cruises in 89-66 win over South... | 2018-01-28 02:16:15.187 | 2018-01-28 02:10:13.000 | 0 | 0.00 | 2018-01-28T08:33:16.490Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 153.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
aca036363a230a24ee00a0e64c57e963b841991e | https://www.huffingtonpost.com/entry/how-to-cl... | How to clean a child’s bedroom? | 2018-01-14 22:46:20.704 | 2018-01-14 22:35:00.803 | 0 | 0.00 | 2018-01-14T23:59:09.146Z | 0 | 0 | 0 | ... | nan | NaN | 215 | 843.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
4f3f840a6b6dcb799c688c2fc1b851d1b12c2b4c | https://www.washingtonpost.com/sports/colleges... | Whitley lifts Norfolk State over Bethune-Cookm... | 2018-01-28 02:16:15.146 | 2018-01-28 02:06:16.000 | 0 | 0.00 | 2018-01-28T08:33:16.477Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 183.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a20d1203cc1ce26ace6dff72dc271ea8346cd27b | https://www.washingtonpost.com/business/weeken... | Weekend derailment is latest black eye for DC ... | 2018-01-16 22:16:18.353 | 2018-01-16 22:09:27.000 | 0 | 0.00 | 2018-01-17T00:29:15.035Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 859.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
153117 rows × 26 columns
data["score_diff"] = data.promotion_score - data.response_score
# promoted but low response
data.sort_values("score_diff", ascending=False).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
068b7cac1ba2fb9fbbdec4f11a72939a7f50c246 | https://www.buzzfeed.com/nicolenguyen/how-to-s... | How To Save Money, According To People Who Are... | 2018-01-24 23:22:11.826 | 2018-01-11 20:23:03 | 0 | 0.00 | 2018-01-24T23:33:10.999Z | 0 | 0 | 0 | ... | 2018-01-25T03:44:32.000Z | 147 | 1564.00 | 0.00 | 19.00 | 13.00 | 11.00 | 43.00 | 43.00 | 43.00 |
20bcf4e6d32506b0ae7be2a902e442bcbaf7b560 | https://www.buzzfeed.com/stephaniemlee/food-in... | Here’s How The Food Industry Justifies Adverti... | 2018-01-06 15:44:13.488 | 2018-01-05 01:45:54 | 4 | 0.10 | 2018-01-06T15:55:15.366Z | 0 | 0 | 4 | ... | 2018-01-06T18:14:55.000Z | 147 | 1009.00 | 1.00 | 19.00 | 14.00 | 11.00 | 44.00 | 45.00 | 43.00 |
2d6b7abf9b304cea66b7a876e276408374574957 | https://www.buzzfeed.com/jtes/shes-17-and-want... | She’s 17 And Wants To Be A Politician. Her Dad... | 2018-01-16 15:04:30.976 | 2018-01-09 20:55:30 | 3 | 0.27 | 2018-01-16T15:16:07.887Z | 0 | 0 | 3 | ... | 2018-01-20T17:32:05.000Z | 147 | 3933.00 | 1.00 | 19.00 | 13.00 | 11.00 | 43.00 | 44.00 | 42.00 |
f4b77490386f612eb0a3cb4a692a8dd7155c54cb | https://www.buzzfeed.com/thomasfrank/secret-mo... | Secret Money: How Trump Made Millions Selling ... | 2018-01-12 14:37:21.997 | 2018-01-10 22:27:30 | 5 | 0.50 | 2018-01-12T14:48:13.783Z | 0 | 0 | 5 | ... | 2018-01-12T15:55:07.000Z | 147 | 3863.00 | 1.00 | 19.00 | 13.00 | 11.00 | 43.00 | 44.00 | 42.00 |
968d49aee54385668d5cbb27da9406781f04d699 | https://www.buzzfeed.com/remysmidt/mila-emma-k... | Here’s What It’s Like To Have A Toddler Who Is... | 2018-01-25 13:46:20.267 | 2018-01-23 21:56:06 | 4 | 0.39 | 2018-01-25T13:57:24.612Z | 0 | 2 | 2 | ... | 2018-01-25T14:23:03.000Z | 147 | 2551.00 | 1.00 | 19.00 | 13.00 | 11.00 | 43.00 | 44.00 | 42.00 |
01bc6bc560d98aaee1d28dcbc8cd713dbd5b9869 | https://www.buzzfeed.com/holgerroonemaa/he-bui... | He Built An Empire From Angry Birds. Now He Wa... | 2018-01-17 16:01:28.336 | 2018-01-12 22:27:34 | 4 | 0.37 | 2018-01-17T16:13:13.090Z | 0 | 2 | 2 | ... | 2018-01-21T15:03:14.000Z | 147 | 1411.00 | 1.00 | 19.00 | 13.00 | 11.00 | 43.00 | 44.00 | 42.00 |
d2645225548e8c3f460c983f79d2b5a875e4d295 | https://www.buzzfeed.com/gabrielsanchez/the-ho... | 23 Pictures That Capture The Horrors Of The Ho... | 2018-01-27 16:01:15.901 | 2018-01-25 17:01:28 | 2 | 0.18 | 2018-01-27T16:13:08.100Z | 0 | 0 | 2 | ... | 2018-01-27T21:42:59.000Z | 147 | nan | 1.00 | 19.00 | 12.00 | 11.00 | 42.00 | 43.00 | 41.00 |
2e3e87dfe855f9cc6268036577e2fdf743a22c40 | https://www.buzzfeed.com/peteraldhous/trump-tw... | How Trump’s Tweets Shaped A Year In Politics | 2018-01-23 15:29:12.177 | 2018-01-23 11:50:57 | 45 | 0.15 | 2018-01-23T21:46:14.335Z | 5 | 17 | 23 | ... | 2018-01-23T19:44:39.000Z | 147 | 1017.00 | 6.00 | 19.00 | 14.00 | 11.00 | 44.00 | 50.00 | 38.00 |
6a35a172d4c5c2a9dda4b89cd53f1c1787b7246c | https://www.buzzfeed.com/paulmcleod/the-fate-o... | The Fate Of DACA Recipients May Come Down To F... | 2018-01-10 20:40:17.887 | 2018-01-10 20:37:02 | 30 | 0.11 | 2018-01-10T23:54:09.386Z | 2 | 2 | 26 | ... | 2018-01-13T19:21:54.000Z | 147 | 1046.00 | 5.00 | 19.00 | 13.00 | 11.00 | 43.00 | 48.00 | 38.00 |
a0267241a3ff57f538b29cf4dca6e2b3aa3cf08b | https://www.buzzfeed.com/johnhudson/trump-lets... | Trump Lets The Iran Deal Live — For Another Th... | 2018-01-12 19:22:25.497 | 2018-01-12 19:21:22 | 40 | 0.25 | 2018-01-12T21:35:09.636Z | 11 | 12 | 17 | ... | 2018-01-12T20:42:11.000Z | 147 | 411.00 | 6.00 | 19.00 | 13.00 | 11.00 | 43.00 | 49.00 | 37.00 |
22ebd00bcf2cd95645cb1d16347e22b5021dfb93 | https://www.buzzfeed.com/juliareinstein/i-went... | I Went To This Year's Puppy Bowl. Here's Every... | 2018-02-04 14:37:12.609 | 2018-01-31 22:00:52 | 1 | 0.10 | 2018-02-04T14:48:07.097Z | 0 | 0 | 1 | ... | 2018-02-04T22:19:02.000Z | 147 | 642.00 | 0.00 | 19.00 | 7.00 | 11.00 | 37.00 | 37.00 | 37.00 |
93ca7c4fe189518498d77c800ebeae14a093aef2 | https://www.buzzfeed.com/paulmcleod/the-house-... | The House Just Voted To Keep The Government Op... | 2018-01-19 00:40:22.163 | 2018-01-19 00:39:03 | 62 | 0.20 | 2018-01-19T02:54:07.176Z | 11 | 25 | 26 | ... | 2018-01-20T05:08:03.000Z | 147 | 574.00 | 8.00 | 19.00 | 15.00 | 11.00 | 45.00 | 53.00 | 37.00 |
652d2f4057c2019354903a162b55b5ae1e1b4203 | https://www.buzzfeed.com/melissasegura/will-ch... | Will Chicago Prosecutors Let Guevara’s Defenda... | 2018-01-22 14:04:24.260 | 2018-01-19 22:38:05 | 3 | 0.28 | 2018-01-22T14:16:06.968Z | 0 | 0 | 3 | ... | 2018-01-22T14:22:51.000Z | 147 | 5466.00 | 1.00 | 19.00 | 7.00 | 11.00 | 37.00 | 38.00 | 36.00 |
8eb8e48cfacb1432585819e4836b6ea7134ef312 | https://www.buzzfeed.com/zahrahirji/superfund-... | A Government Watchdog Is Investigating Whether... | 2018-01-05 15:29:12.897 | 2018-01-05 15:10:35 | 46 | 0.17 | 2018-01-05T19:43:10.925Z | 5 | 26 | 15 | ... | 2018-01-05T18:44:36.000Z | 147 | 355.00 | 7.00 | 19.00 | 13.00 | 11.00 | 43.00 | 50.00 | 36.00 |
98726f5b62c51f14804408a84d7540072418ceff | https://www.buzzfeed.com/leticiamiranda/two-ti... | Two Tinder Security Flaws Mean Strangers Can S... | 2018-01-23 23:13:26.707 | 2018-01-23 19:18:23 | 32 | 0.32 | 2018-01-24T03:28:21.263Z | 10 | 15 | 7 | ... | 2018-01-24T02:32:09.000Z | 147 | 516.00 | 5.00 | 19.00 | 11.00 | 11.00 | 41.00 | 46.00 | 36.00 |
c247bc552abec2dc3b5843661d0eecc380311ba0 | https://www.theguardian.com/us-news/2018/jan/1... | 'Unkind, divisive, elitist': international out... | 2018-01-12 04:01:12.095 | 2018-01-12 03:58:54 | 5 | 37.60 | 2018-01-12T14:57:03.228Z | 1 | 1 | 3 | ... | 2018-01-12T14:40:00.000Z | 142 | 688.00 | 1.00 | 19.00 | 4.00 | 13.00 | 36.00 | 37.00 | 35.00 |
fdf1427a0e1cde9a0c5c156df566ae6eaed9d658 | https://www.buzzfeed.com/lissandravilla/house-... | The House Just Released A Bill To Overhaul The... | 2018-01-18 18:37:19.473 | 2018-01-18 17:05:56 | 57 | 0.49 | 2018-01-18T18:48:22.993Z | 2 | 13 | 42 | ... | 2018-01-18T19:02:41.000Z | 147 | 552.00 | 7.00 | 19.00 | 12.00 | 11.00 | 42.00 | 49.00 | 35.00 |
6d936b4b490c7dd59d30e8576320d2f00e811ff7 | https://www.buzzfeed.com/emmaloop/house-invest... | House Investigators Vow To Get Answers Out Of ... | 2018-01-17 22:01:21.002 | 2018-01-17 21:58:15 | 95 | 0.61 | 2018-01-18T01:15:10.445Z | 13 | 68 | 14 | ... | 2018-01-18T00:07:52.000Z | 147 | 806.00 | 10.00 | 19.00 | 14.00 | 11.00 | 44.00 | 54.00 | 34.00 |
f5f921383c99bf3d1428e6a181d2101c494f4ebc | https://www.buzzfeed.com/krystieyandoli/bryan-... | Bryan Singer Leaves FX's "Legion" As He Faces ... | 2018-01-05 22:54:12.914 | 2018-01-05 22:29:15 | 108 | 0.30 | 2018-01-06T05:08:12.670Z | 27 | 53 | 28 | ... | 2018-01-06T03:42:34.000Z | 147 | 272.00 | 10.00 | 19.00 | 13.00 | 11.00 | 43.00 | 53.00 | 33.00 |
7ed734430d65005ef540ff22b35de52f58880d3d | https://www.buzzfeed.com/susancheng/talent-age... | Will Time’s Up Help Talent Agencies Rebuild Tr... | 2018-01-06 15:13:10.239 | 2018-01-06 15:10:15 | 96 | 0.28 | 2018-01-06T21:27:16.662Z | 6 | 60 | 30 | ... | 2018-01-06T20:14:11.000Z | 147 | 1102.00 | 10.00 | 19.00 | 13.00 | 11.00 | 43.00 | 53.00 | 33.00 |
cda1e43d83e0efe0dd6870459b1e0ca1f36a681d | https://www.buzzfeed.com/venessawong/how-much-... | Here's What A $75K Salary Gets You In Six Diff... | 2018-01-25 17:01:17.219 | 2018-01-24 21:11:36 | 103 | 0.69 | 2018-01-25T18:13:17.201Z | 13 | 53 | 37 | ... | 2018-01-25T17:26:09.000Z | 147 | 4270.00 | 10.00 | 19.00 | 13.00 | 11.00 | 43.00 | 53.00 | 33.00 |
7de66844bf72e95af45dc613b3e59083656e77cb | https://www.buzzfeed.com/rosebuchanan/donald-t... | Donald Trump Has Tweeted About Iranians "Final... | 2018-01-02 14:25:19.415 | 2018-01-02 13:16:17 | 129 | 0.30 | 2018-01-02T16:38:08.016Z | 33 | 55 | 41 | ... | 2018-01-02T16:05:10.000Z | 147 | 644.00 | 11.00 | 19.00 | 14.00 | 11.00 | 44.00 | 55.00 | 33.00 |
f3eab5a109a2a8f3a7d601a58aaab2395e55090d | https://www.buzzfeed.com/buzzfeednews/trump-so... | Trump’s First State Of The Union Address Is To... | 2018-01-30 12:01:18.015 | 2018-01-30 11:45:40 | 144 | 0.48 | 2018-01-30T17:16:21.822Z | 74 | 49 | 21 | ... | 2018-01-30T16:28:33.000Z | 147 | 786.00 | 12.00 | 19.00 | 15.00 | 11.00 | 45.00 | 57.00 | 33.00 |
776e0e6559937a4fb0f3db20935adf7bfed033b7 | https://www.buzzfeed.com/tariniparti/the-stars... | The Stars Of The Trump Show, Season 2 | 2018-01-31 15:04:26.400 | 2018-01-31 01:44:52 | 96 | 0.46 | 2018-01-31T19:19:15.015Z | 9 | 41 | 46 | ... | 2018-01-31T18:50:59.000Z | 147 | 1508.00 | 10.00 | 19.00 | 12.00 | 11.00 | 42.00 | 52.00 | 32.00 |
39927be4665f707b073fecee5a28dd887a52772d | https://www.buzzfeed.com/paulmcleod/senate-gro... | Senate Group Reaches Tentative Deal To Protect... | 2018-01-12 00:49:18.168 | 2018-01-12 00:48:10 | 170 | 0.67 | 2018-01-12T07:04:13.216Z | 68 | 83 | 19 | ... | 2018-01-12T06:22:48.000Z | 147 | 643.00 | 13.00 | 19.00 | 15.00 | 11.00 | 45.00 | 58.00 | 32.00 |
25 rows × 27 columns
# high response but not promoted
data.sort_values("score_diff", ascending=True).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
803b607134c251c7f072a2971a268e9d7df235f1 | https://www.huffingtonpost.com/entry/detroit-h... | Black Beekeepers Are Transforming Detroit’s Va... | 2018-01-30 22:04:27.140 | 2018-01-30 21:55:24.297 | 99088 | 191.45 | 2018-01-31T22:05:12.382Z | 5852 | 81139 | 12097 | ... | NaN | 215 | 583.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
a393b7e68704544e101afc19f12d7fccf1e4ab29 | https://www.nytimes.com/2018/01/06/us/politics... | Trump Defends His Mental Capacity, Calling Him... | 2018-01-06 13:07:03.430 | 2018-01-06 13:05:21.000 | 328205 | 1162.29 | 2018-01-07T04:03:08.078Z | 85369 | 219767 | 23069 | ... | NaN | 120 | 176.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
771937ba1ca44a5bab7f2bf908ed2a617a461b5f | http://www.cnn.com/2015/08/21/europe/france-tr... | 2 U.S. service members overpower attacker on t... | 2018-01-08 22:04:37.055 | 2018-01-08 22:01:45.000 | 286779 | 0.03 | 2018-01-08T23:17:08.269Z | 16743 | 251594 | 18442 | ... | NaN | 105 | 746.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
bdcb4edf76aa052524088d63507d58de12c6f08c | http://www.foxnews.com/entertainment/2018/01/2... | Joy Villa turns heads with pro-life outfit at ... | 2018-01-28 21:24:14.205 | 2018-01-28 21:15:38.000 | 145813 | 341.15 | 2018-01-29T23:27:11.734Z | 10493 | 127537 | 7783 | ... | NaN | 285 | 361.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
fdeb3ce25505712faa29accfaa210a61d7dc208a | https://www.rt.com/on-air/415184-orthodox-chri... | Orthodox Christmas service in Moscow | 2018-01-06 20:09:34.285 | 2018-01-06 20:09:34.285 | 180211 | 2985.36 | 2018-01-07T00:22:14.681Z | 24841 | 11360 | 144010 | ... | NaN | 365 | nan | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
47d6d90f0ccf562ad2636799d240cea22854b5d8 | http://www.breitbart.com/big-government/2018/0... | Pentagon: Troops Will Not Be Paid if Governmen... | 2018-01-17 18:09:33.475 | 2018-01-17 07:43:57.000 | 82550 | 50.97 | 2018-01-20T15:48:07.846Z | 38299 | 33144 | 11107 | ... | NaN | 994 | 896.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
0f7d6665afef43e9f21cf944435ae79bacce5428 | http://www.dailymail.co.uk/news/article-533244... | Police chief demands end of soft treatment for... | 2018-01-30 23:58:23.681 | 2018-01-30 23:55:30.000 | 126418 | 1322.37 | 2018-01-31T12:05:10.269Z | 51315 | 42165 | 32938 | ... | NaN | 158 | 1235.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
4ec1a6cfd6e184ef18478392685313d0c52fe46e | http://www.foxnews.com/entertainment/2018/01/1... | ‘OUT!’ Trump orders CNN star Jim Acosta to lea... | 2018-01-16 20:44:11.564 | 2018-01-16 20:41:23.000 | 81022 | 105.32 | 2018-01-17T02:56:06.018Z | 18036 | 55513 | 7473 | ... | NaN | 285 | 502.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
10ea42828a1a94dcfeecd8238529bd4c43370c98 | https://www.cnn.com/2018/01/24/us/rachael-denh... | Read Rachael Denhollander's full victim impact... | 2018-01-24 19:10:15.381 | 2018-01-24 19:05:22.000 | 89632 | 135.76 | 2018-01-26T15:09:07.110Z | 11543 | 70530 | 7559 | ... | NaN | 105 | 5852.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
85e73241452d2301d06e92c0a9dada5937a0dcf2 | http://www.cnn.com/2015/08/22/europe/france-tr... | The men who averted a massacre aboard a French... | 2018-01-08 22:04:38.355 | 2018-01-08 22:00:15.000 | 70815 | 0.02 | 2018-02-05T23:35:15.141Z | 5609 | 56048 | 9158 | ... | NaN | 105 | 1104.00 | 49.00 | 0.00 | 0.00 | 0.00 | 0.00 | 49.00 | -49.00 |
9d5b90f00e28072211f83e233767d503b694c7e7 | http://www.cnn.com/2018/01/15/europe/garbage-c... | Garbage collectors open library with abandoned... | 2018-01-16 01:52:17.539 | 2018-01-16 01:47:26.000 | 67487 | 40.98 | 2018-01-16T20:29:19.466Z | 2594 | 58094 | 6799 | ... | NaN | 105 | 376.00 | 49.00 | 0.00 | 0.00 | 0.00 | 0.00 | 49.00 | -49.00 |
8396eb9804a17111232bd6065cf74e0106bf0f16 | http://www.foxnews.com/politics/2018/01/25/joh... | John Kerry reportedly coaches Palestinians not... | 2018-01-25 14:54:17.351 | 2018-01-25 14:41:19.000 | 71119 | 148.48 | 2018-01-25T16:53:23.350Z | 26574 | 35451 | 9094 | ... | NaN | 285 | 576.00 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
ebd68d73a5584df555ea06cc93d6c75c7b7e20d6 | http://www.foxnews.com/entertainment/2018/01/0... | CNN revels in pot smoke during New Year's Eve ... | 2018-01-01 17:39:13.048 | 2018-01-01 17:30:33.000 | 145395 | 207.93 | 2018-01-03T14:01:08.231Z | 49007 | 83848 | 12540 | ... | NaN | 285 | 378.00 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
38c1f7857d079e8c85349de720fc405e884bfbc6 | http://www.independent.co.uk/arts-entertainmen... | Millennials watching Friends for first time on... | 2018-01-11 21:34:22.306 | 2018-01-11 21:20:47.000 | 66764 | 30.12 | 2018-01-12T00:48:06.381Z | 28096 | 34561 | 4107 | ... | NaN | 386 | 298.00 | 49.00 | 0.00 | 1.00 | 0.00 | 1.00 | 50.00 | -48.00 |
777440766ccbb1859bcda2428f28c48706a1655b | http://www.foxnews.com/opinion/2018/01/22/nfl-... | NFL rejects veterans group's ad urging people ... | 2018-01-23 01:49:20.399 | 2018-01-23 01:46:23.000 | 456321 | 521.90 | 2018-01-24T13:31:03.385Z | 120206 | 272865 | 63250 | ... | NaN | 285 | 227.00 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
becc40665f8af441b01017ff2ed59146473afe54 | http://www.cnn.com/2015/08/22/europe/france-tr... | French train suspect carried two guns, lots of... | 2018-01-08 22:04:29.818 | 2018-01-08 22:00:00.000 | 58310 | 0.00 | 2018-01-10T22:32:10.633Z | 5850 | 44025 | 8435 | ... | NaN | 105 | 1018.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
2547709a1be4a83d35a3def8acd2f5f468973e76 | https://www.nytimes.com/2018/01/18/opinion/sod... | The Case for the Health Taxes | 2018-01-18 13:19:08.621 | 2018-01-18 13:17:01.000 | 451200 | 2088.07 | 2018-01-19T16:03:03.902Z | 360 | 450591 | 249 | ... | NaN | 120 | 679.00 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
80f993f862ddfd00733b670a0ee7fad69a8c937a | http://www.mirror.co.uk/tv/tv-news/boy-no-brai... | Boy with 'no brain' stuns doctors as he learns... | 2018-01-09 17:10:21.222 | 2018-01-09 16:54:00.000 | 52371 | 0.11 | 2018-01-09T22:25:11.095Z | 4150 | 42875 | 5346 | ... | NaN | 706 | 545.00 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
df2d396004ec22906597fac848babc4718e73773 | https://www.washingtonpost.com/news/get-there/... | Stop charging me to attend your celebrations —... | 2018-01-30 22:22:21.683 | 2018-01-30 22:12:06.000 | 76260 | 47.84 | 2018-02-01T18:23:15.448Z | 35866 | 34514 | 5880 | ... | NaN | 191 | 824.00 | 50.00 | 0.00 | 3.00 | 0.00 | 3.00 | 53.00 | -47.00 |
e5fc4033e043254b3b0d5f1cc21dc39fc060cc7a | http://www.foxnews.com/entertainment/2018/01/3... | Liberal author Jonathan Tasini celebrates fata... | 2018-01-31 19:39:13.991 | 2018-01-31 19:26:40.000 | 61580 | 145.70 | 2018-01-31T22:22:18.214Z | 22205 | 29730 | 9645 | ... | NaN | 285 | 518.00 | 48.00 | 0.00 | 1.00 | 0.00 | 1.00 | 49.00 | -47.00 |
ddb6d76257dbfdec72135dfa062d8837382c7341 | https://www.washingtonpost.com/news/post-polit... | Names of campaign donors to be flashed during ... | 2018-01-29 23:58:11.450 | 2018-01-29 23:52:53.000 | 126484 | 161.64 | 2018-01-30T10:02:16.802Z | 55502 | 57529 | 13453 | ... | NaN | 191 | 212.00 | 50.00 | 0.00 | 3.00 | 0.00 | 3.00 | 53.00 | -47.00 |
a8491b41e427001db8c871b55d7245c9cdd297e2 | https://www.buzzfeed.com/kristinharris/tweets-... | 22 Tweets For Everyone Still Crying After Last... | 2018-01-24 19:34:12.007 | 2018-01-24 19:26:41.000 | 94446 | 119.11 | 2018-01-25T02:58:18.891Z | 36452 | 45849 | 12145 | ... | NaN | 147 | nan | 50.00 | 0.00 | 3.00 | 0.00 | 3.00 | 53.00 | -47.00 |
aeb247d60a50c77e54b0e0ca064d46dfcc6214d3 | https://www.washingtonpost.com/opinions/no-mor... | No more excuses. Puerto Rico needs help. | 2018-01-06 00:43:23.684 | 2018-01-06 00:32:49.000 | 45808 | 43.04 | 2018-01-07T07:09:12.325Z | 3823 | 31304 | 10681 | ... | NaN | 191 | 551.00 | 46.00 | 0.00 | 0.00 | 0.00 | 0.00 | 46.00 | -46.00 |
88f0ec3f4f6cfdb08ecd766c8f4a1c636dd41919 | http://www.cnn.com/travel/article/switzerland-... | Switzerland bans cruelty towards lobster | 2018-01-12 15:16:34.298 | 2018-01-12 15:09:50.000 | 51052 | 32.85 | 2018-01-13T03:32:17.625Z | 11257 | 34322 | 5473 | ... | NaN | 105 | nan | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
5d5665bbfa2bf806355d04cb7a8b148981f99416 | https://www.washingtonpost.com/news/politics/w... | Trump: ‘Steve Bannon has nothing to do with me... | 2018-01-03 18:43:12.552 | 2018-01-03 18:34:18.000 | 51178 | 337.79 | 2018-01-04T18:04:15.584Z | 16207 | 28313 | 6658 | ... | NaN | 191 | 127.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
25 rows × 27 columns
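Judging from the displayed (rounded) values, `promotion_score` appears to be the sum of `lead_score`, `front_score` and `facebook_promotion_score`, and `score_diff` appears to be `promotion_score` minus `response_score`. On that reading, the most negative `score_diff` values mark articles with a strong response and little promotion. A minimal check, using two rows copied from the output tables above (the frame `sample` and its index labels are only for illustration):

```python
import pandas as pd

# Two rows copied from the displayed output; the values are rounded there.
# The index labels are hypothetical shorthand, not real article ids.
sample = pd.DataFrame(
    {
        "response_score": [50.0, 10.0],
        "lead_score": [0.0, 19.0],
        "front_score": [0.0, 14.0],
        "facebook_promotion_score": [0.0, 11.0],
    },
    index=["huffpost_beekeepers", "buzzfeed_investigators"],
)

# Assumed relationships, inferred from the tables rather than from the scoring code.
promotion_cols = ["lead_score", "front_score", "facebook_promotion_score"]
sample["promotion_score"] = sample[promotion_cols].sum(axis=1)
sample["attention_index"] = sample["promotion_score"] + sample["response_score"]
sample["score_diff"] = sample["promotion_score"] - sample["response_score"]

print(sample[["promotion_score", "attention_index", "score_diff"]])
```

Both derived rows agree with the `promotion_score`, `attention_index` and `score_diff` columns shown for those articles.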
Write that data to a file. Note that the scores here are provisional for two reasons:
data.to_csv("articles_with_provisional_scores_" + date_filename + ".csv")
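The attention index of an article is composed of four components: a lead score (up to 20 points), a front-page score (up to 15 points), a Facebook promotion score (up to 15 points), and a response score (up to 50 points).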
Or, in other words:
\begin{align}
attentionIndex_a &= leadScore_a + frontScore_a + facebookPromotionScore_a + responseScore_a \\
leadScore_a &= 20 \cdot \left(\frac{\min(minsAsLead_a, 60)}{alexaRank_a}\right) \cdot \left(\frac{\min(alexaRank)}{60}\right) \\
frontScore_a &= 15 \cdot \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \cdot \left(\frac{\min(alexaRank \cdot numArticlesOnFront)}{1440}\right) \\
facebookPromotionScore_a &= \begin{cases} 0 & \text{if not shared on brand page} \\ 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} & \text{otherwise} \end{cases} \\
responseScore_a &= \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a, limit) + \operatorname{median}(engagements)) - \log(1 + \operatorname{median}(engagements))}{\log(limit + \operatorname{median}(engagements)) - \log(1 + \operatorname{median}(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}
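The formula can be sketched per article in plain Python. This is a reading of the equations as written, not necessarily the code that produced the score columns in this notebook. The function names, the argument names (`min_alexa_rank`, `min_rank_times_articles`, `min_likes`, `max_likes`, `median_engagements`) and the value chosen for `limit` are assumptions made for illustration. The dataset-wide minima, maxima and median are passed in as arguments rather than computed here.

```python
import math
from statistics import median


def lead_score(mins_as_lead, alexa_rank, min_alexa_rank):
    """Up to 20 points: minutes as lead (capped at 60), scaled by Alexa rank."""
    return 20 * (min(mins_as_lead, 60) / alexa_rank) * (min_alexa_rank / 60)


def front_score(mins_on_front, alexa_rank, num_articles_on_front,
                min_rank_times_articles):
    """Up to 15 points: minutes on the front page (capped at 1440)."""
    weight = alexa_rank * num_articles_on_front
    return 15 * (min(mins_on_front, 1440) / weight) * (min_rank_times_articles / 1440)


def facebook_promotion_score(shared, likes, min_likes, max_likes):
    """Up to 15 points: log-scaled brand page likes, 0 if not shared."""
    if not shared:
        return 0.0
    span = math.log(max_likes) - math.log(min_likes)
    return 15 * (math.log(likes) - math.log(min_likes)) / span


def response_score(engagements, median_engagements, limit):
    """Up to 50 points: log-scaled engagements, capped at `limit`."""
    if engagements == 0:
        return 0.0
    base = math.log(1 + median_engagements)
    top = math.log(limit + median_engagements)
    capped = math.log(min(engagements, limit) + median_engagements)
    return 50 * (capped - base) / (top - base)


# Toy values, not taken from the dataset.
toy_engagements = [1000, 0, 10]
med = median(toy_engagements)
limit = 1000  # hypothetical cap; the value used by the notebook is not shown here

# An article that maxes out every component, so the total is close to 100.
best = (
    lead_score(60, alexa_rank=100, min_alexa_rank=100)
    + front_score(1440, alexa_rank=100, num_articles_on_front=10,
                  min_rank_times_articles=1000)
    + facebook_promotion_score(True, likes=1_000_000,
                               min_likes=10_000, max_likes=1_000_000)
    + response_score(1000, med, limit)
)

# An article with no promotion and no engagements scores 0.
worst = (
    lead_score(0, alexa_rank=100, min_alexa_rank=100)
    + front_score(0, alexa_rank=100, num_articles_on_front=10,
                  min_rank_times_articles=1000)
    + facebook_promotion_score(False, likes=0, min_likes=10_000, max_likes=1_000_000)
    + response_score(0, med, limit)
)
print(best, worst)
```

On a DataFrame with the columns used in this notebook, the same arithmetic could be applied column-wise, with the minima, maxima and median taken over the whole frame.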