%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = pd.read_csv("articles_2017-11-01_2017-11-30.csv", index_col="id", \
parse_dates=["published", "discovered"])
data.head()
id | url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4b92d2afc3eb0becb58ee0f8a866e8bb3e0c6c3c | https://www.nytimes.com/2017/11/01/travel/new-... | How I Rolled on the Crescent: New York to New ... | 2017-11-01 10:39:10.125 | 2017-11-01 | 198 | 0.590325 | 2017-11-01T23:02:04.537Z | 40 | 103 | 55 | New York Times | nytimes_com | 0 | 2613 | 118.0 | False | NaN | NaN | 120 | NaN |
eedd25b64e2cf37cf2198783311a1a823e26965a | https://www.nytimes.com/2017/11/01/world/austr... | Australia Bans Climbing on Uluru, a Popular Si... | 2017-11-01 10:54:17.225 | 2017-11-01 | 592 | 1.197048 | 2017-11-01T13:07:04.833Z | 98 | 302 | 192 | New York Times | nytimes_com | 0 | 6475 | 120.0 | True | 14837915.0 | 2017-11-03T06:25:01.000Z | 120 | NaN |
68cf97046a5705f8c6826ab1fc3ce6e98e845a0d | https://www.thetimes.co.uk/article/in-pictures... | In pictures: terror in New York | 2017-11-01 09:59:07.929 | 2017-11-01 | 2 | 0.016398 | 2017-11-01T17:17:03.982Z | 0 | 1 | 1 | The Times | thetimes_co_uk | 0 | 840 | 275.0 | False | NaN | NaN | 6435 | NaN |
ff04a763b2f01746c4b4718bf2ad150468918bb0 | https://www.nytimes.com/2017/11/01/upshot/why-... | Why Advertising Is a Poor Choice to Tackle the... | 2017-11-01 10:03:09.094 | 2017-11-01 | 537 | 1.986755 | 2017-11-01T10:14:11.018Z | 112 | 291 | 134 | New York Times | nytimes_com | 0 | 5825 | 119.0 | True | 14830035.0 | 2017-11-01T10:00:34.000Z | 120 | NaN |
1bf0cb152322d6d4ec6f0910f7bbaac27976fe6e | https://www.nytimes.com/2017/11/01/podcasts/th... | Listen to ‘The Daily’: Mueller’s Strategy, and... | 2017-11-01 11:14:11.268 | 2017-11-01 | 164 | 1.393823 | 2017-11-01T12:26:12.465Z | 30 | 110 | 24 | New York Times | nytimes_com | 0 | 1375 | 121.0 | True | 14830043.0 | 2017-11-01T11:23:06.000Z | 120 | NaN |
The response score is a number between 0 and 50 that indicates the level of response to an article.
Perhaps in the future we may choose to include other factors, but for now we just include engagements on Facebook. The maximum score of 50 should be achieved by an article that does really well compared with others.
pd.options.display.float_format = '{:.2f}'.format
data.fb_engagements.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count   155376.00
mean      1117.87
std       8083.39
min          0.00
50%         24.00
75%        233.00
90%       1454.00
95%       4059.25
99%      21166.75
99.5%    33982.38
99.9%    93932.13
max     814679.00
Name: fb_engagements, dtype: float64
There are no articles with more than 1 million engagements this month.
data[data.fb_engagements > 1000000]
id | url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | publisher_name | publisher_id | mins_as_lead | mins_on_front | num_articles_on_front | fb_brand_page | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
data.fb_engagements.mode()
0 0 dtype: int64
Going back to the engagement counts, we see the mean is 1,118, the mode is zero, the median is 24, the 90th percentile is 1,454, the 99th percentile is 21,167 and the 99.5th percentile is 33,982. The standard deviation is 8,083, more than seven times the mean, so this is a heavily right-skewed distribution rather than a normal one.
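To illustrate why these summary statistics point to a long tail, here is a minimal sketch on synthetic data. The log-normal parameters are assumptions chosen only for illustration; they are not fitted to the article data.

```python
import numpy as np

# Synthetic long-tailed sample (hypothetical parameters, not the real data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=3.0, sigma=2.0, size=100_000)

raw_mean = sample.mean()
raw_median = np.median(sample)
raw_std = sample.std()

# After a log transform the bulk of the data becomes roughly symmetric.
logged = np.log(sample)
log_mean = logged.mean()
log_median = np.median(logged)

print(f"raw:    mean={raw_mean:,.1f} median={raw_median:,.1f} std={raw_std:,.1f}")
print(f"logged: mean={log_mean:.2f} median={log_median:.2f}")
```

On data like this the raw mean sits far above the median and the standard deviation exceeds the mean, while the logged values have a mean and median that nearly coincide.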
Key publishers stats
data.groupby("publisher_id").agg({'url': 'count', 'fb_engagements': ['sum', 'median', 'mean']})
publisher_id | url count | fb_engagements sum | fb_engagements median | fb_engagements mean |
---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 46 | 115733 | 1570.00 | 2515.93 |
bbc_co_uk | 12497 | 9327679 | 32.00 | 746.39 |
breitbart_com | 2702 | 11917358 | 224.50 | 4410.57 |
brexitcentral_com | 52 | 30476 | 181.50 | 586.08 |
buzzfeed_com | 1609 | 4482917 | 159.00 | 2786.15 |
cnn_com | 3425 | 17299480 | 550.00 | 5050.94 |
dailymail_co_uk | 23842 | 17151377 | 28.00 | 719.38 |
economist_com | 581 | 190826 | 30.00 | 328.44 |
evolvepolitics_com | 61 | 153821 | 1413.00 | 2521.66 |
foxnews_com | 5323 | 15572831 | 74.00 | 2925.57 |
ft_com | 4997 | 344064 | 4.00 | 68.85 |
huffingtonpost_com | 9772 | 13145495 | 11.00 | 1345.22 |
independent_co_uk | 6567 | 8859577 | 36.00 | 1349.11 |
indy100_com | 545 | 765012 | 107.00 | 1403.69 |
lemonde_fr | 3991 | 2567387 | 76.00 | 643.29 |
libdemvoice_org | 170 | 2676 | 7.00 | 15.74 |
mirror_co_uk | 10121 | 6778308 | 46.00 | 669.73 |
nbcnews_com | 2003 | 6196290 | 478.00 | 3093.50 |
newstatesman_com | 526 | 83158 | 22.00 | 158.10 |
npr_org | 2102 | 6461653 | 160.00 | 3074.05 |
nytimes_com | 4830 | 18352173 | 214.00 | 3799.62 |
order-order_com | 267 | 95238 | 143.00 | 356.70 |
propublica_org | 57 | 68028 | 365.00 | 1193.47 |
reuters_com | 5952 | 1866563 | 23.00 | 313.60 |
rt_com | 2691 | 2228324 | 207.00 | 828.07 |
skwawkbox_org | 117 | 57044 | 284.00 | 487.56 |
telegraph_co_uk | 7571 | 2685206 | 16.00 | 354.67 |
thecanary_co | 251 | 363674 | 849.00 | 1448.90 |
theguardian_com | 8493 | 10074837 | 137.00 | 1186.25 |
thetimes_co_uk | 9195 | 292184 | 1.00 | 31.78 |
washingtonpost_com | 24278 | 14213560 | 0.00 | 585.45 |
westmonster_com | 350 | 429874 | 40.00 | 1228.21 |
yournewswire_com | 392 | 1517525 | 232.00 | 3871.24 |
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
non_zero_fb_enagagements = data.fb_engagements[data.fb_engagements > 0]
That's a bit better, but still way too clustered at the low end. Let's look at a log normal distribution.
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(12,4.5))
plt.hist(np.log(non_zero_fb_enagagements + median), bins=50)
plt.axvline(np.log(mean), linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(np.log(median), label=f'Median ({median:,.0f})', color='green')
plt.axvline(np.log(ninety), linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(np.log(ninetyfive), linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
plt.axvline(np.log(ninetynine), linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
eng = data.fb_engagements[(data.fb_engagements < 5000)]
mean = data.fb_engagements.mean()
median = data.fb_engagements.median()
ninety = data.fb_engagements.quantile(.90)
ninetyfive = data.fb_engagements.quantile(.95)
ninetynine = data.fb_engagements.quantile(.99)
plt.figure(figsize=(15,7))
plt.hist(eng, bins=50)
plt.title("Article count by engagements")
plt.axvline(median, label=f'Median ({median:,.0f})', color='green')
plt.axvline(mean, linestyle=':', label=f'Mean ({mean:,.0f})', color='green')
plt.axvline(ninety, linestyle='--', label=f'90% percentile ({ninety:,.0f})', color='red')
plt.axvline(ninetyfive, linestyle='-.', label=f'95% percentile ({ninetyfive:,.0f})', color='red')
# plt.axvline(ninetynine, linestyle=':', label=f'99% percentile ({ninetynine:,.0f})', color='red')
leg = plt.legend()
log_engagements = (non_zero_fb_enagagements
    .clip(upper=data.fb_engagements.quantile(.999))
    .apply(lambda x: np.log(x + median))
)
log_engagements.describe()
count   124947.00
mean         4.98
std          1.76
min          3.22
25%          3.50
50%          4.39
75%          6.03
max         11.45
Name: fb_engagements, dtype: float64
Use standard feature scaling to bring that to a 0 to 50 range.
def scale_log_engagements(engagements_logged):
    return np.ceil(
        50 * (engagements_logged - log_engagements.min()) / (log_engagements.max() - log_engagements.min())
    )

def scale_engagements(engagements):
    return scale_log_engagements(np.log(engagements + median))
scaled_non_zero_engagements = scale_log_engagements(log_engagements)
scaled_non_zero_engagements.describe()
count   124947.00
mean        11.17
std         10.74
min          0.00
25%          2.00
50%          8.00
75%         18.00
max         50.00
Name: fb_engagements, dtype: float64
# add in the zeros, as zero
scaled_engagements = pd.concat([scaled_non_zero_engagements, data.fb_engagements[data.fb_engagements == 0]])
proposed = pd.DataFrame({"fb_engagements": data.fb_engagements, "response_score": scaled_engagements})
proposed.response_score.plot.hist(bins=50)
Looks good to me, let's save that.
data["response_score"] = proposed.response_score
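The steps above (cap at a limit, shift by the median, take the log, then min-max scale to 50) can be condensed into one function. This is a sketch rather than the notebook's own code: the function name, its parameters and the clamp of the ratio to [0, 1] (a guard against floating-point rounding at the boundaries) are assumptions added here.

```python
import numpy as np
import pandas as pd

def response_score(engagements, limit, median):
    """Scale engagement counts to 0-50 on a shifted log scale.

    `limit` caps extreme values (the notebook uses the 99.9th percentile)
    and `median` is the shift added before taking the log.
    """
    engagements = pd.Series(engagements, dtype="float64")
    low = np.log(1 + median)       # basic score of an article with 1 engagement
    high = np.log(limit + median)  # basic score of an article at the cap
    basic = np.log(engagements.clip(upper=limit) + median)
    # Clamp to [0, 1] so rounding error cannot push a score past 50.
    ratio = ((basic - low) / (high - low)).clip(0, 1)
    score = np.ceil(50 * ratio)
    # Articles with no engagements score zero.
    return score.where(engagements > 0, 0.0)

# Example inputs drawn from the percentiles reported earlier.
scores = response_score([0, 1, 24, 1454, 93932, 814679], limit=93932.13, median=24.0)
print(scores.tolist())
```

Passing `limit` and `median` explicitly, rather than reading them from module-level variables, makes it easier to reuse last month's values when scoring new articles.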
The maximum of 50 points is awarded when the engagements are greater than the 99.9th percentile, rolling over the last month.
i.e. where $limit$ is the 99.9th percentile of engagements calculated over the previous month, the response score for article $a$ is:
\begin{align}
basicScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ \log(\min(engagements_a,limit) + median(engagements)) & \text{if } engagements_a > 0 \end{cases} \\
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{basicScore_a - \min(basicScore)}{\max(basicScore) - \min(basicScore)} & \text{if } engagements_a > 0 \end{cases}
\end{align}
The latter equation can be expanded to:
\begin{align}
responseScore_a & = \begin{cases} 0 & \text{if } engagements_a = 0 \\ 50 \cdot \frac{\log(\min(engagements_a,limit) + median(engagements)) - \log(1 + median(engagements))}{\log(limit + median(engagements)) - \log(1 + median(engagements))} & \text{if } engagements_a > 0 \end{cases}
\end{align}
The aim of the promotion score is to indicate how important the article was to the publisher, by tracking where they chose to promote it. This is a number between 0 and 50 comprised of: time as lead story on the home page (up to 20 points), time on the front page (up to 15 points), and promotion on the publisher's Facebook brand page (up to 15 points).
The first two should be scaled by the popularity/reach of the home page, for which we use the alexa page rank as a proxy.
The last should be scaled by the popularity/reach of the brand page, for which we use the number of likes the brand page has.
data.mins_as_lead.describe([0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count   155376.00
mean         9.19
std        103.71
min          0.00
50%          0.00
75%          0.00
90%          0.00
95%          0.00
99%        269.00
99.5%      574.00
99.9%     1219.00
max      22314.00
Name: mins_as_lead, dtype: float64
As expected, the vast majority of articles don't make it as lead. Let's explore how long publishers typically keep an article as lead.
lead_articles = data[data.mins_as_lead > 0]
lead_articles.mins_as_lead.describe([0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.995, 0.999])
count    4321.00
mean      330.52
std       529.72
min         4.00
25%        80.00
50%       174.00
75%       412.00
90%       869.00
95%      1105.00
99%      1609.80
99.5%    1938.40
99.9%    4460.68
max     22314.00
Name: mins_as_lead, dtype: float64
lead_articles.mins_as_lead.plot.hist(bins=50)
For lead, it's a significant thing for an article to be lead at all, so although we want to penalise articles that were lead for a very short time, mostly we want to award the maximum even if an article wasn't lead for ages. So we'll give maximum points when something has been lead for an hour.
lead_articles.mins_as_lead.clip(upper=60).plot.hist(bins=50)
We also want to scale this by the Alexa rank, such that the maximum score of 20 points goes to an article that was lead for at least an hour on the most popular site.
So let's explore the Alexa numbers.
alexa_ranks = data.groupby(by="publisher_id").alexa_rank.mean().sort_values()
alexa_ranks
publisher_id
bbc_co_uk                               96
cnn_com                                105
nytimes_com                            120
theguardian_com                        142
buzzfeed_com                           147
dailymail_co_uk                        158
washingtonpost_com                     191
huffingtonpost_com                     215
foxnews_com                            285
rt_com                                 365
telegraph_co_uk                        370
independent_co_uk                      386
reuters_com                            497
npr_org                                594
lemonde_fr                             618
mirror_co_uk                           706
nbcnews_com                            826
breitbart_com                          994
ft_com                                1596
economist_com                         1825
indy100_com                           5014
thetimes_co_uk                        6435
newstatesman_com                     12769
thecanary_co                         15686
propublica_org                       16066
yournewswire_com                     22568
order-order_com                      32515
anotherangryvoice_blogspot_co_uk     77827
westmonster_com                      97775
evolvepolitics_com                  119412
skwawkbox_org                       152475
libdemvoice_org                     344992
brexitcentral_com                   469149
Name: alexa_rank, dtype: int64
alexa_ranks.plot.bar(figsize=[10,5])
Let's try the simple option first: just divide the number of minutes as lead by the Alexa rank. What scale of numbers do we get then?
lead_proposal_1 = lead_articles.mins_as_lead.clip(upper=60) / lead_articles.alexa_rank
lead_proposal_1.plot.hist()
Looks like there's too much of a cluster around 0. Have we massively over penalised the publishers with a high alexa rank?
lead_proposal_1.groupby(data.publisher_id).mean().plot.bar(figsize=[10,5])
Yes. Let's try taking the log of the alexa rank and see if that looks better.
lead_proposal_2 = (lead_articles.mins_as_lead.clip(upper=60) / np.log(lead_articles.alexa_rank))
lead_proposal_2.plot.hist()
lead_proposal_2.groupby(data.publisher_id).describe()
publisher_id | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 46.00 | 5.26 | 0.47 | 2.13 | 5.33 | 5.33 | 5.33 | 5.33 |
bbc_co_uk | 95.00 | 12.89 | 1.14 | 5.26 | 13.15 | 13.15 | 13.15 | 13.15 |
breitbart_com | 183.00 | 8.43 | 0.94 | 3.48 | 8.69 | 8.69 | 8.69 | 8.69 |
brexitcentral_com | 39.00 | 4.49 | 0.63 | 0.69 | 4.59 | 4.59 | 4.59 | 4.59 |
buzzfeed_com | 302.00 | 11.75 | 1.16 | 3.01 | 12.02 | 12.02 | 12.02 | 12.02 |
cnn_com | 201.00 | 12.27 | 1.97 | 2.15 | 12.89 | 12.89 | 12.89 | 12.89 |
dailymail_co_uk | 165.00 | 11.45 | 1.58 | 2.77 | 11.85 | 11.85 | 11.85 | 11.85 |
economist_com | 65.00 | 7.65 | 1.56 | 0.53 | 7.99 | 7.99 | 7.99 | 7.99 |
evolvepolitics_com | 24.00 | 5.13 | 0.00 | 5.13 | 5.13 | 5.13 | 5.13 | 5.13 |
foxnews_com | 106.00 | 10.47 | 1.07 | 0.88 | 10.61 | 10.61 | 10.61 | 10.61 |
ft_com | 93.00 | 7.21 | 2.11 | 0.54 | 8.14 | 8.14 | 8.14 | 8.14 |
huffingtonpost_com | 177.00 | 10.92 | 1.04 | 2.79 | 11.17 | 11.17 | 11.17 | 11.17 |
independent_co_uk | 136.00 | 9.75 | 1.33 | 1.68 | 10.07 | 10.07 | 10.07 | 10.07 |
indy100_com | 192.00 | 5.52 | 2.00 | 0.59 | 3.87 | 7.04 | 7.04 | 7.04 |
lemonde_fr | 162.00 | 8.74 | 1.73 | 0.62 | 9.34 | 9.34 | 9.34 | 9.34 |
libdemvoice_org | 146.00 | 4.58 | 0.52 | 1.49 | 4.71 | 4.71 | 4.71 | 4.71 |
mirror_co_uk | 317.00 | 8.65 | 1.37 | 2.29 | 9.15 | 9.15 | 9.15 | 9.15 |
nbcnews_com | 109.00 | 8.85 | 0.74 | 1.34 | 8.93 | 8.93 | 8.93 | 8.93 |
newstatesman_com | 64.00 | 6.04 | 1.03 | 0.42 | 6.35 | 6.35 | 6.35 | 6.35 |
npr_org | 163.00 | 8.89 | 1.68 | 0.63 | 9.39 | 9.39 | 9.39 | 9.39 |
nytimes_com | 53.00 | 12.27 | 1.60 | 1.04 | 12.53 | 12.53 | 12.53 | 12.53 |
order-order_com | 258.00 | 4.48 | 1.59 | 0.39 | 3.27 | 5.68 | 5.78 | 5.78 |
propublica_org | 24.00 | 5.80 | 1.36 | 0.52 | 6.20 | 6.20 | 6.20 | 6.20 |
reuters_com | 90.00 | 9.24 | 1.73 | 0.81 | 9.66 | 9.66 | 9.66 | 9.66 |
rt_com | 128.00 | 9.22 | 2.50 | 0.85 | 10.17 | 10.17 | 10.17 | 10.17 |
skwawkbox_org | 116.00 | 4.70 | 0.92 | 0.42 | 5.03 | 5.03 | 5.03 | 5.03 |
telegraph_co_uk | 84.00 | 9.87 | 1.38 | 0.85 | 10.15 | 10.15 | 10.15 | 10.15 |
thecanary_co | 228.00 | 4.70 | 1.70 | 0.52 | 3.52 | 5.18 | 6.21 | 6.21 |
theguardian_com | 141.00 | 11.49 | 2.01 | 2.02 | 12.11 | 12.11 | 12.11 | 12.11 |
thetimes_co_uk | 73.00 | 6.82 | 0.21 | 5.02 | 6.84 | 6.84 | 6.84 | 6.84 |
washingtonpost_com | 90.00 | 11.17 | 1.20 | 3.62 | 11.42 | 11.42 | 11.42 | 11.42 |
westmonster_com | 77.00 | 4.71 | 1.23 | 0.78 | 5.22 | 5.22 | 5.22 | 5.22 |
yournewswire_com | 174.00 | 5.68 | 0.68 | 2.39 | 5.79 | 5.99 | 5.99 | 5.99 |
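To see why the log softens the penalty, the two proposals can be compared for a full hour as lead at three of the Alexa ranks listed earlier. This is a small worked example rather than part of the analysis; the variable names are introduced here for illustration.

```python
import numpy as np

# Alexa ranks taken from the per-publisher listing earlier in the notebook.
ranks = {"bbc_co_uk": 96, "ft_com": 1596, "brexitcentral_com": 469149}
capped_minutes = 60  # an article that was lead for the full hour

for publisher, rank in ranks.items():
    by_rank = capped_minutes / rank         # proposal 1: divide by the rank
    by_log = capped_minutes / np.log(rank)  # proposal 2: divide by log(rank)
    print(f"{publisher:20s} minutes/rank={by_rank:.5f}  minutes/log(rank)={by_log:.2f}")

# Ratio between the best-ranked and worst-ranked site under each proposal.
spread_rank = ranks["brexitcentral_com"] / ranks["bbc_co_uk"]
spread_log = np.log(ranks["brexitcentral_com"]) / np.log(ranks["bbc_co_uk"])
print(f"spread by rank: {spread_rank:,.0f}x, spread by log(rank): {spread_log:.2f}x")
```

Dividing by the raw rank separates the largest and smallest site by a factor of several thousand, whereas dividing by the log keeps them within a factor of three.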
lead_proposal_2.groupby(data.publisher_id).min().plot.bar(figsize=[10,5])
That looks about right, as long as the smaller publishers were closer to zero. So let's apply feature scaling to this, to give a number between 1 and 20. (Anything not as lead will pass through as zero.)
def rescale(series):
    return (series - series.min()) / (series.max() - series.min())
lead_proposal_3 = np.ceil(20 * rescale(lead_proposal_2))
lead_proposal_2.min(), lead_proposal_2.max()
(0.38500569152790032, 13.145359968846892)
lead_proposal_3.plot.hist()
lead_proposal_3.groupby(data.publisher_id).median().plot.bar(figsize=[10,5])
data["lead_score"] = pd.concat([lead_proposal_3, data.mins_as_lead[data.mins_as_lead==0]])
data.lead_score.value_counts().sort_index()
0.00     151057
1.00         47
2.00         34
3.00         54
4.00         75
5.00         78
6.00         67
7.00        272
8.00        303
9.00        312
10.00       199
11.00       193
12.00        89
13.00       101
14.00       556
15.00       375
16.00       320
17.00       280
18.00       244
19.00       403
20.00       317
Name: lead_score, dtype: int64
data.lead_score.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 20.00 breitbart_com 14.00 brexitcentral_com 7.00 buzzfeed_com 19.00 cnn_com 20.00 dailymail_co_uk 18.00 economist_com 12.00 evolvepolitics_com 8.00 foxnews_com 17.00 ft_com 13.00 huffingtonpost_com 17.00 independent_co_uk 16.00 indy100_com 11.00 lemonde_fr 15.00 libdemvoice_org 7.00 mirror_co_uk 14.00 nbcnews_com 14.00 newstatesman_com 10.00 npr_org 15.00 nytimes_com 20.00 order-order_com 9.00 propublica_org 10.00 reuters_com 15.00 rt_com 16.00 skwawkbox_org 8.00 telegraph_co_uk 16.00 thecanary_co 10.00 theguardian_com 19.00 thetimes_co_uk 11.00 washingtonpost_com 18.00 westmonster_com 8.00 yournewswire_com 9.00 Name: lead_score, dtype: float64
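The lead scoring steps can be sketched as a single function that scales against the best-ranked site directly instead of subtracting the minimum. The function name, the `best_rank` parameter, the toy frame and the final `np.minimum` guard against floating-point overshoot are all assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

def lead_score(mins_as_lead, alexa_rank, best_rank, cap=60, max_points=20):
    """Capped minutes over log(rank), scaled so that the best-ranked site
    scores `max_points` once an article reaches the cap."""
    unscaled = mins_as_lead.clip(upper=cap) / np.log(alexa_rank)
    best = cap / np.log(best_rank)  # highest unscaled value achievable
    score = np.ceil(max_points * unscaled / best)
    # Guard against rounding error pushing a score just above max_points.
    return np.minimum(score, max_points)

# Hypothetical toy frame; the ranks are borrowed from the Alexa listing.
toy = pd.DataFrame({
    "mins_as_lead": [0, 30, 60, 600, 60],
    "alexa_rank":   [96, 96, 96, 96, 469149],
})
toy["lead_score"] = lead_score(toy.mins_as_lead, toy.alexa_rank, best_rank=96)
print(toy)
```

Anything beyond the cap earns no extra points, and an hour as lead on a low-ranked site earns a fraction of what it earns on the best-ranked one.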
In summary then, the score for article $a$ is:
$$ unscaledLeadScore_a = \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} $$
$$ leadScore_a = 19 \cdot \frac{unscaledLeadScore_a - \min(unscaledLeadScore)}{\max(unscaledLeadScore) - \min(unscaledLeadScore)} + 1 $$
Since the minimum value of $minsAsLead$ is 1, $\min(unscaledLeadScore)$ is pretty insignificant. So we can simplify this to:
$$ leadScore_a = 20 \cdot \frac{unscaledLeadScore_a}{\max(unscaledLeadScore)} $$
or, since the largest unscaled score belongs to the site with the smallest (best) Alexa rank:
$$ leadScore_a = 20 \cdot \frac{\frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)}}{\frac{60}{\log(\min(alexaRank))}} $$
$$ leadScore_a = 20 \cdot \frac{\min(minsAsLead_a, 60)}{\log(alexaRank_a)} \cdot \frac{\log(\min(alexaRank))}{60} $$
Time on the front page is similar to time as lead, so let's try doing the same calculation, except we also want to factor in the number of slots on the front:
$$ frontScore_a = 15 \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \left( \frac{\min(alexaRank \cdot numArticlesOnFront)}{1440} \right) $$
(data.alexa_rank * data.num_articles_on_front).min() / 1440
2.4500000000000002
time_on_front_proposal_1 = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15)
time_on_front_proposal_1.plot.hist(figsize=(15, 7), bins=15)
time_on_front_proposal_1.value_counts().sort_index()
1.00     72664
2.00      6937
3.00      4459
4.00      3846
5.00       845
6.00       539
7.00       513
8.00       706
9.00       678
10.00      222
11.00      339
12.00      298
13.00      145
14.00       96
15.00       47
dtype: int64
time_on_front_proposal_1.groupby(data.publisher_id).sum()
publisher_id anotherangryvoice_blogspot_co_uk 46.00 bbc_co_uk 15119.00 breitbart_com 2606.00 brexitcentral_com 52.00 buzzfeed_com 9698.00 cnn_com 11792.00 dailymail_co_uk 13965.00 economist_com 319.00 evolvepolitics_com 60.00 foxnews_com 6473.00 ft_com 3167.00 huffingtonpost_com 7534.00 independent_co_uk 4431.00 indy100_com 534.00 lemonde_fr 3972.00 libdemvoice_org 170.00 mirror_co_uk 9502.00 nbcnews_com 1937.00 newstatesman_com 515.00 npr_org 2449.00 nytimes_com 9488.00 order-order_com 267.00 propublica_org 57.00 reuters_com 6863.00 rt_com 4421.00 skwawkbox_org 117.00 telegraph_co_uk 4950.00 thecanary_co 249.00 theguardian_com 12252.00 thetimes_co_uk 9155.00 washingtonpost_com 8679.00 westmonster_com 327.00 yournewswire_com 392.00 dtype: float64
That looks good to me.
data["front_score"] = np.ceil(data.mins_on_front.clip(upper=1440) / (data.alexa_rank * data.num_articles_on_front) * (2.45) * 15).fillna(0)
data.front_score
id 4b92d2afc3eb0becb58ee0f8a866e8bb3e0c6c3c 4.00 eedd25b64e2cf37cf2198783311a1a823e26965a 4.00 68cf97046a5705f8c6826ab1fc3ce6e98e845a0d 1.00 ff04a763b2f01746c4b4718bf2ad150468918bb0 4.00 1bf0cb152322d6d4ec6f0910f7bbaac27976fe6e 4.00 60192d56f7ff0332b325c51b3a9c5ac59d42c525 1.00 40a59853e4642a4ababa6938176089b8775c4016 1.00 d1e05106c3ec699a37bf8f26ab63e4dbebcd4971 4.00 ed61f808ac7756c2d73dfdf6855c34786bc9de02 2.00 bbadee9289eea2f0b6b6220f9f613578c37ef621 1.00 063a022e6f66ec2763b22a815ec25471e22287e3 1.00 478f397e1c731872fba60a4189bc72c924cfc847 1.00 17f0678815a01ee8e4914e30d612c294c78c68b8 1.00 1b7a6a9b1aea0996b27cf0f5460fa23bdd179067 1.00 348163ba67b3cd7e989236ce8a739b74f81cd67e 1.00 99f97d2f01a3c2dc40da166b18abacdbc85b9634 1.00 587272809793b00754536e86b8f535cc5fbc9700 1.00 57326012898026faf47a6e5969e4e3a6ebe9005d 1.00 13ac494c2c23dfe4395decfd9882e4851a1cd679 1.00 e3a02a337c53a78300d7e3c238a7ba2084705106 1.00 d00575f8a0fddb2cf4481ad4844aea24f481d07c 1.00 079efcd636a42a2302cdc5f8ec20ea640189dd03 1.00 487fb90db5d16301835c7c6b85e27183ab20f96d 1.00 13316d1e0c194706fed4d779a5fbc431aa845187 1.00 89444e353ce245b819ed5e42a16bfbd31df8a367 1.00 faf1c9acda717cd3729d4093cec850f5fda1a615 1.00 a96249330370dd853da8021f2c1f42a83a2e9c76 1.00 bb78be89d6a53016f4378af44c02edc5a29c7eca 1.00 0da8e2a7525e7d8f5037f87d00641bb7def15a0e 1.00 710e0b5316bc13656fda544dbcea145c1e62094b 1.00 ... 
4945d0f879fe8c25617b8d0f04e5395d0d02c6a7 0.00 b7d2aa4539fd0bcc0ad944702db6f3353e3437f3 1.00 f6d35f22cec3ec9e9d1ec62f1b9b9b2cab96ba3b 0.00 3c35b93d4d2e7273ac60fcf566ec04cd2ed52040 0.00 a8d249980496930270c3f62ea40437b569b4a576 1.00 35d67981fc107b0db3b7e2c380dc6c48d1a276a7 1.00 780d2a1296546116e450fba4fd0fb48e312fbb6f 0.00 5d838bb1a1bd01e170c8e46e77f9a6d042e910c0 0.00 fdb933725e53ed0d1cbb57bacf915e72e42d6976 1.00 0e330800c766a79bb09a8750b3e2f2a0f3b495a9 1.00 ed642f48391d369b5d362c3bd0a58a9441b2e47d 2.00 00b2260e4769744ec5cf2dc1973736cf26113947 1.00 3c262b96680f65dcfcce7b33c0fc9a5c42e5ae06 0.00 7c426e09e5f44abd8316c21e2d753c67b53c6b3b 0.00 20600f6fd9847e684df22803751f6c1ea4854fc4 2.00 a3cf9f323bb8e614c64ac8528216820b752fe965 0.00 0272fffeb0b5bb264a2ae049a34f101303d73350 0.00 bc4c1078dd7ee60f67d89c574aa2ead98b4a7f6a 1.00 8953c649cea6fad178563a9630ac88cb37f45515 0.00 b545fcbbc649daa5be3e2f678de536e2cd3eb8cc 1.00 1ab07de99305bb0052286646d8619c73aa47b6a8 0.00 302a10f2a9db8c489f079ff7cf04ec43967b899c 0.00 e0717c56dc5adbb093e3061d99985f2a7e32a78d 0.00 2e6ec6e3dcbcee6d16eaf7f2d6423bbf95ee243d 1.00 83fbd4044262f67fe0582658b64b69644013992a 0.00 1b578330a91e65cb11e9a5ce7ea7770ff0c0c2dd 3.00 cec0dd5ed9522d45461cad6451329493993bffe4 1.00 31fa3fd4a146c7e6ca5a68ae61a7cd84dba9708f 2.00 301e95874ced17644b54ce912152dd955f1b36f7 1.00 7dfb319c50b823a7a353edbb239d34cdaa5d8378 0.00 Name: front_score, Length: 155376, dtype: float64
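The front-page calculation can be sketched with the scaling factor derived from the data instead of the hard-coded 2.45. The function name, its parameters, the toy values and the `np.minimum` guard are assumptions for illustration, not the notebook's own code.

```python
import numpy as np
import pandas as pd

def front_score(mins_on_front, alexa_rank, num_articles_on_front,
                cap=1440, max_points=15):
    """Capped minutes divided by (rank x slots), scaled so the smallest
    (rank x slots) in the data scores `max_points` at the cap."""
    weight = alexa_rank * num_articles_on_front
    scale = weight.min() / cap  # derived from the data, not hard-coded
    score = np.ceil(max_points * mins_on_front.clip(upper=cap) / weight * scale)
    # Guard against rounding just above max_points; rows with a missing
    # slot count score 0.
    return np.minimum(score, max_points).fillna(0)

# Hypothetical toy frame, including one row with a missing slot count.
toy = pd.DataFrame({
    "mins_on_front":         [1440, 720, 2613, 0, 100],
    "alexa_rank":            [96, 96, 120, 120, 6435],
    "num_articles_on_front": [40.0, 40.0, 118.0, 118.0, np.nan],
})
toy["front_score"] = front_score(
    toy.mins_on_front, toy.alexa_rank, toy.num_articles_on_front
)
print(toy)
```

Deriving `scale` from the data keeps the function valid when the month or the publisher list changes, at the cost of the scores depending on whichever site has the smallest rank-times-slots product.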
One way a publisher has of promoting content is to post to their brand page. The significance of doing so is stronger when the brand page has more followers (likes).
$$ facebookPromotionProposed1_a = 15 \left( \frac{brandPageLikes_a}{\max(brandPageLikes)} \right) $$
Now let's explore the data to see if that makes sense. tl;dr: the formula above is incorrect.
data.fb_brand_page_likes.max()
45013800.0
facebook_promotion_proposed_1 = np.ceil((15 * (data.fb_brand_page_likes / data.fb_brand_page_likes.max())).fillna(0))
facebook_promotion_proposed_1.value_counts().sort_index().plot.bar()
facebook_promotion_proposed_1.groupby(data.publisher_id).describe()
publisher_id | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
anotherangryvoice_blogspot_co_uk | 46.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
bbc_co_uk | 12497.00 | 0.58 | 2.89 | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 |
breitbart_com | 2702.00 | 0.84 | 0.99 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
brexitcentral_com | 52.00 | 0.98 | 0.14 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
buzzfeed_com | 1609.00 | 0.30 | 0.46 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
cnn_com | 3425.00 | 2.79 | 4.49 | 0.00 | 0.00 | 0.00 | 10.00 | 10.00 |
dailymail_co_uk | 23842.00 | 0.56 | 1.58 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 |
economist_com | 581.00 | 2.32 | 1.26 | 0.00 | 3.00 | 3.00 | 3.00 | 3.00 |
evolvepolitics_com | 61.00 | 0.90 | 0.30 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
foxnews_com | 5323.00 | 0.81 | 2.05 | 0.00 | 0.00 | 0.00 | 0.00 | 6.00 |
ft_com | 4997.00 | 0.41 | 0.81 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
huffingtonpost_com | 9772.00 | 0.46 | 1.27 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
independent_co_uk | 6567.00 | 0.57 | 1.17 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
indy100_com | 545.00 | 0.67 | 0.47 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
lemonde_fr | 3991.00 | 0.79 | 0.98 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
libdemvoice_org | 170.00 | 0.88 | 0.32 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
mirror_co_uk | 10121.00 | 0.25 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
nbcnews_com | 2003.00 | 2.29 | 1.98 | 0.00 | 0.00 | 4.00 | 4.00 | 4.00 |
newstatesman_com | 526.00 | 0.74 | 0.44 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
npr_org | 2102.00 | 1.36 | 1.49 | 0.00 | 0.00 | 0.00 | 3.00 | 3.00 |
nytimes_com | 4830.00 | 1.55 | 2.31 | 0.00 | 0.00 | 0.00 | 5.00 | 5.00 |
order-order_com | 267.00 | 0.80 | 0.40 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
propublica_org | 57.00 | 0.82 | 0.38 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
reuters_com | 5952.00 | 0.61 | 0.92 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
rt_com | 2691.00 | 0.93 | 1.00 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
skwawkbox_org | 117.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
telegraph_co_uk | 7571.00 | 0.50 | 0.87 | 0.00 | 0.00 | 0.00 | 2.00 | 2.00 |
thecanary_co | 251.00 | 0.98 | 0.15 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
theguardian_com | 8493.00 | 0.52 | 1.13 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
thetimes_co_uk | 9195.00 | 0.05 | 0.23 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
washingtonpost_com | 24278.00 | 0.17 | 0.69 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 |
westmonster_com | 350.00 | 0.26 | 0.44 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 |
yournewswire_com | 392.00 | 0.08 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
That's too much variation: sites like the Guardian, which has a respectable 7.8m likes, should not be scoring a 3. Let's try applying a log to it, and then standard feature scaling again.
data.fb_brand_page_likes.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 330305.00 bbc_co_uk 45013800.00 breitbart_com 3748243.00 brexitcentral_com 12499.00 buzzfeed_com 2795165.00 cnn_com 29252657.00 dailymail_co_uk 13536602.00 economist_com 8387474.00 evolvepolitics_com 114880.00 foxnews_com 16052065.00 ft_com 3709965.00 huffingtonpost_com 9815158.00 independent_co_uk 7842115.00 indy100_com 231700.00 lemonde_fr 3949946.00 libdemvoice_org 8615.00 mirror_co_uk 2926052.00 nbcnews_com 9381825.00 newstatesman_com 154735.00 npr_org 6252770.00 nytimes_com 14989576.00 order-order_com 45143.00 propublica_org 371738.00 reuters_com 3914921.00 rt_com 4705596.00 skwawkbox_org 6104.00 telegraph_co_uk 4390313.00 thecanary_co 156466.00 theguardian_com 7809686.00 thetimes_co_uk 718296.00 washingtonpost_com 6090176.00 westmonster_com 16158.00 yournewswire_com 27271.00 Name: fb_brand_page_likes, dtype: float64
np.log(2149)
7.6727578966425103
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max())
publisher_id anotherangryvoice_blogspot_co_uk 12.71 bbc_co_uk 17.62 breitbart_com 15.14 brexitcentral_com 9.43 buzzfeed_com 14.84 cnn_com 17.19 dailymail_co_uk 16.42 economist_com 15.94 evolvepolitics_com 11.65 foxnews_com 16.59 ft_com 15.13 huffingtonpost_com 16.10 independent_co_uk 15.88 indy100_com 12.35 lemonde_fr 15.19 libdemvoice_org 9.06 mirror_co_uk 14.89 nbcnews_com 16.05 newstatesman_com 11.95 npr_org 15.65 nytimes_com 16.52 order-order_com 10.72 propublica_org 12.83 reuters_com 15.18 rt_com 15.36 skwawkbox_org 8.72 telegraph_co_uk 15.29 thecanary_co 11.96 theguardian_com 15.87 thetimes_co_uk 13.48 washingtonpost_com 15.62 westmonster_com 9.69 yournewswire_com 10.21 Name: fb_brand_page_likes, dtype: float64
That's more like it, but the lower numbers should be smaller.
np.log(data.fb_brand_page_likes.groupby(data.publisher_id).max() / 1000)
publisher_id anotherangryvoice_blogspot_co_uk 5.80 bbc_co_uk 10.71 breitbart_com 8.23 brexitcentral_com 2.53 buzzfeed_com 7.94 cnn_com 10.28 dailymail_co_uk 9.51 economist_com 9.03 evolvepolitics_com 4.74 foxnews_com 9.68 ft_com 8.22 huffingtonpost_com 9.19 independent_co_uk 8.97 indy100_com 5.45 lemonde_fr 8.28 libdemvoice_org 2.15 mirror_co_uk 7.98 nbcnews_com 9.15 newstatesman_com 5.04 npr_org 8.74 nytimes_com 9.62 order-order_com 3.81 propublica_org 5.92 reuters_com 8.27 rt_com 8.46 skwawkbox_org 1.81 telegraph_co_uk 8.39 thecanary_co 5.05 theguardian_com 8.96 thetimes_co_uk 6.58 washingtonpost_com 8.71 westmonster_com 2.78 yournewswire_com 3.31 Name: fb_brand_page_likes, dtype: float64
scaled_fb_brand_page_likes = (data.fb_brand_page_likes / 1000)
facebook_promotion_proposed_2 = np.ceil(
    15 * (np.log(scaled_fb_brand_page_likes) / np.log(scaled_fb_brand_page_likes.max()))
).fillna(0)
facebook_promotion_proposed_2.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 9.00 bbc_co_uk 15.00 breitbart_com 12.00 brexitcentral_com 4.00 buzzfeed_com 12.00 cnn_com 15.00 dailymail_co_uk 14.00 economist_com 13.00 evolvepolitics_com 7.00 foxnews_com 14.00 ft_com 12.00 huffingtonpost_com 13.00 independent_co_uk 13.00 indy100_com 8.00 lemonde_fr 12.00 libdemvoice_org 4.00 mirror_co_uk 12.00 nbcnews_com 13.00 newstatesman_com 8.00 npr_org 13.00 nytimes_com 14.00 order-order_com 6.00 propublica_org 9.00 reuters_com 12.00 rt_com 12.00 skwawkbox_org 3.00 telegraph_co_uk 12.00 thecanary_co 8.00 theguardian_com 13.00 thetimes_co_uk 10.00 washingtonpost_com 13.00 westmonster_com 4.00 yournewswire_com 5.00 Name: fb_brand_page_likes, dtype: float64
LGTM. So the equation is:
$$ facebookPromotion_a = 15 \left( \frac{\log\left(\frac{brandPageLikes_a}{1000}\right)}{\log\left(\frac{\max(brandPageLikes)}{1000}\right)} \right) $$
Now, let's try applying the standard feature scaling approach to this, rather than using a magic number of 1,000. That equation would be:
\begin{align}
unscaledFacebookPromotion_a &= \log(brandPageLikes_a) \\
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \min(unscaledFacebookPromotion)}{\max(unscaledFacebookPromotion) - \min(unscaledFacebookPromotion)}
\end{align}
The scaling can be simplified to:
\begin{align}
facebookPromotion_a &= 15 \cdot \frac{unscaledFacebookPromotion_a - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))}
\end{align}
Meaning the overall equation becomes:
\begin{align}
facebookPromotion_a &= 15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))}
\end{align}
The code below uses a factor of 14 and adds 1, so any article posted to a brand page scores at least 1, while articles that were not posted can score 0.
facebook_promotion_proposed_3 = np.ceil(
    14 * (
        (np.log(data.fb_brand_page_likes) - np.log(data.fb_brand_page_likes.min())) /
        (np.log(data.fb_brand_page_likes.max()) - np.log(data.fb_brand_page_likes.min()))
    ) + 1
)
facebook_promotion_proposed_3.groupby(data.publisher_id).max()
publisher_id anotherangryvoice_blogspot_co_uk 8.00 bbc_co_uk 15.00 breitbart_com 12.00 brexitcentral_com 3.00 buzzfeed_com 11.00 cnn_com 15.00 dailymail_co_uk 14.00 economist_com 13.00 evolvepolitics_com 6.00 foxnews_com 14.00 ft_com 12.00 huffingtonpost_com 13.00 independent_co_uk 13.00 indy100_com 7.00 lemonde_fr 12.00 libdemvoice_org 2.00 mirror_co_uk 11.00 nbcnews_com 13.00 newstatesman_com 7.00 npr_org 12.00 nytimes_com 14.00 order-order_com 5.00 propublica_org 8.00 reuters_com 12.00 rt_com 12.00 skwawkbox_org 2.00 telegraph_co_uk 12.00 thecanary_co 7.00 theguardian_com 13.00 thetimes_co_uk 9.00 washingtonpost_com 12.00 westmonster_com 3.00 yournewswire_com 4.00 Name: fb_brand_page_likes, dtype: float64
data["facebook_promotion_score"] = facebook_promotion_proposed_3.fillna(0.0)
data["promotion_score"] = (data.lead_score + data.front_score + data.facebook_promotion_score)
data["attention_index"] = (data.promotion_score + data.response_score)
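The `fillna(0.0)` call matters because pandas propagates missing values through addition. The sketch below, using a hypothetical two-row frame rather than the real data, shows what the promotion score would look like with and without it.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: the second article was never posted to a brand page
toy = pd.DataFrame({
    "lead_score": [20.0, 0.0],
    "front_score": [9.0, 3.0],
    "facebook_promotion_score": [15.0, np.nan],
})

# Without filling, one missing component makes the whole sum NaN
unfilled = toy.lead_score + toy.front_score + toy.facebook_promotion_score

# Filling with 0.0 treats "no brand page post" as "no Facebook promotion"
filled = (toy.lead_score + toy.front_score
          + toy.facebook_promotion_score.fillna(0.0))

print(unfilled.tolist(), filled.tolist())
```

Treating a missing brand page post as a score of 0 is a modelling choice: it keeps every article in the index rather than dropping those that were never promoted on Facebook.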
data.promotion_score.plot.hist(bins=np.arange(50), figsize=(15,6))
data.attention_index.plot.hist(bins=np.arange(100), figsize=(15,6))
data.attention_index.value_counts().sort_index()
0.00 26228 1.00 18738 2.00 11875 3.00 8299 4.00 6777 5.00 5670 6.00 4580 7.00 4152 8.00 3592 9.00 3395 10.00 3052 11.00 2790 12.00 2610 13.00 2552 14.00 2289 15.00 2188 16.00 2124 17.00 1923 18.00 1939 19.00 1810 20.00 1731 21.00 1709 22.00 1636 23.00 1592 24.00 1610 25.00 1574 26.00 1515 27.00 1385 28.00 1367 29.00 1322 ... 65.00 127 66.00 111 67.00 84 68.00 93 69.00 88 70.00 54 71.00 61 72.00 41 73.00 54 74.00 43 75.00 33 76.00 41 77.00 40 78.00 30 79.00 36 80.00 24 81.00 20 82.00 23 83.00 20 84.00 22 85.00 17 86.00 10 87.00 10 88.00 16 89.00 9 90.00 7 91.00 6 92.00 6 93.00 5 94.00 2 Name: attention_index, Length: 95, dtype: int64
# and let's see the articles with the biggest attention index
data.sort_values("attention_index", ascending=False)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_likes | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
73ea9633c21733c5931c0c9c18b066285cb1bf39 | http://www.cnn.com/2017/11/27/europe/prince-ha... | Prince Harry and Meghan Markle are engaged | 2017-11-27 10:10:13.896 | 2017-11-27 10:06:10.000 | 120080 | 374.58 | 2017-11-27T10:21:10.556Z | 17091 | 95550 | 7439 | ... | 29210147.00 | 2017-11-27T10:10:57.000Z | 105 | 30.00 | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
0844561b92420804aad85219a5552042ce4d265c | http://money.cnn.com/2017/11/29/media/matt-lau... | Matt Lauer fired from NBC News | 2017-11-29 12:13:21.006 | 2017-11-29 12:03:48.000 | 280472 | 1059.42 | 2017-11-29T12:25:04.420Z | 121197 | 131299 | 27976 | ... | 29221482.00 | 2017-11-29T12:10:25.000Z | 105 | nan | 50.00 | 20.00 | 9.00 | 15.00 | 44.00 | 94.00 |
d89e12f04f8c41a417503f4ddd8738eb9e3a4fbe | http://www.cnn.com/2017/11/03/politics/bowe-be... | Bowe Bergdahl case: Judge reaches decision on ... | 2017-11-03 15:13:06.928 | 2017-11-03 15:09:11.000 | 116359 | 254.01 | 2017-11-03T17:16:12.979Z | 57923 | 48747 | 9689 | ... | 29117755.00 | 2017-11-03T15:50:13.000Z | 105 | 710.00 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
e517b876ffba73e2f5ffd10ba0f8f429906d29f5 | https://www.buzzfeed.com/briannasacks/trump-is... | Trump Is Allowing Hunters To Import Elephant T... | 2017-11-16 03:58:19.582 | 2017-11-16 03:38:35.000 | 123141 | 109.65 | 2017-11-16T14:11:09.991Z | 30123 | 80287 | 12731 | ... | 2764495.00 | 2017-11-19T16:02:00.000Z | 147 | 741.00 | 50.00 | 19.00 | 13.00 | 11.00 | 43.00 | 93.00 |
fc4d4c9056d6150a5d077277d6cbbad6d94cd9eb | http://www.bbc.co.uk/news/world-africa-42071488 | Zimbabwe's President Mugabe 'resigns' | 2017-11-21 15:55:05.471 | 2017-11-21 15:51:28.000 | 83318 | 945.12 | 2017-11-21T16:06:25.141Z | 11043 | 61549 | 10726 | ... | 44879347.00 | 2017-11-21T15:54:48.000Z | 96 | 59.00 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
eb5cf786e76b63b71b65ee2d768bc6b852cd09f5 | http://www.cnn.com/2017/11/26/politics/tom-ste... | Tom Steyer defends $20M ad campaign calling to... | 2017-11-26 20:52:30.134 | 2017-11-26 20:51:09.000 | 72391 | 345.35 | 2017-11-26T23:19:13.135Z | 12739 | 56089 | 3563 | ... | 29208375.00 | 2017-11-26T23:00:07.000Z | 105 | 583.00 | 49.00 | 20.00 | 9.00 | 15.00 | 44.00 | 93.00 |
f4068619cf024977dd22e0f55df457e8a5e1e81a | http://www.cnn.com/2017/11/11/politics/preside... | Trump says he believes Putin's election meddli... | 2017-11-11 11:13:12.620 | 2017-11-11 11:07:35.000 | 119370 | 217.41 | 2017-11-11T16:54:07.932Z | 42730 | 63073 | 13567 | ... | 29149649.00 | 2017-11-11T11:55:15.000Z | 105 | 269.00 | 50.00 | 20.00 | 8.00 | 15.00 | 43.00 | 93.00 |
ae93aeb0b54e4226e91f871104402b5d593c017a | http://www.cnn.com/2017/11/04/politics/the-las... | George H.W. Bush labels Trump a 'blowhard' in ... | 2017-11-04 04:04:21.998 | 2017-11-04 04:01:21.000 | 64884 | 201.01 | 2017-11-04T13:15:03.379Z | 10290 | 47483 | 7111 | ... | 29121177.00 | 2017-11-04T12:46:27.000Z | 105 | 1268.00 | 48.00 | 20.00 | 9.00 | 15.00 | 44.00 | 92.00 |
bac634126d3948d4a630e9ed45fc43daf5e47512 | http://www.bbc.co.uk/news/world-middle-east-42... | Bomb attack on mosque in Egypt's Sinai | 2017-11-24 11:40:13.945 | 2017-11-24 11:36:06.000 | 73270 | 111.27 | 2017-11-24T15:22:11.248Z | 9639 | 53547 | 10084 | ... | 44919422.00 | 2017-11-24T15:12:31.000Z | 96 | 64.00 | 49.00 | 20.00 | 8.00 | 15.00 | 43.00 | 92.00 |
7f547d36c1b7a45ce5561d1f1bb893ad6b1a4b16 | http://www.bbc.co.uk/news/uk-41876942 | Tax haven secrets of ultra-rich exposed | 2017-11-05 18:03:06.595 | 2017-11-05 18:00:10.000 | 47663 | 1509.05 | 2017-11-05T20:07:12.413Z | 14615 | 22927 | 10121 | ... | 44674035.00 | 2017-11-05T18:02:24.000Z | 96 | 1092.00 | 46.00 | 20.00 | 11.00 | 15.00 | 46.00 | 92.00 |
1d314f93ee2e693abc8db7f0c5c0b5bc88e1b189 | http://www.bbc.co.uk/news/uk-42137179 | Prince Harry to marry girlfriend Meghan Markle | 2017-11-27 10:07:08.120 | 2017-11-27 10:04:47.000 | 185430 | 796.22 | 2017-11-27T10:18:09.644Z | 32779 | 138647 | 14004 | ... | 44955978.00 | 2017-11-27T10:10:16.000Z | 96 | 64.00 | 50.00 | 20.00 | 7.00 | 15.00 | 42.00 | 92.00 |
3e124661440aaee9ff2f5abd6580c96de9c409cd | http://www.cnn.com/2017/11/16/politics/al-fran... | Woman alleges Franken groped, kissed her witho... | 2017-11-16 16:25:14.114 | 2017-11-16 16:21:00.000 | 64085 | 419.10 | 2017-11-16T18:31:07.975Z | 39607 | 18265 | 6213 | ... | 29165786.00 | 2017-11-16T16:34:06.000Z | 105 | 394.00 | 48.00 | 20.00 | 9.00 | 15.00 | 44.00 | 92.00 |
25571e148ab673e2bee73a203ccbc469ad2c2fcb | http://www.cnn.com/2017/11/29/politics/preside... | Trump's behavior raises questions of competency | 2017-11-29 18:43:33.556 | 2017-11-29 18:38:30.000 | 73526 | 189.80 | 2017-11-29T21:42:10.905Z | 30333 | 33827 | 9366 | ... | 29240975.00 | 2017-12-03T13:30:19.000Z | 105 | 1098.00 | 49.00 | 20.00 | 8.00 | 15.00 | 43.00 | 92.00 |
4e4483f28e1fb45e7cd2efd68013d8713e093689 | http://www.cnn.com/2017/11/24/africa/egypt-sin... | Egypt attack: 'Dozens killed' at North Sinai m... | 2017-11-24 13:10:12.702 | 2017-11-24 13:05:02.000 | 76701 | 146.55 | 2017-11-24T14:16:10.600Z | 11338 | 52336 | 13027 | ... | 29200399.00 | 2017-11-24T13:30:59.000Z | 105 | 77.00 | 49.00 | 20.00 | 7.00 | 15.00 | 42.00 | 91.00 |
eddf6f1f068d000d109a9f590f254b217f9231ec | http://www.cnn.com/2017/11/12/middleeast/iraq-... | Powerful earthquake strikes near Iraqi city of... | 2017-11-12 19:07:11.950 | 2017-11-12 19:05:17.000 | 54567 | 160.80 | 2017-11-12T23:42:07.318Z | 6011 | 39256 | 9300 | ... | 29153473.00 | 2017-11-12T23:30:23.000Z | 105 | 44.00 | 47.00 | 20.00 | 9.00 | 15.00 | 44.00 | 91.00 |
8936b3dfe6df7390975e0b3613010b294dea585f | http://www.cnn.com/2017/11/05/us/texas-church-... | FBI responding to the scene of a shooting outs... | 2017-11-05 19:28:26.968 | 2017-11-05 19:25:20.000 | 472479 | 1474.19 | 2017-11-05T21:33:10.123Z | 103331 | 319098 | 50050 | ... | 29130688.00 | 2017-11-05T19:33:59.000Z | 105 | 93.00 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
9bcec31a0af3473e657194f655644e361f4c6112 | http://www.cnn.com/2017/11/21/entertainment/da... | David Cassidy, '70s teen heartthrob, dies at a... | 2017-11-22 02:10:11.664 | 2017-11-22 02:07:07.000 | 308761 | 1149.44 | 2017-11-22T02:43:07.665Z | 55066 | 213732 | 39963 | ... | 29180211.00 | 2017-11-22T02:29:19.000Z | 105 | 524.00 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
56d4fd37d920ed660cbad59ea6783a1ac316d3f1 | http://www.cnn.com/2017/11/07/politics/2017-us... | US election tests nation's mood a year after T... | 2017-11-07 17:28:21.597 | 2017-11-07 17:22:09.000 | 110855 | 657.61 | 2017-11-08T02:48:10.555Z | 11024 | 92195 | 7636 | ... | 29140017.00 | 2017-11-08T02:16:05.000Z | 105 | 730.00 | 50.00 | 20.00 | 6.00 | 15.00 | 41.00 | 91.00 |
c60580cced4a93dea0f1501fb8fc11b9568d8d09 | http://www.cnn.com/2017/11/20/politics/al-fran... | Woman says Al Franken inappropriately touched ... | 2017-11-20 14:04:08.927 | 2017-11-20 14:01:00.000 | 60999 | 151.83 | 2017-11-20T16:08:05.371Z | 39924 | 15995 | 5080 | ... | 29174292.00 | 2017-11-20T14:34:07.000Z | 105 | 1237.00 | 48.00 | 20.00 | 8.00 | 15.00 | 43.00 | 91.00 |
ea8c249c77726779692648c4b61e6f17e2c20ef1 | http://www.cnn.com/2017/11/21/politics/donald-... | Trump all but endorses Roy Moore | 2017-11-21 20:43:22.621 | 2017-11-21 20:39:42.000 | 61716 | 1075.54 | 2017-11-21T22:07:09.250Z | 28421 | 25460 | 7835 | ... | 29179024.00 | 2017-11-21T23:00:28.000Z | 105 | 113.00 | 48.00 | 20.00 | 7.00 | 15.00 | 42.00 | 90.00 |
fe697d4d58f41d6e1de381fbfd12e8df7afd83f8 | https://www.buzzfeed.com/jimdalrympleii/oil-sp... | More Than 200,000 Gallons Of Oil Spill Along T... | 2017-11-16 21:28:15.945 | 2017-11-16 21:27:17.000 | 54944 | 148.26 | 2017-11-16T22:11:10.231Z | 9329 | 35136 | 10479 | ... | 2763703.00 | 2017-11-18T20:04:00.000Z | 147 | 115.00 | 47.00 | 19.00 | 13.00 | 11.00 | 43.00 | 90.00 |
7be688e97f64c952e8183855c713ded615bb9aa0 | https://www.buzzfeed.com/josephbernstein/sourc... | Sources: McMaster Mocked Trump’s Intelligence ... | 2017-11-20 17:07:12.434 | 2017-11-15 19:51:02.000 | 39274 | 406.15 | 2017-11-20T20:14:06.158Z | 6108 | 28213 | 4953 | ... | 2765039.00 | 2017-11-20T17:09:20.000Z | 147 | 867.00 | 45.00 | 19.00 | 15.00 | 11.00 | 45.00 | 90.00 |
918dd1da248df1e030da129727d26ff87d2c476d | http://www.bbc.co.uk/news/world-australia-4199... | Australians back gay marriage in non-binding vote | 2017-11-14 23:07:07.371 | 2017-11-14 23:04:51.000 | 193635 | 2130.00 | 2017-11-15T08:05:11.694Z | 15567 | 165814 | 12254 | ... | 44796279.00 | 2017-11-14T23:06:11.000Z | 96 | 64.00 | 50.00 | 13.00 | 12.00 | 15.00 | 40.00 | 90.00 |
3f232a19515eafff49d19aa97622c92ced9e94aa | http://www.bbc.co.uk/news/world-us-canada-4216... | Trump account retweets anti-Muslim videos | 2017-11-29 11:58:11.714 | 2017-11-29 11:56:08.000 | 61587 | 836.41 | 2017-11-29T18:04:08.063Z | 22107 | 30644 | 8836 | ... | 44978694.00 | 2017-11-29T12:01:35.000Z | 96 | 64.00 | 48.00 | 20.00 | 7.00 | 15.00 | 42.00 | 90.00 |
2443904aef281c6a4cdff93a466908084cfa403d | https://www.buzzfeed.com/katiejmbaker/more-tha... | More Than 180 Women Have Reported Sexual Assau... | 2017-11-26 16:25:18.372 | 2017-11-22 22:15:18.000 | 45167 | 28.91 | 2017-11-26T22:37:06.578Z | 15799 | 23122 | 6246 | ... | 2775117.00 | 2017-11-26T16:41:11.000Z | 147 | 4608.00 | 46.00 | 19.00 | 14.00 | 11.00 | 44.00 | 90.00 |
740f28a60dc9f4c080b5f2495e839a579134b05f | http://www.cnn.com/2017/11/06/us/texas-church-... | Texas church shooting: Of 26 dead, 8 came from... | 2017-11-06 09:25:15.499 | 2017-11-06 09:18:43.000 | 48754 | 67.26 | 2017-11-06T14:55:11.872Z | 14626 | 27026 | 7102 | ... | 29134644.00 | 2017-11-06T10:00:05.000Z | 105 | 1327.00 | 47.00 | 20.00 | 8.00 | 15.00 | 43.00 | 90.00 |
882f5d32bf5d563b1b4804a5b5054e7bf38fd49e | https://www.huffingtonpost.com/entry/charles-m... | Manson Family Leader Charles Manson Dead At 83 | 2017-11-20 05:58:17.677 | 2017-11-20 05:55:41.000 | 120366 | 468.15 | 2017-11-20T06:10:05.971Z | 32275 | 67753 | 20338 | ... | 9801729.00 | 2017-11-20T05:57:04.000Z | 215 | 918.00 | 50.00 | 17.00 | 9.00 | 13.00 | 39.00 | 89.00 |
eab0bd7eddf6530830d0739c2b95df91c04311af | http://www.cnn.com/2017/11/20/politics/trump-t... | Trump says Raiders running back should be susp... | 2017-11-20 12:49:19.171 | 2017-11-20 12:44:47.000 | 42682 | 202.44 | 2017-11-20T14:12:06.345Z | 17683 | 21738 | 3261 | ... | 29174331.00 | 2017-11-20T13:50:04.000Z | 105 | 242.00 | 46.00 | 20.00 | 8.00 | 15.00 | 43.00 | 89.00 |
fb1e8caeac265f01e6f14acd89b5caa950fad9db | http://money.cnn.com/2017/11/02/technology/don... | Trump's Twitter handle disappeared. Everyone f... | 2017-11-02 23:48:14.601 | 2017-11-02 23:25:28.000 | 45644 | 179.60 | 2017-11-03T19:05:07.586Z | 7000 | 32402 | 6242 | ... | 29115629.00 | 2017-11-02T23:45:09.000Z | 105 | nan | 46.00 | 20.00 | 8.00 | 15.00 | 43.00 | 89.00 |
786ccbf412c44c9929f14d99f69b26c08b41c7c4 | http://www.cnn.com/2017/11/14/us/california-te... | At least 3 people killed in Northern California | 2017-11-14 19:25:25.795 | 2017-11-14 19:21:15.000 | 64970 | 764.73 | 2017-11-15T04:03:14.627Z | 19633 | 35757 | 9580 | ... | 29159769.00 | 2017-11-14T20:30:12.000Z | 105 | 137.00 | 48.00 | 20.00 | 6.00 | 15.00 | 41.00 | 89.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
f4e1249cdcc101d0aa1d61fdcad594b5ffaedd7a | http://www.dailymail.co.uk/news/article-510517... | Police search for New York City synagogue vandals | 2017-11-21 19:52:18.617 | 2017-11-21 19:49:15.000 | 1 | 0.02 | 2017-11-21T22:04:09.231Z | 0 | 0 | 1 | ... | nan | NaN | 158 | 263.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1f0aec88f260aee14d77dc39b8e2db114d5e53f1 | https://www.washingtonpost.com/national/review... | Review: Billy Bragg delivers the news on ‘Brid... | 2017-11-06 15:52:16.276 | 2017-11-06 15:48:17.000 | 0 | 0.00 | 2017-11-06T19:06:11.406Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 303.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5848ab7e4866902978ed29f58a718ebe38fe533b | https://www.washingtonpost.com/sports/dcunited... | Emenalo quits as Chelsea’s technical director,... | 2017-11-06 15:49:15.654 | 2017-11-06 15:39:14.000 | 1 | 0.02 | 2017-11-06T17:01:11.481Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 105.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
56fc6dab69d6747825f30071b3d55405bd7bc76a | https://www.washingtonpost.com/business/footba... | Football helped NBC to another ratings touchdo... | 2017-11-21 20:04:13.555 | 2017-11-21 19:50:11.000 | 0 | 0.00 | 2017-11-21T22:17:11.052Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 129.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5a93a40fee0828de942d2ed87e6498fb5539e771 | https://www.washingtonpost.com/world/europe/ru... | Russian lawmaker detained in France in tax fra... | 2017-11-21 20:01:12.291 | 2017-11-21 19:53:16.000 | 0 | 0.00 | 2017-11-21T22:13:21.095Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 150.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
fd9f75db2dd6cf003a4d3710f78c96a183c900af | https://www.huffingtonpost.com/entry/inside-th... | Inside the Digital World Of Vanity Fair With P... | 2017-11-21 20:01:16.478 | 2017-11-21 19:54:00.473 | 0 | 0.00 | 2017-11-21T22:13:21.100Z | 0 | 0 | 0 | ... | nan | NaN | 215 | 1217.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
3af22a8a9f33acd7d3563e43053bab8d724d5996 | https://www.washingtonpost.com/national/caregi... | Caregiver accused of trying to suffocate 88-ye... | 2017-11-21 19:58:09.299 | 2017-11-21 19:54:09.000 | 1 | 0.02 | 2017-11-22T15:28:10.555Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 147.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f0c3ae6a6bd50e9ad6b4cc46f1783efd157bd243 | https://www.huffingtonpost.com/entry/what-they... | What they'll say on Brietbart after Trump is gone | 2017-11-06 15:55:23.957 | 2017-11-06 15:37:56.366 | 1 | 0.00 | 2017-11-06T19:10:03.159Z | 0 | 0 | 1 | ... | nan | NaN | 215 | 733.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
fc50cf94337627a443e18ff0a668562b265cce5b | https://www.washingtonpost.com/national/police... | Police arrest man who identified himself as de... | 2017-11-06 15:46:11.188 | 2017-11-06 15:37:15.000 | 0 | 0.00 | 2017-11-06T17:59:10.516Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 154.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c32fd52c8d2e02aba70271c4492b975099262983 | https://www.huffingtonpost.com/entry/new-york-... | New York City Spends $1 Billion with MWBE Firms | 2017-11-21 19:46:15.255 | 2017-11-21 19:39:14.242 | 1 | 0.02 | 2017-11-21T20:58:06.182Z | 0 | 0 | 1 | ... | nan | NaN | 215 | 549.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c9304216196c1409380a7c249333cb66955928d1 | https://www.washingtonpost.com/national/qanda-... | Q&A: Donations for victims of the Las Vegas ma... | 2017-11-21 19:43:16.407 | 2017-11-21 19:38:35.000 | 0 | 0.00 | 2017-11-21T22:58:04.726Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 816.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
4eb3c48a8a54ccdc973efb7c2782a9f07d7329ab | https://www.huffingtonpost.com/entry/real-time... | Real Time Data Adoption Still Slow in the Ente... | 2017-11-06 16:04:12.393 | 2017-11-06 15:57:49.261 | 0 | 0.00 | 2017-11-06T18:17:06.170Z | 0 | 0 | 0 | ... | nan | NaN | 215 | 838.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8541d646246355defc696897573ce52e5c2f1b87 | http://www.bbc.co.uk/news/uk-scotland-edinburg... | Queensferry Crossing death may have been 'frea... | 2017-11-21 19:34:12.361 | 2017-11-21 19:30:26.000 | 1 | 0.02 | 2017-11-22T08:58:11.287Z | 0 | 0 | 1 | ... | nan | NaN | 96 | 633.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5c377078cea6e6bdba3be1bba123e19d0d15bd79 | https://www.huffingtonpost.com/entry/how-to-av... | How to Avoid Downloading a Fake App | 2017-11-21 19:31:19.133 | 2017-11-21 19:28:53.673 | 1 | 0.02 | 2017-11-21T20:43:10.304Z | 0 | 0 | 1 | ... | nan | NaN | 215 | 849.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c6616c46ae04437d27a12cfbf64ebebb3d89be33 | https://www.huffingtonpost.com/entry/a-gratitu... | A Gratitude Ritual | 2017-11-21 19:31:18.735 | 2017-11-21 19:28:58.462 | 0 | 0.00 | 2017-11-21T23:46:09.078Z | 0 | 0 | 0 | ... | nan | NaN | 215 | 794.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
997b3e829e98b6ef68e0072e17477310b9095fae | https://www.washingtonpost.com/national/religi... | Vatican beefs up oversight of diplomats after ... | 2017-11-21 19:34:14.437 | 2017-11-21 19:29:11.000 | 1 | 0.02 | 2017-11-21T23:49:10.183Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 151.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
8061f2b3380e8915d26970a1124b761fe9c0add7 | https://www.ft.com/content/fec7e1a8-c2df-11e7-... | Dialectical face-off: Christmas Eve at the Ust... | 2017-11-06 15:52:25.344 | 2017-11-06 15:50:58.000 | 1 | 0.09 | 2017-11-06T16:04:03.421Z | 0 | 0 | 1 | ... | nan | NaN | 1596 | 371.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5b0911c3e20f654862c8519206cfcdb33c76290f | https://www.washingtonpost.com/sports/colleges... | Jenkins leads Jackrabbits past Iowa 80-72 | 2017-11-21 19:31:13.727 | 2017-11-21 19:29:25.000 | 0 | 0.00 | 2017-11-21T23:46:09.059Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 337.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2c4c90d724d8fc6817ebc18815d13fdc5cf4b908 | https://www.washingtonpost.com/sports/colleges... | Is J.T. Barrett the best QB to ever play at Oh... | 2017-11-21 19:31:13.939 | 2017-11-21 19:29:25.000 | 0 | 0.00 | 2017-11-21T23:46:09.064Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 781.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c75e906a52a20db93508391aefd3860f18d1eb5c | http://www.bbc.co.uk/news/av/world-africa-4207... | Zimbabwe reacts to Mugabe resignation | 2017-11-21 19:34:13.130 | 2017-11-21 19:30:10.000 | 0 | 0.00 | 2017-11-21T21:47:10.450Z | 0 | 0 | 0 | ... | nan | NaN | 96 | nan | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
76df4362a0be5fbb7bfc02168d7f391aa194912d | https://www.washingtonpost.com/lifestyle/home/... | Craftivism: Melding of crafting, activism is h... | 2017-11-21 19:40:15.940 | 2017-11-21 19:30:27.000 | 0 | 0.00 | 2017-11-21T22:53:13.683Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 766.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
9405dd882af60d34ebdeabb88e1e5a3fd5eb6403 | https://www.washingtonpost.com/sports/colleges... | Missouri’s Porter Jr. out for season after bac... | 2017-11-21 19:52:15.804 | 2017-11-21 19:38:21.000 | 0 | 0.00 | 2017-11-22T00:06:05.554Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 146.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
7c66cddb70a24a69f9c687a5409bd9b41e413c27 | http://www.telegraph.co.uk/music/what-to-liste... | Frank Turner interview: 'People who take pride... | 2017-11-21 19:34:08.979 | 2017-11-21 19:32:18.000 | 0 | 0.00 | 2017-11-21T21:47:10.436Z | 0 | 0 | 0 | ... | nan | NaN | 370 | 120.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a6c0dde1c0e8750d03e600bf6fc0b7dd0667536b | https://www.washingtonpost.com/sports/colleges... | Vols’ heralded senior class having disappointi... | 2017-11-21 19:40:16.255 | 2017-11-21 19:32:32.000 | 0 | 0.00 | 2017-11-21T22:53:13.692Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 740.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
365c6f16c05b07ec35b5fc2afdf8a21ac9d9678a | https://www.washingtonpost.com/national/health... | Diary: Inmate got over-the-counter drugs befor... | 2017-11-21 19:43:16.707 | 2017-11-21 19:34:36.000 | 0 | 0.00 | 2017-11-21T22:58:04.728Z | 0 | 0 | 0 | ... | nan | NaN | 191 | 141.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
f253306fa278a178f948b4c13682a248dbc5eef0 | https://www.washingtonpost.com/business/liz-we... | Liz Weston: 4 Steps to Disaster-Proof Your Fin... | 2017-11-06 15:58:18.131 | 2017-11-06 15:48:22.000 | 1 | 0.02 | 2017-11-06T17:11:04.540Z | 0 | 0 | 1 | ... | nan | NaN | 191 | 833.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
96fffc46028063fc811a83deeac400c0a3f53839 | http://www.dailymail.co.uk/sport/rugbyunion/ar... | Wales star Justin Tipuric a major doubt to fac... | 2017-11-06 15:52:22.247 | 2017-11-06 15:48:17.000 | 1 | 0.02 | 2017-11-06T17:05:02.995Z | 0 | 0 | 1 | ... | nan | NaN | 158 | 401.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
6132ea1ddd3b6547eb97e272a10df1adbba7bccc | http://www.washingtonpost.com/video/world/cele... | Celebration in Zimbabwe as Mugabe resigns | 2017-11-21 20:13:15.874 | 2017-11-21 19:38:10.000 | 0 | 0.00 | 2017-11-21T21:26:06.149Z | 0 | 0 | 0 | ... | nan | NaN | 191 | nan | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
fe1be446d676c9d8ff35367b6302f4f670f3813b | https://www.huffingtonpost.com/entry/lock-down... | Lock-Down Theology | 2017-11-21 19:46:15.287 | 2017-11-21 19:38:18.176 | 0 | 0.00 | 2017-11-22T00:01:05.534Z | 0 | 0 | 0 | ... | nan | NaN | 215 | 498.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
7dfb319c50b823a7a353edbb239d34cdaa5d8378 | http://www.dailymail.co.uk/tvshowbiz/article-5... | Australia decides: Did Kyle Sandilands just fa... | 2017-12-01 00:08:02.425 | 2017-11-30 23:59:57.000 | 0 | 0.00 | 2017-12-01T01:20:14.296Z | 0 | 0 | 0 | ... | nan | NaN | 158 | 566.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
155376 rows × 26 columns
data["score_diff"] = data.promotion_score - data.response_score
# promoted but low response
data.sort_values("score_diff", ascending=False).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
8fecd816fc621ac4816c2a7aae0d71d3b97e2489 | https://www.buzzfeed.com/keelyflaherty/meet-th... | Meet the Breakout Star of “Lady Bird” | 2017-11-22 17:49:17.714 | 2017-11-20 18:36:28.000 | 1 | 0.09 | 2017-11-22T18:01:13.856Z | 0 | 0 | 1 | ... | 2017-11-24T05:32:00.000Z | 147 | 1258.00 | 0.00 | 19.00 | 15.00 | 11.00 | 45.00 | 45.00 | 45.00 |
c4c68e90d29b9da58609f78ce13692d2bae1ab88 | https://www.buzzfeed.com/borzoudaragahi/afghan... | Afghanistan’s Hipsters Have Found Themselves A... | 2017-11-18 14:04:23.015 | 2017-11-17 13:47:19.000 | 0 | 0.00 | 2017-11-18T14:16:04.593Z | 0 | 0 | 0 | ... | 2017-11-26T19:05:00.000Z | 147 | 996.00 | 0.00 | 19.00 | 14.00 | 11.00 | 44.00 | 44.00 | 44.00 |
fcf67d8979ac5c70155e54a80b87b875f6768244 | https://www.buzzfeed.com/danvergano/beam-me-up... | A Russian Billionaire Just Founded The First “... | 2017-11-25 14:04:25.111 | 2017-11-22 16:45:35.000 | 1 | 0.10 | 2017-11-25T14:15:15.289Z | 0 | 0 | 1 | ... | 2017-11-26T00:33:01.000Z | 147 | 820.00 | 0.00 | 19.00 | 14.00 | 11.00 | 44.00 | 44.00 | 44.00 |
b38c9b0fc469585120c9f6b904463e406916a695 | https://www.buzzfeed.com/adolfoflores/these-ce... | These Central American Children Wonder Why Tru... | 2017-11-16 16:55:17.963 | 2017-11-14 21:02:32.000 | 0 | 0.00 | 2017-11-16T17:06:10.897Z | 0 | 0 | 0 | ... | 2017-11-18T04:45:00.000Z | 147 | 2728.00 | 0.00 | 19.00 | 14.00 | 11.00 | 44.00 | 44.00 | 44.00 |
b3466fdbf642535131f7ef5beaaf63a868a607f6 | https://www.buzzfeed.com/mollyhensleyclancy/wh... | Why Betsy DeVos May Be The Beauty School Indus... | 2017-12-01 14:01:26.916 | 2017-11-29 19:52:29.000 | 0 | 0.00 | 2017-12-01T14:12:11.651Z | 0 | 0 | 0 | ... | 2017-12-02T00:44:01.000Z | 147 | 5706.00 | 0.00 | 19.00 | 13.00 | 11.00 | 43.00 | 43.00 | 43.00 |
7aaf046bf5bbc5ef14e91b7be017c6b1989439c3 | https://www.buzzfeed.com/sylviaobell/nola-darling | Here’s Why You’ll Love And Hate Nola Darling I... | 2017-11-24 16:19:21.409 | 2017-11-22 22:57:31.000 | 3 | 0.10 | 2017-11-24T16:30:18.887Z | 0 | 0 | 3 | ... | 2017-11-28T05:10:00.000Z | 147 | 2489.00 | 1.00 | 19.00 | 14.00 | 11.00 | 44.00 | 45.00 | 43.00 |
ffd6202df243fe5f684c605e023e977a07fafc50 | https://www.buzzfeed.com/doree/meet-the-people... | Meet the People Who Listen to Podcasts at Supe... | 2017-11-12 17:25:14.647 | 2017-11-09 15:06:16.000 | 0 | 0.00 | 2017-11-12T17:36:07.410Z | 0 | 0 | 0 | ... | 2017-11-13T07:04:00.000Z | 147 | 1958.00 | 0.00 | 19.00 | 12.00 | 11.00 | 42.00 | 42.00 | 42.00 |
1e74e5aa86a20450f01357ff3222b67c6c248e58 | https://www.buzzfeed.com/tomphillips/we-found-... | We Found 45 Suspected Bot Accounts Sharing Pro... | 2017-11-24 09:46:18.952 | 2017-11-22 11:18:59.000 | 10 | 0.92 | 2017-11-24T09:58:03.524Z | 0 | 5 | 5 | ... | 2017-11-24T09:48:09.000Z | 147 | 1781.00 | 2.00 | 19.00 | 14.00 | 11.00 | 44.00 | 46.00 | 42.00 |
c4b00967f43439a68074164f3e522875295d49e8 | https://www.buzzfeed.com/charliewarzel/youtube... | YouTube Is Disabling Predatory Comments — But ... | 2017-12-01 01:58:17.657 | 2017-11-29 22:41:29.000 | 2 | 0.20 | 2017-12-01T02:09:08.636Z | 0 | 0 | 2 | ... | 2017-12-01T03:24:00.000Z | 147 | 686.00 | 1.00 | 19.00 | 12.00 | 11.00 | 42.00 | 43.00 | 41.00 |
3490bec24e123c369936928039cb08188e505c1b | https://www.buzzfeed.com/danvergano/chronic-pa... | He Took Opioids To Manage His Chronic Pain. Wh... | 2017-11-30 16:54:26.538 | 2017-11-27 22:52:18.000 | 21 | 0.09 | 2017-11-30T17:06:09.056Z | 2 | 1 | 18 | ... | 2017-12-03T14:44:00.000Z | 147 | 2844.00 | 4.00 | 19.00 | 14.00 | 11.00 | 44.00 | 48.00 | 40.00 |
29347b55be2bdb740e4cc77c14811a90fef18bcf | https://www.huffingtonpost.com/entry/trump-sex... | In Post-Weinstein Era, Trump Sexual Assault Ac... | 2017-11-17 15:07:20.628 | 2017-11-17 14:56:56.877 | 0 | 1156.88 | 2017-11-24T16:46:06.417Z | 0 | 0 | 0 | ... | 2017-11-17T17:00:24.000Z | 215 | 412.00 | 0.00 | 17.00 | 9.00 | 13.00 | 39.00 | 39.00 | 39.00 |
60c4cb79498808a6c041b44af17c5a120edcf1d9 | https://www.buzzfeed.com/nicolenguyen/amazon-s... | How To Avoid Buying Crap On Amazon | 2017-11-21 16:37:18.191 | 2017-11-20 17:04:57.000 | 33 | 0.48 | 2017-11-21T17:49:13.029Z | 2 | 7 | 24 | ... | 2017-11-21T18:42:41.000Z | 147 | 1546.00 | 6.00 | 19.00 | 14.00 | 11.00 | 44.00 | 50.00 | 38.00 |
0fd43fe79e1ff67ed2b12bcfc5fdd7d75c82d28b | https://www.buzzfeed.com/nidhisubbaraman/insec... | We’re Headed Toward A Thanksgiving Brimming Wi... | 2017-11-23 14:09:20.028 | 2017-11-22 13:43:54.000 | 0 | 0.00 | 2017-11-23T14:21:08.008Z | 0 | 0 | 0 | ... | 2017-11-24T01:32:00.000Z | 147 | 813.00 | 0.00 | 19.00 | 7.00 | 11.00 | 37.00 | 37.00 | 37.00 |
281a6307f00daeb13f754eaef9d0487156ed6a1a | https://www.buzzfeed.com/allensalkin/have-your... | Have Yourself A Very Lit Thanksgiving | 2017-11-22 18:37:16.501 | 2017-11-21 22:19:56.000 | 69 | 0.62 | 2017-11-22T19:50:11.243Z | 6 | 55 | 8 | ... | 2017-11-22T19:22:12.000Z | 147 | 1856.00 | 8.00 | 19.00 | 15.00 | 11.00 | 45.00 | 53.00 | 37.00 |
7f69c2b2f31a885208536abb26a93d6798ef2e70 | https://www.theguardian.com/world/2017/nov/15/... | Mugabe family in detention after military take... | 2017-11-15 10:28:08.000 | 2017-11-15 10:26:33.000 | 0 | 32.43 | 2017-11-15T12:11:12.027Z | 0 | 0 | 0 | ... | 2017-11-15T11:00:00.000Z | 142 | 1418.00 | 0.00 | 19.00 | 4.00 | 13.00 | 36.00 | 36.00 | 36.00 |
27da2db0438e2a96fd75fbb198e72c9642dad8f5 | https://www.buzzfeed.com/lissandravilla/former... | Former Congresswoman Who Oversaw Complaints Ag... | 2017-11-28 20:58:15.872 | 2017-11-28 20:54:08.000 | 60 | 0.70 | 2017-11-28T21:09:09.916Z | 5 | 12 | 43 | ... | 2017-11-28T21:02:12.000Z | 147 | 782.00 | 8.00 | 19.00 | 13.00 | 11.00 | 43.00 | 51.00 | 35.00 |
6901f2ad5c91f1eed99b0c2da1a7e2a50070f138 | https://www.buzzfeed.com/laurendecicca/these-d... | These Deported Mothers Are Being Forced To Wat... | 2017-11-24 14:04:21.741 | 2017-11-22 17:39:23.000 | 0 | 0.00 | 2017-11-24T14:16:11.011Z | 0 | 0 | 0 | ... | NaN | 147 | 602.00 | 0.00 | 19.00 | 14.00 | 0.00 | 33.00 | 33.00 | 33.00 |
a0863d12a71fad8de6714b609a611d44c6ef6925 | https://www.buzzfeed.com/borzoudaragahi/papado... | Papadopoulos And Flynn Client Both Tied To Isr... | 2017-11-03 18:43:19.766 | 2017-11-03 16:31:10.000 | 110 | 0.43 | 2017-11-03T19:56:03.804Z | 9 | 57 | 44 | ... | 2017-11-03T19:14:00.000Z | 147 | 636.00 | 11.00 | 19.00 | 14.00 | 11.00 | 44.00 | 55.00 | 33.00 |
f21fc06eb19142e842343339aaff297895aeb3ad | https://www.buzzfeed.com/sandirankaduwa/itsamo... | Meghan Markle Is The Future Of A Monarchy With... | 2017-11-30 18:54:12.449 | 2017-11-29 22:41:16.000 | 128 | 0.62 | 2017-11-30T23:09:10.406Z | 19 | 55 | 54 | ... | 2017-12-01T01:24:00.000Z | 147 | 2561.00 | 11.00 | 19.00 | 14.00 | 11.00 | 44.00 | 55.00 | 33.00 |
5291b060730148ba1031d21f8802481a198b97c2 | https://www.buzzfeed.com/nidhiprakash/pelosi-w... | Pelosi Won't Say If She'll Ask Conyers To Step... | 2017-11-26 15:46:19.929 | 2017-11-26 15:35:42.000 | 109 | 0.64 | 2017-11-26T19:01:06.024Z | 11 | 61 | 37 | ... | 2017-11-26T18:10:37.000Z | 147 | 404.00 | 11.00 | 19.00 | 14.00 | 11.00 | 44.00 | 55.00 | 33.00 |
795037b8fd7078d17f6daa14e7aa401914a3fc70 | https://www.buzzfeed.com/tomiobaro/greta-gerwi... | Complicated Teen Girls Are Finally Front And C... | 2017-11-11 17:49:12.479 | 2017-11-11 00:33:05.000 | 66 | 0.40 | 2017-11-11T18:00:10.036Z | 11 | 43 | 12 | ... | 2017-11-12T22:04:00.000Z | 147 | 1964.00 | 8.00 | 19.00 | 11.00 | 11.00 | 41.00 | 49.00 | 33.00 |
009db07e4f0df962b373de327ba80d4e70563ca0 | https://www.buzzfeed.com/alexkantrowitz/how-to... | How To Tell If The Thanksgiving Content In You... | 2017-11-23 17:31:19.975 | 2017-11-22 19:09:25.000 | 67 | 0.70 | 2017-11-23T19:45:12.565Z | 1 | 43 | 23 | ... | 2017-11-23T18:30:01.000Z | 147 | 832.00 | 8.00 | 19.00 | 10.00 | 11.00 | 40.00 | 48.00 | 32.00 |
d2abaf54b688b990a34f79e83b6773bd7ef15bcb | http://www.dailymail.co.uk/news/article-511482... | Hundreds evacuated from Oxford Circus | 2017-11-24 16:58:25.583 | 2017-11-24 16:56:02.000 | 2 | 209.58 | 2017-11-24T17:42:09.102Z | 0 | 0 | 2 | ... | 2017-11-24T16:57:51.000Z | 158 | 16.00 | 1.00 | 18.00 | 1.00 | 14.00 | 33.00 | 34.00 | 32.00 |
948a86924f6b6cdc8e649b980130dcfbe3320f81 | https://www.buzzfeed.com/tamerragriffin/zimbab... | Zimbabwe’s Ruling Party Have Voted To Sack Rob... | 2017-11-19 12:31:23.195 | 2017-11-19 12:25:11.000 | 188 | 0.67 | 2017-11-19T15:44:16.396Z | 7 | 124 | 57 | ... | 2017-11-19T14:52:00.000Z | 147 | 116.00 | 13.00 | 19.00 | 14.00 | 11.00 | 44.00 | 57.00 | 31.00 |
4549ccb32d5e26888b5503af566de4bc1ae610da | https://www.npr.org/2017/11/26/566006195/6-pos... | 6 Possible Hurdles For The GOP Tax Plan | 2017-11-26 11:31:23.184 | 2017-11-26 11:01:00.000 | 0 | 1130.43 | 2017-11-27T07:22:09.101Z | 0 | 0 | 0 | ... | 2017-11-26T13:55:00.000Z | 594 | 853.00 | 0.00 | 15.00 | 3.00 | 12.00 | 30.00 | 30.00 | 30.00 |
25 rows × 27 columns
# high response but not promoted: most negative score_diff (promotion_score - response_score)
data.sort_values("score_diff", ascending=True).head(25)
url | headline | discovered | published | fb_engagements | fb_max_engagements_per_min | fb_max_engagements_per_min_time | fb_comments | fb_reactions | fb_shares | ... | fb_brand_page_time | alexa_rank | word_count | response_score | lead_score | front_score | facebook_promotion_score | promotion_score | attention_index | score_diff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||||
42028ce3b987d9ec95e01274b992204bf27fdb1a | https://www.huffingtonpost.com/entry/exposing-... | Exposing America's Biggest Hypocrites: Evangel... | 2017-11-24 17:31:31.338 | 2017-11-24 17:27:39.012 | 86827 | 79.97 | 2017-11-25T17:20:19.095Z | 18763 | 56521 | 11543 | ... | NaN | 215 | 953.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
0d1704ce4fcd2856a1357d444ab6ddd4610db8f8 | https://www.huffingtonpost.com/entry/since-you... | Since you asked, Roy Moore, here is why victim... | 2017-11-11 18:01:19.973 | 2017-11-11 17:49:19.798 | 85116 | 21.63 | 2017-11-16T01:28:12.610Z | 11355 | 56808 | 16953 | ... | NaN | 215 | 1261.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
93f4b0a5428fd4f228b02b823c4f3922e6130bff | https://www.nytimes.com/2017/11/26/business/me... | Time Inc. Is Said to Near Sale in Deal Backed ... | 2017-11-26 21:07:10.424 | 2017-11-26 21:05:08.000 | 94430 | 328.39 | 2017-11-27T03:06:04.999Z | 20502 | 59091 | 14837 | ... | NaN | 120 | 680.00 | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
36a95aa544343e37d69586a3f628b6ebe4e04220 | https://www.buzzfeed.com/alexandrearagao/manif... | Manifestantes põem fogo em boneca de filósofa ... | 2017-11-07 14:46:25.813 | 2017-11-07 14:35:57.000 | 80375 | 206.32 | 2017-11-07T16:10:08.489Z | 11476 | 63186 | 5713 | ... | NaN | 147 | nan | 50.00 | 0.00 | 0.00 | 0.00 | 0.00 | 50.00 | -50.00 |
71e02bf08d789e6191eac1f32dfefd85c6f71a27 | http://insider.foxnews.com/2017/11/07/mike-pen... | WATCH: Pence Responds to Attacks on Prayer Aft... | 2017-11-08 04:59:07.133 | 2017-11-07 23:50:00.000 | 109556 | 229.75 | 2017-11-08T12:27:07.534Z | 14945 | 89516 | 5095 | ... | NaN | 285 | 197.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
21e7fb448ff66db53054103e7e866ebd8faafdb6 | http://www.foxnews.com/politics/2017/11/03/tru... | Trump embarks on 13-day foreign trip to Asia | 2017-11-03 15:34:17.642 | 2017-11-03 15:21:29.000 | 80522 | 162.73 | 2017-11-03T22:02:04.064Z | 9910 | 68673 | 1939 | ... | NaN | 285 | 1053.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
c08460b2595b412be2e6a31d0c42234a67ba7d21 | http://www.foxnews.com/entertainment/2017/11/3... | Jim Nabors dead; star best known as Gomer Pyle... | 2017-11-30 18:09:13.397 | 2017-11-30 18:03:53.000 | 102762 | 303.00 | 2017-11-30T20:17:10.889Z | 16818 | 70041 | 15903 | ... | NaN | 285 | 172.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
9b07e5f6fc081861f8d0c6d6bb478e6c43d3fba6 | http://yournewswire.com/jay-z-jesus-fake-lucifer/ | Jay-Z: 'Jesus Is Fake News; Lucifer Is Way Of ... | 2017-11-13 17:54:26.080 | 2017-11-13 17:27:17.000 | 85940 | 32.63 | 2017-11-14T16:17:12.675Z | 42233 | 33501 | 10206 | ... | NaN | 22568 | 521.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
26f86dbc9b9cc02d1e3130f085f97195b585a3a4 | https://www.washingtonpost.com/news/sports/wp/... | Shalane Flanagan becomes first American woman ... | 2017-11-05 16:59:18.064 | 2017-11-05 16:56:00.000 | 114335 | 1522.95 | 2017-11-06T16:28:13.194Z | 3220 | 106152 | 4963 | ... | NaN | 191 | 32.00 | 50.00 | 0.00 | 1.00 | 0.00 | 1.00 | 51.00 | -49.00 |
6d75837f69cebb8b1cefae20422de0c1ea000beb | https://www.theguardian.com/environment/2017/n... | Michael Bloomberg’s ‘war on coal’ goes global ... | 2017-11-09 06:49:05.637 | 2017-11-09 06:45:06.000 | 328658 | 409.75 | 2017-11-11T18:03:11.221Z | 462 | 327101 | 1095 | ... | NaN | 142 | 740.00 | 50.00 | 0.00 | 2.00 | 0.00 | 2.00 | 52.00 | -48.00 |
8ebbbf91d0f0d5617002ef50d4738426ba4d72b2 | https://www.huffingtonpost.com/entry/repent-an... | Repent and Believe in the Gospel! Over 300 Chr... | 2017-11-20 20:46:19.033 | 2017-11-20 20:40:02.257 | 62306 | 29.13 | 2017-11-21T17:00:04.727Z | 9176 | 41868 | 11262 | ... | NaN | 215 | 981.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
6f2b23a4e1bcf2a0aaec840d0c405f2555e0021c | https://www.buzzfeed.com/beatrizserranomolina/... | ¿Y cómo se supone que se tiene que comportar u... | 2017-11-15 13:01:19.016 | 2017-11-15 11:47:22.000 | 61572 | 54.31 | 2017-11-15T22:11:07.513Z | 7987 | 43761 | 9824 | ... | NaN | 147 | 1010.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
bbbf07f32f7cd2f97ab067f0cca7236f88d75347 | https://www.washingtonpost.com/news/education/... | Teachers spend nearly $1,000 a year on supplie... | 2017-11-03 00:37:11.237 | 2017-11-03 00:24:13.000 | 62320 | 28.91 | 2017-11-04T16:16:11.718Z | 13950 | 36628 | 11742 | ... | NaN | 191 | 554.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
ba4dd0d3636f52b2c3acace3a81c55dd0ccb4a7e | http://www.independent.co.uk/life-style/people... | People who put up Christmas decorations early ... | 2017-11-20 15:19:18.674 | 2017-11-20 15:18:29.000 | 57406 | 220.70 | 2017-11-21T16:08:07.332Z | 23434 | 30356 | 3616 | ... | NaN | 386 | 363.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 48.00 | -48.00 |
40c8c55d685b08a0be04a3ff12d943c00dfc1d95 | https://www.huffingtonpost.com/entry/what-work... | What Working People Face | 2017-11-05 20:01:14.905 | 2017-11-05 19:57:11.582 | 55579 | 29.91 | 2017-11-06T18:32:10.656Z | 2505 | 48871 | 4203 | ... | NaN | 215 | 832.00 | 47.00 | 0.00 | 0.00 | 0.00 | 0.00 | 47.00 | -47.00 |
328125a8ff5f9d8fa2f27e87f31f825a8d84afbb | https://www.nytimes.com/2017/11/25/us/ohio-hov... | In America’s Heartland, the Nazi Sympathizer N... | 2017-11-25 16:22:14.136 | 2017-11-25 16:18:53.000 | 63540 | 45.21 | 2017-11-26T00:09:05.054Z | 29483 | 26172 | 7885 | ... | NaN | 120 | 2267.00 | 48.00 | 0.00 | 1.00 | 0.00 | 1.00 | 49.00 | -47.00 |
5d4dffaf16d73098cf982bdb11b7c43ceae43104 | http://www.foxnews.com/politics/2017/11/16/tru... | Trump to lift ban on importing elephant trophi... | 2017-11-16 07:24:13.207 | 2017-11-16 06:46:06.000 | 54014 | 69.05 | 2017-11-16T15:56:07.176Z | 22421 | 25194 | 6399 | ... | NaN | 285 | 355.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
8a826e4e1f393df72e7860ce819b52385cb47ff7 | https://www.washingtonpost.com/news/wonk/wp/20... | 37 of 38 economists said the GOP tax plans wou... | 2017-11-22 16:01:20.600 | 2017-11-22 15:47:42.000 | 50910 | 140.55 | 2017-11-22T22:25:10.300Z | 6022 | 35960 | 8928 | ... | NaN | 191 | 672.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
90d101797f1de9c57000613cd336854586fd7fa4 | https://www.washingtonpost.com/news/national/w... | Charles Manson, the fiery-eyed cult leader who... | 2017-11-20 05:59:15.649 | 2017-11-20 05:53:00.000 | 51672 | 174.73 | 2017-11-20T06:10:05.982Z | 13170 | 31137 | 7365 | ... | NaN | 191 | 146.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
069b9faa8dfdb74f280b47a0ad9ce97c73bd7d4d | http://www.foxnews.com/science/2017/11/07/hell... | 'Hell is Here' for burning elephants in award-... | 2017-11-07 16:54:16.736 | 2017-11-07 16:54:16.736 | 50714 | 248.06 | 2017-11-08T14:06:07.932Z | 12146 | 32338 | 6230 | ... | NaN | 285 | 380.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
766d2bd690e1055b804eaae1ea439964586aa8c8 | https://www.washingtonpost.com/news/politics/w... | In a rarity, Trump says administration will ba... | 2017-11-18 01:49:14.023 | 2017-11-18 01:45:00.000 | 49202 | 234.59 | 2017-11-18T18:22:05.966Z | 8326 | 34803 | 6073 | ... | NaN | 191 | 73.00 | 47.00 | 0.00 | 1.00 | 0.00 | 1.00 | 48.00 | -46.00 |
2491314cee8db9dfac626a215a1efcf87fa340e7 | https://www.huffingtonpost.com/entry/kaeptain-... | KAEPtain America: This Is What A Patriot Looks... | 2017-11-13 23:16:13.630 | 2017-11-13 23:10:19.532 | 47303 | 78.29 | 2017-11-14T02:29:06.819Z | 6663 | 38123 | 2517 | ... | NaN | 215 | 847.00 | 46.00 | 0.00 | 0.00 | 0.00 | 0.00 | 46.00 | -46.00 |
c6d5ebd4e61ff24c3d1a6860a0eeebe586ebbd4b | http://yournewswire.com/cia-agent-deathbed-bob... | CIA Agent Confesses On Deathbed: ‘I Killed Bob... | 2017-11-30 17:24:36.210 | 2017-11-30 16:56:00.000 | 109979 | 46.52 | 2017-12-02T15:10:16.853Z | 34047 | 49632 | 26300 | ... | 2017-12-01T15:04:01.000Z | 22568 | 907.00 | 50.00 | 0.00 | 1.00 | 4.00 | 5.00 | 55.00 | -45.00 |
844ef6eb6931cdb104db43fbb85318c886a6f89d | https://www.theguardian.com/australia-news/201... | Same-sex marriage bill passes in Australian Se... | 2017-11-29 02:37:10.818 | 2017-11-29 02:34:52.000 | 35803 | 144.64 | 2017-11-29T02:48:15.367Z | 2219 | 31450 | 2134 | ... | NaN | 142 | 693.00 | 45.00 | 0.00 | 0.00 | 0.00 | 0.00 | 45.00 | -45.00 |
66d9977c8749c1bd45ecdfe077fbabdf5012f716 | http://www.breitbart.com/big-government/2017/1... | President Trump Will Permit Citizens to Buy Mi... | 2017-11-24 23:24:17.385 | 2017-11-24 12:09:26.000 | 46124 | 24.30 | 2017-11-26T17:40:16.302Z | 6806 | 35818 | 3500 | ... | NaN | 994 | 266.00 | 46.00 | 0.00 | 1.00 | 0.00 | 1.00 | 47.00 | -45.00 |
25 rows × 27 columns
Write that data to a file. Note that the scores here are provisional for two reasons:
data.to_csv("articles_with_provisional_scores_2017-11-01_2017-11-30.csv")
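The attention index of an article is composed of four components: a lead score, a front score, a Facebook promotion score, and a response score (worth up to 20, 15, 15, and 50 points respectively).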
Or, in other words:
\begin{align}
attentionIndex_a &= leadScore_a + frontScore_a + facebookPromotionScore_a + responseScore_a \\
leadScore_a &= 20 \cdot \left(\frac{\min(minsAsLead_a, 60)}{alexaRank_a}\right) \cdot \left(\frac{\min(alexaRank)}{60}\right) \\
frontScore_a &= 15 \cdot \left(\frac{\min(minsOnFront_a, 1440)}{alexaRank_a \cdot numArticlesOnFront_a}\right) \cdot \left(\frac{\min(alexaRank \cdot numArticlesOnFront)}{1440}\right) \\
facebookPromotionScore_a &= \begin{cases}
0 & \text{if not shared on brand page} \\
15 \cdot \frac{\log(brandPageLikes_a) - \log(\min(brandPageLikes))}{\log(\max(brandPageLikes)) - \log(\min(brandPageLikes))} & \text{otherwise}
\end{cases} \\
responseScore_a &= \begin{cases}
0 & \text{if } engagements_a = 0 \\
50 \cdot \frac{\log(\min(engagements_a, limit) + median(engagements)) - \log(1 + median(engagements))}{\log(limit + median(engagements)) - \log(1 + median(engagements))} & \text{if } engagements_a > 0
\end{cases}
\end{align}
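The formulas above can be sketched in pandas as follows. This is a minimal illustration, not necessarily the code that produced the scores in this notebook. The function name `attention_components` is hypothetical. The `limit` argument is the engagement cap from the response score, whose value is not stated here, so the caller must supply it. Treating articles with no front-page data as scoring zero is also an assumption.

```python
import numpy as np
import pandas as pd


def attention_components(df, limit):
    """Return the four component scores and their sum for each article.

    Hypothetical helper: `limit` (the engagement cap) is an assumption
    supplied by the caller, not a value taken from the notebook.
    """
    out = pd.DataFrame(index=df.index)

    # Lead score: up to 20 points, scaled by the best (lowest) Alexa rank.
    lead_mins = df["mins_as_lead"].clip(upper=60)
    out["lead_score"] = (
        20 * (lead_mins / df["alexa_rank"]) * (df["alexa_rank"].min() / 60)
    )

    # Front score: up to 15 points, diluted by rank and front-page crowding.
    crowd = df["alexa_rank"] * df["num_articles_on_front"]
    crowd = crowd.where(crowd > 0)  # missing or zero counts become NaN
    front_mins = df["mins_on_front"].clip(upper=1440)
    front = 15 * (front_mins / crowd) * (crowd.min() / 1440)
    out["front_score"] = front.fillna(0)  # assumption: no front data -> 0

    # Facebook promotion score: log-scaled brand page likes, 0 if not shared.
    log_likes = np.log(df["fb_brand_page_likes"].where(df["fb_brand_page"]))
    span = log_likes.max() - log_likes.min()
    promo = 15 * (log_likes - log_likes.min()) / span
    out["facebook_promotion_score"] = promo.fillna(0)

    # Response score: log-scaled engagements, capped at `limit`.
    eng = df["fb_engagements"]
    med = eng.median()
    num = np.log(eng.clip(upper=limit) + med) - np.log(1 + med)
    den = np.log(limit + med) - np.log(1 + med)
    out["response_score"] = (50 * num / den).where(eng > 0, 0)

    out["attention_index"] = out.sum(axis=1)
    return out
```

Under these assumptions, an article that leads the best-ranked site for at least 60 minutes receives the full 20 lead points, and one whose engagements reach `limit` receives the full 50 response points.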