We would like to know the percentage of logged-in users who have opted out of the new version of Vector.
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
library(tidyverse); library(glue); library(lubridate); library(scales)
})
A new schema, mediawiki_pref_diff, was created in T261842 to track opt-ins and opt-outs of the desktop improvements and was deployed on 8 April 2021. Unlike PrefUpdate, this schema logs an arbitrary initial and final state (e.g. a user switched from latest Vector to Monobook). It also allows for the use of a hashed user ID.
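Because each event carries both an initial and a final state, a switch can be labeled directly from the pair. The following is a minimal sketch, assuming that `vector1` denotes legacy Vector and `vector2` the new version (the labels that appear in the data in this notebook). The function name `classify_switch` and its three labels are hypothetical, and switches involving any other skin are grouped as `other` for simplicity.

```r
# Hypothetical helper: label a preference switch from its (initial, final) pair.
# Assumes "vector1" is legacy Vector and "vector2" is the new version.
classify_switch <- function(initial_state, final_state) {
  ifelse(
    initial_state == "vector1" & final_state == "vector2", "opt_in",
    ifelse(initial_state == "vector2" & final_state == "vector1", "opt_out", "other")
  )
}

classify_switch(
  initial_state = c("vector1", "vector2", "vector2"),
  final_state   = c("vector2", "vector1", "monobook")
)
```

Whether a move from `vector2` to a non-Vector skin should also count as an opt-out is a definitional choice that this sketch does not settle.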
QA Checks: I queried the data and looked through the following aggregations to determine if the data appears as expected based on instrumentation and time of deployment.
query <-
"
SELECT
date_format(dt, 'yyyy-MM-dd') AS `date`,
user_hash AS `user`,
initial_state AS initial_state,
final_state AS final_state,
bucketed_user_edit_count AS edit_count,
meta.domain as wiki,
COUNT(*) AS num_selections
FROM event.mediawiki_pref_diff
WHERE
year = 2021
AND month >= 04
AND day >= 15
GROUP BY
date_format(dt, 'yyyy-MM-dd') ,
user_hash,
initial_state,
final_state,
bucketed_user_edit_count,
meta.domain
"
desktop_pref_updates <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit
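The `WHERE` clause in the query above applies `day >= 15` to every month, not only to April, so it also drops 1 to 14 May (and the first 14 days of any later month). The sketch below illustrates this in base R on a plain sequence of calendar dates, without querying Hive. The alternative condition is a suggestion under the assumption that the intent is "everything from 15 April 2021 onward"; it is not taken from the original analysis.

```r
# Calendar days from 1 April to 31 May 2021, split into month/day like the Hive partitions.
days  <- seq(as.Date("2021-04-01"), as.Date("2021-05-31"), by = "day")
month <- as.integer(format(days, "%m"))
day   <- as.integer(format(days, "%d"))

# Filter as written in the query: month >= 4 AND day >= 15.
kept_original <- days[month >= 4 & day >= 15]

# Suggested alternative (assumption): apply the day condition to April only, i.e.
#   year = 2021 AND ((month = 4 AND day >= 15) OR month > 4)
kept_alternative <- days[(month == 4 & day >= 15) | month > 4]

# Days that the original filter drops but the alternative keeps.
missing_days <- kept_alternative[!kept_alternative %in% kept_original]
range(missing_days)
```

This matches the by-date tables in this notebook, which contain no rows between 30 April and 15 May.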
desktop_pref_updates$date <- as.Date(desktop_pref_updates$date, format = "%Y-%m-%d")
options(repr.plot.width = 15, repr.plot.height = 15)
desktop_pref_updates_events_bydate <- desktop_pref_updates %>%
#filter(final_state %in% c('vector1', 'vector2')) %>%
group_by(date) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
desktop_pref_updates_events_bydate
`summarise()` ungrouping output (override with `.groups` argument)
date | total_selections | total_users |
---|---|---|
<date> | <int> | <int> |
2021-04-15 | 57 | 35 |
2021-04-16 | 297 | 196 |
2021-04-17 | 257 | 171 |
2021-04-18 | 271 | 174 |
2021-04-19 | 360 | 230 |
2021-04-20 | 300 | 197 |
2021-04-21 | 294 | 193 |
2021-04-22 | 270 | 177 |
2021-04-23 | 232 | 156 |
2021-04-24 | 254 | 181 |
2021-04-25 | 246 | 167 |
2021-04-26 | 336 | 216 |
2021-04-27 | 333 | 205 |
2021-04-28 | 361 | 217 |
2021-04-29 | 348 | 215 |
2021-04-30 | 264 | 178 |
2021-05-15 | 254 | 163 |
2021-05-16 | 300 | 186 |
2021-05-17 | 350 | 230 |
2021-05-18 | 291 | 201 |
2021-05-19 | 265 | 183 |
2021-05-20 | 329 | 196 |
2021-05-21 | 495 | 268 |
2021-05-22 | 300 | 201 |
2021-05-23 | 319 | 184 |
2021-05-24 | 356 | 219 |
2021-05-25 | 320 | 197 |
2021-05-26 | 287 | 190 |
2021-05-27 | 264 | 171 |
2021-05-28 | 267 | 153 |
2021-05-29 | 257 | 162 |
2021-05-30 | 293 | 181 |
2021-05-31 | 268 | 180 |
Events start appearing in the database on April 15th. Across the days shown above, there are on average about 294 preference updates by about 187 distinct users per day. The number of updates exceeds the number of users, which is expected because the same user can make more than one preference update.
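The daily averages can be recomputed directly from the table above. This is a small sketch with the values transcribed by hand from that table; with the live data frame, `colMeans(desktop_pref_updates_events_bydate[, c("total_selections", "total_users")])` should give the same result. Note that the user figure is the mean of daily distinct-user counts, so a user active on several days is counted once per day.

```r
# Daily totals transcribed from the table above (33 days).
total_selections <- c(
  57, 297, 257, 271, 360, 300, 294, 270, 232, 254, 246, 336, 333, 361, 348, 264,       # 15-30 April
  254, 300, 350, 291, 265, 329, 495, 300, 319, 356, 320, 287, 264, 267, 257, 293, 268  # 15-31 May
)
total_users <- c(
  35, 196, 171, 174, 230, 197, 193, 177, 156, 181, 167, 216, 205, 217, 215, 178,       # 15-30 April
  163, 186, 230, 201, 183, 196, 268, 201, 184, 219, 197, 190, 171, 153, 162, 181, 180  # 15-31 May
)

# Mean number of preference updates and of distinct users per day.
daily_means <- c(selections = mean(total_selections), users = mean(total_users))
round(daily_means)
```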
p <- desktop_pref_updates %>%
#filter(final_state %in% c('vector1', 'vector2')) %>%
group_by(date, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user)) %>%
ggplot(aes(x = date, y = total_selections, color = final_state)) +
geom_line(size = 1.5) +
scale_x_date() +
labs(title = "Daily number of preference update events across all wikis",
y = "Number of selections") +
theme_bw() +
scale_color_brewer(name="Final State", palette="Set2") +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
legend.position = "bottom")
p
`summarise()` regrouping output by 'date' (override with `.groups` argument)
## Vector Events
vector_events <- desktop_pref_updates %>%
filter(final_state == 'vector') %>%
group_by(date, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
vector_events
`summarise()` regrouping output by 'date' (override with `.groups` argument)
date | final_state | total_selections | total_users |
---|---|---|---|
<date> | <chr> | <int> | <int> |
2021-04-16 | vector | 13 | 5 |
2021-04-17 | vector | 17 | 12 |
2021-04-18 | vector | 7 | 3 |
2021-04-19 | vector | 11 | 6 |
2021-04-20 | vector | 1 | 1 |
2021-04-21 | vector | 1 | 1 |
2021-04-22 | vector | 3 | 3 |
2021-04-23 | vector | 11 | 7 |
2021-04-24 | vector | 4 | 4 |
2021-04-25 | vector | 9 | 6 |
2021-04-26 | vector | 37 | 15 |
2021-04-27 | vector | 29 | 15 |
2021-04-28 | vector | 6 | 5 |
2021-04-29 | vector | 15 | 8 |
2021-04-30 | vector | 10 | 7 |
2021-05-15 | vector | 3 | 2 |
2021-05-16 | vector | 4 | 2 |
2021-05-17 | vector | 11 | 6 |
Confirmed we stop recording vector events following fix: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/690789/ deployed on May 17th.
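We stop seeing fluctuations from May 1st to May 15th. This was due to changes in the config for the AB test; note that the `day >= 15` condition in the query also excludes May 1st through May 14th from the results.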
p <- desktop_pref_updates %>%
#filter(final_state %in% c('vector1', 'vector2')) %>%
group_by(date, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user)) %>%
ggplot(aes(x = date, y = total_users, color = final_state)) +
geom_line(size = 1.5) +
labs(title = "Daily number of preference update users by final selected state \n across all wikis",
y = "Daily distinct users")+
theme_bw() +
scale_color_brewer(name="Final State", palette="Set2") +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
legend.position = "bottom")
p
`summarise()` regrouping output by 'date' (override with `.groups` argument)
A review of the daily number of final-state preference updates by user and wiki appears as expected. The majority of preference updates each day since we began recording are for the vector1 and vector2 desktop skins. Daily trends are consistent: there are no significant drops or spikes in the data that would indicate anomalies.
desktop_pref_updates_events_bydate_target <- desktop_pref_updates %>%
filter(wiki %in% c('fr.wiktionary.org', 'he.wikipedia.org', 'pt.wikiversity.org', 'fr.wikipedia.org',
'eu.wikipedia.org', 'fa.wikipedia.org', 'pt.wikipedia.org', 'ko.wikipedia.org', 'tr.wikipedia.org',
'sr.wikipedia.org', 'bn.wikipedia.org', 'de.wikivoyage.org', 'vec.wikipedia.org')) %>%
filter(final_state %in% c('vector1', 'vector2'),
initial_state %in% c('vector1', 'vector2')) %>%
group_by(date, wiki, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
`summarise()` regrouping output by 'date', 'wiki' (override with `.groups` argument)
p <- desktop_pref_updates_events_bydate_target %>%
ggplot(aes(x = date, y = total_selections, color = final_state)) +
geom_line(size = 1.5) +
facet_wrap (~ wiki) +
labs(title = "Daily number of preference updates by vector version \n across all early adopter wikis",
y = "Number of preference updates") +
scale_color_brewer(name="Final State", palette="Set1") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
legend.position = "bottom")
p
p <- desktop_pref_updates_events_bydate_target %>%
ggplot(aes(x = date, y = total_users, color = final_state)) +
geom_line(size = 1.5) +
facet_wrap (~ wiki) +
labs(title = "Daily number of preference update users by vector version \n across all early adopter wikis",
y = "Daily number of distinct users") +
scale_color_brewer(name="Final State", palette="Set1") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
legend.position = "bottom")
p
More users opt out than opt in, as the new desktop skin (vector2) was presented as the default to all logged-in users on all target wikis.
There have only been 1 to 8 Vector opt-ins or opt-outs a day on each target wiki since April 15th.
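The same list of 13 target wikis is repeated in several filters in this notebook. One way to keep those filters in sync is to define the list once and reuse it. The sketch below is only a suggestion; the names `target_wikis` and `is_target_wiki` are hypothetical and do not appear in the original code.

```r
# The 13 target (early adopter) wikis, copied from the filters used in this notebook.
target_wikis <- c(
  "fr.wiktionary.org", "he.wikipedia.org", "pt.wikiversity.org", "fr.wikipedia.org",
  "eu.wikipedia.org", "fa.wikipedia.org", "pt.wikipedia.org", "ko.wikipedia.org",
  "tr.wikipedia.org", "sr.wikipedia.org", "bn.wikipedia.org", "de.wikivoyage.org",
  "vec.wikipedia.org"
)

# Hypothetical helper: TRUE where the wiki domain is one of the target wikis.
is_target_wiki <- function(wiki) wiki %in% target_wikis

is_target_wiki(c("fr.wikipedia.org", "en.wikipedia.org"))
```

In a dplyr pipeline this could be used as `filter(is_target_wiki(wiki))` in place of the repeated `wiki %in% c(...)` condition.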
desktop_pref_updates_bystate <- desktop_pref_updates %>%
filter(date > '2021-05-17') %>% # following deployment fix
group_by(initial_state, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
desktop_pref_updates_bystate
`summarise()` regrouping output by 'initial_state' (override with `.groups` argument)
initial_state | final_state | total_selections | total_users |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
minerva | vector1 | 144 | 133 |
minerva | vector2 | 152 | 133 |
modern | vector1 | 102 | 98 |
modern | vector2 | 73 | 67 |
monobook | vector1 | 119 | 109 |
monobook | vector2 | 114 | 106 |
timeless | vector1 | 180 | 170 |
timeless | vector2 | 123 | 119 |
vector1 | minerva | 437 | 418 |
vector1 | modern | 345 | 338 |
vector1 | monobook | 248 | 235 |
vector1 | timeless | 372 | 364 |
vector1 | vector2 | 762 | 649 |
vector1 | wikimediaapiportal | 1 | 1 |
vector2 | minerva | 168 | 153 |
vector2 | modern | 98 | 91 |
vector2 | monobook | 98 | 92 |
vector2 | timeless | 97 | 89 |
vector2 | vector1 | 678 | 581 |
p <- desktop_pref_updates_bystate %>%
ggplot(aes(x= initial_state, y = total_users, group = initial_state)) +
geom_col(fill = "dark blue") +
geom_text(aes(label = paste(initial_state)), color = "black", vjust=-1, size = 5) +
facet_wrap(~ final_state) +
labs(title = "Number of preference update users by final selected state \n across all wikis",
y = "Number of distinct users") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
legend.position = "bottom")
p
Potential Issues: We're recording a couple instances of someone going from vector 2 to vector (instead of vector 1). This was due to a bug that was fixed on 17 May.
After a recheck of the data, I noticed an instance where a user switched from initial_state = 'vector1' to final_state = 'wikimediaapiportal'.
head(desktop_pref_updates)
| | date | user | initial_state | final_state | edit_count | wiki | num_selections |
|---|---|---|---|---|---|---|---|
| | <date> | <chr> | <chr> | <chr> | <chr> | <chr> | <int> |
1 | 2021-04-15 | 00cfaafe818b6da3e5487a7f6326afd8f69e952c5227cfc8d3aee50368917d1b21b9fc073965a077c727bcfeede77d12965012dad42f5db50723194432d13271 | vector1 | modern | 0 edits | de.wikipedia.org | 1 |
2 | 2021-04-15 | 23801399a076d16cc990b560e8a036800b99c014028390294b7a1d1ab7564f1e69ca1b081e2b5b880c034afe4946a6951ec9a5e1a38104e498dc031bc2decce3 | timeless | vector1 | 1000+ edits | fr.wikipedia.org | 1 |
3 | 2021-04-15 | 23801399a076d16cc990b560e8a036800b99c014028390294b7a1d1ab7564f1e69ca1b081e2b5b880c034afe4946a6951ec9a5e1a38104e498dc031bc2decce3 | vector1 | vector2 | 1000+ edits | fr.wikipedia.org | 2 |
4 | 2021-04-15 | 23801399a076d16cc990b560e8a036800b99c014028390294b7a1d1ab7564f1e69ca1b081e2b5b880c034afe4946a6951ec9a5e1a38104e498dc031bc2decce3 | vector2 | timeless | 1000+ edits | fr.wikipedia.org | 1 |
5 | 2021-04-15 | 23801399a076d16cc990b560e8a036800b99c014028390294b7a1d1ab7564f1e69ca1b081e2b5b880c034afe4946a6951ec9a5e1a38104e498dc031bc2decce3 | vector2 | vector1 | 1000+ edits | fr.wikipedia.org | 1 |
6 | 2021-04-15 | 257dfc3b24425759a3f9a32592dcdd559d9a4587bd3018b05f9214896d14aed37c6524fcdac17ff3d2e18bdd83f38aaa126e9363941f22bbb79e4568c4d84980 | vector2 | timeless | 0 edits | tr.wikipedia.org | 1 |
## Look into wikimediaapiportal switch
wikimediaapiportal_events <- desktop_pref_updates %>%
filter(final_state == 'wikimediaapiportal')
wikimediaapiportal_events
date | user | initial_state | final_state | edit_count | wiki | num_selections |
---|---|---|---|---|---|---|
<date> | <chr> | <chr> | <chr> | <chr> | <chr> | <int> |
2021-05-19 | 44184be8b844f70ae57a7c7b7e8440d3dda86d9772842fc71d09acac1d545a60f3c3cc89bea6547a2cb3e81a1f8a152011633db36a723d6e6b213e224dd6e2cf | vector1 | wikimediaapiportal | 0 edits | api.wikimedia.org | 1 |
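Unexpected labels such as `vector` and `wikimediaapiportal` can be surfaced with a simple check against the set of expected skins. The following is a sketch in base R on toy rows. The expected set is an assumption based on the states that appear in the transition tables of this notebook, and `flag_unexpected_states` is a hypothetical helper name.

```r
# Skin labels expected in the data (assumption based on the tables in this notebook).
expected_states <- c("minerva", "modern", "monobook", "timeless", "vector1", "vector2")

# Hypothetical helper: return rows where either state is outside the expected set.
flag_unexpected_states <- function(df, expected = expected_states) {
  ok <- df$initial_state %in% expected & df$final_state %in% expected
  df[!ok, , drop = FALSE]
}

# Toy rows mirroring transitions seen in this notebook.
example <- data.frame(
  initial_state = c("vector1", "vector2", "vector1", "timeless"),
  final_state   = c("vector2", "vector", "wikimediaapiportal", "vector1"),
  stringsAsFactors = FALSE
)
flagged <- flag_unexpected_states(example)
flagged
```

Applied to `desktop_pref_updates`, the same function would return every event with an unrecognized initial or final state, which can then be grouped by wiki and date.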
desktop_pref_updates_bystate_target <- desktop_pref_updates %>%
filter(wiki %in% c('fr.wiktionary.org', 'he.wikipedia.org', 'pt.wikiversity.org', 'fr.wikipedia.org',
'eu.wikipedia.org', 'fa.wikipedia.org', 'pt.wikipedia.org', 'ko.wikipedia.org', 'tr.wikipedia.org',
'sr.wikipedia.org', 'bn.wikipedia.org', 'de.wikivoyage.org', 'vec.wikipedia.org'),
date > '2021-05-17' ) %>%
group_by(initial_state, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
desktop_pref_updates_bystate_target
`summarise()` regrouping output by 'initial_state' (override with `.groups` argument)
initial_state | final_state | total_selections | total_users |
---|---|---|---|
<chr> | <chr> | <int> | <int> |
minerva | vector1 | 8 | 8 |
minerva | vector2 | 43 | 41 |
modern | vector1 | 6 | 6 |
modern | vector2 | 22 | 22 |
monobook | vector1 | 9 | 9 |
monobook | vector2 | 29 | 28 |
timeless | vector1 | 12 | 12 |
timeless | vector2 | 30 | 30 |
vector1 | minerva | 10 | 10 |
vector1 | modern | 9 | 9 |
vector1 | monobook | 10 | 10 |
vector1 | timeless | 14 | 13 |
vector1 | vector2 | 83 | 75 |
vector2 | minerva | 80 | 77 |
vector2 | modern | 58 | 57 |
vector2 | monobook | 42 | 42 |
vector2 | timeless | 49 | 46 |
vector2 | vector1 | 190 | 177 |
p <- desktop_pref_updates_bystate_target %>%
ggplot(aes(x= initial_state, y = total_users, group = initial_state)) +
geom_col(fill = "dark blue") +
geom_text(aes(label = paste(initial_state)), color = "black", vjust=-1, size = 5) +
facet_wrap(~ final_state) +
labs(title = "Number of preference update users by final selected state \n across all target wikis",
y = "Number of distinct users",
subtitle = "Each bar represents the initial state and each chart the final state") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
text = element_text(size=14),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
legend.position = "bottom")
p
The number of updates to each available desktop skin appears to fit with current config states. For the target wikis, the most common update was from vector2 to vector1 (177 users), which is expected since vector2 was deployed as opt-out to all of the pilot wikis. Users would only need to explicitly update their preference from vector1 to vector2 if they opted out and then decided to opt back in again.
We see more users update from vector1 to vector2 when looking across all wikis, since vector2 is available as an opt-in preference across all wikis.
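To make the comparison concrete, the share of each direction among users who switched between the two Vector versions can be worked out from the two transition tables above (both restricted to dates after 17 May). This is a sketch with the user counts copied by hand. It is not the overall opt-out rate, since the number of logged-in users who never changed their preference is not in this dataset, and a user who switched in both directions is counted in both rows.

```r
# Distinct users per transition, copied from the transition tables above.
switches <- data.frame(
  scope      = c("target wikis", "target wikis", "all wikis", "all wikis"),
  transition = c("vector1 -> vector2", "vector2 -> vector1",
                 "vector1 -> vector2", "vector2 -> vector1"),
  users      = c(75, 177, 649, 581),
  stringsAsFactors = FALSE
)

# Percentage of each direction within its scope (target wikis vs. all wikis).
scope_totals <- ave(switches$users, switches$scope, FUN = sum)
switches$share_pct <- round(100 * switches$users / scope_totals, 1)
switches
```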
vec2_to_vector <- desktop_pref_updates %>%
filter(initial_state == 'vector2',
final_state == 'vector') %>%
group_by( wiki) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
vec2_to_vector
`summarise()` ungrouping output (override with `.groups` argument)
wiki | total_selections | total_users |
---|---|---|
<chr> | <int> | <int> |
bn.wikipedia.org | 1 | 1 |
bn.wikisource.org | 1 | 1 |
bn.wikivoyage.org | 1 | 1 |
bs.wikipedia.org | 4 | 2 |
commons.wikimedia.org | 4 | 3 |
de.wikipedia.org | 3 | 1 |
en.wikinews.org | 2 | 1 |
en.wikipedia.org | 35 | 14 |
en.wikisource.org | 4 | 1 |
en.wikivoyage.org | 2 | 1 |
en.wiktionary.org | 4 | 2 |
fr.wikipedia.org | 9 | 2 |
it.wikipedia.org | 5 | 2 |
ja.wikinews.org | 1 | 1 |
ja.wikipedia.org | 9 | 1 |
meta.wikimedia.org | 3 | 3 |
pt.wikipedia.org | 6 | 2 |
ru.wikipedia.org | 3 | 1 |
simple.wikipedia.org | 6 | 2 |
species.wikimedia.org | 1 | 1 |
test.wikipedia.org | 3 | 3 |
www.wikidata.org | 4 | 2 |
zh.wikipedia.org | 5 | 4 |
zh.wiktionary.org | 1 | 1 |
The vector2 to vector switch occurs across a number of different wiki projects, including test and non-test wikis, so there doesn't seem to be a correlation there.
vec2_to_vector_daily <- desktop_pref_updates %>%
filter(initial_state == 'vector2',
final_state == 'vector') %>%
group_by(date, wiki) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
vec2_to_vector_daily
`summarise()` regrouping output by 'date' (override with `.groups` argument)
date | wiki | total_selections | total_users |
---|---|---|---|
<date> | <chr> | <int> | <int> |
2021-04-16 | commons.wikimedia.org | 1 | 1 |
2021-04-16 | en.wikipedia.org | 3 | 1 |
2021-04-16 | en.wiktionary.org | 1 | 1 |
2021-04-16 | fr.wikipedia.org | 7 | 1 |
2021-04-16 | zh.wikipedia.org | 1 | 1 |
2021-04-17 | commons.wikimedia.org | 1 | 1 |
2021-04-17 | en.wikinews.org | 2 | 1 |
2021-04-17 | en.wikipedia.org | 8 | 4 |
2021-04-17 | en.wikisource.org | 1 | 1 |
2021-04-17 | ja.wikinews.org | 1 | 1 |
2021-04-17 | meta.wikimedia.org | 1 | 1 |
2021-04-17 | ru.wikipedia.org | 1 | 1 |
2021-04-17 | simple.wikipedia.org | 1 | 1 |
2021-04-17 | www.wikidata.org | 1 | 1 |
2021-04-18 | en.wikipedia.org | 2 | 1 |
2021-04-18 | it.wikipedia.org | 4 | 1 |
2021-04-18 | pt.wikipedia.org | 1 | 1 |
2021-04-19 | de.wikipedia.org | 3 | 1 |
2021-04-19 | en.wikipedia.org | 1 | 1 |
2021-04-19 | en.wikisource.org | 3 | 1 |
2021-04-19 | www.wikidata.org | 3 | 2 |
2021-04-19 | zh.wikipedia.org | 1 | 1 |
2021-04-20 | ru.wikipedia.org | 1 | 1 |
2021-04-21 | zh.wikipedia.org | 1 | 1 |
2021-04-22 | en.wikipedia.org | 1 | 1 |
2021-04-22 | pt.wikipedia.org | 1 | 1 |
2021-04-22 | zh.wikipedia.org | 1 | 1 |
2021-04-23 | en.wikipedia.org | 8 | 4 |
2021-04-23 | meta.wikimedia.org | 1 | 1 |
2021-04-23 | ru.wikipedia.org | 1 | 1 |
2021-04-23 | species.wikimedia.org | 1 | 1 |
2021-04-24 | en.wikipedia.org | 1 | 1 |
2021-04-24 | en.wiktionary.org | 1 | 1 |
2021-04-24 | test.wikipedia.org | 1 | 1 |
2021-04-24 | zh.wiktionary.org | 1 | 1 |
2021-04-25 | bn.wikipedia.org | 1 | 1 |
2021-04-25 | bn.wikisource.org | 1 | 1 |
2021-04-25 | commons.wikimedia.org | 1 | 1 |
2021-04-25 | it.wikipedia.org | 1 | 1 |
2021-04-25 | pt.wikipedia.org | 4 | 1 |
2021-04-25 | zh.wikipedia.org | 1 | 1 |
2021-04-26 | bn.wikivoyage.org | 1 | 1 |
2021-04-26 | bs.wikipedia.org | 4 | 2 |
2021-04-26 | commons.wikimedia.org | 1 | 1 |
2021-04-26 | en.wikipedia.org | 10 | 5 |
2021-04-26 | en.wikivoyage.org | 2 | 1 |
2021-04-26 | en.wiktionary.org | 2 | 1 |
2021-04-26 | fr.wikipedia.org | 2 | 1 |
2021-04-26 | ja.wikipedia.org | 9 | 1 |
2021-04-26 | simple.wikipedia.org | 5 | 1 |
2021-04-26 | test.wikipedia.org | 1 | 1 |
2021-04-27 | en.wikipedia.org | 1 | 1 |
2021-04-27 | meta.wikimedia.org | 1 | 1 |
2021-04-27 | test.wikipedia.org | 1 | 1 |
No sharp increases or declines by date or wiki. The largest share of these switches from 'vector2' to 'vector' occurs on en.wikipedia (35 of 117 selections, by 14 users), followed by zh.wikipedia in terms of distinct users (4 users).
# set factor levels
desktop_pref_updates$edit_count <- factor(desktop_pref_updates$edit_count, levels = c("0 edits", "1-4 edits",
"5-99 edits", "100-999 edits",
"1000+ edits"))
desktop_pref_updates_usercount <- desktop_pref_updates %>%
group_by(edit_count) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
desktop_pref_updates_usercount
`summarise()` ungrouping output (override with `.groups` argument)
edit_count | total_selections | total_users |
---|---|---|
<fct> | <int> | <int> |
0 edits | 1425 | 1012 |
1-4 edits | 488 | 324 |
5-99 edits | 715 | 421 |
100-999 edits | 318 | 158 |
1000+ edits | 437 | 192 |
The data by edit count appears as expected. We see the most preference switches by users with 0 edits. Almost half of all users who updated a preference have 0 edits (48.03%). Among editors (those who have made at least 1 edit), the majority of users who updated a desktop preference (68%) have fewer than 100 edits.
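The two percentages can be reproduced from the `total_users` column of the table above. This is a small worked sketch with the counts copied by hand. It treats the per-bucket distinct-user counts as additive, which assumes that no user appears in more than one edit-count bucket.

```r
# Distinct users per edit-count bucket, copied from the table above.
edit_counts <- data.frame(
  edit_count  = c("0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits"),
  total_users = c(1012, 324, 421, 158, 192),
  stringsAsFactors = FALSE
)

# Share of all users with a preference update who have 0 edits.
zero_edit_share <- edit_counts$total_users[edit_counts$edit_count == "0 edits"] /
  sum(edit_counts$total_users)

# Among editors (at least 1 edit), the share with fewer than 100 edits.
editors <- edit_counts[edit_counts$edit_count != "0 edits", ]
under_100_share <- sum(editors$total_users[editors$edit_count %in% c("1-4 edits", "5-99 edits")]) /
  sum(editors$total_users)

shares_pct <- round(100 * c(zero_edit = zero_edit_share, under_100 = under_100_share), 2)
shares_pct
```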
desktop_pref_updates_usercount_target <- desktop_pref_updates %>%
filter(wiki %in% c('fr.wiktionary.org', 'he.wikipedia.org', 'pt.wikiversity.org', 'fr.wikipedia.org',
'eu.wikipedia.org', 'fa.wikipedia.org', 'pt.wikipedia.org', 'ko.wikipedia.org', 'tr.wikipedia.org',
'sr.wikipedia.org', 'bn.wikipedia.org', 'de.wikivoyage.org', 'vec.wikipedia.org')) %>%
filter(final_state %in% c('vector1', 'vector2'),
initial_state %in% c('vector1', 'vector2')) %>%
group_by(edit_count, final_state) %>%
summarise(total_selections = sum(num_selections),
total_users = n_distinct(user))
desktop_pref_updates_usercount_target
`summarise()` regrouping output by 'edit_count' (override with `.groups` argument)
edit_count | final_state | total_selections | total_users |
---|---|---|---|
<fct> | <chr> | <int> | <int> |
0 edits | vector1 | 71 | 68 |
0 edits | vector2 | 9 | 8 |
1-4 edits | vector1 | 29 | 29 |
1-4 edits | vector2 | 2 | 2 |
5-99 edits | vector1 | 46 | 42 |
5-99 edits | vector2 | 13 | 11 |
100-999 edits | vector1 | 25 | 23 |
100-999 edits | vector2 | 10 | 10 |
1000+ edits | vector1 | 34 | 30 |
1000+ edits | vector2 | 23 | 18 |