New Discussion Tool Adoption Metrics

Task

Last Updated: 17 August 2021

Code Repository

Purpose

The New Discussion Tool was deployed as an opt-in beta feature to all logged-in users to improve contributors' workflows for starting new discussion threads on talk pages, across Wikipedia's 16 talk namespaces. See the project page for more details.

Deployment dates:

  • 18 February 2021: Arabic, Czech and Hungarian Wikipedias.
  • 10 March 2021: All Wikipedias except the English, German, and Russian Wikipedias.
  • 16 March 2021: English and Russian Wikipedias and all Wikimedia Sister Projects.

The purpose of this analysis is to understand how people are engaging with the New Discussion Tool beta feature to help us determine whether the New Discussion Tool is ready to be made available to all people by default at some sub-set of wikis. This analysis is intended to help us answer these questions:

  • Are people finding the tool to be disruptive?
  • Are people finding the tool behaves in the ways they expect?
  • Who has been using the new Discussion Tool and how much have they been using it?

Data

Data for this analysis comes from a combination of the following sources:

For this analysis, we reviewed events logged from the data of deployment as a beta feature (18 February 2021) through the end of July (31 July 2021). For each metric, we calculated metrics for overall (across all Wikimedia projects), by experience level (users cumulative edit count), and by the specific Wikipedias we are considering opt-out deployments (Arabic and Czech Wikipedia).

In [308]:
library(IRdisplay)

display_html(
'<script>  
code_show=true; 
function code_toggle() {
  if (code_show){
    $(\'div.input\').hide();
  } else {
    $(\'div.input\').show();
  }
  code_show = !code_show
}  
$( document ).ready(code_toggle);
</script>
  <form action="javascript:code_toggle()">
    <input type="submit" value="Click here to toggle on/off the raw code.">
 </form>'
)
In [6]:
# load required packages
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(tidyverse); library(glue); library(lubridate); library(scales)
})

Disruption Metrics

What percent of contributors that explicitly disabled the discussion tools beta feature after making at least one new discussion edit?

Purpose: Do people using the New Discussion Tool find it disruptive?

We reviewed how many new discussion tool users explicitly1turned off the feature after making at least one edit.

Data Desciption and Assumptions:

  • User Preference changes comes from the PrefUpdate eventlogging data.
  • We can only review last 90 days of data due to sanitization of prefupdate data, which at the time of this analysis was 10 May 20201 through 31 July 2021. We do not have data on the number of users that opted out prior to that date. To supplement this data and account for users that opted out prior to this 90 day period, we reviewed current user preference settings recorded in the mediawiki_user_history table. Please see "New_discussion_tool_opt_out_analysis.ipynb" located in the code repository for details of this analysis.
  • Excludes users that opted in and out multiple times.
  • There is user preference (event.property = 'discussiontools-betaenable') that allows a user to explicitly turn on or off all discussion tool beta features. This includes both the reply tool and new discussion tool - these features are not turned off individually.

  1. "Explicitly" turned on indicates users did not have the Automatically enable all new beta features preference checked. Note explicilty turned off could include users that were auto enrolled and then turned off the feature.

In [54]:
query <- "
--find users that opted out of the discussiontool beta feature 
WITH opt_out_users AS (
SELECT
    event.userid as opt_out_user,
    wiki as opt_out_wiki,
    min(event.saveTimestamp) as opt_out_time,
    sum(cast(event.value = '\"0\"' as int)) as opt_outs
FROM 
    event.prefupdate
WHERE
    event.property = 'discussiontools-betaenable' AND
    event.value = '\"0\"' AND
    CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '2021-05-18' AND
    CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) <= '2021-07-31'
GROUP BY 
    event.userid, 
    wiki
),

-- find users that made at least one edit with the new discussion tool
new_topic_users AS (
SELECT
    event_user_id as new_topic_user,
    wiki_db as new_topic_wiki,
    min(mh.event_timestamp) as first_post,
    CASE
        WHEN min(event_user_revision_count) < 100 THEN 'under 100'
        WHEN (min(event_user_revision_count) >=100 AND min(event_user_revision_count <= 500)) THEN '100-500'
        ELSE 'over 500'
        END AS edit_count_group,
    min(event_user_revision_count)AS edit_count
FROM wmf.mediawiki_history AS mh
WHERE 
    ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic') 
    AND snapshot = '2021-07' 
-- date of first deployment
    AND event_timestamp >= '2021-02-18'  
    AND event_timestamp <= '2021-07-31'  
-- only on desktop
    AND NOT array_contains(revision_tags, 'iOS')
    AND NOT array_contains(revision_tags, 'Android')
    AND NOT array_contains(revision_tags, 'Mobile Web')
     -- find all edits on talk pages 
    AND page_namespace_historical % 2 = 1
    AND event_entity = 'revision' AND 
    event_type = 'create'
    AND event_user_is_anonymous = FALSE
GROUP BY
    event_user_id,
    wiki_db
)

-- Main Query --
SELECT
    new_topic_wiki AS wiki,
    edit_count AS edit_count,
    edit_count_group AS edit_count_group,
--find opt out users that opted out following new discussion tool post
    SUM(CAST(opt_out_user IS NOT NULL AND first_post < opt_out_time AS INT)) AS opt_out_users,
    SUM(CAST(new_topic_user IS NOT NULL AS int)) AS new_topic_contributor
    
FROM (
SELECT
    new_topic_users.first_post,
    new_topic_users.new_topic_user,
    opt_out_users.opt_out_time,
    new_topic_users.new_topic_wiki,
    opt_out_users.opt_out_user,
    new_topic_users.edit_count,
    new_topic_users.edit_count_group
FROM new_topic_users
LEFT JOIN opt_out_users ON 
    new_topic_users.new_topic_user = opt_out_users.opt_out_user AND
    new_topic_users.new_topic_wiki = opt_out_users.opt_out_wiki  
WHERE 
    opt_out_users.opt_outs IS NULL OR
    opt_out_users.opt_outs = 1 
) sessions
GROUP BY
    sessions.new_topic_wiki,
    sessions.edit_count,
    sessions.edit_count_group
"
In [55]:
opt_out_contributors <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit

In [50]:
write_csv(opt_out_contributors, "Data/opt_out_contributors.csv")

Overall

In [56]:
opt_out_contributors_overall <- opt_out_contributors %>%
    summarise(opt_out_users = sum(opt_out_users),
             new_topic_contributors = sum(new_topic_contributor),
             pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
              )

opt_out_contributors_overall
A data.frame: 1 × 3
opt_out_usersnew_topic_contributorspct_opt_out
<int><int><chr>
42751338.32%

By Experience Level

In [57]:
opt_out_contributors_byexp <- opt_out_contributors %>%
    group_by(edit_count_group)  %>%
    summarise(opt_out_users = sum(opt_out_users),
             new_topic_contributors = sum(new_topic_contributor),
             pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
              )

opt_out_contributors_byexp
`summarise()` ungrouping output (override with `.groups` argument)

A tibble: 3 × 4
edit_count_groupopt_out_usersnew_topic_contributorspct_opt_out
<chr><int><int><chr>
100-500 43 6816.31%
over 500 23734276.92%
under 100147102514.34%

Junior Contributor Opt-Out Investigation

Opt-out rates for all three groups are fairly low (below 15%) with Junior Contributors (editors with under 100 edits) having the highest opt-out rate.

Since the higher opt-out rate for junior contributors is somewhat unexpected, we further broke down the under 100 edit count group into smaller edit count groups (e.g 0-10 edits, 10-20 edits, 30-40 edits, etc) and reviewed the wikis with this highest Junior Contributors Opt-out rate. This was done to identify if the higher opt-out rate for Junior Contributors was due to a specific edit count group or wiki.

By Junior Contributor Experience Level

In [34]:
# Divide edit counts into groups

b <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
names <- c( '0-10 edits', '11-20 edits', '21-30 edits', '31-40 edits', 
          '41-50 edits', '51-60 edits', '61-70 edits', '71-80 edits', '81-90 edits', '91-100 edits')
In [37]:
jc_opt_out_contributors_byexp <- opt_out_contributors %>%
    filter(edit_count <= 100) %>% # only review Junior Contributors
    mutate(edit_count = cut(edit_count, breaks = b, labels = names)) %>%
    group_by(edit_count)  %>%
    summarise(opt_out_users = sum(opt_out_users),
             new_topic_contributors = sum(new_topic_contributor),
             pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%")
              )

jc_opt_out_contributors_byexp
`summarise()` ungrouping output (override with `.groups` argument)

A tibble: 10 × 4
edit_countopt_out_usersnew_topic_contributorspct_opt_out
<fct><int><int><chr>
0-10 edits 5332016.56%
11-20 edits 2919015.26%
21-30 edits 1410713.08%
31-40 edits 10 9710.31%
41-50 edits 16 8219.51%
51-60 edits 10 7313.7%
61-70 edits 7 779.09%
71-80 edits 4 705.71%
81-90 edits 12 6219.35%
91-100 edits 3 446.82%

Most all the Junior Contributor edit groups have around the same opt-out rate identified for all contributors with under 100 edits (~15%). There are slightly higher opt-out rates for contributors with under 50 edits but there is does not appear to be a specific group that contributed to the higher opt-out rate.

Wikis with the highest Junior Contributor Opt-Out Rate

In [59]:
jc_opt_out_contributors_bywiki <- opt_out_contributors %>%
    filter(edit_count_group == 'under 100') %>% # only review Junior Contributors
    group_by(wiki)  %>%
    summarise(opt_out_users = sum(opt_out_users),
             new_topic_contributors = sum(new_topic_contributor),
             pct_opt_out = round(opt_out_users/new_topic_contributors * 100, 2)
              ) %>%
    filter(new_topic_contributors > 1) %>% # review wikis with more than 1 new topic contributor
    arrange(desc(pct_opt_out))

head(jc_opt_out_contributors_bywiki, 20)
`summarise()` ungrouping output (override with `.groups` argument)

A tibble: 20 × 4
wikiopt_out_usersnew_topic_contributorspct_opt_out
<chr><int><int><dbl>
bnwiki 2 450.00
zhwikibooks 1 250.00
arwiki 5 1145.45
trwiki 7 1741.18
simplewiki 3 933.33
svwiki 1 333.33
kowiki 3 1030.00
mediawikiwiki 3 1030.00
mswiki 1 425.00
thwiki 1 425.00
ruwiki 1 520.00
enwiki 5931618.67
fawiki 7 3818.42
commonswiki 4 2218.18
viwiki 2 1118.18
jawiki 5 2817.86
eswiki 10 7014.29
zhwiki 4 3611.11
ptwiki 3 2810.71
itwiki 6 62 9.68

A review by wiki appears also does not reveal any surprising trends . The higher opt-out rates are for wikis with only a few new discussion tool users; as result these rates do not accurately represent the population.

The rates for larger wikis are around 15 to 18%, similar to the overall opt-out rate identifed for Junior Contributors.

Since we are only able to access user-specific opt-out data for the last 90 days, this higher opt-out rate for Junior Contributors is likely because Senior Contributors are more likely to have already accessed and decided to opt out of the tool prior to this 90 days.

Arabic and Czech Wikipedias

In [58]:
opt_out_contributors_byexp <- opt_out_contributors %>%
    filter(wiki %in% c('arwiki', 'cswiki')) %>%
    group_by(wiki)  %>%
    summarise(opt_out_users = sum(opt_out_users),
             new_topic_contributors = sum(new_topic_contributor),
             pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), "%"),.groups = 'drop'
              )

opt_out_contributors_byexp
A tibble: 2 × 4
wikiopt_out_usersnew_topic_contributorspct_opt_out
<chr><int><int><chr>
arwiki95416.67%
cswiki0270%

Status of Current Discussion Tool Preference Settings for all New Discussion Tool Contributors

Data: Based on data recorded in the mediawiki user_properties table

While we are unable to access user-specific preference change events that occured prior to 90 days ago (18 May 2021) in PrefUpdate, I reviewed the user properties database to determine the numbers of new discussion tool contributors that currently have the discussiontools-betaenable set to disabled.

Note: This data reflects just the current nondefault status of user preference and does not provide any details on if the user enabled and disabled the feature multiple times or when they disabled it in relation to their edit. Also, there are contributors that have used the new discussion tool but don't have a preference set in the user properties table, indicated as "no local preference recorded" in the results below. Possible reasons for this include: (1) the user disabled the setting by selecting 'restore all default preferences' in their user preferences or (2) the user enabled discussion tools in their global preferences but not in their local preferences.

Please see summary of results below and "New_discussion_tool_opt_out_analysis.ipynb" located in the code repository for further details of current user discussionvtool preferences using the mediawiki user_properties table.

Overall

Current Discussion Tool Preference Status Percent of New Discussion Tool Contributors
no local preference recorded 27.14%
explicitly disabled 6.05%
explicitly enabled 66.81%

By Edit Count Group

edit_count_group Current Discussion Tool Preference Status Percent of New Discussion Tool Contributors
under 100 no local preference recorded 28.32%
explicitly disabled 4.05%
explicitly enabled 67.63%
100-500 no local preference recorded 28.87%
explicitly disabled 3.1%
explicitly enabled 68.03%
over 500 no local preference recorded 26.5%
explicitly disabled 7.2%
explicitly enabled 66.3%

Arabic and Czech Wikipedias

Wiki Current Discussion Tool Preference Status Percent of New Discussion Tool Contributors
arwiki explicitly disabled 4.84%
explicitly enabled 95.16%
cswiki explicitly disabled 3.33%
explicitly enabled 96.67%

Summary

From 18 May 2021 through 31 July 2021, 8.32% of contributors that saved at least one new discussion tool edit explicitly opted out of the new discussion tool, indicating that most users of the tool do not find it disruptive. Junior contributors (users with under 100 edits) had the highest opt out rate (15.04%).

Furher investigation indicates that the higher opt-out rate identified for Junior Contributors is likely due to the reviewed timeframe used for the opt-out analysis. We only retain user-specific data on preference updates for 90 days in PrefUpdate due to privacy concerns. As a result, the opt-out analysis only reflects preference changes between 18 May 2021 through 31 July 2021. It's more likely that Senior Contributors have already accessed and decided to opt-out of the tool prior to this 90 days. A review of data logged in the user properties table shows a slightly lower opt-out rate for Junior Contributors compared to Senior Contributors and still reflects an overall low opt-out rate across all three edit count groups, indicating no significant sign of disruption.

No new discussion tool contributors have opted out of Czech Wikipedia. There was an 18.03% opt out rate (based on PrefUpdate data) for Arabic Wikipedia. However, each of these wikis had a limited number of contributors that made a new discussion tool edit (61 new discussion tool contributors on Arabic Wikipedia and only 29 on Czech Wikipedia) so this data may not be reflective of the population.

What percent of all edits made with the New Discussion Tool are reverted within 48 hours of being published?

Purpose: Do people NOT using the New Discussion Tool find it disruptive? How does the level of disruption introduced by people using the New Discussion Tool compare to the level of disruption introduced by people using the current experience?

For this analysis, we reviewed data recorded in mediawiki_history to identify the percent comments posted by the reply tool (identified by the revision tag: discussiontools-newtopic) on talk pages that are reverted within 48 hours 1.

We compared the revert rate for comments published using the new discussion tool to the revert rate for comments made using full page editing (the current editing experience) during the same timeframe. Note: In this analysis, page edits can include any edit made on a talk page not using a discussion tool. This can include both edits to start a new topic and edits to existing comments.


  1. 48 hours is a common cutoff, as research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours. Source: Research: Revert. Mediawiki. https://meta.wikimedia.org/wiki/Research:Revert.

In [316]:
## collect all revert edits for new discussion tool and page editing
query <-

"SELECT
     wiki_db AS wiki,
     event_user_id AS user_id,
     CASE
        WHEN min(event_user_revision_count) < 100 THEN 'under 100'
        WHEN (min(event_user_revision_count) >=100 AND min(event_user_revision_count <= 500)) THEN '100-500'
        ELSE 'over 500'
        END AS edit_count,
    max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,
    IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit') AS editor_type,
     SUM(CAST(
            revision_is_identity_reverted AND 
            revision_seconds_to_identity_revert <= 172800  -- 48 hours
           AS int)) AS num_reverts,
    COUNT(*) as num_comments
FROM wmf.mediawiki_history 
WHERE 
    snapshot = '2021-07'
    -- exclude reply tool talk page edits
    AND NOT (ARRAY_CONTAINS(revision_tags, 'discussiontools-reply'))
    -- include only desktop edits
    AND NOT array_contains(revision_tags, 'iOS')
    AND NOT array_contains(revision_tags, 'Android')
    AND NOT array_contains(revision_tags, 'Mobile Web')
     -- find all edits on talk pages 
    AND page_namespace_historical % 2 = 1
    AND event_entity = 'revision'
    AND event_type = 'create'
    -- date deployed
    AND event_timestamp >= '2021-02-18' 
    AND event_timestamp <= '2021-07-31' -- allow two days to avoid data censoring 
    -- user is not anonymous
    AND event_user_is_anonymous = FALSE
GROUP BY 
 wiki_db,
 event_user_id,
 IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit')
"
In [ ]:
new_dt_reverts <- wmfdata::query_hive(query)
In [318]:
# reformat user-id and adjust to include wiki to account for duplicate user id instances.
# Users can have the smae user_id on different wikis

new_dt_reverts$user_id <-
  as.character(paste(new_dt_reverts$user_id,new_dt_reverts$wiki,sep ="-" ))
In [319]:
# set factor levels
new_dt_reverts$editor_type <-
  factor(
    new_dt_reverts$editor_type,
    levels = c("page-edit", "new-discussion-tool"),
    labels = c("Page editing", "New Discussion Tool")
  )
new_dt_reverts$edit_count <-
  factor(new_dt_reverts$edit_count,
         levels = c("under 100", "100-500", "over 500"))

Overall

In [320]:
# overall revert rate for dt and page edits
new_dt_reverts_byexp <- new_dt_reverts %>%
    filter(bot_by_group == 'false') %>%
    group_by(editor_type) %>%
    summarise(total_reverts = sum(num_reverts),
              total_comments = sum(num_comments),
              revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop') 

new_dt_reverts_byexp
A tibble: 2 × 4
editor_typetotal_revertstotal_commentsrevert_rate
<fct><int><int><chr>
Page editing 13314260210372.21 %
New Discussion Tool 1053 382492.75 %

By Experience Level

In [321]:
# wiki revert rate for dt and page edits
new_dt_reverts_byexp<- new_dt_reverts %>%
    filter(bot_by_group == 'false') %>%
    group_by(edit_count, editor_type) %>%
    summarise(total_reverts = sum(num_reverts),
              total_comments = sum(num_comments),
              revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop') 

new_dt_reverts_byexp
A tibble: 6 × 5
edit_counteditor_typetotal_revertstotal_commentsrevert_rate
<fct><fct><int><int><chr>
under 100Page editing 57971 7333917.9 %
under 100New Discussion Tool 170 27326.22 %
100-500 Page editing 4602 1071634.29 %
100-500 New Discussion Tool 63 14814.25 %
over 500 Page editing 7056951804831.36 %
over 500 New Discussion Tool 820 340362.41 %

Arabic and Czech Wikipedia

In [322]:
#  revert rate for dt and page edits by experience level
new_dt_reverts_bywiki <- new_dt_reverts %>%
    filter(bot_by_group == 'false',
           wiki %in% c('arwiki', 'cswiki'))  %>%
    group_by(wiki, editor_type) %>%
    summarise(total_reverts = sum(num_reverts),
              total_comments = sum(num_comments),
              revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop') 

new_dt_reverts_bywiki
A tibble: 4 × 5
wikieditor_typetotal_revertstotal_commentsrevert_rate
<chr><fct><int><int><chr>
arwikiPage editing 2162538634.01 %
arwikiNew Discussion Tool 27 10532.56 %
cswikiPage editing 262193641.35 %
cswikiNew Discussion Tool 3 2721.1 %

Summary

Overall, the revert rate for the new discussion tool is only slightly higher than the revert rate for page editing on talk pages (2.75% for the new discussion tool compared to 2.21% for page editing.

However, by experience level, the revert rate for the new discussion tool is lower than page editing for Junior Contributors. For editors with under 100 cumulative edits, there was a -21.3% percent decrease the revert rate for editors using the new discussion tool.

The new discussion tool also had a lower revert rate on both Arabic and Czech Wikipedia compared to page editing on those Wikipedias.

Usage Metrics

We are also interested in understanding who has been using the new Discussion Tool and how much they have been using it.

For this analysis, we reviewed two metrics:

  • The percent of distinct contributors who publish at least one new topic with the tool. We reviewed both the percent of all distinct talk page contributors and the percent of all contributors that started a new topic during the reviewed time period.
  • For contributors that have posted 1 new topic with the New Discussion Tool, the percent of distinct contributors used the New Discussion Tool to create the following percentage of all new topics within the time period?
    • 0%-25% of new topics
    • 26%-50% of new topics
    • 51%-75% of new topics
    • 76%-100% of new topics

What percent of distinct contributors publish at least one new topic with the tool?

All Talk Page Contributors

We first reviewed the percent of distinct contributors that publish at least one new topic with the new discussion tool out of all talk page contributors 1.


  1. This includes anyone that has made at least one talk page edit (including posting new comments or sections or editing exiting comments) on any of talk namespaces during the reviewed time period.

In [212]:
# Collect users new topic edits by user over deployment time period and remove bots
# use mediawiki-history as it includes all saved edits at 100 percent sampling rate

query <- "

SELECT
    to_date(event_timestamp) as `date`,
    wiki_db AS wiki,
    event_user_id AS `user`,
    max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,
    CASE
        WHEN min(event_user_revision_count) < 100 THEN 'under 100'
        WHEN (min(event_user_revision_count) >=100 AND min(event_user_revision_count <= 500)) THEN '100-500'
        ELSE 'over 500'
        END AS edit_count,
    SUM(CAST(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic') AS INT)) AS new_topic_edits,
    COUNT(*) AS all_talk_edits
FROM wmf.mediawiki_history
WHERE 
    snapshot = '2021-07' 
-- include only desktop edits
    AND NOT array_contains(revision_tags, 'iOS')
    AND NOT array_contains(revision_tags, 'Android')
    AND NOT array_contains(revision_tags, 'Mobile Web')
-- review all talk namespaces
    AND page_namespace_historical % 2 = 1 
-- date of first deployment 
    AND event_timestamp >= '2021-02-18'  
    AND event_timestamp <= '2021-07-31' 
    AND event_entity = 'revision' 
    AND event_type = 'create' 
-- remove logged out users
    AND event_user_is_anonymous = FALSE
GROUP BY
    to_date(event_timestamp),
    wiki_db,
    event_user_id  
"
In [213]:
discussion_tool_users <- wmfdata::query_hive(query)
Don't forget to authenticate with Kerberos using kinit

In [214]:
write_csv(discussion_tool_users, file = 'Data/discussion_tool_users.csv')
In [218]:
discussion_tool_users$date <- as.Date(discussion_tool_users$date, format = "%Y-%m-%d")
In [216]:
# reformat user-id and adjust to include wiki to account for duplicate user id instances.

discussion_tool_users$user <-
  as.character(paste(discussion_tool_users$user, discussion_tool_users$wiki, sep ="-"))

# set discussion tool factor levels
discussion_tool_users$edit_count <-
  factor(discussion_tool_users$edit_count,
         levels = c("under 100", "100-500", "over 500"))

Overall

In [251]:
# overall numbers since deployment
new_discussion_contributors <- discussion_tool_users %>%
    filter(bot_by_group == 'false') %>% # remove bots
    summarise(new_discussion_users = n_distinct(user[new_topic_edits >= 1]) ,
             new_discussion_edits = sum(new_topic_edits))

new_discussion_contributors
A data.frame: 1 × 2
new_discussion_usersnew_discussion_edits
<int><int>
538838261

Since deployment as a beta feature on 18 February 2021, a total of 5,388 distinct users have posted at least one new topic using the new discussion tool. There have been a total of 38,261 edits using the new discussion tool.

To put these numbers into context, we reviewed the percent of contributors that edited a talk page and made at least 1 new topic using the new discussion tool during the reviewed time. Note: For this calculation, we only reviewed the time period when the new discussion tool was available to all wikis.

In [324]:
# pct talk page users
new_discussion_contributors_pct <- discussion_tool_users %>%
    filter(bot_by_group == 'false',
          date >= '2021-03-17') %>%  #day of deployment to all wikis
         summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
             all_talk_contributors = n_distinct(user),
             pct_new_discussion_users = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%')
            )


new_discussion_contributors_pct
A data.frame: 1 × 3
new_discussion_contributorsall_talk_contributorspct_new_discussion_users
<int><int><chr>
51852073842.5%

By Experience Level

In [348]:
# pct talk page users by experience levels
new_discussion_contributors_pct_byexp <- discussion_tool_users %>%
    filter(bot_by_group == 'false',
          date >= '2021-03-17') %>% #day of deployment to all wikis
    mutate(all_new_discussion_contributors = n_distinct(user[new_topic_edits >= 1])) %>%
    group_by(edit_count) %>% 
    summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
             all_talk_contributors = n_distinct(user),
             pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors *100, 2), '%'),.groups = 'drop'
            )  %>% 
    distinct()


new_discussion_contributors_pct_byexp
A tibble: 3 × 4
edit_countnew_discussion_contributorsall_talk_contributorspct_new_discussion_contributors
<fct><int><int><chr>
under 10010521408590.75%
100-500 877 244443.59%
over 500 3480 485657.17%

Arabic and Czech Wiki

In [349]:
new_discussion_contributors_pct_bywikis <- discussion_tool_users %>%
    filter(bot_by_group == 'false',
            wiki %in% c('arwiki', 'cswiki')) %>% #no date filter needed as it was deployed at these wikis since deployment date
    group_by(wiki)  %>%
   summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),
             all_talk_contributors = n_distinct(user),
             pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%'),.groups = 'drop'
              )

new_discussion_contributors_pct_bywikis
A tibble: 2 × 4
wikinew_discussion_contributorsall_talk_contributorspct_new_discussion_contributors
<chr><int><int><chr>
arwiki6270810.88%
cswiki3016741.79%

Overall, 2.5% of all talk page contributors have posted at least one new topic using the new discussion tool since March 17th (when available at all wikis as an opt-in beta feature) through the end of July.

Senior contributors are the more frequent users of the tool. 7.2% of users with over 500 edits that edited a talk page during the reviewed time period made an edit with the new discussion tool.

Usage of the new discussion tool on Arabic and Czech Wikipedias are somewhat low with only 0.88% of talk page editors on Arabic Wikipedia and 1.79% of all talk page editors on Czech Wikipedias making an edit with the new discussion tool.

New Section Usage

For the analysis below, we also reviewed the percent of distinct contributors that publish at least one new topic with the new discussion tool but only reviewed contributors that created a new topic on talk page during the reviewed time period.

We used data EditAttemptStep for this analysis as it allows us distinguish edits to existing sections from edits associated with the creation of new sections.

In [229]:
query <-
"
SELECT 
  CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as `date`,
  wiki AS wiki,
  event.user_id AS `user`,
  CASE
        WHEN min(event.user_editcount) < 100 THEN 'under 100'
        WHEN (min(event.user_editcount) >=100 AND min(event.user_editcount <= 500)) THEN '100-500'
        ELSE 'over 500'
        END AS edit_count,
-- new page section edits
  SUM(CAST(event.integration = 'page' AND (event.init_mechanism = 'url-new' OR event.init_mechanism == 'new') AS INT)) AS page_edit,
-- new discussion tool edits
  SUM(CAST(event.integration ='discussiontools' AS INT)) AS dt_edit
FROM event_sanitized.editattemptstep
WHERE
-- section edits
  event.action = 'init'
  AND event.init_type = 'section'
  AND year = 2021
-- review events following deployment
  AND dt >= '2021-02-18'
  AND dt <= '2021-07-31'
 -- review all talk namespaces
  AND event.platform = 'desktop'
  AND event.page_ns % 2 = 1
  AND event.user_id != 0
GROUP BY
  CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),
  wiki, 
  event.user_id
"
In [ ]:
new_section_contributors <- wmfdata::query_hive(query)
In [231]:
new_section_contributors$date <- as.Date(new_section_contributors$date, format = "%Y-%m-%d")
In [232]:
# reformat user-id and adjust to include wiki to account for duplicate user id instances.

new_section_contributors$user <-
  as.character(paste(new_section_contributors$user, new_section_contributors$wiki, sep ="-"))

# set edit count factor levels
new_section_contributors$edit_count <-
  factor(new_section_contributors$edit_count,
         levels = c("under 100", "100-500", "over 500"))

Overall

In [233]:
new_topic_edits <- new_section_contributors %>%
# date released to all wikis
    filter(date >= '2021-03-17') %>%
    summarize(page_editors = n_distinct(user[page_edit >= 1]),
             dt_editor = n_distinct(user[dt_edit >=1]),
             pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%')
              )

new_topic_edits
A data.frame: 1 × 3
page_editorsdt_editorpct_dt_editors
<int><int><chr>
19659568822.44%

By Experience Level

In [350]:
new_topic_edits_byexperience <- new_section_contributors %>%
# date released to all wikis
    filter(date >= '2021-03-17') %>%
    group_by(edit_count) %>%
    summarize(page_editors = n_distinct(user[page_edit >= 1]),
             dt_editor = n_distinct(user[dt_edit >=1]),
             pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%'),.groups = 'drop'
              )

new_topic_edits_byexperience
A tibble: 3 × 4
edit_countpage_editorsdt_editorpct_dt_editors
<fct><int><int><chr>
under 10010364145912.34%
100-500 1949 94732.7%
over 500 7595349631.52%

Arabic and Czech Wikipedia

In [351]:
new_topic_edits_bywiki <- new_section_contributors %>%
# date released to all wikis
    filter(wiki %in% c('arwiki', 'cswiki')) %>%
    group_by(wiki) %>%
    summarize(page_editors = n_distinct(user[page_edit >= 1]),
             dt_editor = n_distinct(user[dt_edit >=1]),
             pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%'),.groups = 'drop'
              )

new_topic_edits_bywiki
A tibble: 2 × 4
wikipage_editorsdt_editorpct_dt_editors
<chr><int><int><chr>
arwiki3879319.38%
cswiki1263220.25%

Summary

During the reviewed time period, 22.4% of all contributors that created a new topic on a talk page posted at least one new topic using the new discussion tool.

Senior contributors more commonly used the tool at least once to create a new topic compared to Junior Contributors. Almost half (46.5%) of contributors with over 100 edits that created a new topic on a talk page posted at least one of their new topics using the new discussion tool.

Similar to the noted proportion across all Wikipedias, 19.4% of Arabic contributors and 20.3% of Czech contributors that posted a new topic used the new discussion tool at least once.

For contributors that have posted more than one new topic, what percent of distinct contributors used the New Discussion Tool to create the following percentage of all new topics within the time period?1

Purpose: How much are they using it? This metric helps us understand how many times people chose to use the New Discussion Tool in relation to the number of opportunities they had to use it. For this analysis, we limited our review to contributors that had accesss and used the tool at least once.

  • 0%-25% of new topics
  • 25%-50% of new topics
  • 50%-75% of new topics
  • 75%-100% of new topics

  1. This metric has some slight noise as there could be cases where the following people end up looking the same in the data. Person A: added two new topics to talk pages in the reviewed timeframe, one of which was with the new discussion tool; Person B: made a total of 150 new topics to talk pages, 75 of which were with the New Discussion tool.

Overall

How many contributors made just 1 edit using the new discussion tool?

In [329]:
new_dt_contributors_1edit <- new_section_contributors %>%
    filter(date >= '2021-03-17') %>%
    summarise(one_time_editors = n_distinct(user[dt_edit ==1]),
             all_editors = n_distinct(user[dt_edit >= 1]),
            pct_1_dt_edit = paste0(round(one_time_editors/all_editors * 100, 2), "%") )
              
  
new_dt_contributors_1edit
A data.frame: 1 × 3
one_time_editorsall_editorspct_1_dt_edit
<int><int><chr>
5166568890.82%

Most contributors (90.82%) that used the new discussion tool posted just one new topic with the tool during the reviewed timeframe.

In [145]:
#Divide new discussion tool edits into groups
b <- c(0, 25, 50, 75, 100)
names <- c('1-25 percent', '26-50 percent', '51-75 percent', '76-100 percent')
In [352]:
new_dt_contributors_prop <- new_section_contributors %>%
    filter(date >= '2021-03-17') %>%
    filter(dt_edit >= 1,
          page_edit + dt_edit > 1) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
    group_by(user) %>% 
    summarise(dt_edit = sum(dt_edit),
             page_edit = sum(page_edit),
            pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,
            new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names) ,.groups = 'drop'
              )
In [354]:
# Breakdown of contributors by percent use

prop_new_dt_overall <- new_dt_contributors_prop  %>%
    group_by(new_discussion_edits_group ) %>%
    summarise(n_users = n(),.groups = 'drop') %>%
    mutate(pct_new_discussion_contributors = paste0(round(n_users/sum(n_users) * 100, 2), "%")
           )

prop_new_dt_overall
A tibble: 3 × 3
new_discussion_edits_groupn_userspct_new_discussion_contributors
<fct><int><chr>
26-50 percent 572.63%
51-75 percent 452.08%
76-100 percent206595.29%

By Experience Level

In [355]:
new_dt_contributors_prop_exp <- new_section_contributors %>%
    filter(date >= '2021-03-17') %>%
    filter(dt_edit >= 1,
          page_edit + dt_edit > 1) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
    group_by(user, edit_count) %>% 
    summarise(dt_edit = sum(dt_edit),
             page_edit = sum(page_edit),
            pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,
            new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),.groups = 'drop' 
              )
In [359]:
# Breakdown of contributors by percent use

prop_new_dt_byexperience <- new_dt_contributors_prop_exp %>%
    group_by(edit_count, new_discussion_edits_group) %>%
    summarise(n_users = n()) %>%
    mutate(pct_new_discussion_contributors =  paste0(round(n_users/sum(n_users) * 100, 2), "%")
           )

prop_new_dt_byexperience
`summarise()` regrouping output by 'edit_count' (override with `.groups` argument)

A grouped_df: 9 × 4
edit_countnew_discussion_edits_groupn_userspct_new_discussion_contributors
<fct><fct><int><chr>
under 10026-50 percent 195.18%
under 10051-75 percent 195.18%
under 10076-100 percent 32989.65%
100-500 26-50 percent 113.81%
100-500 51-75 percent 62.08%
100-500 76-100 percent 27294.12%
over 500 26-50 percent 301.91%
over 500 51-75 percent 241.53%
over 500 76-100 percent151896.56%

Arabic and Czech Wikipedias

In [360]:
new_dt_contributors_1edit_bywiki <- new_section_contributors %>%
    filter(wiki %in% c('arwiki', 'cswiki')) %>%
    group_by(edit_count) %>%
    summarise(one_time_editors = n_distinct(user[dt_edit ==1]),
             all_editors = n_distinct(user[dt_edit >= 1]),
            pct_1_dt_edit = paste0(round(one_time_editors/all_editors * 100, 2), "%"),.groups = 'drop' )
              
  
new_dt_contributors_1edit_bywiki
A tibble: 3 × 4
edit_countone_time_editorsall_editorspct_1_dt_edit
<fct><int><int><chr>
under 100364776.6%
100-500 81080%
over 500 646992.75%
In [361]:
new_dt_contributors_prop_wiki <- new_section_contributors %>%
    filter(dt_edit >= 1,
          page_edit + dt_edit > 1,
            wiki %in% c('arwiki', 'cswiki')) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic
    group_by(user, wiki) %>% 
    summarise(dt_edit = sum(dt_edit),
             page_edit = sum(page_edit),
            pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,
             new_discussion_edits_group  = cut(pct_dt_edit, breaks = b, labels = names),.groups = 'drop' 
              )
In [364]:
# Breakdown of contributors by percent use

prop_new_dt_bywiki <- new_dt_contributors_prop_wiki %>%
    group_by(wiki,  new_discussion_edits_group ) %>%
    summarise(n_users = n(),.groups = NULL) %>%
    mutate(percent_new_dt_users = paste0(round(n_users/sum(n_users) * 100, 2), "%")
           )

prop_new_dt_bywiki
`summarise()` regrouping output by 'wiki' (override with `.groups` argument)

A grouped_df: 5 × 4
wikinew_discussion_edits_groupn_userspercent_new_dt_users
<chr><fct><int><chr>
arwiki26-50 percent 25.13%
arwiki51-75 percent 12.56%
arwiki76-100 percent3692.31%
cswiki51-75 percent 313.64%
cswiki76-100 percent1986.36%

Summary

Most contributors (90.82%) that used the new discussion tool posted just one new topic with the tool during the reviewed timeframe. Of the contributors that posted more than one new topic on a talk page, 95.3% of these contributors posted between 75 to 100 percent of their new topics using the new discussion tool, indicating that these contributors chose to use the tool when presented with an opportunity to start a new topic.

For all three levels of editor experience, over 89% of all contributors that posted more than one new topic used the new discussion tool to make between 76-100 percent of their new topics. Senior contributors made the highest proprotion of their new topic edits using the new discussion tool (96.56% made between 76-100 percent of their new topic edits) compared to Junior Contributors (89.65% made between 76-100 percent of their new topic edits).

The majority of contributors on on Arabic and Czech Wikipedia also 76-100 percent of their new topic using the new discussion tool.