Edit Card Impact Analysis Report

2019 December 19
Megan Neisler

Task

Summary

As part of efforts to simplify contributing on mobile, the Editing team worked to improve the context items (called edit cards) in the mobile visual editor. Edit cards are the part of the editing interface that shows additional details about, and actions related to, editable elements within articles. These elements include links, citations, images, infoboxes, templates, etc.

Work focused on two specific elements of edit cards: adding and modifying links and adding and modifying citations.

Project page

Hypotheses and metrics

Edit Cards and the changes to the subsequent dialogs will:

Hypothesis 1: Make contributors more likely to start adding and modifying links and citations, since the actions to edit them will be more prominent on the page

* Metric 1: average number of link and citation workflow starts per session. Defined as the average number of 'window-open' type actions per editing session as this indicates an intent to edit in most cases.
* Metric 2: percent of sessions that view an edit card and start editing workflow. Defined as the percent of sessions shown an edit card (recorded as action 'context-show') that open the window to begin editing (recorded as 'window-open-from-context')

Hypothesis 2: Make contributors more likely to finish adding and modifying links and citations, since the steps to do so will be more visible and clear.

* Metric: link and citation workflow completion rate. Defined as the percent of sessions with link or citation workflow starts (measured by window-open actions) that reach dialog-done, dialog-remove, or dialog-insert.


Hypothesis 3: Make contributors more likely to publish their edits, since it will be more clear when they have finished their task.

* Metric: edit completion rate. Defined as the percent of ready sessions (recorded as 'ready' action) in which edits are published (recorded as 'SaveSuccess').
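
As an illustration of the workflow completion rate defined under Hypothesis 2, the following is a minimal sketch in base R. The `events` data frame and its session ids are hypothetical toy data, not values from the VisualEditorFeatureUse logs; only the action names come from the definitions above.

```r
# Hypothetical toy events: one row per recorded action in a session.
events <- data.frame(
  session_id = c("a", "a", "b", "c", "c", "d"),
  action = c("window-open-from-context", "dialog-done",
             "window-open-from-tool",
             "window-open-from-tool", "dialog-insert",
             "context-show"),
  stringsAsFactors = FALSE
)

# Sessions that started a workflow (any 'window-open' type action).
started <- unique(events$session_id[startsWith(events$action, "window-open")])

# Sessions that reached one of the completion actions.
done_actions <- c("dialog-done", "dialog-remove", "dialog-insert")
finished <- unique(events$session_id[events$action %in% done_actions])

# Completion rate: started sessions that also finished, over started sessions.
completion_rate <- length(intersect(started, finished)) / length(started)
completion_rate
```

In this toy data, sessions a, b, and c start a workflow and a and c complete one, so the rate is two out of three. Session d only views an edit card and is not counted in the denominator.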

Experiment plan

We completed a basic pre/post analysis of whether the metrics above changed due to the multiple iterative deployments of Edit Cards:

Version Deployment Dates:

  • v1 21-June to Bengali, Hebrew, and Persian Wikipedias T221314. Note: V1 was never deployed to all wikis. Since we did not start recording events until 13 July 2019, we did not review pre and post deployments for v1.
  • v2 1-August 2019 in 1.34.0-wmf.16 T225834 to all Wikipedias.
  • v3 29-August 2019 in 1.34.0-wmf.20 T229830 to all Wikipedias.
  • v4 12-September-2019 in 1.34.0-wmf.22 T231342 to all Wikipedias.

We excluded any sessions that have a bucket set in the EditAttemptStep data, because those are part of the mobile-VE-as-default A/B test. We then reviewed the data on link and citation workflows recorded in the VisualEditorFeatureUse EventLogging data. For details about the link and citation workflow instrumentation, see the data dictionary.

Detailed data on link and citation workflows were available as of 13 July 2019. Since we do not yet have a full year of data, we could not review year-over-year changes, and some of the observations noted here may be affected by seasonal fluctuations; however, we reviewed rates before and after each edit card deployment for any significant changes.
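
One way to organize such a pre/post review is to label each day by the edit card version live on that date and then summarise the metric within each period. The sketch below assumes this approach; the `daily` values are hypothetical and the period labels are illustrative names, while the three deployment dates are the ones listed above.

```r
# Deployment dates for v2, v3, and v4, taken from the list above.
deployments <- as.Date(c("2019-08-01", "2019-08-29", "2019-09-12"))
period_labels <- c("pre-v2", "v2", "v3", "v4")

# Hypothetical daily values, for illustration only.
daily <- data.frame(
  date = as.Date(c("2019-07-20", "2019-08-01", "2019-08-30",
                   "2019-09-12", "2019-10-05")),
  rate = c(0.10, 0.12, 0.13, 0.15, 0.16)
)

# findInterval() counts how many deployment dates are <= each date,
# so 0 means before v2 and 3 means on or after the v4 deployment.
idx <- findInterval(as.numeric(daily$date), as.numeric(deployments))
daily$period <- period_labels[idx + 1]

# Mean of the metric within each deployment period.
period_means <- aggregate(rate ~ period, data = daily, FUN = mean)
period_means
```

A day that falls exactly on a deployment date is assigned to the newly deployed version here; that boundary choice is an assumption of the sketch.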

In [1039]:
library(IRdisplay)

display_html(
'<script>  
code_show=true; 
function code_toggle() {
  if (code_show){
    $(\'div.input\').hide();
  } else {
    $(\'div.input\').show();
  }
  code_show = !code_show
}  
$( document ).ready(code_toggle);
</script>
  <form action="javascript:code_toggle()">
    <input type="submit" value="Click here to toggle on/off the raw code.">
 </form>'
)

Are contributors more likely to start adding or modifying links and citations?

Methodology

We measured the start of a link workflow with the 'window-open' type actions, as these indicate an intent to edit in most cases. We also reviewed the percent of sessions shown the edit card that started the link editing workflow to determine whether the new design made people more likely to start editing in the first place.

Events are recorded as follows:

Action: User clicks on an existing internal link, selects "edit"
Event: link/internal:context-show then link:window-open-from-context

Action: User clicks an existing external link, selects "edit"
Event: link:context-show then link:window-open-from-context

Action: User clicks the link button in a toolbar to add a new link
Event: link:window-open-from-tool

In [2]:
shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))
shhh({
    library(tidyverse); library(lubridate); library(scales)
})
In [2]:
#Find link workflow starts from context items measured by the window-open action.
query <-
"
--find mobile VE sessions that were not included in the VE as default AB test
with non_test_sessions as (
    SELECT DISTINCT event.editing_session_id as session_id
    FROM
        event_sanitized.editattemptstep
    WHERE
        event.bucket is NULL and
        event.platform = 'phone' and
        event.editor_interface = 'visualeditor' and
--full data available on July 13
        year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) 
        )
SELECT
    to_date(dt) as date,
    event.editingsessionid as session_id,
    event.action as action,
    Count (*) as events
FROM event_sanitized.visualeditorfeatureuse as vefu
INNER JOIN
    non_test_sessions 
    ON event.editingsessionid = non_test_sessions.session_id 
WHERE

    year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) and
--Both internal and external links are labeled as link following context-open action
    event.feature = 'link' and
    event.action Like 'window-open%'

GROUP BY
    to_date(dt),
    event.editingsessionid,
    event.action
"
In [3]:
link_workflow_starts <- wmf::query_hive(query)
In [4]:
link_workflow_starts$date <- as.Date(link_workflow_starts$date, format = "%Y-%m-%d")
In [7]:
#Find the average link workflow starts per session for all window-open type actions

link_workflow_starts_daily <- link_workflow_starts %>%
    #filter(action == 'window-open-from-context') %>%
    group_by(date) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events))
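
Note that the query returns one row per date, session, and action, so `mean(events)` averages over those rows rather than strictly over sessions. The sketch below, on hypothetical toy data in the same shape, shows how the two can differ when a session records both action types on the same day. It is an illustration of the distinction, not a recomputation of the reported figures.

```r
# Hypothetical toy rows in the same shape as link_workflow_starts:
# one row per date, session, and action.
toy <- data.frame(
  date = as.Date(rep("2019-09-01", 3)),
  session_id = c("s1", "s1", "s2"),
  action = c("window-open-from-context", "window-open-from-tool",
             "window-open-from-tool"),
  events = c(2, 1, 3)
)

# Mean over rows, as in the summarise() step: 6 events / 3 rows.
row_mean <- mean(toy$events)

# Collapse to one row per date and session, then average:
# 6 events / 2 sessions.
per_session <- aggregate(events ~ date + session_id, data = toy, FUN = sum)
session_mean <- mean(per_session$events)

c(row_mean = row_mean, session_mean = session_mean)
```

The two values coincide whenever every session has a single action type per day.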
In [1040]:
# Plot timeseries of average link_workflow_starts

p <- ggplot(link_workflow_starts_daily, aes(x= date, y = avg_events)) +    
  geom_line(size =0.8, color = 'blue') +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=2.8, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=2.8, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=2.8, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Average events per session", labels = polloi::compress) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
  labs(title = "Daily average link workflow starts per session \n on mobile visual editor") +
  ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom')
 
p
ggsave("Figures/average_link_workflow_starts.png", p, width = 18, height = 9, units = "in", dpi = 150)
In [1030]:
link_workflow_starts_monthly_avg <- link_workflow_starts %>%
    mutate(date = floor_date(date, "month")) %>%
    filter(date != '2019-07-01',
          date != '2019-08-01')  %>% #remove months due to incomplete data
    group_by(date) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events)) %>%
    arrange(date) %>%
    mutate(mom_percent = (avg_events/lag(avg_events,1) -1) *100) %>%
    arrange(desc(date))

link_workflow_starts_monthly_avg
A tibble: 3 × 4
date        total_events  avg_events  mom_percent
<date>      <int>         <dbl>       <dbl>
2019-11-01  159118        2.078860     3.147164
2019-10-01  155166        2.015431    -6.902632
2019-09-01  136854        2.164863           NA

Between v3 and v4 deployments (around September 4, 2019), there was a sharp decrease in the daily average number of link workflow starts per session from about 2.4 to 2.1. Further investigation is needed to determine if this may be caused by another change deployed around this time or is reflective of user behavior.

After the v4 deployment, the average rate of link workflow starts has been fairly stable, ranging between 1.8 and 2.3 starts per session. There was a 3.1% increase from October to November 2019.
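
The month-over-month figures follow from the formula used in the code, `(avg_events / lag(avg_events, 1) - 1) * 100`. As a check, the sketch below reapplies it in base R to the `avg_events` values copied from the monthly table above; the vector names are illustrative.

```r
# avg_events values copied from the monthly table above, oldest first.
avg_events <- c(`2019-09` = 2.164863, `2019-10` = 2.015431, `2019-11` = 2.078860)

# Previous month's value for each month (NA for the first month),
# a base R stand-in for dplyr's lag(avg_events, 1).
previous <- c(NA, head(avg_events, -1))

# Relative change from the previous month, in percent.
mom_percent <- (avg_events / previous - 1) * 100
round(mom_percent, 2)
```

This gives about -6.90 for October and 3.15 for November, matching the `mom_percent` column.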

In [1033]:
#Find the average link workflow starts per session by window-open action type

link_workflow_starts_daily_byaction <- link_workflow_starts %>%
    filter(action %in% c('window-open-from-context', 'window-open-from-tool')) %>%  #Filter to the two primary window actions for links
    group_by(date, action) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events))
In [1041]:
# Plot timeseries of average link_workflow_starts broken down by action type

p <- ggplot(link_workflow_starts_daily_byaction, aes(x= date, y = avg_events, color = action)) +    
  geom_line(size =0.8) +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=3.5, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=3.5, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=3.5, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Average events per session", labels = polloi::compress) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
  labs(title = "Daily average link workflow starts per session by action type \n on mobile visual editor") +
  ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom')
 
p

ggsave("Figures/average_link_workflow_starts_byaction.png", p, width = 18, height = 9, units = "in", dpi = 150)

A breakdown of the average link workflow starts by action type shows a decrease in the window-open-from-tool action between v3 and v4, which occurs when a user clicks the link button in the toolbar to add a new link. This appears to be the primary driver of the decrease observed in the overall rate during this time.

Following v4 deployment, the average number of link workflow starts per session has remained fairly stable for both action types.

In [1042]:
# Daily total link workflow start rate

p <- ggplot(link_workflow_starts_daily_byaction, aes(x= date, y = total_events, color = action)) +    
  geom_line( size =0.8) +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=2.5E3, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=2.5E3, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=2.5E3, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Total number of events per day", labels = polloi::compress) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
  labs(title = "Daily link workflow start rate \n on mobile visual editor") +
  ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom')
 
p

ggsave("Figures/total_link_workflow_starts_byaction.png", p, width = 18, height = 9, units = "in", dpi = 150)
In [1031]:
link_workflow_starts_monthly_total <- link_workflow_starts %>%
    mutate(date = floor_date(date, "month")) %>%
    filter(date != '2019-07-01',
          date != '2019-08-01')  %>% #remove months due to incomplete data
    group_by(date) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events)) %>%
    arrange(date) %>%
    mutate(mom_percent = (total_events/lag(total_events,1) -1) *100) %>%
    arrange(desc(date))

link_workflow_starts_monthly_total
A tibble: 3 × 4
date        total_events  avg_events  mom_percent
<date>      <int>         <dbl>       <dbl>
2019-11-01  159118        2.078860     2.54695
2019-10-01  155166        2.015431    13.38068
2019-09-01  136854        2.164863          NA

New links are added more often than existing links are modified.

The overall number of link workflow starts per day increased following each deployment. From September to October 2019 (following the v3 and v4 deployments), the monthly total number of link workflow starts increased by 13.4%. After that, the total leveled off, with a 2.5% increase from October to November 2019.

These increases are likely due to an increase in the number of contributors or sessions using edit cards since the number of starts per session did not increase.
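
The reasoning above can be made concrete with the two columns of the monthly table: since `avg_events` is `total_events` divided by the number of rows averaged over, that count can be backed out. The sketch below does so in base R using the table values. The `implied_rows` name is illustrative, and the count refers to session-level rows (one per date, session, and action), which is only an approximation of distinct sessions.

```r
# Monthly totals and averages copied from the table above, oldest first.
monthly <- data.frame(
  month = c("2019-09", "2019-10", "2019-11"),
  total_events = c(136854, 155166, 159118),
  avg_events = c(2.164863, 2.015431, 2.078860)
)

# avg_events = total_events / n, so n (the number of session-level rows
# the mean was taken over) can be recovered from the two columns.
monthly$implied_rows <- round(monthly$total_events / monthly$avg_events)

# Month-over-month change in that implied count, in percent.
monthly$rows_mom_percent <- c(NA, diff(monthly$implied_rows) /
                                head(monthly$implied_rows, -1) * 100)
monthly
```

On these figures the implied count rises by roughly a fifth from September to October while the average per row falls, which is consistent with the growth in total starts coming from more sessions rather than more starts per session.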

Percent of sessions shown edit card with engagement

We are also interested in understanding the percent of sessions where contributors who are shown the edit card (revised) or context item (existing) end up engaging with it. This will help determine if the new design made people more likely to start editing in the first place.

We defined this as the percent of sessions shown an edit card (recorded as action 'context-show') that open the window to begin editing (recorded as 'window-open-from-context').

In [11]:
query <-
"
--find mobile VE edit sessions that were not included in the AB test 
with non_test_sessions as (
    SELECT DISTINCT event.editing_session_id as session_id
    FROM
        event_sanitized.editattemptstep
    WHERE
        event.bucket is NULL and
        event.platform = 'phone' and
        event.editor_interface = 'visualeditor' and
        year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) 
        )
SELECT
    to_date(dt) as date,
    event.editingsessionid as session_id,
    sum(cast(event.action = 'window-open-from-context' as int)) >= 1 as window_open
FROM event_sanitized.visualeditorfeatureuse as vefu
INNER JOIN
    non_test_sessions 
    ON event.editingsessionid = non_test_sessions.session_id 
WHERE
   -- full data available as of July 13
       year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) and
    event.feature IN ('link', 'link/internal') and
    -- the first action in link workflow starts is context-show
    event.action IN ('context-show', 'window-open-from-context')
GROUP BY
    to_date(dt),
    event.editingsessionid;
"
In [12]:
link_context_shown_engagement <-  wmf::query_hive(query)
In [13]:
link_context_shown_engagement$date <- as.Date(link_context_shown_engagement$date, format = "%Y-%m-%d")
In [14]:
link_context_shown_engagement_rate <- link_context_shown_engagement %>%
    group_by(date) %>%
    summarise(session_count = n(),
             window_open = sum(window_open == 'true')) %>%
    mutate(window_open_rate = window_open/session_count)
In [15]:
p <- link_context_shown_engagement_rate %>%
  ggplot(aes(x= date, y = window_open_rate)) +    
  geom_line(size = 0.8, color = 'blue') +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=.10, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=.10, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=.10, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Percent of edit card view sessions", labels = scales::percent) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
  labs(title = "Proportion of sessions with edit card views leading to \n start of a link edit") +
  ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
    scale_color_brewer(palette = 'Dark2') +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom',
        legend.title  = element_text(size=12, face = 'bold'),
        legend.text = element_text(size=12))
p


ggsave("Figures/link_context_shown_engagement_rate.png", p, width = 18, height = 9, units = "in", dpi = 150)
In [16]:
link_context_shown_engagement_monthly <- link_context_shown_engagement_rate %>%
    mutate(date = floor_date(date, "month")) %>%
    filter(date != '2019-07-01',
          date != '2019-08-01')  %>% #remove months due to incomplete data
    group_by(date) %>%
    summarise(total_sessions = sum(session_count),
             total_events = sum(window_open),
             window_open_rate = total_events/total_sessions) %>%
    arrange(date) %>%
    mutate(mom_percent = (window_open_rate/lag(window_open_rate,1) -1) *100) %>%
    arrange(desc(date))

link_context_shown_engagement_monthly
A tibble: 3 × 5
date        total_sessions  total_events  window_open_rate  mom_percent
<date>      <int>           <int>         <dbl>             <dbl>
2019-11-01  69453           11276         0.1623544         4.002013
2019-10-01  68389           10676         0.1561070         2.642181
2019-09-01  59827            9099         0.1520885               NA

Around September 5, 2019, between the v3 and v4 deployments, there was a sudden increase in the percent of sessions that started editing a link once shown the link edit card. This coincides with the decrease in the average starts per session noted earlier.

Following the v4 deployment on September 12, there has been a steady increase in the percent of sessions that engage in the editing workflow once shown an edit card. From October to November 2019, there was a 4% relative increase (from 15.6% to 16.2%). Overall, about 13% to 18% of sessions that view a link context card start an edit.
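
Because `mom_percent` is a relative change in a rate, it helps to see it next to the change in percentage points. The sketch below recomputes both in base R from the session counts in the monthly table above; the variable names are illustrative.

```r
# Session counts copied from the monthly table above.
sessions_shown  <- c(`2019-10` = 68389, `2019-11` = 69453)
sessions_opened <- c(`2019-10` = 10676, `2019-11` = 11276)

# Share of sessions shown an edit card that opened the editing window.
rate <- sessions_opened / sessions_shown

# Relative change (what mom_percent reports) versus the
# absolute change in percentage points.
relative_change_pct <- (rate[["2019-11"]] / rate[["2019-10"]] - 1) * 100
point_change <- (rate[["2019-11"]] - rate[["2019-10"]]) * 100

round(c(relative = relative_change_pct, points = point_change), 2)
```

The relative change is about 4.0%, which corresponds to an increase of about 0.6 percentage points in the engagement rate.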

Average number of citation workflow starts per session

Methodology

Similarly to link workflow starts, we measured the start of a citation workflow with the 'window-open' type actions, as these indicate an intent to edit in most cases. For example, this occurs when a window is opened by clicking the edit button in a context item. We also reviewed the percent of sessions shown the edit card that started the citation editing workflow to determine whether the new design made people more likely to start editing in the first place.

Events are recorded as follows:

Action: User clicks on the citation icon in the toolbar.
Event: citoid: window-open-from-tool

Action: User clicks on an existing citation
Event: cite-book (or cite-web, cite-news): context-show then mwcite: window-open-from-context

Action: User clicks on an existing reference
Event: reference: context-show then reference: window-open-from-context

Note: There were no recorded citoid events or mwcite events (which occur when a user adds or edits a specific citation template on wikis without Citoid) until the v2 deployment.

In [745]:
#measure start from "window-open" events.
query <-
"
--find sessions that were not included in the AB test
with non_test_sessions as (
    SELECT DISTINCT event.editing_session_id as session_id
    FROM
        event_sanitized.editattemptstep
    WHERE
        event.platform = 'phone' and
        event.editor_interface = 'visualeditor' and
        event.bucket is NULL and
        year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11))
        )
SELECT
    to_date(dt) as date,
    event.feature as feature,
    event.editingsessionid as session_id,
    Count (*) as events
FROM event_sanitized.visualeditorfeatureuse as vefu
INNER JOIN
    non_test_sessions 
    ON event.editingsessionid = non_test_sessions.session_id 
WHERE
    -- started recorded full data on July 13th
     year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) and
    -- event feature in citoid, reference or mwcite
    event.feature IN ('citoid', 'reference', 'mwcite') and
    -- the first action in a citation workflow is window-open following context-show
    event.action LIKE 'window-open%'
GROUP BY
    to_date(dt),
    event.editingsessionid,
    event.feature
"
In [746]:
citation_workflow_starts <- wmf::query_hive(query)
In [747]:
citation_workflow_starts$date <- as.Date(citation_workflow_starts$date, format = "%Y-%m-%d")
In [961]:
#Find the average citation workflow starts per session

citation_workflow_starts_daily <- citation_workflow_starts %>%
    filter(date >= '2019-08-01') %>% #Did not have full data on all citation workflows until v2 deployment
    group_by(date, feature) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events))
In [1043]:
# Plot timeseries of average citation_workflow_starts

p <- ggplot(citation_workflow_starts_daily, aes(x= date, y = avg_events, color = feature)) +    
  geom_line(size =0.8) +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=2.5, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=2.5, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=2.5, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Average events per session", labels = polloi::compress) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
  labs(title = "Daily average citation workflow starts per session \n on mobile visual editor") +
  ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom')
 
p

ggsave("Figures/average_citation_workflow_starts.png", p, width = 18, height = 9, units = "in", dpi = 150)

Month over Month Changes in the average citation workflow starts per session

In [963]:
citation_workflow_starts_monthly_avg <- citation_workflow_starts %>%
    mutate(date = floor_date(date, "month")) %>%
    filter(date != '2019-07-01',
          date != '2019-08-01')  %>% #remove months due to incomplete data
    group_by(date) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events)) %>%
    arrange(date) %>%
    mutate(mom_percent = (avg_events/lag(avg_events,1) -1) *100) %>%
    arrange(desc(date))

citation_workflow_starts_monthly_avg
A tibble: 3 × 4
date        total_events  avg_events  mom_percent
<date>      <int>         <dbl>       <dbl>
2019-11-01  70039         1.437140     4.009153
2019-10-01  66875         1.381743    -1.637521
2019-09-01  51911         1.404746           NA

There was an increase in the average citation workflow starts between the v2 and v3 edit card deployments, especially for edits using the citation template (recorded as the mwcite feature), which was followed by a decrease in September. Following the v4 deployment, the rate has been fairly stable at between about 1 and 2.5 workflow starts per session. From October to November 2019, there was a 4.0% increase in the average number of citation workflow starts per session.

Daily total citation workflow starts

In [1044]:
# Daily rate of citation workflow starts


p <- ggplot(citation_workflow_starts_daily, aes(x= date, y = total_events, color = feature)) +    
  geom_line(size =0.8) +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=1.5E3, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=1.5E3, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=1.5E3, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Total number of events per day", labels = polloi::compress) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "1 week") +
  labs(title = "Daily total number of citation workflow starts \n on mobile visual editor") +
  ggthemes::theme_tufte(base_size = 12, base_family = "Gill Sans") +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom')
 
p
ggsave("Figures/total_citation_workflow_starts.png", p, width = 18, height = 9, units = "in", dpi = 150)

Month over Month Changes in the daily total citation workflow starts

In [964]:
citation_workflow_starts_monthly_total <- citation_workflow_starts %>%
    mutate(date = floor_date(date, "month")) %>%
    filter(date != '2019-07-01',
          date != '2019-08-01')  %>% #remove months due to incomplete data
    group_by(date) %>%
    summarise(total_events = sum(events),
             avg_events = mean(events)) %>%
    arrange(date) %>%
    mutate(mom_percent = (total_events/lag(total_events,1) -1) *100) %>%
    arrange(desc(date))

citation_workflow_starts_monthly_total
A tibble: 3 × 4
date        total_events  avg_events  mom_percent
<date>      <int>         <dbl>       <dbl>
2019-11-01  70039         1.437140     4.731215
2019-10-01  66875         1.381743    28.826260
2019-09-01  51911         1.404746           NA

Similarly to the trends we observed for link workflow starts, the number of citation workflow starts has been increasing since the deployment of v2 of the edit cards. From September to October 2019 (following the v3 and v4 deployments), there was a 28.8% increase in the monthly total number of citation workflow starts. This increase appears to be largely driven by citoid-generated references; however, there were also increases in starts recorded under the citation-template (mwcite) and reference features.

Percent of sessions shown edit card with engagement

We are also interested in understanding the percent of sessions where contributors who are shown the edit card (revised) or context item (existing) end up engaging with it. This will help determine if the new design made people more likely to start editing in the first place.

We defined this as the percent of sessions shown an edit card (recorded as action 'context-show') that open the window to begin editing (recorded as 'window-open-from-context').

In [4]:
#Note: Context shown items don't appear to be recorded for citoid events where a user generates a new reference.

query <-
"
--find mobile VE edit sessions that were not included in the AB test 
with non_test_sessions as (
    SELECT DISTINCT event.editing_session_id as session_id
    FROM
        event_sanitized.editattemptstep
    WHERE
        event.bucket is NULL and
        event.platform = 'phone' and
        event.editor_interface = 'visualeditor' and
        year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) 
        )
SELECT
    to_date(dt) as date,
    event.editingsessionid as session_id,
    sum(cast(event.action = 'window-open-from-context' as int)) >= 1 as window_open
FROM event_sanitized.visualeditorfeatureuse as vefu
INNER JOIN
    non_test_sessions 
    ON event.editingsessionid = non_test_sessions.session_id 
WHERE
   -- full data available as of July 13
       year = 2019 and ((month = 07 and day >= 13) OR (month >=08 AND month <= 11)) and
    (event.feature LIKE 'cite%' or
    event.feature IN ('reference', 'mwcite')) and
    -- the first action in citation workflow starts is context-show
    event.action IN ('context-show', 'window-open-from-context')
GROUP BY
    to_date(dt),
    event.editingsessionid;
"
In [5]:
citation_context_shown_engagement <-  wmf::query_hive(query)
In [6]:
citation_context_shown_engagement$date <- as.Date(citation_context_shown_engagement$date, format = "%Y-%m-%d")
In [7]:
citation_context_shown_engagement_rate <- citation_context_shown_engagement %>%
    group_by(date) %>%
    filter(date >= '2019-08-01') %>% #Did not have full data on all citation workflows until v2 deployment
    summarise(session_count = n(),
             window_open = sum(window_open == 'true')) %>%
    mutate(window_open_rate = window_open/session_count)
In [17]:
p <- citation_context_shown_engagement_rate %>%
  ggplot(aes(x= date, y = window_open_rate)) +    
  geom_line(size = 0.8, color = 'blue') +
  geom_vline(xintercept = c(as.Date('2019-08-01'), as.Date('2019-08-29'), as.Date('2019-09-12')),
             linetype = "dashed", color = "black") +        
   geom_text(aes(x=as.Date('2019-08-01'), y=.10, label= 'v2 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-08-29'), y=.10, label='v3 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
   geom_text(aes(x=as.Date('2019-09-12'), y=.10, label='v4 deployed'), size=3.6, vjust = -1, angle = 90, color = "black") +
  scale_y_continuous("Percent of edit card view sessions", labels = scales::percent) +
  scale_x_date("Date", labels = date_format("%b %d %Y"), date_breaks = "2 weeks") +
  labs(title = "Proportion of sessions with edit card views leading to \n start of a citation edit ") +
  ggthemes::theme_tufte(base_size = 14, base_family = "Gill Sans") +
    scale_color_brewer(palette = 'Dark2') +
    theme(axis.text.x=element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        panel.grid = element_line("gray70"),
         legend.position = 'bottom',
        legend.title  = element_text(size=12, face = 'bold'),
        legend.text = element_text(size=12))
p

ggsave("Figures/citation_context_shown_engagement_rate.png", p, width = 18, height = 9, units = "in", dpi = 150)