Authors list:
Frank Soboczenski (1),
Thomas Trikalinos (2),
Joel Kuiper (3),
Randolph G Bias (4),
Byron C Wallace (5),
Iain J Marshall (1)
(1) School of Population Health and Life Sciences, King's College London, London, UK
(2) Center for Evidence-based Medicine, Brown University, Providence, USA
(3) Vortext Systems, Groningen, Netherlands
(4) School of Information, University of Texas at Austin, Austin, USA
(5) College of Computer and Information Science, Northeastern University, Boston, USA
The system used and evaluated in this study can be found here: RobotReviewer User Study
This notebook uses an R kernel!
First, we set up the R environment:
# Set a specific working directory; uncomment to point at the data files
#setwd("~/Desktop/R_STUFF")
# load performance improvement libraries & enable just in time compiler
library(compiler)
enableJIT(1)
# printing options: number of significant digits shown
options(digits = 7)
#check if required packages are there - if not the script will install them!
requiredPackages = c('rcompanion','gdata','compiler','car','lsr','sft','nlme', 'lme4', 'bibtex', 'psych', 'likert', 'ggplot2', 'tidyverse')
for(p in requiredPackages){
if(!require(p,character.only = TRUE)) install.packages(p);
library(p,character.only = TRUE);
}
#load all the libraries
lapply(requiredPackages, require, character.only = TRUE);
# NOTE: run this cell twice to clear the package start-up messages from the output
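As an aside, the start-up chatter can also be silenced directly instead of re-running the cell; a minimal sketch using base R's suppressPackageStartupMessages:

# Sketch: load all required packages once, with start-up messages silenced
invisible(suppressPackageStartupMessages(
  lapply(requiredPackages, require, character.only = TRUE)
))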
# Importing the data
data <- read.xls("data/TimeAnalysis2_1.xlsx", verbose=FALSE, na.strings=c("NA"))
# quick check on the data
head(data)
ParticipantNo | ID | PDF | Order | Time | Condition | CExperience | Tasks | NReviews | AMeanNReviews | ⋯ | AC2 | AL2 | AC3 | AL3 | AC4 | AL4 | Comments | TotalAdded | TotalDeleted | TotalSubmitted |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | -JjAzwjQakZk-3kIbeMfw | U2PXr4iMBf8Kt-leQI8Ii | 1 | 435 | 1 | 1 | 9 | 10+ | 12.5 | ⋯ | Yes | Yes | Yes | Yes | Yes | Yes | C1:2added, C2:2added,C3:3added,C4:1added | 8 | 0 | 20 |
1 | -JjAzwjQakZk-3kIbeMfw | 4gIe5sni3WJJCzNglSdo8 | 2 | 582 | 0 | 1 | 9 | 10+ | 12.5 | ⋯ | Yes | No | Yes | No | Yes | No | C1:1added, C2:1added, C3:1added, C4:2added | 5 | 0 | 5 |
1 | -JjAzwjQakZk-3kIbeMfw | iHbGgWtrKNfksdoj9Hxv9 | 3 | 297 | 1 | 1 | 9 | 10+ | 12.5 | ⋯ | No | Yes | Yes | Yes | Yes | Yes | C1:1added, C2:1added, C4:1added | 3 | 0 | 15 |
1 | -JjAzwjQakZk-3kIbeMfw | cnDXl97I_WoVUqyIH0HbQ | 4 | 610 | 0 | 1 | 9 | 10+ | 12.5 | ⋯ | No | No | No | No | Yes | No | C1:1added, C2:0added, C3:0added | 1 | 0 | 1 |
2 | hNQKTiKtHSY_BmxLyZb9Q | XOCbiBddVQK3BYy1Ox4lI | 1 | 306 | 1 | 1 | 8 | 5to10 | 7.5 | ⋯ | No | Yes | No | Yes | No | Yes | C1:1added | 1 | 0 | 16 |
2 | hNQKTiKtHSY_BmxLyZb9Q | oOAy7INgRLumTV3vZWtVM | 2 | 127 | 1 | 1 | 8 | 5to10 | 7.5 | ⋯ | No | Yes | No | Yes | No | Yes | 0 | 0 | 15 |
Short comments on the data:
ID = individual participant
PDF = document ID
Order = document order of appearance
Time = Time spent on one document in seconds
Condition = Independent variable (1 = Machine learning recommendations (MLR) present, 0 = no MLR)
CExperience = Experience with the Cochrane Risk of Bias tool? (1 = Yes, 0 = No)
Tasks = Number of tasks (in a systematic review) performed (9 = max)
NReviews = Number of systematic reviews performed
AMeanNReviews = Arithmetic mean number of systematic reviews performed (0=0, 1-5=3, 5-10=7.5, 10+=12.5)
Error = reported errors (ignore for now)
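A quick structural check of this coding (a sketch; the balanced 82/82 split follows from the 41 participants × 4 documents design):

# Sketch: verify the coding described in the dictionary above
table(data$Condition)   # should show 82 manual (0) and 82 semi-automated (1) assessments
str(data[, c("ID", "PDF", "Order", "Time", "Condition")])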
# first we'll look at the data: the histogram
hist(data$Time)
# However, we chose to use a log representation (lot = log of time) of our data to account for the right skew
lot <- log(data$Time)
hist(lot)
# we perform a Shapiro-Wilk test to establish whether the data depart from normality:
shapiro.test(data$Time)
	Shapiro-Wilk normality test

data:  data$Time
W = 0.69672, p-value < 2.2e-16
The Shapiro-Wilk test is significant ($W=0.69$, $p<0.001$), which means it is highly unlikely that the data were sampled from a normal distribution. Hence we rely on non-parametric tests.
As this analysis looks for differences between two groups (machine-learning and no-machine-learning) and as the study followed a within-subjects design, a Wilcoxon signed-rank test is the suitable test for an initial overview of the data.
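For reference, the same comparison can also be expressed with R's formula interface, which splits the times by condition explicitly (a sketch; with the default paired = FALSE this is the rank-sum/Mann-Whitney variant rather than the signed-rank test used below):

# Sketch: two-group comparison of times split by Condition
wilcox.test(Time ~ Condition, data = data)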
An alpha level of .05 was used for all statistical tests. A Shapiro-Wilk test showed that the timing data do not follow a normal distribution, $W=0.69$, $p<0.001$. The subsequent Wilcoxon signed-rank test showed a significant difference in time between the semi-automated (machine-learning) and the manual (non-machine-learning) condition, $V =$ 13530, $p<$ 0.001.
# We used the log of the timing data for all subsequent analyses
T_log <- log(data$Time)
# The Wilcoxon test is rank-based, so the (monotonic) log transform does not change its result
# Wilcoxon test on the log time of the data
wilcox.test(T_log, data$Condition, paired = TRUE)
# We also looked at the Timing differences by documents
#wilcox.test(T_log, data$PDF, paired = TRUE)
# As well as differences in time in respect to the order
wilcox.test(T_log, data$Order, paired = TRUE)
	Wilcoxon signed rank test with continuity correction

data:  T_log and data$Condition
V = 13530, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

	Wilcoxon signed rank test with continuity correction

data:  T_log and data$Order
V = 13530, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
# We'll have a look at the boxplots now:
# For timing data (Logarithmic scale) by Condition (ML or Non-ML)
boxplot(T_log~data$Condition, xlab='Condition 0=Manual, 1=Semi-Automated', ylab="Time in Seconds (Log)", main='Overall timing by condition')
#boxplot(data$Time~data$Condition, xlab='Condition', ylab="Time in Seconds", names=c("N", "M"))
# also by looking at the scatterplot
scatterplot(T_log ~ Condition, data=data)
Eyeballing it, there is a slight downward tendency of time towards the MLR (1.0) condition.
# We also performed a t-test (paired, because of our within-participants design) to
# check robustness
t.test(T_log, data$Condition, paired = TRUE, alternative = "two.sided")
	Paired t-test

data:  T_log and data$Condition
t = 73.028, df = 163, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 5.668718 5.983796
sample estimates:
mean of the differences 
               5.826257 
The t-test also confirms that the result is highly significant: the two groups differ significantly from each other, $t(163) = 73.02, p < 0.001$.
# Before we carry on, let's check whether timing differs across the
# documents. First the box plot:
boxplot(T_log~data$PDF)
# Also looking at the order here. First the box plot:
boxplot(data$Time~data$Order, xlab="Order of the Documents", ylab="Time in Seconds", main="Overall Time by Order")
This nicely shows the increase in time spent on a PDF from the 1st document participants see to the 4th.
# Now the same plot in time (Log scale) and separated by condition
boxplot(T_log~data$Order*data$Condition, xlab='X.0 = Manual condition, X.1 = Semi-Automated condition', ylab="Time in Seconds (Log)", main='Order by Time Separated by Condition')
We can see an almost linear increase in the timing data on the non-machine-learning side (1.0, 2.0, 3.0, 4.0) compared to the machine-learning side. Note that, as this was a within-participants design, fatigue effects are expected and can be seen here in the last boxes (4.0 & 4.1). The timing in the machine-learning condition stays at an almost constant level, except for the expected last box.
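The same pattern can be read numerically by tabulating the median log-time per Order × Condition cell (a sketch in base R):

# Sketch: median log-time for each Order x Condition cell
aggregate(log(Time) ~ Order + Condition, data = data, FUN = median)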
# Again for robustness we also performed a t-test:
t.test(T_log, data$Order, paired = TRUE, alternative = "two.sided")
	Paired t-test

data:  T_log and data$Order
t = 44.277, df = 163, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.655618 3.996895
sample estimates:
mean of the differences 
               3.826257 
The t-test shows that the Order differs significantly in terms of the timing data, $t(163) = 44.28, p < 0.001$.
# creating two groups (ML (1) and NonML (0) )
Annotations <- subset(data, Condition=='1')
NoAnnotations <- subset(data, Condition=='0')
# checking the data
head(Annotations)
 | ParticipantNo | ID | PDF | Order | Time | Condition | CExperience | Tasks | NReviews | AMeanNReviews | ⋯ | AC2 | AL2 | AC3 | AL3 | AC4 | AL4 | Comments | TotalAdded | TotalDeleted | TotalSubmitted |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | -JjAzwjQakZk-3kIbeMfw | U2PXr4iMBf8Kt-leQI8Ii | 1 | 435 | 1 | 1 | 9 | 10+ | 12.5 | ⋯ | Yes | Yes | Yes | Yes | Yes | Yes | C1:2added, C2:2added,C3:3added,C4:1added | 8 | 0 | 20 |
3 | 1 | -JjAzwjQakZk-3kIbeMfw | iHbGgWtrKNfksdoj9Hxv9 | 3 | 297 | 1 | 1 | 9 | 10+ | 12.5 | ⋯ | No | Yes | Yes | Yes | Yes | Yes | C1:1added, C2:1added, C4:1added | 3 | 0 | 15 |
5 | 2 | hNQKTiKtHSY_BmxLyZb9Q | XOCbiBddVQK3BYy1Ox4lI | 1 | 306 | 1 | 1 | 8 | 5to10 | 7.5 | ⋯ | No | Yes | No | Yes | No | Yes | C1:1added | 1 | 0 | 16 |
6 | 2 | hNQKTiKtHSY_BmxLyZb9Q | oOAy7INgRLumTV3vZWtVM | 2 | 127 | 1 | 1 | 8 | 5to10 | 7.5 | ⋯ | No | Yes | No | Yes | No | Yes | 0 | 0 | 15 | |
10 | 3 | pyTvUi2W85Nb9handPSvZ | gD_JugCFB6D5iV-JIOI0p | 2 | 209 | 1 | 1 | 9 | 5to10 | 7.5 | ⋯ | No | Yes | No | Yes | No | Yes | 0 | 0 | 15 | |
12 | 3 | pyTvUi2W85Nb9handPSvZ | 4gIe5sni3WJJCzNglSdo8 | 4 | 756 | 1 | 1 | 9 | 5to10 | 7.5 | ⋯ | Yes | Yes | Yes | Yes | Yes | Yes | C1:1added all left;C2:1added all left,C3:1added,C4:1added | 4 | 0 | 19 |
cat("Time Overall: ", sum(data$Time), '\n')
cat("Overall Mean Time: ", mean(data$Time), '\n')
cat("Overall SD Time: ", sd(data$Time), '\n')
cat('_____________________________________________', '\n')
cat("Time in Annotations: ", sum(Annotations$Time), "Time in NoAnnotations: ", sum(NoAnnotations$Time), '\n')
cat("Mean Annotations: ", mean(Annotations$Time), "Mean NoAnnotations: ", mean(NoAnnotations$Time), '\n')
cat("SD Annotations: ", sd(Annotations$Time), "SD NoAnnotations: ", sd(NoAnnotations$Time), '\n')
cat('_____________________________________________', '\n')
Time Overall:  129454 
Overall Mean Time:  789.3537 
Overall SD Time:  794.4597 
_____________________________________________ 
Time in Annotations:  61913 Time in NoAnnotations:  67541 
Mean Annotations:  755.0366 Mean NoAnnotations:  823.6707 
SD Annotations:  868.4611 SD NoAnnotations:  716.6 
_____________________________________________ 
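The same per-condition descriptives can be produced in a single call each (a sketch using base R's tapply):

# Sketch: group-wise mean and SD of Time by Condition (0 = manual, 1 = semi-automated)
tapply(data$Time, data$Condition, mean)
tapply(data$Time, data$Condition, sd)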
Forty-one participants were recruited. All except four had experience of at least one systematic review, and all but eight were familiar with the Cochrane Risk of Bias tool. Twenty listed more than one task they had contributed to in previous systematic reviews.
A mean of 755 seconds (SD 868) was taken for semi-automated bias assessments and 824 seconds (SD 717) for manual assessments ($p<0.001$). In total, participants spent 129454 seconds ($Mean$=789.35, $SD$=794.46) to complete the study across both conditions, semi-automated and manual.
# Quick look at the data
plotNormalHistogram(data$Time)
# checking the data transformation:
T_tuk = transformTukey(data$Time, plotit=FALSE)
    lambda      W Shapiro.p.value
395  -0.15 0.9907          0.3626

if (lambda >  0){TRANS = x ^ lambda} 
if (lambda == 0){TRANS = log(x)} 
if (lambda <  0){TRANS = -1 * x ^ lambda} 
# plotting the transformed data
plotNormalHistogram(T_tuk)
# Now using the log of the data and plotting it:
plotNormalHistogram(T_log)
The Tukey Ladder of Powers was used to transform the response variable (time) towards a normal distribution. The resulting $\lambda = -0.15$ is close to 0, which confirms that the optimal transformation according to the Tukey Ladder of Powers is (approximately) the logarithm. For the following mixed-model analysis the timing data were therefore transformed to a logarithmic scale.
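Since a $\lambda$ near 0 reduces the ladder to a log transform, one quick cross-check is to run Shapiro-Wilk on the log-times directly (a sketch; this should roughly match the Shapiro p-value reported by transformTukey above):

# Sketch: normality check on the log-transformed times
shapiro.test(log(data$Time))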
A linear mixed-effects model was used to examine the associations between the log-transformed time response, the semi-automated or manual condition, the order in which a document was randomly presented, and self-reported characteristics of the reviewers.
# first add the log time to the data frame
data$logT <- log(data$Time)
# create the primary model: Log(Time) by Condition as fixed effects and ID as random effect:
ml.p = lmer(logT ~ Condition + (1 | ID), data=data)
summary(ml.p)
Linear mixed model fit by REML ['lmerMod']
Formula: logT ~ Condition + (1 | ID)
   Data: data

REML criterion at convergence: 374.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.29905 -0.63291  0.02993  0.55999  2.47915 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.2085   0.4566  
 Residual             0.4277   0.6540  
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.4678     0.1015   63.73
Condition    -0.2832     0.1021   -2.77

Correlation of Fixed Effects:
          (Intr)
Condition -0.503
# looking at coefficients by ID
coef(ml.p)$ID
(Intercept) | Condition | |
---|---|---|
-JjAzwjQakZk-3kIbeMfw | 6.342909 | -0.2831514 |
-O8MP5AR-esmcnFcENyoF | 6.402884 | -0.2831514 |
-YoLEIFk1XZUJo3BgVmlp | 6.723705 | -0.2831514 |
2K-K26QZH9xPJH9RVMltM | 6.419611 | -0.2831514 |
2snvok14kLm_dRIdL67PF | 6.223803 | -0.2831514 |
3CDOco99k3UGgia7A9LQo | 7.275245 | -0.2831514 |
4dkYUV8wlkm_Q5miw0qCJ | 6.865738 | -0.2831514 |
5h9rXuHepqDSmetOKXM0J | 6.557449 | -0.2831514 |
5SAqbCbqZP_q7hVWTEJdL | 5.615216 | -0.2831514 |
8arsi19w5sMwMmpkmM0sX | 6.415119 | -0.2831514 |
C07qGfUuD1L-1IoLjtOXY | 6.572443 | -0.2831514 |
C4xWyqDJrsHcIoHU5GtUG | 7.029055 | -0.2831514 |
Ci3GWDLAOc5ANxkPpNOBc | 6.409193 | -0.2831514 |
cs1qlmKFvdkFjUcXEkJgT | 6.431466 | -0.2831514 |
FfUZhXrHdKSHZ3ON2_Dty | 6.667310 | -0.2831514 |
fQoBVJ60uF5ehuYNAg4v2 | 6.417978 | -0.2831514 |
FXvg7iTf7P1bl-Z9dHKPO | 6.431382 | -0.2831514 |
GR96RbbiPLL4gS3-ilxky | 7.259352 | -0.2831514 |
hNQKTiKtHSY_BmxLyZb9Q | 6.227614 | -0.2831514 |
j-6IWj13fqVcUXKjFCo7y | 6.202480 | -0.2831514 |
jBFO8AR45EErIQ5grz5n- | 6.324240 | -0.2831514 |
jYha2hNra2RfUfgXnVaj1 | 6.436849 | -0.2831514 |
KlKCr9O8A-8WK6gr2vK9k | 6.455434 | -0.2831514 |
KtfB3jYg9jsg0osJkMWVG | 5.920122 | -0.2831514 |
L-KBZ1ZC0lK2hQlDSUCgH | 6.018265 | -0.2831514 |
lllK6Q_LLhHbw7bTJ1v51 | 6.135699 | -0.2831514 |
pyTvUi2W85Nb9handPSvZ | 6.191651 | -0.2831514 |
Rh7QrcKwEzMEORahevriS | 6.846711 | -0.2831514 |
S2oI3sQr3HnSJ7rgbaMJR | 6.692770 | -0.2831514 |
sLUZSWii-lFp5oEFHCd7f | 6.310985 | -0.2831514 |
to4PqjXHBIG3bKQhv9cHr | 6.173613 | -0.2831514 |
tRdDoOfbjIQw3tUkGktky | 6.736527 | -0.2831514 |
TTtTg_voG4fV-uDraAWC3 | 6.302266 | -0.2831514 |
UbIO7NW7RAUjVfe3_VJ48 | 6.232454 | -0.2831514 |
uq4HMVa0zTYYknq8_4SOc | 7.464664 | -0.2831514 |
UyqDH9vL0gK918t2bEj0z | 6.266225 | -0.2831514 |
V9_3tiOUC-jab7xnPK1HK | 6.569870 | -0.2831514 |
WpFHCjeR-3YdyEJJmCT2Y | 6.208212 | -0.2831514 |
ynh1N1_BZO8W4d0_gaMDy | 6.345564 | -0.2831514 |
ZAj9Y0K4px17TAcPf3uYG | 6.092636 | -0.2831514 |
ZXpghXhauDD3Vpv76PGoj | 6.966419 | -0.2831514 |
# the fixed-effect estimates
coef(summary(ml.p))[,"Estimate"]
# establishing the confidence intervals:
confint(ml.p)
Computing profile confidence intervals ...
2.5 % | 97.5 % | |
---|---|---|
.sig01 | 0.3074827 | 0.62121739 |
.sigma | 0.5777147 | 0.74206725 |
(Intercept) | 6.2682066 | 6.66745826 |
Condition | -0.4840935 | -0.08220934 |
The confidence intervals are -0.48 to -0.08; back-transformed to the time-ratio scale, exp(-0.48) ≈ 0.62 (62%) and exp(-0.08) ≈ 0.92 (92%).
# mean time ratio: exp(-0.28) ≈ 0.75, i.e. a speed-up of about 1 - 0.75 = 25%
exp(-.28)
The primary model ($m_p$) took the log-transformed time and the condition (semi-automated or manual) as fixed effects and the individuals as a random effect. Participants performing bias assessments in the semi-automated condition were on average 25% quicker than participants in the manual condition (95% CI: times of 62% to 92% of the manual condition).
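The back-transformation behind these percentages can be made explicit (a sketch; the coefficient and CI values are copied from the model output above):

# Sketch: back-transform the Condition effect from the log-time scale
est <- -0.2832                       # fixed effect from summary(ml.p)
ci  <- c(-0.4840935, -0.08220934)    # profile CI from confint(ml.p)
exp(est)   # ~0.75: semi-automated assessments take ~75% of the manual time
exp(ci)    # ~0.62 to ~0.92: 95% CI on the time ratio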
In addition, model $m_1$ also took the Order as a fixed effect and the document as a random effect, whereas $m_2$ accounted only for the document as a random effect, in order to examine the importance of the Order in the model. A close examination of $m_1$ showed a random intercept for every PDF. The random-effect variance of the documents (0.02) is about one tenth of that of the individuals (0.24), so there is more variance across participants than across documents. The subsequent likelihood-ratio test showed that the Order was highly significant, $\chi^2$(1) = 42.27, $p<$ 0.001. Therefore the Order was kept in the following models.
# What is the effect of the PDF, Order, Condition and Person?
ml.1 = lmer(logT ~ Order + Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)
ml.2 = lmer(logT ~ Condition + (1|ID) + (1|PDF), data=data, REML=FALSE)
summary(ml.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   338.2    356.8   -163.1    326.2      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.05956 -0.44213 -0.01847  0.60865  2.15485 

Random effects:
 Groups   Name        Variance Std.Dev.
 PDF      (Intercept) 0.02466  0.1571  
 ID       (Intercept) 0.23839  0.4883  
 Residual             0.27691  0.5262  
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.691       
Condition -0.319  0.012
summary(ml.2)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   378.5    394.0   -184.2    368.5      159 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.29806 -0.63552  0.02841  0.55830  2.49665 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 PDF      (Intercept) 2.306e-15 4.802e-08
 ID       (Intercept) 2.016e-01 4.490e-01
 Residual             4.243e-01 6.514e-01
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)   6.4678     0.1005   64.38
Condition    -0.2832     0.1017   -2.78

Correlation of Fixed Effects:
          (Intr)
Condition -0.506
# Likelihood-Ratio analysis (ANOVA) to see if the Order is important
anova(ml.1, ml.2)
Df | AIC | BIC | logLik | deviance | Chisq | Chi Df | Pr(>Chisq) | |
---|---|---|---|---|---|---|---|---|
ml.2 | 5 | 378.4641 | 393.9634 | -184.2321 | 368.4641 | NA | NA | NA |
ml.1 | 6 | 338.1962 | 356.7954 | -163.0981 | 326.1962 | 42.26793 | 1 | 7.958708e-11 |
This tells us (and confirms the previous overall results) to keep the order:
the likelihood-ratio analysis (ANOVA) between $m_1$ and $m_2$ shows that the order is highly significant ($p<0.001$),
which is an indication to keep the order in the model.
Let's again take a look at model $m_1$:
summary(ml.1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID) + (1 | PDF)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   338.2    356.8   -163.1    326.2      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.05956 -0.44213 -0.01847  0.60865  2.15485 

Random effects:
 Groups   Name        Variance Std.Dev.
 PDF      (Intercept) 0.02466  0.1571  
 ID       (Intercept) 0.23839  0.4883  
 Residual             0.27691  0.5262  
Number of obs: 164, groups:  PDF, 46; ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78065    0.13809   41.86
Order        0.27151    0.03781    7.18
Condition   -0.27389    0.08498   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.691       
Condition -0.319  0.012
We have a random intercept for every PDF. The random-effect variance of PDF is about one tenth of the variance of ID, so there is more variance across people than across PDFs (documents).
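That ratio can also be computed programmatically from the variance components (a sketch; as.data.frame(VarCorr(...)) is standard lme4):

# Sketch: extract variance components and compute the PDF:ID ratio
vc <- as.data.frame(VarCorr(ml.1))
vc
vc$vcov[vc$grp == "PDF"] / vc$vcov[vc$grp == "ID"]   # roughly 0.1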
Let's examine a model $m_3$ (with the PDF random effect removed):
ml.3 = lmer(logT ~ Order + Condition + (1|ID), data=data, REML=FALSE)
# again likelihood-ratio analysis (ANOVA) now looking at the difference between m_1 and m_3 :
anova(ml.1, ml.3)
Df | AIC | BIC | logLik | deviance | Chisq | Chi Df | Pr(>Chisq) | |
---|---|---|---|---|---|---|---|---|
ml.3 | 5 | 336.9199 | 352.4193 | -163.4600 | 326.9199 | NA | NA | NA |
ml.1 | 6 | 338.1962 | 356.7954 | -163.0981 | 326.1962 | 0.723746 | 1 | 0.3949179 |
Model $m_3$ dropped the document random effect in order to see whether it had an influence. The likelihood-ratio comparison with $m_1$ showed no significant difference, $\chi^2$(1) = 0.72, $p=$ 0.39, which suggests there is little variability from documents and that ignoring it does not detract from the explanatory potential of the model.
There is no significance! This means we do not care much about the PDF random effect.
In fact, we could lose the PDF random effect without destroying the known universe.
So let's see the random-effects coefficients:
coef(ml.3)
$ID
                      (Intercept)     Order  Condition
-JjAzwjQakZk-3kIbeMfw    5.646676 0.2701334 -0.2765628
-O8MP5AR-esmcnFcENyoF    5.715102 0.2701334 -0.2765628
-YoLEIFk1XZUJo3BgVmlp    6.081139 0.2701334 -0.2765628
2K-K26QZH9xPJH9RVMltM    5.734187 0.2701334 -0.2765628
2snvok14kLm_dRIdL67PF    5.510783 0.2701334 -0.2765628
3CDOco99k3UGgia7A9LQo    6.710410 0.2701334 -0.2765628
4dkYUV8wlkm_Q5miw0qCJ    6.243188 0.2701334 -0.2765628
5h9rXuHepqDSmetOKXM0J    5.891452 0.2701334 -0.2765628
5SAqbCbqZP_q7hVWTEJdL    4.816425 0.2701334 -0.2765628
8arsi19w5sMwMmpkmM0sX    5.729063 0.2701334 -0.2765628
C07qGfUuD1L-1IoLjtOXY    5.908558 0.2701334 -0.2765628
C4xWyqDJrsHcIoHU5GtUG    6.429523 0.2701334 -0.2765628
Ci3GWDLAOc5ANxkPpNOBc    5.722301 0.2701334 -0.2765628
cs1qlmKFvdkFjUcXEkJgT    5.747713 0.2701334 -0.2765628
FfUZhXrHdKSHZ3ON2_Dty    6.016796 0.2701334 -0.2765628
fQoBVJ60uF5ehuYNAg4v2    5.732324 0.2701334 -0.2765628
FXvg7iTf7P1bl-Z9dHKPO    5.747617 0.2701334 -0.2765628
GR96RbbiPLL4gS3-ilxky    6.692276 0.2701334 -0.2765628
hNQKTiKtHSY_BmxLyZb9Q    5.515131 0.2701334 -0.2765628
j-6IWj13fqVcUXKjFCo7y    5.486455 0.2701334 -0.2765628
jBFO8AR45EErIQ5grz5n-    5.625375 0.2701334 -0.2765628
jYha2hNra2RfUfgXnVaj1    5.753855 0.2701334 -0.2765628
KlKCr9O8A-8WK6gr2vK9k    5.775059 0.2701334 -0.2765628
KtfB3jYg9jsg0osJkMWVG    5.164302 0.2701334 -0.2765628
L-KBZ1ZC0lK2hQlDSUCgH    5.276278 0.2701334 -0.2765628
lllK6Q_LLhHbw7bTJ1v51    5.410262 0.2701334 -0.2765628
pyTvUi2W85Nb9handPSvZ    5.474099 0.2701334 -0.2765628
Rh7QrcKwEzMEORahevriS    6.221480 0.2701334 -0.2765628
S2oI3sQr3HnSJ7rgbaMJR    6.045844 0.2701334 -0.2765628
sLUZSWii-lFp5oEFHCd7f    5.610252 0.2701334 -0.2765628
to4PqjXHBIG3bKQhv9cHr    5.453520 0.2701334 -0.2765628
tRdDoOfbjIQw3tUkGktky    6.095768 0.2701334 -0.2765628
TTtTg_voG4fV-uDraAWC3    5.600304 0.2701334 -0.2765628
UbIO7NW7RAUjVfe3_VJ48    5.520653 0.2701334 -0.2765628
uq4HMVa0zTYYknq8_4SOc    6.926525 0.2701334 -0.2765628
UyqDH9vL0gK918t2bEj0z    5.559183 0.2701334 -0.2765628
V9_3tiOUC-jab7xnPK1HK    5.905622 0.2701334 -0.2765628
WpFHCjeR-3YdyEJJmCT2Y    5.492995 0.2701334 -0.2765628
ynh1N1_BZO8W4d0_gaMDy    5.649705 0.2701334 -0.2765628
ZAj9Y0K4px17TAcPf3uYG    5.361130 0.2701334 -0.2765628
ZXpghXhauDD3Vpv76PGoj    6.358060 0.2701334 -0.2765628

attr(,"class")
[1] "coef.mer"
summary(ml.3)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Order + Condition + (1 | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   336.9    352.4   -163.5    326.9      159 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.19339 -0.52439 -0.01335  0.53696  2.27474 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.2320   0.4817  
 Residual             0.3027   0.5501  
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.78920    0.13663   42.37
Order        0.27013    0.03843    7.03
Condition   -0.27656    0.08592   -3.22

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.707       
Condition -0.322  0.011
The variance of the per-person intercepts is 0.23.
Do different people have a different coefficient when it comes to the machine learning (a random slope)?
One option to examine this is to also include the condition as a random effect: there is an overall mean effect of the machine learning, but how variable does it seem to be across people?
# including the condition also as random effect in model m_3
ml.3 = lmer(logT ~ Order + Condition + (Condition | ID), data=data, REML=FALSE)
summary(ml.3)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Order + Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   328.0    349.7   -157.0    314.0      157 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3985 -0.4850  0.0052  0.5775  2.2962 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1487   0.3856       
          Condition   0.2000   0.4473   0.29
 Residual             0.2361   0.4859       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.76592    0.12264   47.01
Order        0.27940    0.03678    7.60
Condition   -0.27634    0.10314   -2.68

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.753       
Condition -0.138  0.009
There seems to be variability in the slope, so different people seem to react differently to the machine-learning condition. The residual variance has also dropped between the first $m_3$ and the updated $m_3$: it was 0.30 in the first $m_3$ and is 0.24 in the updated $m_3$. Perhaps this model explains a bit more. Let's take a look at the coefficients:
# coefficients by ID
coef(ml.3)$ID
(Intercept) | Order | Condition | |
---|---|---|---|
-JjAzwjQakZk-3kIbeMfw | 5.645644 | 0.2794002 | -0.31868291 |
-O8MP5AR-esmcnFcENyoF | 5.689063 | 0.2794002 | -0.24591965 |
-YoLEIFk1XZUJo3BgVmlp | 6.018559 | 0.2794002 | -0.21255577 |
2K-K26QZH9xPJH9RVMltM | 5.758266 | 0.2794002 | -0.43458312 |
2snvok14kLm_dRIdL67PF | 5.533983 | 0.2794002 | -0.37010571 |
3CDOco99k3UGgia7A9LQo | 6.471687 | 0.2794002 | 0.25956400 |
4dkYUV8wlkm_Q5miw0qCJ | 6.085542 | 0.2794002 | 0.09094654 |
5h9rXuHepqDSmetOKXM0J | 5.825058 | 0.2794002 | -0.14658113 |
5SAqbCbqZP_q7hVWTEJdL | 5.065488 | 0.2794002 | -1.00634718 |
8arsi19w5sMwMmpkmM0sX | 5.665417 | 0.2794002 | -0.11211088 |
C07qGfUuD1L-1IoLjtOXY | 5.907933 | 0.2794002 | -0.39198176 |
C4xWyqDJrsHcIoHU5GtUG | 6.114986 | 0.2794002 | 0.61406152 |
Ci3GWDLAOc5ANxkPpNOBc | 5.721248 | 0.2794002 | -0.33934149 |
cs1qlmKFvdkFjUcXEkJgT | 5.696203 | 0.2794002 | -0.16163754 |
FfUZhXrHdKSHZ3ON2_Dty | 5.907440 | 0.2794002 | -0.02371297 |
fQoBVJ60uF5ehuYNAg4v2 | 5.799538 | 0.2794002 | -0.59194148 |
FXvg7iTf7P1bl-Z9dHKPO | 5.716290 | 0.2794002 | -0.23548329 |
GR96RbbiPLL4gS3-ilxky | 6.409160 | 0.2794002 | 0.42701709 |
hNQKTiKtHSY_BmxLyZb9Q | 5.593803 | 0.2794002 | -0.57432025 |
j-6IWj13fqVcUXKjFCo7y | 5.540374 | 0.2794002 | -0.47586129 |
jBFO8AR45EErIQ5grz5n- | 5.589154 | 0.2794002 | -0.18404910 |
jYha2hNra2RfUfgXnVaj1 | 5.650462 | 0.2794002 | 0.02656450 |
KlKCr9O8A-8WK6gr2vK9k | 5.710093 | 0.2794002 | -0.11988919 |
KtfB3jYg9jsg0osJkMWVG | 5.223300 | 0.2794002 | -0.40611143 |
L-KBZ1ZC0lK2hQlDSUCgH | 5.348445 | 0.2794002 | -0.48501901 |
lllK6Q_LLhHbw7bTJ1v51 | 5.418836 | 0.2794002 | -0.28901104 |
pyTvUi2W85Nb9handPSvZ | 5.496255 | 0.2794002 | -0.35622486 |
Rh7QrcKwEzMEORahevriS | 6.114006 | 0.2794002 | -0.08672533 |
S2oI3sQr3HnSJ7rgbaMJR | 5.908495 | 0.2794002 | 0.07077306 |
sLUZSWii-lFp5oEFHCd7f | 5.602679 | 0.2794002 | -0.28475114 |
to4PqjXHBIG3bKQhv9cHr | 5.475240 | 0.2794002 | -0.34898562 |
tRdDoOfbjIQw3tUkGktky | 6.095288 | 0.2794002 | -0.44385065 |
TTtTg_voG4fV-uDraAWC3 | 5.697105 | 0.2794002 | -0.66402711 |
UbIO7NW7RAUjVfe3_VJ48 | 5.689982 | 0.2794002 | -0.90763414 |
uq4HMVa0zTYYknq8_4SOc | 6.665880 | 0.2794002 | 0.28053689 |
UyqDH9vL0gK918t2bEj0z | 5.595343 | 0.2794002 | -0.43080896 |
V9_3tiOUC-jab7xnPK1HK | 5.849366 | 0.2794002 | -0.18756661 |
WpFHCjeR-3YdyEJJmCT2Y | 5.609388 | 0.2794002 | -0.70630849 |
ynh1N1_BZO8W4d0_gaMDy | 5.831413 | 0.2794002 | -0.98833027 |
ZAj9Y0K4px17TAcPf3uYG | 5.459037 | 0.2794002 | -0.60249247 |
ZXpghXhauDD3Vpv76PGoj | 6.207454 | 0.2794002 | 0.03368090 |
# histogram about those conditions:
hist(coef(ml.3)$ID[,"Condition"])
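A complementary summary of that histogram is the share of participants whose fitted slope is positive, i.e. who were slower with the machine-learning annotations (a sketch):

# Sketch: proportion of participants with a positive Condition slope
mean(coef(ml.3)$ID[, "Condition"] > 0)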
So some people seem to do it faster and some slower: there is some heterogeneity. Model $m_4$ below refits the same random-slope specification (Condition | ID), so the likelihood-ratio comparison should show no difference:
ml.4 = lmer(logT ~ Order + Condition + (Condition|ID), data=data, REML=FALSE)
# again likelihood-ratio (ANOVA) to see the differences between models:
anova(ml.4, ml.3)
Df | AIC | BIC | logLik | deviance | Chisq | Chi Df | Pr(>Chisq) | |
---|---|---|---|---|---|---|---|---|
ml.4 | 7 | 328.0184 | 349.7175 | -157.0092 | 314.0184 | NA | NA | NA |
ml.3 | 7 | 328.0184 | 349.7175 | -157.0092 | 314.0184 | 0 | 0 | 1 |
Not significant: $\chi^2$(0) = 0.0, $p=$ 1 — as expected, since the two model specifications are identical.
# Do we need the Order if we have the condition as a random effect?
ml.5 = lmer(logT ~ Condition + (Condition|ID), data=data, REML=FALSE)
# Differences in models:
anova(ml.4, ml.5)
Df | AIC | BIC | logLik | deviance | Chisq | Chi Df | Pr(>Chisq) | |
---|---|---|---|---|---|---|---|---|
ml.5 | 6 | 372.7248 | 391.3240 | -180.3624 | 360.7248 | NA | NA | NA |
ml.4 | 7 | 328.0184 | 349.7175 | -157.0092 | 314.0184 | 46.70645 | 1 | 8.245712e-12 |
Yes, significant! The order is still important: $\chi^2$(1) = 46.70, $p<$ 0.001.
# examining model m_4
summary(ml.4)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Order + Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   328.0    349.7   -157.0    314.0      157 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3985 -0.4850  0.0052  0.5775  2.2962 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1487   0.3856       
          Condition   0.2000   0.4473   0.29
 Residual             0.2361   0.4859       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  5.76592    0.12264   47.01
Order        0.27940    0.03678    7.60
Condition   -0.27634    0.10314   -2.68

Correlation of Fixed Effects:
          (Intr) Order 
Order     -0.753       
Condition -0.138  0.009
# examining model m_5
summary(ml.5)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logT ~ Condition + (Condition | ID)
   Data: data

     AIC      BIC   logLik deviance df.resid 
   372.7    391.3   -180.4    360.7      158 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.34127 -0.62213  0.05227  0.57998  2.21132 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 0.1069   0.3270       
          Condition   0.1603   0.4003   0.52
 Residual             0.3708   0.6090       
Number of obs: 164, groups:  ID, 41

Fixed effects:
            Estimate Std. Error t value
(Intercept)  6.46783    0.08444   76.60
Condition   -0.28315    0.11381   -2.49

Correlation of Fixed Effects:
          (Intr)
Condition -0.298
Interestingly, both models estimate a similar random-slope variance for the condition (a hint of collinearity?). However, the order is not exactly random: each participant sees two documents in each condition, in a random ordering, so every participant is guaranteed two ML and two non-ML documents.
After removing the order in model $m_5$, the Condition estimate is roughly the same as before: -0.276 in $m_4$ versus -0.283 in $m_5$.
# Some descriptives
# TOTAL first (Annotations & NonAnnotations together)
cat('____________TOTAL DATA:______________________', '\n')
cat("Total Annotations added:", sum(data$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(data$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(data$TotalAdded), '\n')
cat('_____________________________________________', '\n')
cat("Total Annotations deleted:", sum(data$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(data$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(data$TotalDeleted), '\n')
cat('\n')
cat('____________MACHINE LEARNING:________________', '\n')
# Now only for Machine-Learning Condition
cat("Total Annotations added:", sum(Annotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(Annotations$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(Annotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')
cat("Total Annotations deleted:", sum(Annotations$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(Annotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(Annotations$TotalDeleted), '\n')
cat('\n')
cat('____________NON-MACHINE LEARNING: ___________', '\n')
# And for the Non-Machine-Learning Condition
cat("Total Annotations added:", sum(NoAnnotations$TotalAdded), '\n')
cat("Mean Total Annotations added:", mean(NoAnnotations$TotalAdded), '\n')
cat("SD Total Annotations added:", sd(NoAnnotations$TotalAdded), '\n')
cat('_____________________________________________', '\n')
cat("Total Annotations deleted:", sum(NoAnnotations$TotalDeleted), '\n')
cat("Mean Total Annotations deleted:", mean(NoAnnotations$TotalDeleted), '\n')
cat("SD Total Annotations deleted:", sd(NoAnnotations$TotalDeleted), '\n')
____________TOTAL DATA:______________________ 
Total Annotations added: 486 
Mean Total Annotations added: 2.963415 
SD Total Annotations added: 3.033333 
_____________________________________________ 
Total Annotations deleted: 127 
Mean Total Annotations deleted: 0.7743902 
SD Total Annotations deleted: 1.957607 

____________MACHINE LEARNING:________________ 
Total Annotations added: 103 
Mean Total Annotations added: 1.256098 
SD Total Annotations added: 1.755492 
_____________________________________________ 
Total Annotations deleted: 127 
Mean Total Annotations deleted: 1.54878 
SD Total Annotations deleted: 2.549037 

____________NON-MACHINE LEARNING: ___________ 
Total Annotations added: 383 
Mean Total Annotations added: 4.670732 
SD Total Annotations added: 3.087429 
_____________________________________________ 
Total Annotations deleted: 0 
Mean Total Annotations deleted: 0 
SD Total Annotations deleted: 0 
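The same per-condition summaries can be produced in one aggregate call (a sketch in base R):

# Sketch: sum, mean and SD of added/deleted annotations by Condition
aggregate(cbind(TotalAdded, TotalDeleted) ~ Condition, data = data,
          FUN = function(x) c(sum = sum(x), mean = mean(x), sd = sd(x)))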
# using a subset of the above main data here:
data_s <- read.xls("data/subset_selfreported.xlsx", verbose=FALSE, na.strings=c("NA"))
# and filtering out duplicates
data_s2 <- unique(data_s)
# quick check on the data
head(data_s2)
ID | CExperience | Tasks | NReviews | |
---|---|---|---|---|
1 | -JjAzwjQakZk-3kIbeMfw | 1 | 9 | 10+ |
5 | hNQKTiKtHSY_BmxLyZb9Q | 1 | 8 | 5to10 |
9 | pyTvUi2W85Nb9handPSvZ | 1 | 9 | 5to10 |
13 | 2K-K26QZH9xPJH9RVMltM | 1 | 9 | 10+ |
17 | 8arsi19w5sMwMmpkmM0sX | 1 | 9 | 10+ |
21 | GR96RbbiPLL4gS3-ilxky | 1 | 9 | 1to5 |
# Calculating Descriptive Self-reported characteristics data:
NumberOfReviews0 <- sum(data_s2$NReviews=='0')
NumberOfReviews1_5 <- sum(data_s2$NReviews=='1to5')
NumberOfReviews5_0 <- sum(data_s2$NReviews=='5to10')
NumberOfReviews10plus <- sum(data_s2$NReviews=='10+')
NumberOfReviews0
NumberOfReviews1_5
NumberOfReviews5_0
NumberOfReviews10plus
NumberOfReviews0_percentage <- NumberOfReviews0/41
NumberOfReviews1_5_percentage <- NumberOfReviews1_5/41
NumberOfReviews5_0_percentage <- NumberOfReviews5_0/41
NumberOfReviews10plus_percentage <- NumberOfReviews10plus/41
NumberOfReviews0_percentage
NumberOfReviews1_5_percentage
NumberOfReviews5_0_percentage
NumberOfReviews10plus_percentage
TotalSumOfPeopleWithCochraneExperience <- sum(data_s2$CExperience=='1')
TotalSumOfPeopleWithoutCochraneExperience <- sum(data_s2$CExperience=='0')
TotalSumOfPeopleWithCochraneExperience
TotalSumOfPeopleWithoutCochraneExperience
Percentage_TotalSumOfPeopleWithCochraneExperience <- TotalSumOfPeopleWithCochraneExperience/41
Percentage_TotalSumOfPeopleWithoutCochraneExperience <- TotalSumOfPeopleWithoutCochraneExperience/41
Percentage_TotalSumOfPeopleWithCochraneExperience
Percentage_TotalSumOfPeopleWithoutCochraneExperience
MedianOfTasksPerformed <- median(data_s2$Tasks)
QuartilesOfTasksPerformed <- quantile(data_s2$Tasks)
MedianOfTasksPerformed
QuartilesOfTasksPerformed
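The counts and percentages above can also be obtained in one step (a sketch using table and prop.table):

# Sketch: counts and proportions of the NReviews categories
table(data_s2$NReviews)
prop.table(table(data_s2$NReviews))
# and for Cochrane RoB experience:
prop.table(table(data_s2$CExperience))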
# Importing the data
data_agreement <- read.xls("data/agreement.xlsx", verbose=FALSE, na.strings=c("NA"))
# creating two groups (ML (1) and NonML (0) )
ML <- subset(data_agreement, Condition=='1')
NoML <- subset(data_agreement, Condition=='0')
# checking how many data points (rows) are in the ML group:
count(ML)
n |
---|
82 |
# calculating the changed data and percentages per risk-of-bias domain
# (RSG = random sequence generation, AC = allocation concealment,
#  BPP = blinding of participants/personnel, BOA = blinding of outcome assessment):
RSG <- sum(ML$Changed)
cat("RSG changed:", RSG, "RSG %:", RSG/82, '\n')
AC <- sum(ML$Changed.1)
cat("AC changed:", AC, "AC %:", AC/82, '\n')
BPP <- sum(ML$Changed.2)
cat("BPP changed:", BPP, "BPP %:", BPP/82, '\n')
BOA <- sum(ML$Changed.3)
cat("BOA changed:", BOA, "BOA %:", BOA/82, '\n')
Overall <- RSG+AC+BPP+BOA
cat("Overall changed:", Overall, "RSG %:", Overall/328, '\n')
RSG changed: 7 RSG %: 0.08536585 
AC changed: 7 AC %: 0.08536585 
BPP changed: 6 BPP %: 0.07317073 
BOA changed: 7 BOA %: 0.08536585 
Overall changed: 27 Overall %: 0.08231707 
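A more compact version of the same tally (a sketch; the Changed columns are as read in from agreement.xlsx above):

# Sketch: per-domain change counts and proportions in one step
changed <- ML[, c("Changed", "Changed.1", "Changed.2", "Changed.3")]
colSums(changed)
colSums(changed) / nrow(ML)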
# Overall mean annotations
data <- read.xls("data/TimeAnalysis2_1.xlsx", verbose=FALSE, na.strings=c("NA"))
data2 <- subset(data, Condition=='1')
data3 <- subset(data, Condition=='0')
mean(data2$TotalSubmitted)
mean(data3$TotalSubmitted)
##############
# annotations
data <- read.xls("data/annotationschanged.xlsx", verbose=FALSE, na.strings=c("NA"))
data2 <- subset(data, Condition=='1')
data3 <- subset(data, Condition=='0')
RSG1 <- sum(data2$Changed1=='0')
RSG2 <- sum(data2$Changed1=='1')
RSG3 <- sum(data2$Changed1=='2')
RSG1/82
RSG2/82
RSG3/82
AC1 <- sum(data2$Changed2=='0')
AC2 <- sum(data2$Changed2=='1')
AC3 <- sum(data2$Changed2=='2')
AC1/82
AC2/82
AC3/82
BPP1 <- sum(data2$Changed3=='0')
BPP2 <- sum(data2$Changed3=='1')
BPP3 <- sum(data2$Changed3=='2')
BPP1/82
BPP2/82
BPP3/82
BOA1 <- sum(data2$Changed4=='0')
BOA2 <- sum(data2$Changed4=='1')
BOA3 <- sum(data2$Changed4=='2')
BOA1/82
BOA2/82
BOA3/82
TotalUnchanged <- RSG1+AC1+BPP1+BOA1
TotalUnchanged/328
TotalChangedML <- RSG2+AC2+BPP2+BOA2
TotalChangedML/328
TotalChangedNoML <- RSG3+AC3+BPP3+BOA3
TotalChangedNoML/328
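The block of repeated sums above can be collapsed into a single tally (a sketch, assuming the Changed1..Changed4 columns read in above):

# Sketch: counts of the values 0/1/2 for each Changed column
sapply(paste0("Changed", 1:4),
       function(col) table(factor(data2[[col]], levels = 0:2)))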
# loading the data:
dataq <- read.xls("data/UXData1.xlsx", verbose=FALSE, na.strings=c("NA"))
# subsetting
data_rel <- as.data.frame(dataq)
ss2 <- c(1,3,4, 25:44)
data_q2 <- subset(data_rel, select=ss2)
# checking the data
head(data_q2)
ParticipantNo | Sequence | Condition | CapacityReview | NoOfTasksPerformed | HowManyReviews | CochraneRoBExp | UseFrequently | Complex | EasyToUse | ⋯ | UseQuickly | Cumbersome | Confident | NeededLearn | TextHelpful | DifficultToNav | ImproveQuality | Irrelevant | Confused | ContinueUse |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1010 | A | develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review | 9 | 10+ | Yes | 5 | 1 | 4 | ⋯ | 5 | 1 | 5 | 1 | 4 | 1 | 4 | 3 | 2 | 5 |
1 | 1010 | NOA | NA | NA | NA | NA | ⋯ | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||
2 | 1100 | A | planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review | 8 | 5to10 | Yes | 5 | 1 | 4 | ⋯ | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 3 | 1 | 5 |
2 | 1100 | NOA | NA | NA | NA | NA | ⋯ | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||
3 | 101 | A | develop questions; planning methods or write and publish protocols; develop and run search; select studies; collect data; assess RoB; analyse data; interprete findings; write and publish review | 9 | 5to10 | Yes | 5 | 4 | 5 | ⋯ | 5 | 2 | 2 | 2 | 5 | 1 | 5 | 2 | 2 | 5 |
3 | 101 | NOA | NA | NA | NA | NA | ⋯ | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
ss3 <- c(1, 3, 8:23)
clean_q2 <- subset(data_q2, select=ss3)
head(clean_q2)
ParticipantNo | Condition | UseFrequently | Complex | EasyToUse | NeedSupport | WellIntegrated | Inconsistency | UseQuickly | Cumbersome | Confident | NeededLearn | TextHelpful | DifficultToNav | ImproveQuality | Irrelevant | Confused | ContinueUse |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | A | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 | 4 | 1 | 4 | 3 | 2 | 5 |
1 | NOA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
2 | A | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 3 | 1 | 5 |
2 | NOA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
3 | A | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 | 5 | 1 | 5 | 2 | 2 | 5 |
3 | NOA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
clean_q2 <- subset(clean_q2, Condition=="A")
head(clean_q2)
ParticipantNo | Condition | UseFrequently | Complex | EasyToUse | NeedSupport | WellIntegrated | Inconsistency | UseQuickly | Cumbersome | Confident | NeededLearn | TextHelpful | DifficultToNav | ImproveQuality | Irrelevant | Confused | ContinueUse | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | A | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 | 4 | 1 | 4 | 3 | 2 | 5 |
3 | 2 | A | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 3 | 1 | 5 |
5 | 3 | A | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 | 5 | 1 | 5 | 2 | 2 | 5 |
7 | 4 | A | 5 | 2 | 5 | 4 | 5 | 2 | 5 | 5 | 4 | 2 | 5 | 2 | 4 | 2 | 2 | 5 |
9 | 5 | A | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 1 | 4 | 1 | 3 | 1 | 3 | 3 | 2 | 4 |
11 | 6 | A | 5 | 3 | 3 | 3 | 2 | 4 | 5 | 2 | 1 | 1 | 2 | 5 | 5 | 1 | 2 | 5 |
clean_q2_A <- subset(clean_q2, Condition=='A')
clean_q2_NOA <- subset(clean_q2, Condition=='NOA')
head(clean_q2_A)
ParticipantNo | Condition | UseFrequently | Complex | EasyToUse | NeedSupport | WellIntegrated | Inconsistency | UseQuickly | Cumbersome | Confident | NeededLearn | TextHelpful | DifficultToNav | ImproveQuality | Irrelevant | Confused | ContinueUse | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | A | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 | 4 | 1 | 4 | 3 | 2 | 5 |
3 | 2 | A | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 3 | 1 | 5 |
5 | 3 | A | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 | 5 | 1 | 5 | 2 | 2 | 5 |
7 | 4 | A | 5 | 2 | 5 | 4 | 5 | 2 | 5 | 5 | 4 | 2 | 5 | 2 | 4 | 2 | 2 | 5 |
9 | 5 | A | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 1 | 4 | 1 | 3 | 1 | 3 | 3 | 2 | 4 |
11 | 6 | A | 5 | 3 | 3 | 3 | 2 | 4 | 5 | 2 | 1 | 1 | 2 | 5 | 5 | 1 | 2 | 5 |
str(clean_q2_A)
'data.frame':	41 obs. of  18 variables:
 $ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Condition     : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...
 $ UseFrequently : int  5 5 5 5 5 5 4 1 2 4 ...
 $ Complex       : int  1 1 4 2 2 3 1 3 2 1 ...
 $ EasyToUse     : int  4 4 5 5 5 3 5 4 2 4 ...
 $ NeedSupport   : int  1 1 2 4 2 3 2 1 1 1 ...
 $ WellIntegrated: int  4 5 5 5 5 2 4 4 4 4 ...
 $ Inconsistency : int  2 1 2 2 1 4 3 1 3 1 ...
 $ UseQuickly    : int  5 5 5 5 4 5 4 3 3 4 ...
 $ Cumbersome    : int  1 2 2 5 1 2 2 4 2 1 ...
 $ Confident     : int  5 5 2 4 4 1 4 2 3 4 ...
 $ NeededLearn   : int  1 2 2 2 1 1 2 2 3 1 ...
 $ TextHelpful   : int  4 5 5 5 3 2 3 5 4 3 ...
 $ DifficultToNav: int  1 1 1 2 1 5 2 3 1 1 ...
 $ ImproveQuality: int  4 4 5 4 3 5 3 3 5 2 ...
 $ Irrelevant    : int  3 3 2 2 3 1 4 2 2 4 ...
 $ Confused      : int  2 1 2 2 2 2 3 2 2 4 ...
 $ ContinueUse   : int  5 5 5 5 4 5 4 4 2 4 ...
# need to change this to factors!
clean_q2_A$UseFrequently = factor(clean_q2_A$UseFrequently,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Complex = factor(clean_q2_A$Complex,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$EasyToUse = factor(clean_q2_A$EasyToUse,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$NeedSupport = factor(clean_q2_A$NeedSupport,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$WellIntegrated = factor(clean_q2_A$WellIntegrated,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Inconsistency = factor(clean_q2_A$Inconsistency,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$UseQuickly = factor(clean_q2_A$UseQuickly,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Cumbersome = factor(clean_q2_A$Cumbersome,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Confident = factor(clean_q2_A$Confident,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$NeededLearn = factor(clean_q2_A$NeededLearn,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$TextHelpful = factor(clean_q2_A$TextHelpful,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$DifficultToNav = factor(clean_q2_A$DifficultToNav,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$ImproveQuality = factor(clean_q2_A$ImproveQuality,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Irrelevant = factor(clean_q2_A$Irrelevant,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$Confused = factor(clean_q2_A$Confused,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
clean_q2_A$ContinueUse = factor(clean_q2_A$ContinueUse,
levels = c("1", "2", "3", "4", "5"), ordered = TRUE)
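The same conversion can be written more compactly by looping over the item columns (a sketch; equivalent to the explicit statements above):

# Sketch: convert all 16 Likert item columns to ordered factors in one pass
likert_items <- names(clean_q2_A)[3:18]
clean_q2_A[likert_items] <- lapply(clean_q2_A[likert_items], factor,
                                   levels = c("1", "2", "3", "4", "5"),
                                   ordered = TRUE)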
# check again
str(clean_q2_A)
'data.frame':	41 obs. of  18 variables:
 $ ParticipantNo : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Condition     : Factor w/ 2 levels "A","NOA": 1 1 1 1 1 1 1 1 1 1 ...
 $ UseFrequently : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 5 5 4 1 2 4 ...
 $ Complex       : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 4 2 2 3 1 3 2 1 ...
 $ EasyToUse     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 5 5 3 5 4 2 4 ...
 $ NeedSupport   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 2 4 2 3 2 1 1 1 ...
 $ WellIntegrated: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 5 2 4 4 4 4 ...
 $ Inconsistency : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 1 4 3 1 3 1 ...
 $ UseQuickly    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 3 3 4 ...
 $ Cumbersome    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 5 1 2 2 4 2 1 ...
 $ Confident     : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 2 4 4 1 4 2 3 4 ...
 $ NeededLearn   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 2 2 2 1 1 2 2 3 1 ...
 $ TextHelpful   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 5 5 5 3 2 3 5 4 3 ...
 $ DifficultToNav: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 2 1 5 2 3 1 1 ...
 $ ImproveQuality: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 4 4 5 4 3 5 3 3 5 2 ...
 $ Irrelevant    : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 3 3 2 2 3 1 4 2 2 4 ...
 $ Confused      : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 2 2 2 2 3 2 2 4 ...
 $ ContinueUse   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 5 5 4 5 4 4 2 4 ...
summary(clean_q2_A)
 ParticipantNo Condition UseFrequently Complex   EasyToUse NeedSupport
 Min.   : 1    A  :41    1   : 1       1   :24   1   : 3   1   :25    
 1st Qu.:11    NOA: 0    2   : 5       2   :10   2   : 1   2   :11    
 Median :21              3   : 8       3   : 4   3   : 6   3   : 2    
 Mean   :21              4   :10       4   : 1   4   :13   4   : 1    
 3rd Qu.:31              5   :15       5   : 0   5   :16   5   : 0    
 Max.   :41              NA's: 2       NA's: 2   NA's: 2   NA's: 2    
 WellIntegrated Inconsistency UseQuickly Cumbersome Confident NeededLearn
 1   : 1        1   :15       1   : 0    1   :20    1   : 2   1   :17    
 2   : 2        2   :13       2   : 2    2   :11    2   : 5   2   :15    
 3   : 7        3   : 7       3   : 2    3   : 5    3   : 8   3   : 3    
 4   :17        4   : 4       4   :17    4   : 2    4   :12   4   : 2    
 5   :12        5   : 0       5   :18    5   : 1    5   :12   5   : 2    
 NA's: 2        NA's: 2       NA's: 2    NA's: 2    NA's: 2   NA's: 2    
 TextHelpful DifficultToNav ImproveQuality Irrelevant Confused ContinueUse
 1   : 1     1   :24        1   : 0        1   : 4    1   :12  1   : 0    
 2   : 4     2   : 7        2   : 6        2   :12    2   :16  2   : 5    
 3   :12     3   : 4        3   :13        3   :11    3   : 2  3   : 7    
 4   :12     4   : 3        4   :11        4   : 8    4   : 8  4   :12    
 5   :10     5   : 1        5   : 9        5   : 4    5   : 1  5   :15    
 NA's: 2     NA's: 2        NA's: 2        NA's: 2    NA's: 2  NA's: 2    
# keep only the 16 Likert item columns (drop ParticipantNo and Condition)
ss4 <- c(3:18)
final_data <- subset(clean_q2_A, select=ss4)
summary(final_data)
 UseFrequently Complex   EasyToUse NeedSupport WellIntegrated Inconsistency
 1   : 1       1   :24   1   : 3   1   :25     1   : 1        1   :15      
 2   : 5       2   :10   2   : 1   2   :11     2   : 2        2   :13      
 3   : 8       3   : 4   3   : 6   3   : 2     3   : 7        3   : 7      
 4   :10       4   : 1   4   :13   4   : 1     4   :17        4   : 4      
 5   :15       5   : 0   5   :16   5   : 0     5   :12        5   : 0      
 NA's: 2       NA's: 2   NA's: 2   NA's: 2     NA's: 2        NA's: 2      
 UseQuickly Cumbersome Confident NeededLearn TextHelpful DifficultToNav
 1   : 0    1   :20    1   : 2   1   :17     1   : 1     1   :24       
 2   : 2    2   :11    2   : 5   2   :15     2   : 4     2   : 7       
 3   : 2    3   : 5    3   : 8   3   : 3     3   :12     3   : 4       
 4   :17    4   : 2    4   :12   4   : 2     4   :12     4   : 3       
 5   :18    5   : 1    5   :12   5   : 2     5   :10     5   : 1       
 NA's: 2    NA's: 2    NA's: 2   NA's: 2     NA's: 2     NA's: 2       
 ImproveQuality Irrelevant Confused ContinueUse
 1   : 0        1   : 4    1   :12  1   : 0    
 2   : 6        2   :12    2   :16  2   : 5    
 3   :13        3   :11    3   : 2  3   : 7    
 4   :11        4   : 8    4   : 8  4   :12    
 5   : 9        5   : 4    5   : 1  5   :15    
 NA's: 2        NA's: 2    NA's: 2  NA's: 2    
results <- likert(final_data)
# Legend: 5 = Strongly agree, 1 = Strongly disagree
plot(results, type='bar')
# Alternative heatmap graph
plot(results,
type="heat",
low.color = "white",
high.color = "blue",
text.color = "black",
text.size = 4,
wrap = 50)
# checking the distributions
plot(results,
type="density",
facet = TRUE,
bw = 0.5)
Bryer, J. and Speerschneider, K. Package 'likert'. cran.r-project.org/web/packages/likert/likert.pdf.
r1 <- c(1, 3:12)
sus <- subset(clean_q2_A, select=r1)
head(sus)
ParticipantNo | UseFrequently | Complex | EasyToUse | NeedSupport | WellIntegrated | Inconsistency | UseQuickly | Cumbersome | Confident | NeededLearn | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 |
3 | 2 | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 |
5 | 3 | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 |
7 | 4 | 5 | 2 | 5 | 4 | 5 | 2 | 5 | 5 | 4 | 2 |
9 | 5 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 1 | 4 | 1 |
11 | 6 | 5 | 3 | 3 | 3 | 2 | 4 | 5 | 2 | 1 | 1 |
sus
ParticipantNo | UseFrequently | Complex | EasyToUse | NeedSupport | WellIntegrated | Inconsistency | UseQuickly | Cumbersome | Confident | NeededLearn | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 |
3 | 2 | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 |
5 | 3 | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 |
7 | 4 | 5 | 2 | 5 | 4 | 5 | 2 | 5 | 5 | 4 | 2 |
9 | 5 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 1 | 4 | 1 |
11 | 6 | 5 | 3 | 3 | 3 | 2 | 4 | 5 | 2 | 1 | 1 |
13 | 7 | 4 | 1 | 5 | 2 | 4 | 3 | 4 | 2 | 4 | 2 |
15 | 8 | 1 | 3 | 4 | 1 | 4 | 1 | 3 | 4 | 2 | 2 |
17 | 9 | 2 | 2 | 2 | 1 | 4 | 3 | 3 | 2 | 3 | 3 |
19 | 10 | 4 | 1 | 4 | 1 | 4 | 1 | 4 | 1 | 4 | 1 |
21 | 11 | 5 | 1 | 5 | 1 | 5 | 1 | 5 | 1 | 5 | 1 |
23 | 12 | 3 | 1 | 5 | 1 | 5 | 1 | 4 | 1 | 4 | 1 |
25 | 13 | 3 | 1 | 3 | 1 | 4 | 1 | 4 | 3 | 3 | 1 |
27 | 14 | 3 | 3 | 3 | 1 | 4 | 1 | 5 | 2 | 5 | 1 |
29 | 15 | 2 | 1 | 5 | 1 | 5 | 2 | 4 | 1 | 5 | 1 |
31 | 16 | 3 | 1 | 4 | 2 | 5 | 1 | 4 | 1 | 3 | 2 |
33 | 17 | 2 | 2 | 4 | 1 | 2 | 2 | 4 | 3 | 3 | 2 |
35 | 18 | 5 | 1 | 5 | 1 | 3 | 3 | 5 | 1 | 5 | 4 |
37 | 19 | 5 | 2 | 3 | 2 | 4 | 4 | 4 | 2 | 4 | 1 |
39 | 20 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
41 | 21 | 3 | 1 | 4 | 2 | 3 | 3 | 4 | 1 | 3 | 3 |
43 | 22 | 3 | 3 | 3 | 2 | 3 | 4 | 2 | 3 | 5 | 1 |
45 | 23 | 3 | 2 | 4 | 1 | 3 | 1 | 4 | 2 | 4 | 2 |
47 | 24 | 4 | 1 | 5 | 1 | 4 | 2 | 5 | 1 | 4 | 2 |
49 | 25 | 4 | 1 | 1 | 1 | 1 | 4 | 4 | 2 | 3 | 2 |
51 | 26 | 5 | 1 | 5 | 1 | 5 | 2 | 5 | 1 | 5 | 1 |
53 | 27 | 5 | 1 | 5 | 2 | 4 | 2 | 5 | 1 | 5 | 3 |
55 | 28 | 4 | 2 | 4 | 1 | 4 | 2 | 4 | 4 | 2 | 2 |
57 | 29 | 4 | 1 | 1 | 1 | 4 | 1 | 4 | 1 | 5 | 2 |
59 | 30 | 5 | 1 | 5 | 1 | 5 | 1 | 5 | 1 | 5 | 1 |
61 | 31 | 2 | 2 | 4 | 3 | 4 | 2 | 4 | 2 | 1 | 5 |
63 | 32 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
65 | 33 | 4 | 1 | 1 | 2 | 3 | 3 | 4 | 1 | 2 | 1 |
67 | 34 | 5 | 1 | 4 | 2 | 4 | 2 | 5 | 2 | 4 | 5 |
69 | 35 | 5 | 1 | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 1 |
71 | 36 | 4 | 1 | 5 | 1 | 5 | 1 | 5 | 1 | 3 | 2 |
73 | 37 | 3 | 2 | 4 | 1 | 3 | 3 | 4 | 3 | 4 | 2 |
75 | 38 | 4 | 1 | 3 | 1 | 4 | 1 | 5 | 1 | 4 | 1 |
77 | 39 | 5 | 1 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 4 |
79 | 40 | 2 | 2 | 4 | 1 | 3 | 3 | 2 | 3 | 2 | 2 |
81 | 41 | 4 | 1 | 5 | 1 | 4 | 2 | 5 | 1 | 3 | 1 |
sus_ready <- read.csv("data/sus_calculation.csv")
head(sus_ready)
Participant | q1 | q2 | q3 | q4 | q5 | q6 | q7 | q8 | q9 | q10 | SUS.Score |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 5 | 1 | 4 | 1 | 4 | 2 | 5 | 1 | 5 | 1 | 92.5 |
2 | 5 | 1 | 4 | 1 | 5 | 1 | 5 | 2 | 5 | 2 | 92.5 |
3 | 5 | 4 | 5 | 2 | 5 | 2 | 5 | 2 | 2 | 2 | 75.0 |
4 | 5 | 2 | 5 | 4 | 5 | 2 | 5 | 5 | 4 | 2 | 72.5 |
5 | 5 | 2 | 5 | 2 | 5 | 1 | 4 | 1 | 4 | 1 | 90.0 |
6 | 5 | 3 | 3 | 3 | 2 | 4 | 5 | 2 | 1 | 1 | 57.5 |
g <- c(2:11)
# plot each participant's SUS score (points connected for readability)
ggplot(data=sus_ready, aes(x=Participant, y=SUS.Score, group=1)) +
geom_line() +
geom_point()
mean(sus_ready$SUS.Score)
The actual SUS calculation is in sus_calculation.xlsx.
SUS benchmarks: $60-70$ = OK, $70-80$ = good, $80+$ = excellent.
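For reference, the standard SUS scoring rule (odd items contribute score − 1, even items contribute 5 − score, and the sum is scaled by 2.5) can be sketched in R; this mirrors, but is not copied from, the spreadsheet calculation:

# Sketch of standard SUS scoring (Brooke, 1996); assumes columns
# q1..q10 as in sus_ready above. Rows with missing answers yield NA.
sus_score <- function(row) {
  odd  <- sum(row[c(1, 3, 5, 7, 9)] - 1)    # positively worded items
  even <- sum(5 - row[c(2, 4, 6, 8, 10)])   # negatively worded items
  2.5 * (odd + even)                        # scale to 0-100
}
apply(sus_ready[, paste0("q", 1:10)], 1, sus_score)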