$condition12 <- as.factor(data_scalescomputed$condition12)
data_scalescomputed
%>%
data_scalescomputed ggplot(aes(x = condition12, y = scale1_index, fill = condition12)) +
geom_boxplot(alpha = .5) +
geom_jitter(alpha = .5, width = 0.2)
7 Infer
8 Moving into analyses
thoughts from Jenny : I wonder whether it would be useful to structure this in terms of lets combine skills from previous chapters to..
- ask a question,
- get plots/descriptives
- then answer it with inferentials?
- And repeat that process for t-test, linear regression and anova?
8.1 Plots and descriptives
Now, let’s test whether this scale varies by condition (condition 1 vs. condition 2 in condition12).
Before jumping into t-tests, let’s first visualise the data (scale1_index) to get a sense of what it looks like.
The first command here tells R to treat condition12 as a categorical variable that groups the data (technically, a factor). This is helpful for plotting. The code uses the dataframe$variablename
structure to do a command on just one variable.
The second command pipes the data into a command to create a boxplot (geom_boxplot
) with the datapoints plotted as well (geom_jitter
).The width
command makes the plotting of the dots a bit narrower to help with interpretation.
Based on the boxplot, we’d expect a t-test to be nonsignificant - the wide boxes of the plots for each condition overlap, even though the mean of condition 2 is slightly higher than condition 1 (the horizontal line through the box).
8.2 t-test
The first line of code uses the base R command to run a t-test. It tells R to do the test on the data_scalescomputed
data frame, asking whether scale1_index
varies by (~) condition. var.
sets the assumption of variance and conf.level
sets the desired alpha of the test. That command is wrapped in the handy t_apa
function, which makes the output much easier to read (and pull directly into a write-up!), and also allows us to get a confidence interval on the effect size (es_ci
).
For reporting, you’d probably also want to know the means and standard deviations by condition. The next bit of code asks R to do this, using the summarise
function, separately for each condition. Reminder to use na.rm
here to account for any missing values.
The gt()
command shows the requested dataframe in a nice formatting.
t_apa(t.test(data = data_scalescomputed, scale1_index ~ condition12, var. = TRUE, conf.level = .95), es_ci = TRUE)
<- data_scalescomputed %>%
scale1_by_condition12 group_by(condition12) %>%
summarise(mean_scale1 = mean(scale1_index, na.rm = TRUE),
sd_scale1 = sd(scale1_index, na.rm = TRUE))
gt(scale1_by_condition12)
9 Linear regression
lw_addnarrativehere
- outcome: variable_6
- step 1: demographics_categ
- step 2: variable_4
- step 3: scale1_index
this asks the lm package to use predictor to predict outcome
lm(outcome ~ predictor )
<- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4 + scale1_index)
regression1
summary(regression1)
lw_addnarrativehere simulates hierarchical stepwise regression?
<- lm(data = data_scalescomputed, variable6 ~ demographicscateg)
regressionlm1 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4)
regressionlm2 <- lm(data = data_scalescomputed, variable6 ~ demographicscateg + variable4 + scale1_index)
regressionlm3
summary(regressionlm1)
summary(regressionlm2)
summary(regressionlm3)
apa.reg.table(regressionlm1, regressionlm2, regressionlm3, filename = here("output_files","RegressionTable3_APA.doc"), table.number = 3)
report(regressionlm3)
10 Analysis of Variance (ANOVA)
lw add narrative
4 (between participants - condition1234) x2 (between participants - individual difference categorical variable - demographicscateg) design
outcome: scale1_index
## first tell R that condition and demographicscateg are categorical (factors)
$condition1234 <- as.factor(data_scalescomputed$condition1234)
data_scalescomputed$demographicscateg <- as.factor(data_scalescomputed$demographicscateg)
data_scalescomputed
## this creates a dataframe that contains the summary statistics (means, standard deviations, etc.)
<- ezStats(data = data_scalescomputed,
scale1_stats dv = scale1_index,
wid = participantid,
between = c("condition1234","demographicscateg"))
print(scale1_stats)
## this carries out an anova with condition1234 and demographicscateg as between-participants factors
## the output includes statistical tests of the main effect of condition1234, the main effect of demographicscateg, and the interaction between the two
<- ezANOVA(data = data_scalescomputed,
scale1_anova dv = scale1_index,
wid = participantid,
between = c("condition1234","demographicscateg"),
return_aov = TRUE)
print(scale1_anova)
## NOTE THAT THE FOLLOWING WOULD BE DONE TO FURTHER INVESTIGATE AN INTERACTION OR A DIFFERENCE IN A MULTI-CATEGORICAL VARIABLE.
## to build the contrasts we want to do (e.g., baseline vs. midpoint for PhD), we need to ask emmeans to create output to see what the rows represent.
<- emmeans(scale1_anova$aov, ~ condition1234 * demographicscateg)
scale1_emm
scale1_emm
## now we can create a set of named vectors that represent each of the 8 means, based on the position in the 8-item list from emmeans.
= c(1,0,0,0,0,0,0,0)
cond1categ1 = c(0,1,0,0,0,0,0,0)
cond2categ1 = c(0,0,1,0,0,0,0,0)
cond3categ1 = c(0,0,0,1,0,0,0,0)
cond4categ1 = c(0,0,0,0,1,0,0,0)
cond1categ2 = c(0,0,0,0,0,1,0,0)
cond2categ2 = c(0,0,0,0,0,0,1,0)
cond3categ2 = c(0,0,0,0,0,0,0,1)
cond4categ2
= c(1,0,0,0,1,0,0,0)
cond1 = c(0,1,0,0,0,1,0,0)
cond2 = c(0,0,1,0,0,0,1,0)
cond3 = c(0,0,0,1,0,0,0,1)
cond4
## now we can ask for contrasts based on these vectors; we will explore the main effect of condition1234 by comparing each of the conditions to one another
<- contrast(scale1_emm, method = list("cond1 - cond2" = cond1 - cond2,
scale1_contrasts "cond1 - cond3" = cond1 - cond3,
"cond1 - cond4" = cond1 - cond4,
"cond2 - cond3" = cond2 - cond3,
"cond2 - cond4" = cond2 - cond4,
"cond3 - cond4" = cond3 - cond4))
scale1_contrasts
## here's a version with confidence intervals instead of t and p values
<- scale1_contrasts %>%
scale1_contrastsCI confint()
scale1_contrastsCI
#you may have a specific planned contrast to run. Here let's imagine you want to compare condition 2 to all of the other 3 conditions in one comparison.
<- ezStats(data = data_scalescomputed,
scale1_plannedcontraststats dv = scale1_index,
wid = participantid,
between = c("condition1234"))
scale1_plannedcontraststats
= c(0,1,0,0,0,1,0,0)
cond2v134 = c(1,0,1,1,1,0,1,1)
cond134v2
<- contrast(scale1_emm, method = list("condition 1 vs all the rest" = cond2v134 - cond134v2))
scale1_plannedcontrasts
scale1_plannedcontrasts
Let’s make a figure!
%>%
data_scalescomputed ggplot(aes(x = condition1234, y = scale1_index, fill = demographicscateg)) +
geom_boxplot() +
theme_light()
%>%
data_scalescomputed ggplot(aes(x = condition1234, y = scale1_index)) +
geom_boxplot() +
theme_light()