---
title: "VPT - Difference of Means"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{vpt_diff_of_means}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(splithalfr)
```
This vignette describes a scoring method for Visual Probe Task (VPT) data similar to that of [Mogg and Bradley (1999)](https://doi.org/10.1080/026999399379050):
the difference of mean reaction times (RTs) between the probe-at-test and probe-at-control conditions, for correct responses, after removing RTs below 200 ms and above 520 ms.
# Dataset
Load the included VPT dataset and inspect its documentation.
```
data("ds_vpt", package = "splithalfr")
?ds_vpt
```
## Relevant variables
The columns used in this example are:
* `UserID`, which identifies participants
* `block_type`, in order to select assessment blocks only
* `patt`, in order to compare trials in which the probe is at the test or at the control stimulus
* `response`, in order to select correct responses only
* `rt`, in order to drop RTs outside of the range [200, 520] and calculate means per level of `patt`
* `thor`, which is the horizontal position of the test stimulus
* `keep`, which indicates whether the probe was superimposed on the stimuli or replaced them
## Data preparation
Select only trials from assessment blocks.
```
ds_vpt <- subset(ds_vpt, block_type == "assess")
```
## Counterbalancing
The variables `patt`, `thor`, and `keep` were counterbalanced. Below we illustrate this for the first participant.
```
ds_1 <- subset(ds_vpt, UserID == 1)
table(ds_1$patt, ds_1$thor, ds_1$keep)
```
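To make clear what a counterbalanced table looks like, here is a minimal sketch on a made-up design rather than the VPT data. The level names for `thor` and `keep` and the number of repetitions are assumptions chosen for illustration only; the levels in `ds_vpt` may differ.

```r
# Hypothetical fully crossed design: three two-level trial properties
# (level names are made up for illustration)
design <- expand.grid(
  patt = c("yes", "no"),
  thor = c("left", "right"),
  keep = c("yes", "no"),
  stringsAsFactors = FALSE
)
# Repeat each of the 8 combinations 4 times, giving 32 trials
trials <- design[rep(seq_len(nrow(design)), times = 4), ]
# Cross-tabulate the three properties
counts <- table(trials$patt, trials$thor, trials$keep)
counts
# In a counterbalanced design, every cell has the same count
length(unique(as.vector(counts))) == 1
```

If the table for a participant shows unequal cell counts, the design was not fully counterbalanced for that participant.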
# Scoring the VPT
## Scoring function
The scoring function calculates the score of a single participant as follows:
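1. select only correct responses
2. drop responses with RTs outside of the range [200, 520]
3. calculate the mean RT of the remaining responses per level of `patt`
4. subtract the mean RT for `patt == "yes"` from the mean RT for `patt == "no"`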
```
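fn_score <- function (ds) {
  # Keep correct responses with RTs in the range [200, 520]
  ds_keep <- ds[ds$response == 1 & ds$rt >= 200 & ds$rt <= 520, ]
  # Mean RT per level of patt
  rt_yes <- mean(ds_keep[ds_keep$patt == "yes", ]$rt)
  rt_no <- mean(ds_keep[ds_keep$patt == "no", ]$rt)
  # Difference of means
  return (rt_no - rt_yes)
}
```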
```
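As a worked illustration of the scoring logic, the sketch below applies the same filtering and subtraction to a handful of made-up trials. The data frame `toy` and its values are hypothetical and are not taken from `ds_vpt`.

```r
# Made-up trials for one hypothetical participant (not from ds_vpt)
toy <- data.frame(
  patt     = c("yes", "yes", "yes", "no", "no", "no", "no"),
  response = c(1,     1,     0,     1,    1,    1,    1),
  rt       = c(400,   420,   410,   450,  470,  150,  600)
)
# Keep correct responses with RTs in the range [200, 520];
# this drops the incorrect trial (rt 410) and the RTs of 150 and 600
toy_keep <- toy[toy$response == 1 & toy$rt >= 200 & toy$rt <= 520, ]
# Mean RT per level of patt
rt_yes <- mean(toy_keep$rt[toy_keep$patt == "yes"])  # (400 + 420) / 2 = 410
rt_no  <- mean(toy_keep$rt[toy_keep$patt == "no"])   # (450 + 470) / 2 = 460
# Difference of means
rt_no - rt_yes
#> [1] 50
```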
## Scoring a single participant
Let's calculate the VPT score for the participant with UserID 23. Note that this score has also been calculated manually in Excel; see the splithalfr repository.
```
fn_score(subset(ds_vpt, UserID == 23))
```
## Scoring all participants
To calculate the VPT score for each participant, we will use R's native `by` function and convert the result to a data frame.
```
scores <- by(
  ds_vpt,
  ds_vpt$UserID,
  fn_score
)
data.frame(
  UserID = names(scores),
  score = as.vector(scores)
)
```
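For readers unfamiliar with `by`, the following sketch shows the same pattern on a made-up data frame with two hypothetical participants. The scorer `toy_score` is a simplified stand-in that only takes the difference of means and skips the filtering step.

```r
# Made-up data for two hypothetical participants
toy <- data.frame(
  UserID = c(1, 1, 1, 1, 2, 2, 2, 2),
  patt   = c("yes", "yes", "no", "no", "yes", "yes", "no", "no"),
  rt     = c(400, 420, 450, 470, 380, 400, 370, 390)
)
# Simplified scorer: difference of mean RTs only, no filtering
toy_score <- function(ds) {
  mean(ds$rt[ds$patt == "no"]) - mean(ds$rt[ds$patt == "yes"])
}
# by() splits the rows of toy by UserID and applies toy_score to each subset
toy_scores <- by(toy, toy$UserID, toy_score)
# The names of the result are the UserIDs, the values are the scores
data.frame(
  UserID = names(toy_scores),
  score  = as.vector(toy_scores)
)
```

Here participant 1 gets a score of 50 (460 minus 410) and participant 2 a score of -10 (380 minus 390).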
# Estimating split-half reliability
## Calculating split scores
To calculate split-half scores for each participant, use the function `by_split`. The first three arguments of this function are the same as for `by`. An additional set of arguments allows you to specify how to split the data and how often. In this vignette we will calculate scores for 1000 permutated splits. Because the trial properties `patt`, `thor`, and `keep` were counterbalanced in the VPT design, we will stratify the splits by these trial properties. See the vignette on splitting methods for more ways to split the data.
The `by_split` function returns a data frame with the following columns:
* `participant`, which identifies participants
* `replication`, which counts replications
* `score_1` and `score_2`, which are the scores calculated for each of the split datasets
*Calculating the split scores may take a while. By default, `by_split` uses all available CPU cores, but no progress bar is displayed. Setting `ncores = 1` will display a progress bar, but processing will be slower.*
```
split_scores <- by_split(
  ds_vpt,
  ds_vpt$UserID,
  fn_score,
  replications = 1000,
  stratification = paste(ds_vpt$patt, ds_vpt$thor, ds_vpt$keep)
)
```
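To give an intuition for what a stratified split does, here is a conceptual sketch in base R of a single split for one participant: within each stratum, trials are randomly divided over two halves, so both halves contain the same mix of trial types. This is not the implementation used by `by_split`; the helper `split_half_once`, the data frame `toy`, and the stratum labels are hypothetical.

```r
set.seed(1)
# Made-up trials for one hypothetical participant: 2 strata, 8 trials each
toy <- data.frame(
  stratum = rep(c("a", "b"), each = 8),
  rt      = round(runif(16, 300, 500))
)
# Hypothetical helper: assign each trial to half 1 or 2, balanced per stratum
split_half_once <- function(strata) {
  half <- integer(length(strata))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Balanced labels (1, 2, 1, 2, ...) in random order within this stratum
    half[idx] <- sample(rep(1:2, length.out = length(idx)))
  }
  half
}
toy$half <- split_half_once(toy$stratum)
# Each half contains 4 trials from each stratum
table(toy$stratum, toy$half)
# Score each half separately (here simply the mean RT)
score_1 <- mean(toy$rt[toy$half == 1])
score_2 <- mean(toy$rt[toy$half == 2])
```

Repeating such a split many times and scoring both halves each time yields one pair of scores per participant per replication.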
## Calculating reliability coefficients
Next, the output of `by_split` can be analyzed in order to estimate reliability. By default, functions are provided that calculate Spearman-Brown adjusted Pearson correlations (`spearman_brown`), Flanagan-Rulon (`flanagan_rulon`), Angoff-Feldt (`angoff_feldt`), and intraclass correlation (`short_icc`) coefficients. Each of these coefficient functions can be used with `split_coefs` to calculate the corresponding coefficient per split; the resulting coefficients can then be plotted or averaged via a simple `mean`. A bias-corrected and accelerated bootstrap confidence interval can be calculated via `split_ci`. Note that estimating the confidence interval is computationally intensive, so it can take a long time to complete.
```
# Spearman-Brown adjusted Pearson correlations per replication
coefs <- split_coefs(split_scores, spearman_brown)
# Distribution of coefficients
hist(coefs)
# Mean of coefficients
mean(coefs)
# Confidence interval of coefficients
split_ci(split_scores, spearman_brown)
```
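For reference, the sketch below computes the textbook Spearman-Brown adjustment, $r_{SB} = 2r / (1 + r)$, by hand for a single replication. The half scores are made up, and the package's `spearman_brown` function may differ in details (for example, in how it treats negative correlations), so treat this as an illustration of the idea only.

```r
# Made-up half scores for five hypothetical participants in one replication
score_1 <- c(10, 25, -5, 40, 15)
score_2 <- c(12, 20,  0, 35, 22)
# Pearson correlation between the two halves
r <- cor(score_1, score_2)
# Textbook Spearman-Brown adjustment to full test length
r_sb <- 2 * r / (1 + r)
r_sb
```

Because each half contains only half of the trials, the raw correlation between halves underestimates the reliability of the full task; for a positive `r` below 1, the adjusted coefficient is always larger than `r`.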