---
title: "VPT - Difference of Means"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{vpt_diff_of_means}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(splithalfr)
```

This vignette describes a scoring method similar to [Mogg and Bradley (1999)](https://doi.org/10.1080/026999399379050): the difference of mean reaction times (RTs) between probe-at-test and probe-at-control conditions, for correct responses, after removing RTs below 200 ms and above 520 ms, on Visual Probe Task (VPT) data.
# Dataset
Load the included VPT dataset and inspect its documentation.
```
data("ds_vpt", package = "splithalfr")
?ds_vpt
```

## Relevant variables
The columns used in this example are:

* `UserID`, which identifies participants
* `block_type`, in order to select assessment blocks only
* `patt`, in order to compare trials in which the probe is at the test or at the control stimulus
* `response`, in order to select correct responses only
* `rt`, in order to drop RTs outside of the range [200, 520] and calculate mean RTs per level of `patt`
* `thor`, which is the horizontal position of the test stimulus
* `keep`, which indicates whether the probe was superimposed on the stimuli or replaced them

## Data preparation
Select only trials from assessment blocks.
```
ds_vpt <- subset(ds_vpt, block_type == "assess")
```

## Counterbalancing
The variables `patt`, `thor`, and `keep` were counterbalanced. Below we illustrate this for the first participant.
```
ds_1 <- subset(ds_vpt, UserID == 1)
table(ds_1$patt, ds_1$thor, ds_1$keep)
```
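If the design is fully counterbalanced, every combination of `patt`, `thor`, and `keep` should occur equally often. As a quick optional sanity check (not part of the original vignette; it reuses the `ds_1` subset created above), we can compare all cell counts against the first one:

```
# Hypothetical check: TRUE if every patt x thor x keep cell
# contains the same number of trials for participant 1
cell_counts <- table(ds_1$patt, ds_1$thor, ds_1$keep)
all(cell_counts == cell_counts[1])
```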
# Scoring the VPT
## Scoring function
The scoring function calculates the score of a single participant as follows:

1. select correct responses only
2. drop responses with RTs outside of the range [200, 520]
3. calculate the mean RT of the remaining responses, separately for probe-at-test (`patt == "yes"`) and probe-at-control (`patt == "no"`) trials
4. return the difference between the two means (probe-at-control minus probe-at-test)

```
fn_score <- function (ds) {
  # Keep correct responses with RTs in the range [200, 520]
  ds_keep <- ds[ds$response == 1 & ds$rt >= 200 & ds$rt <= 520, ]
  # Mean RT for probe-at-test trials
  rt_yes <- mean(ds_keep[ds_keep$patt == "yes", ]$rt)
  # Mean RT for probe-at-control trials
  rt_no <- mean(ds_keep[ds_keep$patt == "no", ]$rt)
  return (rt_no - rt_yes)
}
```

## Scoring a single participant
Let's calculate the VPT score for the participant with UserID 23. NB - This score has also been calculated manually via Excel in the splithalfr repository.
```
fn_score(subset(ds_vpt, UserID == 23))
```

## Scoring all participants
To calculate the VPT score for each participant, we will use R's native `by` function and convert the result to a data frame.
```
scores <- by(
  ds_vpt,
  ds_vpt$UserID,
  fn_score
)
data.frame(
  UserID = names(scores),
  score = as.vector(scores)
)
```
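Before estimating reliability, it can be useful to glance at the distribution of the scores. A brief optional sketch (it assumes the `scores` object created by `by` above; the data frame variable name is illustrative):

```
# Store the per-participant scores in a data frame and summarize them
scores_df <- data.frame(
  UserID = names(scores),
  score = as.vector(scores)
)
summary(scores_df$score)
hist(scores_df$score, main = "VPT difference-of-means scores", xlab = "Score (ms)")
```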
# Estimating split-half reliability
## Calculating split scores
To calculate split-half scores for each participant, use the function `by_split`. The first three arguments of this function are the same as for `by`. Additional arguments specify how to split the data and how often. In this vignette we will calculate scores for 1000 permuted splits. The trial properties `patt`, `thor`, and `keep` were counterbalanced in the VPT design, so we stratify the splits by these trial properties. See the vignette on splitting methods for more ways to split the data.

The `by_split` function returns a data frame with the following columns:

* `participant`, which identifies participants
* `replication`, which counts replications
* `score_1` and `score_2`, which are the scores calculated for each of the split datasets

*Calculating the split scores may take a while. By default, `by_split` uses all available CPU cores but displays no progress bar. Setting `ncores = 1` displays a progress bar, but processing will be slower.*

```
split_scores <- by_split(
  ds_vpt,
  ds_vpt$UserID,
  fn_score,
  replications = 1000,
  stratification = paste(ds_vpt$patt, ds_vpt$thor, ds_vpt$keep)
)
```

## Calculating reliability coefficients
Next, the output of `by_split` can be analyzed to estimate reliability. Functions are provided that calculate Spearman-Brown adjusted Pearson correlations (`spearman_brown`), Flanagan-Rulon (`flanagan_rulon`), Angoff-Feldt (`angoff_feldt`), and Intraclass Correlation (`short_icc`) coefficients. Each of these coefficient functions can be used with `split_coefs` to calculate the corresponding coefficient per split; the results can then be plotted or averaged via a simple `mean`. A bias-corrected and accelerated bootstrap confidence interval can be calculated via `split_ci`. Note that estimating the confidence interval involves computationally intensive calculations, so it can take a long time to complete.
``` # Spearman-Brown adjusted Pearson correlations per replication coefs <- split_coefs(split_scores, spearman_brown) # Distribution of coefficients hist(coefs) # Mean of coefficients mean(coefs) # Confidence interval of coefficients split_ci(split_scores, spearman_brown) ```
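The coefficient functions listed earlier share the same interface, so the other estimators can be computed analogously. A brief sketch (it assumes `split_scores` from the `by_split` call above):

```
# Mean Flanagan-Rulon coefficient across replications
mean(split_coefs(split_scores, flanagan_rulon))
```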