---
title: "IAT D-Score Repeat Incorrect Responses"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{iat_dscore_ri}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE,
  warning = FALSE,
  comment = "#>"
)
```
```{r setup, message = FALSE, warning = FALSE}
library(splithalfr)
library(dplyr)
```
This vignette describes a scoring method introduced by [Greenwald, Nosek, and Banaji (2003)](https://pubmed.ncbi.nlm.nih.gov/12916565/): the improved d-score for Implicit Association Tasks (IATs) that require a correct response in order to continue to the next trial. This version of the d-score algorithm sums the response times of all attempts within each trial. Because the algorithm also specifies which participants to keep and which to drop, functions from the [dplyr package](https://dplyr.tidyverse.org/) are used to produce the relevant summary statistics. Note that this vignette is more advanced than the others included in the `splithalfr` package, so it is not recommended as a first introduction to how to use the splithalfr.
# Dataset
Load the included IAT dataset and inspect its documentation.
```
data("ds_iat", package = "splithalfr")
?ds_iat
```
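For a first impression of the raw data, the structure of the dataset can be inspected. This is just an optional check and not part of the scoring algorithm.
```
str(ds_iat)
```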
## Relevant variables
The columns used in this example are:
* participation_id, which identifies participants
* block_type, which specifies the IAT blocks that are relevant for calculating the d-score
* block and trial, which together identify the trial that each attempt belongs to
* attempt, in order to sum the RTs of all attempts within a trial
* response, in order to select correct responses only
* rt, in order to flag participants who responded too quickly too often, to drop overly slow responses, and to calculate mean RTs per block
* cat, which is the category each stimulus belonged to
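As an optional check, the snippet below prints the first few rows of just these columns; the column names are taken from the list above.
```
head(ds_iat[, c("participation_id", "block", "trial", "block_type",
                "attempt", "response", "rt", "cat")])
```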
## Preprocessing
The improved d-score algorithm specifies that participants for whom more than 10% of responses have an RT below 300 ms should be dropped. The R code below identifies such participants.
```
ds_summary <- ds_iat %>%
  dplyr::group_by(participation_id) %>%
  dplyr::summarize(
    too_fast = sum(rt < 300) / dplyr::n() > 0.1
  )
```
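As a quick check (purely illustrative), the flagged participants can be listed from the `ds_summary` data frame created above.
```
ds_summary[ds_summary$too_fast, ]
```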
One participant (participation_id 29) meets this exclusion criterion. Below, we remove this participant from the dataset.
```
ds_iat <- ds_iat[
  !(ds_iat$participation_id %in%
      ds_summary[ds_summary$too_fast, ]$participation_id),
]
```
Next, delete any attempts with RTs above 10,000 ms. Such attempts do not occur in this IAT, because a response window of 1,500 ms was used, but the R code is included below for demonstration purposes.
```
ds_iat <- ds_iat[ds_iat$rt <= 10000, ]
```
Keep only data from the combination blocks.
```
ds_iat <- ds_iat[
  ds_iat$block_type %in%
    c("tar1att1_1", "tar1att2_1", "tar1att1_2", "tar1att2_2"),
]
```
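As an optional sanity check, a frequency table of `block_type` should now contain only these four combination block types.
```
table(ds_iat$block_type)
```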
Finally, RTs for each participant, block, and trial are summed. The block_type and cat variables are also included, since they are used in further processing steps below.
```
ds_iat <- ds_iat %>%
  dplyr::group_by(participation_id, block, trial) %>%
  dplyr::summarise(
    block_type = dplyr::first(block_type),
    cat = dplyr::first(cat),
    rt = sum(rt)
  )
```
## Counterbalancing
The variables `block_type` and `cat` were counterbalanced. Below we illustrate this for the first participant.
```
ds_1 <- subset(ds_iat, participation_id == 1)
table(ds_1$block_type, ds_1$cat)
```
# Scoring the IAT
## Scoring function
The scoring function receives the data frame of a single participant, which contains the four combination block types. For the pair of practice blocks and for the pair of test blocks, a 'block score' is calculated as follows:
1. The mean RT of the block pairing target 1 with attribute 1 is calculated
2. The mean RT of the block pairing target 1 with attribute 2 is calculated
3. The difference between these two mean RTs is divided by the inclusive standard deviation (SD), i.e. the SD of the RTs of both blocks taken together (illustrated with a toy example below)
The d-score is the mean of the block scores for practice and test blocks.
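To make step 3 concrete, here is a toy numeric example; the RT values are made up and purely illustrative.
```
# Hypothetical summed RTs for the two blocks of one pair
rt_tar1att1 <- c(800, 900, 1000)
rt_tar1att2 <- c(1100, 1200, 1300)
# Difference in mean RTs, divided by the inclusive SD of both blocks combined
(mean(rt_tar1att2) - mean(rt_tar1att1)) / sd(c(rt_tar1att1, rt_tar1att2))
```
The complete scoring function is defined below.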
```
fn_score <- function(ds) {
  # Block score: difference in mean RTs, divided by the inclusive SD
  fn_block <- function(ds_tar1att1, ds_tar1att2) {
    m_tar1att1 <- mean(ds_tar1att1$rt)
    m_tar1att2 <- mean(ds_tar1att2$rt)
    inclusive_sd <- sd(c(ds_tar1att1$rt, ds_tar1att2$rt))
    return((m_tar1att2 - m_tar1att1) / inclusive_sd)
  }
  # Block score for the first pair of combination blocks (suffix _1)
  d1 <- fn_block(
    ds[ds$block_type == "tar1att1_1", ],
    ds[ds$block_type == "tar1att2_1", ]
  )
  # Block score for the second pair of combination blocks (suffix _2)
  d2 <- fn_block(
    ds[ds$block_type == "tar1att1_2", ],
    ds[ds$block_type == "tar1att2_2", ]
  )
  # The d-score is the mean of the two block scores
  return(mean(c(d1, d2)))
}
```
## Scoring a single participant
Let's calculate the IAT score for the participant with participation_id 1. NB: this score has also been calculated manually via Excel in the splithalfr repository.
```
fn_score(subset(ds_iat, participation_id == 1))
```
## Scoring all participants
To calculate the IAT score for each participant, we will use R's native `by` function and convert the result to a data frame.
```
scores <- by(
  ds_iat,
  ds_iat$participation_id,
  fn_score
)
data.frame(
  participation_id = names(scores),
  score = as.vector(scores)
)
```
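As an optional, purely exploratory step (not part of the published algorithm), the distribution of d-scores across participants can be inspected.
```
hist(unlist(scores), main = "IAT d-scores", xlab = "d-score")
```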
# Estimating split-half reliability
## Calculating split scores
To calculate split-half scores for each participant, use the function `by_split`. The first three arguments of this function are the same as for `by`. An additional set of arguments allows you to specify how to split the data and how often. In this vignette, we will calculate scores for 1,000 permuted splits. The trial properties `block_type` and `cat` were counterbalanced in the IAT design, so we will stratify the splits by these trial properties. See the vignette on splitting methods for more ways to split the data.
The `by_split` function returns a data frame with the following columns:
* `participant`, which identifies participants
* `replication`, which counts replications
* `score_1` and `score_2`, which are the scores calculated for each of the split datasets
*Calculating the split scores may take a while. By default, `by_split` uses all available CPU cores, but no progress bar is displayed. Setting `ncores = 1` will display a progress bar, but processing will be slower.*
```
split_scores <- by_split(
  ds_iat,
  ds_iat$participation_id,
  fn_score,
  replications = 1000,
  stratification = paste(ds_iat$block_type, ds_iat$cat)
)
```
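To see the columns described above, the first few rows of the resulting data frame can be inspected.
```
head(split_scores)
```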
## Calculating reliability coefficients
Next, the output of `by_split` can be analyzed in order to estimate reliability. The splithalfr provides functions for calculating the Spearman-Brown adjusted Pearson correlation (`spearman_brown`), Flanagan-Rulon (`flanagan_rulon`), Angoff-Feldt (`angoff_feldt`), and intraclass correlation (`short_icc`) coefficients. Each of these coefficient functions can be used with `split_coefs` to calculate the corresponding coefficient per split; the results can then be plotted or averaged via a simple `mean`. A bias-corrected and accelerated bootstrap confidence interval can be calculated via `split_ci`. Note that estimating the confidence interval involves computationally intensive calculations, so it can take a long time to complete.
```
# Spearman-Brown adjusted Pearson correlations per replication
coefs <- split_coefs(split_scores, spearman_brown)
# Distribution of coefficients
hist(coefs)
# Mean of coefficients
mean(coefs)
# Confidence interval of coefficients
split_ci(split_scores, spearman_brown)
```