---
title: "Splitting Methods"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{splitting_methods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(splithalfr)
```

This vignette demonstrates the methods of splitting data that are supported by the `splithalfr` package. Each splitting method is illustrated by calling `by_split` with the appropriate arguments and a scoring function that prints the data in each of the two parts produced by a split. For a comprehensive review of each splitting method, see [Pronk et al. (2021)](https://doi.org/10.3758/s13423-021-01948-3).

# Example data

We'll use an example dataset with eight trials from one participant. Each trial has a `condition` and an `rt` variable.

```{r}
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)
```

# First-second splitting

First-second splitting assigns trials in the first half of the rows to one part and trials in the second half of the rows to the other ([Green et al., 2016](https://doi.org/10.3758/s13423-015-0968-3); [Webb, Shavelson, & Haertel, 1996](https://doi.org/10.1016/S0169-7161(06)26004-8); [Williams & Kaufmann, 2012](https://doi.org/10.1016/j.jesp.2012.03.001)). For this splitting method, set `method` to `first_second`.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```

# Odd-even splitting

Odd-even splitting assigns trials with an odd row number to one part and trials with an even row number to the other ([Green et al., 2016](https://doi.org/10.3758/s13423-015-0968-3); [Webb, Shavelson, & Haertel, 1996](https://doi.org/10.1016/S0169-7161(06)26004-8); [Williams & Kaufmann, 2012](https://doi.org/10.1016/j.jesp.2012.03.001)). For this splitting method, set `method` to `odd_even`.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "odd_even",
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```

# Permutated splitting

Permutated splitting is also known as random splitting ([Kopp, Lange, & Steinke, 2021](https://doi.org/10.1177/1073191119866257)), bootstrapped splitting ([Parsons, Kruijt, & Fox, 2019](https://doi.org/10.1177/2515245919879695)), and random sample of split halves ([Williams & Kaufmann, 2012](https://doi.org/10.1016/j.jesp.2012.03.001)). It assigns trials to each part via random sampling without replacement. This splitting method is the default, but you can make it explicit by setting `method` to `random`. In practice, results of random splits are averaged over many replications, but for illustration we print only one.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replications = 1,
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```

# Monte Carlo splitting

Monte Carlo splitting assigns trials to each part by sampling with replacement ([Williams & Kaufmann, 2012](https://doi.org/10.1016/j.jesp.2012.03.001)). To construct parts of any length, set `replace` to `TRUE` and use the `split_p` argument to control the length of each part. The example below constructs two parts of the same length as the original dataset by setting `split_p` to 1.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  replace = TRUE,
  split_p = 1,
  replications = 1,
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```

# Stratified splitting

If a split is stratified by a variable, then trials are assigned to each part separately for each level of that variable ([Green et al., 2016](https://doi.org/10.3758/s13423-015-0968-3)). For example, if splits are stratified by `ds$condition`, the trials with condition `a` and the trials with condition `b` are split separately. Stratification can be used in combination with any of the methods above.
For illustration, we combine it with first-second splitting.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "first_second",
  stratification = ds$condition,
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```

# Subsampled splitting

In a subsampled split, a subset of the trials is randomly sampled without replacement and then split (see the supplementary materials of [Hedge, Powell, & Sumner, 2018](https://doi.org/10.3758/s13428-017-0935-1)). Subsampling only works well with splitting methods that use random sampling (permutated and Monte Carlo). Because the subsampling procedure already randomizes which trials are selected for splitting, methods that assign trials to a part based on their row number, such as first-second and odd-even, should give results similar to permutated splitting. Any stratification is applied to both the subsampling and the splitting. The example below subsamples half of the trials within each condition by setting `subsample_p` to 0.5.

```{r}
dummy <- by_split(
  ds,
  ds$participant,
  method = "random",
  stratification = ds$condition,
  subsample_p = 0.5,
  # Scoring function: here it only prints the part it receives
  function(ds) {
    print(ds)
  },
  ncores = 1,
  verbose = FALSE
)
```
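
To make the order of operations in a stratified, subsampled split concrete, here is a minimal base R sketch of the idea. It is not how `splithalfr` implements it: the helper `subsample_then_split` is hypothetical, and rounding part sizes down with `floor` is an assumption made for illustration. Within each condition, the sketch first draws a subsample of row indices without replacement and then randomly assigns the subsampled rows to two parts.

```r
# Same example data as used throughout this vignette
ds <- data.frame(
  participant = rep(1, 8),
  condition = rep(c("a", "b"), each = 4),
  rt = 100 * 1:8
)

# Hypothetical helper (not part of splithalfr): subsample row indices,
# then randomly split the subsample into two parts.
# Assumption: sizes are rounded down with floor().
subsample_then_split <- function(rows, subsample_p = 0.5) {
  # Step 1: subsample without replacement
  n_keep <- floor(length(rows) * subsample_p)
  kept <- rows[sample.int(length(rows), n_keep)]
  # Step 2: random split of the subsample, again without replacement
  in_part_1 <- sample.int(n_keep, floor(n_keep / 2))
  in_part_2 <- setdiff(seq_len(n_keep), in_part_1)
  list(part_1 = kept[in_part_1], part_2 = kept[in_part_2])
}

set.seed(1)

# Stratification: apply both steps separately within each condition
rows_by_condition <- split(seq_len(nrow(ds)), ds$condition)
splits <- lapply(rows_by_condition, subsample_then_split, subsample_p = 0.5)

# Combine the per-condition row indices into the two parts
rows_1 <- sort(unlist(lapply(splits, `[[`, "part_1"), use.names = FALSE))
rows_2 <- sort(unlist(lapply(splits, `[[`, "part_2"), use.names = FALSE))
part_1 <- ds[rows_1, ]
part_2 <- ds[rows_2, ]

print(part_1)
print(part_2)
```

With four trials per condition and a subsampling proportion of 0.5, each condition contributes two trials to the subsample, so each part ends up with one trial per condition and the two parts never share a trial.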