Bayesian Hierarchical Measurement Model for Repetition Learning

A Bayesian hierarchical measurement model for assessing repetition learning effects in empirical data at the level of individual participants. Crucially, the model builds on recent evidence that repetition learning effects depend on participants' ability to recognize what is being repeated to them: as long as repeating stimuli are not identified as such, no learning effects are observed. To account for this, the model is set up as a mixture model, which makes it possible to classify whether a participant produced a learning effect or not. Furthermore, it contains a free parameter for the onset point of a learning effect within a time series of repeated practice trials. This parameter allows the onset of any learning effect to be delayed for as long as repetitions go unnoticed.

Modeling frameworks
Statistical Models

Models that use statistical methods to describe and infer patterns in data.

More details
How does the model work?

General Information:

Generally, this model makes it possible to estimate memorization performance (expressed as the probability of giving a correct answer on a memory test) as a function of repeated exposure to the same memory stimuli (i.e., a repetition learning effect in, for instance, a classical Hebb paradigm). The model was developed for the Hebb paradigm: participants are presented with several memory sets (such as lists of consonants or visuospatial arrays) for an immediate working memory test, and working memory performance is assessed on each trial. Throughout the experiment, one memory set, the repeated Hebb set, is presented repeatedly, allowing participants to learn it and to improve their memorization performance on that set. The presentation of the repeated Hebb set is typically interleaved with a varying number of unrepeated Filler sets, which are new memory sets that have not been shown before. This results in several training cycles, within each of which the repeated memory set is shown once together with several unrepeated memory sets. Although developed for this specific paradigm, the modeling approach can be applied to any task in which learning effects for repeated stimuli or categories are assessed as a function of practice against a baseline condition.

The model assesses learning effects at the level of individual participants and is based on the theoretical assumption that the occurrence of repetition learning effects depends on the explicit recognition of what is repeated. To account for this, the model is set up as a mixture model, which discriminates between participants who show a learning effect (after recognizing a repetition) and those who do not (because they failed to recognize a repetition). For participants who produced a learning effect, four parameters are estimated: 1) a baseline parameter, which reflects performance on the memory test prior to the onset of any learning effects, 2) a parameter for the onset of the learning effect, 3) a parameter for the rate of increase in memory performance once a participant has started to learn, and 4) a parameter for the asymptote that memory performance approaches.

Details on the Model:

To account for the fact that some participants produce repetition learning effects and some do not, the model assumes that the observed data could have been produced by one of two different generative processes: a learning process in which memory for the repeated memory set improves over trials, or a non-learning process in which no learning effect for the repeated memory set emerges. For both components, the model assumes that the number of correctly recalled items on each trial follows a binomial likelihood with latent parameter $\theta$, leading to the following two likelihood components:

$$ \text{Non-Learning} \sim Binomial(successes_{j} | trials_{j}, \theta_{non-learning_{i,j}}) $$

$$ \text{Learning} \sim Binomial(successes_{j} | trials_{j}, \theta_{learning_{i,j}}) $$

In both components, the latent parameter $\theta_{i,j}$ reflects the ability of the ith participant to recall the current memory set on trial j. The learning and the non-learning process differ in how $\theta_{i,j}$ is modeled.
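For illustration, this trial-level likelihood can be evaluated directly in R with `dbinom()`; the values below are hypothetical and not taken from the published data:

```r
# Hypothetical single trial: 5 of 6 items recalled correctly,
# with an assumed latent recall probability theta of 0.7
successes <- 5
trials    <- 6
theta     <- 0.7

# Binomial likelihood of the observed recall count given theta
dbinom(successes, size = trials, prob = theta)
```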

For the non-learning process, the model assumes that the ability to recall a memory set does not differ between repeated and unrepeated memory sets throughout the experiment. Therefore, $\theta_{non-learning}$ is modeled as a linear function of time without distinguishing between repeated and unrepeated memory sets. In a classical Hebb experiment, for which the model was developed, each training cycle includes one presentation of the repeated set and multiple unrepeated Filler sets. The linear effect of the training cycle allows the model to account for slight changes in memory performance throughout the experiment, which might be caused by fatigue or practice effects. To ensure that $\theta_{non-learning}$ is a probability, a logit link function is applied. This leads to the following model equation for $\theta_{non-learning}$:

$$ \theta_{non-learning_{i,j}} = logit^{-1}(\alpha_{i} + \beta_{cycle_{i}} \cdot cycle_{j}) $$

with $\alpha_{i}$ reflecting the intercept of the ith participant, $\beta_{cycle_{i}}$ reflecting the slope of the linear effect of the training cycle for the ith participant, and $cycle_{j}$ the training cycle of trial j.
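A minimal R sketch of this component (the function name and example values are illustrative, not taken from the authors' implementation; `plogis()` is R's inverse-logit function):

```r
# Non-learning component: recall probability as an inverse-logit
# of a linear function of the training cycle
theta_non_learning <- function(alpha, beta_cycle, cycle) {
  plogis(alpha + beta_cycle * cycle)
}

# Example: a slight practice effect across cycles scaled to [0, 1]
theta_non_learning(alpha = 0.5, beta_cycle = 0.2, cycle = seq(0, 1, by = 0.25))
```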

For the learning process, the model assumes that, at a certain point in the experiment, a person's ability to recall the repeated memory set improves over training cycles relative to the ability to recall an unrepeated Filler set. For this, the non-learning model is used as a baseline to describe participants' memory performance on unrepeated Filler sets and before learning has started. On top of this, a linear term is added which allows performance on the repeated memory set to increase after an estimated onset point of learning. Again, a logit link is applied to ensure that $\theta_{learning}$ is a probability.

$$ \theta_{learning_{i,j}} = logit^{-1}(\alpha_{i} + \beta_{cycle_{i}} \cdot cycle_{j} + setType_{j} \cdot \beta_{learning_{i}} \cdot max(0, cycle_{j} - \beta_{onset_{i}})) $$

In this equation, $setType_{j}$ codes whether the memory set presented on trial j was an unrepeated set (= 0) or the repeated set (= 1), and $\beta_{learning_{i}}$ reflects the rate of the learning effect of participant i on the repeated memory set. The max function shifts the onset of learning along the time scale of the experiment, so that no learning benefit is added before the onset of learning is reached. The onset of learning for each participant i is reflected in $\beta_{onset_{i}}$.

In the formula stated above, the model predicts that performance on the repeated memory set approaches perfection once the learning process has started (an upper asymptote of 1). Although this is in principle plausible once the repeated set has been learned, the hard constraint on the upper asymptote can still cause sampling problems when participants make mistakes on the repeated set after learning it (e.g., by clicking the wrong button or having an attentional lapse). To account for such errors, the constraint on the upper asymptote is loosened by introducing an additional parameter that estimates the upper asymptote within the bounds of 0.85 and 1:

$$ \theta_{learning_{i,j}} = \beta_{asym_{i}} \cdot logit^{-1}(\alpha_{i} + \beta_{cycle_{i}} \cdot cycle_{j} + setType_{j} \cdot \beta_{learning_{i}} \cdot max(0, cycle_{j} - \beta_{onset_{i}})) $$
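A corresponding R sketch of the learning component, including the asymptote parameter (names and values are again illustrative; `pmax()` applies the max function element-wise, and setting `beta_asym = 1` recovers the unscaled equation further above):

```r
# Learning component: baseline plus a hinge-shaped learning term for the
# repeated set, scaled by the upper asymptote beta_asym
theta_learning <- function(alpha, beta_cycle, beta_learning,
                           beta_onset, beta_asym, cycle, set_type) {
  hinge <- pmax(0, cycle - beta_onset)  # zero before the onset of learning
  beta_asym * plogis(alpha + beta_cycle * cycle +
                       set_type * beta_learning * hinge)
}

# Example: learning on the repeated set (set_type = 1), onset at cycle 0.3
theta_learning(alpha = 0, beta_cycle = 0.1, beta_learning = 10,
               beta_onset = 0.3, beta_asym = 0.95,
               cycle = seq(0, 1, by = 0.1), set_type = 1)
```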

Given the non-learning and the learning part of the model, the likelihood components for each participant under both models can be computed as:

$$ L_{non-learning_{i}} = \prod_{j = 1}^{ntrials} Binomial(successes_{j} | trials_{j}, \theta_{non-learning_{i,j}}) $$

$$ L_{learning_{i}} = \prod_{j = 1}^{ntrials} Binomial(successes_{j} | trials_{j}, \theta_{learning_{i,j}}) $$
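In practice, such products are best computed as sums of log-likelihoods to avoid numerical underflow. A minimal sketch in R, assuming per-trial vectors of successes, set sizes, and recall probabilities (the example data are hypothetical):

```r
# Per-participant log-likelihood under one component: sum of trial-wise
# binomial log-densities (the log scale avoids numerical underflow)
log_lik_component <- function(successes, trials, theta) {
  sum(dbinom(successes, size = trials, prob = theta, log = TRUE))
}

# Example for one participant with three trials
log_lik_component(successes = c(3, 4, 6), trials = c(6, 6, 6),
                  theta = c(0.6, 0.6, 0.9))
```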

Given the two likelihood components for each participant, the mixture likelihood with mixing proportion $\lambda$ can then be calculated for each participant i as:

$$ L_{i} = \lambda \cdot L_{learning_{i}} + (1-\lambda) \cdot L_{non-learning_{i}} $$

The mixture proportion $\lambda$ is applied at the participant level and therefore indicates the proportion of learning participants in the full sample. The posterior probability of belonging to the learning or the non-learning process for a single participant can be recovered with the following equation:

$$ p_{learning_{i}} = \frac{L_{learning_{i}} \cdot \lambda}{L_{learning_{i}} \cdot \lambda + L_{non-learning_{i}} \cdot (1-\lambda)} $$
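A sketch of this computation in R on the log scale (the log-sum-exp trick guards against underflow when the likelihood products become very small; the inputs below are hypothetical):

```r
# Posterior probability that participant i belongs to the learning
# component, computed from log-likelihoods via the log-sum-exp trick
posterior_learning <- function(ll_learning, ll_non_learning, lambda) {
  a <- log(lambda) + ll_learning          # log(lambda * L_learning)
  b <- log(1 - lambda) + ll_non_learning  # log((1 - lambda) * L_non-learning)
  m <- max(a, b)                          # stabilizing constant
  exp(a - m) / (exp(a - m) + exp(b - m))
}

# Example with hypothetical log-likelihoods and lambda = 0.6
posterior_learning(ll_learning = -40, ll_non_learning = -55, lambda = 0.6)
```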

For fitting the model, all data variables should be scaled prior to modeling so that all parameters can be estimated roughly on unit scale. The indicator variable for the set type should be dummy coded with unrepeated = 0 and repeated = 1, and the training cycle variable should be scaled into the range [0, 1]. Consequently, the parameter for the onset of learning, $\beta_{onset}$, is also restricted to the range [0, 1]. Furthermore, the onset point is restricted to a maximum of the second-to-last training cycle to ensure identifiability of the two model components; otherwise, the learning model could, in principle, mimic the non-learning model by setting the onset of the learning curve to the very last data point. A similar issue applies to the parameter for the learning rate, $\beta_{learning}$, which needs to be larger than 0 to ensure identifiability of the two model components. Here, the lower boundary of this parameter was set to 3, an arbitrary choice based on visual explorations of the implemented model function. Additionally, an upper boundary of 500 was set for this parameter, which roughly corresponds to the point at which the learning curve approaches a step function. All bounded parameters were logit-transformed to allow the sampler to operate more efficiently in an unbounded space and to estimate parameters roughly on unit scale.
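A minimal sketch of this preprocessing and of mapping unbounded sampler parameters back to their bounded scales (helper names are illustrative; the bounds follow the text above):

```r
# Scale the training-cycle variable into [0, 1]
scale_cycle <- function(cycle) {
  (cycle - min(cycle)) / (max(cycle) - min(cycle))
}

# Map an unbounded (logit-scale) sampler parameter into [lower, upper]
to_bounded <- function(x_unbounded, lower, upper) {
  lower + (upper - lower) * plogis(x_unbounded)
}

# Onset of learning restricted to [0, 1]
beta_onset    <- to_bounded(0.2, lower = 0, upper = 1)
# Learning rate restricted to [3, 500]
beta_learning <- to_bounded(-1.5, lower = 3, upper = 500)
```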

Model variables
Accuracy

A participant's probability of giving a correct response on a given memory set.

Label: Memory Performance
Description: The correctness of a participant's responses.
Mixture Proportion

Proportion of participants belonging to the "learning model" / who learned the repeated memory set.

Label: Mixture Proportion
Description: Proportion of trials / participants belonging to one of several model components in a mixture model.
Regression Weights

A participant's performance at the beginning of the experiment.

Label: Intercept
Description: Weight / effect parameters within (non-)linear models.
Regression Weights

Baseline performance on unrepeated memory sets throughout the experiment.

Label: Baseline Performance
Description: Weight / effect parameters within (non-)linear models.
Regression Weights

The rate of increase in performance on a repeated memory set (i.e., the learning effect).

Label: Learning Rate
Description: Weight / effect parameters within (non-)linear models.
Regression Weights

The time point in the experiment at which a participant started to improve in performance on a repeated memory stimulus.

Label: Onset point of learning
Description: Weight / effect parameters within (non-)linear models.
Regression Weights

The asymptotic performance a participant approached when learning the repeated memory set.

Label: Asymptote
Description: Weight / effect parameters within (non-)linear models.
Publication
Musfeld, P., Souza, A. S., & Oberauer, K. (2023). Repetition learning is neither a continuous nor an implicit process. Proceedings of the National Academy of Sciences, 120(16). https://doi.org/10.1073/pnas.2218042120
Psychology disciplines
Cognitive Psychology, Experimental Psychology
Programming language

R

Code repository url

https://osf.io/dpkyb/

Data url

https://osf.io/dpkyb/