Pessimistic Q-learning model

The model is an adaptation of standard Q-learning in which the assumption that the agent will always take the reward-maximizing action is replaced by a weighting scheme under which the agent may also take the reward-minimizing action. The pessimistic Q-learning model is used to capture characteristics of anxious behavior.

Modeling frameworks
Reinforcement Learning

A type of machine learning where an agent learns to make decisions by receiving rewards or penalties.

More details
How does the model work?

\begin{equation} Q(s,a) = \sum_{s', r} p(s', r \mid s, a) \left( r + \gamma \left[ c \max_{a'} Q(s', a') + (1-c) \min_{a'} Q(s', a') \right] \right) \end{equation}

where p(s', r | s, a) is the probability of transitioning from state s to the next state s' and receiving reward r when taking action a; r is the immediate reward obtained by taking action a in state s; and gamma is the discount rate that determines how much immediate rewards are valued over future rewards (if gamma is zero, the agent values only immediate rewards). The parameter c is the pessimism weight. It takes values between 0 and 1, where 1 indicates that the person updates their beliefs under the assumption that they will always take reward-maximizing actions in the future, and 0 indicates that the person believes they will always take reward-minimizing actions.

The Q(s,a) values are computed by value iteration.
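As an illustration, below is a minimal sketch of pessimistic value iteration on a toy tabular MDP. It is not the authors' code (see the repository link below); the function name pessimistic_value_iteration, the array layout (P for transition probabilities, R for expected rewards), and the random example MDP are assumptions made for this example.

```python
# A minimal sketch of pessimistic value iteration (not the authors' implementation).
# Assumes a small tabular MDP given as NumPy arrays:
#   P[s, a, s'] -- transition probabilities, R[s, a] -- expected immediate reward.
import numpy as np

def pessimistic_value_iteration(P, R, gamma=0.95, c=0.5, tol=1e-8, max_iter=10_000):
    """Iterate the pessimistic Bellman backup until the Q-values converge.

    c = 1 recovers standard (reward-maximizing) Q-values;
    c = 0 assumes the agent will always take the reward-minimizing action next.
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # c-weighted mixture of the best- and worst-case next-state action values
        successor_value = c * Q.max(axis=1) + (1 - c) * Q.min(axis=1)
        # Expectation over next states s', plus immediate reward
        Q_new = R + gamma * (P @ successor_value)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

# Example usage on a random two-action MDP with 5 states (illustrative only)
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5), size=(5, 2))   # P[s, a] is a distribution over s'
R = rng.normal(size=(5, 2))
Q_anxious = pessimistic_value_iteration(P, R, c=0.3)  # pessimistic agent
Q_neutral = pessimistic_value_iteration(P, R, c=1.0)  # standard Q-values
```

Comparing the two outputs shows how lowering c depresses the value of actions whose worst-case continuations are poor, which is the mechanism the model uses to produce anxious, avoidant choice.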

Publication
Zorowitz, S., Momennejad, I., & Daw, N. D. (2020). Anxiety, Avoidance, and Sequential Evaluation. Computational Psychiatry, 4(0), 1. https://doi.org/10.1162/cpsy_a_00026
Psychology disciplines
Clinical Psychology
DOI
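10.1162/cpsy_a_00026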
Programming language

Python

Code repository url

https://github.com/ndawlab/seqanx