
SDT Bayesian Models

This repository contains code for data simulation and analysis for three signal detection theory (SDT) models, assessed following a Bayesian workflow.

These resources were of great help:

1) https://github.com/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3

2) https://mvuorre.github.io/posts/2017-10-09-bayesian-estimation-of-signal-detection-theory-models/#evsdt-for-multiple-participants

Generative Model

The model is inspired by the approach of Lee and Wagenmakers (2013). Two functions are relevant here: the Gaussian cumulative distribution function, referred to as Φ, and the error function erf on which it relies:

erf(z) = (2/√π) ∫₀ᶻ exp(-t²) dt

Φ(x) = 0.5(1 + erf(x/√2))
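As a quick sanity check (using only NumPy and SciPy), Φ built from erf as above matches the standard normal CDF:

```python
import numpy as np
from scipy import special, stats

def phi(x):
    # Gaussian CDF built from the error function, as defined above
    return 0.5 * (1.0 + special.erf(x / np.sqrt(2.0)))

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
assert np.allclose(phi(x), stats.norm.cdf(x))
```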

With these, a Bayesian SDT model can be built as follows:

d ~ Normal(0, 1)

c ~ Normal(0, 1)

H = Φ(d/2 - c)

F = Φ(-d/2 - c)

yH ~ Binomial(H, s)

yF ~ Binomial(F, n)

We develop a generative model that simulates data with correlated d' and c parameters, i.e. an association between sensitivity and bias (Lynn & Barrett, 2014). In Python, the generative model looks like this:
```python
import numpy as np
from scipy import stats

cdf = stats.norm.cdf #same as Φ

np.random.seed(33)

g = 2 #number of groups (conditions)
p = 100 #number of participants

# simulate experiment where sensitivity (d') is correlated with bias (c)
# as d' increases c decreases
rho_high = -0.05 #correlation for high sensitivity condition
d_std = 0.5 #d' standard deviation
c_std = 0.5 #c standard deviation
mean = [2, 0.1] #d' mean (2) and c mean (0.1), i.e. high sensitivity and low bias
cov = [[d_std**2, rho_high * d_std * c_std],
       [rho_high * d_std * c_std, c_std**2]] #covariance with correlation
d_high, c_high = np.random.multivariate_normal(mean, cov, size=p).T #generate correlated variables via an mv normal
correlation_high = np.corrcoef(d_high, c_high)[0, 1]

rho_low = -0.6 #correlation for low sensitivity condition
d_std = 0.5
c_std = 0.5
mean = [1, 0.5] #d' mean (1) and c mean (0.5), i.e. low sensitivity and higher bias
cov = [[d_std**2, rho_low * d_std * c_std],
       [rho_low * d_std * c_std, c_std**2]]
d_low, c_low = np.random.multivariate_normal(mean, cov, size=p).T
correlation_low = np.corrcoef(d_low, c_low)[0, 1]

sig = np.array([np.repeat(25, p), np.repeat(25, p)]) #fixed number of signal trials (25)
noi = np.array([np.repeat(75, p), np.repeat(75, p)]) #fixed number of noise trials (75)

d_prime = np.array([d_high, d_low])
c_bias = np.array([c_high, c_low])

hits = np.random.binomial(sig, cdf(0.5*d_prime - c_bias)) #derive hits from d' and c
fas = np.random.binomial(noi, cdf(-0.5*d_prime - c_bias)) #derive false alarms from d' and c
```
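As a sanity check on this kind of simulation, d' and c can be recovered from simulated counts via the inverse of Φ, using the standard SDT identities d' = Φ⁻¹(H) - Φ⁻¹(F) and c = -(Φ⁻¹(H) + Φ⁻¹(F))/2. A self-contained sketch (illustrative values; trial counts are made large so the point estimates are stable):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(33)
cdf, ppf = stats.norm.cdf, stats.norm.ppf

d_true, c_true = 1.5, 0.2
n_sig, n_noi = 100_000, 100_000  # many trials, so rates are near their expectations

# simulate counts exactly as in the generative model above
hits = rng.binomial(n_sig, cdf(0.5 * d_true - c_true))
fas = rng.binomial(n_noi, cdf(-0.5 * d_true - c_true))

zH, zF = ppf(hits / n_sig), ppf(fas / n_noi)
d_hat = zH - zF              # d' = Φ⁻¹(H) - Φ⁻¹(F)
c_hat = -0.5 * (zH + zF)     # c  = -(Φ⁻¹(H) + Φ⁻¹(F)) / 2
```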

The image below summarises the simulations as receiver operating characteristic (ROC) curves with areas under the curve (AUC), and as density plots of the signal (hit) and noise (false-alarm) distributions.
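Under the equal-variance Gaussian SDT model used here, the ROC area has a closed form, AUC = Φ(d'/√2), which a numerical sweep of the criterion confirms (NumPy/SciPy sketch, not code from this repository):

```python
import numpy as np
from scipy import stats

def sdt_auc(d_prime):
    # Closed-form ROC area for equal-variance Gaussian SDT: AUC = Φ(d'/√2)
    return stats.norm.cdf(d_prime / np.sqrt(2.0))

# Numerical check: trace the ROC by sweeping the criterion c from high to low
d_p = 1.0
c_grid = np.linspace(6, -6, 4001)       # descending c → ascending H and F
H = stats.norm.cdf(d_p / 2 - c_grid)    # hit rate at each criterion
F = stats.norm.cdf(-d_p / 2 - c_grid)   # false-alarm rate at each criterion
auc_numeric = np.sum(0.5 * (H[1:] + H[:-1]) * np.diff(F))  # trapezoid rule
```

d' = 0 gives chance performance (AUC = 0.5); larger d' pushes the ROC toward the upper-left corner.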

Base Model

We first implement a Base Model with "fixed" (non-hierarchical) parameters across groups and participants, where group 1 is the high-sensitivity condition and group 2 the low-sensitivity condition.

d_g,p ~ Normal(0, 1)

c_g,p ~ Normal(0, 1)

H_g,p = Φ(d_g,p/2 - c_g,p)

F_g,p = Φ(-d_g,p/2 - c_g,p)

yH_g,p ~ Binomial(H_g,p, s)

yF_g,p ~ Binomial(F_g,p, n)

Where g indexes groups (1...G, G = 2), p indexes participants (1...P, P = 100), s = hits + misses (signal trials), and n = false alarms (FA) + correct rejections (CR) (noise trials). Observations yH are total hits and observations yF are total FAs, simulated from the generative model presented above.
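To make the Base Model concrete, the posterior for a single (g, p) cell can be approximated on a grid using only SciPy (the counts below are hypothetical, and the repository itself fits the models with MCMC rather than a grid):

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for one participant in one group
y_h, s = 18, 25   # hits out of s = hits + misses signal trials
y_f, n = 20, 75   # false alarms out of n = FAs + CRs noise trials

# Grid over (d, c) under the Normal(0, 1) priors of the Base Model
d_grid = np.linspace(-4, 4, 201)
c_grid = np.linspace(-4, 4, 201)
d, c = np.meshgrid(d_grid, c_grid, indexing="ij")

H = stats.norm.cdf(d / 2 - c)    # hit probability
F = stats.norm.cdf(-d / 2 - c)   # false-alarm probability

log_post = (
    stats.norm.logpdf(d) + stats.norm.logpdf(c)   # priors
    + stats.binom.logpmf(y_h, s, H)               # hit likelihood
    + stats.binom.logpmf(y_f, n, F)               # false-alarm likelihood
)
post = np.exp(log_post - log_post.max())
post /= post.sum()

d_mean = (post * d).sum()  # posterior mean of d'
c_mean = (post * c).sum()  # posterior mean of c
```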

Model 1

We extend the Base Model via a non-centred parametrisation using Gaussian and half-Gaussian distributions.

d_l ~ Normal(0, 1)

d_z;g,p ~ Normal(0, 1)

d_s ~ HalfNormal(1)

d_g,p = d_l + d_z;g,p d_s

c_l ~ Normal(0, 1)

c_z;g,p ~ Normal(0, 1)

c_s ~ HalfNormal(1)

c_g,p = c_l + c_z;g,p c_s

H_g,p = Φ(d_g,p/2 - c_g,p)

F_g,p = Φ(-d_g,p/2 - c_g,p)

yH_g,p ~ Binomial(H_g,p, s)

yF_g,p ~ Binomial(F_g,p, n)

Besides the reparametrised d and c, all other elements of the model remain as in the Base Model.
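The point of the non-centred parametrisation can be shown in a few lines: for a fixed location and scale, shifting and scaling a standard normal yields exactly the centred distribution, so the model is unchanged while the posterior geometry (the "funnel" between location/scale and group-level effects) becomes easier for the sampler. An illustrative NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(33)

# For fixed location d_l and scale d_s, the non-centred construction
# d = d_l + d_z * d_s with d_z ~ Normal(0, 1) is distributionally
# identical to the centred d ~ Normal(d_l, d_s)
d_l, d_s = 1.0, 0.5
d_z = rng.standard_normal(200_000)
d_noncentred = d_l + d_z * d_s
```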

Model 2 (LKJ Model)

Finally, we build a model with LKJ correlation priors (see Lewandowski et al., 2009) to assess the association between sensitivity and bias.

ρ1, ρ2 ~ LKJCorr(2, 2)

σ_d1, σ_c1, σ_d2, σ_c2 ~ HalfNormal(1)

μ_d1, μ_c1, μ_d2, μ_c2 ~ Normal(0, 1)

Σ1 = [[σ_d1², σ_d1 σ_c1 ρ1], [σ_d1 σ_c1 ρ1, σ_c1²]]

Σ2 = [[σ_d2², σ_d2 σ_c2 ρ2], [σ_d2 σ_c2 ρ2, σ_c2²]]

(d1, c1) ~ MvNormal([μ_d1, μ_c1], Σ1)

(d2, c2) ~ MvNormal([μ_d2, μ_c2], Σ2)

d = [d1, d2]

c = [c1, c2]

H_g,p = Φ(d_g,p/2 - c_g,p)

F_g,p = Φ(-d_g,p/2 - c_g,p)

yH_g,p ~ Binomial(H_g,p, s)

yF_g,p ~ Binomial(F_g,p, n)

Where LKJCorr provides priors for the correlations in the covariance matrices Σ1 and Σ2, and the d and c parameters are now multivariate Gaussian with Gaussian means and the aforementioned Σ covariances.
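The construction of Σ1 and Σ2 can be sketched directly: assemble the covariance from the standard deviations and the correlation, and draws from the resulting multivariate normal reproduce that correlation (illustrative values below, not posterior draws):

```python
import numpy as np

rng = np.random.default_rng(33)

# Covariance built from standard deviations and a correlation, as in Σ1/Σ2
sigma_d, sigma_c, rho = 0.5, 0.5, -0.6
cov = np.array([
    [sigma_d**2, rho * sigma_d * sigma_c],
    [rho * sigma_d * sigma_c, sigma_c**2],
])

# Correlated (d, c) draws, as in the MvNormal parameters of Model 2
draws = rng.multivariate_normal([1.0, 0.5], cov, size=100_000)
emp_rho = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]
```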

Results

We run prior predictive checks (see the prior_predictions folder), which show that the priors have good coverage (see image below). We do not calibrate the priors, as we want to test the fixed priors of the Base Model and see how well the priors of Models 1 and 2 can adapt to the "true" values of the simulated d' and c.

After this, we sampled all models via Markov chain Monte Carlo (MCMC) using the No-U-Turn Sampler (NUTS), with 1000 tuning steps, 1000 samples, and 4 chains. See the model_convergence folder for full details (r-hats, ESS, etc.); the image below provides a general summary of posteriors and rank plots, indicating good convergence.

Next, we performed a precision analysis to check whether precision is adequate at 100 participants per group. All models reached a reasonable precision of around 0.1.

After this, we assessed posteriors via scatter plots and the squared Hellinger distance H² (see Pardo, 2018), to quantify how much the posterior distributions of d' and c differ from their observed (simulated) counterparts. The image below summarises these results.
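For reference, between two univariate Gaussians the squared Hellinger distance has a closed form; a minimal sketch (the repository may compute H² from posterior samples rather than fitted Gaussians):

```python
import numpy as np

def hellinger2_gauss(mu1, s1, mu2, s2):
    # Squared Hellinger distance between N(mu1, s1²) and N(mu2, s2²):
    # H² = 1 - sqrt(2 s1 s2 / (s1² + s2²)) · exp(-(mu1 - mu2)² / (4 (s1² + s2²)))
    coef = np.sqrt(2.0 * s1 * s2 / (s1**2 + s2**2))
    return 1.0 - coef * np.exp(-((mu1 - mu2) ** 2) / (4.0 * (s1**2 + s2**2)))

# H² lies in [0, 1]: 0 for identical distributions, growing with the mean gap
assert np.isclose(hellinger2_gauss(0, 1, 0, 1), 0.0)
```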

ROC curve plots with AUC measures also indicate better approximations by Models 1 and 2.

Additionally, we performed posterior predictive checks, which indicate similar results. In the images below, SDI refers to the Sørensen–Dice index (see Costa, 2021), which measures similarity between variables.
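For reference, the Sørensen–Dice index of two binary vectors is 2|A ∩ B| / (|A| + |B|); a minimal sketch (the repository's SDI computation, following Costa (2021), may differ in detail):

```python
import numpy as np

def sorensen_dice(a, b):
    # Sørensen–Dice index for two binary arrays: 2 |A ∩ B| / (|A| + |B|)
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# One shared positive out of two positives in each vector → SDI = 0.5
assert sorensen_dice([1, 1, 0, 0], [1, 0, 1, 0]) == 0.5
```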

Finally, we conducted a model comparison after applying re-LOO to each model (see Vehtari et al., 2017; see also: https://python.arviz.org/en/stable/user_guide/pymc_refitting.html). Comparisons indicate higher expected log predictive densities for Model 2.

Conclusion

Results indicate that the varying-effects (multilevel) Model 1 improves on the Base Model but shows minor convergence issues. Model 2, with LKJ priors, not only provides the same multilevel parametric form as Model 1 (though centred), but also includes correlation priors capable of capturing the association between the d' and c parameters, and it converges better. Further, Model 2, which is essentially an extension of the previous models (it only changes the parametrisation and contains no new variables), approximates the simulated measures as well as Model 1 and shows better predictive accuracy.

References

Costa, L. J. (2021). Further Generalizations of the Jaccard Index. https://doi.org/10.48550/arxiv.2110.09619

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001. https://doi.org/10.1016/j.jmva.2009.04.008

Lynn, S. K., & Barrett, L. F. (2014). “Utilizing” Signal Detection Theory. Psychological Science, 25(9), 1663–1673. https://doi.org/10.1177/0956797614541991

Pardo, L. (2018). Statistical Inference Based on Divergence Measures. CRC Press.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
