#####################################
#-----------------------------------#
# Simplified Simulation Script in R #
#-----------------------------------#
#####################################
# Conversion of the Supplementary Code for: A Step-by-Step Tutorial on Active Inference
# Modelling and its Application to Empirical Data
# By: Ryan Smith, Karl J. Friston, Christopher J. Whyte
# CONVERSION by Steffen Schwerdtfeger
## Equivalent of rng('shuffle') in Matlab
set.seed(as.integer(Sys.time()))
## Clear environment
rm(list=ls())
# This code simulates a single trial of the explore-exploit task introduced
# in the active inference tutorial, using a stripped-down version of the
# model inversion scheme implemented in the spm_MDP_VB_X.m script.
# Note that this implementation uses the marginal message passing scheme
# described in (Parr et al., 2019), and will return very slightly
# (negligibly) different values than the spm_MDP_VB_X.m script in
# simulation results.
# Parr, T., Markovic, D., Kiebel, S., & Friston, K. J. (2019). Neuronal
# message passing using Mean-field, Bethe, and Marginal approximations.
# Scientific Reports, 9, 1889.
#######################
# Simulation Settings #
#######################
# To simulate the task when prior beliefs (d) are separated from the
# generative process, set the 'Gen_model' variable directly
# below to 1. To do so for priors (d), likelihoods (a), and habits (e),
# set the 'Gen_model' variable to 2:
Gen_model = 2 # As in the main tutorial code, many parameters
# can be adjusted in the model setup, within the
# explore_exploit_model function (starting on line 450
# of the Matlab code). This includes, among others
# (similar to the main tutorial script):
# prior beliefs about context (d): alter line 525f
# beliefs about hint accuracy in the likelihood (a): alter lines 685f
# to adjust habits (e): alter line 900f
#############
# Libraries #
#############
library("pracma") # necessary for imagesc()
library("logOfGamma") # for gammaln()
#############
# Functions #
#############
# Reg. Softmax function
softmax <- function(par){ # THANKS TO: https://rpubs.com/FJRubio/softmax
n.par <- length(par) # EQUIVALENT: exp(par)/sum(exp(par)), but numerically stable
par1 <- sort(par, decreasing = TRUE)
Lk <- par1[1]
for (k in 1:(n.par-1)) {
Lk <- max(par1[k+1], Lk) + log1p(exp(-abs(par1[k+1] - Lk)))
}
val <- exp(par - Lk)
return(val)
}
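# Quick sanity check for softmax() (illustrative values, not part of the
# model): the log-sum-exp accumulation above yields a normalized
# distribution, identical to exp(par)/sum(exp(par)) but without overflow.
softmax_check = softmax(c(1, 2, 3))
# sum(softmax_check) # = 1 (up to floating-point error)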
# Alternative Softmax function; note that, despite the name, it normalizes
# each row of a matrix (rows sum to 1) and reduces to a plain softmax for
# a single vector:
softmaxCOL <- function(probs){
if(is.null(dim(probs))) probs <- matrix(probs,ncol= length(probs))
exp(probs)/apply(probs,1, function(x) sum(exp(x)))
}
# Natural log that adds a very small constant (exp(-16)) for numerical
# reasons, since log(0) is not defined.
nat_log = function (x) {
x=log(x+exp(-16))
return(x)
}
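# Illustration: nat_log(0) returns log(exp(-16)) = -16 instead of -Inf,
# while nat_log(1) is log(1 + exp(-16)), i.e., virtually 0.
natlog_check = nat_log(0) # -16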
# Isfield function, Matlab-Style
isfield <- function(structure, field){
any(field %in% names(structure) )
}
# Replication of Matlab's gradient function for calculating the 1D
# numerical gradient. Alternatively, use the function of the same name
# from library("pracma").
gradient = function (x) {
step = length(x)
upLim = step-1 # needed for differences on interior points
# List for result (assumes length(x) >= 3)
result = vector("list",1*length(x))
dim(result) = c(1,length(x))
result[1] = (x[2]-x[1])/1 # Take forward differences on left...
result[step] = (x[step]-x[step-1])/1 # ....... and right edges.
for (i in 2:upLim) { # central differences on interior points
result[i] = (x[i+1]-x[i-1])/2
}
return(result) # return instead of printing
} # END of function
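# Illustrative check (assumed values): forward differences on the edges and
# central differences on interior points, matching pracma::gradient().
grad_check = unlist(gradient(c(1, 4, 9, 16)))
# grad_check # 3 4 6 7, same as pracma::gradient(c(1, 4, 9, 16))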
# Normalizing vectors. So far this seems to be sufficient in R;
# compare with the Matlab code for col_norm(B).
col_norm = function(x){
lapply(x,lapply, function(x)(t(t(x)/colSums(x))))
}
# This is a version without lapply for B_norm:
col_normSIMP = function(x){t(t(x)/colSums(x))} # Without lapply for B
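# Illustration (assumed values): each column is divided by its sum, so the
# columns of the result are probability distributions.
colnorm_check = col_normSIMP(matrix(c(1, 3, 2, 2), nrow = 2))
# colSums(colnorm_check) # 1 1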
# B_norm (this function is really messy, but it works; other attempts
# failed, and it was not even possible to combine the two functions;
# the culprit is probably some redundant transposing).
B_norm = function(x){
if (ncol(x) == 1){
bb = x
z = sum(bb) # create normalizing constant from sum of columns
bb = bb/z # divide columns by constant
bb[is.nan(bb)] = 0 # replace NaN with zero
b = bb
return(b)
}
else {
x = B_norm2(x)
z=t(x)
return(z)
}
}
B_norm2 = function(y){
x=col_normSIMP(t(y))
x[is.nan(x)] = 0
x
}
# GreaterZero is used to only keep those values in a list that are
# greater than zero; refers to line 104 in the Matlab script:
GreaterZero = function(x){
TrueGreatZero = function(x){x>0}
checkLogic = lapply(x,TrueGreatZero)
trueISone = function(x){ x = x*1}
checkNum = lapply(checkLogic,trueISone)
return(checkNum) # return explicitly (the original relied on the last assignment)
} # End Function
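# Illustration (assumed values): GreaterZero() turns each element-wise
# comparison into 0/1 indicators, e.g. list(c(-1, 0, 2)) -> list(c(0, 0, 1)).
gz_check = GreaterZero(list(c(-1, 0, 2)))
# gz_check[[1]] # 0 0 1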
# dot product along dimension f
md_dot = function (A,s,f){
if (f == 1){
B = t(A)%*%s
}
else if (f == 2){
B = A%*%s
}
return(B)
}
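# Illustration (assumed values): md_dot() contracts a matrix with a vector
# along dimension f; with an identity matrix, the state vector is returned
# unchanged for either value of f.
mddot_check = md_dot(diag(2), matrix(c(.7, .3)), 2)
# mddot_check # .7 .3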
# Select last element of a vector, or 1D matrix:
last <- function(x) { return( x[[length(x)]] ) }
# Replicates Matlab's nargin function in R (in this form only for a
# maximum of 2 function inputs; it needs adjustment to be generalized
# to any input length, in case that is needed).
# Full example: https://stackoverflow.com/questions/64422780/nargin-function-in-r-number-of-function-inputs
# Note: has to be called inside a function. See NARGIN_TEST below for an
# example.
nargin <- function() {
if(sys.nframe()<2) stop("must be called from inside a function")
length(as.list(sys.call(-1)))-1
}
# NARGIN Example for max of 2 inputs
NARGIN_TEST = function (x,y){
if(nargin()==2) {
z=x+1
}
else if (nargin()==1) {
y=0
z=x+y
}
else { # For no input
y=0
x=0
z = y+x
}
return(z)
}
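# Illustration: NARGIN_TEST(5, 2) sees two inputs and returns 5 + 1 = 6;
# NARGIN_TEST(5) sees one input and returns 5 + 0 = 5.
nargin_check = NARGIN_TEST(5)
# nargin_check # 5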
#################
# SPM Functions #
#################
cell_md_dot = function(X, x,a_plain){
for(m in 1:length(x)){
if(length(x[[m]]) == ncol(X[[1]])){
x[[m]] = t(x[[m]])
}
} # transpose x[[1]]
for(i in 1:length(x)){
if(ncol(X[[1]]) == ncol(x[[i]])){
for(j in 1:length(X)){
X[[j]] = t(t(X[[j]])*as.vector(x[[i]]))
}
}
else if(ncol(X[[1]]) != ncol(x[[i]])){
for(k in 1:length(X)){
X[[k]] = X[[k]]*x[[i]][k]
}
}
}
# Sum over 3rd
res=0
for(n in 1:length(X)){
res = res+X[[n]]
}
# Sum over 2nd
Xout = as.matrix(rowSums(res))
return(Xout)
}
cell_md_dotCOMB = function(X,x, a_plain){ # Works but probably needs improvement to be more flexible
# NOTE: a_plain, i.e., a[[1 to 3]] needed for DIM: in Matlab a{1} still
# entails information on the whole cell length, which is dropped in R
# when naming e.g. X = a[[1]].
# Convert X to column binded version, similar to G_epistemic_value
Xarr = array(as.numeric(unlist(X)), c(nrow(X[[1]]),ncol(X[[1]]),length(X)))
XarrDIM = Xarr
# Initialize dimension
DIM = (1:length(x)) + length(a_plain) - length(x) # use a_plain, not the global a
# Compute dot product using recursive sums
# To do so, we first do all the reshaping:
for (d in 1:length(x)){ # Re-shape
s = matrix(1, ndims(XarrDIM))
s[[DIM[[d]]]] = length(x[[d]])
s[is.na(s)] = 1
# Reshape
xResh = array(as.numeric(unlist(x[[d]])), c(s))
for (i in 1:length(a_plain[[1]])){
if(length(xResh[1,1,])==1){
Xarr[,,i] = Xarr[,,i]*as.vector(xResh)
}
else {
Xarr[,,i] = Xarr[,,i]*xResh[,,i]
}
}
# Summing over second and third dimension (use sum(cell_md_dot()) for Comb)
if (DIM[[d]] == 1){
# Xarr = apply(Xarr, FUN=colSums, MARGIN =1)
}
if (DIM[[d]] == 2){
Xarr = apply(Xarr, FUN=colSums, MARGIN =1)
Xarr = array(as.numeric(unlist(t(Xarr))), c(ncol(Xarr),1, nrow(Xarr)))
}
else if (DIM[[d]] == 3){
Xarr = apply(Xarr, FUN=rowSums, MARGIN =2)# correct for dim 3
}
# else if (DIM[[d]] == 4){
#
# }
}
return(Xarr)
} # End of function cell_md_dot
spm_wnorm = function(X,CONV){ # Start
if(CONV == FALSE){
# This is a replication of the bsxfun function to subtract the
# inverse of each column entry from the inverse of the sum of the
# columns and then divide by 2.
X = lapply(X,"+", exp(-16))
for(i in 1:length(X)){
A = 1/colSums(X[[i]]) # Matlab: 1./sum(A,1)
B = 1/X[[i]]
X2 = (A-B)/2 # Alternative? lapply(X2,"-",X3)
X[[i]] = X2
} # End of loop
return(X)
} # End if CONV == FALSE
else if (CONV == TRUE){
# Convert to array
Xdim = array(as.numeric(unlist(X)), c(nrow(X[[1]]),ncol(X[[1]]),length(X)))
X = Xdim + exp(-16)
X = (as.numeric(1/colSums(X))-(1/X))/2 # bsxfun in Matlab script
return(X)
}
} # End of function
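# Illustrative check for spm_wnorm() (assumed values, CONV = FALSE): for a
# single 2x2 matrix of ones, each column sums to 2, so every entry of the
# 'novelty' term becomes (1/2 - 1/1)/2 = -0.25 (up to the exp(-16) regularizer).
wnorm_check = spm_wnorm(list(matrix(1, 2, 2)), CONV = FALSE)
# wnorm_check[[1]] # all entries approximately -0.25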
# epistemic value term (Bayesian surprise) in expected free energy
G_epistemic_value = function(X1,X2) {
# Relist X1 Input:
X1arr = list()
for(i in 1:length(X1)){
X1arr[[i]] = array(as.numeric(unlist(X1[[i]])), dim = c(nrow(X1[[i]][[1]]),(ncol(X1[[i]][[1]])*length(X1[[i]])) ))
}
# probability distribution over the hidden causes: i.e., Q(s)
qx = outer(X2[[1]],X2[[2]]) # this is the outer product of the posterior over states
# calculated with respect to itself
qx = drop(qx)
# accumulate expectation of entropy: i.e., E[lnP(o|s)]
G = 0
qo = 0
find = which(qx > exp(-16)) # replicates the Matlab loop
for (i in find){
# probability over outcomes for this combination of causes
po = matrix(c(1))
for (g in 1:length(X1)){
po = drop(po)
po = outer(po,X1arr[[g]][,i]) # X1arr needed here
}
qo = qo + qx[[i]]*po
G = G + qx[[i]]*as.vector(po)%*%nat_log(po)
}
# subtract entropy of expectations: i.e., E[lnQ(o)]
G = G - as.vector(qo)%*%nat_log(as.numeric(unlist(qo)))
return(G)
} # End of function G_epistemic_value
spm_betaln = function(x){
if (is.list(x)==FALSE){
find=which(x!=0)
l = list()
for (k in 1:length(find)){
l[[k]] = x[find[[k]]]
} # End for k
z = as.numeric(unlist(l))
yinter = gammaln(z)
yinter[yinter == "NaN"] = 0
y = sum(as.numeric(yinter)) - gammaln(sum(z))
} # End if is.list
else{
xarr = array(as.numeric(unlist(x)), c(nrow(x[[1]]), ncol(x[[1]]), length(x)))
y = array(0, c(1, ncol(x[[1]]), length(x)))
for (i in 1:length(x[[1]][1,])){
for (j in 1:length(x)){
y[[1,i,j]] = spm_betaln(as.vector(xarr[,i,j]))
} # End for j
} # End for i
} # End else
return(y)
} # End of function spm_betaln
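# Illustrative check (assumed values): for a flat Dirichlet vector c(1, 1),
# ln B(1, 1) = gammaln(1) + gammaln(1) - gammaln(2), which is 0 analytically.
betaln_check = spm_betaln(c(1, 1))
# betaln_check # ~0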
spm_psi = function(x){
# normalization of a probability transition rate matrix (columns)
# for single matrix input:
if(is.matrix(x)){
x1=x
xout=x
for(i in 1:length(x)){
x1[[i]] = psigamma(x[[i]])
}
x2 = sum(x)
x2 = psigamma(x2)
for(j in 1:length(x)){
xout[j] = x1[[j]]-x2
}
return(xout)
}
else{
# Set x as pre-dim list:
x1=x
xout=list()
for(i in 1:length(x)){
for(j in 1:nrow(x[[1]])){
for(k in 1:ncol(x[[1]])){
x1[[i]][j,k] = psigamma(x[[i]][j,k])
} # End for k
} # End for j
} # End for i
x2 = lapply(x,colSums)
x2 = lapply(x2, psigamma)
for(n in 1:length(x2)){
xout[[n]] = t(t(x1[[n]])-as.vector(x2[[n]]))
} # End for n
}
return(xout)
} # End of function
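# Illustrative check (assumed values): for matrix input, spm_psi() returns
# psigamma(x) minus psigamma of the summed concentration parameters; for a
# 2x1 column of ones, both entries are digamma(1) - digamma(2) = -1.
psi_check = spm_psi(matrix(c(1, 1)))
# psi_check # -1 -1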
spm_KL_dir = function(q,p){
# KL divergence between two Dirichlet distributions
# Matlab formula similar to: d = spm_betaln(p) - spm_betaln(q) - colSums((p - q).*spm_psi(q + 1/32))
# Does not work in one row in R with lists.
# Element wise multiplication:
Add = q # just for dimensional purposes
if(is.matrix(q)==FALSE){
pMinq = list()
for(i in 1:length(p)){
pMinq[[i]] = p[[i]]-q[[i]]
}
qPlus = list()
for(i in 1:length(q)){
qPlus[[i]] = q[[i]] + as.numeric(1/32)
}
for(i in 1:length(pMinq)){
for(k in 1:nrow(pMinq[[1]])){
for(j in 1:ncol(pMinq[[1]])){
Add[[i]][[k,j]] = as.numeric(pMinq[[i]][[k,j]]*spm_psi(qPlus)[[i]][[k,j]])
}
}
}
Add=lapply(Add,colSums)
d = list()
for(i in 1:length(spm_betaln(p)[1,1,])){
d[[i]] = spm_betaln(p)[,,i] - spm_betaln(q)[,,i] - Add[[i]]
}
d = sum(unlist(d))
return(d)
}
else if(is.matrix(q)){
qPlus = q
qPlus = q + as.numeric(1/32)
pMinq = p-q
Add = q
Add = pMinq*spm_psi(qPlus)
Add = sum(Add)
d = list()
d = spm_betaln(p) - spm_betaln(q) - Add
return(d)
}
# Note: the following lines mirror the numerical check at the end of the
# Matlab spm_KL_dir. They are unreachable here, since both branches above
# return, so they are kept only for reference:
# p = rand(6,1) + 1
# q = rand(6,1) + p
# q0 = sum(p)
# p0 = sum(q)
# KL = spm_betaln(p) - spm_betaln(q) + d*spm_psi(q)
# klinter = gammaln(q0) - sum(gammaln(q)) - gammaln(p0) + sum(gammaln(p))
# klinter2 = d*as.numeric(unlist(lapply(spm_psi(q),"-",spm_psi(as.matrix(q0)))))
# kl = klinter + klinter2 # (the original read 'klinter + klinter', a typo)
} # End of function KL dir
############################################
# Set up POMDP model structure as function #
############################################
# Please note that the main tutorial script ('Step_by_Step_AI_Guide.m')
# has more thorough descriptions of how to specify this generative
# model and the other parameters that might be included. Below we
# only describe the elements used to specify this specific model.
# Also, unlike the main tutorial script which focuses on learning
# initial state priors (d), this version also enables habits (priors
# over policies; e) and separation of the generative process from the
# generative model for the likelihood function (a).
POMDP_model_structure = function(Gen_model){
# Number of time points or 'epochs' within a trial: T
# =========================================================================
# Here, we specify 3 time points (T), in which the agent 1) starts
# in a 'Start' state, 2) first moves to either a 'Hint' state or a
# 'Choose Left' or 'Choose Right' slot machine state, and 3) either
# moves from the Hint state to one of the choice states or moves from
# one of the choice states back to the Start state.
Time = 3
# Priors about initial states: D and d
# =========================================================================
#--------------------------------------------------------------------------
# Specify prior probabilities about initial states in the generative
# process (D)
# Note: By default, these will also be the priors for the generative
# model
#--------------------------------------------------------------------------
# Setup vector list for D
D = c(list())
# Pre-dimension the list, as otherwise assigning via e.g. D[[1]][[1]] is
# not possible, but necessary for the loop later on:
D[[1]] = c(rep(list(matrix(0, nrow = 2, ncol = 1)),1)) # can be random.
D[[2]] = c(rep(list(matrix(0, nrow = 2, ncol = 1)),1))
# For the 'context' state factor, we can specify that the 'left better'
# context (i.e., where the left slot machine is more likely to win)
# is the true context:
# matrix(c('left better','right better'))
D[[1]][[1]] = matrix(c(1, 0))
# For the 'behavior' state factor, we can specify that the agent
# always begins a trial in the 'start' state (i.e., before choosing
# either to pick a slot machine or to first ask for a hint):
# matrix(c('start','hint','choose-left','choose-right'))
D[[2]][[1]] = matrix(c(1, 0, 0, 0))
#--------------------------------------------------------------------------
# Specify prior beliefs about initial states in the generative model
# (d) Note: This is optional, and will simulate learning priors over
# states if specified.
#--------------------------------------------------------------------------
# Note that these are technically what are called 'Dirichlet
# concentration parameters', which need not take on values between
# 0 and 1. These values are added to after each trial, based on
# posterior beliefs about initial states. For example, if the agent
# believed at the end of trial 1 that it was in the 'left better'
# context, then d{1} on trial 2 would be d{1} = [1.5 0.5]' (although
# how large the increase in value is after each trial depends on a
# learning rate). In general, higher values indicate more confidence
# in one's beliefs about initial states, and entail that beliefs will
# change more slowly (e.g., the shape of the distribution encoded by
# d{1} = [25 25]' will change much more slowly than the shape of the
# distribution encoded by d{1} = [.5 0.5]' with each new observation).
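# A minimal sketch of the learning idea above (illustrative values, not part
# of the model): normalizing d gives the encoded belief, and learning adds
# the posterior over initial states (scaled by the learning rate) to the
# concentration parameters.
d_sketch = matrix(c(.5, .5)) # flat but low-confidence prior
# d_sketch / sum(d_sketch) # encoded belief: (.5, .5)
# d_sketch + 1 * matrix(c(1, 0)) # [1.5 0.5]' after inferring 'left better' with learning rate 1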
# Setup vector list for d
d = c(list())
d[[1]] = c(rep(list(matrix(0, nrow = 2, ncol = 1)),1)) # can be random.
d[[2]] = c(rep(list(matrix(0, nrow = 2, ncol = 1)),1)) # can be random.
# For context beliefs, we can specify that the agent starts out believing
# that both contexts are equally likely, but with somewhat low confidence
# in these beliefs:
# matrix(c('left better','right better'))
d[[1]][[1]] = matrix(c(.25,.25))
# For behavior beliefs, we can specify that the agent expects with
# certainty that it will begin a trial in the 'start' state:
# matrix(c('start','hint','choose-left','choose-right'))
d[[2]][[1]] = matrix(c(1, 0, 0, 0))
# State-outcome mappings and beliefs: A and a
#=========================================================================
#--------------------------------------------------------------------------
# Specify the probabilities of outcomes given each state in the
# generative process (A)
# This includes one matrix per outcome modality
# Note: By default, these will also be the beliefs in the generative
# model
#--------------------------------------------------------------------------
# First we specify the mapping from states to observed hints (outcome
# modality 1). Here, the rows correspond to observations, the columns
# correspond to the first state factor (context), and the third
# dimension corresponds to behavior. Each column is a probability
# distribution that must sum to 1.
# Setup a vector list:
A = c(list())
# We start by specifying that both contexts generate the 'No Hint'
# observation across all behavior states:
## number of states in each state factor (2 and 4)
Ns = matrix(c(length(D[[1]][[1]]), length(D[[2]][[1]])))
Ns
### A_bhav / context (context = left or right)
A_bhavCont = matrix(c(1,1, # No Hint
0,0, # Machine-Left Hint
0,0), # Machine-Right Hint
nrow = 3, byrow = TRUE)
# Assign to vector list:
# Alternative to the loop in the Matlab script (line 905, 12.06.2022)
A[[1]] = c(rep(list(A_bhavCont),Ns[2]))
# Then we specify that the 'Get Hint' behavior state generates a hint that
# either the left or right slot machine is better, depending on the context
# state. In this case, the hints are accurate with a probability of pHA.
pHA = 1 # By default we set this to 1, but try changing its value to
# see how it affects model behavior
A_hintAcc = matrix(c( 0 , 0 , # No Hint
pHA , (1-pHA), # Machine-Left Hint
(1-pHA), pHA) , # Machine-Right Hint
nrow = 3, byrow = TRUE)
# Assign to vector list:
A[[1]][[2]] = A_hintAcc
# Next we specify the mapping between states and wins/losses. The first
# two behavior states ('Start' and 'Get Hint') do not generate either
# win or loss observations in either context:
A_contWL = matrix(c( 1, 1, # Null
0, 0, # Loss
0, 0), # Win
nrow = 3, byrow = TRUE)
# Assign to list:
# Alt to loop in Matlab script (line 927, 12.06.2022)
A[[2]] = c(rep(list(A_contWL),2))
# Choosing the left machine (behavior state 3) generates wins with
# probability pWin, which differs depending on the context state (columns):
pWin = .8 # By default we set this to .8, but try changing its value to
# see how it affects model behavior
# Does not need an extra name to be assigned:
A[[2]][[3]] = matrix(c( 0 , 0 , # Null
(1-pWin), pWin , # Loss
pWin , (1-pWin)), # Win
nrow = 3, byrow = TRUE)
# Choosing the right machine (behavior state 4) generates wins with
# probability pWin, with the reverse mapping to context states from
# choosing the left machine:
# Assign to list:
A[[2]][[4]] = matrix(c( 0 , 0 , # Null
pWin , (1-pWin), # Loss
(1-pWin), pWin) , # Win
ncol = 2, nrow = 3, byrow = TRUE)
# Finally, we specify an identity mapping between behavior states and
# observed behaviors, to ensure the agent knows that behaviors were carried
# out as planned. Here, each row corresponds to each behavior state.
A_Id = matrix(c(0,0, # Start
0,0, # Hint
0,0, # Choose-left
0,0), # Choose-right
nrow = 4, byrow = TRUE)
# Assign to vector list
A[[3]] = c(rep(list(A_Id),nrow(A_Id)))
# Adds a c(1,1) vector to the respective row (cf. Matlab line 956)
for (i in 1:Ns[2]){
A[[3]][[i]][i,] = c(1,1)
}
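# Optional sanity check (added illustration, not in the Matlab script):
# every column of every A matrix must be a probability distribution
# summing to 1.
stopifnot(sapply(unlist(A, recursive = FALSE),
function(m) all(abs(colSums(m) - 1) < 1e-10)))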
#--------------------------------------------------------------------------
# Specify prior beliefs about state-outcome mappings in the generative model
# (a)
# Note: This is optional, and will simulate learning state-outcome mappings
# if specified.
#--------------------------------------------------------------------------
# Similar to learning priors over initial states, this simply
# requires specifying a matrix (a) with the same structure as the
# generative process (A), but with Dirichlet concentration parameters
# that can encode beliefs (and confidence in those beliefs) that need
# not match the generative process. Learning then corresponds to
# adding to the values of matrix entries, based on what outcomes were
# observed when the agent believed it was in a particular state. For
# example, if the agent observed a win while believing it was in the
# 'left better' context and the 'choose left machine' behavior state,
# the corresponding probability value would increase for that
# location in the state outcome-mapping (i.e., a{2}(3,1,3) might
# change from .8 to 1.8).
# One simple way to set up this matrix is by:
# 1. initially identifying it with the generative process
# 2. multiplying the values by a large number to prevent learning all
# aspects of the matrix (so the shape of the distribution changes
# very slowly)
# 3. adjusting the elements you want to differ from the generative
# process.
# Setup vector list (here a modification of the list A)
a = A
# As another example, to simulate learning the hint accuracy one
# might specify:
a[[1]] = lapply(a[[1]],"*", 200)
a[[2]] = lapply(a[[2]],"*", 200)
a[[3]] = lapply(a[[3]],"*", 200)
a[[1]][[2]] = matrix(c(0, 0, # No Hint
.25, .25, # Machine-Left Hint
.25, .25), # Machine-Right Hint
nrow = 3, byrow = TRUE)
# Needed for cell_md_dot(), where the input for X = a[[1]]. In Matlab,
# entering a function with a{1} still gives you the number of elements
# of a; in R we will lose this information when entering with a[[1]].
# I will write the function such that one enters three inputs
# (a[[1]], Expect_states, a), so that numel(a{1}) is equivalent to
# length(a). I will still assign the item outside the function to make
# the reader aware of this circumstance.
Numel_a = length(a) # Still needed?
# Controlled transitions and transition beliefs : B{:,:,u} and b(:,:,u)
#==========================================================================
#--------------------------------------------------------------------------
# Next, we have to specify the probabilistic transitions between
# hidden states under each action (sometimes called 'control states').
# Note: By default, these will also be the transitions beliefs
# for the generative model
#--------------------------------------------------------------------------
# Columns are states at time t. Rows are states at t+1.
# Setup list:
B = vector("list", 2*4)
dim(B) = c(2,4)
# The agent cannot control the context state, so there is only 1 'action',
# indicating that contexts remain stable within a trial:
# Note: this is an identity matrix.
B[[1,1]] = matrix(c(1, 0, # 'Left Better' Context
0, 1), # 'Right Better' Context
nrow = 2, ncol = 2, byrow = TRUE)
# The agent can control the behavior state, and we include 4 possible
# actions:
# Move to the Start state from any other state
B[[2,1]] = matrix(c(1, 1, 1, 1, # Start State
0, 0, 0, 0, # Hint
0, 0, 0, 0, # Choose Left Machine
0, 0, 0, 0), # Choose Right Machine
nrow = 4, byrow = TRUE)
# Move to the Hint state from any other state
B[[2,2]] = matrix(c(0, 0, 0, 0, # Start State
1, 1, 1, 1, # Hint
0, 0, 0, 0, # Choose Left Machine
0, 0, 0, 0), # Choose Right Machine
nrow = 4, byrow = TRUE)
# Move to the Choose Left state from any other state
B[[2,3]] = matrix(c(0, 0, 0, 0, # Start State
0, 0, 0, 0, # Hint
1, 1, 1, 1, # Choose Left Machine
0, 0, 0, 0), # Choose Right Machine
nrow = 4, byrow = TRUE)
# Move to the Choose Right state from any other state
B[[2,4]] = matrix(c(0, 0, 0, 0, # Start State
0, 0, 0, 0, # Hint
0, 0, 0, 0, # Choose Left Machine
1, 1, 1, 1), # Choose Right Machine
nrow = 4, byrow = TRUE)
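# Optional sanity check (added illustration, not in the Matlab script):
# each defined transition matrix must have columns summing to 1.
stopifnot(sapply(B[!sapply(B, is.null)],
function(m) all(colSums(m) == 1)))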
# For the number of controllable transitions, we create a dummy
# array without NULL values but with equivalent dimensions:
Bdim <- c(rep(list(array(0, c(2, 2, 1))), 1), list(array(0, c(4, 4, 4))))
#--------------------------------------------------------------------------
# Specify prior beliefs about state transitions in the generative model
# (b). This is a set of matrices with the same structure as B.
# Note: This is optional, and will simulate learning state transitions
# if specified.
#--------------------------------------------------------------------------
# For this example, we will not simulate learning transition beliefs.
# But, similar to learning d and a, this just involves accumulating
# Dirichlet concentration parameters. Here, transition beliefs are
# updated after each trial when the agent believes it was in a given
# state at time t and in another state at t+1.
# Preferred outcomes: C and c
#==========================================================================
#--------------------------------------------------------------------------
# Next, we have to specify the 'prior preferences', encoded here as log
# probabilities.
#--------------------------------------------------------------------------
# One matrix per outcome modality. Each row is an observation, and each
# column is a time point. Negative values indicate lower preference,
# positive values indicate higher preference. Stronger preferences
# promote risky choices and reduce information-seeking.
# We can start by setting a 0 preference for all outcomes:
# No: number of outcomes in each outcome modality
No = matrix(c(nrow(A[[1]][[1]]), nrow(A[[2]][[1]]), nrow(A[[3]][[1]]) ))
No
# Setup vector list for C
C = c(list())
C[[1]] = c(rep(list(matrix(0,nrow=No[1],ncol=Time))))
C[[2]] = c(rep(list(matrix(0,nrow=No[2],ncol=Time))))
C[[3]] = c(rep(list(matrix(0,nrow=No[3],ncol=Time)))) # No[3]: behavior has 4 outcomes
# Alternative for an overview below, but otherwise not necessary:
# Hints
C[[1]][[1]] = matrix(c(0, 0, 0, # No Hint
0, 0, 0, # Machine-Left Hint
0, 0, 0), # Machine-Right Hint
nrow = 3, byrow = TRUE)
# Wins/Losses
C[[2]][[1]] = matrix(c(0, 0, 0, # Null
0, 0, 0, # Loss
0, 0, 0), # Win
nrow = 3, byrow = TRUE)
# Observed Behaviors
C[[3]][[1]] = matrix(c(0, 0, 0, # Start State
0, 0, 0, # Hint
0, 0, 0, # Choose Left Machine
0, 0, 0), # Choose Right Machine
nrow = 4, byrow = TRUE)
# Then we can specify a 'loss aversion' magnitude (la) at time points 2
# and 3, and a 'reward seeking' (or 'risk-seeking') magnitude (rs).
# Here, rs is divided by 2 at the third time point to encode a smaller
# win ($2 instead of $4) if taking the hint before choosing a slot
# machine.
la = 1 # By default we set this to 1, but try changing its value to
# see how it affects model behavior
rs = 4 # By default we set this to 4, but try changing its value to
# see how it affects model behavior
C[[2]][[1]] = matrix(c(0, 0, 0, # Null
0, -la, -la, # Loss
0, rs, rs/2), # win
nrow = 3, byrow = TRUE)
#--------------------------------------------------------------------------
# One can also optionally choose to simulate preference learning by
# specifying a Dirichlet distribution over preferences (c).
#--------------------------------------------------------------------------
# This will not be simulated here. However, this works by increasing
# the preference magnitude for an outcome each time that outcome is
# observed. The assumption here is that preferences naturally increase
# for entering situations that are more familiar.
# Allowable policies: U or V.
#==========================================================================
#--------------------------------------------------------------------------
# Each policy is a sequence of actions over time that the agent can
# consider.
#--------------------------------------------------------------------------
# For our simulations, we will specify V, where rows correspond to
# time points; there should be T-1 rows (here, 2 transitions: from
# time point 1 to time point 2, and from time point 2 to time point 3):
NumPolicies = 5 # Number of policies
NumFactors = 2 # Number of state factors
V = vector("list", 1*2)
dim(V) = c(1,2)
# Deep policies V[[1]] and V[[2]] (one per state factor)
V[[1]] = matrix(c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1), # Context state is not controllable
nrow = 2, byrow = TRUE)
V[[2]] = matrix(c(1, 2, 2, 3, 4, # t1 to t2
1, 3, 4, 1, 1), # t2 to t3
nrow = 2, byrow = TRUE)
# Within the loops below, a transition in time (t1 to t2, ...) is
# treated as t-1, i.e., t < Time (handled inside the loops).
# For V[[2]], columns left to right indicate policies allowing:
# 1. staying in the start state
# 2. taking the hint then choosing the left machine
# 3. taking the hint then choosing the right machine
# 4. choosing the left machine right away (then returning to start
# state)
# 5. choosing the right machine right away (then returning to start
# state)
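# For example, the second policy is the column c(V[[1]][, 2], V[[2]][, 2]):
# action 2 (move to the Hint state) from t1 to t2, then action 3 (Choose
# Left) from t2 to t3, while the context factor stays under its single
# 'action' 1.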
# Habits: E and e.
#==========================================================================
#--------------------------------------------------------------------------
# Optional: a column vector with one entry per policy, indicating
# the prior probability of choosing that policy (i.e., independent
# of other beliefs).
#--------------------------------------------------------------------------
# We will not equip our agent with any starting habits
# (flat distribution over policies):
E = matrix(c(1, 1, 1, 1, 1),
nrow = 5, ncol = 1, byrow = TRUE)
# To incorporate habit learning, where policies become more likely
# after each time they are chosen, we can also specify concentration
# parameters by specifying e:
e = matrix(c(1, 1, 1, 1, 1),
nrow = 5, ncol = 1, byrow = TRUE)
# Additional optional parameters.
#==========================================================================
# Eta: learning rate (0-1) controlling the magnitude of concentration
# parameter updates after each trial (if learning is enabled).
eta = 1 # Default (maximum) learning rate
# Omega: forgetting rate (0-1) controlling the magnitude of
# reduction in concentration parameter values after each trial
# (if learning is enabled).
omega = 1 # Default value indicating there is no forgetting
# (values < 1 indicate forgetting)
# Beta: Expected precision of expected free energy (G) over
# policies (a positive value, with higher values indicating lower
# expected precision). Lower values increase the influence of
# habits (E) and otherwise make policy selection less deterministic.
beta = 1 # By default this is set to 1, but try increasing its value
# to lower precision and see how it affects model behavior
# Alpha: An 'inverse temperature' or 'action precision' parameter that
# controls how much randomness there is when selecting actions
# (e.g., how often the agent might choose not to take the hint,
# even if the model assigned the highest probability to that action).
# This is a positive number, where higher values indicate less
# randomness. Here we set this to a fairly high value:
alpha = 32 # fairly low randomness in action selection
## Define POMDP Structure
#==========================================================================
# Not necessary in R, but to get the same overview as in the Matlab script:
Time # Number of time steps
V # allowable (deep) policies
A # state-outcome mapping
B # transition probabilities
C # preferred states
D # priors over initial states
d # enable learning priors over initial states
# Also just for an overview compared to the Matlab script.
# To recreate the below in R, we have to either add E for Gen_model = 1
# or a and e to the list, if Gen_model = 2. We will do so at the end of
# the function.
if (Gen_model == 1){
E } # prior over policies
if (Gen_model == 2){
a # enable learning state-outcome mappings
e # enable learning of prior over policies
}
eta = eta # learning rate
omega = omega # forgetting rate
alpha = alpha # action precision
beta = beta # expected free energy precision
# respecify for use in inversion script (specific to this tutorial example)
NumPolicies # Number of policies
NumFactors # Number of state factors
# Set up list:
if(Gen_model == 1){
MDP = list(A,B,C,D,d,E,V,Time,eta,alpha,beta,omega,NumPolicies,NumFactors, chosen_action=1, Bdim)
# (Re)name items of the list (the name for chosen_action is already included, so it can simply be added)