When you say
"Each coordinate is independently chosen to be either uniform random in [0,1] (with probability s) or simple 0 (with probability 1-s)"
I believe it should be
"Each coordinate is independently chosen to be either uniform random in [0,1] (with probability 1-s) or simple 0 (with probability s)"
so that s is the sparsity, which is consistent with https://transformer-circuits.pub/2022/toy_model/index.html#demonstrating
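
To make the convention concrete, here is a minimal sketch of the sampling I have in mind, assuming s is the probability that a coordinate is zero. The function name `sample_sparse_features` and its parameters are hypothetical, chosen for illustration only, and are not taken from the post or from the linked paper's code.

```python
import numpy as np


def sample_sparse_features(n_samples, n_features, s, seed=0):
    """Hypothetical helper: each coordinate is 0 with probability s,
    otherwise uniform in [0, 1) with probability 1 - s."""
    rng = np.random.default_rng(seed)
    # Draw a uniform value for every coordinate.
    values = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
    # Independently decide which coordinates are zeroed out (probability s).
    is_zero = rng.random((n_samples, n_features)) < s
    return np.where(is_zero, 0.0, values)


x = sample_sparse_features(10_000, 20, s=0.9)
# Empirical fraction of zero coordinates; should be close to s.
zero_fraction = float(np.mean(x == 0.0))
```

Under this reading, a larger s gives a sparser input, so the fraction of zero coordinates should approach s as the sample grows.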