These are some models I made for the coin flip cheaters
There are two types of coins in the problem.
- fair coin (probability of heads is 1/2)
- biased coin (probability of heads is 3/4)
One can find the probability that the coin is fair using Bayes' theorem.
Let
where
If we label right, we get 1 point and 15 flipping chances. Otherwise, we lose 30 flipping chances. Let's consider only the flipping chances. If we flip once, we lose 1 flipping chance. We may gain or lose flipping chances by labeling. Thus, we can evaluate the labeling process by the following formula.
For my model, I used
From above, we can calculate the expected reward. If there is a way that maximizes the expected reward, we can take that as a strategy.
The no belief model tests a coin until the 'generally' expected reward is at its maximum. What I meant by the term, generally, is that it does not consider how many times we got heads.
Suppose that
We can compute
The weak belief model tests a coin while the expected reward increases, meaning we compare the following two values.
$$\mathbb{E}[reward(n, x)]$$ $$P(\text{The next coin is head}) \times reward(n + 1, x + 1) + P(\text{The next coin is tail}) \times reward(n + 1, x)$$
If the first value is greater than the second value, the reward is expected to decrease, thus, we label the coin instantly. Otherwise, the reward is expected to increase, hence, we flip once more.
To calculate the second value, we use the law of total probability, and
We can calculate
The fanatic model is similar to the weak belief model, but unlike the weak belief model, the model considers the 'current' label. As mentioned above, the coin can be labeled by using
This model strongly depends on the posterior observation data, not considering the prior probability.
The belief model is also similar to the weak belief model, except that it considers
If the model meets the end condition for the first time, it is unclear whether the coin is fair or biased. Therefore, calm models do not end testing if the end condition (the reward is expected to be decreased) is first met. When the end condition is again met, they end testing. If the remaining fund is 0 or the flip is the 14th flip, it ends testing, because there are no more chances to flip.
Indecisive models use the significance level to label the coin. If the
As mentioned above, the indecisive model uses the significance level to check the end condition. Let
Empirically, the significance level of 0.3 worked fine.
For computing the expected next reward, other than considering only the next step, indecisive models consider every combination that can occur. When considering every combination, it computes the expected reward if given
Below is the cheat sheet built using the fanatic model. F for Fair, C for Cheat, T for further testing.
| T | |||||||||||||||||
| F | T | ||||||||||||||||
| F | T | C | |||||||||||||||
| F | F | T | C | ||||||||||||||
| F | F | F | T | C | |||||||||||||
| F | F | F | T | C | C | ||||||||||||
| F | F | F | F | T | C | C | |||||||||||
| F | F | F | F | T | T | C | C | ||||||||||
| F | F | F | F | F | T | C | C | C | |||||||||
| F | F | F | F | F | F | T | C | C | C | ||||||||
| F | F | F | F | F | F | T | C | C | C | C | |||||||
| F | F | F | F | F | F | F | T | C | C | C | C | ||||||
| F | F | F | F | F | F | F | F | T | C | C | C | C | |||||
| F | F | F | F | F | F | F | F | T | C | C | C | C | C | ||||
| F | F | F | F | F | F | F | F | F | C | C | C | C | C | C | |||
| F | F | F | F | F | F | F | F | F | F | C | C | C | C | C | C |
Below is the cheat sheet built using the basic indecisive model. F for Fair, C for Cheat, T for further testing.
| T | |||||||||||||||||
| T | T | ||||||||||||||||
| F | T | T | |||||||||||||||
| F | F | T | C | ||||||||||||||
| F | F | T | T | C | |||||||||||||
| F | F | F | T | C | C | ||||||||||||
| F | F | F | F | T | C | C | |||||||||||
| F | F | F | F | T | T | C | C | ||||||||||
| F | F | F | F | F | T | C | C | C | |||||||||
| F | F | F | F | F | T | T | C | C | C | ||||||||
| F | F | F | F | F | F | T | T | C | C | C | |||||||
| F | F | F | F | F | F | F | T | C | C | C | C | ||||||
| F | F | F | F | F | F | F | T | T | C | C | C | C | |||||
| F | F | F | F | F | F | F | F | T | C | C | C | C | C | ||||
| F | F | F | F | F | F | F | F | F | C | C | C | C | C | C | |||
| F | F | F | F | F | F | F | F | F | F | C | C | C | C | C | C |
By the way, the game is well designed. The 'generally' expected rewards are calculated as below.
The expected rewards are all negative; thus, somehow, the game would end eventually. If we modify
Some expected rewards are positive. If we do only a few tests, the flipping chance is expected to be gained. Therefore, the game would be super long, or even endless, and become boring. Fix
Again, some expected rewards are positive.
Still, we can modify some parameters to make different games. We can lower