# walds_eq_example.py
import numpy as np
import math
from tqdm import tqdm
"""
PROOF: Expected Value of the Stopping Increment Y_N
Problem:
Sample Y_i ~ U(0, 1) independently.
Stop at N = min{ n : sum(Y_1...Y_n) >= 1 }.
Find E[Y_N].
-------------------------------------------------------------------------------
1. EXPECTED STOPPING TIME (E[N])
The probability that the sum of n uniform variables is less than 1 is the
volume of an n-dimensional simplex: P(S_n < 1) = 1/n!.
Since N > n is equivalent to S_n < 1:
E[N] = sum_{n=0 to inf} P(N > n) = sum_{n=0 to inf} 1/n! = e.
2. WALD'S IDENTITY FOR THE TOTAL SUM (E[S_N])
Wald's Identity states E[S_N] = E[N] * E[Y_i].
E[S_N] = e * (1/2) = e/2.
3. EXPECTED SUM BEFORE STOPPING (E[S_{N-1}])
Since each Y_i < 1 almost surely, N >= 2 almost surely, so only n >= 2
contributes. For n >= 2, the density of S_{n-1} at s (for 0 < s < 1) is
f(s) = s^(n-2)/(n-2)!.
The probability of stopping at step n given S_{n-1} = s is P(s + Y_n >= 1) = s.
E[S_{N-1}] = sum_{n=2 to inf} integral_0^1 [ s * (s) * s^(n-2)/(n-2)! ] ds
= sum_{n=2 to inf} integral_0^1 [ s^n / (n-2)! ] ds
= sum_{n=2 to inf} [ 1 / ((n+1)(n-2)!) ]
Let k = n - 2:
E[S_{N-1}] = sum_{k=0 to inf} [ 1 / ((k+3)k!) ]
Each term equals integral_0^1 x^(k+2)/k! dx, so the series sums to
integral_0^1 x^2 e^x dx = [ (x^2 - 2x + 2) e^x ]_0^1 = e - 2.
E[S_{N-1}] = e - 2.
4. THE LAST INCREMENT (E[Y_N])
Since S_N = S_{N-1} + Y_N:
E[Y_N] = E[S_N] - E[S_{N-1}]
E[Y_N] = (e/2) - (e - 2)
E[Y_N] = 2 - e/2 ≈ 0.640859
-------------------------------------------------------------------------------
"""
# ─── Efficiency notes ────────────────────────────────────────────────────────
# 1. Running sum: track a scalar `total_yn` instead of re-summing a list
# each iteration (O(1) update vs O(n) rescan).
# 2. Numpy batch sampling: pre-draw a chunk of randoms at once. numpy's RNG
# is implemented in C and amortises Python overhead across many draws,
# far faster than calling random.uniform() in a tight Python loop.
# 3. np.cumsum + argmax: find the crossing index inside a batch with a single
# vectorised scan rather than a Python while-loop.
# ─────────────────────────────────────────────────────────────────────────────
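# As a sketch, notes 2-3 can be taken further (illustrative alternative, not
# used below): simulate every run at once with a 2-D array and reduce along
# axis 1. The helper name `batched_estimate` is hypothetical. Rows whose
# cumulative sum never reaches 1 within `width` columns (probability
# 1/20! ≈ 4e-19 per row at width=20) are simply dropped here instead of being
# topped up the way the loop below does.
def batched_estimate(n_sims: int, width: int = 20, seed: int | None = None) -> float:
    rng = np.random.default_rng(seed)
    draws = rng.random((n_sims, width))     # one row per simulated run
    cums = np.cumsum(draws, axis=1)
    first = np.argmax(cums >= 1.0, axis=1)  # first crossing column per row
    rows = np.arange(n_sims)
    crossed = cums[rows, first] >= 1.0      # guards the no-crossing argmax=0 case
    return draws[rows, first][crossed].mean()
# Note the memory trade-off: this holds n_sims * width floats at once, so very
# large n_sims would call for processing in chunks.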
NUM_SIMULATIONS = 1_000_000
BATCH = 20  # pre-draw this many uniforms per simulation; E[N] = e ≈ 2.72,
            # so 20 is almost always enough, with very rare top-ups
theoretical = 2 - math.e / 2
rng = np.random.default_rng()
total_yn = 0.0 # running sum of y_N values (O(1) update each iteration)
running_avg = 0.0
pbar = tqdm(range(NUM_SIMULATIONS), desc="Simulating")
for i in pbar:
    # Draw a batch of candidates at once (vectorised C call)
    draws = rng.random(BATCH)
    cumsum = np.cumsum(draws)
    # Find the first index where the cumulative sum crosses 1
    idx = np.argmax(cumsum >= 1.0)
    if cumsum[idx] < 1.0:
        # Rare: the entire batch didn't cross; top up one draw at a time
        total = cumsum[-1]
        while True:
            y = rng.random()
            total += y
            if total >= 1.0:
                draws = np.append(draws, y)
                idx = len(draws) - 1
                break
    total_yn += draws[idx]
    running_avg = total_yn / (i + 1)
    pbar.set_postfix(
        avg=f"{running_avg:.7f}",
        theory=f"{theoretical:.7f}",
        diff=f"{abs(running_avg - theoretical):.7f}",
    )
pbar.close()
print()
print(f"Simulated average of y_N : {running_avg:.7f}")
print(f"Theoretical (2 - e/2) : {theoretical:.7f}")
print(f"Difference : {abs(running_avg - theoretical):.7f}")