# walds_eq_example.py
import numpy as np
import math
from tqdm import tqdm
"""
PROOF: Expected Value of the Stopping Increment Y_N
Problem:
Sample Y_i ~ U(0, 1) independently.
Stop at N = min{ n : sum(Y_1...Y_n) >= 1 }.
Find E[Y_N].
-------------------------------------------------------------------------------
1. EXPECTED STOPPING TIME (E[N])
The probability that the sum of n uniform variables is less than 1 is the
volume of an n-dimensional simplex: P(S_n < 1) = 1/n!.
Since N > n is equivalent to S_n < 1:
E[N] = sum_{n=0 to inf} P(N > n) = sum_{n=0 to inf} 1/n! = e.
2. WALD'S IDENTITY FOR THE TOTAL SUM (E[S_N])
Wald's Identity states E[S_N] = E[N] * E[Y_i].
E[S_N] = e * (1/2) = e/2.
3. EXPECTED SUM BEFORE STOPPING (E[S_{N-1}])
Since each Y_i < 1 almost surely, N >= 2 almost surely, so only n >= 2
contributes. For n >= 2, the density of S_{n-1} at s (for 0 < s < 1) is
f(s) = s^(n-2)/(n-2)!.
The probability of stopping at step n given S_{n-1} = s is P(s + Y_n >= 1) = s.
E[S_{N-1}] = sum_{n=2 to inf} integral_0^1 [ s * (s) * s^(n-2)/(n-2)! ] ds
= sum_{n=2 to inf} integral_0^1 [ s^n / (n-2)! ] ds
= sum_{n=2 to inf} [ 1 / ((n+1)(n-2)!) ]
Let k = n - 2:
E[S_{N-1}] = sum_{k=0 to inf} [ 1 / ((k+3)k!) ]
Each term equals integral_0^1 x^(k+2)/k! dx, so the series sums to
integral_0^1 x^2 e^x dx = [ (x^2 - 2x + 2) e^x ]_0^1 = e - 2.
E[S_{N-1}] = e - 2.
4. THE LAST INCREMENT (E[Y_N])
Since S_N = S_{N-1} + Y_N:
E[Y_N] = E[S_N] - E[S_{N-1}]
E[Y_N] = (e/2) - (e - 2)
E[Y_N] = 2 - e/2 ≈ 0.640859
-------------------------------------------------------------------------------
"""
# ─── Efficiency notes ────────────────────────────────────────────────────────
# 1. Running sum: track a scalar `total_yn` instead of re-summing a list
# each iteration (O(1) update vs O(n) rescan).
# 2. Numpy batch sampling: pre-draw a chunk of randoms at once. numpy's RNG
# is implemented in C and amortises Python overhead across many draws,
# far faster than calling random.uniform() in a tight Python loop.
# 3. np.cumsum + argmax: find the crossing index inside a batch with a single
# vectorised scan rather than a Python while-loop.
# ─────────────────────────────────────────────────────────────────────────────
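# As a sketch, notes 2-3 can be taken further (illustrative alternative, not
# used below): simulate every run at once with a 2-D array and reduce along
# axis 1. The helper name `batched_estimate` is hypothetical. Rows whose
# cumulative sum never reaches 1 within `width` columns (probability
# 1/20! ≈ 4e-19 per row at width=20) are simply dropped here instead of being
# topped up the way the loop below does.
def batched_estimate(n_sims: int, width: int = 20, seed: int | None = None) -> float:
    rng = np.random.default_rng(seed)
    draws = rng.random((n_sims, width))     # one row per simulated run
    cums = np.cumsum(draws, axis=1)
    first = np.argmax(cums >= 1.0, axis=1)  # first crossing column per row
    rows = np.arange(n_sims)
    crossed = cums[rows, first] >= 1.0      # guards the no-crossing argmax=0 case
    return draws[rows, first][crossed].mean()
# Note the memory trade-off: this holds n_sims * width floats at once, so very
# large n_sims would call for processing in chunks.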
NUM_SIMULATIONS = 1_000_000
BATCH = 20  # pre-draw this many uniforms per simulation; E[N] = e ≈ 2.72,
            # so 20 is almost always enough, with very rare top-ups
theoretical = 2 - math.e / 2
rng = np.random.default_rng()
total_yn = 0.0 # running sum of y_N values (O(1) update each iteration)
running_avg = 0.0
pbar = tqdm(range(NUM_SIMULATIONS), desc="Simulating")
for i in pbar:
    # Draw a batch of candidates at once (vectorised C call)
    draws = rng.random(BATCH)
    cumsum = np.cumsum(draws)
    # Find the first index where the cumulative sum crosses 1
    idx = np.argmax(cumsum >= 1.0)
    if cumsum[idx] < 1.0:
        # Rare: the entire batch didn't cross; top up one draw at a time
        total = cumsum[-1]
        while True:
            y = rng.random()
            total += y
            if total >= 1.0:
                draws = np.append(draws, y)
                idx = len(draws) - 1
                break
    total_yn += draws[idx]
    running_avg = total_yn / (i + 1)
    pbar.set_postfix(
        avg=f"{running_avg:.7f}",
        theory=f"{theoretical:.7f}",
        diff=f"{abs(running_avg - theoretical):.7f}",
    )
pbar.close()
print()
print(f"Simulated average of y_N : {running_avg:.7f}")
print(f"Theoretical (2 - e/2) : {theoretical:.7f}")
print(f"Difference : {abs(running_avg - theoretical):.7f}")