Commit 75bd81e
Edited algorithm descriptions
1 parent b0ab55c commit 75bd81e
1 file changed: doc/algorithms.rst
Lines changed: 271 additions & 1 deletion
@@ -28,6 +28,11 @@ enumpure

Reads a game on standard input and searches for pure-strategy Nash equilibria.

For a strategic-form game, the algorithm systematically enumerates all pure strategy profiles and verifies,
for each profile, that no unilateral deviation by any player can yield a higher payoff. For
extensive-form games, pure-strategy agent Nash equilibria are determined analogously:
the algorithm verifies that no player can improve their payoff through a unilateral deviation at any information set.
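The strategic-form check described above can be sketched in a few lines. This is a toy illustration, not Gambit's implementation; the game and the helper name are hypothetical.

```python
from itertools import product

def pure_equilibria(payoffs, num_strategies):
    """Enumerate all pure-strategy Nash equilibria by brute force.

    payoffs: dict mapping a pure profile (tuple of strategy indices)
             to a tuple of payoffs, one per player.
    num_strategies: list giving each player's number of strategies.
    """
    equilibria = []
    for profile in product(*(range(n) for n in num_strategies)):
        # Keep the profile if no player gains by a unilateral deviation.
        if all(
            payoffs[profile][i] >= payoffs[profile[:i] + (s,) + profile[i + 1:]][i]
            for i in range(len(num_strategies))
            for s in range(num_strategies[i])
        ):
            equilibria.append(profile)
    return equilibria

# Prisoner's dilemma: defection (strategy 1) strictly dominates,
# so the only pure equilibrium is mutual defection.
pd = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
found = pure_equilibria(pd, [2, 2])  # [(1, 1)]
```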
.. _enummixed:

enummixed

@@ -41,6 +46,26 @@ This is a superset of the points generated by the path-following procedure of Le
It was shown by Shapley [Sha74]_ that there are equilibria not accessible via the method in :ref:`lcp`, whereas
:program:`enummixed` is guaranteed to return all the extreme points.

The algorithm begins by rescaling payoffs so that they are non-negative. It then constructs two polyhedra:

.. math::

    P_1 = \{x\geq 0\,|\,A_1x\leq 1\} \\
    P_2 = \{y\geq 0\,|\,A_2y\leq 1\}

where :math:`A_1` and :math:`A_2` denote the payoff matrices of the respective players. Next, a bipartite graph is
formed between the vertices of :math:`P_1` and those of :math:`P_2`.
An edge joins a vertex :math:`x\in P_1` to a vertex :math:`y\in P_2` if and only if the complementarity conditions

.. math::

    x_i(1-A_2y)_i=0 \\
    y_j(1-A_1x)_j=0

are satisfied for all indices :math:`i` and :math:`j`. Whenever an edge connects :math:`x` and :math:`y`,
normalising these vectors into probability distributions produces an extreme Nash equilibrium.
Furthermore, for any clique in the bipartite graph, the pair of convex hulls of the corresponding extreme equilibria
defines a set of Nash equilibria.
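For a 2x2 game the two polyhedra are low-dimensional enough to enumerate by hand. The sketch below — an illustration only, not the actual vertex-enumeration code — finds the vertices of each polytope by intersecting pairs of bounding lines, tests the complementarity conditions on every vertex pair, and normalises the connected pairs. It assumes the orientation in which :math:`A_1 x` and :math:`A_2 y` are well-defined for a square game, and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product

F = Fraction

def vertices(A):
    """Vertices of {x >= 0 | A x <= 1} for a 2x2 matrix A, found by
    intersecting pairs of the four bounding lines and keeping feasible points."""
    cons = [([F(1), F(0)], F(0)), ([F(0), F(1)], F(0))] + [(row, F(1)) for row in A]
    verts = set()
    for (r1, b1), (r2, b2) in combinations(cons, 2):
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if det == 0:
            continue  # parallel lines: no vertex
        x = ((b1 * r2[1] - b2 * r1[1]) / det, (r1[0] * b2 - r2[0] * b1) / det)
        if all(c >= 0 for c in x) and all(r[0] * x[0] + r[1] * x[1] <= 1 for r in A):
            verts.add(x)
    return verts

def is_edge(x, y, A1, A2):
    """Complementarity: x_i (1 - (A2 y))_i = 0 and y_j (1 - (A1 x))_j = 0."""
    A2y = [A2[i][0] * y[0] + A2[i][1] * y[1] for i in range(2)]
    A1x = [A1[j][0] * x[0] + A1[j][1] * x[1] for j in range(2)]
    return all(x[i] * (1 - A2y[i]) == 0 for i in range(2)) and \
           all(y[j] * (1 - A1x[j]) == 0 for j in range(2))

# Battle of the Sexes, payoffs already non-negative.
A1 = [[F(2), F(0)], [F(0), F(1)]]  # player 1's payoffs
A2 = [[F(1), F(0)], [F(0), F(2)]]  # player 2's payoffs, suitably oriented

equilibria = set()
for x, y in product(vertices(A1), vertices(A2)):
    # Skip the artificial zero vertices, then normalise connected pairs
    # (y yields player 1's mixture, x yields player 2's).
    if any(x) and any(y) and is_edge(x, y, A1, A2):
        sx, sy = sum(x), sum(y)
        equilibria.add(((y[0] / sy, y[1] / sy), (x[0] / sx, x[1] / sx)))
```

For this game the bipartite graph has exactly three edges among the non-zero vertices, giving the two pure equilibria and the mixed equilibrium :math:`((2/3,1/3),(1/3,2/3))`.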
68+
4469
.. _enumpoly:
4570

4671
enumpoly
@@ -69,6 +94,44 @@ supports which have the fewest strategies in total. For many classes
6994
of games, this will tend to lower the average time until finding one equilibrium,
7095
as well as finding the second equilibrium (if one exists).
7196

97+
(new) Reads a game on standard input and
98+
computes Nash equilibria by solving systems of polynomial equations
99+
and inequalities.
100+
101+
The algorithm begins by enumerating all supports that could potentially constitute the support
102+
of a Nash equilibrium. It then searches for equilibria within each support :math:`S` as follows.
103+
Consider an equilibrium mixed profile :math:`\sigma` with support :math:`S`.
104+
For all players :math:`i` and for every pair of their pure strategies :math:`(q,r)` in the support :math:`S`,
105+
the following indifference equations hold:
106+
107+
.. math::
108+
109+
u_i(q,\sigma_{-i}) = u_i(r, \sigma_{-i})
110+
111+
where :math:`u_i(a,\sigma_{-i})`` denotes the payoff obtained by player :math:`i` upon
112+
unilaterally deviating to strategy :math:`a`. These indifference equations can be expressed
113+
as polynomial equations in the strategy probabilities. Additionally, the requirement that
114+
the each player's strategy probabilities sum to one provides another polynomial equation.
115+
116+
The algorithm searches for roots of this polynomial system by successively subdividing the hypercube :math:`[0,1]^D`.
117+
The subdivision is performed such that each cell contains either no solutions or exactly one solution.
118+
For cells containing exactly one solution, Newton's method is applied to compute the solution precisely.
119+
Once solutions for a given support have been obtained, it is straightforward to verify that they
120+
satisfy the conditions of a Nash equilibrium.
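For the full support of a 2x2 game the indifference system happens to be linear, so the support-by-support idea can be shown without the subdivision machinery. A hedged sketch with a hypothetical helper, not the enumpoly solver itself:

```python
def full_support_equilibrium(A, B):
    """Solve the indifference equations for the full support of a 2x2 game.

    A, B: payoff matrices for players 1 (row chooser) and 2.
    Returns (p, q), the probability each player places on their first
    strategy, or None if the system has no solution with 0 < p, q < 1.
    """
    # Player 1 indifferent between rows:
    #   q*A[0][0] + (1-q)*A[0][1] = q*A[1][0] + (1-q)*A[1][1]
    denom1 = (A[0][0] - A[0][1]) - (A[1][0] - A[1][1])
    # Player 2 indifferent between columns:
    #   p*B[0][0] + (1-p)*B[1][0] = p*B[0][1] + (1-p)*B[1][1]
    denom2 = (B[0][0] - B[0][1]) - (B[1][0] - B[1][1])
    if denom1 == 0 or denom2 == 0:
        return None  # degenerate: no isolated interior solution
    q = (A[1][1] - A[0][1]) / denom1
    p = (B[1][1] - B[1][0]) / denom2
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None

# Battle of the Sexes: the interior solution is p = 2/3, q = 1/3.
mixed = full_support_equilibrium([[2, 0], [0, 1]], [[1, 0], [0, 2]])
```

For larger supports these equations become genuinely polynomial (of degree equal to the number of opponents), which is where the subdivision-plus-Newton search described above takes over.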
For extensive-form games, the procedure is analogous, except that the variables correspond to sequence-form
realization weights rather than pure-strategy probabilities.

For strategic games, the program searches supports in the order proposed
by Porter, Nudelman, and Shoham [PNS04]_. For two-player games, this
prioritises supports for which both players have the same number of
strategies. For games with three or more players, this prioritises
supports which have the fewest strategies in total. For many classes
of games, this will tend to lower the average time until finding one equilibrium,
as well as finding the second equilibrium (if one exists).
.. _lp:

lp

@@ -83,6 +146,18 @@ While the set of equilibria in a two-player constant-sum strategic
game is convex, this method will only identify one of the extreme
points of that set.

The algorithm constructs a linear program using the sequence-form
constraints and payoff matrices. Specifically, it solves an optimisation
problem of the form:

.. math::

    \operatorname{maximise}~ c^Tx ~~~\operatorname{subject to}~ Ax\leq b,~x\geq 0

where :math:`x` denotes the vector of free variables. The linear program
is solved using the simplex method, and the resulting solution is then
translated into the mixed behavior profile corresponding to a Nash equilibrium.
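For a strategic-form zero-sum game the corresponding LP is the classic maximin program. The sketch below is a toy illustration of that LP (not the sequence-form program, and solved by brute-forcing the candidate vertices of the feasible region rather than by simplex), for a game with two row strategies:

```python
def solve_zero_sum_2xn(A):
    """Maximin strategy for the row player of a 2 x n zero-sum game.

    The LP   max v   s.t.  sum_i x_i A[i][j] >= v for all j,
                           x1 + x2 = 1, x >= 0
    is solved by noting that with x = (t, 1-t) the objective
    v(t) = min_j (t*A[0][j] + (1-t)*A[1][j]) is piecewise linear and
    concave, so its maximum lies at t = 0, t = 1, or a crossing of
    two column lines -- exactly the vertices simplex would visit.
    """
    n = len(A[0])
    def payoff(t, j):
        return t * A[0][j] + (1 - t) * A[1][j]
    candidates = [0.0, 1.0]
    for j in range(n):
        for k in range(j + 1, n):
            denom = (A[0][j] - A[1][j]) - (A[0][k] - A[1][k])
            if denom != 0:
                t = (A[1][k] - A[1][j]) / denom
                if 0 <= t <= 1:
                    candidates.append(t)
    best_t = max(candidates, key=lambda t: min(payoff(t, j) for j in range(n)))
    value = min(payoff(best_t, j) for j in range(n))
    return (best_t, 1 - best_t), value

# Matching pennies: value 0, maximin strategy (1/2, 1/2).
strategy, value = solve_zero_sum_2xn([[1, -1], [-1, 1]])
```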
.. _lcp:

lcp

@@ -108,6 +183,52 @@ of those convex sets. See :ref:`enummixed` for a method
which is guaranteed to find all the extreme points for a strategic
game.

Reads a two-player game on standard input and
computes Nash equilibria by finding solutions to a linear
complementarity problem.

For extensive games, the algorithm constructs a linear complementarity problem (LCP) using the sequence-form
constraints and payoff matrices as defined by Koller, Megiddo, and von Stengel [KolMegSte94]_.
Specifically, it seeks solutions to a system of inequalities of the form:

.. math::

    w = Mz+q \\
    w\geq 0 \\
    z\geq 0 \\
    z^Tw = 0

where :math:`w` and :math:`z` denote the vectors of free variables.

To solve the LCP, the method of Lemke is used: an artificial variable :math:`z_0` is introduced, so that

.. math::

    w = Mz + q + z_0\mathbf{1}

has a trivial solution in which :math:`z_i=0` for all :math:`i\neq 0`
and :math:`w_j=0` for some :math:`j`.
A sequence of
solutions is then generated via successive pivot operations until :math:`z_0=0`, at which point a solution
to the original LCP is obtained. This solution is subsequently translated into the mixed behavior profile
corresponding to a Nash equilibrium.
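For a strategic-form bimatrix game :math:`(A, B)` with positive payoffs, the standard LCP takes :math:`z = (x, y)`, :math:`M = \bigl(\begin{smallmatrix}0 & A\\ B^T & 0\end{smallmatrix}\bigr)` and :math:`q = -\mathbf{1}`. The sketch below builds that system and verifies a candidate solution against the four LCP conditions; the candidate is supplied by hand for illustration, standing in for what Lemke pivoting would terminate at (this is not the sequence-form LCP used for extensive games):

```python
from fractions import Fraction

F = Fraction

def lcp_residuals(M, q, z):
    """Return w = M z + q and the complementarity product z.w."""
    n = len(q)
    w = [sum(M[i][k] * z[k] for k in range(n)) + q[i] for i in range(n)]
    return w, sum(z[i] * w[i] for i in range(n))

# Battle of the Sexes with positive payoffs.
A = [[F(2), F(0)], [F(0), F(1)]]   # player 1
B = [[F(1), F(0)], [F(0), F(2)]]   # player 2

# Bimatrix LCP: z = (x, y), M = [[0, A], [B^T, 0]], q = -1.
M = [
    [F(0), F(0), A[0][0], A[0][1]],
    [F(0), F(0), A[1][0], A[1][1]],
    [B[0][0], B[1][0], F(0), F(0)],
    [B[0][1], B[1][1], F(0), F(0)],
]
q = [F(-1)] * 4

# Hand-supplied candidate: unnormalised strategy weights (x1, x2, y1, y2).
z = [F(1), F(1, 2), F(1, 2), F(1)]
w, comp = lcp_residuals(M, q, z)

# Normalising each block of z recovers the mixed equilibrium.
x, y = z[:2], z[2:]
sigma1 = [xi / sum(x) for xi in x]
sigma2 = [yi / sum(y) for yi in y]
```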
For strategic games, the program uses the method of Lemke and Howson
[LemHow64]_. In this case, the method will find all "accessible"
equilibria, i.e., those that can be found as concatenations of Lemke-Howson
paths that start at the artificial equilibrium.
There exist strategic-form games for which some equilibria cannot be found
by this method, i.e., some equilibria are inaccessible; see Shapley [Sha74]_.

In a two-player strategic game, the set of Nash equilibria can be expressed
as the union of convex sets. This program will find extreme points
of those convex sets. See :ref:`enummixed` for a method
which is guaranteed to find all the extreme points for a strategic
game.

.. _liap:

liap
@@ -124,6 +245,47 @@ zero exactly at strategy profiles which are Nash equilibria.
Note that this procedure is not globally convergent. That is, it is
not guaranteed to find all, or even any, Nash equilibria.

Reads a game on standard input and computes
approximate Nash equilibria using a function minimization approach.

Given a real number :math:`\sigma_a` associated with each action :math:`a` in an extensive form game,
we define the following terms.

(i) Penalisation for negative probabilities:

.. math::

    \sum_{a\in\mathcal{A}}(\min\{\sigma_a, 0\})^2

(ii) Penalisation for probabilities not summing to one at information sets:

.. math::

    \sum_{I \in \mathcal{I}}\left(\sum_{a\in\mathcal{A}(I)}\sigma_a-1\right)^2

where :math:`\mathcal{I}` is the set of information sets and :math:`\mathcal{A}(I)` is the set of actions
at information set :math:`I`.

(iii) Penalisation for profitable deviations (the residual term):

.. math::

    \sum_{I \in \mathcal{I}}\sum_{a\in\mathcal{A}(I)}(\max\{u(a)-u(I),0\})^2

where :math:`u(a)` and :math:`u(I)` denote the values (dependent on :math:`\sigma`) of action :math:`a` and
information set :math:`I`, respectively.

The Lyapunov function is defined as a weighted sum of these three terms.
It is non-negative and equals zero if and only if :math:`\sigma` represents
an agent Nash equilibrium mixed behavior profile.
The algorithm searches for equilibria by generating random starting points and
applying conjugate gradient descent to minimise the Lyapunov function.
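The three terms above have a direct strategic-form analogue, which is easier to show compactly. The sketch below evaluates such a function for a two-player game — an assumption-laden illustration (equal weights, strategic rather than extensive form, no conjugate gradient step), not the liap implementation:

```python
def lyapunov(sigma, payoffs):
    """Strategic-form analogue of the Lyapunov function: penalise negative
    probabilities (i), probabilities not summing to one (ii), and pure
    strategies that beat the current mixed payoff (iii), with unit weights."""
    A, B = payoffs
    p, q = sigma
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    u1 = [dot(row, q) for row in A]                       # values of rows vs q
    u2 = [dot(p, [B[i][j] for i in range(len(p))])        # values of columns vs p
          for j in range(len(q))]
    v1, v2 = dot(p, u1), dot(q, u2)
    total = 0.0
    for probs in (p, q):
        total += sum(min(x, 0.0) ** 2 for x in probs)     # term (i)
        total += (sum(probs) - 1.0) ** 2                  # term (ii)
    total += sum(max(u - v1, 0.0) ** 2 for u in u1)       # term (iii)
    total += sum(max(u - v2, 0.0) ** 2 for u in u2)
    return total

A = [[2.0, 0.0], [0.0, 1.0]]   # Battle of the Sexes
B = [[1.0, 0.0], [0.0, 2.0]]
at_equilibrium = lyapunov([[2/3, 1/3], [1/3, 2/3]], (A, B))   # ~0
off_equilibrium = lyapunov([[0.5, 0.5], [0.5, 0.5]], (A, B))  # > 0
```

Minimising this function (by conjugate gradients from random starts, as described above) drives all three penalty terms to zero simultaneously.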
Note that this procedure is not globally convergent; that is, it is not guaranteed to find all,
or even any, Nash equilibria.

.. _logit:

logit

@@ -150,6 +312,50 @@ if an information set is not reached due to being the successor of chance
moves with zero probability. In such games, the implementation treats
the beliefs at such information sets as being uniform across all member nodes.

Reads a game on standard input and computes the
principal branch of the (logit) quantal response correspondence.

The method is based on the procedure described in Turocy [Tur05]_ for
strategic games and Turocy [Tur10]_ for extensive games.
It uses standard path-following methods (as
described in Allgower and Georg's "Numerical Continuation Methods") to
adaptively trace the principal branch of the correspondence
efficiently and accurately.

For an extensive form game, an agent quantal response equilibrium with parameter :math:`\lambda`
is defined at each information set :math:`I` by the conditions

.. math::

    \sigma(a) \propto \exp(\lambda u_i(a))

for all actions :math:`a` at :math:`I`, where :math:`\sigma(a)` and :math:`u_i(a)` denote the
probability and value of action :math:`a`, respectively. This leads to the following
system of equations over all information sets :math:`I` and actions :math:`a`:

.. math::

    \sum_{b \in \mathcal{A}(I)}\sigma(b) = 1 \\
    \log \sigma(a) - \log \sigma(a_0) = \lambda(u_i(a) - u_i(a_0))

where :math:`a_0` is a fixed reference action in the information set containing :math:`a`.

These equations define a one-dimensional manifold in the space of variables :math:`(\lambda, \log\sigma)`.
The algorithm starts on this manifold at :math:`\lambda = 0`, where the solution corresponds to the
uniform distribution over actions at each information set. It then moves along the curve using a
predictor-corrector method. Specifically, on each iteration the predictor step moves along the
tangent of the curve, and the corrector step then uses Newton's method to project back onto the curve
in the direction orthogonal to that tangent. Two parameters control the operation of this tracing.
The algorithm terminates when the maximum regret is below the desired threshold.
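For intuition, the logit conditions at a single fixed :math:`\lambda` can be approximated without the tracing machinery. The sketch below uses damped fixed-point iteration on a bimatrix game, starting from the uniform profile (the :math:`\lambda = 0` solution) — a deliberately cruder substitute for the predictor-corrector method described above, and for strategic rather than extensive form:

```python
import math

def logit_response(utilities, lam):
    """Logit (softmax) response to a vector of expected utilities."""
    m = max(lam * u for u in utilities)           # shift for numerical stability
    expu = [math.exp(lam * u - m) for u in utilities]
    s = sum(expu)
    return [e / s for e in expu]

def logit_qre(A, B, lam, iters=2000, damp=0.5):
    """Approximate the logit QRE of a bimatrix game at a fixed lambda
    by damped fixed-point iteration from the uniform profile."""
    n, m = len(A), len(A[0])
    p = [1.0 / n] * n
    q = [1.0 / m] * m
    for _ in range(iters):
        u1 = [sum(A[i][j] * q[j] for j in range(m)) for i in range(n)]
        u2 = [sum(B[i][j] * p[i] for i in range(n)) for j in range(m)]
        bp = logit_response(u1, lam)
        bq = logit_response(u2, lam)
        p = [(1 - damp) * x + damp * y for x, y in zip(p, bp)]
        q = [(1 - damp) * x + damp * y for x, y in zip(q, bq)]
    return p, q

# Matching pennies: the uniform profile is the logit QRE for every lambda.
p, q = logit_qre([[1, -1], [-1, 1]], [[-1, 1], [1, -1]], lam=2.0)
```

As :math:`\lambda \to \infty` along the principal branch, the QRE converges to a Nash equilibrium, which is what makes the traced branch useful as an equilibrium selection device.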
In extensive games, logit quantal response equilibria are not well-defined
if an information set is not reached due to being the successor of chance
moves with zero probability. In such games, the implementation treats
the beliefs at such information sets as being uniform across all member nodes.

.. _simpdiv:

simpdiv

@@ -168,6 +374,26 @@ grid. The program continues this process with finer and finer grids
until locating a mixed strategy profile at which the maximum regret is
small.

Reads a game on standard input and computes
approximations to Nash equilibria using a simplicial subdivision
approach.

This program implements the algorithm of van der Laan, Talman, and
van der Heyden [VTH87]_. At each iteration, the algorithm triangulates the space of mixed strategy
profiles into a simplicial complex. Each vertex of the triangulation is labelled with the
player exhibiting the maximum regret and the strategy responsible for it. Each iteration seeks
a completely labelled simplex, in which each strategy either appears among the labels
of the vertices or has probability :math:`0` on the simplex (because the simplex lies on
the boundary). The algorithm finds such a simplex by following a path of simplices, starting from a
:math:`0`-dimensional simplex (i.e. a single vertex) and guided by the labels of the vertices.
Along this path, simplices can increase or decrease in dimension (a vertex enters or exits) or can
pivot, where a vertex that shares its label with another is chosen and the simplex is flipped
along the facet opposite that vertex (so that the vertex exits and another enters). When
a completely labelled simplex is reached, it seeds the starting point of the next iteration,
which operates on a finer triangulation of the space.
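The "finer and finer grids" idea can be caricatured without the simplicial labelling machinery: search a coarse grid for the profile of least maximum regret, then halve the grid around the best point and repeat. This toy sketch is only in the spirit of the refinement loop, not the path-following algorithm itself:

```python
def max_regret(p, q, A, B):
    """Largest payoff any player could gain by switching to a best response."""
    u1 = [sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(p))]
    u2 = [sum(B[i][j] * p[i] for i in range(len(p))) for j in range(len(q))]
    v1 = sum(p[i] * u1[i] for i in range(len(p)))
    v2 = sum(q[j] * u2[j] for j in range(len(q)))
    return max(max(u1) - v1, max(u2) - v2)

def refine_grid(A, B, rounds=12):
    """Toy grid refinement for a 2x2 game: evaluate a 5x5 stencil around the
    current best point for (p1, q1), then halve the stencil radius."""
    best, radius = (0.5, 0.5), 0.5
    for _ in range(rounds):
        pts = []
        for dp in (-radius, -radius / 2, 0, radius / 2, radius):
            for dq in (-radius, -radius / 2, 0, radius / 2, radius):
                p1 = min(1.0, max(0.0, best[0] + dp))
                q1 = min(1.0, max(0.0, best[1] + dq))
                pts.append((p1, q1))
        best = min(pts, key=lambda t: max_regret(
            [t[0], 1 - t[0]], [t[1], 1 - t[1]], A, B))
        radius /= 2
    return best, max_regret([best[0], 1 - best[0]], [best[1], 1 - best[1]], A, B)

# Matching pennies: the unique equilibrium is ((1/2, 1/2), (1/2, 1/2)).
(p1, q1), regret = refine_grid([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
```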
.. _ipa:

ipa

@@ -185,6 +411,23 @@ interpreted as defining a ray in the space of games. The profile must have
the property that, for each player, the most frequently played strategy must
be unique.

The algorithm utilises the concept of a polymatrix game, which is a game in
which the payoffs take the form:

.. math::

    u_i(\sigma) = \sum_{j\neq i} u_i^j(\sigma_i, \sigma_j)

where :math:`u_i(\sigma)` denotes the payoff to player :math:`i` from the mixed
strategy profile :math:`\sigma`, which consists of a mixed strategy :math:`\sigma_k`
for each player :math:`k`.

At each iteration, the algorithm begins with a mixed strategy profile. It then
approximates the game by a polymatrix game around this profile and computes an
equilibrium of the polymatrix game using the Lemke–Howson method. It then takes
a step towards this solution, producing the starting mixed strategy profile for
the next iteration.
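The polymatrix payoff form above is just a sum of bilinear pairwise terms. A small sketch (the three-player game and helper are hypothetical, purely to make the formula concrete):

```python
def polymatrix_payoff(i, sigma, U):
    """Payoff to player i in a polymatrix game: the sum over opponents j
    of the bilinear term sigma_i . U[i][j] . sigma_j.

    U[i][j] is player i's payoff matrix in the pairwise game against j
    (None on the diagonal)."""
    si = sigma[i]
    total = 0.0
    for j, sj in enumerate(sigma):
        if j == i:
            continue
        total += sum(
            si[a] * U[i][j][a][b] * sj[b]
            for a in range(len(si))
            for b in range(len(sj))
        )
    return total

# Three players, two strategies each; every pair plays the same
# coordination game for concreteness.
C = [[1.0, 0.0], [0.0, 1.0]]
U = [[None, C, C], [C, None, C], [C, C, None]]
sigma = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]

payoff0 = polymatrix_payoff(0, sigma, U)  # 1.0 (vs player 1) + 0.5 (vs player 2)
```

Because each term is bilinear in :math:`(\sigma_i, \sigma_j)`, equilibria of the polymatrix approximation can be computed by complementary pivoting, which is what makes it a useful local model at each iteration.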
.. _gnm:

gnm

@@ -197,6 +440,33 @@ and Wilson [GovWil03]_. This program is based on the
implementation by Ben Blum and Christian Shelton.

The algorithm takes as a parameter a mixed strategy profile. This profile is
interpreted as defining a ray in the space of games. Specifically, it generates
a set of games:

.. math::

    \{ U_{\lambda} := U + \lambda\eta\,|\, \lambda\in\mathbb{R}\}

where :math:`U` is the payoff tensor of the original game and :math:`\eta` is
constructed from the profile. The profile must have
the property that, for each player, the most frequently played strategy must
be unique.

Given a game :math:`U_{\lambda}` on the ray, we have the following
equations for an equilibrium mixed strategy profile :math:`\sigma` and
payoff vector :math:`v`:

.. math::

    \sigma_{i,s}(u_{\lambda, i}(s,\sigma_{-i})-v_i) = 0 \\
    \sum_{s}\sigma_{i,s} = 1

where :math:`u_{\lambda, i}(s,\sigma_{-i})` is the payoff that player
:math:`i` would obtain by unilaterally deviating to pure strategy :math:`s`.
Note that these equations, taken over all values of :math:`\lambda`, define a
one-dimensional manifold. The algorithm starts at a high value of :math:`\lambda`,
where the solution is trivial. At each iteration it moves along the tangent
to the curve and then modifies :math:`\eta` (hence modifying the curve)
so that this point lies on the new curve. Occasionally, Newton's method is
used on an iteration to correct numerical errors. Once we reach a point with
:math:`\lambda=0`, we have an equilibrium of the true game.
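The ray construction can be made concrete with a toy perturbation. In this sketch :math:`\eta` is simply a bonus for playing a target pure profile — an assumption for illustration, not how gnm actually derives :math:`\eta` from the input profile — showing that at large :math:`\lambda` the perturbed game has a trivial equilibrium while :math:`\lambda = 0` recovers the original game:

```python
def perturbed_payoffs(U, eta, lam):
    """Payoff table for the game U + lam * eta on the ray (2-player tables)."""
    return {
        profile: tuple(u + lam * e for u, e in zip(U[profile], eta[profile]))
        for profile in U
    }

def is_pure_equilibrium(U, profile, num_strategies):
    """True if no player has a strictly profitable unilateral deviation."""
    for i in range(len(num_strategies)):
        for s in range(num_strategies[i]):
            dev = profile[:i] + (s,) + profile[i + 1:]
            if U[dev][i] > U[profile][i]:
                return False
    return True

# Matching pennies has no pure equilibrium...
U = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
# ...but along a ray that bonuses each player for choosing strategy 0,
# the game at large lambda has the trivial equilibrium (0, 0).
eta = {p: (1.0 if p[0] == 0 else 0.0, 1.0 if p[1] == 0 else 0.0) for p in U}
U_big = perturbed_payoffs(U, eta, lam=10.0)
U_zero = perturbed_payoffs(U, eta, lam=0.0)
```

Tracing the equilibrium of :math:`U_\lambda` from the trivial solution at large :math:`\lambda` down to :math:`\lambda = 0` is exactly the homotopy the algorithm follows.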
