@@ -28,6 +28,11 @@ enumpure
2828
2929Reads a game on standard input and searches for pure-strategy Nash equilibria.
3030
31+ For a strategic-form game, the algorithm systematically enumerates all pure strategy profiles and verifies,
32+ for each profile, that no unilateral deviation by any player can yield a higher payoff. In the case of
33+ extensive-form games, pure-strategy agent Nash equilibria can be determined in an analogous manner:
34+ the algorithm ensures that no player can improve their payoff through a unilateral deviation at any information set.
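
As an illustration of this check for the strategic-form case (not the Gambit implementation; the example game and array layout are assumptions of this sketch):

.. code-block:: python

   import numpy as np
   from itertools import product

   def pure_equilibria(payoffs):
       """Enumerate pure-strategy Nash equilibria of a strategic-form game.

       payoffs[i] is an array of shape (n_1, ..., n_N) holding player i's
       payoff at every pure strategy profile.
       """
       shape = payoffs[0].shape
       result = []
       for profile in product(*(range(n) for n in shape)):
           # A profile is an equilibrium if no unilateral deviation pays.
           if all(payoffs[i][profile] >= payoffs[i][profile[:i] + (s,) + profile[i + 1:]]
                  for i, n_i in enumerate(shape) for s in range(n_i)):
               result.append(profile)
       return result

   # Prisoner's dilemma: the unique pure equilibrium is mutual defection (1, 1).
   A = np.array([[3, 0], [5, 1]])   # row player's payoffs
   B = A.T                          # column player's payoffs (symmetric game)
   print(pure_equilibria([A, B]))   # [(1, 1)]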
35+
3136.. _enummixed :
3237
3338enummixed
@@ -41,6 +46,26 @@ This is a superset of the points generated by the path-following procedure of Le
4146It was shown by Shapley [Sha74]_ that there are equilibria not accessible via the method in :ref:`lcp`, whereas the output of
4247:program:`enummixed` is guaranteed to return all the extreme points.
4348
49+ The algorithm begins by rescaling payoffs so that they are non-negative. It then constructs two polyhedra:
50+
51+ .. math ::
52+ P_1 = \{ x\geq 0 \,|\, A_1 x\leq 1 \} \\
53+ P_2 = \{ y\geq 0 \,|\, A_2 y\leq 1 \}
54+
55+ where :math:`A_1` and :math:`A_2` denote the payoff matrices for the respective players. Next, a bipartite graph is
56+ formed between the vertices of :math:`P_1` and :math:`P_2`.
57+ An edge exists between a vertex :math:`x\in P_1` and a vertex :math:`y\in P_2` if and only if the conditions:
58+
59+ .. math ::
60+
61+ x_i(1 -A_2 y)_i=0 \\
62+ y_j(1 -A_1 x)_j=0
63+
64+ are satisfied for all indices :math:`i` and :math:`j`. Whenever an edge connects :math:`x` and :math:`y`,
65+ normalising these vectors to form probability distributions produces an extreme Nash equilibrium.
66+ Furthermore, for any clique in the bipartite graph, the pair of convex hulls of the corresponding extreme equilibria
67+ defines a set of Nash equilibria.
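
The construction can be sketched in a few lines for a two-player game with strictly positive payoff matrices. Following the complementarity conditions above, the polytope holding the row player's (scaled) strategies is cut out by the column player's payoff matrix and vice versa; the brute-force vertex enumeration below is purely illustrative, not the vertex-enumeration routine the program actually uses:

.. code-block:: python

   import numpy as np
   from itertools import combinations

   def vertices(C):
       """Brute-force vertex enumeration of the polytope {z >= 0 : C z <= 1}."""
       c, d = C.shape
       G = np.vstack([-np.eye(d), C])                 # all constraints G z <= h
       h = np.concatenate([np.zeros(d), np.ones(c)])
       verts = []
       for rows in combinations(range(d + c), d):
           Gs, hs = G[list(rows)], h[list(rows)]
           if abs(np.linalg.det(Gs)) < 1e-9:
               continue
           z = np.linalg.solve(Gs, hs)                # intersection of d binding constraints
           if np.all(G @ z <= h + 1e-9):
               verts.append(z)
       return verts

   def extreme_equilibria(A, B):
       """Extreme Nash equilibria of the bimatrix game (A, B) with A, B > 0."""
       eqa = []
       for x in vertices(B.T):                        # scaled row-player strategies
           for y in vertices(A):                      # scaled column-player strategies
               if x.sum() < 1e-9 or y.sum() < 1e-9:
                   continue                           # skip the origin of either polytope
               if np.all(np.abs(x * (1 - A @ y)) < 1e-7) and \
                  np.all(np.abs(y * (1 - B.T @ x)) < 1e-7):
                   eqa.append((x / x.sum(), y / y.sum()))
       return eqa

   # A 2x2 game whose unique equilibrium is the 50-50 mix for both players.
   A = np.array([[3.0, 1.0], [1.0, 3.0]])
   B = np.array([[1.0, 3.0], [3.0, 1.0]])
   print(extreme_equilibria(A, B))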
68+
4469.. _enumpoly :
4570
4671enumpoly
@@ -69,6 +94,44 @@ supports which have the fewest strategies in total. For many classes
6994of games, this will tend to lower the average time until finding one equilibrium,
7095as well as finding the second equilibrium (if one exists).
7196
101+ The algorithm begins by enumerating all supports that could potentially constitute the support
102+ of a Nash equilibrium. It then searches for equilibria within each support :math:`S` as follows.
103+ Consider an equilibrium mixed profile :math:`\sigma` with support :math:`S`.
104+ For all players :math:`i` and for every pair of their pure strategies :math:`(q,r)` in the support :math:`S`,
105+ the following indifference equations hold:
106+
107+ .. math ::
108+
109+ u_i(q,\sigma _{-i}) = u_i(r, \sigma _{-i})
110+
111+ where :math:`u_i(a,\sigma_{-i})` denotes the payoff obtained by player :math:`i` upon
112+ unilaterally deviating to strategy :math:`a`. These indifference equations can be expressed
113+ as polynomial equations in the strategy probabilities. Additionally, the requirement that
114+ each player's strategy probabilities sum to one provides another polynomial equation.
115+
116+ The algorithm searches for roots of this polynomial system by successively subdividing the hypercube :math:`[0,1]^D`.
117+ The subdivision is performed such that each cell contains either no solutions or exactly one solution.
118+ For cells containing exactly one solution, Newton's method is applied to compute the solution precisely.
119+ Once solutions for a given support have been obtained, it is straightforward to verify that they
120+ satisfy the conditions of a Nash equilibrium.
121+
122+ For extensive-form games, the procedure is analogous, except that the variables correspond to sequence-form
123+ realization weights rather than pure-strategy probabilities.
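
For two players the indifference conditions are linear in the opponent's probabilities, so the idea can be sketched as plain support enumeration without the subdivision-and-Newton search needed in the general polynomial case. The helper names and the restriction to equal-sized supports are assumptions of this sketch:

.. code-block:: python

   import numpy as np
   from itertools import combinations

   def solve_support(U, own, opp):
       """Opponent probabilities (on `opp`) making the strategies in `own` indifferent."""
       k = len(own)
       M = np.zeros((k + 1, k + 1))
       b = np.zeros(k + 1)
       M[:k, :k] = U[np.ix_(own, opp)]   # u(s, sigma_-i) terms ...
       M[:k, k] = -1.0                   # ... minus the common payoff v
       M[k, :k] = 1.0                    # probabilities sum to one
       b[k] = 1.0
       try:
           sol = np.linalg.solve(M, b)
       except np.linalg.LinAlgError:
           return None
       if np.any(sol[:k] < -1e-9):
           return None
       full = np.zeros(U.shape[1])
       full[list(opp)] = np.clip(sol[:k], 0.0, None)
       return full

   def support_enumeration(A, B, tol=1e-9):
       """Equilibria of the bimatrix game (A, B), checking equal-sized supports."""
       m, n = A.shape
       eqa = []
       for k in range(1, min(m, n) + 1):
           for rows in combinations(range(m), k):
               for cols in combinations(range(n), k):
                   y = solve_support(A, list(rows), list(cols))
                   x = solve_support(B.T, list(cols), list(rows))
                   if x is None or y is None:
                       continue
                   # No pure deviation outside the support may do better.
                   if (A @ y).max() <= (A @ y)[list(rows)].min() + tol and \
                      (B.T @ x).max() <= (B.T @ x)[list(cols)].min() + tol:
                       eqa.append((x, y))
       return eqa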
124+
72135.. _lp :
73136
74137lp
@@ -83,6 +146,18 @@ While the set of equilibria in a two-player constant-sum strategic
83146game is convex, this method will only identify one of the extreme
84147points of that set.
85148
149+ The algorithm constructs a linear program using the sequence-form
150+ constraints and payoff matrices. Specifically, it solves an optimisation
151+ problem of the form:
152+
153+ .. math ::
154+
155+ \text{maximise}~ c^T x \quad \text{subject to}~ Ax \leq b,\; x \geq 0
156+
157+ where :math:`x` denotes the vector of free variables. The linear program
158+ is solved using the simplex method, and the resulting solution is then
159+ translated into the mixed behavior profile corresponding to a Nash equilibrium.
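
For a two-player constant-sum strategic game the corresponding program is the classical maximin LP. The sketch below sets it up with SciPy (whose default LP solver stands in for the simplex method mentioned above); the function name and example game are illustrative:

.. code-block:: python

   import numpy as np
   from scipy.optimize import linprog

   def maximin(A):
       """Value and maximin mixed strategy for the row player of a zero-sum game A."""
       m, n = A.shape
       # Variables: x_1..x_m (row strategy) and v (game value); minimise -v.
       c = np.concatenate([np.zeros(m), [-1.0]])
       # For every column j:  v - sum_i x_i A[i, j] <= 0.
       A_ub = np.hstack([-A.T, np.ones((n, 1))])
       b_ub = np.zeros(n)
       # Probabilities sum to one; x is non-negative, v is free.
       A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
       b_eq = [1.0]
       bounds = [(0, None)] * m + [(None, None)]
       res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
       return res.x[:m], res.x[-1]

   # Matching pennies: value 0, optimal strategy (1/2, 1/2).
   x, value = maximin(np.array([[1.0, -1.0], [-1.0, 1.0]]))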
160+
86161.. _lcp :
87162
88163lcp
@@ -108,6 +183,52 @@ of those convex sets. See :ref:`enummixed` for a method
108183which is guaranteed to find all the extreme points for a strategic
109184game.
110185
192+ For extensive games the algorithm constructs a linear complementarity problem (LCP) using the sequence-form
193+ constraints and payoff matrices as defined by Koller, Megiddo, and von Stengel [KolMegSte94]_.
194+ Specifically, it seeks solutions to a system of inequalities of the form:
195+
196+ .. math ::
197+
198+ w = Mz+q \\
199+ w\geq 0 \\
200+ z\geq 0 \\
201+ z^Tw = 0
202+
203+ where :math:`w` and :math:`z` denote the vectors of free variables.
204+
205+ To solve the LCP, the method of Lemke is used, where an artificial variable :math:`z_0` is introduced, so that:
206+
207+ .. math ::
208+
209+ w = Mz + q + z_0 \mathbf{1}
210+
211+ where :math:`\mathbf{1}` denotes the vector of ones. Choosing :math:`z_0` large enough gives a trivial
212+ solution in which :math:`z_i=0` for all :math:`i\neq 0` and :math:`w_j=0` for some :math:`j`.
213+ A sequence of
214+ solutions is then generated via successive pivot operations until :math:`z_0=0`, at which point a solution
215+ to the original LCP is obtained. This solution is subsequently translated into the mixed behavior profile
216+ corresponding to a Nash equilibrium.
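
As a sketch of the strategic-form case, the LCP for a bimatrix game with positive payoffs can be written down directly and a candidate solution checked against the conditions above. The sequence-form construction of [KolMegSte94]_ and the Lemke pivoting itself are not reproduced here, and the helper names are illustrative:

.. code-block:: python

   import numpy as np

   def bimatrix_lcp(A, B):
       """LCP (M, q) whose nonzero solutions correspond to equilibria of (A, B).

       The variable vector is z = (x, y); w = Mz + q collects the slacks
       1 - Ay and 1 - B^T x, so z^T w = 0 is the complementarity condition.
       """
       m, n = A.shape
       M = np.block([[np.zeros((m, m)), -A],
                     [-B.T, np.zeros((n, n))]])
       q = np.ones(m + n)
       return M, q

   def is_lcp_solution(M, q, z, tol=1e-8):
       w = M @ z + q
       return bool(np.all(w >= -tol) and np.all(z >= -tol) and abs(z @ w) < tol)

   # A 2x2 game whose unique equilibrium is the 50-50 mix for both players.
   A = np.array([[3.0, 1.0], [1.0, 3.0]])
   B = np.array([[1.0, 3.0], [3.0, 1.0]])
   M, q = bimatrix_lcp(A, B)
   z = np.array([0.25, 0.25, 0.25, 0.25])   # strategies scaled so binding payoffs equal 1
   print(is_lcp_solution(M, q, z))          # True; normalising each half gives (1/2, 1/2)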
217+
218+ For strategic games, the program uses the method of Lemke and Howson
219+ [LemHow64]_. In this case, the method will find all "accessible"
220+ equilibria, i.e., those that can be found as concatenations of Lemke-Howson
221+ paths that start at the artificial equilibrium.
222+ There exist strategic-form games for which some equilibria cannot be found
223+ by this method, i.e., some equilibria are inaccessible; see Shapley [Sha74]_.
224+
111232.. _liap :
112233
113234liap
@@ -124,6 +245,47 @@ zero exactly at strategy profiles which are Nash equilibria.
124245Note that this procedure is not globally convergent. That is, it is
125246not guaranteed to find all, or even any, Nash equilibria.
126247
253+ Given a real number :math:`\sigma_a` associated with each action :math:`a` in an extensive form game,
254+ we define the following terms.
255+
256+ (i) Penalisation for negative probabilities:
257+
258+ .. math ::
259+
260+ \sum _{a\in \mathcal {A}}(\min \{\sigma _a, 0 \})^2
261+
262+ (ii) Penalisation for not summing to one at infosets:
263+
264+ .. math ::
265+
266+ \sum _{I \in \mathcal {I}}\left (\sum _{a\in \mathcal {A}(I)}\sigma _a-1 \right )^2
267+
268+ where :math:`\mathcal{I}` is the set of information sets and :math:`\mathcal{A}(I)` is the set of actions
269+ at information set :math:`I`.
270+
271+ (iii) Residual term:
272+
273+ .. math ::
274+
275+ \sum _{I \in \mathcal {I}}\sum _{a\in \mathcal {A}(I)}(\max \{ u(a)-u(I),0 \})^2
276+
277+ where :math:`u(a)` and :math:`u(I)` denote the values (dependent on :math:`\sigma`) of action :math:`a` and
278+ information set :math:`I` respectively.
279+
280+ The Lyapunov function is defined as a weighted sum of these three terms.
281+ It is non-negative and equals zero if and only if :math:`\sigma` represents
282+ an agent Nash equilibrium mixed behavior profile.
283+ The algorithm searches for equilibria by generating random starting points and
284+ applying conjugate gradient descent to minimise the Lyapunov function.
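
A strategic-form (bimatrix) analogue of the minimisation can be sketched with SciPy's conjugate-gradient minimiser; the tool itself applies the terms defined above to the extensive form's agent representation, and the function names below are assumptions of the sketch:

.. code-block:: python

   import numpy as np
   from scipy.optimize import minimize

   def lyapunov(flat, A, B):
       """Bimatrix analogue of the three penalty terms described above."""
       m, n = A.shape
       x, y = flat[:m], flat[m:]
       v = np.sum(np.minimum(x, 0.0) ** 2) + np.sum(np.minimum(y, 0.0) ** 2)   # (i)
       v += (x.sum() - 1.0) ** 2 + (y.sum() - 1.0) ** 2                         # (ii)
       u1, u2 = x @ A @ y, x @ B @ y
       v += np.sum(np.maximum(A @ y - u1, 0.0) ** 2)                            # (iii), player 1
       v += np.sum(np.maximum(B.T @ x - u2, 0.0) ** 2)                          # (iii), player 2
       return v

   def liap_search(A, B, attempts=20, seed=0):
       """Conjugate-gradient minimisation from random starting points."""
       rng = np.random.default_rng(seed)
       m, n = A.shape
       best = None
       for _ in range(attempts):
           start = np.concatenate([rng.dirichlet(np.ones(m)), rng.dirichlet(np.ones(n))])
           res = minimize(lyapunov, start, args=(A, B), method="CG")
           if best is None or res.fun < best.fun:
               best = res
           if best.fun < 1e-12:
               break
       return best.x[:m], best.x[m:], best.fun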
285+
127289.. _logit :
128290
129291logit
@@ -150,6 +312,50 @@ if an information set is not reached due to being the successor of chance
150312moves with zero probability. In such games, the implementation treats
151313the beliefs at such information sets as being uniform across all member nodes.
152314
327+ For an extensive form game, an agent quantal response equilibrium with parameter :math:`\lambda`
328+ is defined, at each information set :math:`I`, by the equations:
329+
330+ .. math ::
331+
332+ \sigma (a) \propto \exp (\lambda u_i(a))
333+
334+ for all actions :math:`a\in I`, where :math:`\sigma(a)` and :math:`u_i(a)` denote the
335+ probability and value of action :math:`a` respectively. This leads to the following
336+ system of equations over all infosets :math:`I` and actions :math:`a`:
337+
338+ .. math ::
339+
340+ \sum _{b \in I}\sigma (b) = 1 \\
341+ \log \sigma (a) - \log \sigma (a_0) = \lambda (u_i(a) - u_i(a_0))
342+
343+ where :math:`a_0` is a fixed reference action in the information set containing :math:`a`.
344+
345+ These equations define a 1-dimensional manifold in the space of variables :math:`(\lambda, \log(\sigma))`.
346+ The algorithm starts on this manifold at :math:`\lambda = 0`, where the solution corresponds to the
347+ uniform distribution over actions at each information set. It then moves along the curve using a
348+ predictor-corrector method. Specifically, on each iteration the predictor step moves along the
349+ tangent of the curve, and then the corrector step uses Newton's method to project back onto the curve
350+ in the direction orthogonal to that tangent. Two parameters control the operation of this tracing.
351+ The algorithm terminates when the maximum regret is below the desired threshold.
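
A rough strategic-form sketch of the principal branch follows: instead of the predictor-corrector tracing described above, it re-solves the logit fixed point by damped iteration at a slowly increasing :math:`\lambda`, warm-starting each step from the previous solution. Function names are illustrative:

.. code-block:: python

   import numpy as np

   def softmax(v):
       e = np.exp(v - v.max())
       return e / e.sum()

   def logit_principal_branch(A, B, lam_max=20.0, steps=200, iters=200, damp=0.5):
       """Approximate the principal branch of the logit QRE of a bimatrix game."""
       m, n = A.shape
       x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)   # lambda = 0: uniform play
       branch = [(0.0, x.copy(), y.copy())]
       for lam in np.linspace(0.0, lam_max, steps)[1:]:
           for _ in range(iters):
               # Damped quantal responses to the current profile.
               x = damp * softmax(lam * (A @ y)) + (1 - damp) * x
               y = damp * softmax(lam * (B.T @ x)) + (1 - damp) * y
           branch.append((lam, x.copy(), y.copy()))
       return branch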
352+
153359.. _simpdiv :
154360
155361simpdiv
@@ -168,6 +374,26 @@ grid. The program continues this process with finer and finer grids
168374until locating a mixed strategy profile at which the maximum regret is
169375small.
170376
383+ This program implements the algorithm of van der Laan, Talman, and van
384+ der Heyden [VTH87]_. At each iteration, the algorithm triangulates the space of mixed strategy
385+ profiles into a simplicial complex. Each vertex in the triangulation is labelled with the
386+ player exhibiting the maximum regret and the strategy responsible for it. Each iteration seeks
387+ to find a completely labelled simplex, where each strategy is either present in the labels
388+ of its vertices, or has probability :math:`0` on the simplex (due to the simplex being on
389+ the boundary). It finds this by following a path of simplices, starting from a
390+ :math:`0`-dimensional simplex (i.e., a single vertex), and guided by the labels of their vertices.
391+ On this path simplices can increase or decrease in dimension (i.e., a vertex enters or exits) or can
392+ pivot, where a vertex that shares its label with another is chosen and the simplex is flipped
393+ along the facet opposite the vertex (so that the vertex exits and another enters). When
394+ a completely labelled simplex is reached it seeds the starting point of the next iteration,
395+ which operates on a finer triangulation of the space.
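
The labelling rule is the easiest piece to isolate; a sketch of it for a bimatrix game follows. The triangulation, pivoting, and grid-refinement machinery is considerably more involved and is not reproduced here:

.. code-block:: python

   import numpy as np

   def label(x, y, A, B):
       """(player, strategy) pair of maximum regret at the mixed profile (x, y)."""
       # Regret of a pure strategy: its payoff against the opponent's mixture
       # minus the payoff of the current mixture.
       regret_row = A @ y - x @ A @ y
       regret_col = B.T @ x - x @ B @ y
       if regret_row.max() >= regret_col.max():
           return 0, int(np.argmax(regret_row))
       return 1, int(np.argmax(regret_col))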
396+
171397.. _ipa :
172398
173399ipa
@@ -185,6 +411,23 @@ interpreted as defining a ray in the space of games. The profile must have
185411the property that, for each player, the most frequently played strategy must
186412be unique.
187413
414+ The algorithm utilises the concept of a polymatrix game, which is a game in
415+ which the payoffs take the form:
416+
417+ .. math ::
418+
419+ u_i(\sigma) = \sum_{j\neq i} u_i^j(\sigma_i, \sigma_j)
420+
421+ where :math:`u_i(\sigma)` denotes the payoff to player :math:`i` from the mixed
422+ strategy profile :math:`\sigma`, which consists of a mixed strategy :math:`\sigma_k`
423+ for each player :math:`k`.
424+
425+ At each iteration, the algorithm begins with a mixed strategy profile. It then
426+ approximates the game as a polymatrix game around this profile and computes an
427+ equilibrium of the polymatrix game using the Lemke–Howson method. It then takes
428+ a step towards this solution, creating the starting mixed strategy profile for
429+ the next iteration.
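
A small sketch of the polymatrix payoff structure itself, with the pairwise matrices supplied directly; the construction of the approximating polymatrix game at a given profile, and the Lemke-Howson step, are omitted:

.. code-block:: python

   import numpy as np

   def polymatrix_payoff(i, sigma, U):
       """Payoff to player i in a polymatrix game.

       U[i][j] holds the pairwise payoff matrix u_i^j, so that
       u_i(sigma) = sum over j != i of sigma_i^T U[i][j] sigma_j.
       """
       return sum(sigma[i] @ U[i][j] @ sigma[j]
                  for j in range(len(sigma)) if j != i)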
430+
188431.. _gnm :
189432
190433gnm
@@ -197,6 +440,33 @@ and Wilson [GovWil03]_. This program is based on the
197440implementation by Ben Blum and Christian Shelton.
198441
199442The algorithm takes as a parameter a mixed strategy profile. This profile is
200- interpreted as defining a ray in the space of games. The profile must have
443+ interpreted as defining a ray in the space of games. Specifically, it generates
444+ a set of games:
445+
446+ .. math ::
447+
448+ \{ U_{\lambda } := U + \lambda\eta \,|\, \lambda\in \mathbb {R}\}
449+
450+ where :math:`U` is the payoff tensor of the original game and :math:`\eta` is
451+ constructed from the profile. The profile must have
201452the property that, for each player, the most frequently played strategy must
202453be unique.
454+
455+ Given a game :math:`U_{\lambda}` on the ray, we have the following
456+ equations for an equilibrium mixed strategy profile :math:`\sigma` and
457+ payoff vector :math:`v`:
458+
459+ .. math ::
460+
461+ \sigma _{i,s}(u_{\lambda , i}(s,\sigma _{-i})-v_i) = 0 \\
462+ \sum _{s}\sigma _{i,s} = 1
463+
464+ where :math:`u_{\lambda, i}(s,\sigma_{-i})` is the payoff that player
465+ :math:`i` would obtain by unilaterally deviating to pure strategy :math:`s`.
466+ Note that these equations, for all values of :math:`\lambda`, define a
467+ one-dimensional manifold. The algorithm starts at a high value of :math:`\lambda`
468+ where the solution is trivial. At each iteration it moves along the tangent
469+ to the curve and then modifies :math:`\eta` (hence modifying the curve)
470+ such that this point lies on the new curve. Occasionally, Newton's method is
471+ used on an iteration to correct numerical errors. Once we reach a point with
472+ :math:`\lambda = 0` we have an equilibrium of the true game.
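
One simple way to realise the ray construction for a bimatrix game is sketched below: the perturbation :math:`\eta` gives each player a bonus on each of their own pure strategies proportional to how often the reference profile plays it, so for sufficiently large :math:`\lambda` the profile of most frequently played strategies is a strictly dominant (hence trivial) equilibrium. This particular form of :math:`\eta`, and the omission of the tracing itself, are assumptions of the sketch:

.. code-block:: python

   import numpy as np

   def perturbed_game(A, B, g_row, g_col, lam):
       """Payoff matrices of the game U + lambda * eta on the ray defined by (g_row, g_col)."""
       m, n = A.shape
       eta_row = np.outer(g_row, np.ones(n))   # row player's bonus, independent of the column
       eta_col = np.outer(np.ones(m), g_col)   # column player's bonus, independent of the row
       return A + lam * eta_row, B + lam * eta_col

   # For large lambda, each player's most frequently played strategy (which must be
   # unique) strictly dominates, so the starting equilibrium is that pure profile.
   A = np.array([[3.0, 0.0], [5.0, 1.0]])
   B = A.T
   A_big, B_big = perturbed_game(A, B, np.array([0.9, 0.1]), np.array([0.2, 0.8]), lam=100.0)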