Commit 75bd81e
Edited algorithm descriptions
1 parent b0ab55c commit 75bd81e
1 file changed: doc/algorithms.rst
Lines changed: 271 additions & 1 deletion
@@ -28,6 +28,11 @@ enumpure

Reads a game on standard input and searches for pure-strategy Nash equilibria.

For a strategic-form game, the algorithm systematically enumerates all pure strategy profiles and verifies,
for each profile, that no unilateral deviation by any player can yield a higher payoff. For
extensive-form games, pure-strategy agent Nash equilibria are determined analogously:
the algorithm verifies that no player can improve their payoff through a unilateral deviation at any information set.
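The strategic-form check described above can be sketched in a few lines. This is a toy illustration, not Gambit's implementation; the game and the helper name are hypothetical.

```python
from itertools import product

def pure_equilibria(payoffs, num_strategies):
    """Enumerate all pure-strategy Nash equilibria by brute force.

    payoffs: dict mapping a pure profile (tuple of strategy indices)
             to a tuple of payoffs, one per player.
    num_strategies: list giving each player's number of strategies.
    """
    equilibria = []
    for profile in product(*(range(n) for n in num_strategies)):
        # Keep the profile if no player gains by a unilateral deviation.
        if all(
            payoffs[profile][i] >= payoffs[profile[:i] + (s,) + profile[i + 1:]][i]
            for i in range(len(num_strategies))
            for s in range(num_strategies[i])
        ):
            equilibria.append(profile)
    return equilibria

# Prisoner's dilemma: defection (strategy 1) strictly dominates,
# so the only pure equilibrium is mutual defection.
pd = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}
found = pure_equilibria(pd, [2, 2])  # [(1, 1)]
```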
.. _enummixed:

enummixed

@@ -41,6 +46,26 @@ This is a superset of the points generated by the path-following procedure of Le
It was shown by Shapley [Sha74]_ that there are equilibria not accessible via the method in :ref:`lcp`, whereas
:program:`enummixed` is guaranteed to return all the extreme points.

The algorithm begins by rescaling payoffs so that they are non-negative. It then constructs two polyhedra:

.. math::

    P_1 = \{x\geq 0\,|\,A_1x\leq 1\} \\
    P_2 = \{y\geq 0\,|\,A_2y\leq 1\}

where :math:`A_1` and :math:`A_2` denote the payoff matrices of the respective players. Next, a bipartite graph is
formed between the vertices of :math:`P_1` and those of :math:`P_2`.
An edge joins a vertex :math:`x\in P_1` to a vertex :math:`y\in P_2` if and only if the complementarity conditions

.. math::

    x_i(1-A_2y)_i=0 \\
    y_j(1-A_1x)_j=0

are satisfied for all indices :math:`i` and :math:`j`. Whenever an edge connects :math:`x` and :math:`y`,
normalising these vectors into probability distributions produces an extreme Nash equilibrium.
Furthermore, for any clique in the bipartite graph, the pair of convex hulls of the corresponding extreme equilibria
defines a set of Nash equilibria.
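For a 2x2 game the two polyhedra are low-dimensional enough to enumerate by hand. The sketch below — an illustration only, not the actual vertex-enumeration code — finds the vertices of each polytope by intersecting pairs of bounding lines, tests the complementarity conditions on every vertex pair, and normalises the connected pairs. It assumes the orientation in which :math:`A_1 x` and :math:`A_2 y` are well-defined for a square game, and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product

F = Fraction

def vertices(A):
    """Vertices of {x >= 0 | A x <= 1} for a 2x2 matrix A, found by
    intersecting pairs of the four bounding lines and keeping feasible points."""
    cons = [([F(1), F(0)], F(0)), ([F(0), F(1)], F(0))] + [(row, F(1)) for row in A]
    verts = set()
    for (r1, b1), (r2, b2) in combinations(cons, 2):
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if det == 0:
            continue  # parallel lines: no vertex
        x = ((b1 * r2[1] - b2 * r1[1]) / det, (r1[0] * b2 - r2[0] * b1) / det)
        if all(c >= 0 for c in x) and all(r[0] * x[0] + r[1] * x[1] <= 1 for r in A):
            verts.add(x)
    return verts

def is_edge(x, y, A1, A2):
    """Complementarity: x_i (1 - (A2 y))_i = 0 and y_j (1 - (A1 x))_j = 0."""
    A2y = [A2[i][0] * y[0] + A2[i][1] * y[1] for i in range(2)]
    A1x = [A1[j][0] * x[0] + A1[j][1] * x[1] for j in range(2)]
    return all(x[i] * (1 - A2y[i]) == 0 for i in range(2)) and \
           all(y[j] * (1 - A1x[j]) == 0 for j in range(2))

# Battle of the Sexes, payoffs already non-negative.
A1 = [[F(2), F(0)], [F(0), F(1)]]  # player 1's payoffs
A2 = [[F(1), F(0)], [F(0), F(2)]]  # player 2's payoffs, suitably oriented

equilibria = set()
for x, y in product(vertices(A1), vertices(A2)):
    # Skip the artificial zero vertices, then normalise connected pairs
    # (y yields player 1's mixture, x yields player 2's).
    if any(x) and any(y) and is_edge(x, y, A1, A2):
        sx, sy = sum(x), sum(y)
        equilibria.add(((y[0] / sy, y[1] / sy), (x[0] / sx, x[1] / sx)))
```

For this game the bipartite graph has exactly three edges among the non-zero vertices, giving the two pure equilibria and the mixed equilibrium :math:`((2/3,1/3),(1/3,2/3))`.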
68+
4469
.. _enumpoly:
4570

4671
enumpoly
@@ -69,6 +94,44 @@ supports which have the fewest strategies in total. For many classes
6994
of games, this will tend to lower the average time until finding one equilibrium,
7095
as well as finding the second equilibrium (if one exists).
7196

97+
(new) Reads a game on standard input and
98+
computes Nash equilibria by solving systems of polynomial equations
99+
and inequalities.
100+
101+
The algorithm begins by enumerating all supports that could potentially constitute the support
102+
of a Nash equilibrium. It then searches for equilibria within each support :math:`S` as follows.
103+
Consider an equilibrium mixed profile :math:`\sigma` with support :math:`S`.
104+
For all players :math:`i` and for every pair of their pure strategies :math:`(q,r)` in the support :math:`S`,
105+
the following indifference equations hold:
106+
107+
.. math::
108+
109+
u_i(q,\sigma_{-i}) = u_i(r, \sigma_{-i})
110+
111+
where :math:`u_i(a,\sigma_{-i})`` denotes the payoff obtained by player :math:`i` upon
112+
unilaterally deviating to strategy :math:`a`. These indifference equations can be expressed
113+
as polynomial equations in the strategy probabilities. Additionally, the requirement that
114+
the each player's strategy probabilities sum to one provides another polynomial equation.
115+
116+
The algorithm searches for roots of this polynomial system by successively subdividing the hypercube :math:`[0,1]^D`.
117+
The subdivision is performed such that each cell contains either no solutions or exactly one solution.
118+
For cells containing exactly one solution, Newton's method is applied to compute the solution precisely.
119+
Once solutions for a given support have been obtained, it is straightforward to verify that they
120+
satisfy the conditions of a Nash equilibrium.
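For the full support of a 2x2 game the indifference system happens to be linear, so the support-by-support idea can be shown without the subdivision machinery. A hedged sketch with a hypothetical helper, not the enumpoly solver itself:

```python
def full_support_equilibrium(A, B):
    """Solve the indifference equations for the full support of a 2x2 game.

    A, B: payoff matrices for players 1 (row chooser) and 2.
    Returns (p, q), the probability each player places on their first
    strategy, or None if the system has no solution with 0 < p, q < 1.
    """
    # Player 1 indifferent between rows:
    #   q*A[0][0] + (1-q)*A[0][1] = q*A[1][0] + (1-q)*A[1][1]
    denom1 = (A[0][0] - A[0][1]) - (A[1][0] - A[1][1])
    # Player 2 indifferent between columns:
    #   p*B[0][0] + (1-p)*B[1][0] = p*B[0][1] + (1-p)*B[1][1]
    denom2 = (B[0][0] - B[0][1]) - (B[1][0] - B[1][1])
    if denom1 == 0 or denom2 == 0:
        return None  # degenerate: no isolated interior solution
    q = (A[1][1] - A[0][1]) / denom1
    p = (B[1][1] - B[1][0]) / denom2
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None

# Battle of the Sexes: the interior solution is p = 2/3, q = 1/3.
mixed = full_support_equilibrium([[2, 0], [0, 1]], [[1, 0], [0, 2]])
```

For larger supports these equations become genuinely polynomial (of degree equal to the number of opponents), which is where the subdivision-plus-Newton search described above takes over.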
For extensive-form games, the procedure is analogous, except that the variables correspond to sequence-form
realization weights rather than pure-strategy probabilities.

For strategic games, the program searches supports in the order proposed
by Porter, Nudelman, and Shoham [PNS04]_. For two-player games, this
prioritises supports for which both players have the same number of
strategies. For games with three or more players, this prioritises
supports which have the fewest strategies in total. For many classes
of games, this will tend to lower the average time until finding one equilibrium,
as well as finding the second equilibrium (if one exists).
.. _lp:

lp

@@ -83,6 +146,18 @@ While the set of equilibria in a two-player constant-sum strategic
game is convex, this method will only identify one of the extreme
points of that set.

The algorithm constructs a linear program using the sequence-form
constraints and payoff matrices. Specifically, it solves an optimisation
problem of the form:

.. math::

    \operatorname{maximise}~ c^Tx ~~~\operatorname{subject to}~ Ax\leq b,~x\geq 0

where :math:`x` denotes the vector of free variables. The linear program
is solved using the simplex method, and the resulting solution is then
translated into the mixed behavior profile corresponding to a Nash equilibrium.
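For a strategic-form zero-sum game the corresponding LP is the classic maximin program. The sketch below is a toy illustration of that LP (not the sequence-form program, and solved by brute-forcing the candidate vertices of the feasible region rather than by simplex), for a game with two row strategies:

```python
def solve_zero_sum_2xn(A):
    """Maximin strategy for the row player of a 2 x n zero-sum game.

    The LP   max v   s.t.  sum_i x_i A[i][j] >= v for all j,
                           x1 + x2 = 1, x >= 0
    is solved by noting that with x = (t, 1-t) the objective
    v(t) = min_j (t*A[0][j] + (1-t)*A[1][j]) is piecewise linear and
    concave, so its maximum lies at t = 0, t = 1, or a crossing of
    two column lines -- exactly the vertices simplex would visit.
    """
    n = len(A[0])
    def payoff(t, j):
        return t * A[0][j] + (1 - t) * A[1][j]
    candidates = [0.0, 1.0]
    for j in range(n):
        for k in range(j + 1, n):
            denom = (A[0][j] - A[1][j]) - (A[0][k] - A[1][k])
            if denom != 0:
                t = (A[1][k] - A[1][j]) / denom
                if 0 <= t <= 1:
                    candidates.append(t)
    best_t = max(candidates, key=lambda t: min(payoff(t, j) for j in range(n)))
    value = min(payoff(best_t, j) for j in range(n))
    return (best_t, 1 - best_t), value

# Matching pennies: value 0, maximin strategy (1/2, 1/2).
strategy, value = solve_zero_sum_2xn([[1, -1], [-1, 1]])
```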
.. _lcp:

lcp

@@ -108,6 +183,52 @@ of those convex sets. See :ref:`enummixed` for a method
which is guaranteed to find all the extreme points for a strategic
game.

Reads a two-player game on standard input and
computes Nash equilibria by finding solutions to a linear
complementarity problem.

For extensive games, the algorithm constructs a linear complementarity problem (LCP) using the sequence-form
constraints and payoff matrices as defined by Koller, Megiddo, and von Stengel [KolMegSte94]_.
Specifically, it seeks solutions to a system of inequalities of the form:

.. math::

    w = Mz+q \\
    w\geq 0 \\
    z\geq 0 \\
    z^Tw = 0

where :math:`w` and :math:`z` denote the vectors of free variables.

To solve the LCP, the method of Lemke is used: an artificial variable :math:`z_0` is introduced, so that

.. math::

    w = Mz + q + z_0\mathbf{1}

has a trivial solution in which :math:`z_i=0` for all :math:`i\neq 0`
and :math:`w_j=0` for some :math:`j`.
A sequence of
solutions is then generated via successive pivot operations until :math:`z_0=0`, at which point a solution
to the original LCP is obtained. This solution is subsequently translated into the mixed behavior profile
corresponding to a Nash equilibrium.
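For a strategic-form bimatrix game :math:`(A, B)` with positive payoffs, the standard LCP takes :math:`z = (x, y)`, :math:`M = \bigl(\begin{smallmatrix}0 & A\\ B^T & 0\end{smallmatrix}\bigr)` and :math:`q = -\mathbf{1}`. The sketch below builds that system and verifies a candidate solution against the four LCP conditions; the candidate is supplied by hand for illustration, standing in for what Lemke pivoting would terminate at (this is not the sequence-form LCP used for extensive games):

```python
from fractions import Fraction

F = Fraction

def lcp_residuals(M, q, z):
    """Return w = M z + q and the complementarity product z.w."""
    n = len(q)
    w = [sum(M[i][k] * z[k] for k in range(n)) + q[i] for i in range(n)]
    return w, sum(z[i] * w[i] for i in range(n))

# Battle of the Sexes with positive payoffs.
A = [[F(2), F(0)], [F(0), F(1)]]   # player 1
B = [[F(1), F(0)], [F(0), F(2)]]   # player 2

# Bimatrix LCP: z = (x, y), M = [[0, A], [B^T, 0]], q = -1.
M = [
    [F(0), F(0), A[0][0], A[0][1]],
    [F(0), F(0), A[1][0], A[1][1]],
    [B[0][0], B[1][0], F(0), F(0)],
    [B[0][1], B[1][1], F(0), F(0)],
]
q = [F(-1)] * 4

# Hand-supplied candidate: unnormalised strategy weights (x1, x2, y1, y2).
z = [F(1), F(1, 2), F(1, 2), F(1)]
w, comp = lcp_residuals(M, q, z)

# Normalising each block of z recovers the mixed equilibrium.
x, y = z[:2], z[2:]
sigma1 = [xi / sum(x) for xi in x]
sigma2 = [yi / sum(y) for yi in y]
```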
For strategic games, the program uses the method of Lemke and Howson
[LemHow64]_. In this case, the method will find all "accessible"
equilibria, i.e., those that can be found as concatenations of Lemke-Howson
paths that start at the artificial equilibrium.
There exist strategic-form games for which some equilibria cannot be found
by this method, i.e., some equilibria are inaccessible; see Shapley [Sha74]_.

In a two-player strategic game, the set of Nash equilibria can be expressed
as the union of convex sets. This program will find extreme points
of those convex sets. See :ref:`enummixed` for a method
which is guaranteed to find all the extreme points for a strategic
game.

.. _liap:

liap
@@ -124,6 +245,47 @@ zero exactly at strategy profiles which are Nash equilibria.
Note that this procedure is not globally convergent. That is, it is
not guaranteed to find all, or even any, Nash equilibria.

Reads a game on standard input and computes
approximate Nash equilibria using a function minimization approach.

Given a real number :math:`\sigma_a` associated with each action :math:`a` in an extensive form game,
we define the following terms.

(i) Penalisation for negative probabilities:

.. math::

    \sum_{a\in\mathcal{A}}(\min\{\sigma_a, 0\})^2

(ii) Penalisation for probabilities not summing to one at information sets:

.. math::

    \sum_{I \in \mathcal{I}}\left(\sum_{a\in\mathcal{A}(I)}\sigma_a-1\right)^2

where :math:`\mathcal{I}` is the set of information sets and :math:`\mathcal{A}(I)` is the set of actions
at information set :math:`I`.

(iii) Penalisation for profitable deviations (the residual term):

.. math::

    \sum_{I \in \mathcal{I}}\sum_{a\in\mathcal{A}(I)}(\max\{u(a)-u(I),0\})^2

where :math:`u(a)` and :math:`u(I)` denote the values (dependent on :math:`\sigma`) of action :math:`a` and
information set :math:`I`, respectively.

The Lyapunov function is defined as a weighted sum of these three terms.
It is non-negative and equals zero if and only if :math:`\sigma` represents
an agent Nash equilibrium mixed behavior profile.
The algorithm searches for equilibria by generating random starting points and
applying conjugate gradient descent to minimise the Lyapunov function.
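The three terms above have a direct strategic-form analogue, which is easier to show compactly. The sketch below evaluates such a function for a two-player game — an assumption-laden illustration (equal weights, strategic rather than extensive form, no conjugate gradient step), not the liap implementation:

```python
def lyapunov(sigma, payoffs):
    """Strategic-form analogue of the Lyapunov function: penalise negative
    probabilities (i), probabilities not summing to one (ii), and pure
    strategies that beat the current mixed payoff (iii), with unit weights."""
    A, B = payoffs
    p, q = sigma
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    u1 = [dot(row, q) for row in A]                       # values of rows vs q
    u2 = [dot(p, [B[i][j] for i in range(len(p))])        # values of columns vs p
          for j in range(len(q))]
    v1, v2 = dot(p, u1), dot(q, u2)
    total = 0.0
    for probs in (p, q):
        total += sum(min(x, 0.0) ** 2 for x in probs)     # term (i)
        total += (sum(probs) - 1.0) ** 2                  # term (ii)
    total += sum(max(u - v1, 0.0) ** 2 for u in u1)       # term (iii)
    total += sum(max(u - v2, 0.0) ** 2 for u in u2)
    return total

A = [[2.0, 0.0], [0.0, 1.0]]   # Battle of the Sexes
B = [[1.0, 0.0], [0.0, 2.0]]
at_equilibrium = lyapunov([[2/3, 1/3], [1/3, 2/3]], (A, B))   # ~0
off_equilibrium = lyapunov([[0.5, 0.5], [0.5, 0.5]], (A, B))  # > 0
```

Minimising this function (by conjugate gradients from random starts, as described above) drives all three penalty terms to zero simultaneously.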
Note that this procedure is not globally convergent; that is, it is not guaranteed to find all,
or even any, Nash equilibria.

.. _logit:

logit

@@ -150,6 +312,50 @@ if an information set is not reached due to being the successor of chance
moves with zero probability. In such games, the implementation treats
the beliefs at such information sets as being uniform across all member nodes.

Reads a game on standard input and computes the
principal branch of the (logit) quantal response correspondence.

The method is based on the procedure described in Turocy [Tur05]_ for
strategic games and Turocy [Tur10]_ for extensive games.
It uses standard path-following methods (as
described in Allgower and Georg's "Numerical Continuation Methods") to
adaptively trace the principal branch of the correspondence
efficiently and accurately.

For an extensive form game, an agent quantal response equilibrium with parameter :math:`\lambda`
is defined at each information set :math:`I` by the conditions

.. math::

    \sigma(a) \propto \exp(\lambda u_i(a))

for all actions :math:`a` at :math:`I`, where :math:`\sigma(a)` and :math:`u_i(a)` denote the
probability and value of action :math:`a`, respectively. This leads to the following
system of equations over all information sets :math:`I` and actions :math:`a`:

.. math::

    \sum_{b \in \mathcal{A}(I)}\sigma(b) = 1 \\
    \log \sigma(a) - \log \sigma(a_0) = \lambda(u_i(a) - u_i(a_0))

where :math:`a_0` is a fixed reference action in the information set containing :math:`a`.

These equations define a one-dimensional manifold in the space of variables :math:`(\lambda, \log\sigma)`.
The algorithm starts on this manifold at :math:`\lambda = 0`, where the solution corresponds to the
uniform distribution over actions at each information set. It then moves along the curve using a
predictor-corrector method. Specifically, on each iteration the predictor step moves along the
tangent of the curve, and the corrector step then uses Newton's method to project back onto the curve
in the direction orthogonal to that tangent. Two parameters control the operation of this tracing.
The algorithm terminates when the maximum regret is below the desired threshold.
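For intuition, the logit conditions at a single fixed :math:`\lambda` can be approximated without the tracing machinery. The sketch below uses damped fixed-point iteration on a bimatrix game, starting from the uniform profile (the :math:`\lambda = 0` solution) — a deliberately cruder substitute for the predictor-corrector method described above, and for strategic rather than extensive form:

```python
import math

def logit_response(utilities, lam):
    """Logit (softmax) response to a vector of expected utilities."""
    m = max(lam * u for u in utilities)           # shift for numerical stability
    expu = [math.exp(lam * u - m) for u in utilities]
    s = sum(expu)
    return [e / s for e in expu]

def logit_qre(A, B, lam, iters=2000, damp=0.5):
    """Approximate the logit QRE of a bimatrix game at a fixed lambda
    by damped fixed-point iteration from the uniform profile."""
    n, m = len(A), len(A[0])
    p = [1.0 / n] * n
    q = [1.0 / m] * m
    for _ in range(iters):
        u1 = [sum(A[i][j] * q[j] for j in range(m)) for i in range(n)]
        u2 = [sum(B[i][j] * p[i] for i in range(n)) for j in range(m)]
        bp = logit_response(u1, lam)
        bq = logit_response(u2, lam)
        p = [(1 - damp) * x + damp * y for x, y in zip(p, bp)]
        q = [(1 - damp) * x + damp * y for x, y in zip(q, bq)]
    return p, q

# Matching pennies: the uniform profile is the logit QRE for every lambda.
p, q = logit_qre([[1, -1], [-1, 1]], [[-1, 1], [1, -1]], lam=2.0)
```

As :math:`\lambda \to \infty` along the principal branch, the QRE converges to a Nash equilibrium, which is what makes the traced branch useful as an equilibrium selection device.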
In extensive games, logit quantal response equilibria are not well-defined
if an information set is not reached due to being the successor of chance
moves with zero probability. In such games, the implementation treats
the beliefs at such information sets as being uniform across all member nodes.

.. _simpdiv:

simpdiv

@@ -168,6 +374,26 @@ grid. The program continues this process with finer and finer grids
until locating a mixed strategy profile at which the maximum regret is
small.

Reads a game on standard input and computes
approximations to Nash equilibria using a simplicial subdivision
approach.

This program implements the algorithm of van der Laan, Talman, and
van der Heyden [VTH87]_. At each iteration, the algorithm triangulates the space of mixed strategy
profiles into a simplicial complex. Each vertex of the triangulation is labelled with the
player exhibiting the maximum regret and the strategy responsible for it. Each iteration seeks
a completely labelled simplex, in which each strategy either appears among the labels
of the vertices or has probability :math:`0` on the simplex (because the simplex lies on
the boundary). The algorithm finds such a simplex by following a path of simplices, starting from a
:math:`0`-dimensional simplex (i.e. a single vertex) and guided by the labels of the vertices.
Along this path, simplices can increase or decrease in dimension (a vertex enters or exits) or can
pivot, where a vertex that shares its label with another is chosen and the simplex is flipped
along the facet opposite that vertex (so that the vertex exits and another enters). When
a completely labelled simplex is reached, it seeds the starting point of the next iteration,
which operates on a finer triangulation of the space.
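The "finer and finer grids" idea can be caricatured without the simplicial labelling machinery: search a coarse grid for the profile of least maximum regret, then halve the grid around the best point and repeat. This toy sketch is only in the spirit of the refinement loop, not the path-following algorithm itself:

```python
def max_regret(p, q, A, B):
    """Largest payoff any player could gain by switching to a best response."""
    u1 = [sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(p))]
    u2 = [sum(B[i][j] * p[i] for i in range(len(p))) for j in range(len(q))]
    v1 = sum(p[i] * u1[i] for i in range(len(p)))
    v2 = sum(q[j] * u2[j] for j in range(len(q)))
    return max(max(u1) - v1, max(u2) - v2)

def refine_grid(A, B, rounds=12):
    """Toy grid refinement for a 2x2 game: evaluate a 5x5 stencil around the
    current best point for (p1, q1), then halve the stencil radius."""
    best, radius = (0.5, 0.5), 0.5
    for _ in range(rounds):
        pts = []
        for dp in (-radius, -radius / 2, 0, radius / 2, radius):
            for dq in (-radius, -radius / 2, 0, radius / 2, radius):
                p1 = min(1.0, max(0.0, best[0] + dp))
                q1 = min(1.0, max(0.0, best[1] + dq))
                pts.append((p1, q1))
        best = min(pts, key=lambda t: max_regret(
            [t[0], 1 - t[0]], [t[1], 1 - t[1]], A, B))
        radius /= 2
    return best, max_regret([best[0], 1 - best[0]], [best[1], 1 - best[1]], A, B)

# Matching pennies: the unique equilibrium is ((1/2, 1/2), (1/2, 1/2)).
(p1, q1), regret = refine_grid([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
```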
.. _ipa:

ipa

@@ -185,6 +411,23 @@ interpreted as defining a ray in the space of games. The profile must have
the property that, for each player, the most frequently played strategy must
be unique.

The algorithm utilises the concept of a polymatrix game, which is a game in
which the payoffs take the form:

.. math::

    u_i(\sigma) = \sum_{j\neq i} u_i^j(\sigma_i, \sigma_j)

where :math:`u_i(\sigma)` denotes the payoff to player :math:`i` from the mixed
strategy profile :math:`\sigma`, which consists of a mixed strategy :math:`\sigma_k`
for each player :math:`k`.

At each iteration, the algorithm begins with a mixed strategy profile. It then
approximates the game by a polymatrix game around this profile and computes an
equilibrium of the polymatrix game using the Lemke–Howson method. It then takes
a step towards this solution, producing the starting mixed strategy profile for
the next iteration.
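The polymatrix payoff form above is just a sum of bilinear pairwise terms. A small sketch (the three-player game and helper are hypothetical, purely to make the formula concrete):

```python
def polymatrix_payoff(i, sigma, U):
    """Payoff to player i in a polymatrix game: the sum over opponents j
    of the bilinear term sigma_i . U[i][j] . sigma_j.

    U[i][j] is player i's payoff matrix in the pairwise game against j
    (None on the diagonal)."""
    si = sigma[i]
    total = 0.0
    for j, sj in enumerate(sigma):
        if j == i:
            continue
        total += sum(
            si[a] * U[i][j][a][b] * sj[b]
            for a in range(len(si))
            for b in range(len(sj))
        )
    return total

# Three players, two strategies each; every pair plays the same
# coordination game for concreteness.
C = [[1.0, 0.0], [0.0, 1.0]]
U = [[None, C, C], [C, None, C], [C, C, None]]
sigma = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]

payoff0 = polymatrix_payoff(0, sigma, U)  # 1.0 (vs player 1) + 0.5 (vs player 2)
```

Because each term is bilinear in :math:`(\sigma_i, \sigma_j)`, equilibria of the polymatrix approximation can be computed by complementary pivoting, which is what makes it a useful local model at each iteration.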
.. _gnm:

gnm

@@ -197,6 +440,33 @@ and Wilson [GovWil03]_. This program is based on the
implementation by Ben Blum and Christian Shelton.

The algorithm takes as a parameter a mixed strategy profile. This profile is
interpreted as defining a ray in the space of games. Specifically, it generates
a set of games:

.. math::

    \{ U_{\lambda} := U + \lambda\eta\,|\, \lambda\in\mathbb{R}\}

where :math:`U` is the payoff tensor of the original game and :math:`\eta` is
constructed from the profile. The profile must have
the property that, for each player, the most frequently played strategy must
be unique.

Given a game :math:`U_{\lambda}` on the ray, we have the following
equations for an equilibrium mixed strategy profile :math:`\sigma` and
payoff vector :math:`v`:

.. math::

    \sigma_{i,s}(u_{\lambda, i}(s,\sigma_{-i})-v_i) = 0 \\
    \sum_{s}\sigma_{i,s} = 1

where :math:`u_{\lambda, i}(s,\sigma_{-i})` is the payoff that player
:math:`i` would obtain by unilaterally deviating to pure strategy :math:`s`.
Note that these equations, taken over all values of :math:`\lambda`, define a
one-dimensional manifold. The algorithm starts at a high value of :math:`\lambda`,
where the solution is trivial. At each iteration it moves along the tangent
to the curve and then modifies :math:`\eta` (hence modifying the curve)
so that this point lies on the new curve. Occasionally, Newton's method is
used on an iteration to correct numerical errors. Once we reach a point with
:math:`\lambda=0`, we have an equilibrium of the true game.
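The ray construction can be made concrete with a toy perturbation. In this sketch :math:`\eta` is simply a bonus for playing a target pure profile — an assumption for illustration, not how gnm actually derives :math:`\eta` from the input profile — showing that at large :math:`\lambda` the perturbed game has a trivial equilibrium while :math:`\lambda = 0` recovers the original game:

```python
def perturbed_payoffs(U, eta, lam):
    """Payoff table for the game U + lam * eta on the ray (2-player tables)."""
    return {
        profile: tuple(u + lam * e for u, e in zip(U[profile], eta[profile]))
        for profile in U
    }

def is_pure_equilibrium(U, profile, num_strategies):
    """True if no player has a strictly profitable unilateral deviation."""
    for i in range(len(num_strategies)):
        for s in range(num_strategies[i]):
            dev = profile[:i] + (s,) + profile[i + 1:]
            if U[dev][i] > U[profile][i]:
                return False
    return True

# Matching pennies has no pure equilibrium...
U = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
# ...but along a ray that bonuses each player for choosing strategy 0,
# the game at large lambda has the trivial equilibrium (0, 0).
eta = {p: (1.0 if p[0] == 0 else 0.0, 1.0 if p[1] == 0 else 0.0) for p in U}
U_big = perturbed_payoffs(U, eta, lam=10.0)
U_zero = perturbed_payoffs(U, eta, lam=0.0)
```

Tracing the equilibrium of :math:`U_\lambda` from the trivial solution at large :math:`\lambda` down to :math:`\lambda = 0` is exactly the homotopy the algorithm follows.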
