Artificial Intelligence

Introduction

About the course

Aims

This course will focus on algorithms and models for Artificial Intelligence. We will concentrate mainly on the decision-making side of artificial intelligence, rather than the learning side. Learning is already addressed in statistics courses, as well as in the machine learning course in the final year.

Philosophy

The philosophy of this course is as follows:

  • We give example problems.
  • We use theory to explain and generalise from those examples to general problems.
  • We describe algorithms to solve general problems.
  • We implement algorithms to solve the specific examples.

In general, the course will start from the simplest problems and slowly progress to the more complex ones.

Plagiarism, use of automation and external sources

In this course, the use of automated methods for writing and translating text, such as generative AI models, web-based translation, and code development tools, is allowed under certain conditions.

The general idea for the use of advanced algorithms is the same as when you are using any other external resource, such as verbal advice, online forums, or material from books or websites.

In all cases, you are not allowed to present anybody’s work as your own. This includes both human- and machine-generated output. Misrepresentation will be considered misuse in this class. So, you must cite each and every use of any external tool, reference, or personal help. Here are some examples:

  1. Verbal advice: Cite as “A. Student: Personal Communication”
  2. Online forums: Stackoverflow is an example. Cite as “We based this part of the work on URL where we modified the code to … ”
  3. Books or websites: Cite normally, explaining how you used the material.
  4. Automated translation: Specify what you used and where. Note, however, that you can write your reports in French or English.
  5. Advanced coding aids: Specify what you used, including generative AI. Say exactly where you used it, so that we can filter it out.
  6. Advanced models for grammar and rewriting: Specify which model you used and for which parts of the text.
  7. Advanced models for text structuring. You are not encouraged to do that: we can discuss this in class if needed. However, if you do make use of such models, then you must explain how you used them, including a copy of the intermediate output.
  8. You must use a git repository to create your work. This way we can track changes.
  9. If you do not stick to the above advice, then automated AI-detection tools may flag your work with false positives.

How to use this notebook

  • This notebook contains a summary of all the foundational material in the course. It is not sufficient on its own for studying for the exam, or as reference material.
  • For details, use the links to external resources.
  • There are some more detailed slides available for each lecture.

Books and schedule

  • Artificial Intelligence: Foundations of Computational Agents, 3rd Edition (AI:FoCA)
  • Artificial Intelligence: A Modern Approach, 4th Edition (AI:aMA)

| Module | Topics | AI:FoCA | AI:aMA |
|---|---|---|---|
| 1 | Preferences; Utility; States; Actions; Beliefs; Fairness | 1. AI and Agents; 1.2. Complexity; 1.3. Application domains; 1.4. Knowledge representation; 2. Architecture; 2.1. Control; 2.2. Hierarchical control; 2.3. Moral machines | 2.1. Agents and environments; 2.2. Rationality; 2.3. Environments; 2.4. Agents; 2.4.1. Programs; 2.4.2–3. Reflex agents; 2.4.4. Goals; 2.4.5. Utility; 2.4.6. Learning |
| 2 | Depth-First Search; Breadth-First Search | 3. Search; 3.1. Search in graphs | 3.1. Problem-solving; 3.2. Examples; 3.3. Best-First search; 3.4.1. Breadth-first; 3.4.3. Depth-first search |
| 3 | Heuristic Search; A* Search | 3.2. Uninformed search; 3.3. Heuristic search | 3.5.2. A*; 3.6. Heuristic Functions |
| 4 | Dynamic Programming; Branch and bound | 3.4. Dynamic programming; 3.5. Branch and bound | 3.4.2. Dijkstra |
| 5 | Constraint programming; Logical reasoning; Deterministic planning | 4. Reasoning with constraints; 4.1. Variables and Constraints; 4.2. CSPs and Search; 4.6. Local Search; 4.8. Optimization | 6. CSP; 7. Logical Agents |
| 6 | Uncertainty; Aleatory/Epistemic; Probability Theory; Bayes Theorem; Probabilistic inference | 9. Reasoning with Uncertainty; 9.1. Probability; 9.2. Independence | 12.1. Acting under uncertainty; 12.2. Basic probability notation; 12.3. Inference; 12.4. Independence; 12.5. Bayes’s theorem |
| 7 | Expected Utility Theory | 12.1. Preferences and Utility; 12.2. One-off decisions | 16.1. Beliefs and Desires; 16.2. Utility theory; 16.3. Utility functions; 16.5. Decision networks |
| 8 | Markov Decision Processes; Dynamic Programming | 12.3. Sequential Decisions; 12.4. The value of information; 12.5. Decision processes | 17.1. Sequential decision problems; 17.2. Algorithms for MDPs |
| 9 | Alternating Zero-Sum Games; Stochastic Zero-Sum Games; Linear programming | 14.1. Multi-agent framework; 14.2. Representations of games; 14.3. Perfect information games | 5.1. Game Theory; 5.2. Zero-Sum Games; 5.3. Alpha-Beta Search; 5.5. Stochastic Games |
| 10 | Belief networks | 9.3. Belief Networks; 9.4. Probabilistic Inference | 13.1. Representing knowledge; 13.2. Bayesian Networks; 13.3. Exact inference in BNs |

Notation

  • $\Reals, \Reals^d$: the real line and $d$-dimensional Euclidean space
  • $\Simplex^d$: the $d$-dimensional simplex
  • $\Simplex(A)$: the set of distributions over $A$.
  • $\ind{x}$: indicator function (1 if $x$ is true, 0 otherwise)
  • $Pr$: probability
  • $\E$: expectation
  • $\pol ∈ \Pols$: policies, or algorithms.
  • $\mdp ∈ \MDPs$: models
  • $\param ∈ \Param$: parameters (i.e. models parameterised by vectors in $\Reals^d$)
  • $u$: utility
  • $c$: cost / constraints
  • $s ∈ S$: state
  • $a ∈ A$: action
  • $r ∈ \Reals$: reward

Project

Application project.

Application project proposals need to contain the following:

  • Domain description and goals: What is the problem, in general terms, and which aspect would you try and solve in an AI framework? Make sure to cite relevant literature and course material.
  • Methodology: How would you formalise the problem mathematically? Which algorithms and/or models do you intend to apply at different stages of the project? Feel free to read widely about both the problem and algorithms and do cite relevant literature. Make sure to employ techniques taught in the course, e.g. logic + search or probabilities and MDPs etc.
  • Experiment design: How would you know that the method is working? How would you compare with existing solutions? In what context would you expect an improvement? How would you measure it? How will you test the robustness of your solution over variations in the problem instance?
  • Expected results: What results do you expect to obtain, and what do you think might go wrong? In what way do you expect an improvement?

Algorithmic project.

  • Algorithmic/theory problem and goals: What is the deficiency, in general terms, of current theory and algorithms that your method would try to improve? As an example, the goal could be reducing computational complexity, increasing data efficiency, improving robustness or applicability of a specific family of algorithms; or introducing a slightly different setting to existing ones. In other words, which is the open problem you will be addressing? Make sure to cite relevant literature to better identify the problem.
  • Methodology: What kind of existing algorithms, theory or technical result would you rely on? Would you be combining various existing results? What would be the most significant novelty of your methodology? Do cite relevant literature.
  • Experiment design (if applicable): How would you know that the method is working? How would you compare with existing solutions? In what context would you expect an improvement? How would you measure it?
  • Expected results: What results do you expect to obtain, and what do you think might go wrong? In what way do you expect an improvement?

Grading for projects:

Grades will be adjusted based on group size, with one letter grade up/down for double/half the mean group size.

  • Environments: A. Complex, well-described environment that captures all of the elements of the application or algorithmic problem. B. The environment is simple or lacks description. C. An adequate environment that captures the basic setting. D. Insufficient environment or description. E. Insufficient environment and description.
  • Algorithms: A. Significantly novel algorithms that are well described. B. Some novelty in the algorithms, with good descriptions. C. Some novelty in the algorithms, but descriptions are lacking. D. Insufficient novelty or descriptions. E. Insufficient novelty and descriptions.
  • Experiments: A. Thorough experiments with ablation tests and comparisons over algorithms and environments, that are well-described. B. Somewhat incomplete experiments or descriptions. C. Sufficient experiments and descriptions. D. Insufficient experiments or descriptions. E. Insufficient experiments and descriptions.

Criteria for full marks in each part of the project are the following.

  1. Documenting of the work in a way that enables reproduction.
  2. Technical correctness of their analysis.
  3. Demonstrating that they have understood the assumptions underlying their analysis.
  4. Addressing issues of reproducibility in research.
  5. Addressing scientific and ethical questions where applicable, and if not, clearly explain why they are not.
  6. Consulting additional resources beyond the source material with proper citations.

The following marking guidelines describe what one would expect from students attaining each grade.

Detailed grading

A (6)

  1. Submission of a detailed report from which one can definitely reconstruct their work without referring to their code. There should be no ambiguities in the described methodology. Well-documented code where design decisions are explained.
  2. Extensive analysis and discussion. Technical correctness of their analysis. Nearly error-free implementation.
  3. The report should detail what models are used and what the assumptions behind them are. The conclusions of the report should include appropriate caveats. When the problem includes simple decision making, the optimality metric should be well-defined and justified. Similarly, well-defined optimality criteria should be given for the experiment design, when necessary. The design should be (to some degree of approximation, depending on problem complexity) optimal according to these criteria.
  4. Appropriate methods to measure reproducibility. Use of cross-validation or hold-out sets to measure performance. Use of an unbiased methodology for algorithm, model or parameter selection. Appropriate reporting of a confidence level (e.g. using bootstrapping) in their analytical results. Relevant assumptions are mentioned when required.
  5. A clear definition of a scientific question. When dealing with data relating to humans, ethical concerns, such as privacy and/or fairness should be addressed.
  6. The report contains some independent thinking, or includes additional resources beyond the source material with proper citations. The students go out of their way to research material and implement methods not discussed in the course.

B (5.5)

  1. Submission of a report from which one can plausibly reconstruct their work without referring to their code. There should be no major ambiguities in the described methodology.
  2. Technical correctness of their analysis, with a good discussion. Possibly minor errors in the implementation.
  3. The report should detail what models are used, as well as the optimality criteria, including for the experiment design. The conclusions of the report must contain appropriate caveats.
  4. Use of cross-validation or hold-out sets to measure performance. Use of an unbiased methodology for algorithm, model or parameter selection.
  5. When dealing with data relating to humans, ethical concerns such as privacy and/or fairness should be addressed. While an analysis of this issue may not be performed, there is a substantial discussion of the issue that clearly shows understanding by the student.
  6. The report contains some independent thinking, or the students mention other methods beyond the source material, with proper citations, but do not further investigate them.

C (5)

  1. Submission of a report from which one can partially reconstruct most of their work without referring to their code. There might be some ambiguities in parts of the described methodology.
  2. Technical correctness of their analysis, with an adequate discussion. Some errors in a part of the implementation.
  3. The report should detail what models are used, as well as the optimality criteria and the choice of experiment design. Analysis caveats are not included.
  4. Either use of cross-validation or hold-out sets to measure performance, or use of an unbiased methodology for algorithm, model or parameter selection - but in a possibly inconsistent manner.
  5. When dealing with data relating to humans, ethical issues are addressed superficially.
  6. There is little mention of methods beyond the source material or independent thinking.

D (4.5)

  1. Submission of a report from which one can partially reconstruct most of their work without referring to their code. There might be serious ambiguities in parts of the described methodology.
  2. Technical correctness of their analysis with limited discussion. Possibly major errors in a part of the implementation.
  3. The report should detail what models are used, as well as the optimality criteria. Analysis caveats are not included.
  4. Either use of cross-validation or hold-out sets to measure performance, or use of an unbiased methodology for algorithm, model or parameter selection - but in a possibly inconsistent manner.
  5. When dealing with data relating to humans, ethical issues are addressed superficially or not at all.
  6. There is little mention of methods beyond the source material or independent thinking.

E (4)

  1. Submission of a report from which one can obtain a high-level idea of their work without referring to their code. There might be serious ambiguities in all of the described methodology.
  2. Technical correctness of their analysis with very little discussion. Possibly major errors in only a part of the implementation.
  3. The report might mention what models are used or the optimality criteria, but not in sufficient detail and caveats are not mentioned.
  4. Use of cross-validation or hold-out sets to simultaneously measure performance and optimise hyperparameters, but possibly in a way that introduces some bias.
  5. When dealing with data relating to humans, ethical issues are not discussed.
  6. There is no mention of methods beyond the source material or independent thinking.

F (<=3.5)

  1. The report does not adequately explain their work.
  2. There is very little discussion and major parts of the analysis are technically incorrect, or there are errors in the implementation.
  3. The models used might be mentioned, but not any other details.
  4. There is no effort to ensure reproducibility or robustness.
  5. When applicable: Ethical issues are not mentioned.
  6. There is no mention of methods beyond the source material or independent thinking.

Examination and grading

You must obtain a sufficient grade in both the exam and the project to pass. If you get an F in either the exam or the project, you fail the course and must repeat it.

Modules

Module 1 (Ch. 1–2): Introduction
  • Agents and Environments
  • What is intelligence?
  • Beliefs, Utility, States, Actions, Observations
  • Assignment: Given a problem, identify belief/utility/state/actions. What is the environment/agent boundary?

Module 2 (Ch. 3.1–3.5): Search, State Spaces, Graphs, Uninformed Search
  • Start and goal states: Breadth-First, Depth-First search
  • Assignment:
    1. Implement a different type of algorithm for doing the search. How would you order the nodes in your search?
    2. Formalise X as a start-goal search problem. What are the states? What are the goals? What are the actions? Implement it.

Module 3 (Ch. 3.6): Informed Search, Heuristics, A*
  • Cost/utility-based search and shortest path problems
  • Minimum-cost search, A*
  • Assignment:
    1. Implement your own heuristics
    2. Generalise problem X to the case where you have costs

Module 4 (Ch. 4.1–4.2, 4.8): Logic and Constraints
  • Logical framework
  • Logic and constraint satisfaction
  • Assignment:
    1. Simple gridworld implementation
    2. Basic logical inference

Module 5 (Ch. 9.1): Probability, Independence, Belief Networks
  • Conditional probability and independence, Bayes’s theorem
  • Assignment:
    1. Give a graphical model for some problem
    2. A simple Bayes’ theorem demonstration, e.g. covid tests

Module 6: Independence, Conditional Independence
  • Graphical models
  • Assignment: Graphical model of their project problem

Module 7 (Ch. 12.1 Preferences and Utility, 12.2 Probability): Decision making
  • Decision making under uncertainty
  • One-shot decisions
  • Stochastic gradient optimisation
  • Assignment: Find the optimal choice for some stochastic problems with:
    1. Discrete action sets
    2. Continuous actions

Module 8: Markov decision processes
  • Value iteration
  • Assignment: Extend the world so that there are randomised dynamics (but full observation) with fixed policies for other agents

Module 9: Markov games
  • Backwards induction
  • Assignment: Extend the world so that the other agent is adversarial

Module 10: Simultaneous games
  • Linear programming
  • Assignment: Find Nash equilibria

Example worlds:

  1. Train scheduling
  2. Hiking
  3. Wumpus
  4. Driving/racing

Single-agent problems with certainty

Uninformed search

  • Graph definitions (10)
  • Tree example (5)
  • Shortcut example (5)
  • Depth-first search (10)
  • The shortest path problem (10)
  • Goals and DFS (10)
  • Shortest-path DFS (10)
  • Breadth-first search (10)
  • Iterative deepening (10)
  • Uniform cost search (10)

Total: 95

Informed search

Heuristics

$A^*$-search

Dynamic programming

Branch and Bound

Constraints

  • Local search (20)
  • Constraint Satisfaction (20)
  • Graph Colouring (20)
  • Meeting Scheduling (20)
  • Constraint Optimisation (20)
  • Travelling Salesman (20)
  • Maximum Flow (20)
  • Logical constraints (20)
  • Towers of Hanoi (20)

Total: 160

Infinite choices

Lipschitz search

If we know the function $f$ is Lipschitz, i.e. \[ ∃ L > 0 : |f(x) - f(y)| \leq L |x - y|, \] then we also know that for any point $z$: \[ f(z) \leq f(x) + L |x - z|, \qquad f(z) \leq f(y) + L |y - z|. \]

Schubert’s Algorithm (Schubert, 1972)

\begin{algorithmic} \STATE \textbf{Input:} $L > 0$, $X$, $x_0 ∈ X$. \FOR {$t=1, \ldots, T$} \STATE $x_t = \argmax_{x ∈ X} \min \cset{f(x_k) + L|x_k - x|}{k=0, \ldots, t-1}$ \ENDFOR \end{algorithmic}
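A minimal Python sketch of this procedure, assuming a one-dimensional domain discretised into a finite grid of candidate points (the function, the grid, and the budget `T` below are illustrative choices):

```python
import numpy as np

def schubert_maximise(f, L, candidates, T=20):
    """Piyavskii-Schubert search: maximise an L-Lipschitz f over a finite grid."""
    candidates = np.asarray(candidates, dtype=float)
    xs = [candidates[0]]              # x_0: arbitrary first evaluation point
    fs = [f(xs[0])]
    for t in range(1, T):
        # Tightest upper bound at each candidate: min_k f(x_k) + L |x_k - x|
        upper = np.min([fk + L * np.abs(candidates - xk)
                        for xk, fk in zip(xs, fs)], axis=0)
        xs.append(candidates[int(np.argmax(upper))])   # most promising point
        fs.append(f(xs[-1]))
    best = int(np.argmax(fs))
    return xs[best], fs[best]

x_star, f_star = schubert_maximise(lambda x: -(x - 0.3) ** 2, L=2.0,
                                   candidates=np.linspace(0, 1, 101))
```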

Discussion

  • This is guaranteed to converge to the optimal solution.
  • If $L$ is unknown, DIRECT (Jones et al. 1993) can be used.
  • If $f$ is noisy, the problem becomes a continuum bandit problem.

First-order gradient methods

  • Gradient descent
  • Stochastic gradient descent

Properties

  • Incremental algorithms
  • Can converge to a local optimum

Single-variable gradient descent

Setting

  • Input: $f : \Reals → \Reals$
  • Problem: $\max_x f(x)$
  • Derivative: $\frac{d}{dx} f(x) \defn \lim_{Δ → 0} \frac{f(x + Δ) - f(x)}{Δ}$.

Algorithm

  1. Input: $x_0$, $f$
  2. For $t = 1, \ldots$:
  3. Calculate direction $g_t = \frac{d}{dx} f(x_{t-1})$
  4. Select step-size $α_t$
  5. Update $x_t = x_{t-1} + α_t g_t$.
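A minimal sketch of this loop, using a finite-difference approximation of the derivative (the step size, horizon, and test function are illustrative choices):

```python
def gradient_ascent_1d(f, x0, alpha=0.1, T=100, delta=1e-6):
    """Maximise f : R -> R by following a finite-difference derivative estimate."""
    x = x0
    for t in range(1, T + 1):
        g = (f(x + delta) - f(x)) / delta   # g_t ~ d/dx f(x_{t-1})
        x = x + alpha * g                   # ascent step, since we solve max_x f(x)
    return x

x_star = gradient_ascent_1d(lambda x: -(x - 2.0) ** 2, x0=0.0)   # converges to ~2.0
```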

Multiple-variable gradient descent

Setting

  • Input: $f : \Reals^d → \Reals$, $x = (x_1, \ldots, x_d)$
  • Problem: $\max_x f(x)$
  • Partial Derivative: $\frac{∂}{∂ x_i} f(x) \defn \lim_{Δ → 0} \frac{f(x_1, \ldots, x_i + Δ, \ldots, x_d) - f(x)}{Δ}$.
  • Gradient $∇_x f(x) = \left[\frac{∂}{∂ x_1} f(x), \ldots, \frac{∂}{∂ x_i} f(x), \ldots, \frac{∂}{∂ x_d} f(x)\right]^\top$.

Algorithm

  1. Input: $x_0$, f
  2. For $t = 1, \ldots$:
  3. Calculate direction $g_t = ∇_x f(x_{t-1})$
  4. Select step-size $α_t$
  5. Update $x_t = x_{t-1} + α_t g_t$.
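The same scheme in $\Reals^d$, sketched here with an analytic gradient supplied by the caller (the quadratic test function is illustrative):

```python
import numpy as np

def gradient_ascent(grad_f, x0, alpha=0.1, T=100):
    """Maximise f : R^d -> R, given its gradient grad_f : R^d -> R^d."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        x = x + alpha * grad_f(x)   # x_t = x_{t-1} + alpha_t g_t
    return x

# f(x) = -||x - c||^2 has gradient -2 (x - c) and its maximum at c.
c = np.array([1.0, -1.0])
x_star = gradient_ascent(lambda x: -2 * (x - c), x0=np.zeros(2))
```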

Stochastic gradient descent

As gradient descent with errors

  • Calculate direction $g_t = ∇_x f(x_{t-1}) + ε_t$
  • $ε_t$ is typically zero-mean noise.

In learning from data

The gradient can be broken up into a sum of gradients: \[ f(x) = ∑_t v(x, z_t), \qquad ∇_x f(x) = ∑_t ∇_x v(x, z_t), \] giving the incremental update $x_t = x_{t-1} + α_t ∇_x v(x_{t-1}, z_t)$.
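A sketch of this update on a toy least-squares problem, where $v(x, z_t)$ is the negated squared error on one data point (the data, the step-size schedule, and the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: z_t = (phi, y) with linear ground truth y = phi . [2, -1].
Z = [(phi, phi @ np.array([2.0, -1.0])) for phi in rng.normal(size=(100, 2))]

def grad_v(x, z):
    """Gradient in x of v(x, z) = -(x . phi - y)^2 for one point z = (phi, y)."""
    phi, y = z
    return -2.0 * (x @ phi - y) * phi

x = np.zeros(2)
for t, z in enumerate(Z, start=1):
    x = x + (0.1 / np.sqrt(t)) * grad_v(x, z)   # x_t = x_{t-1} + alpha_t grad v
# x is now close to the ground truth [2, -1].
```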

In Bayesian quadrature

The function is an expectation: \[ f(x) = ∫_Z v(x, z) p(z) dz. \qquad ∇_x f(x) ≈ ∑_t ∇_x v(x, z_t), \] where $z_t ∼ p(z)$ are samples from $p$.
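A corresponding Monte Carlo gradient sketch, assuming we can sample $z ∼ p$ and differentiate $v$ in $x$; the sampled gradients are averaged rather than summed, so that the estimate is unbiased:

```python
import numpy as np

def mc_gradient(grad_v, sample_z, x, n=100):
    """Estimate grad f(x) for f(x) = E_{z ~ p}[v(x, z)] from n samples z ~ p."""
    return np.mean([grad_v(x, sample_z()) for _ in range(n)], axis=0)
```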

Single-agent problems with uncertainty

Probability

Probability fundamentals

Probability measure $P$

  • Defined on a universe $Ω$
  • $P : Σ → [0,1]$ is a function of subsets of $Ω$.
  • A subset $A ⊂ Ω$ is an event and $P$ measures its likelihood.

Axioms of probability

  • $P(Ω) = 1$
  • For $A, B ⊂ Ω$, if $A ∩ B = ∅$ then $P(A ∪ B) = P(A) + P(B)$.

Marginalisation

If $A_1, \ldots, A_n ⊂ Ω$ are a partition of $Ω$, then \[ P(B) = ∑_{i=1}^n P(B ∩ A_i). \]

Conditional probability and independence

Conditional probability

Conditional probability

The conditional probability of an event $A$ given an event $B$ is defined as \[ P(A | B) \defn \frac{P(A ∩ B)}{P(B)} \] The above definition requires $P(B)$ to exist and be positive.

Conditional probabilities as a collection of probabilities

More generally, we can define conditional probabilities as simply a collection of probability distributions: \[ \cset{P_\param(A)}{\param ∈ \Param}, \] where $\Param$ is an arbitrary set.

The theorem of Bayes

Bayes’s theorem

\[ P(A | B) = \frac{P(B | A) P(A)}{P(B)} \]

The general case

If $A_1, \ldots, A_n$ are a partition of $Ω$, meaning that they are mutually exclusive events (i.e. $A_i ∩ A_j = ∅$ for $i ≠ j$) such that one of them must be true (i.e. $\bigcup_{i=1}^n A_i = Ω$), then \[ P(B) = ∑_{i=1}^n P(B | A_i) P(A_i) \] and \[ P(A_j | B) = \frac{P(B | A_j) P(A_j)}{∑_{i=1}^n P(B | A_i) P(A_i)}. \]
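A small numeric illustration of the general case, in the spirit of the covid-test demonstration used later in the course (all numbers are made up):

```python
# Worlds: A_1 = infected, A_2 = healthy. Event B = the test is positive.
prior = {"infected": 0.01, "healthy": 0.99}         # P(A_i)
likelihood = {"infected": 0.95, "healthy": 0.05}    # P(B | A_i)

p_B = sum(likelihood[a] * prior[a] for a in prior)  # marginalisation over worlds
posterior = {a: likelihood[a] * prior[a] / p_B for a in prior}
print(posterior["infected"])   # ~0.16: a positive test on a rare condition
```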

Independence

Independent events

$A, B$ are independent iff $P(A ∩ B) = P(A) P(B)$.

Conditional independence

$A, B$ are conditionally independent given $C$ iff $P(A ∩ B | C) = P(A | C) P(B | C)$.

Random variables and expectation

Random variables

A random variable $f : Ω → \Reals$ is a real-valued function measurable with respect to the underlying probability measure $P$, and we write $f ∼ P$.

The distribution of $f$

The probability that $f$ lies in some subset $A ⊂ \Reals$ is \[ P_f(A) \defn P(\{ω ∈ Ω : f(ω) ∈ A\}). \]

Independence

Two RVs $f,g$ are independent in the same way that events are independent: \[ P(f ∈ A ∧ g ∈ B) = P(f ∈ A) P(g ∈ B) = P_f(A) P_g(B). \] In that sense, $f ∼ P_f$ and $g ∼ P_g$.

Expectation

For any real-valued random variable $f: Ω → \Reals$, the expectation with respect to a probability measure $P$ is \[ \E_P(f) = ∑_{ω ∈ Ω} f(ω) P(ω). \]

Linearity of expectations

For any RVs $x, y$: \[ \E_P(x + y) = \E_P(x) + \E_P(y) \]

Independence

If $x,y$ are independent RVs then $\E_P(xy) = \E(x)\E(y)$.

Correlation

If $x,y$ are uncorrelated then $\E_P(xy) = \E(x)\E(y)$.

IID (Independent and Identically Distributed) random variables

A sequence $x_t$ of r.v.s is IID if each $x_t ∼ P$ independently, so that $(x_1, \ldots, x_t, \ldots, x_T) ∼ P^T$.

Conditional expectation

The conditional expectation of a random variable $f: Ω → \Reals$, with respect to a probability measure $P$ conditioned on some event $B$, is simply \[ \E_P(f | B) = ∑_{ω ∈ Ω} f(ω) P(ω | B). \]

Statistical Decision Theory

Expected utility

Actions, outcomes and utility

In this setting, we obtain random outcomes that depend on our actions.

  • Actions $a ∈ A$
  • Outcomes $ω ∈ Ω$.
  • Probability of outcomes $P(ω \mid a)$
  • Utility $U : Ω → \Reals$

Expected utility

The expected utility of an action is: \[ \E_P[U \mid a] = ∑_{ω ∈ Ω} U(ω) P(ω \mid a). \]

The expected utility hypothesis

We prefer $a$ to $a’$ if and only if \[ \E_P[U \mid a] \geq \E_P[U \mid a’] \]
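A minimal sketch: tabulate $P(ω \mid a)$ and $U(ω)$, then pick the action with the highest expected utility (the numbers are illustrative):

```python
# P(omega | a) for two actions over three outcomes, and the utility U(omega).
P = {"a1": {"good": 0.5, "ok": 0.3, "bad": 0.2},
     "a2": {"good": 0.2, "ok": 0.7, "bad": 0.1}}
U = {"good": 10.0, "ok": 2.0, "bad": -5.0}

def expected_utility(a):
    return sum(U[w] * P[a][w] for w in U)

best = max(P, key=expected_utility)   # the preferred action: here "a1"
```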

Supervised learning

Supervised learning

Markov decision processes

Markov decision process

  • Action space $A$.
  • State space $S$.
  • Transition kernel: $Pr(s_{t+1} = j \mid s_t = s, a_t = a) = P_\mdp(j \mid s, a)$.
  • Reward $r_t = ρ(s_t, a_t)$ (can also be random).
  • Utility

\[ U_t = ∑_{k=t}^T r_k. \]

Value functions

The state value function

For any given MDP $\mdp$ and policy $\pol$ we define \[ V^\pol_{\mdp, t}(s) \defn \E^\pol_{\mdp, t} \left[ U_t \,\middle|\, s_t = s \right] \]

The state-action value function

\[ Q^\pol_{\mdp, t}(s, a) \defn \E^\pol_{\mdp, t} \left[ U_t \,\middle|\, s_t = s, a_t = a \right] \]

The optimal value functions

For an optimal policy $\pol^*$: \[ V^*_{\mdp, t}(s) \defn V^{\pol^*}_{\mdp, t}(s) \geq V^\pol_{\mdp, t}(s), \qquad Q^*_{\mdp, t}(s,a) \defn Q^{\pol^*}_{\mdp, t}(s,a) \geq Q^\pol_{\mdp, t}(s,a). \]

The Bellman equations

State value function

\begin{align*} V^\pol_{\mdp, t}(s) &\defn \E^\pol_\mdp[U_t \mid s_t = s] \\ &= \E^\pol_\mdp[r_t + U_{t+1} \mid s_t = s] \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + \E^\pol_\mdp[U_{t+1} \mid s_t = s] \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} \E^\pol_\mdp[U_{t+1} \mid s_{t+1} = j] Pr^\pol_\mdp(s_{t+1} = j \mid s_t = s) \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) Pr^\pol_\mdp(s_{t+1} = j \mid s_t = s) \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) ∑_{a ∈ A} P_\mdp(j \mid s, a) \pol(a \mid s). \end{align*}

State-action value function

\begin{align*} Q^\pol_{\mdp, t}(s, a) &= ρ(s,a) + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) P_\mdp(j \mid s, a) \end{align*}

Optimal policies

Bellman optimality condition

The value function of the optimal policy satisfies: \begin{align*} V^*_{\mdp, t}(s) &= \max_a \left[ρ(s,a) + ∑_{j ∈ S} V^*_{\mdp, t+1}(j) P_\mdp(j \mid s, a)\right] \end{align*}

Dynamic programming

To find $V^*, Q^*$, first initialise $V^*_{\mdp, T}(s) = \max_a ρ(s,a)$. Then, for $t = T-1, T-2, \ldots, 1$: \begin{align*} Q^*_{\mdp, t}(s,a) &= ρ(s,a) + ∑_{j ∈ S} V^*_{\mdp, t+1}(j) P_\mdp(j \mid s, a), \\ V^*_{\mdp, t}(s) &= \max_a Q^*_{\mdp, t}(s,a). \end{align*}

The optimal policy

The optimal policy is deterministic, with: \[ a_t = \argmax_a Q^*_{\mdp, t}(s_t, a) \]
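A minimal sketch of this backward recursion for a finite-horizon MDP, with $P$ and $ρ$ stored as arrays (the shapes and zero-based indexing are illustrative choices):

```python
import numpy as np

def backward_induction(P, rho, T):
    """Finite-horizon DP. P[s, a, j] = P(j | s, a), rho[s, a] = reward.
    Returns V[t, s] and a greedy policy pi[t, s]."""
    n_s, n_a, _ = P.shape
    V = np.zeros((T + 1, n_s))
    pi = np.zeros((T, n_s), dtype=int)
    V[T] = rho.max(axis=1)        # V*_T(s) = max_a rho(s, a)
    for t in range(T - 1, -1, -1):
        Q = rho + P @ V[t + 1]    # Q[s, a] = rho(s,a) + sum_j P(j|s,a) V_{t+1}(j)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)  # a_t = argmax_a Q*_t(s, a)
    return V, pi
```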

Multi-player Games

Introduction

Multi-agent decision making

  • Two versus $n$-player games
  • Co-operative games
  • Zero-sum games
  • General-sum games
  • Stochastic games
  • Partial information games

Prisoner’s Dilemma

Prisoner’s Choice

Two-Player zero-sum Games

Extensive-form alternating-move games

  • At time $t$:
  • Player $a$ chooses action $a_t$, which is revealed.
  • Player $b$ chooses action $b_t$.
  • Player $a$ receives $ρ(a_t, b_t)$ and $b$ receives $-ρ(a_t, b_t)$.

The utility for each player is $U = ∑_t ρ(a_t, b_t)$.

Backwards induction for ZSG

\begin{algorithmic} \FOR {$t=T, T-1, \ldots, 1$} \STATE x \ENDFOR \end{algorithmic}
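The loop body above is a placeholder. A minimal sketch of what it could compute for the alternating-move game described earlier, assuming a fixed per-round payoff matrix `rho[a, b]` (illustrative): since the payoffs are additive and there is no state, each round contributes the max-min stage value; with state, one would store one value per state, as in the MDP case.

```python
import numpy as np

def alternating_zsg_value(rho, T):
    """Backwards induction for T rounds: a moves first (max), b replies (min)."""
    V = 0.0
    for t in range(T, 0, -1):
        V = np.max(np.min(rho, axis=1)) + V   # stage max-min value plus future value
    return V

rho = np.array([[1.0, -2.0],
                [0.0,  3.0]])
print(alternating_zsg_value(rho, T=3))   # 3 * max_a min_b rho[a, b] = 0.0
```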

Normal-form simultaneous-move games

  • Player $a$ chooses action $a$ in secret.
  • Player $b$ chooses action $b$ in secret.
  • Players observe both actions
  • Player $a$ receives $U(a,b)$, and $b$ receives $-U(a,b)$.

Linear Programming

The linear programming problem

Linear programming is a constrained minimisation problem where the objective and the constraints are both linear: \begin{align*} \min_x~ & θ^\top x \\ \textrm{s.t.~} & c^\top x \geq 0. \end{align*} We can have both equality and inequality constraints in general.
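A sketch using scipy's `linprog`, which solves exactly this form (minimise $θ^\top x$ subject to linear inequalities). `linprog` expects constraints as $A_{ub} x \leq b_{ub}$, so $c^\top x \geq 0$ is passed as $-c^\top x \leq 0$; the box bounds are an illustrative addition to keep the problem bounded:

```python
import numpy as np
from scipy.optimize import linprog

theta = np.array([1.0, 2.0])   # objective: min theta^T x
c = np.array([1.0, 1.0])       # constraint: c^T x >= 0

res = linprog(c=theta,
              A_ub=-c.reshape(1, -1),    # -c^T x <= 0  <=>  c^T x >= 0
              b_ub=[0.0],
              bounds=[(-1.0, 1.0)] * 2)  # illustrative box bounds
print(res.x, res.fun)                    # optimum at (1, -1), value -1
```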

Project plan

  1. Define a deterministic, fully observable Wumpus world.
  2. Find optimal policies for any instance of this world.
  3. Consider a randomised version of the Wumpus world, where:
     3.1. your moves are not deterministic;
     3.2. the monster moves in a way relative to your position.
  4. Advanced versions; only one of them can be done in practice:
     4.1. Partially observable Wumpus, where the positions are unknown (logic-based)
     4.2. Same, but with randomised observations (probability-based)
     4.3. Adversarial Wumpus, where the monster also solves its own optimisation problem

Wumpus world project

Wumpus world

  • State: $s_t = (x_t, y_t, d_t, w_t)$, the x-y location of the agent, its direction, and the number of arrows left.
  • Actions: $a ∈ \{U, D, L, R, S\}$ for up, down, left, right, and (optionally) shoot.
  • Rewards are given for each step, for killing the Wumpus, for dying, and for finding the treasure.

Deterministic/Stochastic Wumpus

  • An action/observation is always the same/is random

Observable/Unobservable Wumpus

  • We know where the holes, the treasure and the Wumpus are/they are unknown

Static/Dynamic/Strategic Wumpus

  • The Wumpus is stationary/moves according to a fixed policy/has goals to achieve

Deterministic, Observable Wumpus

This is the simplest setting. It is a deterministic planning problem. For this, you can

  1. Define a way to describe the Wumpus world
  2. Find a policy for solving the Wumpus world as given. This policy is going to be deterministic and Markov.

Of course, the optimal policy for each instance of the Wumpus problem is going to be different.

I recommend summarising the Wumpus problem in two parts: (a) A matrix $G$ where $G[x,y]$ is a number indicating what is contained in this location, (b) $x_t, y_t, d_t, w_t$ being the agent-relevant variables.

You can either use any logical planning algorithm, or an MDP algorithm with deterministic transitions for this problem.
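A sketch of the recommended summary, with illustrative cell codes for $G$ and a tuple for the agent-relevant variables:

```python
import numpy as np

EMPTY, HOLE, WUMPUS, TREASURE = 0, 1, 2, 3   # illustrative cell codes

G = np.zeros((4, 4), dtype=int)   # G[x, y]: what each location contains
G[1, 1] = HOLE
G[2, 3] = WUMPUS
G[3, 3] = TREASURE

state = (0, 0, "R", 1)   # (x_t, y_t, d_t, w_t): position, direction, arrows left
```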

Stochastic, Observable Wumpus

To make the environment stochastic, we can add the following extensions:

(a) The Wumpus moves according to some stochastic policy. For example, the Wumpus could randomly move in a direction, so that on average it moves away from us.
(b) Our actions do not always work (e.g. we may turn in the wrong direction).
(c) We do not always die when we encounter a hole or the Wumpus.

For this, you can

  1. Define a way to describe the Wumpus world
  2. Find a policy for solving the Wumpus world as given. This policy is going to be deterministic and Markov.

Of course, the optimal policy for each instance of the Wumpus problem is going to be different.

I recommend summarising the Wumpus problem in two parts: (a) A matrix $G$ where $G[x,y]$ is a number indicating what is contained in this location, (b) $x_t, y_t, d_t, w_t$ being the agent-relevant variables.

You can use any MDP algorithm that handles stochastic transitions for this problem.

Deterministic, Unobservable Wumpus

This setting is significantly harder to work with. Now we have observations whenever we are near a hole or the Wumpus.

You can either:
(a) Use a logical description of the world, and a SAT algorithm.
(b) Use a probabilistic description with all probabilities being 0 or 1, and an MDP algorithm.

In either case, a simple idea is to summarise the knowledge of the Wumpus problem as a matrix $G$ where $G[x,y]$ indicates one of:

  • Empty.
  • Hole.
  • Wumpus.
  • Treasure.
  • Breeze Observed.
  • Stink Smelled.
  • Unknown.

For simplicity, you can always start with the setting where you know you are dealing with one of a small number of possible worlds. Then the problem can be split in two parts:

  1. Inferring the set of possible worlds
  2. Choosing the optimal policy given what you know.

Inferring the world

For each observation, you can simply eliminate all worlds incompatible with it. Then you are left with a set of possible worlds.
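A sketch of this elimination step, assuming a deterministic observation function `obs(world, location)` (both names are hypothetical):

```python
def eliminate(worlds, location, observation, obs):
    """Keep only the worlds whose predicted observation matches the actual one."""
    return [w for w in worlds if obs(w, location) == observation]
```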

Choosing a policy

You can evaluate any policy in every remaining possible world. If all of those worlds agree that some action $a$ is better than some action $a’$, then clearly $a$ is better than $a’$. It is not trivial to find the optimal policy under uncertainty, but you can try.

Static, Stochastic-Observation, Unobservable Wumpus

Here we assume the Wumpus does not move, and observations are stochastic: sometimes we feel a breeze, sometimes not. We assume we know the probability of a breeze.

The first problem is to summarise what we know about the Wumpus problem. Now we can have an entry $G[x,y]$ in the matrix which is a vector of probabilities for the possible contents of the co-ordinate: (Empty, Hole, Wumpus, Treasure)

For simplicity, you can always start with the setting where you know you are dealing with one of a small number of possible worlds. Then you only need to deal with the probability of each world being the right one.

  1. Inferring the probabilities of possible worlds
  2. Choosing the optimal policy given what you know.

Inferring the world

For each observation, you calculate the posterior probability of each world, $P_t(μ)$.
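A sketch of this posterior update, assuming a known observation likelihood `lik(observation, world, location)` (hypothetical names):

```python
def update_posterior(P, worlds, location, observation, lik):
    """One Bayes step: P_t(mu) is proportional to lik(o | mu) P_{t-1}(mu)."""
    weights = [lik(observation, mu, location) * P[i] for i, mu in enumerate(worlds)]
    total = sum(weights)
    return [w / total for w in weights]
```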

Choosing a policy

You can evaluate any policy in every remaining possible world. In fact, you can find the optimal policy for each one of them.
