Artificial Intelligence

Introduction

About the course

Aims

This course will focus on algorithms and models for Artificial Intelligence. We will concentrate mainly on the decision-making side of artificial intelligence, rather than the learning side. Learning is already addressed in statistics courses, as well as in the machine learning course in the final year.

Philosophy

The philosophy of this course is as follows:

  • We give example problems.
  • We use theory to explain and generalise from those examples to general problems.
  • We describe algorithms to solve general problems.
  • We implement algorithms to solve the specific examples.

In general, the course will start from the simplest problems and slowly progress to the more complex ones.

Plagiarism, use of automation and external sources

In this course, the use of automated methods for writing and translating text, such as generative AI models, web-based translation, and code development tools, is allowed under certain conditions.

The general idea for the use of advanced algorithms is the same as when you are using any other external resource, such as verbal advice, online forums, or material from books or websites.

In all cases, you are not allowed to present anybody’s work as your own. This includes both human- and machine-generated output. Misrepresentation will be considered misuse in this class. So, you must cite each and every use of any external tool, reference, or personal help. Here are some examples:

  1. Verbal advice: Cite as “A. Student: Personal Communication”
  2. Online forums: Stackoverflow is an example. Cite as “We based this part of the work on URL where we modified the code to … ”
  3. Books or websites: Cite normally, explaining how you used the material.
  4. Automated translation: Specify what you used and where. Note, however, that you can write your reports in French or English.
  5. Advanced coding aids: Specify what you used, including generative AI. Say exactly where you used it, so that we can filter it out.
  6. Advanced models for grammar and rewriting: Specify which model you used and for which parts of the text.
  7. Advanced models for text structuring. You are not encouraged to do that: we can discuss this in class if needed. However, if you do make use of such models, then you must explain how you used them, including a copy of the intermediate output.
  8. You must use a git repository to create your work. This way we can track changes.
  9. If you do not stick to the above advice, then automated AI-detection tools may flag your work with false positives.

How to use this notebook

  • This notebook contains a summary of all the foundational material in the course. It is not sufficient on its own for studying for the exam, or as reference material.
  • For details, use the links to external resources.
  • There are some more detailed slides available for each lecture.

Books and schedule

  • Artificial Intelligence: Foundations of Computational Agents, 3rd Edition (AI:FoCA)
  • Artificial Intelligence: A Modern Approach, 4th Edition (AI:aMA)

| Module | Topics | AI:FoCA | AI:aMA |
|---|---|---|---|
| 1 | Preferences; Utility; States; Actions; Beliefs; Fairness | 1. AI and Agents; 1.2. Complexity; 1.3. Application domains; 1.4. Knowledge representation; 2. Architecture; 2.1. Control; 2.2. Hierarchical control; 2.3. Moral machines | 2.1. Agents and environments; 2.2. Rationality; 2.3. Environments; 2.4. Agents; 2.4.1. Programs; 2.4.2–3. Reflex agents; 2.4.4. Goals; 2.4.5. Utility; 2.4.6. Learning |
| 2 | Depth-First Search; Breadth-First Search | 3. Search; 3.1. Search in graphs | 3.1. Problem-solving; 3.2. Examples; 3.3. Best-First search; 3.4.1. Breadth-first; 3.4.3. Depth-first search |
| 3 | Heuristic Search; A* Search | 3.2. Uninformed search; 3.3. Heuristic search | 3.5.2. A*; 3.6. Heuristic Functions |
| 4 | Dynamic Programming; Branch and bound | 3.4. Dynamic programming; 3.5. Branch and bound | 3.4.2. Dijkstra |
| 5 | Constraint programming; Logical reasoning; Deterministic planning | 4. Reasoning with constraints; 4.1. Variables and Constraints; 4.2. CSPs and Search; 4.6. Local Search; 4.8. Optimization | 6. CSP; 7. Logical Agents |
| 6 | Uncertainty; Aleatory/Epistemic; Probability Theory; Bayes Theorem; Probabilistic inference | 9. Reasoning with Uncertainty; 9.1. Probability; 9.2. Independence | 12.1. Acting under uncertainty; 12.2. Basic probability notation; 12.3. Inference; 12.4. Independence; 12.5. Bayes’s theorem |
| 7 | Expected Utility Theory | 12.1. Preferences and Utility; 12.2. One-off decisions | 16.1. Beliefs and Desires; 16.2. Utility theory; 16.3. Utility functions; 16.5. Decision networks |
| 8 | Markov Decision Processes; Dynamic Programming | 12.3. Sequential Decisions; 12.4. The value of information; 12.5. Decision processes | 17.1. Sequential decision problems; 17.2. Algorithms for MDPs |
| 9 | Alternating Zero-Sum Games; Stochastic Zero-Sum Games; Linear programming | 14.1. Multi-agent framework; 14.2. Representations of games; 14.3. Perfect information games | 5.1. Game Theory; 5.2. Zero-Sum Games; 5.3. Alpha-Beta Search; 5.5. Stochastic Games |
| 10 | Belief networks | 9.3. Belief Networks; 9.4. Probabilistic Inference | 13.1. Representing knowledge; 13.2. Bayesian Networks; 13.3. Exact inference in BNs |

Notation

  • $\Reals, \Reals^d$: the real line and $d$-dimensional Euclidean space
  • $\Simplex^d$: the $d$-dimensional simplex
  • $\Simplex(A)$: the set of distributions over $A$.
  • $\ind{x}$: indicator function (1 if $x$ is true, 0 otherwise)
  • $Pr$: probability
  • $\E$: expectation
  • $\pol ∈ \Pols$: policies, or algorithms.
  • $\mdp ∈ \MDPs$: models
  • $\param ∈ \Param$: parameters (i.e. models parameterised by vectors in $\Reals^d$)
  • $u$: utility
  • $c$: cost / constraints
  • $s ∈ S$: state
  • $a ∈ A$: action
  • $r ∈ \Reals$: reward

Project

Application project.

Application project proposals need to contain the following:

  • Domain description and goals: What is the problem, in general terms, and which aspect would you try and solve in an AI framework? Make sure to cite relevant literature and course material.
  • Methodology: How would you formalise the problem mathematically? Which algorithms and/or models do you intend to apply at different stages of the project? Feel free to read widely about both the problem and algorithms and do cite relevant literature. Make sure to employ techniques taught in the course, e.g. logic + search or probabilities and MDPs etc.
  • Experiment design: How would you know that the method is working? How would you compare with existing solutions? In what context would you expect an improvement? How would you measure it? How will you test the robustness of your solution over variations in the problem instance?
  • Expected results: What results do you expect to obtain, and what do you think might go wrong? In what way do you expect an improvement?

Algorithmic project.

  • Algorithmic/theory problem and goals: What is the deficiency, in general terms, of current theory and algorithms that your method would try to improve? As an example, the goal could be reducing computational complexity, increasing data efficiency, improving robustness or applicability of a specific family of algorithms; or introducing a slightly different setting to existing ones. In other words, which is the open problem you will be addressing? Make sure to cite relevant literature to better identify the problem.
  • Methodology: What kind of existing algorithms, theory or technical result would you rely on? Would you be combining various existing results? What would be the most significant novelty of your methodology? Do cite relevant literature.
  • Experiment design (if applicable): How would you know that the method is working? How would you compare with existing solutions? In what context would you expect an improvement? How would you measure it?
  • Expected results: What results do you expect to obtain, and what do you think might go wrong? In what way do you expect an improvement?

Grading for projects:

Grades will be adjusted based on group size, with one letter grade up/down for double/half the mean group size.

  • Environments: A. Complex, well-described environment that captures all of the elements of the application or algorithmic problem. B. The environment is simple or lacks description. C. An adequate environment that captures the basic setting. D. Insufficient environment or description. E. Insufficient environment and description.
  • Algorithms: A. Significantly novel algorithms that are well described. B. Some novelty in the algorithms, with good descriptions. C. Some novelty in the algorithms, but descriptions are lacking. D. Insufficient novelty or descriptions. E. Insufficient novelty and descriptions.
  • Experiments: A. Thorough experiments with ablation tests and comparisons over algorithms and environments, that are well-described. B. Somewhat incomplete experiments or descriptions. C. Sufficient experiments and descriptions. D. Insufficient experiments or descriptions. E. Insufficient experiments and descriptions.

Criteria for full marks in each part of the project are the following.

  1. Documenting of the work in a way that enables reproduction.
  2. Technical correctness of their analysis.
  3. Demonstrating that they have understood the assumptions underlying their analysis.
  4. Addressing issues of reproducibility in research.
  5. Addressing scientific and ethical questions where applicable, and if not, clearly explain why they are not.
  6. Consulting additional resources beyond the source material with proper citations.

The following marking guidelines describe what one would expect from students attaining each grade.

Detailed grading

A (6)

  1. Submission of a detailed report from which one can definitely reconstruct their work without referring to their code. There should be no ambiguities in the described methodology. Well-documented code where design decisions are explained.
  2. Extensive analysis and discussion. Technical correctness of their analysis. Nearly error-free implementation.
  3. The report should detail what models are used and what the assumptions behind them are. The conclusions of the report should include appropriate caveats. When the problem includes simple decision making, the optimality metric should be well-defined and justified. Similarly, well-defined optimality criteria should be given for the experiment design, when necessary. The design should be (to some degree of approximation, depending on problem complexity) optimal according to these criteria.
  4. Appropriate methods to measure reproducibility. Use of cross-validation or hold-out sets to measure performance. Use of an unbiased methodology for algorithm, model or parameter selection. Appropriate reporting of a confidence level (e.g. using bootstrapping) in their analytical results. Relevant assumptions are mentioned when required.
  5. A clear definition of a scientific question. When dealing with data relating to humans, ethical concerns, such as privacy and/or fairness should be addressed.
  6. The report contains some independent thinking, or includes additional resources beyond the source material with proper citations. The students go out of their way to research material and implement methods not discussed in the course.

B (5.5)

  1. Submission of a report from which one can plausibly reconstruct their work without referring to their code. There should be no major ambiguities in the described methodology.
  2. Technical correctness of their analysis, with a good discussion. Possibly minor errors in the implementation.
  3. The report should detail what models are used, as well as the optimality criteria, including for the experiment design. The conclusions of the report must contain appropriate caveats.
  4. Use of cross-validation or hold-out sets to measure performance. Use of an unbiased methodology for algorithm, model or parameter selection.
  5. When dealing with data relating to humans, ethical concerns such as privacy and/or fairness should be addressed. While an analysis of this issue may not be performed, there is a substantial discussion of the issue that clearly shows understanding by the student.
  6. The report contains some independent thinking, or the students mention other methods beyond the source material, with proper citations, but do not further investigate them.

C (5)

  1. Submission of a report from which one can partially reconstruct most of their work without referring to their code. There might be some ambiguities in parts of the described methodology.
  2. Technical correctness of their analysis, with an adequate discussion. Some errors in a part of the implementation.
  3. The report should detail what models are used, as well as the optimality criteria and the choice of experiment design. Analysis caveats are not included.
  4. Either use of cross-validation or hold-out sets to measure performance, or use of an unbiased methodology for algorithm, model or parameter selection - but in a possibly inconsistent manner.
  5. When dealing with data relating to humans, ethical issues are addressed superficially.
  6. There is little mention of methods beyond the source material or independent thinking.

D (4.5)

  1. Submission of a report from which one can partially reconstruct most of their work without referring to their code. There might be serious ambiguities in parts of the described methodology.
  2. Technical correctness of their analysis with limited discussion. Possibly major errors in a part of the implementation.
  3. The report should detail what models are used, as well as the optimality criteria. Analysis caveats are not included.
  4. Either use of cross-validation or hold-out sets to measure performance, or use of an unbiased methodology for algorithm, model or parameter selection - but in a possibly inconsistent manner.
  5. When dealing with data relating to humans, ethical issues are addressed superficially or not at all.
  6. There is little mention of methods beyond the source material or independent thinking.

E (4)

  1. Submission of a report from which one can obtain a high-level idea of their work without referring to their code. There might be serious ambiguities in all of the described methodology.
  2. Technical correctness of their analysis with very little discussion. Possibly major errors in only a part of the implementation.
  3. The report might mention what models are used or the optimality criteria, but not in sufficient detail and caveats are not mentioned.
  4. Use of cross-validation or hold-out sets to simultaneously measure performance and optimise hyperparameters, but possibly in a way that introduces some bias.
  5. When dealing with data relating to humans, ethical issues are not discussed.
  6. There is no mention of methods beyond the source material or independent thinking.

F (<=3.5)

  1. The report does not adequately explain their work.
  2. There is very little discussion and major parts of the analysis are technically incorrect, or there are errors in the implementation.
  3. The models used might be mentioned, but not any other details.
  4. There is no effort to ensure reproducibility or robustness.
  5. When applicable: Ethical issues are not mentioned.
  6. There is no mention of methods beyond the source material or independent thinking.

Examination and grading

You must obtain a sufficient grade in both the exam and the project to pass. If you get an F in either the exam or the project, you fail the course and must repeat it.

Modules

Module 1 (Ch. 1–2): Introduction
  • Agents and Environments
  • What is intelligence?
  • Beliefs, Utility, States, Actions, Observations
  • Assignment: Given a problem, identify belief/utility/state/actions. What is the environment/agent boundary?

Module 2 (Ch. 3.1–3.5): Search, State Spaces, Graphs, Uninformed Search
  • Start and goal states: Breadth-First, Depth-First search
  • Assignment:
    1. Implement a different type of algorithm for doing the search. How would you order the nodes in your search?
    2. Formalise X as a start-goal search problem. What are the states? What are the goals? What are the actions? Implement it.

Module 3 (Ch. 3.6): Informed Search, Heuristics, A*
  • Cost/utility-based search and shortest path problems
  • Minimum-cost search, A*
  • Assignment:
    1. Implement your own heuristics
    2. Generalise problem X to the case where you have costs

Module 4 (Ch. 4.1–4.2, 4.8): Logic and Constraints
  • Logical framework
  • Logic and constraint satisfaction
  • Assignment:
    1. Simple gridworld implementation
    2. Basic logical inference

Module 5 (Ch. 9.1): Probability, Independence, Belief Networks
  • Conditional probability and independence, Bayes’s theorem
  • Assignment:
    1. Give a graphical model for some problem
    2. A simple Bayes’ theorem demonstration, e.g. covid tests

Module 6: Independence, Conditional Independence
  • Graphical models
  • Assignment: Graphical model of their project problem

Module 7 (Ch. 12.1 Preferences and Utility, 12.2 Probability): Decision making
  • Decision making under uncertainty
  • One-shot decisions
  • Stochastic gradient optimisation
  • Assignment: Find the optimal choice for some stochastic problems with:
    1. Discrete action sets
    2. Continuous actions

Module 8: Markov decision processes
  • Value iteration
  • Assignment: Extend the world so that there are randomised dynamics (but full observation) with fixed policies for other agents

Module 9: Markov games
  • Backwards induction
  • Assignment: Extend the world so that the other agent is adversarial

Module 10: Simultaneous games
  • Linear programming
  • Assignment: Find Nash equilibria

Example worlds:

  1. Train scheduling
  2. Hiking
  3. Wumpus
  4. Driving/racing

Single-agent problems with certainty

Uninformed search

  • Graph definitions (10)
  • Tree example (5)
  • Shortcut example (5)
  • Depth-first search (10)
  • The shortest path problem (10)
  • Goals and DFS (10)
  • Shortest-path DFS (10)
  • Breadth-first search (10)
  • Iterative deepening (10)
  • Uniform cost search (10)

Total: 95

Informed search

Heuristics

$A^*$-search

Dynamic programming

Branch and Bound

Constraints

  • Local search (20)
  • Constraint Satisfaction (20)
  • Graph Colouring (20)
  • Meeting Scheduling (20)
  • Constraint Optimisation (20)
  • Travelling Salesman (20)
  • Maximum Flow (20)
  • Logical constraints (20)
  • Towers of Hanoi (20)

Total: 160

Infinite choices

Lipschitz search

If we know the function $f$ is Lipschitz, i.e. \[ ∃ L > 0 : |f(x) - f(y)| \leq L |x - y|, \] then we also know that for any point $z$: \[ f(z) \leq f(x) + L |x - z|, \qquad f(z) \leq f(y) + L |y - z|. \]

Schubert’s Algorithm (Schubert, 1972)

\begin{algorithmic} \STATE \textbf{Input:} $L > 0$, $X$, $x_0 ∈ X$. \FOR {$t=1, \ldots, T$} \STATE $x_t = \argmax_{x ∈ X} \min \cset{f(x_k) + L|x_k - x|}{k=0, \ldots, t-1}$ \ENDFOR \end{algorithmic}
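A minimal Python sketch of this procedure, assuming a one-dimensional domain discretised into a finite grid of candidate points (the function, the grid, and the budget `T` below are illustrative choices):

```python
import numpy as np

def schubert_maximise(f, L, candidates, T=20):
    """Piyavskii-Schubert search: maximise an L-Lipschitz f over a finite grid."""
    candidates = np.asarray(candidates, dtype=float)
    xs = [candidates[0]]              # x_0: arbitrary first evaluation point
    fs = [f(xs[0])]
    for t in range(1, T):
        # Tightest upper bound at each candidate: min_k f(x_k) + L |x_k - x|
        upper = np.min([fk + L * np.abs(candidates - xk)
                        for xk, fk in zip(xs, fs)], axis=0)
        xs.append(candidates[int(np.argmax(upper))])   # most promising point
        fs.append(f(xs[-1]))
    best = int(np.argmax(fs))
    return xs[best], fs[best]

x_star, f_star = schubert_maximise(lambda x: -(x - 0.3) ** 2, L=2.0,
                                   candidates=np.linspace(0, 1, 101))
```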

Discussion

  • This is guaranteed to converge to the optimal solution.
  • If $L$ is unknown, DIRECT (Jones et al. 1993) can be used.
  • If $f$ is noisy, the problem becomes a continuum bandit problem.

First-order gradient methods

  • Gradient descent
  • Stochastic gradient descent

Properties

  • Incremental algorithms
  • Can converge to a local optimum

Single-variable gradient descent

Setting

  • Input: $f : \Reals → \Reals$
  • Problem: $\max_x f(x)$
  • Derivative: $\frac{d}{dx} f(x) \defn \lim_{Δ → 0} \frac{f(x + Δ) - f(x)}{Δ}$.

Algorithm

  1. Input: $x_0$, $f$
  2. For $t = 1, \ldots$:
  3. Calculate direction $g_t = \frac{d}{dx} f(x_{t-1})$
  4. Select step-size $α_t$
  5. Update $x_t = x_{t-1} + α_t g_t$.
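A minimal sketch of this loop, using a finite-difference approximation of the derivative (the step size, horizon, and test function are illustrative choices):

```python
def gradient_ascent_1d(f, x0, alpha=0.1, T=100, delta=1e-6):
    """Maximise f : R -> R by following a finite-difference derivative estimate."""
    x = x0
    for t in range(1, T + 1):
        g = (f(x + delta) - f(x)) / delta   # g_t ~ d/dx f(x_{t-1})
        x = x + alpha * g                   # ascent step, since we solve max_x f(x)
    return x

x_star = gradient_ascent_1d(lambda x: -(x - 2.0) ** 2, x0=0.0)   # converges to ~2.0
```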

Multiple-variable gradient descent

Setting

  • Input: $f : \Reals^d → \Reals$, $x = (x_1, \ldots, x_d)$
  • Problem: $\max_x f(x)$
  • Partial Derivative: $\frac{∂}{∂ x_i} f(x) \defn \lim_{Δ → 0} \frac{f(x_1, \ldots, x_i + Δ, \ldots, x_d) - f(x)}{Δ}$.
  • Gradient $∇_x f(x) = \left[\frac{∂}{∂ x_1} f(x), \ldots, \frac{∂}{∂ x_i} f(x), \ldots, \frac{∂}{∂ x_d} f(x)\right]^\top$.

Algorithm

  1. Input: $x_0$, f
  2. For $t = 1, \ldots$:
  3. Calculate direction $g_t = ∇_x f(x_{t-1})$
  4. Select step-size $α_t$
  5. Update $x_t = x_{t-1} + α_t g_t$.
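The same scheme in $\Reals^d$, sketched here with an analytic gradient supplied by the caller (the quadratic test function is illustrative):

```python
import numpy as np

def gradient_ascent(grad_f, x0, alpha=0.1, T=100):
    """Maximise f : R^d -> R, given its gradient grad_f : R^d -> R^d."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        x = x + alpha * grad_f(x)   # x_t = x_{t-1} + alpha_t g_t
    return x

# f(x) = -||x - c||^2 has gradient -2 (x - c) and its maximum at c.
c = np.array([1.0, -1.0])
x_star = gradient_ascent(lambda x: -2 * (x - c), x0=np.zeros(2))
```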

Stochastic gradient descent

As gradient descent with errors

  • Calculate direction $g_t = ∇_x f(x_{t-1}) + ε_t$
  • $ε_t$ is typically zero-mean noise.

In learning from data

The gradient can be broken up into a sum of gradients: \[ f(x) = ∑_t v(x, z_t), \qquad ∇_x f(x) = ∑_t ∇_x v(x, z_t), \] giving the incremental update $x_t = x_{t-1} + α_t ∇_x v(x_{t-1}, z_t)$.
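A sketch of this update on a toy least-squares problem, where $v(x, z_t)$ is the negated squared error on one data point (the data, the step-size schedule, and the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: z_t = (phi, y) with linear ground truth y = phi . [2, -1].
Z = [(phi, phi @ np.array([2.0, -1.0])) for phi in rng.normal(size=(100, 2))]

def grad_v(x, z):
    """Gradient in x of v(x, z) = -(x . phi - y)^2 for one point z = (phi, y)."""
    phi, y = z
    return -2.0 * (x @ phi - y) * phi

x = np.zeros(2)
for t, z in enumerate(Z, start=1):
    x = x + (0.1 / np.sqrt(t)) * grad_v(x, z)   # x_t = x_{t-1} + alpha_t grad v
# x is now close to the ground truth [2, -1].
```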

In Bayesian quadrature

The function is an expectation: \[ f(x) = ∫_Z v(x, z) p(z) dz. \qquad ∇_x f(x) ≈ ∑_t ∇_x v(x, z_t), \] where $z_t ∼ p(z)$ are samples from $p$.
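A corresponding Monte Carlo gradient sketch, assuming we can sample $z ∼ p$ and differentiate $v$ in $x$; the sampled gradients are averaged rather than summed, so that the estimate is unbiased:

```python
import numpy as np

def mc_gradient(grad_v, sample_z, x, n=100):
    """Estimate grad f(x) for f(x) = E_{z ~ p}[v(x, z)] from n samples z ~ p."""
    return np.mean([grad_v(x, sample_z()) for _ in range(n)], axis=0)
```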

Single-agent problems with uncertainty

Probability

Probability fundamentals

Probability measure $P$

  • Defined on a universe $Ω$
  • $P : Σ → [0,1]$ is a function of subsets of $Ω$.
  • A subset $A ⊂ Ω$ is an event and $P$ measures its likelihood.

Axioms of probability

  • $P(Ω) = 1$
  • For $A, B ⊂ Ω$, if $A ∩ B = ∅$ then $P(A ∪ B) = P(A) + P(B)$.

Marginalisation

If $A_1, \ldots, A_n ⊂ Ω$ are a partition of $Ω$, then \[ P(B) = ∑_{i=1}^n P(B ∩ A_i). \]

Conditional probability and independence

Conditional probability

Conditional probability

The conditional probability of an event $A$ given an event $B$ is defined as \[ P(A | B) \defn \frac{P(A ∩ B)}{P(B)} \] The above definition requires $P(B)$ to exist and be positive.

Conditional probabilities as a collection of probabilities

More generally, we can define conditional probabilities as simply a collection of probability distributions: \[ \cset{P_\param(A)}{\param ∈ \Param}, \] where $\Param$ is an arbitrary set.

The theorem of Bayes

Bayes’s theorem

\[ P(A | B) = \frac{P(B | A) P(A)}{P(B)} \]

The general case

If $A_1, \ldots, A_n$ are a partition of $Ω$, meaning that they are mutually exclusive events (i.e. $A_i ∩ A_j = ∅$ for $i ≠ j$) such that one of them must be true (i.e. $\bigcup_{i=1}^n A_i = Ω$), then \[ P(B) = ∑_{i=1}^n P(B | A_i) P(A_i) \] and \[ P(A_j | B) = \frac{P(B | A_j) P(A_j)}{∑_{i=1}^n P(B | A_i) P(A_i)}. \]
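A small numeric illustration of the general case, in the spirit of the covid-test demonstration used later in the course (all numbers are made up):

```python
# Worlds: A_1 = infected, A_2 = healthy. Event B = the test is positive.
prior = {"infected": 0.01, "healthy": 0.99}         # P(A_i)
likelihood = {"infected": 0.95, "healthy": 0.05}    # P(B | A_i)

p_B = sum(likelihood[a] * prior[a] for a in prior)  # marginalisation over worlds
posterior = {a: likelihood[a] * prior[a] / p_B for a in prior}
print(posterior["infected"])   # ~0.16: a positive test on a rare condition
```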

Independence

Independent events

$A, B$ are independent iff $P(A ∩ B) = P(A) P(B)$.

Conditional independence

$A, B$ are conditionally independent given $C$ iff $P(A ∩ B | C) = P(A | C) P(B | C)$.

Random variables and expectation

Random variables

A random variable $f : Ω → \Reals$ is a real-valued function measurable with respect to the underlying probability measure $P$, and we write $f ∼ P$.

The distribution of $f$

The probability that $f$ lies in some subset $A ⊂ \Reals$ is \[ P_f(A) \defn P(\{ω ∈ Ω : f(ω) ∈ A\}). \]

Independence

Two RVs $f,g$ are independent in the same way that events are independent: \[ P(f ∈ A ∧ g ∈ B) = P(f ∈ A) P(g ∈ B) = P_f(A) P_g(B). \] In that sense, $f ∼ P_f$ and $g ∼ P_g$.

Expectation

For any real-valued random variable $f: Ω → \Reals$, the expectation with respect to a probability measure $P$ is \[ \E_P(f) = ∑_{ω ∈ Ω} f(ω) P(ω). \]

Linearity of expectations

For any RVs $x, y$: \[ \E_P(x + y) = \E_P(x) + \E_P(y) \]

Independence

If $x,y$ are independent RVs then $\E_P(xy) = \E(x)\E(y)$.

Correlation

If $x,y$ are uncorrelated then $\E_P(xy) = \E(x)\E(y)$.

IID (Independent and Identically Distributed) random variables

A sequence $x_t$ of r.v.s is IID if each $x_t ∼ P$ independently, so that $(x_1, \ldots, x_t, \ldots, x_T) ∼ P^T$.

Conditional expectation

The conditional expectation of a random variable $f: Ω → \Reals$, with respect to a probability measure $P$ conditioned on some event $B$, is simply \[ \E_P(f | B) = ∑_{ω ∈ Ω} f(ω) P(ω | B). \]

Statistical Decision Theory

Expected utility

Actions, outcomes and utility

In this setting, we obtain random outcomes that depend on our actions.

  • Actions $a ∈ A$
  • Outcomes $ω ∈ Ω$.
  • Probability of outcomes $P(ω \mid a)$
  • Utility $U : Ω → \Reals$

Expected utility

The expected utility of an action is: \[ \E_P[U \mid a] = ∑_{ω ∈ Ω} U(ω) P(ω \mid a). \]

The expected utility hypothesis

We prefer $a$ to $a’$ if and only if \[ \E_P[U \mid a] \geq \E_P[U \mid a’] \]
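A minimal sketch: tabulate $P(ω \mid a)$ and $U(ω)$, then pick the action with the highest expected utility (the numbers are illustrative):

```python
# P(omega | a) for two actions over three outcomes, and the utility U(omega).
P = {"a1": {"good": 0.5, "ok": 0.3, "bad": 0.2},
     "a2": {"good": 0.2, "ok": 0.7, "bad": 0.1}}
U = {"good": 10.0, "ok": 2.0, "bad": -5.0}

def expected_utility(a):
    return sum(U[w] * P[a][w] for w in U)

best = max(P, key=expected_utility)   # the preferred action: here "a1"
```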

Supervised learning

Supervised learning

Markov decision processes

Markov decision process

  • Action space $A$.
  • State space $S$.
  • Transition kernel: $Pr(s_{t+1} = j \mid s_t = s, a_t = a) = P_\mdp(j \mid s, a)$.
  • Reward $r_t = ρ(s_t, a_t)$ (can also be random).
  • Utility

\[ U_t = ∑_{k=t}^T r_k. \]

Value functions

The state value function

For any given MDP $\mdp$ and policy $\pol$ we define \[ V^\pol_{\mdp, t}(s) \defn \E^\pol_{\mdp, t} \left[ U_t \,\middle|\, s_t = s \right] \]

The state-action value function

\[ Q^\pol_{\mdp, t}(s, a) \defn \E^\pol_{\mdp, t} \left[ U_t \,\middle|\, s_t = s, a_t = a \right] \]

The optimal value functions

For an optimal policy $\pol^*$: \[ V^*_{\mdp, t}(s) \defn V^{\pol^*}_{\mdp, t}(s) \geq V^\pol_{\mdp, t}(s), \qquad Q^*_{\mdp, t}(s,a) \defn Q^{\pol^*}_{\mdp, t}(s,a) \geq Q^\pol_{\mdp, t}(s,a). \]

The Bellman equations

State value function

\begin{align*} V^\pol_{\mdp, t}(s) &\defn \E^\pol_\mdp[U_t \mid s_t = s] \\ &= \E^\pol_\mdp[r_t + U_{t+1} \mid s_t = s] \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + \E^\pol_\mdp[U_{t+1} \mid s_t = s] \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} \E^\pol_\mdp[U_{t+1} \mid s_{t+1} = j] Pr^\pol_\mdp(s_{t+1} = j \mid s_t = s) \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) Pr^\pol_\mdp(s_{t+1} = j \mid s_t = s) \\ &= \E^\pol_\mdp[r_t \mid s_t = s] + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) ∑_{a ∈ A} P_\mdp(j \mid s, a) \pol(a \mid s). \end{align*}

State-action value function

\begin{align*} Q^\pol_{\mdp, t}(s, a) &= ρ(s,a) + ∑_{j ∈ S} V^\pol_{\mdp, t+1}(j) P_\mdp(j \mid s, a) \end{align*}

Optimal policies

Bellman optimality condition

The value function of the optimal policy satisfies: \begin{align*} V^*_{\mdp, t}(s) &= \max_a \left[ρ(s,a) + ∑_{j ∈ S} V^*_{\mdp, t+1}(j) P_\mdp(j \mid s, a)\right] \end{align*}

Dynamic programming

To find $V^*, Q^*$, first initialise $V^*_{\mdp, T}(s) = \max_a ρ(s,a)$. Then, for $t = T-1, T-2, \ldots, 1$: \begin{align*} Q^*_{\mdp, t}(s,a) &= ρ(s,a) + ∑_{j ∈ S} V^*_{\mdp, t+1}(j) P_\mdp(j \mid s, a), \\ V^*_{\mdp, t}(s) &= \max_a Q^*_{\mdp, t}(s,a). \end{align*}

The optimal policy

The optimal policy is deterministic, with: \[ a_t = \argmax_a Q^*_{\mdp, t}(s_t, a) \]
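A minimal sketch of this backward recursion for a finite-horizon MDP, with $P$ and $ρ$ stored as arrays (the shapes and zero-based indexing are illustrative choices):

```python
import numpy as np

def backward_induction(P, rho, T):
    """Finite-horizon DP. P[s, a, j] = P(j | s, a), rho[s, a] = reward.
    Returns V[t, s] and a greedy policy pi[t, s]."""
    n_s, n_a, _ = P.shape
    V = np.zeros((T + 1, n_s))
    pi = np.zeros((T, n_s), dtype=int)
    V[T] = rho.max(axis=1)        # V*_T(s) = max_a rho(s, a)
    for t in range(T - 1, -1, -1):
        Q = rho + P @ V[t + 1]    # Q[s, a] = rho(s,a) + sum_j P(j|s,a) V_{t+1}(j)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)  # a_t = argmax_a Q*_t(s, a)
    return V, pi
```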

Multi-player Games

Introduction

Multi-agent decision making

  • Two versus $n$-player games
  • Co-operative games
  • Zero-sum games
  • General-sum games
  • Stochastic games
  • Partial information games

Prisoner’s Dilemma

Prisoner’s Choice

Two-Player zero-sum Games

Extensive-form alternating-move games

  • At time $t$:
  • Player $a$ chooses action $a_t$, which is revealed.
  • Player $b$ chooses action $b_t$.
  • Player $a$ receives $ρ(a_t, b_t)$ and $b$ receives $-ρ(a_t, b_t)$.

The utility for each player is $U = ∑_t ρ(a_t, b_t)$.

Backwards induction for ZSG

\begin{algorithmic} \FOR {$t=T, T-1, \ldots, 1$} \STATE x \ENDFOR \end{algorithmic}
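The loop body above is a placeholder. A minimal sketch of what it could compute for the alternating-move game described earlier, assuming a fixed per-round payoff matrix `rho[a, b]` (illustrative): since the payoffs are additive and there is no state, each round contributes the max-min stage value; with state, one would store one value per state, as in the MDP case.

```python
import numpy as np

def alternating_zsg_value(rho, T):
    """Backwards induction for T rounds: a moves first (max), b replies (min)."""
    V = 0.0
    for t in range(T, 0, -1):
        V = np.max(np.min(rho, axis=1)) + V   # stage max-min value plus future value
    return V

rho = np.array([[1.0, -2.0],
                [0.0,  3.0]])
print(alternating_zsg_value(rho, T=3))   # 3 * max_a min_b rho[a, b] = 0.0
```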

Normal-form simultaneous-move games

  • Player $a$ chooses action $a$ in secret.
  • Player $b$ chooses action $b$ in secret.
  • Players observe both actions
  • Player $a$ receives $U(a,b)$, and $b$ receives $-U(a,b)$.

Linear Programming

The linear programming problem

Linear programming is a constrained minimisation problem where the objective and the constraints are both linear: \begin{align*} \min_x~ & θ^\top x \\ \textrm{s.t.~} & c^\top x \geq 0. \end{align*} We can have both equality and inequality constraints in general.
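A sketch using scipy's `linprog`, which solves exactly this form (minimise $θ^\top x$ subject to linear inequalities). `linprog` expects constraints as $A_{ub} x \leq b_{ub}$, so $c^\top x \geq 0$ is passed as $-c^\top x \leq 0$; the box bounds are an illustrative addition to keep the problem bounded:

```python
import numpy as np
from scipy.optimize import linprog

theta = np.array([1.0, 2.0])   # objective: min theta^T x
c = np.array([1.0, 1.0])       # constraint: c^T x >= 0

res = linprog(c=theta,
              A_ub=-c.reshape(1, -1),    # -c^T x <= 0  <=>  c^T x >= 0
              b_ub=[0.0],
              bounds=[(-1.0, 1.0)] * 2)  # illustrative box bounds
print(res.x, res.fun)                    # optimum at (1, -1), value -1
```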

Project plan

  1. Define a deterministic, fully observable Wumpus world.
  2. Find optimal policies for any instance of this world.
  3. Consider a randomised version of the Wumpus world, where:
     3.1. your moves are not deterministic;
     3.2. the monster moves in a way relative to your position.
  4. Advanced versions; only one of them can be done in practice:
     4.1. Partially observable Wumpus, where the positions are unknown (logic-based)
     4.2. Same, but with randomised observations (probability-based)
     4.3. Adversarial Wumpus, where the monster also solves its own optimisation problem

Wumpus world project

Wumpus world

  • State: $s_t = (x_t, y_t, d_t, w_t)$, the x-y location of the agent, its direction, and the number of arrows left.
  • Actions: $a ∈ \{U, D, L, R, S\}$ for up, down, left, right, and (optionally) shoot.
  • Rewards are given for each step, for killing the Wumpus, for dying, and for finding the treasure.

Deterministic/Stochastic Wumpus

  • An action/observation is always the same/is random

Observable/Unobservable Wumpus

  • We know where the holes, the treasure and the Wumpus are/they are unknown

Static/Dynamic/Strategic Wumpus

  • The Wumpus is stationary/moves according to a fixed policy/has goals to achieve

Deterministic, Observable Wumpus

This is the simplest setting. It is a deterministic planning problem. For this, you can

  1. Define a way to describe the Wumpus world
  2. Find a policy for solving the Wumpus world as given. This policy is going to be deterministic and Markov.

Of course, the optimal policy for each instance of the Wumpus problem is going to be different.

I recommend summarising the Wumpus problem in two parts: (a) A matrix $G$ where $G[x,y]$ is a number indicating what is contained in this location, (b) $x_t, y_t, d_t, w_t$ being the agent-relevant variables.

You can either use any logical planning algorithm, or an MDP algorithm with deterministic transitions for this problem.
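A sketch of the recommended summary, with illustrative cell codes for $G$ and a tuple for the agent-relevant variables:

```python
import numpy as np

EMPTY, HOLE, WUMPUS, TREASURE = 0, 1, 2, 3   # illustrative cell codes

G = np.zeros((4, 4), dtype=int)   # G[x, y]: what each location contains
G[1, 1] = HOLE
G[2, 3] = WUMPUS
G[3, 3] = TREASURE

state = (0, 0, "R", 1)   # (x_t, y_t, d_t, w_t): position, direction, arrows left
```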

Stochastic, Observable Wumpus

To make the environment stochastic, we can add the following extensions:

(a) The Wumpus moves according to some stochastic policy. For example, the Wumpus could randomly move in a direction, so that on average it moves away from us.
(b) Our actions do not always work (e.g. we may turn in the wrong direction).
(c) We do not always die when we encounter a hole or the Wumpus.

For this, you can

  1. Define a way to describe the Wumpus world
  2. Find a policy for solving the Wumpus world as given. This policy is going to be deterministic and Markov.

Of course, the optimal policy for each instance of the Wumpus problem is going to be different.

I recommend summarising the Wumpus problem in two parts: (a) A matrix $G$ where $G[x,y]$ is a number indicating what is contained in this location, (b) $x_t, y_t, d_t, w_t$ being the agent-relevant variables.

You can use any MDP algorithm that handles stochastic transitions for this problem.

Deterministic, Unobservable Wumpus

This setting is significantly harder to work with. Now we have observations whenever we are near a hole or the Wumpus.

You can either:
(a) Use a logical description of the world, and a SAT algorithm.
(b) Use a probabilistic description with all probabilities being 0 or 1, and an MDP algorithm.

In either case, a simple idea is to summarise the knowledge of the Wumpus problem as a matrix $G$ where $G[x,y]$ indicates one of:

  • Empty.
  • Hole.
  • Wumpus.
  • Treasure.
  • Breeze Observed.
  • Stink Smelled.
  • Unknown.

For simplicity, you can always start with the setting where you know you are dealing with one of a small number of possible worlds. Then the problem can be split in two parts:

  1. Inferring the set of possible worlds
  2. Choosing the optimal policy given what you know.

Inferring the world

For each observation, you can simply eliminate all worlds incompatible with it. Then you are left with a set of possible worlds.
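A sketch of this elimination step, assuming a deterministic observation function `obs(world, location)` (both names are hypothetical):

```python
def eliminate(worlds, location, observation, obs):
    """Keep only the worlds whose predicted observation matches the actual one."""
    return [w for w in worlds if obs(w, location) == observation]
```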

Choosing a policy

You can evaluate any policy in every remaining possible world. If all of those worlds agree that some action $a$ is better than some action $a’$, then clearly $a$ is better than $a’$. It is not trivial to find the optimal policy under uncertainty, but you can try.

Static, Stochastic-Observation, Unobservable Wumpus

Here we assume the Wumpus does not move, and observations are stochastic: sometimes we feel a breeze, sometimes not. We assume we know the probability of a breeze.

The first problem is to summarise what we know about the Wumpus problem. Now we can have an entry $G[x,y]$ in the matrix which is a vector of probabilities for the possible contents of the co-ordinate: (Empty, Hole, Wumpus, Treasure)

For simplicity, you can always start with the setting where you know you are dealing with one of a small number of possible worlds. Then you only need to deal with the probability of each world being the right one.

  1. Inferring the probabilities of possible worlds
  2. Choosing the optimal policy given what you know.

Inferring the world

For each observation, you calculate the posterior probability of each world, $P_t(μ)$.
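A sketch of this posterior update, assuming a known observation likelihood `lik(observation, world, location)` (hypothetical names):

```python
def update_posterior(P, worlds, location, observation, lik):
    """One Bayes step: P_t(mu) is proportional to lik(o | mu) P_{t-1}(mu)."""
    weights = [lik(observation, mu, location) * P[i] for i, mu in enumerate(worlds)]
    total = sum(weights)
    return [w / total for w in weights]
```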

Choosing a policy

You can evaluate any policy in every remaining possible world. In fact, you can find the optimal policy for each one of them.
