Abumze978/Probabilistic_Artificial_Intelligence_Projects

ProbabilisticArtificialIntelligenceProjects

This repository contains the four programming tasks handed in for the class Probabilistic Artificial Intelligence at ETH Zurich during the fall semester of 2021. Our code is contained in the file named "solution.py".

Task 1: Gaussian Process Regression

Our task was to help a city predict and audit the concentration of fine particulate matter (PM2.5) per cubic meter of air. In an initial phase, the city collected preliminary measurements using mobile measurement stations. The goal was to develop a pollution model that predicts the air pollution concentration at locations without measurements. This model would then be used to identify particularly polluted areas where permanent measurement stations should be deployed. For this task we had to implement Gaussian Process Regression. The main hurdle was computational: the large number of samples made updating the model posterior prohibitively expensive. We tried different approximations to make it feasible, and in the end using SGD proved to be enough.
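To illustrate where the cost comes from, here is a minimal sketch of exact Gaussian Process regression in NumPy. It is not the approximation used in our solution; the RBF kernel, the lengthscale, the noise level and the synthetic data are all assumptions chosen for illustration. The Cholesky factorization of the n-by-n kernel matrix scales cubically in the number of samples, which is what becomes prohibitive on a large data set.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.1):
    """Exact GP posterior mean and variance at x_test, assuming a zero prior mean."""
    k_tt = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test))
    chol = np.linalg.cholesky(k_tt)  # O(n^3): the expensive step
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, var

# Synthetic stand-in for 2D measurement locations and pollution readings.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(50, 2))
y_train = np.sin(4.0 * x_train[:, 0]) + 0.1 * rng.standard_normal(50)
mean, var = gp_posterior(x_train, y_train, x_train[:5])
```

With 50 points this runs instantly; the same code on tens of thousands of points would need an approximation such as subsampling or a sparse method.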

Task 2: Bayesian Neural Network trained via Bayes by Backprop

In this task we had to implement an algorithm named Bayes by Backprop to train a Bayesian Neural Network. The idea is to obtain a tractable loss function that we can optimize. Following the paper, we set the loss to the log variational posterior minus the log prior plus the negative log likelihood, all evaluated at weights sampled from the variational posterior. Understanding how to implement this loss function correctly was the first hurdle we had to overcome. We also had to choose a suitable prior and, following the suggestions in the paper, we used a mixture of two Gaussians for both the weights and the biases. The last challenge was to fine-tune the hyperparameters. After some trial and error we obtained a satisfactory result.
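The loss described above can be sketched in NumPy as follows. This only evaluates the loss for one sampled weight vector and does not compute gradients; a linear model with Gaussian noise stands in for the neural network, and the mixture parameters (`pi`, `sigma1`, `sigma2`), the softplus parameterization via `rho`, and all function names are assumptions made for illustration rather than the values from our solution.

```python
import numpy as np

def log_gaussian(x, mean, std):
    """Elementwise log density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_mixture_prior(w, pi=0.5, sigma1=1.0, sigma2=0.1):
    """Log density of a two-component zero-mean Gaussian mixture, summed over w."""
    log_p1 = np.log(pi) + log_gaussian(w, 0.0, sigma1)
    log_p2 = np.log(1.0 - pi) + log_gaussian(w, 0.0, sigma2)
    return np.sum(np.logaddexp(log_p1, log_p2))

def bayes_by_backprop_loss(mu, rho, x, y, rng, noise_std=1.0):
    """Single-sample estimate: log q(w) - log p(w) - log p(y | x, w)."""
    sigma = np.log1p(np.exp(rho))          # softplus keeps the std positive
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps                   # reparameterized weight sample
    log_q = np.sum(log_gaussian(w, mu, sigma))
    log_prior = log_mixture_prior(w)
    log_lik = np.sum(log_gaussian(y, x @ w, noise_std))
    return log_q - log_prior - log_lik

# Toy data: the "network" here is just y = x @ w.
rng = np.random.default_rng(0)
x = rng.standard_normal((20, 3))
y = x @ np.array([1.0, -2.0, 0.5])
mu = np.zeros(3)
rho = np.full(3, -3.0)
loss = bayes_by_backprop_loss(mu, rho, x, y, rng)
```

In practice the same expression would be written in an automatic differentiation framework so that the gradient with respect to `mu` and `rho` can flow through the sampled weights.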

Task 3: Bayesian Optimization

Bayesian Optimization (BO) is a powerful framework for finding optima while using very few function evaluations. However, in many real-world applications the search space is subject to constraints that limit the feasible domain. When evaluating the feasibility of a candidate solution is expensive, modeling the feasible domain jointly with the objective function becomes crucial. In this task we were therefore asked to extend what we had seen in the lectures on Bayesian Optimization: we had to find the minimum of an objective function subject to a constraint, where the constraint itself was also unknown. The main challenge was first to define and then to implement a suitable acquisition function (i.e. the function that tells you which point to evaluate next). Following the suggestion of Gelbart et al., we chose a variation of the expected improvement (EI) acquisition function. In particular, at every point x, we multiplied the value of EI(x) by the probability that the constraint was satisfied at that point. To model both the constraint function and the objective function we used Gaussian processes with the suggested parameters. Finally, to make our algorithm less sensitive to bad initialization, our first three recommendations were three points chosen at random in the domain.
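The acquisition function described above can be sketched as follows. The sketch takes posterior means and standard deviations as plain arrays instead of querying fitted Gaussian processes, and it assumes that the constraint counts as satisfied when its value is at most zero; that sign convention, the function names and the example numbers are illustrative assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def expected_improvement(mean, std, best_value):
    """EI for minimization under a Gaussian posterior at each candidate."""
    std = np.maximum(std, 1e-12)  # avoid division by zero
    z = (best_value - mean) / std
    return (best_value - mean) * normal_cdf(z) + std * normal_pdf(z)

def constrained_ei(obj_mean, obj_std, con_mean, con_std, best_feasible_value):
    """EI(x) multiplied by the probability that the constraint holds at x."""
    ei = expected_improvement(obj_mean, obj_std, best_feasible_value)
    # Assumption: the constraint is satisfied when c(x) <= 0.
    prob_feasible = normal_cdf((0.0 - con_mean) / np.maximum(con_std, 1e-12))
    return ei * prob_feasible

# Three candidates: the third has the same promising objective as the second
# but is very likely infeasible.
obj_mean = np.array([0.0, -1.0, -1.0])
obj_std = np.array([1.0, 1.0, 1.0])
con_mean = np.array([-2.0, -2.0, 2.0])
con_std = np.array([1.0, 1.0, 1.0])
scores = constrained_ei(obj_mean, obj_std, con_mean, con_std, best_feasible_value=0.0)
next_index = int(np.argmax(scores))
```

Here the second candidate wins: it has the same expected improvement as the third, but a much higher probability of being feasible.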

Task 4: Actor Critic Reinforcement Learning

The task was to implement a reinforcement learning algorithm that, by interacting with a simulated environment, allowed a lunar lander to land safely in a predefined position. We explored three ways of doing it. Starting from a naïve implementation of policy gradients where the policy was parameterized by a neural network, each step added something more sophisticated. At the beginning (TODOs 1-3) we implemented a vanilla version of policy gradient. Subsequently we implemented the reward-to-go variation (TODO 4): when calculating the policy gradient, each action is credited only with the rewards collected from that step onward, ignoring the rewards of past steps. This gave better results and lower variance in our gradient estimate. Finally, we implemented Generalized Advantage Estimation (TODOs 5-8) as explained in the suggested paper, changing the policy update so that it is weighted by an estimate of the advantage function. This further reduced the variance and in the end yielded a satisfactory result.
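The two return estimators mentioned above can be sketched in NumPy as follows. The function names, the discount `gamma`, the GAE parameter `lam` and the toy numbers are assumptions for illustration; the value estimates would come from a learned critic, which is replaced here by a fixed array.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted sum of rewards from each time step to the end of the episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def gae_advantages(rewards, values, gamma=0.99, lam=0.97, last_value=0.0):
    """Generalized Advantage Estimation from rewards and critic values V(s_t)."""
    values_ext = np.append(values, last_value)  # bootstrap value after the last step
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]  # TD residuals
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.4, 0.3])  # stand-in for critic predictions
returns = reward_to_go(rewards, gamma=0.5)
advantages = gae_advantages(rewards, values, gamma=0.5, lam=0.9)
```

Setting `lam=1.0` recovers the reward-to-go return minus the value baseline, while `lam=0.0` reduces to the one-step TD residual, so `lam` trades variance against bias.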
