A demonstration of pricing call options using neural networks.
Consider a security valued at
Intuition: if one makes the perfectly conservative move, taking a position on a security (long or short) while hedging that position against risk with the appropriately balancing option (put or call), then one's expected rate of profit should equal the risk-free interest rate in the market. This push and pull between the option price and the stock price is captured by the above differential equation.
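As a sanity check on this intuition, the closed-form Black–Scholes call price should satisfy the PDE identically. The sketch below (parameter values are illustrative, derivatives taken by central finite differences) verifies that the residual is numerically zero:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price(S, tau, K=1.0, r=0.05, sigma=0.2):
    # closed-form Black-Scholes price of a European call; tau = time to maturity
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

def pde_residual(S, tau, K=1.0, r=0.05, sigma=0.2, h=1e-3):
    # Black-Scholes PDE: C_t + 0.5 sigma^2 S^2 C_SS + r S C_S - r C = 0,
    # where C_t = -dC/dtau since tau runs backwards from calendar time
    C = call_price(S, tau, K, r, sigma)
    C_t = -(call_price(S, tau + h, K, r, sigma)
            - call_price(S, tau - h, K, r, sigma)) / (2 * h)
    C_S = (call_price(S + h, tau, K, r, sigma)
           - call_price(S - h, tau, K, r, sigma)) / (2 * h)
    C_SS = (call_price(S + h, tau, K, r, sigma) - 2 * C
            + call_price(S - h, tau, K, r, sigma)) / h**2
    return C_t + 0.5 * sigma**2 * S**2 * C_SS + r * S * C_S - r * C
```

The residual at, say, the at-the-money point is zero up to finite-difference truncation error.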
Note: this formulation makes several assumptions:
- The security price follows a geometric Brownian motion (GBM) with constant volatility. The security pays no dividends; its price is set purely by supply and demand.
- The risk-free rate is likewise constant.
- The option can only be exercised at maturity (European, not American options).
The above equation is a partial differential equation (PDE). We can solve it with physics-informed neural networks (PINNs), but for this case PINNs offer no advantage over classical numerical methods: they are comparatively inaccurate, slow to train, and riddled with boundary pathologies.
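For reference, the classical baseline is simple. Below is a minimal explicit finite-difference solver for the European call (grid sizes, boundary conditions, and parameter values are illustrative choices, not the document's own setup):

```python
import numpy as np
from math import exp

def fd_call(K=1.0, r=0.05, sigma=0.2, T=1.0, S_max=4.0, M=400, N=20000):
    """Explicit finite-difference solve of the Black-Scholes PDE for a
    European call, marching backwards in time from the terminal payoff."""
    S = np.linspace(0.0, S_max, M + 1)
    dS = S[1] - S[0]
    dt = T / N                           # small enough for explicit-scheme stability
    V = np.maximum(S - K, 0.0)           # payoff at maturity
    for n in range(1, N + 1):
        tau = n * dt                     # time to maturity after this step
        V_S = (V[2:] - V[:-2]) / (2 * dS)
        V_SS = (V[2:] - 2 * V[1:-1] + V[:-2]) / dS**2
        V[1:-1] += dt * (0.5 * sigma**2 * S[1:-1]**2 * V_SS
                         + r * S[1:-1] * V_S - r * V[1:-1])
        V[0] = 0.0                         # a call is worthless at S = 0
        V[-1] = S_max - K * exp(-r * tau)  # deep in-the-money asymptote
    return S, V
```

On this grid the at-the-money price matches the closed-form value to a few parts in a thousand, in a fraction of the time a PINN takes to train.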
But if we consider the parametric problem statement, i.e. the parameters
There are different ways to view this problem:
- Since the parameters of the PDE are random variables, this is simply a parametric stochastic differential equation whose solution is a stochastic process. We can approximate the solution with PINNs and use MCMC-type methods to characterize the posterior.
- More appropriately and simply, the PDE can be viewed as an operator over the field function, of the form
$\mathcal{K} : (\sigma, r) \rightarrow C_{\sigma, r}$, mapping each parameter pair to a function in the $\mathcal{L}^2$ space.
An operator, simply put, maps a function to a function, for example
Where,
$\rightarrow \Psi^{(w_\psi)}_i$ is the $i^{th}$ of $n$ scalar-valued neural networks, or equivalently the $i^{th}$ output of a stacked neural network of the form $\Psi: \mathbb{R}^2 \rightarrow \mathbb{R}^n$ with weights $w_\psi$, also known as the trunk network.
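The factorization above can be sketched as a DeepONet-style forward pass. This is a shape-only illustration with untrained random weights; the layer sizes, the `branch`/`trunk` names, and the choice of inputs (parameters to the branch, coordinates to the trunk) are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    # random, untrained weights: this sketch only demonstrates shapes and wiring
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

n = 32                    # number of basis functions
branch = mlp([2, 64, n])  # takes the parameters (sigma, r)
trunk = mlp([2, 64, n])   # takes the coordinates (S, t): the trunk network Psi

def price(sigma, r, S, t):
    # DeepONet-style factorization: sum_i b_i(sigma, r) * Psi_i(S, t)
    b = forward(branch, np.array([sigma, r]))
    psi = forward(trunk, np.array([S, t]))
    return float(b @ psi)
```

Training would fit both sets of weights jointly against the PDE-residual loss described below.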
The universal approximation theorems are only one side of the coin. The other side is an algorithm that can actually find such neural networks (by finding the appropriate weights). This is the stochastic approximation method, where Robbins and Monro posit that the minimum of a loss function
For our case we can spell out our loss function as,
Where,
- We can easily convert the integrals into expectations by making a few assumptions, e.g. $r \sim \mathcal{U}(0, r_{max})$, $\sigma \sim \mathcal{U}(0, \sigma_{max})$, and so on.
- We then approximate the expectations with sample averages.
- The Robbins–Monro theorem guarantees convergence to the required minimum as long as the learning-rate schedule $\eta_t$ satisfies the usual conditions, $\sum_t \eta_t = \infty$ and $\sum_t \eta_t^2 < \infty$.
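The recipe above can be illustrated on a toy problem (not the option-pricing loss itself): minimize $\mathbb{E}[(\theta - X)^2]$ from one noisy sample per step, with the schedule $\eta_t = 1/t$, which satisfies both conditions:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                        # initial guess
for t in range(1, 100_001):
    x = rng.normal(2.0, 1.0)       # one noisy sample per step
    grad = 2.0 * (theta - x)       # unbiased estimate of grad E[(theta - X)^2]
    theta -= grad / t              # eta_t = 1/t: sum diverges, sum of squares converges
# theta converges to the true minimizer E[X] = 2
```

The same stochastic-approximation argument is what licenses training the network by SGD on mini-batch estimates of the loss.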
The model we have used here is a simple MLP, and training is fast:
We can now easily sample each of
This allows sampling at scale, enabling MCMC-style analysis.
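A sketch of that kind of analysis: draw parameters from their priors and push them through a cheap surrogate to get a distribution over prices. Here the closed-form Black–Scholes price stands in for the trained network, and the ranges `r_max`, `sigma_max` are illustrative:

```python
import numpy as np
from math import log, sqrt, exp, erf

def call_price(S, tau, K, r, sigma):
    # closed-form Black-Scholes call, standing in for the trained surrogate
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return S * N(d1) - K * exp(-r * tau) * N(d2)

rng = np.random.default_rng(0)
r_max, sigma_max = 0.10, 0.50                 # illustrative prior ranges
rs = rng.uniform(0.0, r_max, 5000)
sigmas = rng.uniform(0.01, sigma_max, 5000)   # keep sigma away from 0
prices = np.array([call_price(1.0, 1.0, 1.0, r, s)
                   for r, s in zip(rs, sigmas)])
lo, med, hi = np.percentile(prices, [5, 50, 95])
```

With a trained surrogate in place of `call_price`, each draw costs one forward pass, which is what makes posterior sampling over $(\sigma, r)$ practical.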

