Simple neural networks with a constant number N of neurons per layer (the fixed N being part of the type of the matrices and vectors involved), or even with the number of neurons differing between layers.
The type Matrix[A, M, N] specifies that the matrix contains elements of type A and has M rows and N columns. A must have a Ring typeclass instance, as defined in spire.
The type Vector[A, N] specifies that the vector contains elements of type A and has N rows. A must have a Ring typeclass instance, as defined in spire.
Each operation performed with matrices and vectors must type check, meaning one can multiply a Matrix[A, M, N] and a Matrix[A, N, P], yielding a Matrix[A, M, P], but one cannot multiply the former with a Matrix[B, N, P] or a Matrix[A, P, Q], because A differs from B and N differs from P, respectively.
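As a minimal sketch of the idea (not the project's actual API; the names and the `Numeric`-based representation here are assumptions), dimensions can be encoded as literal Int types so that a multiplication with mismatched inner dimensions fails to compile:

```scala
// Sketch only: dimensions M, N, P are literal Int types, so the compiler
// rejects products whose inner dimensions disagree.
final case class Matrix[A, M <: Int, N <: Int](data: Vector[Vector[A]]):
  // Multiplication is only defined for a right operand Matrix[A, N, P]:
  // the shared N is enforced by the type signature.
  def *[P <: Int](that: Matrix[A, N, P])(using num: Numeric[A]): Matrix[A, M, P] =
    Matrix[A, M, P](
      data.map(row =>
        that.data.transpose.map(col =>
          row.zip(col).map((x, y) => num.times(x, y)).sum)))

val a = Matrix[Int, 2, 3](Vector(Vector(1, 2, 3), Vector(4, 5, 6)))
val b = Matrix[Int, 3, 1](Vector(Vector(1), Vector(0), Vector(1)))
val c: Matrix[Int, 2, 1] = a * b // typechecks: (2 x 3) * (3 x 1) = (2 x 1)
// a * a would be rejected at compile time: a Matrix[Int, 2, 3]
// cannot be multiplied by another Matrix[Int, 2, 3].
```

The project itself requires a spire Ring instance rather than `Numeric`; the sketch uses `Numeric` only to stay dependency-free.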
A neural network Network[N] is composed of M layers of N neurons each, where each neuron has N weights plus one bias, that is, N + 1 parameters. For arbitrary-precision arithmetic using spire.math.Real, for instance, each neuron of type Neuron[N] has a Vector[Real, N] of weights, a Real bias, and an Activation function, where the latter is a Scala 3 enum. This means that the Activation functions may differ between neurons. The definition and the derivative of an Activation function must be known and given.
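As a hedged sketch (the project's actual enum cases and member names may differ, and the project works over spire.math.Real rather than Double), an Activation enum carrying both the definition and the derivative could look like:

```scala
import scala.math.{exp, tanh}

// Illustrative Activation enum: each case bundles the function f
// together with its derivative df, both of which must be given.
enum Activation(val f: Double => Double, val df: Double => Double):
  case Identity extends Activation(x => x, _ => 1.0)
  case Sigmoid extends Activation(
    x => 1.0 / (1.0 + exp(-x)),
    x => { val s = 1.0 / (1.0 + exp(-x)); s * (1.0 - s) })
  case Tanh extends Activation(
    x => tanh(x),
    x => 1.0 - tanh(x) * tanh(x))

val s = Activation.Sigmoid
val mid = s.f(0.0)    // sigmoid(0) = 0.5
val slope = s.df(0.0) // sigmoid'(0) = 0.25
```

Bundling f and df in the same case is what lets each neuron carry its own activation while backpropagation still has the derivative at hand.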
For a neural network, training with backpropagation is implemented, as well as plain prediction, once the neurons' weights (and biases) have been trained.
For examples, see also this blog
or this java-toy-neural-network project.
The package float builds around a generalized neural network, where the assumption of a constant number of neurons per layer is relaxed, and thus the number of neurons may differ between layers.
Although the dimension types of the variables and values involved in the algorithms are the wildcard ?, operations on matrices and vectors remain type-safe by means of shapes.
Even assignment is safe - though not through the method := by itself - because the algorithms perform reassignment only, and thus the types of matrices/vectors are asserted before assignment.
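A sketch of this assert-before-reassign idea, with illustrative names (hypothetical, not the project's API): the shape stored in a wildcard-dimensioned matrix is checked at runtime before := overwrites its contents.

```scala
// Illustrative mutable matrix whose dimensions are not tracked in the
// type; := asserts shape equality before reassigning.
final class DMatrix(var data: Array[Array[Double]]):
  def rows: Int = data.length
  def cols: Int = if data.isEmpty then 0 else data(0).length
  def :=(that: DMatrix): Unit =
    require(rows == that.rows && cols == that.cols,
      s"shape mismatch: ${rows}x$cols vs ${that.rows}x${that.cols}")
    data = that.data.map(_.clone)

val m = DMatrix(Array(Array(1.0, 2.0), Array(3.0, 4.0)))
m := DMatrix(Array(Array(0.0, 0.0), Array(0.0, 0.0))) // same 2x2 shape: ok
// m := DMatrix(Array(Array(0.0))) would fail the shape assertion at runtime.
```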
An example of a neural network with two inputs, a hidden layer with ten neurons, and one output is the following:
```scala
type N[L <: Int] = L match
  case 0 => 2
  case 1 => 10
  case 2 => 1

given List[Int] = 2 :: 10 :: 1 :: Nil

Network[N, 2](loss = MSE[1](), learningRate = 3, ...)
```

Each type mapped by the higher-kinded type N[_] differs with the type argument:
N[0] is the number of inputs, N[1] is the number of neurons in the hidden layer,
while N[2] is the number of outputs. It is hence called a shape.
The implicit given_List_Int is the shape as values. Both kinds of shape (types
and values) must be given. The neural network is defined as Network[N, 2](...),
where N is the shape and 2 is the number of hidden layers, while the (values)
shape is passed as an implicit parameter. Then, the 1 occurring in the argument
loss = MSE[1]() matches the number of neurons in the output layer.
Note that the output layer is also a hidden layer.
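The shape type from the example can also be inspected at compile time: the compiler reduces the match type to the literal layer sizes, as the following self-contained snippet shows.

```scala
import scala.compiletime.constValue

// The shape from the example above: a match type mapping the
// layer index L to the number of neurons in that layer.
type N[L <: Int] = L match
  case 0 => 2
  case 1 => 10
  case 2 => 1

// These hold at compile time: N[0] reduces to the literal type 2, etc.
val inputs: N[0] = 2
val sameAsOutputLayer = summon[N[2] =:= 1]

// constValue turns the reduced literal type back into a value.
val outputs: Int = constValue[N[2]] // 1
```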
Use, for instance, the following sbt command:
sbt:N Neural Networks> testOnly *double*Network*
This will run all tests with the word "Network" in package "double" (where all
values, functions, or networks are based on the Double type): there is only one
such suite, nnn.double.NetworkSuite.
Consider a neural network with three hidden layers (the last of which is the output layer) and three neurons per layer, as in the following table:

| Input | Layer 1 | Layer 2 | Layer 3/Output |
|---|---|---|---|
| $x_1$ | $a^{(1)}_1$ | $a^{(2)}_1$ | $a^{(3)}_1$ |
| $x_2$ | $a^{(1)}_2$ | $a^{(2)}_2$ | $a^{(3)}_2$ |
| $x_3$ | $a^{(1)}_3$ | $a^{(2)}_3$ | $a^{(3)}_3$ |
The layers are fully connected, in the sense of the following nine equations:

$$a^{(1)}_i = \sigma\left(w^{(1)}_{i1} x_1 + w^{(1)}_{i2} x_2 + w^{(1)}_{i3} x_3 + b^{(1)}_i\right), \quad i = 1, 2, 3,$$

$$a^{(2)}_i = \sigma\left(w^{(2)}_{i1} a^{(1)}_1 + w^{(2)}_{i2} a^{(1)}_2 + w^{(2)}_{i3} a^{(1)}_3 + b^{(2)}_i\right), \quad i = 1, 2, 3,$$

$$a^{(3)}_i = \sigma\left(w^{(3)}_{i1} a^{(2)}_1 + w^{(3)}_{i2} a^{(2)}_2 + w^{(3)}_{i3} a^{(2)}_3 + b^{(3)}_i\right), \quad i = 1, 2, 3,$$

where $x_j$ are the inputs, $a^{(l)}_i$ is the output (activation) of neuron $i$ in layer $l$, $w^{(l)}_{ij}$ is the weight from neuron $j$ of the previous layer to neuron $i$ of layer $l$, $b^{(l)}_i$ is the bias of that neuron, and $\sigma$ is the activation function.

The outputs can be written more briefly using indices:

$$a^{(l)}_i = \sigma\left(z^{(l)}_i\right), \qquad z^{(l)}_i = \sum_{j=1}^{3} w^{(l)}_{ij}\, a^{(l-1)}_j + b^{(l)}_i, \qquad a^{(0)}_j = x_j.$$

These were the equations corresponding to the forward pass. For backpropagation, we proceed backwards, from the output layer towards the input layer. Assume the loss is the squared error

$$E = \frac{1}{2} \sum_{i=1}^{3} \left(a^{(3)}_i - y_i\right)^2,$$

where $y_1, y_2, y_3$ are the target outputs. From these, we start with the derivatives of $E$ with respect to the outputs:

$$\frac{\partial E}{\partial a^{(3)}_i} = a^{(3)}_i - y_i.$$

We note that, for instance for the sigmoid activation $\sigma(z) = 1/(1 + e^{-z})$,

$$\sigma'\left(z^{(l)}_i\right) = \sigma\left(z^{(l)}_i\right)\left(1 - \sigma\left(z^{(l)}_i\right)\right) = a^{(l)}_i \left(1 - a^{(l)}_i\right).$$

Then, the previous equations become (under the notation $\delta^{(l)}_i = \partial E / \partial z^{(l)}_i$):

$$\delta^{(3)}_i = \frac{\partial E}{\partial a^{(3)}_i}\, \sigma'\left(z^{(3)}_i\right) = \left(a^{(3)}_i - y_i\right) a^{(3)}_i \left(1 - a^{(3)}_i\right).$$

Let us now write the twelve partial derivatives of $E$ with respect to the parameters of the third layer (three biases and nine weights):

$$\frac{\partial E}{\partial b^{(3)}_i} = \delta^{(3)}_i, \qquad \frac{\partial E}{\partial w^{(3)}_{ij}} = \delta^{(3)}_i\, a^{(2)}_j.$$

Let us now observe what is the product of the transpose of the bias-augmented weight matrix of the third layer with the vector of the deltas:

$$\begin{pmatrix} b^{(3)}_1 & b^{(3)}_2 & b^{(3)}_3 \\ w^{(3)}_{11} & w^{(3)}_{21} & w^{(3)}_{31} \\ w^{(3)}_{12} & w^{(3)}_{22} & w^{(3)}_{32} \\ w^{(3)}_{13} & w^{(3)}_{23} & w^{(3)}_{33} \end{pmatrix} \begin{pmatrix} \delta^{(3)}_1 \\ \delta^{(3)}_2 \\ \delta^{(3)}_3 \end{pmatrix}$$

Let us further drop the first row (because it is not used) in the above matrix (denoting this transient matrix by $W^{(3)\top}$) and apply the following Hadamard product:

$$W^{(3)\top} \delta^{(3)} \odot \begin{pmatrix} a^{(2)}_1 \left(1 - a^{(2)}_1\right) \\ a^{(2)}_2 \left(1 - a^{(2)}_2\right) \\ a^{(2)}_3 \left(1 - a^{(2)}_3\right) \end{pmatrix}$$

We obtain the following result, again reassigned to $\delta$:

$$\delta^{(2)}_j = \left(\sum_{i=1}^{3} w^{(3)}_{ij}\, \delta^{(3)}_i\right) a^{(2)}_j \left(1 - a^{(2)}_j\right), \quad j = 1, 2, 3.$$

The first row of this matrix corresponds to the equation for $\delta^{(2)}_1$, and the second and third rows to $\delta^{(2)}_2$ and $\delta^{(2)}_3$. Then, the equations for the second layer (its twelve partial derivatives and the deltas propagated to the first layer) are obtained exactly as above, with the layer index shifted down by one, and likewise for the first layer.
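The backward step derived above can be sketched numerically. The following snippet (illustrative, with made-up weights and targets, and sigmoid activation throughout; not the project's API) computes a layer's activations, its deltas, the deltas propagated to the previous layer, and the twelve partial derivatives:

```scala
import scala.math.exp

def sigma(x: Double): Double = 1.0 / (1.0 + exp(-x))

// w(i)(j): weight from neuron j of the previous layer to neuron i;
// b(i): bias of neuron i; aPrev: activations of the previous layer.
val w = Array(Array(0.1, 0.2, 0.3), Array(0.4, 0.5, 0.6), Array(0.7, 0.8, 0.9))
val b = Array(0.1, 0.1, 0.1)
val aPrev = Array(0.5, 0.5, 0.5)

// Forward pass for this layer: a_i = sigma(sum_j w_ij * aPrev_j + b_i).
val a = Array.tabulate(3)(i =>
  sigma((0 until 3).map(j => w(i)(j) * aPrev(j)).sum + b(i)))

// Output-layer deltas against targets y:
// delta_i = (a_i - y_i) * a_i * (1 - a_i).
val y = Array(1.0, 0.0, 1.0)
val delta = Array.tabulate(3)(i => (a(i) - y(i)) * a(i) * (1 - a(i)))

// Transpose product followed by the Hadamard product with sigma':
// deltaPrev_j = (sum_i w_ij * delta_i) * aPrev_j * (1 - aPrev_j).
val deltaPrev = Array.tabulate(3)(j =>
  (0 until 3).map(i => w(i)(j) * delta(i)).sum * aPrev(j) * (1 - aPrev(j)))

// The twelve partial derivatives of the loss for this layer:
// nine weight gradients delta_i * aPrev_j and three bias gradients delta_i.
val gradW = Array.tabulate(3, 3)((i, j) => delta(i) * aPrev(j))
val gradB = delta
```

A gradient-descent update would then subtract learningRate times gradW and gradB from the layer's weights and biases, reassigning in place as described earlier.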