This project implements a Feedforward Neural Network (FFNN) in C++. It is based on a previous codebase, now refactored into an object-oriented structure. The primary goal is educational, with an emphasis on simplicity: no external libraries are used. The implementation does not rely on matrices; instead, calculations are performed iteratively with plain loops, which is sufficient for small networks like the one in the included example. The main file includes an example demonstrating its usage.
The program requires input files formatted exactly like the MNIST database, which is included as example data for both training and recognition. Users can either run the main program directly to observe its functionality or use the NN class independently in other projects.
The `NN.hpp` file contains a `#define` directive that switches the program into debugging mode, in which it prints diagnostic information to the console.
```bash
g++ -std=c++11 main.cpp NN.cpp -o recognize -O2
```
The network structure is represented by three main components: neuronLayers, weightLayers, and biasLayers. These vectors contain the information necessary for both forward propagation and backpropagation in the network.
The neuronLayers vector holds the activations of the neurons at each layer of the network. This is a 2D vector, where the first index refers to the layer, and the second index corresponds to the neuron within that layer. In the context of backpropagation, it also holds the errors (gradients) during the learning process.
`neuronLayers[layerIndex][neuronIndex]` gives the activation (or error) of the neuron at position `neuronIndex` in layer `layerIndex`.
For example:
- `neuronLayers[0]` represents the input layer, where the values are the inputs to the network.
- `neuronLayers[1]` represents the first hidden layer, and so on.
- For backpropagation, once the error has been calculated, it is stored in `neuronLayers[layerIndex]` for each layer (a declaration sketch follows below).
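As a rough illustration of this layout (not the exact code from NN.cpp; the container names match the description above, but `layerSizes` and `allocate` are hypothetical), the three vectors could be declared and sized like this:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of the container layout described above.
std::vector<std::vector<double>> neuronLayers; // activations (and later errors) per layer
std::vector<std::vector<double>> weightLayers; // flat weights between consecutive layers
std::vector<std::vector<double>> biasLayers;   // biases for every layer except the input

void allocate(const std::vector<std::size_t>& layerSizes) {
    neuronLayers.resize(layerSizes.size());
    for (std::size_t l = 0; l < layerSizes.size(); ++l)
        neuronLayers[l].resize(layerSizes[l], 0.0);

    weightLayers.resize(layerSizes.size() - 1);
    biasLayers.resize(layerSizes.size() - 1);
    for (std::size_t l = 0; l + 1 < layerSizes.size(); ++l) {
        weightLayers[l].resize(layerSizes[l] * layerSizes[l + 1], 0.0); // one weight per connection
        biasLayers[l].resize(layerSizes[l + 1], 0.0);                   // one bias per neuron of layer l + 1
    }
}
```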
The weightLayers vector stores the weights connecting each layer to the next. It is a 2D vector: the first index selects the weight layer, and the second index is a flat index over the individual connections between the two neuron layers. The number of weight layers is one less than the number of neuron layers, because weights only exist between consecutive layers (none lead into the input layer).
`weightLayers[layerIndex]` contains all the weights between `neuronLayers[layerIndex]` and `neuronLayers[layerIndex + 1]`. `weightLayers[layerIndex][weightIndex]` is one of these connections, with the weights stored grouped by source neuron (see the indexing sketch after the example below).
For example:
- `neuronLayers[0]` (input layer) is connected to `neuronLayers[1]` (first hidden layer) via `weightLayers[0]`.
- `weightLayers[0][0]` represents the connection between the first neuron of the input layer (`neuronLayers[0][0]`) and the first neuron of the hidden layer (`neuronLayers[1][0]`).
- `weightLayers[0][1]` represents the connection between the first neuron of the input layer and the second neuron of the hidden layer, and so on.
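With the flat layout shown in the example, the weight from neuron `i` in `neuronLayers[layerIndex]` to neuron `j` in `neuronLayers[layerIndex + 1]` sits at a computable offset. A minimal sketch, assuming this source-major ordering (the helper name `weightIndex` is hypothetical, not part of the NN class):

```cpp
#include <cstddef>

// Flat index of the weight connecting neuron `from` in layer l to neuron `to`
// in layer l + 1, assuming weights are grouped by source neuron as in the example.
inline std::size_t weightIndex(std::size_t from, std::size_t to, std::size_t nextLayerSize) {
    return from * nextLayerSize + to;
}

// Example: if neuronLayers[1] has 16 neurons, the weight between neuronLayers[0][2]
// and neuronLayers[1][5] would be weightLayers[0][weightIndex(2, 5, 16)].
```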
The biasLayers vector holds the biases for each layer except the input layer. Each element of biasLayers contains the bias values for the neurons of the following neuron layer.
`biasLayers[layerIndex][biasIndex]` represents the bias for the neuron at position `biasIndex` in `neuronLayers[layerIndex + 1]`.
For example:
- `biasLayers[0]` contains the biases for the neurons in `neuronLayers[1]` (the first hidden layer).
- `biasLayers[1]` contains the biases for the neurons in `neuronLayers[2]`, and so on.
Since the input layer doesn't have a bias, we start indexing biases from `biasLayers[0]`, which corresponds to `neuronLayers[1]`.
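A one-line helper makes this offset-by-one convention explicit (illustrative only; `biasOf` is not part of the NN class):

```cpp
#include <cstddef>
#include <vector>

// Bias of neuron j in neuronLayers[layer] (layer >= 1), given the offset-by-one storage.
double biasOf(const std::vector<std::vector<double>>& biasLayers, std::size_t layer, std::size_t j) {
    return biasLayers[layer - 1][j];
}
```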
This structure, though seemingly non-intuitive, is designed to simplify the iterative calculations during the forward and backward passes. Instead of a more complex, matrix-like structure such as `weightLayers[layerIndex][neuronPreviousLayer][neuronNextLayer]`, which would introduce additional indexing overhead, the current flat layout allows for an efficient, iterative computation flow (illustrated in the sketch after the summary below).
In summary:
- `neuronLayers[layerIndex]` contains the activations or errors for layer `layerIndex`.
- `weightLayers[layerIndex]` contains the weights between layer `layerIndex` and layer `layerIndex + 1`.
- `biasLayers[layerIndex]` contains the biases for `neuronLayers[layerIndex + 1]`.
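Putting the three containers together, one forward step from layer `l` to layer `l + 1` could look like the sketch below. This is a hedged illustration of the iterative flow, assuming the flat, source-major weight layout and sigmoid activations; the actual loops in NN.cpp may be organized differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One forward step: compute the activations of layer l + 1 from layer l.
void forwardLayer(std::size_t l,
                  std::vector<std::vector<double>>& neuronLayers,
                  const std::vector<std::vector<double>>& weightLayers,
                  const std::vector<std::vector<double>>& biasLayers) {
    const std::size_t curSize  = neuronLayers[l].size();
    const std::size_t nextSize = neuronLayers[l + 1].size();
    for (std::size_t j = 0; j < nextSize; ++j) {
        double z = biasLayers[l][j];                         // bias of neuron j in layer l + 1
        for (std::size_t i = 0; i < curSize; ++i)
            z += neuronLayers[l][i] * weightLayers[l][i * nextSize + j];
        neuronLayers[l + 1][j] = 1.0 / (1.0 + std::exp(-z)); // sigmoid activation
    }
}
```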
- Sigmoid Activation Function
- Xavier Initialization
- Mean Squared Error (MSE) Cost Function
- Backpropagation Algorithm
- Stochastic Gradient Descent (Online Learning)
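The snippets below sketch two of these building blocks: the sigmoid with its derivative (written in terms of the activation), and one common form of Xavier initialization for a flat weight layer. They are illustrative, assuming the data layout described earlier, not the project's exact code.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

double sigmoid(double z)           { return 1.0 / (1.0 + std::exp(-z)); }
double sigmoidDerivative(double a) { return a * (1.0 - a); } // derivative expressed via a = sigmoid(z)

// Xavier (Glorot) uniform initialization: weights drawn from [-limit, limit]
// with limit = sqrt(6 / (fanIn + fanOut)).
void xavierInit(std::vector<double>& weights, std::size_t fanIn, std::size_t fanOut, std::mt19937& rng) {
    const double limit = std::sqrt(6.0 / static_cast<double>(fanIn + fanOut));
    std::uniform_real_distribution<double> dist(-limit, limit);
    for (double& w : weights) w = dist(rng);
}
```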
The forward pass computes activation values for each layer:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\left(z^{(l)}\right)$$

Where:

- $W^{(l)}$ – Weight matrix for layer $l$
- $b^{(l)}$ – Bias vector for layer $l$
- $a^{(l)}$ – Activation vector for layer $l$
- $z^{(l)}$ – Weighted sum before activation for layer $l$
- $\sigma$ – Activation function
Compute the error for the output layer:

$$\delta^{(out)} = \left(a^{(out)} - y\right) \odot \sigma'\left(z^{(out)}\right)$$

Where:

- $\delta^{(out)}$ – Error for the output layer
- $\sigma'$ – Derivative of the activation function
- $\odot$ – Hadamard product (element-wise multiplication)
- $y$ – Desired output
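Mapped onto the flat containers, and assuming the errors overwrite the activations in `neuronLayers` as described earlier, the output-layer error could be computed as in this sketch (the function name and `target` parameter are hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Compute delta_out = (a - y) * sigmoid'(z), where sigmoid'(z) = a * (1 - a),
// and store it back into the output layer of neuronLayers.
void outputError(std::vector<std::vector<double>>& neuronLayers,
                 const std::vector<double>& target) {
    std::vector<double>& out = neuronLayers.back();
    for (std::size_t j = 0; j < out.size(); ++j) {
        const double a = out[j];
        out[j] = (a - target[j]) * a * (1.0 - a);
    }
}
```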
For the Hidden Layer

Compute the error for the hidden layer by propagating the next layer's error backwards:

$$\delta^{(l)} = \left(W^{(l+1)T} \delta^{(l+1)}\right) \odot \sigma'\left(z^{(l)}\right)$$

Where:

- $W^{(l+1)T}$ – Transposed weight matrix of the next layer
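In the flat layout, the transposed-matrix product becomes a sum over the next layer's errors. A hedged sketch, assuming `neuronLayers[l + 1]` already holds the errors of layer `l + 1` while `neuronLayers[l]` still holds the activations of layer `l`:

```cpp
#include <cstddef>
#include <vector>

// Propagate the error from layer l + 1 back to layer l, then multiply by the
// sigmoid derivative a * (1 - a); the result overwrites the activations of layer l.
void hiddenError(std::size_t l,
                 std::vector<std::vector<double>>& neuronLayers,
                 const std::vector<std::vector<double>>& weightLayers) {
    const std::size_t curSize  = neuronLayers[l].size();
    const std::size_t nextSize = neuronLayers[l + 1].size();
    for (std::size_t i = 0; i < curSize; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < nextSize; ++j)
            sum += weightLayers[l][i * nextSize + j] * neuronLayers[l + 1][j];
        const double a = neuronLayers[l][i];
        neuronLayers[l][i] = sum * a * (1.0 - a);
    }
}
```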
The gradients of the cost with respect to the weights and biases are:

$$\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{T}, \qquad \frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}$$

Where:

- $\frac{\partial L}{\partial W^{(l)}}$ – Gradient with respect to the weights
- $\frac{\partial L}{\partial b^{(l)}}$ – Gradient with respect to the biases
The weights and biases are then updated using the learning rate:

$$W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}, \qquad b^{(l)} \leftarrow b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}$$

Where:

- $\eta$ – Learning rate (scalar value)
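For online (per-sample) SGD, the gradients above reduce to simple products of stored errors and activations. A minimal sketch for the weights and biases between layer `l` and `l + 1`, assuming `neuronLayers[l + 1]` holds the errors of layer `l + 1` and `neuronLayers[l]` still holds the activations of layer `l` (so this update must run before layer `l`'s activations are overwritten with its own errors):

```cpp
#include <cstddef>
#include <vector>

// SGD update for one weight layer and its biases:
//   W[i][j] -= eta * delta(l+1)[j] * a(l)[i],   b[j] -= eta * delta(l+1)[j]
void sgdUpdate(std::size_t l, double eta,
               const std::vector<std::vector<double>>& neuronLayers,
               std::vector<std::vector<double>>& weightLayers,
               std::vector<std::vector<double>>& biasLayers) {
    const std::size_t curSize  = neuronLayers[l].size();
    const std::size_t nextSize = neuronLayers[l + 1].size();
    for (std::size_t i = 0; i < curSize; ++i)
        for (std::size_t j = 0; j < nextSize; ++j)
            weightLayers[l][i * nextSize + j] -= eta * neuronLayers[l + 1][j] * neuronLayers[l][i];
    for (std::size_t j = 0; j < nextSize; ++j)
        biasLayers[l][j] -= eta * neuronLayers[l + 1][j];
}
```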
This project is licensed under the MIT License. See the LICENSE file for details.