
Make your own neural network

This repository is inspired by the book Make Your Own Neural Network. It takes the well-known MNIST dataset of handwritten digits (60,000 training and 10,000 test samples) and uses it to train and validate a fully connected feed-forward neural network. The experiments show the effects of training the network with different parameters and their respective gains.

The goal of this repository is to reproduce the experiments without using IPython.

Tools & Libraries

  • Vagrant (Virtual machine provider)
  • Python (Programming language)
    • Numpy (Math library)
    • Matplotlib (Data visualization library)
  • FFmpeg + ImageMagick (Encoders)
  • Atom (IDE)

Experiments

The code is split into five experiments, and each experiment into three sections. A main function allows these to be called individually, and their results are stored as binaries or images. The core of the experiments consists of the DataPreparer and NeuralNetwork classes. The DataPreparer reads the dataset and returns each individual record as input for the neural network. The NeuralNetwork represents the feed-forward neural network, which consists of 784 input nodes, 100 hidden nodes and 10 output nodes. The activation function is the sigmoid, and the initial weights are generated by sampling random values from a normal distribution.
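The network described above can be sketched as follows. This is a minimal illustration assuming the standard backpropagation scheme from the book; the class and method names are hypothetical, not necessarily those used in the repository.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, n_input=784, n_hidden=100, n_output=10, lr=0.3):
        self.lr = lr
        # Initial weights sampled from a normal distribution, scaled by
        # the inverse square root of the number of incoming links.
        self.w_ih = np.random.normal(0.0, n_input ** -0.5, (n_hidden, n_input))
        self.w_ho = np.random.normal(0.0, n_hidden ** -0.5, (n_output, n_hidden))
        self.sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    def query(self, inputs):
        # Forward pass: input -> hidden -> output, sigmoid at each layer.
        inputs = np.array(inputs, ndmin=2).T
        hidden = self.sigmoid(self.w_ih @ inputs)
        return self.sigmoid(self.w_ho @ hidden)

    def train(self, inputs, targets):
        inputs = np.array(inputs, ndmin=2).T
        targets = np.array(targets, ndmin=2).T
        hidden = self.sigmoid(self.w_ih @ inputs)
        outputs = self.sigmoid(self.w_ho @ hidden)
        # Errors: output error, then error propagated back to the hidden
        # layer through the transposed weight matrix.
        out_err = targets - outputs
        hid_err = self.w_ho.T @ out_err
        # Gradient-descent weight updates for the sigmoid activation.
        self.w_ho += self.lr * (out_err * outputs * (1 - outputs)) @ hidden.T
        self.w_ih += self.lr * (hid_err * hidden * (1 - hidden)) @ inputs.T
```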

Experiment 1 - A training run with fixed values

The first experiment is a single training run with a fixed learning rate. All training images are shown to the network once with a learning rate of 0.3. After that, the trained neural network is used to classify the test images, resulting in 95% accuracy.
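The preparation of each record can be sketched as below. The CSV layout (label followed by 784 pixel values) and the rescaling constants are assumptions taken from the book, and the function name is illustrative.

```python
import numpy as np

def prepare_record(line, n_outputs=10):
    # One CSV line: "label,p0,p1,...,p783" with pixel values 0-255.
    values = line.split(',')
    # Rescale pixels into (0.01, 1.0] so no input is exactly zero,
    # which would suppress the corresponding weight updates.
    inputs = (np.asarray(values[1:], dtype=float) / 255.0 * 0.99) + 0.01
    # Targets: 0.99 for the correct digit, 0.01 elsewhere, keeping the
    # targets away from the sigmoid's unreachable asymptotes at 0 and 1.
    targets = np.full(n_outputs, 0.01)
    targets[int(values[0])] = 0.99
    return inputs, targets
```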

Experiment 2 - The influence of the learning rate

The second experiment trains the neural network with different learning rates. The expected behaviour is that a small learning rate will prevent the neural network from reaching a proper generic answer, while a high learning rate will cause it to overshoot one. This is reflected in the graph below: the smallest learning rate only achieves a score of 91%, while the larger learning rates cause the score to deteriorate even more. The optimal learning rate during this run seems to be 0.18.

[Graph: accuracy per learning rate]
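The sweep behind this graph amounts to a simple loop. In this sketch, `train_and_score` is a hypothetical helper that trains a fresh network at the given learning rate and returns its accuracy on the test set.

```python
def sweep_learning_rates(train_and_score, rates):
    # Train one fresh network per learning rate and record its score.
    scores = {lr: train_and_score(lr) for lr in rates}
    # The best rate is the one with the highest test accuracy.
    best = max(scores, key=scores.get)
    return best, scores
```

With a dummy scorer peaking at 0.18, the sweep picks out that rate, mirroring the result above.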

Experiment 3 - Training multiple rounds

The third experiment trains the neural network with a fixed learning rate of 0.1, but each training image is shown multiple times. The expected behaviour is that the score will increase up until the network starts to overfit the problem. The goal of the neural network is to achieve a generic answer, not to memorize the training data. The graph below shows this: the score increases after each successive epoch, but after only 5 epochs it starts to deteriorate slowly, indicating that the neural network is overfitting the training data.

[Graph: accuracy per epoch]
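The epoch loop can be sketched as below. `train_one` and `evaluate` are hypothetical callbacks: the first updates the network on a single record, the second returns the current test accuracy, giving one score per epoch for the graph.

```python
def train_epochs(train_one, evaluate, records, n_epochs):
    # Show the full training set n_epochs times, scoring after each pass.
    history = []
    for _ in range(n_epochs):
        for inputs, targets in records:
            train_one(inputs, targets)
        history.append(evaluate())
    return history
```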

Experiment 4 - Learning rates vs Epochs

The fourth experiment is a combination of experiments 2 & 3. It pits the learning rates against the epochs while training a neural network. This reiterates the results from the previous experiments with more data and a dynamic visualization.

  • With a learning rate of 0.3, the score does not increase with the number of epochs as it does for the lower learning rates, showing that the neural network cannot reach a more generic answer with these settings.
  • The smallest learning rate of 0.025 achieves the highest score of all learning rates, but requires many epochs to get there.

Improving on the highest score of 97.3% with only these two parameters is still possible. Judging from the gains, there is still room for improvement by lowering the learning rate further and increasing the number of epochs, but the increase in accuracy achieved this way is probably not worth the computational power required.

[Animation: accuracy per epoch for each learning rate]
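Combining the two sweeps is a straightforward grid. As before, `train_and_score` is a hypothetical helper, here taking both a learning rate and an epoch count and returning test accuracy.

```python
import itertools

def grid_search(train_and_score, rates, max_epochs):
    # One score per (learning rate, epoch count) combination, covering
    # the same grid as experiments 2 and 3 in a single run.
    return {(lr, ep): train_and_score(lr, ep)
            for lr, ep in itertools.product(rates, range(1, max_epochs + 1))}
```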

Experiment 5 - Inverse query (estimation)

The fifth and last experiment is an attempt at visualizing the neural network. Since the dot product used in the neural network isn't reversible, the exact input cannot be reconstructed. Instead, an estimate is used based on a weighted response of the input nodes. The following ten images show the response of the best neural network for each individual label. Some show a resemblance to the actual digit, but most leave it to the imagination.

[Images: estimated inputs for digits 0 - 9]
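The inverse query can be sketched as below, assuming the backquery scheme from the book: an idealized output is pushed backwards through the transposed weight matrices with the inverse sigmoid (the logit). Because the dot products discard information, each layer's signal is rescaled back into (0.01, 0.99) before inverting, so the result is only an estimate.

```python
import numpy as np

def inverse_query(w_ih, w_ho, label, n_outputs=10):
    logit = lambda y: np.log(y / (1.0 - y))          # inverse sigmoid
    # Squash an arbitrary signal back into (0.01, 0.99) so the logit
    # stays finite; this is where exactness is lost.
    rescale = lambda x: ((x - x.min()) / (x.max() - x.min())) * 0.98 + 0.01
    # Idealized output: 0.99 at the chosen label, 0.01 elsewhere.
    outputs = np.full(n_outputs, 0.01)
    outputs[label] = 0.99
    hidden = rescale(w_ho.T @ logit(outputs))
    return w_ih.T @ logit(hidden)  # one response value per input pixel
```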
