Tutorial
A neural network can learn a mapping from (x, y) coordinates to pixel brightness. To get an intuitive understanding of how that would work, feel free to visit http://playground.tensorflow.org and play around.
Let's use the following greyscale image as an example:

```python
image = [
    [0, 130, 255],
    [40, 170, 255],
    [80, 210, 255]
]
```

In this image, the top-left pixel is black, the center pixel is grey and the rightmost pixels are white.
The image can be represented as a mapping from (x, y) coordinates to pixel brightness values, like this:
```
(0, 0) => 0
(1, 0) => 130
(2, 0) => 255
(0, 1) => 40
(1, 1) => 170
(2, 1) => 255
(0, 2) => 80
(1, 2) => 210
(2, 2) => 255
```
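To make that mapping concrete, here's a tiny sketch (not part of the tutorial's own code) that prints exactly these pairs from the nested list:

```python
image = [
    [0, 130, 255],
    [40, 170, 255],
    [80, 210, 255]
]

# y is the row index (top to bottom), x is the column index (left to right)
for y, row in enumerate(image):
    for x, brightness in enumerate(row):
        print('({}, {}) => {}'.format(x, y, brightness))
```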
A neural network can be viewed as a function approximator that can learn this mapping. However, first we need to apply some pre-processing to the data. Ideally, the input and output should be centered around zero and have unit variance. In other words, a series of numbers like [-1, 0, 1] is fine, but a series like [0, 130, 255] is not. For now, it's good enough to simply scale all our values so they lie between 0 and 1. That means we'll
- divide x coordinates by the image width
- divide y coordinates by the image height
- divide pixel brightness values by 255
Here's some code to represent our little image and preprocess the data:

```python
import numpy as np

image = [
    [0, 130, 255],
    [40, 170, 255],
    [80, 210, 255]
]

# Scale brightness values from [0, 255] down to [0, 1]
image = np.array(image)
image = np.divide(image, 255.0)

# numpy shapes are (rows, columns), i.e. (height, width)
image_height, image_width = image.shape
print('Image with shape {}:'.format(image.shape))
print(image)

x = []
y = []
for i in range(image_height):      # i is the row (vertical) index
    for j in range(image_width):   # j is the column (horizontal) index
        x.append(
            [i / image_height, j / image_width]
        )
        y.append(
            [image[i][j]]
        )
x = np.array(x)
y = np.array(y)

print('\nScaled coordinates (input):')
print(x)
print('\nScaled pixel brightness values (output):')
print(y)
```

Go ahead and run the code. All good? Great! Let's move on and actually train a neural network on this data.
For now, let's use a tiny neural network: two input nodes (the scaled x and y coordinates), a single hidden layer with five nodes, and one output node (the predicted brightness).
The data flows from input to output. In each node, two things happen: 1) each incoming signal is multiplied by a weight and the weighted signals are summed, and 2) that sum is passed through an activation function. This function can be as simple as a rectifier, f(x) = max(0, x). This function is actually quite popular, and is typically referred to as "ReLU" (Rectified Linear Unit). The output of the activation function becomes the output of the node.
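As an illustration, here's what a single node's computation looks like in numpy. The weights and bias are made-up values, not anything the network would actually learn:

```python
import numpy as np

def relu(z):
    # The rectifier: f(x) = max(0, x)
    return np.maximum(0.0, z)

def node(inputs, weights, bias):
    # 1) weighted sum of the incoming signals, 2) pass it through the activation
    return relu(np.dot(inputs, weights) + bias)

# Made-up example: two inputs, two weights, one bias
print(node(np.array([0.5, 0.2]), np.array([0.8, -0.4]), 0.1))  # ~0.42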
Using the Keras library, the following code corresponds to the architecture described above:

```python
from keras.models import Sequential
from keras.layers import Activation, Dense

model = Sequential()
model.add(Dense(5, input_dim=2))   # hidden layer: 5 nodes, 2 inputs each
model.add(Activation('relu'))
model.add(Dense(1))                # output layer: a single brightness value
model.add(Activation('relu'))
```

When it comes to training this model, we need to define a loss function and an optimizer. Read up on those if you want, but don't worry too much about the details.
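If you want to sanity-check the architecture first, Keras can print a summary of the model:

```python
model.summary()  # should report 21 trainable parameters: 2*5 + 5 = 15 hidden, 5 + 1 = 6 output
```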
```python
model.compile(loss='mean_squared_error', optimizer='sgd')
```
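If you're curious what the mean squared error loss actually computes, here's a hand-rolled sketch with made-up values:

```python
import numpy as np

y_true = np.array([0.0, 0.5, 1.0])   # made-up targets
y_pred = np.array([0.1, 0.4, 0.8])   # made-up predictions

# Mean squared error: the average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ~0.02
```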
To train the model on the neatly prepared data, do

```python
model.fit(x, y, epochs=500)
```

epochs=500 means the training process will sweep over the data 500 times. The loss should go down from roughly 0.3-0.5 at the start to somewhere around 0.05-0.1.
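If you'd like to inspect the loss yourself, fit returns a history object, so you can capture it when you call it:

```python
history = model.fit(x, y, epochs=500)
print(history.history['loss'][0])   # loss after the first epoch
print(history.history['loss'][-1])  # loss after the last epoch
```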
To see what the neural network learned, do

```python
predicted_image = model.predict(x, verbose=False).reshape(image.shape)
print('\nPredicted image:')
print(predicted_image)
```

The result may not be perfect, but at least it's a start.
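Since the brightness values were divided by 255 during preprocessing, you can scale the predictions back up to compare them directly with the original image; a small sketch:

```python
# Undo the earlier division by 255 to get back to 0-255 brightness values
print(np.round(predicted_image * 255).astype(int))
```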
The code from this mini-tutorial is available here: tutorial.py

When you're ready to move on, take a look at the assignments.