
Re-Implementation of Paper #12

@snapo


Hi,
First of all, thanks a lot for the paper you released on arXiv.
I tried to implement it with a low threshold and much simpler data (just for testing):

import torch
import torch.nn as nn
import torch.optim as optim

# Define the Forward Forward model
def forward_forward_model(input_dim, hidden_dim, num_layers):
    layers = [nn.Linear(input_dim if i == 0 else hidden_dim, hidden_dim) for i in range(num_layers)]
    return nn.ModuleList(layers)

# Define the training loop
def train(model, inputs, labels, optimizer, criterion, thresholds, device):
    epoch_loss = 0
    for input_data, label in zip(inputs, labels):  # label is currently not used inside the loop
        input_data = input_data.unsqueeze(0).to(device)  # Add batch dimension and move to device
        layer_outputs = input_data
        layer_losses = []
        for layer, threshold in zip(model, thresholds):
            # "positive" pass: penalize the squared distance of the activations from the threshold
            pos_outputs = layer(layer_outputs)
            pos_loss = torch.pow(pos_outputs - threshold, 2).mean()
            # "negative" pass: same layer and same input, penalized in the same way
            neg_outputs = layer(layer_outputs)
            neg_loss = torch.pow(threshold - neg_outputs, 2).mean()
            layer_losses.append(pos_loss + neg_loss)
            layer_outputs = pos_outputs  # feed this layer's output into the next layer
        loss = sum(layer_losses)  # total loss over all layers for this sample
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(inputs)

# Generate random binary inputs and labels
def generate_data(num_samples, input_dim):
    inputs = torch.randint(0, 2, (num_samples, input_dim), dtype=torch.float)
    labels = (inputs.sum(dim=1) > input_dim // 2).float().unsqueeze(1)
    return inputs, labels

# Set hyperparameters
input_dim = 12
hidden_dim = 24
num_layers = 4
#thresholds = [0.1, 0.5, 1.0, 2.0] # Set threshold for each layer
thresholds = [0.005, 0.005, 0.005, 0.005] # Set threshold for each layer
#thresholds = list(reversed(thresholds))  # comment to not use reversed thresholds
learning_rate = 0.01
num_epochs = 12
num_samples = 2000

# Generate random data
inputs, labels = generate_data(num_samples, input_dim)

# Check if CUDA GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Initialize the model and optimizer
model = forward_forward_model(input_dim, hidden_dim, num_layers)
model.to(device)  # Move the model to the device
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss()  # passed to train() but currently unused there

# Train the model
for epoch in range(num_epochs):
    train_loss = train(model, inputs, labels, optimizer, criterion, thresholds, device)
    print(f'Epoch: {epoch+1:02}, Train Loss: {train_loss:.3f}')

The test data consists of lists of random 0s and 1s; a list that contains more 1s than 0s counts as good data.
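
For completeness, here is a quick way to eyeball the generated data (just a sanity-check sketch, reusing generate_data, input_dim and labels from the script above):

sample_inputs, sample_labels = generate_data(5, input_dim)
print(sample_inputs)          # rows of random 0s and 1s
print(sample_labels)          # 1.0 where a row has more 1s than 0s, else 0.0
print(labels.mean().item())   # fraction of "good" samples in the full dataset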

What I somehow don't get is the exact effect of the threshold, and also the spikes in the loss. Especially the spikes to a higher loss in the middle of training make no sense to me. My current reading of the threshold's role in the paper is sketched below.
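
For context, this is how I currently understand the threshold in the paper (only my own sketch of the goodness idea; it may not match the paper in every detail, and it is not what my code above does):

import torch
import torch.nn.functional as F

def layer_goodness_loss(pos_act, neg_act, threshold):
    # goodness = sum of squared activations of one layer
    g_pos = pos_act.pow(2).sum(dim=1)   # goodness on positive (real) samples
    g_neg = neg_act.pow(2).sum(dim=1)   # goodness on negative (fake) samples
    # push positive goodness above the threshold and negative goodness below it
    loss_pos = F.softplus(threshold - g_pos).mean()
    loss_neg = F.softplus(g_neg - threshold).mean()
    return loss_pos + loss_neg

If that reading is right, the threshold sets a target level for the summed squared activations of a layer, rather than a target value for each individual output.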

Here is a sample output:

Epoch: 01, Train Loss: 0.007
Epoch: 02, Train Loss: 0.002
Epoch: 03, Train Loss: 0.040
Epoch: 04, Train Loss: 0.001
Epoch: 05, Train Loss: 0.000
Epoch: 06, Train Loss: 0.001
Epoch: 07, Train Loss: 0.002
Epoch: 08, Train Loss: 0.029
Epoch: 09, Train Loss: 0.000
Epoch: 10, Train Loss: 0.000
Epoch: 11, Train Loss: 0.001
Epoch: 12, Train Loss: 0.003

As you can see, in epoch 3 it suddenly spikes very high, as if something unexpected happened. Do you have any input on what I might be doing wrong? Or is there something wrong with my code for re-evaluating the paper?
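
In case it helps with debugging, this is the extra per-epoch logging I plan to add to narrow down where the spike comes from (only a sketch that mirrors the loss terms in train(); per_layer_total and worst_sample_loss are names I made up for illustration):

with torch.no_grad():
    per_layer_total = [0.0 for _ in model]
    worst_sample_loss = 0.0
    for input_data in inputs:
        x = input_data.unsqueeze(0).to(device)
        sample_loss = 0.0
        for i, (layer, threshold) in enumerate(zip(model, thresholds)):
            out = layer(x)
            # same pos + neg terms as in train(), accumulated per layer and per sample
            layer_loss = (torch.pow(out - threshold, 2).mean()
                          + torch.pow(threshold - out, 2).mean()).item()
            per_layer_total[i] += layer_loss
            sample_loss += layer_loss
            x = out
        worst_sample_loss = max(worst_sample_loss, sample_loss)
    print("mean loss per layer:", [t / len(inputs) for t in per_layer_total])
    print("worst single-sample loss:", worst_sample_loss)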

The reason I chose different data to try it on is to be able to search much faster over hyperparameters and their effects.
