Hi,
First of all, thanks a lot for the paper you released on arXiv.
I tried to implement it with a low threshold and much simpler data (just for testing):
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Define the Forward-Forward model: a stack of linear layers
def forward_forward_model(input_dim, hidden_dim, num_layers):
    layers = [nn.Linear(input_dim if i == 0 else hidden_dim, hidden_dim) for i in range(num_layers)]
    return nn.ModuleList(layers)

# Define the training loop
def train(model, inputs, labels, optimizer, criterion, thresholds, device):
    epoch_loss = 0
    for input_data, label in zip(inputs, labels):
        input_data = input_data.unsqueeze(0).to(device)  # Add batch dimension and move to device
        layer_outputs = input_data
        layer_losses = []
        for layer, threshold in zip(model, thresholds):
            pos_outputs = layer(layer_outputs)
            pos_loss = torch.pow(pos_outputs - threshold, 2).mean()
            neg_outputs = layer(layer_outputs)
            neg_loss = torch.pow(threshold - neg_outputs, 2).mean()
            layer_losses.append(pos_loss + neg_loss)
            layer_outputs = pos_outputs
        loss = sum(layer_losses)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(inputs)

# Generate random binary inputs and labels
def generate_data(num_samples, input_dim):
    inputs = torch.randint(0, 2, (num_samples, input_dim), dtype=torch.float)
    labels = (inputs.sum(dim=1) > input_dim // 2).float().unsqueeze(1)
    return inputs, labels

# Set hyperparameters
input_dim = 12
hidden_dim = 24
num_layers = 4
#thresholds = [0.1, 0.5, 1.0, 2.0]  # Set threshold for each layer
thresholds = [0.005, 0.005, 0.005, 0.005]  # Set threshold for each layer
#thresholds = list(reversed(thresholds))  # comment to not use reversed thresholds
learning_rate = 0.01
num_epochs = 12
num_samples = 2000

# Generate random data
inputs, labels = generate_data(num_samples, input_dim)

# Check if a CUDA GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Initialize the model and optimizer
model = forward_forward_model(input_dim, hidden_dim, num_layers)
model.to(device)  # Move the model to the device
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss()

# Train the model
for epoch in range(num_epochs):
    train_loss = train(model, inputs, labels, optimizer, criterion, thresholds, device)
    print(f'Epoch: {epoch+1:02}, Train Loss: {train_loss:.3f}')
```
The test data consists of lists of random 0s and 1s; if a list contains more 1s than 0s, it counts as good data.
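On that note, here is roughly how I would split the generated samples into positive and negative data based on those labels. This is just my assumption of how the paper intends positive/negative data to be used, and it is not something my code above does yet:

```python
# Hypothetical helper (not used in my code above): split the generated samples
# into positive and negative data based on the labels.
def split_pos_neg(inputs, labels):
    mask = labels.squeeze(1).bool()  # True where the sample counts as "good" data
    pos_data = inputs[mask]          # samples with more 1s than 0s
    neg_data = inputs[~mask]         # the remaining samples
    return pos_data, neg_data
```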
What I somehow don't get is the exact effect of the threshold, and also the spikes in the loss. In particular, the jumps to a higher loss in the middle of training make no sense to me at all. (I have sketched my current understanding of the threshold at the very end of this issue.)
Here is a sample output:
```
Epoch: 01, Train Loss: 0.007
Epoch: 02, Train Loss: 0.002
Epoch: 03, Train Loss: 0.040
Epoch: 04, Train Loss: 0.001
Epoch: 05, Train Loss: 0.000
Epoch: 06, Train Loss: 0.001
Epoch: 07, Train Loss: 0.002
Epoch: 08, Train Loss: 0.029
Epoch: 09, Train Loss: 0.000
Epoch: 10, Train Loss: 0.000
Epoch: 11, Train Loss: 0.001
Epoch: 12, Train Loss: 0.003
```
As you can see, in epoch 3 the loss suddenly spikes, as if something unexpected happened. Do you have any input on what I might be doing wrong? Or is there something wrong with my code as an attempt to re-evaluate the paper?
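To try to track down where the spike comes from, I am planning to also log the loss per layer, something like this (a hypothetical addition that just mirrors my train loop above; it is not in the code I posted):

```python
# Hypothetical diagnostic (not in my code above): accumulate each layer's loss
# separately over an epoch, so a spike can be attributed to a specific layer.
with torch.no_grad():
    per_layer_totals = [0.0] * num_layers
    for input_data, label in zip(inputs, labels):
        layer_outputs = input_data.unsqueeze(0).to(device)
        for i, (layer, threshold) in enumerate(zip(model, thresholds)):
            pos_outputs = layer(layer_outputs)
            pos_loss = torch.pow(pos_outputs - threshold, 2).mean()
            neg_outputs = layer(layer_outputs)
            neg_loss = torch.pow(threshold - neg_outputs, 2).mean()
            per_layer_totals[i] += (pos_loss + neg_loss).item()
            layer_outputs = pos_outputs
    # average per-layer loss over the epoch
    print([total / len(inputs) for total in per_layer_totals])
```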
The reason I chose different data to test on is to be able to search over hyperparameters and their effects much faster.
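As mentioned above, here is a rough sketch of how I currently understand the per-layer objective from the paper, with the "goodness" being the sum of squared activations compared against the threshold. This is only my own reading (and close to how I have seen other re-implementations phrase it), so please correct me if it is not what the paper actually intends:

```python
import torch
import torch.nn.functional as F

# My reading of the per-layer objective (an assumption, not copied from the paper):
# goodness = sum of squared activations; positive data should end up with goodness
# above the threshold, negative data with goodness below it.
def layer_loss(pos_activations, neg_activations, threshold):
    pos_goodness = pos_activations.pow(2).sum(dim=1)  # goodness of positive samples
    neg_goodness = neg_activations.pow(2).sum(dim=1)  # goodness of negative samples
    # softplus pushes positive goodness above and negative goodness below the threshold
    return (F.softplus(threshold - pos_goodness).mean()
            + F.softplus(neg_goodness - threshold).mean())
```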