AttentionSplit: A Hybrid Neural Network Architecture for Long- and Short-Term Retention

AttentionSplit is a hybrid neural network architecture designed to enhance long- and short-term retention in temporal data tasks.

This project also includes OrthAdam, a novel Adam optimizer variant inspired by AdamP, aimed at improving convergence and maximum achievable accuracy. The framework has been tested across reinforcement learning, image classification, and sequence prediction tasks.

For full details, refer to our Thesis (Thesis.pdf in the repository root), which describes the theory, experiments, and evaluation in depth.

Features

AttentionSplit Module: Combines recurrence (like LSTMs) with attention mechanisms to improve temporal representation.
OrthAdam Optimizer: A modified Adam variant with changed projection criteria, intended to improve optimization performance.
Flexible Testing: Supports OpenAI Gymnasium classic control, MuJoCo continuous control, and image classification datasets.
Reproducibility: Includes example scripts and logging to allow replication of experiments.
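
OrthAdam is described above as inspired by AdamP, with changes to the projection criteria. For orientation, the sketch below shows the projection step as published for AdamP, simplified to a single flat weight vector; it is not OrthAdam's rule, whose modified criteria are described in the Thesis. The function name project_update and the helper names are illustrative and do not come from this repository. AdamP also applies the check per channel and per layer and rescales weight decay, which is omitted here.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a list of floats."""
    return math.sqrt(dot(a, a))


def project_update(weights, grad, update, delta=0.1, eps=1e-8):
    """AdamP-style projection for one flat weight vector (illustrative only).

    If the gradient is nearly orthogonal to the weights, i.e. their absolute
    cosine similarity is below delta / sqrt(dim), the component of the update
    that points along the weights is removed. Otherwise the update is
    returned unchanged. Returns (new_update, projection_applied).
    """
    w_norm = norm(weights)
    g_norm = norm(grad)
    cosine = abs(dot(weights, grad)) / (w_norm * g_norm + eps)
    if cosine >= delta / math.sqrt(len(weights)):
        return list(update), False
    # Unit vector along the weights; subtract the radial part of the update.
    unit = [w / (w_norm + eps) for w in weights]
    radial = dot(unit, update)
    return [u - radial * n for u, n in zip(update, unit)], True


# Gradient orthogonal to the weights: the radial part of the update is removed.
weights = [1.0, 0.0, 0.0, 0.0]
update = [0.5, 0.5, 0.0, 0.0]
projected, applied = project_update(weights, [0.0, 1.0, 0.0, 0.0], update)

# Gradient parallel to the weights: the update passes through unchanged.
unchanged, applied_parallel = project_update(weights, [1.0, 0.0, 0.0, 0.0], update)
```

In the first call the projected update is (numerically) orthogonal to the weights, so applying it does not grow the weight norm to first order. That is the effect AdamP targets for scale-invariant weights.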

Installation

Install Python dependencies:

pip install -r requirements.txt

Install the MuJoCo physics engine (required for certain RL environments):

https://github.com/openai/mujoco-py?tab=readme-ov-file

Usage

All experiments can be found in the playground folder. You can run individual tests or use the main runner:

# Template: run an individual test (the available options are defined in each test file)
python playground/<test_name>/test_<test_name>.py [OPTIONS]

Example using OrthAdam in a training loop:

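import torch
import torch.nn.functional as F

from modules.Optimizer import OrthAdam
from modules.Attentionsplit import AttentionSplitModule

model = AttentionSplitModule(input_dim=32, hidden_dim=64, output_dim=10)
optimizer = OrthAdam(model.parameters(), lr=0.001)

# Dummy batch: 16 samples with 32 features each, plus random class labels
inputs = torch.randn(16, 32)
targets = torch.randint(0, 10, (16,))

# Forward pass, scalar loss, and one optimization step
# (assumes the module returns logits of shape (16, 10))
optimizer.zero_grad()
outputs = model(inputs)
loss = F.cross_entropy(outputs, targets)
loss.backward()
optimizer.step()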

Datasets and Environments

OpenAI Gymnasium – Classic Control: CartPole, MountainCar
OpenAI MuJoCo – Continuous Control: HalfCheetah, Walker2D
Image Classification: CIFAR-10, CIFAR-100, FashionMNIST
Sequence Prediction: Custom temporal datasets
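
The custom temporal datasets are not specified here. As a point of reference only, a common way to build a sequence prediction dataset is to slide a fixed-length window over a series and predict a later value. The sketch below assumes that setup; the function make_windows, the sine-wave series, and the window length are hypothetical and are not taken from this repository.

```python
import math


def make_windows(series, window, horizon=1):
    """Split a 1-D series into (input window, target) pairs.

    Each input is `window` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Hypothetical toy series: 50 samples of a sine wave.
series = [math.sin(0.1 * t) for t in range(50)]
pairs = make_windows(series, window=8)
```

Longer windows and horizons make the task depend more on long-term retention, while short windows mostly probe short-term structure.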

Testing and Benchmarking

You can explore tests and benchmarking using the provided scripts in the playground folder. For options:

python playground/run.py --help

Results Overview

Experiments indicate:

OrthAdam achieves higher maximal accuracy than standard Adam and AdamP across multiple image classification tasks.
AttentionSplit shows improved performance for temporal data analysis compared to standard LSTM or pure attention models in some RL environments.

Refer to Section 9 of the Thesis for detailed results, graphs, and evaluation metrics.

Future Work

Scale AttentionSplit to NLP and video sequence tasks.
Optimize OrthAdam further for large-scale RL environments.
Explore multi-modal datasets and hybrid training regimes.

License

This project is released under the MIT License.
