A modular and extensible codebase for emitter classification using contrastive learning approaches.
- Clone the repository:

  ```bash
  git clone <repository-url>  # repo link to be added
  cd Emitter_Classification_Experiments
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure your dataset files are in the dataset/ directory: set1.xls, set2.xls, set3.xlsx, set5.xlsx, set6.xlsx
The codebase provides several training scripts for different approaches. Training can be run in two ways:

- Using a dedicated per-architecture script:

  ```bash
  torchrun --nproc_per_node=NUM_GPUS train_[architecture].py
  ```

- Using train_flexible.py with explicit model and loss flags:

  ```bash
  torchrun --nproc_per_node=NUM_GPUS train_flexible.py --model ft_transformer --loss triplet
  ```

  (in this example, we are using ft_transformer with triplet loss)
The dedicated training scripts are:

```bash
torchrun --nproc_per_node=NUM_GPUS train_triplet.py
torchrun --nproc_per_node=NUM_GPUS train_dual_encoder.py
torchrun --nproc_per_node=NUM_GPUS train_supcon.py
torchrun --nproc_per_node=NUM_GPUS train_ft_transformer.py
```

Replace NUM_GPUS with the number of GPUs you want to use.
All hyperparameters and settings are managed in src/config.py. The module provides:
- DataConfig: Data paths, column definitions, and train/test splits
- ModelConfig: Model architecture parameters
- TrainingConfig: Training hyperparameters and settings
- ExperimentConfig: Complete experiment configuration
Predefined configurations are available for different experiments:
- get_triplet_config()
- get_dual_encoder_config()
- get_supcon_config()
- get_ft_transformer_config()
- get_ft_triplet_config()
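A minimal sketch of loading one of these, assuming each getter returns a populated ExperimentConfig (the attribute names mirror the custom-experiment example below):

```python
from src.config import get_triplet_config

# Start from the predefined triplet experiment and override a field
config = get_triplet_config()
config.training.EPOCHS = 100
```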
```bash
torchrun --nproc_per_node=1 train_flexible.py \
    --model ft_transformer \
    --loss triplet \
    --embed_dim 192 \
    --layers 6 \
    --heads 8 \
    --margin 0.3
```
```bash
torchrun --nproc_per_node=1 train_flexible.py \
    --model ft_transformer \
    --loss supcon \
    --embed_dim 128 \
    --temperature 0.07
```

```bash
torchrun --nproc_per_node=1 train_flexible.py \
    --model ft_transformer \
    --loss semi_hard \
    --embed_dim 256 \
    --margin 0.5
```

You can now use FTTransformer with:

- triplet: Standard triplet loss
- semi_hard: Semi-hard triplet loss
- infonce: InfoNCE contrastive loss
- supcon: Supervised contrastive loss
- ntxent: NT-Xent loss
- center: Center loss
Check FLEXIBLE_TRAINING.md for:
- Complete argument reference
- Advanced usage examples
- Tips for hyperparameter tuning
- Multi-GPU training setup
The script automatically handles the different data loaders and training loops needed for each loss type, so you can focus on experimenting with different combinations!
To create a custom experiment:

- Modify the configuration in src/config.py or create a new config function
- Use the modular components to build your training pipeline
- Create a new training script following the existing patterns
Example:

```python
from src.config import ExperimentConfig
from src.data.processor import DataProcessor
from src.models.factory import get_model
from src.losses.factory import get_loss

# Create custom config
config = ExperimentConfig()
config.training.EPOCHS = 150
config.model.EMBED_DIM = 256

# Use modular components
data_processor = DataProcessor(config.data)
model = get_model('ft_transformer', num_feats=5, dim=256)  # kwargs per the factory examples below
criterion = get_loss('ntxent', temperature=0.1)
```

- DataProcessor: Handles loading and preprocessing of Excel files
- Dataset Classes:
  - TripletPDWDataset: For triplet loss training
  - PairPDWDataset: For contrastive learning
  - EmitterDataset: For supervised contrastive learning
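As an illustrative sketch of wiring one of these into a DataLoader (note that the module path and constructor signature here are assumptions; check src/data/ for the real ones):

```python
import torch
from torch.utils.data import DataLoader
from src.data.datasets import TripletPDWDataset  # module path is an assumption

# Dummy PDW features (5 columns) and emitter labels, purely for illustration
features = torch.randn(1000, 5)
labels = torch.randint(0, 10, (1000,))

dataset = TripletPDWDataset(features, labels)  # constructor signature assumed
loader = DataLoader(dataset, batch_size=256, shuffle=True)
```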
The models are organized in separate files for better maintainability:
- residual.py: Residual network with skip connections
  - EmitterEncoder: Main residual network encoder
  - ResidualBlock: Individual residual blocks
- ft_transformer.py: Feature Tokenizer Transformer for tabular data
  - FTTransformer: Standard FT-Transformer implementation
- deep_ft_transformer.py: Deeper variant with more layers
  - DeepFTTransformer: Deep FT-Transformer with 6+ layers
- factory.py: Model factory for easy instantiation
  - get_model(): Factory function to create models
  - AVAILABLE_MODELS: Dictionary of all available models
The losses are organized in separate files for better maintainability:
- triplet.py: Triplet-based loss functions
  - TripletMarginLoss: Standard triplet loss
  - SemiHardTripletLoss: Semi-hard negative mining
- contrastive_losses.py: Contrastive learning losses
  - InfoNCELoss: InfoNCE for contrastive learning
  - SupConLoss: Supervised contrastive loss
  - NTXentLoss: Numerically stable NT-Xent
- center_loss.py: Clustering loss
  - CenterLoss: Center loss for clustering
- factory.py: Loss factory for easy instantiation
  - get_loss(): Factory function to create losses
  - AVAILABLE_LOSSES: Dictionary of all available losses
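Since both factories expose registry dictionaries, you can enumerate what is registered at runtime; a minimal sketch using the names listed above:

```python
from src.models.factory import AVAILABLE_MODELS
from src.losses.factory import AVAILABLE_LOSSES

# Print every registered model and loss key
print(sorted(AVAILABLE_MODELS))
print(sorted(AVAILABLE_LOSSES))
```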
- BaseTrainer: Base class with common training functionality
- Specialized Trainers: Custom trainers for specific approaches
- Distributed Training: Multi-GPU support with DDP
- Distributed: Distributed training setup and utilities
- Evaluation: Clustering accuracy and model evaluation
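As a sketch of the standard clustering-accuracy computation (KMeans followed by Hungarian matching), which may differ in detail from this repo's implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def clustering_accuracy(embeddings, labels, n_clusters):
    """Cluster embeddings with KMeans, then find the cluster-to-label
    mapping that maximizes agreement (Hungarian matching)."""
    preds = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    # Contingency matrix: rows are predicted clusters, columns true labels
    cm = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    for p, t in zip(preds, labels):
        cm[p, t] += 1
    rows, cols = linear_sum_assignment(-cm)  # maximize matched counts
    return cm[rows, cols].sum() / len(labels)
```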
To add a new model:

- Create a new file in src/models/ (e.g., custom_model.py)
- Define your model class inheriting from nn.Module
- Add it to src/models/factory.py
- Import it in src/models/__init__.py
Example:
```python
# src/models/custom_model.py
import torch.nn as nn
import torch.nn.functional as F

class CustomModel(nn.Module):
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        # Your model definition (a small MLP encoder as a placeholder)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )

    def forward(self, x):
        # Your forward pass; return L2-normalized embeddings
        return F.normalize(self.net(x), dim=-1)
```
```python
# src/models/factory.py
from .custom_model import CustomModel

def get_model(model_type, **kwargs):
    if model_type == 'custom':
        return CustomModel(**kwargs)
    # ... existing code
```
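Once registered, the model is available through the factory key used above:

```python
from src.models.factory import get_model

model = get_model('custom', in_dim=5, emb_dim=128)
```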
To add a new loss:

- Create a new file in src/losses/ (e.g., custom_loss.py)
- Define your loss class inheriting from nn.Module
- Add it to src/losses/factory.py
- Import it in src/losses/__init__.py
Example:
```python
# src/losses/custom_loss.py
import torch
import torch.nn as nn

class CustomLoss(nn.Module):
    def __init__(self, margin=1.0):
        super().__init__()
        # Your loss definition (a margin hyperparameter as a placeholder)
        self.margin = margin

    def forward(self, anchor, positive, negative):
        # Your loss computation; a simple triplet-style loss as an example
        pos = (anchor - positive).pow(2).sum(dim=-1)
        neg = (anchor - negative).pow(2).sum(dim=-1)
        return torch.clamp(pos - neg + self.margin, min=0).mean()
```
```python
# src/losses/factory.py
from .custom_loss import CustomLoss

def get_loss(loss_type, **kwargs):
    if loss_type == 'custom':
        return CustomLoss(**kwargs)
    # ... existing code
```

You can import specific models and losses directly:
```python
# Import specific models
from src.models.residual import EmitterEncoder
from src.models.ft_transformer import FTTransformer
from src.models.deep_ft_transformer import DeepFTTransformer

# Import specific losses
from src.losses.triplet import TripletMarginLoss, SemiHardTripletLoss
from src.losses.contrastive_losses import InfoNCELoss, SupConLoss, NTXentLoss
from src.losses.center_loss import CenterLoss

# Use directly
model = EmitterEncoder(in_dim=5, emb_dim=128)
criterion = TripletMarginLoss(margin=1.0)
```

The factory functions provide a convenient way to create components:
```python
from src.models.factory import get_model
from src.losses.factory import get_loss

# Create models
residual_model = get_model('residual', in_dim=5, emb_dim=128)
ft_model = get_model('ft_transformer', num_feats=5, dim=192, heads=8, layers=3)
deep_model = get_model('deep_ft', num_feats=5, dim=192, heads=8, layers=6)

# Create losses
triplet_loss = get_loss('triplet', margin=1.0)
semi_hard_loss = get_loss('semi_hard', margin=0.3)
infonce_loss = get_loss('infonce', temperature=0.07)
supcon_loss = get_loss('supcon', temperature=0.07)
ntxent_loss = get_loss('ntxent', temperature=0.1)
center_loss = get_loss('center', num_classes=10, dim=128)
```

Training results are saved in the results/ directory as JSON files containing:
- Embedding dimension
- Test clustering accuracy
- Final loss
- Model and loss type
- Hyperparameters
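For example, you could scan saved runs like this (a sketch; the exact JSON key names are assumptions based on the fields listed above):

```python
import glob
import json

# Summarize every saved run in results/
for path in sorted(glob.glob('results/*.json')):
    with open(path) as f:
        run = json.load(f)
    # Key names are hypothetical; adjust to the actual schema
    print(path, run.get('model_type'), run.get('test_accuracy'))
```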
For detailed usage instructions, troubleshooting, and advanced examples, see:
- Recipies/COOKBOOK.MD: Comprehensive cookbook with all commands and examples
- Recipies/Extend_the_exp.md: Guide for extending experiments with new models and losses
When adding new features:
- Follow the modular structure
- Add appropriate configuration options
- Create unit tests for new components (see the sketch below)
- Update documentation
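For instance, a minimal unit test for a factory-created model might look like this (the test file layout is an assumption; the get_model call matches the factory examples above):

```python
# tests/test_models.py: layout is illustrative
import torch
from src.models.factory import get_model

def test_encoder_embedding_shape():
    model = get_model('residual', in_dim=5, emb_dim=128)
    x = torch.randn(4, 5)
    emb = model(x)
    assert emb.shape == (4, 128)
```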
[Add your license information here]