
Inference entropix#11

Open
tomoqt wants to merge 31 commits into cleaned_tokenized from inference_entropix

Conversation

@tomoqt
Owner

@tomoqt tomoqt commented Mar 5, 2025

Adding:
- support for the Muon optimizer
- a simple Gradio app
- simple inference with beam search, nucleus sampling, and Entropix-like decoding
- GRPO fine-tuning of a saved model
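Of the sampling strategies above, nucleus (top-p) sampling is the easiest to show in isolation. A minimal sketch (not the repository's actual implementation; `nucleus_sample` and its signature are illustrative):

```python
import math
import random

def nucleus_sample(logits, top_p=0.9, temperature=1.0, rng=random):
    """Sample one token id, keeping only the smallest set of tokens
    whose cumulative probability reaches top_p (nucleus sampling)."""
    # Temperature-scaled softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk token ids in descending probability until mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the nucleus and sample.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With a sharply peaked distribution and a small `top_p`, the nucleus collapses to the argmax token, so sampling becomes deterministic.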

- Introduced new functions `load_raw_spectrum_tokens` and `load_raw_ir` for direct spectrum file processing
- Added command-line arguments in `search_and_infer.py` to support raw spectrum inference
- Implemented alternative IR encoder (`Regular1DCNNEncoder`) alongside ConvNeXt
- Updated configuration and model to support configurable IR encoder type
- Added new dependencies: gradio, py3Dmol, gradio_molecule3d
- Enhanced spectral encoder flexibility with `ir_encoder_type` parameter
- Added random sample inference for Gradio app when in test mode
- Simplified SMILES visualization and error handling
- Integrated test dataset sampling for random molecule generation
- Updated model inference to support flexible decoding strategies
- Improved error handling and visualization for predicted molecules
- Implemented Entropix decoding strategy in `inference.py` with entropy-based decision making
- Added new decoding method `entropix_decode` with dynamic layer looping for uncertain predictions
- Updated `INFERENCE_README.md` with comprehensive documentation on Entropix strategy
- Enhanced `test_inference.py` to support Entropix evaluation and metrics comparison
- Added new command-line arguments for Entropix configuration (entropy/varentropy thresholds, max loops)
- Updated model and decoder to support flexible layer looping during inference
- Integrated Entropix into existing decoding strategies with minimal code changes
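The entropy/varentropy gate behind the Entropix bullets can be sketched as follows. This is an assumed shape, not the code in `inference.py`: the function names, the use of natural-log entropy, and the threshold defaults are all illustrative; only the idea (loop extra decoder layers when the next-token distribution is uncertain) comes from the PR.

```python
import math

def entropy_varentropy(probs):
    """Shannon entropy H and varentropy (variance of -log p under p)."""
    ps = [p for p in probs if p > 0]
    logps = [math.log(p) for p in ps]
    h = -sum(p * lp for p, lp in zip(ps, logps))
    v = sum(p * (-lp - h) ** 2 for p, lp in zip(ps, logps))
    return h, v

def should_loop(probs, entropy_threshold=2.0, varentropy_threshold=1.0):
    """Entropix-style decision: trigger extra layer loops only when the
    next-token distribution is uncertain (high entropy or varentropy)."""
    h, v = entropy_varentropy(probs)
    return h > entropy_threshold or v > varentropy_threshold
```

A near-uniform distribution (high entropy) triggers looping; a confidently peaked one does not, so the extra compute is spent only on hard positions.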
…t directory support

- Updated `test_inference.py` to canonicalize SMILES for exact match comparison
- Added `--output_dir` argument to save inference results in a specified directory
- Introduced `canonicalize_smiles()` helper function in `train_autoregressive.py`
- Modified `greedy_decode()` to support optional temperature-based sampling
- Updated `run.sh` to use torchrun with a test configuration
- Improved SMILES decoding and evaluation with canonical SMILES handling
- Increased batch size from 32 to 64 in real_config.yaml
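The "optional temperature-based sampling" added to `greedy_decode()` amounts to a per-step token choice like the sketch below (a hypothetical standalone helper, not the repository's function; the `temperature == 0` convention for pure greedy is an assumption):

```python
import math
import random

def greedy_or_sample(logits, temperature=0.0, rng=random):
    """Pick the next token id: pure argmax when temperature <= 0,
    otherwise sample from the temperature-scaled softmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    m = max(logits)
    exps = [math.exp((l - m) / temperature) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```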
- Implemented mixed precision training with configurable precision (fp32, fp16, bf16)
- Added precision configuration option in training config
- Integrated torch.cuda.amp.autocast for automatic mixed precision
- Added GradScaler for FP16 numerical stability
- Updated greedy decoding and validation to support mixed precision
- Enhanced training loop to handle different precision modes
- Added precision-aware logging and device compatibility checks
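The mixed-precision plumbing described above typically reduces to mapping the `precision` config string to an autocast context plus a `GradScaler`. A minimal sketch under that assumption (`amp_setup` is a hypothetical helper; only fp16 on CUDA actually needs loss scaling):

```python
import contextlib
import torch

def amp_setup(precision: str, device_type: str = "cpu"):
    """Map a precision string ("fp32" | "fp16" | "bf16") to an autocast
    context and a GradScaler for the training loop."""
    dtypes = {"fp16": torch.float16, "bf16": torch.bfloat16}
    if precision in dtypes:
        ctx = torch.autocast(device_type=device_type, dtype=dtypes[precision])
    else:
        ctx = contextlib.nullcontext()  # fp32: run without autocast
    # GradScaler is a no-op unless fp16 on CUDA, where it prevents
    # gradient underflow.
    scaler = torch.cuda.amp.GradScaler(
        enabled=(precision == "fp16" and device_type == "cuda"))
    return ctx, scaler
```

The forward pass and loss then run under `ctx`, and `scaler.scale(loss).backward()` / `scaler.step(optimizer)` / `scaler.update()` wrap the optimizer step; with the scaler disabled those calls degrade gracefully to the unscaled path.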
- Introduced a new decoding strategy 'GREEDY_LOOP' in inference.py, allowing for layer looping during greedy decoding.
- Updated ModelInference class to support the new strategy with a dedicated method for greedy decoding with loops.
- Modified test_inference.py to include 'greedy_loop' in the list of strategies and added testing logic for varying loop counts.
- Enhanced muon optimizer in muon.py with additional parameters for orthogonalization.
- Updated training scripts to accommodate new configuration options for GRPO and adjusted default values in YAML config files.
- Ensured backward compatibility by maintaining existing configurations while adding new features.
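The orthogonalization that the Muon changes parameterize is, at its core, a Newton-Schulz iteration on the gradient matrix. The sketch below uses the classic cubic variant for clarity; the actual `muon.py` (following the public Muon optimizer) uses a tuned quintic polynomial, and the extra parameters mentioned above presumably control the step count and coefficients.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=30):
    """Drive a matrix toward its nearest orthogonal factor with the
    cubic Newton-Schulz iteration X <- 1.5*X - 0.5*(X X^T) X."""
    # Normalize by the Frobenius norm so all singular values are <= 1,
    # which guarantees convergence of the iteration.
    X = G / (np.linalg.norm(G) + 1e-7)
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X
    return X
```

Each step pushes every singular value toward 1 while preserving the singular vectors, so the update direction keeps the gradient's "shape" but equalizes its scale across directions.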
- Increased embed_dim to 2048, num_heads to 16, and adjusted num_layers to 8 in real_config.yaml for improved model capacity.
- Changed training precision to bf16 in real_config.yaml for enhanced performance.
- Updated greedy_decode_frequency to 1000 in real_config.yaml to optimize decoding strategy.
- Modified training precision to fp16 in test_config.yaml for consistency with mixed precision training.
- Changed optimizer type to AdamW in test_config.yaml for better performance with vector parameters.
- Increased batch_size from 32 to 64 in local_config.yaml, real_config_mup.yaml, and real_config.yaml to enhance training efficiency.
- Reduced batch_size from 64 to 32 in real_config.yaml for optimized training.
- Changed precision from fp16 to fp32 and adjusted batch_size from 32 to 20 in test_config.yaml for improved consistency and performance.
- Increased learning_rate from 1.0e-4 to 1.0e-3 and updated min_learning_rate from 1.0e-5 to 1.0e-4 in test_config.yaml to enhance training dynamics.
- Adjusted learning rate calculation in muon.py for improved performance.
- Increased max_samples from 25 to 500 in test_looping.py to allow for more extensive testing.
- Updated batch_size from 16 to 32 in real_config.yaml for enhanced training efficiency.
- Modified loop_range and max_loops in test_config.yaml to support more iterations during testing.
- Changed precision to fp16 and updated batch_size to 32 in test_config.yaml for consistency with training settings.
- Updated optimizer type to muon_mix in test_config.yaml for better handling of matrix parameters.
- Reduced max_samples from 500 to 50 in test_looping.py for more controlled testing.
- Changed use_stablemax from True to False in real_config.yaml to modify model behavior.
- Updated loop_range and max_loops to [0, 0] and 0 respectively in test_config.yaml for limited iterations.
- Added default parameters (learning_rate, epsilon, beta, temperature, log_wandb) in test_config.yaml to enhance training configuration.
…ling

- Changed optimizer type from AdamW to muon_mix to enhance performance with matrix parameters.
- Increased weight_decay from 0.1 to 0.2 for improved regularization.
- Adjusted learning rate for muon optimizer from 1.0e-4 to 2.5e-4 to enhance training dynamics.
- Changed embed_dim from 1600 to 512 and num_heads from 16 to 4 for reduced model complexity.
- Updated training batch_size from 64 to 1024 and learning_rate from 1.0e-4 to 1.0e-3 for improved training dynamics.
- Modified optimizer type to muon_mix and adjusted weight_decay to 0.1 for better parameter handling.
- Added new parameters for GRPO including learning_rate, epsilon, beta, temperature, and logging options.
- Updated save_model_frequency and greedy_decode_frequency for enhanced training control.
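Pulling the GRPO-related bullets together, the new config block plausibly looks like the fragment below. This layout is a guess: only the parameter names come from this PR, the values are illustrative, and the comments reflect the standard meanings of these symbols in GRPO (clipping range, KL weight), not confirmed semantics of this codebase.

```yaml
# Hypothetical layout; parameter names from this PR, values illustrative.
grpo:
  learning_rate: 1.0e-5
  epsilon: 0.2        # clipping range for the policy ratio (assumed)
  beta: 0.04          # KL penalty weight vs. the reference model (assumed)
  temperature: 1.0    # sampling temperature for rollouts (assumed)
  log_wandb: true
```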
- Increased batch_size from 512 to 768 for improved training throughput.
- Adjusted weight_decay from 0.1 to 0.2 to enhance regularization.