A project for generating dynamic images from text (work in progress).
RDTF (Text-Driven Dynamic Image Generation) generates dynamic (animated) images from text descriptions. It does this by fine-tuning a pretrained generative model so that the generated animation follows the text prompt.
The project builds on the i2vgen model and applies LoRA (Low-Rank Adaptation) fine-tuning to adapt it to dynamic image generation. LoRA keeps the base weights frozen and trains only small low-rank update matrices, which makes adaptation efficient while retaining the output quality of the base model.
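To make the efficiency of LoRA concrete, the sketch below compares the number of trainable parameters for a full update of one weight matrix with the number needed for a LoRA update of the form W + (alpha / r) * B A. The layer size and rank are illustrative assumptions, not values taken from RDTF's configuration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 1024, 1024, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA (rank {rank}): {lora} parameters ({100 * lora / full:.2f}% of full)")
```

The saving grows with layer size, because the LoRA count scales with the sum of the two dimensions rather than their product.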
- Text-to-Dynamic-Image: Convert text descriptions into dynamic images with smooth animations
- LoRA Fine-tuning: Efficient model adaptation using LoRA technology
- Customizable Training: Flexible training pipeline for different dynamic image generation tasks
- Easy Integration: Simple invocation process for generating dynamic images
- Python 3.8+
- PyTorch
- diffusers library
- Other dependencies (see requirements.txt)
# Clone the repository
git clone https://github.com/yourusername/RDTF.git
cd RDTF
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
RDTF uses LoRA fine-tuning for optimal performance.
# Run the training script
bash shells/train_multitaskpretrain.sh
You can modify the training parameters in the train_multitaskpretrain.sh script, including:
- Learning rate
- Number of epochs
- Batch size
- Dataset paths
- LoRA rank and alpha values
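One way to keep the tunable values listed above together, and to sanity-check them before copying them into the shell script, is a small settings object. This is only a sketch: every field name and default value is hypothetical and does not reflect the actual variables in train_multitaskpretrain.sh.

```python
from dataclasses import dataclass


@dataclass
class TrainSettings:
    # All names and defaults are hypothetical placeholders.
    learning_rate: float = 1e-4
    num_epochs: int = 10
    batch_size: int = 4
    dataset_path: str = "data/train"
    lora_rank: int = 8
    lora_alpha: int = 16

    def validate(self) -> None:
        """Reject values that cannot describe a valid training run."""
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if min(self.num_epochs, self.batch_size, self.lora_rank) < 1:
            raise ValueError("num_epochs, batch_size and lora_rank must be >= 1")

    @property
    def lora_scaling(self) -> float:
        """Factor applied to the low-rank update in standard LoRA: alpha / rank."""
        return self.lora_alpha / self.lora_rank


settings = TrainSettings()
settings.validate()
print(settings.lora_scaling)
```

The lora_scaling property shows why rank and alpha are usually tuned together: with standard LoRA scaling, raising the rank while keeping alpha fixed shrinks the weight given to the learned update.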
To generate dynamic images using the trained model:
python examples_lora.py
Modify the examples_lora.py file to provide your own text prompts and adjust generation parameters such as:
- Output resolution
- Animation length
- Style parameters
- Sampling steps
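The parameters above can be grouped in a similar way on the generation side. The sketch below is an assumption about how such settings might be organized, not a description of examples_lora.py: the field names, defaults, and the multiple-of-8 size check (common for latent diffusion models, but unverified for this project) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    # Hypothetical names and defaults; examples_lora.py may use different ones.
    prompt: str
    width: int = 512
    height: int = 512
    duration_seconds: float = 2.0
    fps: int = 8
    sampling_steps: int = 50

    @property
    def num_frames(self) -> int:
        """Animation length expressed as a frame count (at least one frame)."""
        return max(1, round(self.duration_seconds * self.fps))

    def check(self) -> None:
        # Assumption: sizes must be multiples of 8, as in many latent diffusion models.
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height should be multiples of 8")
        if self.sampling_steps < 1:
            raise ValueError("sampling_steps must be >= 1")


settings = GenerationSettings(prompt="a cat waving its paw")
settings.check()
print(settings.num_frames)
```

Expressing animation length as duration and frame rate, then deriving the frame count, keeps the two quantities from drifting apart when one of them is changed.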
Check out the examples/ directory for sample outputs and corresponding input prompts.
We welcome contributions to RDTF! Please read our CONTRIBUTING.md for details on our code of conduct and submission process.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the developers of the i2vgen model
- LoRA implementation based on the peft library
- Diffusers library by Hugging Face
For questions and feedback, please open an issue.