AI-Powered-Multi-Modal-Text-Image-Alignment-System

This project is a deep learning system that matches images and text by how similar their embeddings are. It is built with PyTorch and trained on the MNIST dataset of handwritten digits. The model learns to encode images and their matching text descriptions into a shared embedding space, where a digit image and the text that describes it end up close together.
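The README does not say which loss the model trains with. The sketch below assumes a CLIP-style symmetric contrastive loss; this is not confirmed by the source. Both embeddings are L2-normalized, their dot products give cosine similarities, and a cross-entropy over those similarities pulls matching image-text pairs together. An MNIST batch usually contains several images of the same digit, so the sketch counts every same-label text as a positive. The function names and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def similarity_matrix(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of every image (N, D) with every text (M, D) -> (N, M)."""
    image_emb = F.normalize(image_emb, dim=-1)  # scale rows to unit length
    text_emb = F.normalize(text_emb, dim=-1)
    return image_emb @ text_emb.T


def contrastive_loss(
    image_emb: torch.Tensor,
    text_emb: torch.Tensor,
    labels: torch.Tensor,
    temperature: float = 0.07,  # assumed value, not from the project
) -> torch.Tensor:
    """Assumed symmetric contrastive loss; text j describes digit labels[j]."""
    logits = similarity_matrix(image_emb, text_emb) / temperature  # (N, N)
    # A pair is positive when image i and text j refer to the same digit.
    positives = (labels[:, None] == labels[None, :]).float()
    # Spread each row's target probability evenly over its positives.
    targets = positives / positives.sum(dim=1, keepdim=True)
    loss_img_to_text = F.cross_entropy(logits, targets)
    # positives is symmetric, so the same targets apply to the transposed logits.
    loss_text_to_img = F.cross_entropy(logits.T, targets)
    return (loss_img_to_text + loss_text_to_img) / 2


# Usage with random 128-d embeddings for a batch of four digits.
labels = torch.tensor([5, 3, 5, 0])
loss = contrastive_loss(torch.randn(4, 128), torch.randn(4, 128), labels)
```

Passing class probabilities as the cross-entropy target requires PyTorch 1.10 or later.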

Technologies

Python | PyTorch

Use Cases

Educational tools and interactive AI assistants
Cross-modal search engines
Image-caption retrieval systems

Project Structure

main.py: Runs the full training pipeline. Loads MNIST data, converts digit labels to text variations, tokenizes the text using BERT, trains the image-text alignment model, and prints training loss after each epoch.
model.py: Defines the MultiModalModel, which encodes images with CNN layers and encodes text with fully connected layers on top of BERT embeddings. It outputs an image embedding and a text embedding, which are compared by similarity.
helpers.py: Contains utility functions like convert_digits_to_random_text (to turn digit labels into random text variants) and batched_tokenizer (to tokenize a list of texts using BERT). Helps keep main.py clean and modular.
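Below is a hypothetical sketch of how MultiModalModel could be laid out, based only on the description above. It has a small CNN branch for 28x28 MNIST images and a fully connected branch applied to a BERT embedding of the text. The layer sizes and the 128-dimensional output are illustrative. The text branch is assumed to receive a pooled 768-dimensional BERT vector, which matches the hidden size of bert-base. The project's actual model.py may differ.

```python
import torch
import torch.nn as nn


class MultiModalModel(nn.Module):
    """Hypothetical sketch: CNN image encoder plus an MLP over BERT text features."""

    def __init__(self, text_dim: int = 768, embed_dim: int = 128):
        super().__init__()
        # Image branch: 1x28x28 MNIST digit -> embed_dim vector.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 64x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 64x7x7
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, embed_dim),
        )
        # Text branch: pooled BERT embedding (assumed 768-d) -> embed_dim vector.
        self.text_encoder = nn.Sequential(
            nn.Linear(text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, images: torch.Tensor, text_features: torch.Tensor):
        # Return both embeddings so they can be compared by similarity.
        return self.image_encoder(images), self.text_encoder(text_features)


# Usage: a batch of 8 images and 8 precomputed text features.
model = MultiModalModel()
image_emb, text_emb = model(torch.randn(8, 1, 28, 28), torch.randn(8, 768))
```

Both branches project into the same output size. This is what allows the two embeddings to be compared directly.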

Results

Alignment Accuracy: 95%+
Loss Reduction: 80% in 5 epochs
Training Time: Less than 10 minutes on most GPUs
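The README does not define how these figures are measured. The sketch below is one plausible reading and is only an assumption. Loss reduction is taken as the relative drop from the first epoch's loss to the last. Alignment accuracy is taken as the share of images whose nearest digit-text embedding, by cosine similarity, carries the correct label. Both helper functions are hypothetical.

```python
import math


def loss_reduction(first_epoch_loss: float, last_epoch_loss: float) -> float:
    """Relative drop in training loss, as a percentage of the first epoch's loss."""
    return 100.0 * (first_epoch_loss - last_epoch_loss) / first_epoch_loss


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def alignment_accuracy(image_embs, labels, digit_text_embs):
    """Fraction of images whose most similar digit text has the correct label.

    digit_text_embs[d] is assumed to be the text embedding for digit d.
    """
    correct = 0
    for emb, label in zip(image_embs, labels):
        scores = [cosine(emb, t) for t in digit_text_embs]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label
    return correct / len(labels)


print(loss_reduction(2.5, 0.5))  # → 80.0
print(alignment_accuracy([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]], [0, 1, 1],
                         [[1.0, 0.0], [0.0, 1.0]]))  # → 0.6666666666666666
```

Under this reading, an "80% reduction in 5 epochs" means the fifth epoch's loss is one fifth of the first epoch's loss.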

Each digit label is replaced with a randomly chosen text variant, so a "5" might become "Five", "quinque", or "Paanch". Because the variants are drawn at random, every training run sees different text, and the model learns that different words can refer to the same digit.
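The README describes convert_digits_to_random_text in helpers.py but does not show it. The following is a minimal sketch of the idea and assumes the function takes a list of integer labels. Only the three variants of 5 come from the text above; the rest of the word lists (English, Latin, and romanized Hindi) are hypothetical. The optional rng argument is also an addition, included for reproducible runs.

```python
import random

# Hypothetical word lists; only the variants of 5 come from the README.
DIGIT_WORDS = {
    0: ["Zero", "nihil", "Shoonya"],
    1: ["One", "unus", "Ek"],
    2: ["Two", "duo", "Do"],
    3: ["Three", "tres", "Teen"],
    4: ["Four", "quattuor", "Chaar"],
    5: ["Five", "quinque", "Paanch"],
    6: ["Six", "sex", "Chhah"],
    7: ["Seven", "septem", "Saat"],
    8: ["Eight", "octo", "Aath"],
    9: ["Nine", "novem", "Nau"],
}


def convert_digits_to_random_text(labels, rng=None):
    """Map each digit label to a randomly chosen word for that digit."""
    if rng is None:
        rng = random.Random()  # unseeded: a different mix on every run
    # int() also accepts 0-d tensors, such as labels from an MNIST DataLoader.
    return [rng.choice(DIGIT_WORDS[int(label)]) for label in labels]


texts = convert_digits_to_random_text([5, 0, 5], random.Random(0))
```

Passing a seeded random.Random makes a run repeatable. Omitting it reproduces the behaviour described above, where each run draws a different mix of words.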
