This repository contains both the research and the project implementation of an Image Captioning System that received an Honourable Mention at the 30th Congress of Scientific Initiation (UnB) and the 21st Congress of the Federal District, Brazil. The project combines natural language processing (NLP) and computer vision techniques to generate descriptive captions for images. It explores the latest methodologies in the field, such as Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory networks (LSTMs), for text generation.
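As an illustration of this encoder-decoder pattern, a minimal PyTorch sketch is shown below. The class names, the ResNet-50 backbone, and the layer sizes are assumptions made for exposition and do not necessarily match the models implemented in this repository.

```python
import torch
import torch.nn as nn
from torchvision import models

class EncoderCNN(nn.Module):
    """Extract a fixed-size feature vector from an image with a pretrained CNN."""
    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head; keep the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # keep the pretrained backbone frozen
            features = self.backbone(images).flatten(1)
        return self.fc(features)

class DecoderRNN(nn.Module):
    """Generate a caption token-by-token with an LSTM conditioned on image features."""
    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "token" of the caption sequence.
        embeddings = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(embeddings)
        return self.fc(hidden)
```

In this sketch the encoder output is fed to the decoder as the first step of the sequence, which is the classic "Show and Tell" formulation of image captioning.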
The image captioning system is divided into two main parts:
- Research: The research section includes a detailed literature review and experiments exploring various image captioning models, architectures, and techniques.
- Project: The implementation part focuses on building a working prototype using state-of-the-art deep learning models.
To run the project locally, follow these steps:

- Clone the repository:

  ```bash
  git clone https://github.com/loioladev/cnpq-caption-ia.git
  cd cnpq-caption-ia
  ```

- Create a virtual environment and install the dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate
  ./requirements.sh
  ```

- Download and prepare the datasets before training the models.
The project is divided into two main parts: YOLO Object Detection and LSTM Text Generation. Each part has its own set of scripts and notebooks for training and evaluation.
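For reference, a minimal detection example using the Ultralytics YOLO API is sketched below. The checkpoint name and image path are placeholders, and the repository's own scripts may load and configure the model differently.

```python
from ultralytics import YOLO

# Illustrative usage only; the repository's training and evaluation scripts may differ.
model = YOLO("yolov8n.pt")              # load a pretrained checkpoint (placeholder name)
results = model("path/to/image.jpg")    # run inference on a single image

for result in results:
    for box in result.boxes:
        class_name = model.names[int(box.cls)]
        confidence = float(box.conf)
        print(f"{class_name}: {confidence:.2f}")
```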
The YOLO datasets are used to train the object detection model. They are available at the following links:
- COCO Dataset
- Pascal VOC Dataset
- Open Images Dataset
- ImageNet Dataset
- LVIS Dataset
- Exclusively Dark Dataset
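As a sketch of the preparation step, the snippet below converts COCO-style bounding-box annotations into the per-image `.txt` label files that YOLO expects (one `class x_center y_center width height` line per box, with coordinates normalised to the image size). The function name and paths are illustrative; the repository may ship its own conversion scripts.

```python
import json
from pathlib import Path

def coco_to_yolo_labels(annotation_file: str, output_dir: str) -> None:
    """Convert COCO bounding-box annotations into YOLO .txt label files (sketch)."""
    coco = json.loads(Path(annotation_file).read_text())
    images = {img["id"]: img for img in coco["images"]}
    # Remap COCO category ids (not contiguous) to 0-based class indices.
    class_index = {cat["id"]: i for i, cat in enumerate(coco["categories"])}

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO: top-left corner plus width/height, in pixels
        # YOLO: normalised centre coordinates plus normalised width/height.
        line = (
            f"{class_index[ann['category_id']]} "
            f"{(x + w / 2) / img['width']:.6f} {(y + h / 2) / img['height']:.6f} "
            f"{w / img['width']:.6f} {h / img['height']:.6f}\n"
        )
        label_file = out / (Path(img["file_name"]).stem + ".txt")
        with label_file.open("a") as f:
            f.write(line)
```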
The LSTM datasets are used to train the text generation model. They are available at the following links:
This project is licensed under the MIT License. See the LICENSE file for details.
