CAPTCHA Recognition Project

This project develops a CAPTCHA recognition system using deep learning. Two approaches were implemented: a single-model approach that processes the entire CAPTCHA image at once, and a multiple-classifiers approach in which a separate classifier is trained for each character position. Both use a ResNet50 architecture. A Streamlit frontend connected to a FastAPI backend provides an interactive demonstration of the system. The project shows that deep learning is effective for CAPTCHA recognition, and the pipeline can be retrained on other text-based CAPTCHAs by changing the dataset paths, CAPTCHA length, and alphabet.

Features

Two Recognition Approaches:

One-go Recognition: Recognizes the entire CAPTCHA image in one step.
Sequential Recognition: Splits the CAPTCHA into individual characters and uses five classifiers, one per character position, to recognize them in sequence.
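Both approaches need each text label converted into one class index per character position. Below is a minimal sketch, assuming 5-character CAPTCHAs and a hypothetical alphabet of lowercase letters and digits (the project's actual alphabet and encoding may differ). The one-go model can be trained on the full index sequence, while sequential classifier k only needs the index at position k.

```python
import string

# Assumption: 5-character CAPTCHAs over lowercase letters and digits (hypothetical alphabet).
ALPHABET = string.ascii_lowercase + string.digits
CAPTCHA_LENGTH = 5
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_label(text: str) -> list[int]:
    """Map a CAPTCHA string to one class index per character position."""
    if len(text) != CAPTCHA_LENGTH:
        raise ValueError(f"expected {CAPTCHA_LENGTH} characters, got {len(text)}: {text!r}")
    try:
        return [CHAR_TO_INDEX[ch] for ch in text]
    except KeyError as exc:
        raise ValueError(f"character {exc.args[0]!r} is not in the alphabet") from None


def targets_per_classifier(labels: list[str]) -> list[list[int]]:
    """Split encoded labels by position: entry k holds the targets for classifier k."""
    encoded = [encode_label(label) for label in labels]
    return [[row[k] for row in encoded] for k in range(CAPTCHA_LENGTH)]


if __name__ == "__main__":
    print(encode_label("abc12"))  # one-go target: [0, 1, 2, 27, 28]
    print(targets_per_classifier(["abc12", "xyz90"])[0])  # first classifier's targets: [0, 23]
```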

Frontend and Backend:

Streamlit Frontend: Provides an interface to upload CAPTCHA images and view recognition results.
FastAPI Backend: Handles predictions and model interaction.

Custom Dataset Training:

Users can train the models on their own datasets by configuring paths, CAPTCHA length, and alphabet.

Preprocessing Tools:

A script (preprocess.py) to clean the dataset by removing corrupted files.

Installation and Setup

Clone the repository:

git clone https://github.com/EninDmitriy96/CAPTCHA_recognition
cd CAPTCHA_recognition

Install required dependencies:

pip install -r requirements.txt

Prepare your dataset in the expected format:

Name each image after its solution text (e.g., abc12.png for a CAPTCHA that reads abc12).

Place the images in the data/ directory, or update the dataset paths in the code to point to your own location.
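Because labels live in the filenames, pairing images with their labels needs no separate annotation file. A minimal sketch, assuming the filename stem is the CAPTCHA text and that images are PNG files; the function name load_labelled_paths is hypothetical and not part of the project.

```python
from pathlib import Path


def load_labelled_paths(data_dir: str, suffixes: tuple[str, ...] = (".png",)) -> list[tuple[Path, str]]:
    """Pair each image in data_dir with its label, taken from the filename stem.

    Assumes the naming scheme abc12.png -> "abc12" (hypothetical helper).
    """
    pairs = []
    for path in sorted(Path(data_dir).iterdir()):
        # Skip subdirectories and non-image files such as notes or metadata.
        if path.is_file() and path.suffix.lower() in suffixes:
            pairs.append((path, path.stem))
    return pairs
```

For example, load_labelled_paths("data") would return pairs such as (Path("data/abc12.png"), "abc12"), sorted by filename.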

Usage

Running the Application

Start both the frontend and backend:

python run_all.py

Access the applications:

Frontend: http://127.0.0.1:8501
Backend: http://127.0.0.1:8000

Terminate both processes by pressing Enter in the terminal.
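A launcher like run_all.py can be sketched with the standard subprocess module: start uvicorn for the FastAPI backend and Streamlit for the frontend on the ports listed above, then stop both when Enter is pressed. The directory code/backend, the app object main:app, and the script code/frontend/app.py are hypothetical placeholders, not the repository's confirmed layout, and the real run_all.py may work differently.

```python
import subprocess
import sys

# Hypothetical locations; adjust to wherever the FastAPI app and Streamlit script actually live.
BACKEND_DIR = "code/backend"
BACKEND_APP = "main:app"
FRONTEND_SCRIPT = "code/frontend/app.py"


def build_commands() -> list[list[str]]:
    """Return the backend (uvicorn) and frontend (streamlit) commands, using the current interpreter."""
    backend = [sys.executable, "-m", "uvicorn", BACKEND_APP,
               "--app-dir", BACKEND_DIR, "--host", "127.0.0.1", "--port", "8000"]
    frontend = [sys.executable, "-m", "streamlit", "run", FRONTEND_SCRIPT,
                "--server.address", "127.0.0.1", "--server.port", "8501"]
    return [backend, frontend]


def main() -> None:
    """Start both servers, block until Enter is pressed, then shut both down."""
    processes = [subprocess.Popen(cmd) for cmd in build_commands()]
    try:
        input("Press Enter to stop both servers...\n")
    finally:
        # Ask each child to exit, then wait so no orphaned servers are left running.
        for proc in processes:
            proc.terminate()
        for proc in processes:
            proc.wait()
```

Calling main() blocks until Enter is pressed. Because termination happens in a finally clause, both servers are also stopped if the launcher is interrupted with Ctrl+C.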

Using the Frontend

Upload a CAPTCHA image:

Images from the data/ folder give the best results, since the models were trained on that dataset.

View predictions:

The results from both approaches (one-go and sequential recognition) are displayed.
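Assume each approach produces, for every character position, one probability per alphabet symbol (from the single model's output or from that position's classifier). Under that assumption, the displayed text could be obtained by greedy decoding, as sketched below. The alphabet and the confidence score are illustrative assumptions, not the application's documented output.

```python
# Assumption: hypothetical alphabet; each position yields one probability per symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def decode(position_probs: list[list[float]]) -> tuple[str, float]:
    """Greedy decoding: pick the most likely symbol at each position.

    Returns the predicted text and the product of the chosen probabilities,
    a rough whole-CAPTCHA confidence that treats positions as independent.
    """
    chars = []
    confidence = 1.0
    for probs in position_probs:
        if len(probs) != len(ALPHABET):
            raise ValueError("each position needs one probability per alphabet symbol")
        best = max(range(len(probs)), key=probs.__getitem__)
        chars.append(ALPHABET[best])
        confidence *= probs[best]
    return "".join(chars), confidence
```

Greedy per-position decoding fits both approaches here because each position is predicted independently. No sequence model links neighbouring characters.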

Training on Your Dataset

Update the following parameters in the code:

Dataset Paths: Point the paths in the code to your dataset.
Sequential Classifiers: Set the number of classifiers equal to the CAPTCHA length (one classifier per character position).
Alphabet: Define the set of characters that can appear in your CAPTCHAs.
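The CAPTCHA length (and therefore the number of sequential classifiers) and the alphabet can also be derived from the training labels rather than typed by hand. This is a hypothetical helper (infer_settings is not part of the project) that assumes every label has the same length. The inferred alphabet only contains characters that actually occur in the labels, so define it explicitly if unseen characters may appear at inference time.

```python
from collections import Counter


def infer_settings(labels: list[str]) -> tuple[int, str]:
    """Derive the CAPTCHA length (= number of sequential classifiers) and alphabet from labels."""
    if not labels:
        raise ValueError("no labels given")
    lengths = Counter(len(label) for label in labels)
    if len(lengths) != 1:
        # Fixed-position classifiers require every CAPTCHA to have the same length.
        raise ValueError(f"labels have mixed lengths: {dict(lengths)}")
    captcha_length = next(iter(lengths))
    # Sorted so the character-to-index mapping is reproducible between runs.
    alphabet = "".join(sorted(set("".join(labels))))
    return captcha_length, alphabet
```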

Run the training notebooks located in code/models/notebooks.

Preprocessing the Dataset

To remove corrupted files from your dataset, run:

python code/datasets/preprocess.py
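The checks that preprocess.py applies are not described here. As a stdlib-only illustration of what "corrupted" can mean for PNG files, the sketch below flags files that lack the PNG signature or do not end with the IEND chunk, which is typical of truncated downloads. It only lists suspects; deleting them would mean calling Path.unlink() on each. It will also flag otherwise valid files that have trailing bytes after IEND. A thorough check would fully decode each image with an imaging library.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# A well-formed PNG ends with an empty IEND chunk: length 0, type "IEND", fixed CRC.
PNG_IEND = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def is_intact_png(path: Path) -> bool:
    """Cheap structural check: valid signature at the start and an IEND chunk at the end."""
    data = path.read_bytes()
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_IEND)


def find_corrupted(data_dir: str) -> list[Path]:
    """List PNG files in data_dir that fail the check (empty, truncated, or not PNG at all)."""
    return [p for p in sorted(Path(data_dir).glob("*.png")) if not is_intact_png(p)]
```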

Future Improvements

Add support for more CAPTCHA formats.
Improve accuracy for custom datasets.
Extend frontend for additional functionalities (e.g., batch processing).

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
