CAPTCHA CNN Recognition System

This project is an automatic CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) recognition system built on TensorFlow/Keras. It uses a convolutional neural network (CNN) to recognize 4-character CAPTCHAs, reaching about 98.6% per-character validation accuracy.

Features

✅ CNN-based character recognition with batch normalization and dropout
✅ Supports 20 character classes (7 digits + 13 lowercase letters)
✅ Image preprocessing (cropping, grayscale conversion, normalization)
✅ Data augmentation for robust training
✅ Batch prediction support
✅ TensorBoard integration for training visualization
✅ Model checkpointing and early stopping

This README, based on the source code and training record files, explains the project structure, training process, model performance, and usage.

Project Structure

captcha/
├── spas_train_tf2.py                # Main program for model training
├── spas_cnn_model_tf2.py            # Main program for model prediction
├── train_data/                      # Training process records (json)
│   ├── Run_Time_2025_05_02_15_53_10_train.json           # Training accuracy
│   ├── Run_Time_2025_05_02_15_53_10_train_loss.json      # Training loss
│   ├── Run_Time_2025_05_02_15_53_10_train_lr.json        # Learning rate
│   ├── Run_Time_2025_05_02_15_53_10_validation.json      # Validation accuracy
│   └── Run_Time_2025_05_02_15_53_10_validation_loss.json # Validation loss
├── label_captcha_tool-master/
│   ├── captcha/                     # Original CAPTCHA images for training (1000+ images)
│   └── label.csv                    # CAPTCHA labels
├── test_captcha/                    # Images for testing/prediction
├── logs/                            # TensorBoard log files
├── model/                           # Trained model files directory
│   ├── spas_cnn_model_tf2_v3_final.h5   # Final model for prediction (production use)
│   ├── best_model_weights.h5            # Model weights with best validation accuracy
│   └── [other model versions]           # Previous model versions
├── legacy/                          # Legacy code and deprecated files
├── getKaptchaImg/                   # Original CAPTCHA images source (7000+ images)
├── captcha_pic/                     # Additional CAPTCHA images
├── captcha_pic2/                    # Additional CAPTCHA images
├── captcha_pic3/                    # Additional CAPTCHA images
├── captcha_code/                    # CAPTCHA code related files
├── CAPTCHA_CNN.png                  # UML Communication Diagram
└── 用CNN神經網路做驗證碼辨識.pdf  # Project documentation (Chinese)

Functionality Description

spas_train_tf2.py
- Reads images from label_captcha_tool-master/captcha and labels from label.csv.
- Preprocesses images (grayscale, character segmentation, normalization).
- Converts labels to One-Hot encoding.
- Splits data into training and validation sets (80% training, 20% validation).
- Builds or loads a CNN model.
- Trains the model using data augmentation, EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint strategies.
- Records training metrics (Accuracy, Loss, Learning Rate) in the train_data/ folder.
- Saves the model with the highest validation accuracy (best_model_weights.h5) and also saves it as the final model (spas_cnn_model_tf2_v3_final.h5).
- Note: The current version saves both model files to the project root, while the prediction script loads from model/. Move the files into model/ before running prediction.
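
The segmentation and label-encoding steps above can be sketched as follows. This is a minimal illustration, not the project's code: `split_into_characters`, `one_hot_labels`, and the `CHARSET` ordering are hypothetical stand-ins for `split_digits_in_img`, `to_onehot`, and `dict_captcha`, whose exact signatures and contents are not shown in this README. It assumes equal-width characters and reserves the last class index for the unknown/background class.

```python
import numpy as np

# Values taken from this README: 4 characters per CAPTCHA and
# 21 output classes (20 characters + 1 unknown/background).
DIGITS_IN_IMG = 4

# Hypothetical character ordering; the real dict_captcha may differ.
CHARSET = "2345678abcdefgnmpwxy"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}
NUM_CLASSES = len(CHARSET) + 1  # last index reserved for unknown/background


def split_into_characters(img: np.ndarray) -> np.ndarray:
    """Cut a (rows, cols) grayscale image into equal-width character slices
    and scale pixel values to the range [0, 1]."""
    rows, cols = img.shape
    step = cols // DIGITS_IN_IMG
    slices = [img[:, i * step:(i + 1) * step] for i in range(DIGITS_IN_IMG)]
    # Add a trailing channel axis so each slice has shape (rows, step, 1).
    return np.stack(slices).astype("float32")[..., np.newaxis] / 255.0


def one_hot_labels(text: str) -> np.ndarray:
    """Return one one-hot row per character of the label string."""
    encoded = np.zeros((len(text), NUM_CLASSES), dtype="float32")
    for row, ch in enumerate(text):
        encoded[row, CHAR_TO_INDEX[ch]] = 1.0
    return encoded
```

For a 50x88 input this produces four slices of shape (50, 22, 1), and a 4-character label becomes a (4, 21) array.
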

spas_cnn_model_tf2.py
- Provides processImg and processBatchImg functions to preprocess original CAPTCHA images (cropping, grayscale, scaling to 50x88).
- Provides cnn_model_predict function to predict a single preprocessed image.
- Provides cnn_model_batch_predict function to batch predict all images in the test_captcha folder.
- Provides captcha_code function to process and predict CAPTCHA in one step.
- Loads the model from model/spas_cnn_model_tf2_v3_final.h5 for prediction.
- Outputs the confidence level (Softmax output) for each character and the final predicted CAPTCHA string.
- Image preprocessing: Crops the region x=102, y=0, width=88, height=50 from the original image and resizes it to 50 rows by 88 columns.

Training Process and Model Performance Analysis

Training Process

Data Preparation: Reads images and labels from specified paths, ensures order using sort_key, splits each image into 4 characters using split_digits_in_img, and normalizes them. Converts labels using to_onehot.
Dataset Splitting: Splits data into 80% training set and 20% validation set using train_test_split.
Model Architecture:
- Conv2D(32, (3,3), relu) + BatchNormalization
- Conv2D(64, (3,3), relu) + BatchNormalization
- MaxPooling2D(2,2)
- Dropout(0.3)
- Flatten
- Dense(128, relu) + BatchNormalization
- Dropout(0.4)
- Dense(21, softmax) (corresponding to 20 characters + 1 unknown/background in dict_captcha)
- Uses L2 regularization (weight_decay = 1e-4).
Training Strategies:
- Optimizer: Adam (initial learning rate 0.001)
- Loss Function: Categorical Crossentropy
- Data Augmentation: ImageDataGenerator (rotation, translation, scaling, shearing)
- Callbacks:
  - TensorBoard: Records training process for visualization.
  - EarlyStopping: Monitors val_accuracy, stops training if no improvement for 5 epochs, restores best weights.
  - ReduceLROnPlateau: Monitors val_accuracy, halves learning rate if no improvement for 3 epochs.
  - ModelCheckpoint: Saves model weights with the highest val_accuracy to best_model_weights.h5.
- Epochs: Set to 50 but may end early due to EarlyStopping.
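
The architecture and callbacks listed above could be assembled roughly as shown below. This is a sketch under stated assumptions rather than the project's actual code: the function names `build_model` and `build_callbacks` are hypothetical, padding is left at the Keras default, L2 regularization is assumed to apply to the Conv2D and first Dense kernels, and the checkpoint saves the full model instead of weights only. Because of these assumptions, the parameter count reported by `model.summary()` may differ from the figure quoted elsewhere in this README.

```python
from tensorflow import keras

WEIGHT_DECAY = 1e-4   # L2 factor quoted above
NUM_CLASSES = 21      # 20 characters + 1 unknown/background


def build_model(input_shape=(50, 22, 1)):
    """Assemble the layer stack listed above for one character image."""
    l2 = keras.regularizers.l2(WEIGHT_DECAY)
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, (3, 3), activation="relu", kernel_regularizer=l2),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=l2),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Dropout(0.3),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.4),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def build_callbacks(log_dir="logs", checkpoint_path="best_model_weights.h5"):
    """Create the four callbacks with the settings described above."""
    return [
        keras.callbacks.TensorBoard(log_dir=log_dir),
        keras.callbacks.EarlyStopping(
            monitor="val_accuracy", mode="max", patience=5,
            restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(
            monitor="val_accuracy", mode="max", factor=0.5, patience=3),
        # Assumption: the whole model is saved, not only its weights.
        keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_accuracy", mode="max",
            save_best_only=True),
    ]
```

The returned list would be passed as the `callbacks` argument of `model.fit` together with `epochs=50` and the validation data.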

Training Record Analysis (Based on Run_Time_2025_05_02_15_53_10)

Training Cycles: 27 Epochs (0-26).
Accuracy:
- Training Set: Improved from ~31.9% to ~94.4%.
- Validation Set: Started at ~4.9%, reached ~97.5% at epoch 7, stabilized at ~98.6%.
Loss:
- Training Set: Decreased from ~2.52 to ~0.31.
- Validation Set: Started at ~4.14, stabilized between 0.16-0.24 after epoch 7, lowest at ~0.167.
Learning Rate:
- Epoch 0-10: 0.001
- Epoch 11-17: 0.0005 (first reduction)
- Epoch 18-24: 0.00025 (second reduction)
- Epoch 25-26: 0.000125 (third reduction)
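
The learning-rate values above follow from halving the initial rate once per ReduceLROnPlateau trigger. A small sketch of the arithmetic (the helper name `halving_schedule` is hypothetical and not part of the project):

```python
def halving_schedule(initial_lr: float, reductions: int) -> list:
    """Return the learning rate after 0, 1, ..., `reductions` halvings."""
    return [initial_lr * 0.5 ** k for k in range(reductions + 1)]


# Initial rate 0.001 followed by the three reductions recorded in this run.
rates = halving_schedule(0.001, 3)
print(rates)
```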

Learning Curve Summary

Epoch	Train Acc	Val Acc	Train Loss	Val Loss	Learning Rate
0	0.319	0.049	2.516	4.140	0.001
7	0.870	0.975	0.547	0.233	0.001
11	0.903	0.981	0.463	0.205	0.0005
18	0.923	0.985	0.367	0.173	0.00025
25	0.941	0.986	0.326	0.167	0.000125
26	0.944	0.986	0.314	0.173	0.000125

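Conclusion: The model performs well, with high and stable validation accuracy. Training accuracy stays below validation accuracy, which is expected here because data augmentation and dropout apply only during training; there is no sign of significant overfitting. Validation accuracy kept improving slightly after each learning rate reduction, so the adjustment strategy is effective. The best model appears at epoch 25 or 26: validation accuracy is ~98.6% at both, and validation loss is lowest at epoch 25.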

Prediction Process

Load Model: Load model/spas_cnn_model_tf2_v3_final.h5.
Image Preprocessing:
- Read image (load_img).
- Convert to NumPy array (img_to_array).
- Resize with cv2.resize if the dimensions differ from (img_rows, img_cols) = (50, 88). Note that cv2.resize takes its target size as (width, height), i.e. (88, 50).
- Ensure grayscale single channel.
Character Segmentation: Use split_digits_in_img to split the preprocessed image into 4 character sub-images and normalize them.
Model Prediction:
- Predict each character sub-image using model.predict.
- Use np.argmax to find the class index with the highest probability.
- Convert index back to corresponding character using reverse_list.
Output Results: Print confidence level (Softmax output) and predicted class for each character, then combine and output the predicted CAPTCHA string.
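
The argmax-and-lookup step can be sketched as follows. The index-to-character table here is a hypothetical stand-in for `reverse_list` (its real ordering is not shown in this README), with `?` standing for the unknown/background class, and `decode_predictions` is an illustrative name.

```python
import numpy as np

# Hypothetical index-to-character table; the real reverse_list may differ.
REVERSE_LIST = list("2345678abcdefgnmpwxy") + ["?"]


def decode_predictions(probabilities: np.ndarray) -> tuple:
    """Map a (n_chars, n_classes) softmax array to (text, confidences)."""
    # Most likely class index for each character position.
    indices = np.argmax(probabilities, axis=1)
    # Softmax probability of the chosen class at each position.
    confidences = probabilities[np.arange(len(indices)), indices]
    text = "".join(REVERSE_LIST[i] for i in indices)
    return text, confidences.tolist()
```

Given the model output for the four character sub-images, the function returns the predicted string and one confidence value per character.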

UML Communication Diagram

The following diagram (CAPTCHA_CNN.png) is the UML communication diagram for the project's "Batch CAPTCHA Prediction" process, visualizing the interaction between main objects:

User triggers spas_cnn_model_tf2.py to execute batch prediction.
The main program loads the trained model and retrieves all images to be predicted from the test_captcha/ directory.
Each image undergoes preprocessing (grayscale, cropping, scaling) before being sent to the model for prediction.
Prediction results (confidence level for each character and final CAPTCHA string) are compiled and output to the user.

This diagram helps understand the collaboration between program modules and objects during batch prediction.

How to Use

1. Prepare Environment and Data

Environment Setup:

Option 1: Use the existing virtual environment in envs/tf3.7/
```
source envs/tf3.7/bin/activate  # On macOS/Linux
```

Option 2: Create a new virtual environment with required dependencies

python -m venv venv
source venv/bin/activate
pip install numpy opencv-python tensorflow scikit-learn

Data Preparation:

Place training images in label_captcha_tool-master/captcha/.
Place training labels in label_captcha_tool-master/label.csv (must correspond to image filenames in order).
Place images to be predicted in test_captcha/ (for batch prediction) or other specified paths (for single image prediction).
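
Because labels are matched to images by position, the image files have to be read in a stable numeric order; plain string sorting would place `captcha10.jpg` before `captcha2.jpg`. The sketch below shows one way to do that. It is not the project's `sort_key`, whose implementation is not shown here; `numeric_sort_key` and the file names are hypothetical.

```python
import os
import re


def numeric_sort_key(filename: str) -> int:
    """Order files by the last run of digits in the name (extension ignored)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    numbers = re.findall(r"\d+", stem)
    # Files without any digits sort first.
    return int(numbers[-1]) if numbers else -1


# Hypothetical file names, listed out of order.
names = ["captcha10.jpg", "captcha2.jpg", "captcha1.jpg"]
ordered = sorted(names, key=numeric_sort_key)
```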

2. Train Model

Run the following command in the terminal:

python spas_train_tf2.py

After training, the following models will be saved:

best_model_weights.h5: Model with the best validation accuracy
spas_cnn_model_tf2_v3_final.h5: Final production model (copy of best model)

Note: The training script currently saves models to the root directory. You may need to manually move them to the model/ directory if using the prediction script as-is.

3. Batch Predict Images

The default configuration in spas_cnn_model_tf2.py is set for batch prediction:

if __name__ == '__main__':
    cnn_model_batch_predict()

Ensure test images are placed in the test_captcha/ folder, then run:

python spas_cnn_model_tf2.py

4. Predict Single Image

To predict a single image, uncomment and modify the if __name__ == '__main__': block in spas_cnn_model_tf2.py:

if __name__ == '__main__':
    img_filename = r'getKaptchaImg/getKaptchaImg1400.jpeg' # Path to original image
    predict_img = r'test_captcha/processed_captcha.jpg'    # Path to save preprocessed image
    captcha_code(img_filename, predict_img)                # Preprocess and predict

    # Comment out the batch prediction line
    # cnn_model_batch_predict()

Then run:

python spas_cnn_model_tf2.py

Dependencies

Required Packages

numpy
opencv-python (cv2)
tensorflow (2.x; tested with TensorFlow 2.x)
scikit-learn

Python Version

Python 3.7+ (project developed with Python 3.7)
The project includes a virtual environment in envs/tf3.7/ with all dependencies

Installation

Install using pip:

pip install numpy opencv-python tensorflow scikit-learn

Or use the requirements file if available:

pip install -r requirements.txt  # If requirements.txt exists

Frequently Asked Questions

Q1: Warning `Compiled the loaded model, but the compiled metrics have yet to be built` during prediction?

A: TensorFlow emits this warning when a saved model is loaded and used for prediction before its compiled metrics have been built, which happens only during training or evaluation. It does not affect prediction. The code calls model.compile(...) after loading to remove the warning.

Q2: Message `Could not identify NUMA node` during training or prediction?

A: This is an informational message that TensorFlow prints when using the GPU on a Mac, where NUMA is not supported. It does not affect functionality and can be ignored.

Q3: Error "No trained model found" when running prediction?

A: The prediction script looks for the model in model/spas_cnn_model_tf2_v3_final.h5. If you just finished training, the model files are saved to the root directory. You need to either:

Move the trained models to the model/ directory: mv spas_cnn_model_tf2_v3_final.h5 model/
Or modify the model_path in spas_cnn_model_tf2.py to point to the correct location
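
The first option can also be scripted. The following is a minimal sketch, assuming the two file names listed in this README and a `model/` folder under the project root; `move_models` is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path

# File names as given in this README.
MODEL_FILES = ("spas_cnn_model_tf2_v3_final.h5", "best_model_weights.h5")


def move_models(project_root: str = ".", target_dir: str = "model") -> list:
    """Move trained model files from the project root into target_dir."""
    root = Path(project_root)
    target = root / target_dir
    target.mkdir(exist_ok=True)
    moved = []
    for name in MODEL_FILES:
        source = root / name
        # Skip files that were not produced by the last training run.
        if source.is_file():
            shutil.move(str(source), str(target / name))
            moved.append(name)
    return moved
```

Calling `move_models()` from the project root after training returns the names of the files it moved.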

Q4: How to change the character set or length of the CAPTCHA?

A:

Character Set: Modify the dict_captcha dictionary in spas_train_tf2.py and spas_cnn_model_tf2.py, and ensure training labels match it. Adjust the output units of the final Dense layer in the model (currently 21).
Length: Modify the digits_in_img variable in spas_train_tf2.py and spas_cnn_model_tf2.py. This affects image segmentation and prediction loop count.

Q5: How to adjust the image cropping region for different CAPTCHA formats?

A: In spas_cnn_model_tf2.py, modify the cropping parameters in processImg and processBatchImg functions:

x, y, w, h = [102, 0, 88, 50]  # [x_offset, y_offset, width, height]

Adjust these values based on your CAPTCHA image format.
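
To see what the four values mean, the crop can be written as a NumPy slice. This is an illustrative sketch, not the code in processImg: `crop_region` is a hypothetical helper, and it assumes the image is already loaded as an array large enough to contain the region.

```python
import numpy as np


def crop_region(img: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Return the h-by-w window whose top-left corner is at column x, row y."""
    # NumPy images are indexed [row, column], so y comes before x.
    return img[y:y + h, x:x + w]
```

With x, y, w, h = 102, 0, 88, 50 the result has 50 rows and 88 columns. Slicing does not raise an error when the region extends past the image border; the result is simply smaller, so it is worth checking the output shape after changing the values.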

Technical Details

Model Architecture

Input: Grayscale images of size (50, 22, 1) - single character after segmentation
Convolutional Layers:
- Conv2D(32, 3x3, ReLU) + BatchNorm
- Conv2D(64, 3x3, ReLU) + BatchNorm
- MaxPooling2D(2x2)
Regularization: Dropout(0.3), L2 weight decay (1e-4)
Dense Layers:
- Dense(128, ReLU) + BatchNorm + Dropout(0.4)
- Dense(21) + Softmax (20 character classes + 1 unknown/background)
Total Parameters: Approximately 200K+ trainable parameters

Character Set

The model recognizes 20 characters:

Digits: 2, 3, 4, 5, 6, 7, 8
Lowercase letters: a, b, c, d, e, f, g, n, m, p, w, x, y

Performance Metrics

Validation Accuracy: ~98.6% (per character)
Training Length: 27 epochs (with early stopping)
Per-character prediction confidence: Displayed as Softmax probability distribution
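
Since the network classifies one character at a time, the validation accuracy above is a per-character figure. A rough estimate of how often all four characters are correct can be derived from it, assuming (as a simplification) that errors on the four positions are independent. The helper name below is hypothetical.

```python
def whole_captcha_accuracy(per_char_accuracy: float, length: int = 4) -> float:
    """Probability that every character is correct, assuming independent errors."""
    return per_char_accuracy ** length


estimate = whole_captcha_accuracy(0.986)
print(f"{estimate:.3f}")
```

Under this assumption, 0.986 raised to the fourth power gives roughly 0.945, so about 94.5% of 4-character CAPTCHAs would be read entirely correctly. The real figure may differ if errors cluster on particular images.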

Author

This project was developed by Lin Hung Chuan. For inquiries, please contact sprigga@gmail.com.

Contact and Contribution

For questions or suggestions, feel free to open an Issue or Pull Request.
