This project is an automatic CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) recognition system built on TensorFlow/Keras. It uses a convolutional neural network (CNN) to recognize 4-character CAPTCHAs, reaching about 98.6% validation accuracy on individual characters.
- ✅ CNN-based character recognition with batch normalization and dropout
- ✅ Supports 20 character classes (7 digits + 13 lowercase letters)
- ✅ Image preprocessing (cropping, grayscale conversion, normalization)
- ✅ Data augmentation for robust training
- ✅ Batch prediction support
- ✅ TensorBoard integration for training visualization
- ✅ Model checkpointing and early stopping
This README, based on the source code and training record files, explains the project structure, training process, model performance, and usage.
- Project Structure
- Functionality Description
- Training Process and Model Performance Analysis
- Prediction Process
- UML Communication Diagram
- How to Use
- Dependencies
- Frequently Asked Questions
- Technical Details
- Author
- Contact and Contribution
captcha/
├── spas_train_tf2.py # Main program for model training
├── spas_cnn_model_tf2.py # Main program for model prediction
├── train_data/ # Training process records (json)
│ ├── Run_Time_2025_05_02_15_53_10_train.json # Training accuracy
│ ├── Run_Time_2025_05_02_15_53_10_train_loss.json # Training loss
│ ├── Run_Time_2025_05_02_15_53_10_train_lr.json # Learning rate
│ ├── Run_Time_2025_05_02_15_53_10_validation.json # Validation accuracy
│ └── Run_Time_2025_05_02_15_53_10_validation_loss.json # Validation loss
├── label_captcha_tool-master/
│ ├── captcha/ # Original CAPTCHA images for training (1000+ images)
│ └── label.csv # CAPTCHA labels
├── test_captcha/ # Images for testing/prediction
├── logs/ # TensorBoard log files
├── model/ # Trained model files directory
│ ├── spas_cnn_model_tf2_v3_final.h5 # Final model for prediction (production use)
│ ├── best_model_weights.h5 # Model weights with best validation accuracy
│ └── [other model versions] # Previous model versions
├── legacy/ # Legacy code and deprecated files
├── getKaptchaImg/ # Original CAPTCHA images source (7000+ images)
├── captcha_pic/ # Additional CAPTCHA images
├── captcha_pic2/ # Additional CAPTCHA images
├── captcha_pic3/ # Additional CAPTCHA images
├── captcha_code/ # CAPTCHA code related files
├── CAPTCHA_CNN.png # UML Communication Diagram
└── 用CNN神經網路做驗證碼辨識.pdf # Project documentation (Chinese)
- `spas_train_tf2.py`
  - Reads images from `label_captcha_tool-master/captcha` and labels from `label.csv`.
  - Preprocesses images (grayscale, character segmentation, normalization).
  - Converts labels to one-hot encoding.
  - Splits data into training and validation sets (80% training, 20% validation).
  - Builds or loads a CNN model.
  - Trains the model using data augmentation, EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint strategies.
  - Records training metrics (accuracy, loss, learning rate) in the `train_data/` folder.
  - Saves the model with the highest validation accuracy (`best_model_weights.h5`) and also saves it as the final model (`spas_cnn_model_tf2_v3_final.h5`).
  - Note: The current version saves models to the root directory. Consider moving them to the `model/` directory for better organization.
- `spas_cnn_model_tf2.py`
  - Provides `processImg` and `processBatchImg` functions to preprocess original CAPTCHA images (cropping, grayscale, scaling to 50x88).
  - Provides the `cnn_model_predict` function to predict a single preprocessed image.
  - Provides the `cnn_model_batch_predict` function to batch predict all images in the `test_captcha` folder.
  - Provides the `captcha_code` function to process and predict a CAPTCHA in one step.
  - Loads the model from `model/spas_cnn_model_tf2_v3_final.h5` for prediction (updated path).
  - Outputs the confidence level (softmax output) for each character and the final predicted CAPTCHA string.
  - Image preprocessing: Crops the region `[102, 0, 88, 50]` from the original image and resizes it to (50, 88).
- Data Preparation: Reads images and labels from the specified paths, ensures order using `sort_key`, splits each image into 4 characters using `split_digits_in_img`, and normalizes them. Converts labels using `to_onehot`.
- Dataset Splitting: Splits data into an 80% training set and a 20% validation set using `train_test_split`.
- Model Architecture:
  - `Conv2D(32, (3,3), relu)` + `BatchNormalization`
  - `Conv2D(64, (3,3), relu)` + `BatchNormalization`
  - `MaxPooling2D(2,2)`
  - `Dropout(0.3)`
  - `Flatten`
  - `Dense(128, relu)` + `BatchNormalization`
  - `Dropout(0.4)`
  - `Dense(21, softmax)` (corresponding to 20 characters + 1 unknown/background in `dict_captcha`)
  - Uses L2 regularization (`weight_decay = 1e-4`).
- Training Strategies:
  - Optimizer: Adam (initial learning rate 0.001)
  - Loss Function: Categorical Crossentropy
  - Data Augmentation: `ImageDataGenerator` (rotation, translation, scaling, shearing)
  - Callbacks:
    - `TensorBoard`: Records the training process for visualization.
    - `EarlyStopping`: Monitors `val_accuracy`, stops training if there is no improvement for 5 epochs, and restores the best weights.
    - `ReduceLROnPlateau`: Monitors `val_accuracy` and halves the learning rate if there is no improvement for 3 epochs.
    - `ModelCheckpoint`: Saves the model weights with the highest `val_accuracy` to `best_model_weights.h5`.
  - Epochs: Set to 50 but may end early due to EarlyStopping.
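
The data-preparation step above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's code: `split_columns` and `to_onehot_sketch` are hypothetical stand-ins for `split_digits_in_img` and `to_onehot`, the class order in `CHARSET` is assumed (the real order lives in `dict_captcha`), and the unknown/background class is assumed to occupy the last index.

```python
# Hypothetical stand-ins for split_digits_in_img and to_onehot.
CHARSET = "2345678abcdefgnmpwxy"  # the 20 characters listed in this README; order assumed
NUM_CLASSES = len(CHARSET) + 1    # +1 for the unknown/background class (assumed last)
DIGITS_IN_IMG = 4


def split_columns(image, parts=DIGITS_IN_IMG):
    """Split a 2-D image (list of rows) into equal-width sub-images scaled to [0, 1]."""
    step = len(image[0]) // parts
    return [
        [[pixel / 255.0 for pixel in row[i * step:(i + 1) * step]] for row in image]
        for i in range(parts)
    ]


def to_onehot_sketch(label):
    """Return one one-hot vector of length NUM_CLASSES per character in `label`."""
    vectors = []
    for ch in label:
        vec = [0] * NUM_CLASSES
        vec[CHARSET.index(ch)] = 1
        vectors.append(vec)
    return vectors


# Dummy 50x88 grayscale image in which every pixel is 255.
image = [[255] * 88 for _ in range(50)]
chars = split_columns(image)
labels = to_onehot_sketch("2a8y")

# Each of the 4 sub-images has 50 rows and 22 columns.
print(len(chars), len(chars[0]), len(chars[0][0]))
```

Each sub-image is paired with the one-hot vector of the character at the same position, so a single labeled CAPTCHA yields four training samples.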
- Training Cycles: 27 Epochs (0-26).
- Accuracy:
  - Training Set: Improved from ~31.9% to ~94.4%.
  - Validation Set: Started at ~4.9%, reached ~97.5% at epoch 7, stabilized at ~98.6%.
- Loss:
  - Training Set: Decreased from ~2.52 to ~0.31.
  - Validation Set: Started at ~4.14, stabilized between 0.16 and 0.24 after epoch 7, lowest at ~0.167.
- Learning Rate:
  - Epoch 0-10: 0.001
  - Epoch 11-17: 0.0005 (first reduction)
  - Epoch 18-24: 0.00025 (second reduction)
  - Epoch 25-26: 0.000125 (third reduction)
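
The learning-rate values above follow directly from halving the initial rate at each reduction. The short sketch below reproduces them from the reduction epochs recorded in the log; `lr_at` is a hypothetical helper written for this README, and it does not simulate the `ReduceLROnPlateau` callback itself.

```python
INITIAL_LR = 0.001
FACTOR = 0.5                       # each reduction halves the learning rate
REDUCTION_EPOCHS = [11, 18, 25]    # first epoch at each new rate, taken from the training log


def lr_at(epoch):
    """Learning rate in effect at `epoch`, given the logged reduction epochs."""
    reductions = sum(1 for start in REDUCTION_EPOCHS if epoch >= start)
    return INITIAL_LR * FACTOR ** reductions


for epoch in (0, 11, 18, 25):
    print(epoch, lr_at(epoch))
```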
| Epoch | Train Acc | Val Acc | Train Loss | Val Loss | Learning Rate |
|---|---|---|---|---|---|
| 0 | 0.319 | 0.049 | 2.516 | 4.140 | 0.001 |
| 7 | 0.870 | 0.975 | 0.547 | 0.233 | 0.001 |
| 11 | 0.903 | 0.981 | 0.463 | 0.205 | 0.0005 |
| 18 | 0.923 | 0.985 | 0.367 | 0.173 | 0.00025 |
| 25 | 0.941 | 0.986 | 0.326 | 0.167 | 0.000125 |
| 26 | 0.944 | 0.986 | 0.314 | 0.173 | 0.000125 |
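- Conclusion: The model performs well, with high and stable validation accuracy. Training accuracy stays below validation accuracy, which is expected because data augmentation and dropout are applied only during training. The learning rate adjustment strategy is effective, and there is no significant overfitting. The best model appears at epoch 25 or 26.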
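
Because the network classifies one character at a time, the accuracy figures above are per-character. A rough estimate of whole-CAPTCHA accuracy can be derived from them, assuming (this is an assumption, not a measured result) that errors on the 4 character positions are independent:

```python
per_char_accuracy = 0.986  # validation accuracy from the table above
digits_in_img = 4

# All four characters must be correct for the CAPTCHA to be solved.
whole_captcha_estimate = per_char_accuracy ** digits_in_img
print(f"estimated whole-CAPTCHA accuracy: {whole_captcha_estimate:.3f}")
```

Under that assumption the estimate is roughly 94 to 95%. Measuring it directly on held-out full CAPTCHAs would be more reliable.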
- Load Model: Load `model/spas_cnn_model_tf2_v3_final.h5`.
- Image Preprocessing:
  - Read the image (`load_img`).
  - Convert it to a NumPy array (`img_to_array`).
  - Resize using `cv2.resize` if the dimensions do not match `(50, 88)` (`img_rows`, `img_cols`).
  - Ensure a single grayscale channel.
- Character Segmentation: Use `split_digits_in_img` to split the preprocessed image into 4 character sub-images and normalize them.
- Model Prediction:
  - Predict each character sub-image using `model.predict`.
  - Use `np.argmax` to find the class index with the highest probability.
  - Convert the index back to the corresponding character using `reverse_list`.
- Output Results: Print the confidence level (softmax output) and predicted class for each character, then combine and output the predicted CAPTCHA string.
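
The decoding step (argmax followed by an index-to-character lookup) can be illustrated without a trained model. In this sketch `index_to_char` plays the role of `reverse_list`, but its contents and ordering are assumptions, and `fake_softmax` is a hypothetical helper that fabricates a probability vector in place of a real `model.predict` output.

```python
CHARSET = "2345678abcdefgnmpwxy"        # order assumed
index_to_char = list(CHARSET) + ["?"]   # last index assumed to be unknown/background


def argmax(values):
    """Index of the largest value (pure-Python equivalent of np.argmax for a 1-D list)."""
    return max(range(len(values)), key=values.__getitem__)


def decode(probabilities):
    """Map one softmax vector per character to the predicted CAPTCHA string."""
    return "".join(index_to_char[argmax(p)] for p in probabilities)


def fake_softmax(index, confidence=0.97):
    """Build a probability vector that peaks at `index` (stand-in for model output)."""
    rest = (1.0 - confidence) / (len(index_to_char) - 1)
    vec = [rest] * len(index_to_char)
    vec[index] = confidence
    return vec


predictions = [fake_softmax(index_to_char.index(c)) for c in "3bx7"]
captcha = decode(predictions)
print(captcha)
```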
The following diagram (CAPTCHA_CNN.png) is the UML communication diagram for the project's "Batch CAPTCHA Prediction" process, visualizing the interaction between main objects:
- The user triggers `spas_cnn_model_tf2.py` to execute batch prediction.
- The main program loads the trained model and retrieves all images to be predicted from the `test_captcha/` directory.
- Each image undergoes preprocessing (grayscale, cropping, scaling) before being sent to the model for prediction.
- Prediction results (confidence level for each character and final CAPTCHA string) are compiled and output to the user.
This diagram helps understand the collaboration between program modules and objects during batch prediction.
Environment Setup:
- Option 1: Use the existing virtual environment in `envs/tf3.7/`

      source envs/tf3.7/bin/activate  # On macOS/Linux

- Option 2: Create a new virtual environment with the required dependencies

      python -m venv venv
      source venv/bin/activate
      pip install numpy opencv-python tensorflow scikit-learn

Data Preparation:
- Place training images in `label_captcha_tool-master/captcha/`.
- Place training labels in `label_captcha_tool-master/label.csv` (must correspond to image filenames in order).
- Place images to be predicted in `test_captcha/` (for batch prediction) or other specified paths (for single image prediction).
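
Because labels must line up with image filenames in order, the way files are sorted matters: plain string sorting places `getKaptchaImg10.jpeg` before `getKaptchaImg2.jpeg`. The sketch below shows one way to sort by the embedded number. `numeric_sort_key` is a hypothetical example and is not necessarily how the project's `sort_key` is implemented.

```python
import re


def numeric_sort_key(filename):
    """Hypothetical key: order by the first integer embedded in the filename."""
    match = re.search(r"\d+", filename)
    number = int(match.group()) if match else float("inf")
    return (number, filename)


files = ["getKaptchaImg10.jpeg", "getKaptchaImg2.jpeg", "getKaptchaImg1.jpeg"]
ordered = sorted(files, key=numeric_sort_key)
print(ordered)
```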
Run the following command in the terminal:

    python spas_train_tf2.py

After training, the following models will be saved:
- `best_model_weights.h5`: Model with the best validation accuracy
- `spas_cnn_model_tf2_v3_final.h5`: Final production model (copy of best model)

Note: The training script currently saves models to the root directory. You may need to manually move them to the `model/` directory if using the prediction script as-is.
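
If you prefer to script that move rather than do it by hand, a small helper along these lines could work. `move_models` is a hypothetical function written for this README; it assumes the two filenames listed above and a `model/` folder under the project root.

```python
import shutil
from pathlib import Path

MODEL_FILES = ["best_model_weights.h5", "spas_cnn_model_tf2_v3_final.h5"]


def move_models(project_root=".", target_dir="model"):
    """Move any trained model files found in the project root into target_dir."""
    root = Path(project_root)
    target = root / target_dir
    target.mkdir(exist_ok=True)
    moved = []
    for name in MODEL_FILES:
        source = root / name
        if source.exists():  # skip files that were not produced by this run
            shutil.move(str(source), str(target / name))
            moved.append(name)
    return moved
```

Calling `move_models()` from the project root after training returns the list of files that were actually moved.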
The default configuration in `spas_cnn_model_tf2.py` is set for batch prediction:

    if __name__ == '__main__':
        cnn_model_batch_predict()

Ensure test images are placed in the `test_captcha/` folder, then run:

    python spas_cnn_model_tf2.py

To predict a single image, uncomment and modify the `if __name__ == '__main__':` block in `spas_cnn_model_tf2.py`:

    if __name__ == '__main__':
        img_filename = r'getKaptchaImg/getKaptchaImg1400.jpeg'  # Path to original image
        predict_img = r'test_captcha/processed_captcha.jpg'  # Path to save preprocessed image
        captcha_code(img_filename, predict_img)  # Preprocess and predict
        # Comment out the batch prediction line
        # cnn_model_batch_predict()

Then run:

    python spas_cnn_model_tf2.py

- numpy
- opencv-python (cv2)
- tensorflow (2.x recommended and tested)
- scikit-learn
- Python 3.7+ (project developed with Python 3.7)
- The project includes a virtual environment in `envs/tf3.7/` with all dependencies
Install using pip:

    pip install numpy opencv-python tensorflow scikit-learn

Or use the requirements file if available:

    pip install -r requirements.txt  # If requirements.txt exists

Q1: Why does the warning "Compiled the loaded model, but the compiled metrics have yet to be built" appear during prediction?
A: This occurs because the model is loaded and used for prediction without being compiled. It does not affect prediction, but `model.compile(...)` has been added to the code to remove the warning.
A: This is an informational message from TensorFlow when using a GPU on a Mac, indicating that NUMA architecture is not supported. It does not affect functionality and can be ignored.
A: The prediction script looks for the model in `model/spas_cnn_model_tf2_v3_final.h5`. If you just finished training, the model files are saved to the root directory. You need to either:
- Move the trained models to the `model/` directory: `mv spas_cnn_model_tf2_v3_final.h5 model/`
- Or modify `model_path` in `spas_cnn_model_tf2.py` to point to the correct location
A:
- Character Set: Modify the `dict_captcha` dictionary in `spas_train_tf2.py` and `spas_cnn_model_tf2.py`, and ensure the training labels match it. Adjust the output units of the final Dense layer in the model (currently 21).
- Length: Modify the `digits_in_img` variable in `spas_train_tf2.py` and `spas_cnn_model_tf2.py`. This affects image segmentation and the prediction loop count.
A: In `spas_cnn_model_tf2.py`, modify the cropping parameters in the `processImg` and `processBatchImg` functions:

    x, y, w, h = [102, 0, 88, 50]  # [x_offset, y_offset, width, height]

Adjust these values based on your CAPTCHA image format.
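
To make the meaning of the four cropping values concrete, here is a minimal pure-Python sketch of how a `[x_offset, y_offset, width, height]` box selects a region. The `crop` helper and the 200-pixel image width are assumptions for illustration only. With a NumPy or OpenCV image the equivalent slice is `image[y:y + h, x:x + w]`.

```python
def crop(image, box):
    """Crop a 2-D image (list of rows) to box = [x_offset, y_offset, width, height]."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# Dummy 50-row x 200-column image whose pixel value equals its column index.
# The real source image size is not stated in this README.
image = [list(range(200)) for _ in range(50)]
region = crop(image, [102, 0, 88, 50])

# The region has 50 rows and 88 columns, starting at column 102.
print(len(region), len(region[0]), region[0][0])
```

Rows are selected with the y values and columns with the x values, so the result has `height` rows and `width` columns, matching the (50, 88) shape used elsewhere in this README.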
- Input: Grayscale images of size (50, 22, 1) - single character after segmentation
- Convolutional Layers:
  - Conv2D(32) + BatchNorm + ReLU
  - Conv2D(64) + BatchNorm + ReLU
  - MaxPooling2D(2x2)
- Regularization: Dropout(0.3), L2 weight decay (1e-4)
- Dense Layers:
  - Dense(128) + BatchNorm + ReLU + Dropout(0.4)
  - Dense(21) + Softmax (20 character classes + 1 background)
- Total Parameters: Approximately 200K+ trainable parameters
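
The parameter count can be cross-checked by hand from the layer list. The arithmetic below assumes 3x3 kernels with stride 1 and `valid` padding (the Keras default), a (50, 22, 1) input, and two trainable parameters (gamma and beta) per BatchNormalization channel. None of these settings are confirmed by this README, so treat the result as a sanity check only.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


def batchnorm_trainable(channels):
    """Gamma and beta per channel (moving statistics are not trainable)."""
    return 2 * channels


h, w = 50, 22
total = conv2d_params(3, 1, 32) + batchnorm_trainable(32)
h, w = h - 2, w - 2                    # 3x3 conv, valid padding (assumed)
total += conv2d_params(3, 32, 64) + batchnorm_trainable(64)
h, w = h - 2, w - 2
h, w = h // 2, w // 2                  # 2x2 max pooling
flat = h * w * 64                      # size of the flattened feature vector
total += dense_params(flat, 128) + batchnorm_trainable(128)
total += dense_params(128, 21)

print(flat, total)
```

Under these assumptions the flattened vector has 13,248 values and the total is about 1.7 million trainable parameters, dominated by the first Dense layer. That is well above 200K, so the figure is worth confirming with `model.summary()` on the actual model.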
The model recognizes 20 characters:
- Digits: 2, 3, 4, 5, 6, 7, 8
- Lowercase letters: a, b, c, d, e, f, g, n, m, p, w, x, y
- Validation Accuracy: ~98.6%
- Training Length: 27 epochs (stopped early by EarlyStopping)
- Per-character prediction confidence: Displayed as Softmax probability distribution
This project was developed by Lin Hung Chuan. For inquiries, please contact sprigga@gmail.com.
For questions or suggestions, feel free to open an Issue or Pull Request.
