A deep learning project that classifies images as either AI-generated (fake) or real using a Convolutional Neural Network (CNN). This project uses the CIFAKE dataset to train a binary classifier capable of distinguishing between authentic and synthetic images.
- Original Kaggle Project: Classifying Real and AI-Generated Images
- Dataset: CIFAKE Dataset on Kaggle
- Project Overview
- Dataset
- Features
- Model Architecture
- Installation
- Usage
- Results
- Dependencies
- Contributors
This project addresses the growing challenge of distinguishing between real and AI-generated images. With the advancement of generative AI models, it has become increasingly difficult to identify synthetic content with the naked eye. This CNN-based classifier provides an automated solution to detect AI-generated images.
- Build a robust binary classifier for AI vs real image detection
- Analyze file size patterns between real and synthetic images
- Implement proper data preprocessing and model evaluation techniques
- Achieve high accuracy in distinguishing between authentic and generated content
The project uses the CIFAKE dataset, which contains:
- Training Set:
- Real images:
/train/REAL/ - Fake (AI-generated) images:
/train/FAKE/
- Real images:
- Test Set:
- Real images:
/test/REAL/ - Fake (AI-generated) images:
/test/FAKE/
- Real images:
- Image Format: 32x32 pixel RGB images in JPEG format
- Preprocessing: Images are normalized to [0,1] range
- Labels: Binary classification (0 = Fake, 1 = Real)
- File size distribution analysis across different categories
- Statistical analysis (mean, median, standard deviation)
- Visual histograms for data exploration
- Image resizing and normalization
- Train-test split with stratification
- Data validation and format verification
- Multi-layer convolutional neural network
- Regularization techniques (L2 regularization, Dropout)
- Adam optimizer with custom learning rate
- Training/validation accuracy tracking
- Confusion matrix visualization
- Precision, Recall, and F1-score metrics
- Classification report generation
- Single image prediction capability
- Batch prediction for multiple images
- Custom image testing functionality
The CNN model consists of:
# Convolutional Layers
- Conv2D(64, (3,3)) + MaxPooling2D(2,2) + Dropout(0.3)
- Conv2D(32, (3,3)) + MaxPooling2D(2,2) + Dropout(0.3)
- Conv2D(16, (3,3)) + MaxPooling2D(2,2) + Dropout(0.3)
# Fully Connected Layers
- Dense(512) + Dropout(0.4)
- Dense(256) + Dropout(0.4)
- Dense(128) + Dropout(0.4)
- Dense(1, activation='sigmoid') # Output layer- Regularization: L2 regularization (0.001) on all layers
- Dropout: Progressive dropout rates (0.3 β 0.4) to prevent overfitting
- Optimizer: Adam with learning rate 0.0001
- Loss Function: Binary crossentropy
- Training: 30 epochs with batch size 64
-
Open the Kaggle notebook
- Visit the original Kaggle project
- Fork the notebook to your account
- The CIFAKE dataset is already available in the Kaggle environment
-
Required packages (pre-installed on Kaggle)
- TensorFlow
- OpenCV
- Scikit-learn
- Matplotlib
- Seaborn
- PIL (Pillow)
- NumPy
- tqdm
-
Clone the repository
git clone https://github.com/chellyahmed/aireal.git cd aireal -
Install required packages
pip install tensorflow opencv-python scikit-learn matplotlib seaborn pillow numpy tqdm
-
Download the dataset
- Download the CIFAKE dataset from Kaggle
- Extract to match the directory structure used in the notebook
- Update the directory paths in the code to match your local setup
-
Open Jupyter Notebook
jupyter notebook aireal.ipynb
-
Execute cells sequentially
- Cell 1: Data analysis and visualization
- Cell 2: Data validation and preprocessing
- Cell 3: Statistical analysis
- Cell 4: Data loading and splitting
- Cell 5: Model definition and compilation
- Cell 6: Model training
- Cell 7: Training visualization
- Cells 8-9: Custom image prediction
- Cells 10-11: Model evaluation and metrics
# Load and preprocess your image
img_path = 'path/to/your/image.jpg'
img = image.load_img(img_path, target_size=(32, 32))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0) / 255.0
# Make prediction
prediction = model.predict(img_array)
result = "Real" if prediction[0][0] > 0.5 else "Fake"
print(f"Prediction: {result} (Confidence: {prediction[0][0]:.4f})")The model achieves:
- Training Accuracy: 93%
- Validation Accuracy: 92.03%
- Precision: 93%
- Recall: 96%
- F1-Score: 0.94
- File size analysis reveals patterns between real and AI-generated images
- The CNN model effectively learns distinguishing features
- Regularization techniques help prevent overfitting
- Model performs well on both training and validation sets
- TensorFlow: Deep learning framework
- OpenCV: Image processing
- NumPy: Numerical computations
- Matplotlib: Data visualization
- Seaborn: Statistical visualization
- Scikit-learn: Machine learning utilities
- PIL (Pillow): Image handling
- tqdm: Progress bars
- Images resized to 32x32 pixels
- Pixel values normalized to [0, 1] range
- RGB color format maintained
- Train-test split: 80-20 ratio
- Batch size: 64
- Epochs: 30
- Learning rate: 0.0001
- Early stopping can be implemented for optimal performance
- Binary classification accuracy
- Precision and recall for both classes
- F1-score for balanced evaluation
- Confusion matrix for detailed analysis
- Experiment with different CNN architectures
- Implement data augmentation techniques
- Add more sophisticated regularization methods
- Create a web interface for easy image testing
- Expand to handle different image sizes and formats
This project was developed by:
- CIFAKE Dataset: Thanks to the creators of the CIFAKE dataset
- Kaggle Community: For providing the platform and dataset
- TensorFlow Team: For the excellent deep learning framework
Note: This project is for educational and research purposes. Always consider ethical implications when working with AI-generated content detection systems.