Android Malware Detection is a machine learning-based security tool designed to identify and classify malicious Android applications. The project leverages advanced ML algorithms to analyze Android APK files and detect potential malware threats, helping to protect users from malicious software.

vannu07/Android-Malware-Detection


🛡️ Android Malware Detection with Machine Learning

An AI-powered Android security solution to detect malicious applications using advanced machine learning techniques




📌 About This Project

Developed by: Varnit Kumar, MCA Student at GGSIPU, Dwarka
Inspired by: MSc thesis at Lisbon Institute of Engineering (ISEL)
Research Paper: "Malware Detection in Android Applications with Machine Learning Techniques"

This repository presents a malware detection system for Android applications built on machine learning. The system combines static feature analysis, feature selection, and several classification algorithms to classify Android APKs as malicious or benign.

🎯 Key Objectives

  • Develop a robust malware detection system with high accuracy
  • Implement explainable AI for transparent decision-making
  • Provide a scalable solution for real-world Android security
  • Contribute to cybersecurity research and education

🧠 Core Features

  • 🔍 Static Feature Extraction - Comprehensive APK analysis without execution
  • 🤖 Multiple ML Models - SVM, Random Forest, XGBoost, and Neural Networks
  • ⚙️ Feature Selection - Advanced dimensionality reduction techniques
  • 📊 Explainable AI (XAI) - SHAP and LIME integration for model interpretability
  • 📈 Performance Metrics - Accuracy, Precision, Recall, F1-Score analysis
  • 🔄 Real-time Processing - Efficient APK classification pipeline
  • 📁 Multiple Dataset Support - Tested on various public malware datasets
  • 🎨 Visualization Tools - Interactive charts and model performance graphs
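SHAP attributes a single prediction to individual features using Shapley values. The sketch below computes exact Shapley values by enumerating every feature coalition, which is what SHAP approximates efficiently. It is not the `shap` library's API: the function names, the toy linear "malware score", and the choice of a single all-zero baseline for "absent" features are illustrative assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for a single prediction model(x).

    Features outside a coalition take their baseline value (an assumption; SHAP
    typically averages over a background dataset instead). Cost is exponential
    in the number of features, so this only suits tiny illustrative models.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Model output when only the features in `coalition` take their real values.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_malware_score(f: Sequence[float]) -> float:
    # Hypothetical linear "malware score" over three binary features.
    return 0.6 * f[0] + 0.3 * f[1] + 0.1 * f[2]


phi = shapley_values(toy_malware_score, [1, 0, 1], [0, 0, 0])
print([round(v, 3) for v in phi])  # [0.6, 0.0, 0.1]
```

For a linear model each Shapley value reduces to weight × (feature value − baseline value), and the values always sum to model(x) − model(baseline), which makes both handy sanity checks.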

🏗️ System Architecture

Proposed Approach

The problem is formulated as a binary classification problem: a given Android application is classified as malicious (positive class) or benign (negative class). The components that make up the proposed approach, or that enable its assessment, are briefly described below.


Machine Learning module - Component responsible for building, improving and evaluating the ML model that will classify Android applications as benign or malicious.

Feature extraction module - Extracts static features from an Android application's Android Package Kit (APK) file and maps them onto the features deemed most indicative of malware in Android applications. This mapping produces the input data for the model, which then classifies the Android application as benign or malicious.

Android applications - Real-world apps used to assess the developed prototype.
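As a rough illustration of the mapping step performed by the feature extraction module, the sketch below turns the static features found in one APK (for example, requested permissions and API calls) into a fixed-length binary vector over a selected feature list. The `to_feature_vector` helper and the feature names are hypothetical; in practice the raw features would come from an APK parser such as Androguard, which is not shown here.

```python
from typing import Iterable, List, Sequence


def to_feature_vector(extracted: Iterable[str], selected: Sequence[str]) -> List[int]:
    """Map the static features found in one APK onto a fixed feature list.

    Returns a binary vector: 1 if the selected feature was observed, else 0.
    Features outside `selected` are ignored, so every APK yields a vector of
    the same length and order, which is what a trained model expects.
    """
    observed = set(extracted)
    return [1 if name in observed else 0 for name in selected]


# Hypothetical feature list, e.g. the output of a feature selection step.
SELECTED_FEATURES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "api:javax.crypto.Cipher",
]

# Features a parser might report for one APK (illustrative values only).
apk_features = ["android.permission.INTERNET", "api:javax.crypto.Cipher", "activity:MainActivity"]

print(to_feature_vector(apk_features, SELECTED_FEATURES))  # [0, 0, 1, 1]
```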

Proposed Approach Overview

The system follows a binary classification approach to categorize Android applications as malicious (positive) or benign (negative).

┌─────────────────┐    ┌──────────────────────┐    ┌─────────────────┐
│   Android APK   │───▶│  Feature Extraction  │───▶│  ML Classifier  │
│   Applications  │    │      Module          │    │     Module      │
└─────────────────┘    └──────────────────────┘    └─────────────────┘
                                  │                           │
                                  ▼                           ▼
                       ┌──────────────────────┐    ┌─────────────────┐
                       │  Feature Selection   │    │  Prediction     │
                       │  & Preprocessing     │    │  & XAI Output   │
                       └──────────────────────┘    └─────────────────┘
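To make the data flow in the diagram concrete, here is a minimal sketch that chains the stages as plain callables: extraction produces raw features, the selected feature list fixes the vector layout, and a classifier returns a malware probability that is thresholded into a label. `DetectionPipeline`, the stub extractor and classifier, and the 0.5 threshold are all hypothetical; the XAI output stage is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class DetectionPipeline:
    """Hypothetical wrapper chaining extraction, feature selection, and classification."""
    extract: Callable[[str], Dict[str, float]]   # APK path -> raw static features
    selected: Sequence[str]                      # features kept by feature selection
    classify: Callable[[List[float]], float]     # feature vector -> P(malicious)
    threshold: float = 0.5

    def predict(self, apk_path: str) -> str:
        raw = self.extract(apk_path)
        # Missing features default to 0, so every APK gets a vector of the same length.
        vector = [raw.get(name, 0.0) for name in self.selected]
        score = self.classify(vector)
        # Malicious is the positive class, benign the negative class.
        return "malicious" if score >= self.threshold else "benign"


def fake_extract(apk_path: str) -> Dict[str, float]:
    # Stand-in for a real APK parser; returns fixed features for the demo.
    return {"perm:SEND_SMS": 1.0, "perm:INTERNET": 1.0}


def fake_classify(vector: List[float]) -> float:
    # Stand-in for a trained model: fraction of selected features present.
    return sum(vector) / len(vector)


pipeline = DetectionPipeline(fake_extract, ["perm:SEND_SMS", "perm:INTERNET", "api:Cipher"], fake_classify)
print(pipeline.predict("app.apk"))  # malicious (score 2/3 >= 0.5)
```

Keeping each stage behind a callable means the extractor or model can be swapped (or mocked in tests) without touching the rest of the flow.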

🔧 System Components

| Component | Description |
|---|---|
| Machine Learning Module | Builds, trains, and evaluates ML models for binary classification |
| Feature Extraction Module | Extracts static features from APK files using advanced parsing techniques |
| Feature Selection Module | Applies dimensionality reduction and selects the most relevant features |
| Explainable AI Module | Provides interpretable explanations for model decisions |
| Evaluation Module | Comprehensive performance assessment and validation |
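The README does not name the exact feature selection method. One common filter approach for binary static features is to rank each feature by its mutual information with the class label and keep the top k; the sketch below does this from scratch. The helper names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Mutual information (in bits) between one discrete feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(X: List[List[int]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k columns of X with the highest mutual information with y."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: column 0 mirrors the label, column 1 is constant, column 2 is noise.
X = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 0, 0]
print(select_top_k(X, y, 1))  # [0]
```

A filter like this is cheap because it scores each feature independently of any model; wrapper or embedded methods can capture feature interactions but cost more to run.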

📊 Datasets

Our system has been trained and tested on multiple public datasets to ensure robustness:

| Dataset | Description | Size | Features |
|---|---|---|---|
| Drebin | Comprehensive Android malware dataset | 15,036 samples | 545,333 features |
| CICAndMal2017 | Android permission-based dataset | 426,000+ samples | Permission features |
| Android Malware (AM) | General Android malware collection | 25,000+ samples | Mixed features |
| AMSF | Android static features dataset | 10,000+ samples | 6 feature categories |
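The README does not say how these datasets are split for training and testing. Because malware datasets are often not evenly balanced between classes, a stratified split that preserves each class's share in both sets is a reasonable default; the sketch below implements one with the standard library. The function name, the 20% test fraction, and the seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices so each class keeps roughly the same share in train and test."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Illustrative labels: 30 malicious (1) and 70 benign (0) samples.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```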

🛠️ Technology Stack

Core Technologies

  • Python 3.10+ - Primary programming language
  • scikit-learn - Machine learning framework
  • NumPy & Pandas - Data manipulation and analysis
  • Matplotlib & Seaborn - Data visualization
  • SHAP & LIME - Explainable AI

Development Tools

  • PyCharm - Primary IDE
  • Jupyter Notebook - Interactive development
  • Android Studio - Android app development
  • Androguard - APK analysis and feature extraction

Additional Libraries

  • XGBoost - Gradient boosting framework
  • TensorFlow/Keras - Deep learning models
  • Plotly - Interactive visualizations
  • Joblib - Model serialization

🚀 Quick Start Guide

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • Git
  • 4GB+ RAM recommended

Installation

  1. Clone the repository
git clone https://github.com/vannu07/Android-Malware-Detection.git
cd Android-Malware-Detection
  2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download datasets (optional)
# Download sample datasets
python scripts/download_datasets.py

Usage

Basic Usage

# Run the main detection system
python malware_detection.py

# Analyze a specific APK
python detect_single_apk.py --apk_path /path/to/app.apk

# Train a new model
python train_model.py --dataset drebin --model random_forest

Advanced Usage

# Run with custom configuration
python malware_detection.py --config config/custom_config.yaml

# Batch processing
python batch_process.py --input_dir /path/to/apks --output_dir /path/to/results

# Model evaluation
python evaluate_model.py --model_path models/best_model.pkl --test_data data/test_set.csv
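The internals of `batch_process.py` are not shown in this README. Purely as a hypothetical sketch of what batch classification involves, the function below classifies every `.apk` file in a directory and writes one CSV row per file. `batch_classify`, the CSV column names, and the `classify` callable are assumptions, not the repository's code.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable


def batch_classify(input_dir: str, output_csv: str, classify: Callable[[Path], str]) -> int:
    """Classify every .apk in input_dir and write one row per file to output_csv.

    Returns the number of APKs processed. `classify` is a hypothetical stand-in
    for the trained model and should return "malicious" or "benign".
    """
    apks = sorted(Path(input_dir).glob("*.apk"))
    with open(output_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["apk", "prediction"])
        for apk in apks:
            writer.writerow([apk.name, classify(apk)])
    return len(apks)


# Demo with empty placeholder files; non-APK files are skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.apk", "b.apk", "notes.txt"):
        Path(tmp, name).touch()
    out = Path(tmp, "results.csv")
    count = batch_classify(tmp, str(out), lambda p: "benign")
    rows = out.read_text().splitlines()

print(count)  # 2
```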

📈 Performance Results

Model Comparison

| Model | Accuracy | Precision | Recall | F1-Score | Training Time (s) |
|---|---|---|---|---|---|
| Random Forest | 97.2% | 96.8% | 97.1% | 96.9% | 2.3 |
| SVM | 95.8% | 95.2% | 96.1% | 95.6% | 5.7 |
| XGBoost | 98.1% | 97.9% | 98.2% | 98.0% | 3.1 |
| Neural Network | 96.5% | 96.1% | 96.8% | 96.4% | 12.4 |
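For reference, the metrics in this table follow the standard binary-classification definitions with malicious as the positive class. The sketch below computes them from confusion-matrix counts (the counts are illustrative, not from these experiments) and checks one row of the table: F1 is the harmonic mean of precision and recall.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Binary metrics with malicious as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative confusion-matrix counts (not taken from the experiments above).
m = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(m["accuracy"], m["precision"])  # 0.85 0.9

# Consistency check against the Random Forest row: P = 96.8%, R = 97.1%.
p, r = 0.968, 0.971
print(round(100 * 2 * p * r / (p + r), 1))  # 96.9
```

Precision matters when false alarms on benign apps are costly; recall matters when missed malware is costly. F1 balances the two.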

Feature Importance Analysis

Top 10 most important features for malware detection:

  1. Suspicious API calls
  2. Permission requests
  3. Network activity patterns
  4. File system operations
  5. Cryptographic operations
  6. Intent filters
  7. Service declarations
  8. Receiver components
  9. Content providers
  10. Application signatures
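The README does not state how this ranking was obtained. One model-agnostic way to estimate feature importance is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch below implements it from scratch; the toy model and data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[int]], int]


def accuracy(model: Model, X: List[List[int]], y: Sequence[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[List[int]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model: flags an app as malicious (1) iff feature 0 is set.
def toy_model(row: Sequence[int]) -> int:
    return row[0]


X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 0, 0, 1, 0]
importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 1 is ignored by the model, so its importance is 0.0
```

In practice importance should be measured on held-out data, since importances computed on training data can reflect overfitting.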

📁 Project Structure

Android-Malware-Detection/
├── 📁 data/
│   ├── raw/                 # Raw dataset files
│   ├── processed/           # Processed feature files
│   └── models/              # Trained model files
├── 📁 src/
│   ├── feature_extraction/  # Feature extraction modules
│   ├── models/              # ML model implementations
│   ├── evaluation/          # Model evaluation scripts
│   └── utils/               # Utility functions
├── 📁 notebooks/            # Jupyter notebooks for analysis
├── 📁 config/               # Configuration files
├── 📁 scripts/              # Automation scripts
├── 📁 tests/                # Unit tests
├── 📁 docs/                 # Documentation
├── requirements.txt         # Python dependencies
├── setup.py                 # Package setup
└── README.md                # This file

🧪 Testing

Run the test suite to ensure everything works correctly:

# Run all tests
python -m pytest tests/

# Run specific test categories
python -m pytest tests/test_feature_extraction.py
python -m pytest tests/test_models.py

# Run with coverage
python -m pytest --cov=src tests/
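Keeping unit tests fast usually means not parsing real APKs or loading trained models. A minimal sketch of that pattern with `unittest.mock`, runnable under pytest: `predict_apk` is a hypothetical helper, and the mocked `model.predict` only imitates the scikit-learn convention of taking a 2-D input and returning one label per row.

```python
from unittest.mock import Mock


def predict_apk(path, extractor, model):
    """Hypothetical helper: extract features from an APK and return the model's label."""
    features = extractor(path)
    return "malicious" if model.predict([features])[0] == 1 else "benign"


def test_predict_apk_uses_extractor_and_model():
    # Mocks replace the slow APK parser and the trained model, so no real APK is needed.
    extractor = Mock(return_value=[1, 0, 1])
    model = Mock()
    model.predict.return_value = [1]

    assert predict_apk("app.apk", extractor, model) == "malicious"
    extractor.assert_called_once_with("app.apk")
    model.predict.assert_called_once_with([[1, 0, 1]])
```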

🤝 Contributing

We welcome contributions! Here's how you can help:

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Areas for Contribution

  • πŸ› Bug fixes and improvements
  • πŸ“š Documentation enhancements
  • πŸ”¬ New feature extraction methods
  • πŸ€– Additional ML models
  • πŸ§ͺ More comprehensive testing
  • 🎨 UI/UX improvements

Code Style

  • Follow PEP 8 guidelines
  • Use meaningful variable names
  • Add docstrings to functions
  • Include type hints where appropriate

📚 Research & Publications

This project builds on and contributes to the following research:

Academic Papers

  1. INForum 2023: "On the Use of ML for Malware Detection"
  2. RECPAD 2023: "Role of Feature Selection in Malware Detection"
  3. MDPI Information Journal 2024: "Explainable Machine Learning for Android Malware Detection"

Citing This Work

@misc{kumar2024android,
  title={Android Malware Detection with Machine Learning},
  author={Kumar, Varnit},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/vannu07/Android-Malware-Detection}}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgements

Special Thanks

  • Catarina Palma - Original MSc thesis author at ISEL, Lisbon
  • Prof. Artur Ferreira - Academic supervisor and research guidance
  • GGSIPU Faculty - Educational support and mentorship
  • Kaggle Community - Open-source datasets and collaborative environment
  • Open Source Contributors - Libraries and tools that made this possible

Research Institutions

  • Lisbon Institute of Engineering (ISEL) - Original research foundation
  • Guru Gobind Singh Indraprastha University - Academic support

📞 Contact & Support


Support This Project

  • ⭐ Star this repository if you find it useful
  • 🐛 Report bugs via GitHub Issues
  • 💡 Suggest features or improvements
  • 🔄 Share with others in the cybersecurity community
  • 📢 Follow updates on social media

🚀 Ready to Secure Android? Let's Get Started!



⚡ Made with ❤️ by Varnit Kumar | MCA Student at GGSIPU

"Securing the digital world, one APK at a time"


Run Project (no backend API)

  • Create a virtual environment and install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
  • Run the tests (the unit tests use mocks, so they are fast)
pytest -q
  • Run the project: either start main.py directly, or start main.py and the frontend dev server together
# Start main training (runs in foreground if you run directly)
python main.py --dataset Datasets/Drebin_v1.csv --algorithm KNN

# Or use the helper to run main in background and start frontend dev server
./run_all.sh

Note: the FastAPI backend was intentionally removed from this repository. In this setup the frontend does not communicate with main.py; the two simply run concurrently, so you can view the UI while main.py runs. When started via run_all.sh, main.py writes its logs to main.log.
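The training command above selects `--algorithm KNN`. As an illustration of what a k-nearest-neighbours classifier does with binary static features, here is a from-scratch sketch using Hamming distance and a majority vote. It is not the repository's implementation; the helper names, the Hamming metric, and k = 3 are assumptions (scikit-learn's `KNeighborsClassifier`, for example, defaults to the Minkowski metric with p = 2, i.e. Euclidean distance).

```python
from collections import Counter
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(X_train: List[Sequence[int]], y_train: Sequence[int],
                x: Sequence[int], k: int = 3) -> int:
    """Majority label among the k training samples closest to x."""
    nearest = sorted(range(len(X_train)), key=lambda i: hamming(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: 1 = malicious, 0 = benign (illustrative values only).
X_train = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y_train = [1, 1, 1, 0, 0, 0]
print(knn_predict(X_train, y_train, [1, 1, 0]))  # 1
print(knn_predict(X_train, y_train, [0, 0, 0]))  # 0
```

An odd k avoids tied votes in a two-class problem; KNN also has no training phase, so its cost is paid at prediction time when distances to every training sample are computed.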
