A machine learning–based tool to detect malware in executable files and malicious URLs using Random Forest and Logistic Regression classifiers.

HARSH74561/Malware-Detection-using-Machine-learning

🛡️ Malware Detection using Machine Learning

A machine learning–powered project for detecting malware files and malicious URLs.
This repository leverages Random Forest and Logistic Regression classifiers to accurately identify malicious patterns in files and web URLs.

📖 Blog Reference: Machine Learning for Malware Detection


🚨 Why This Project?

Traditional signature-based antivirus software struggles with:

  • Polymorphic malware that changes signatures
  • New malware with no known signatures
  • Automated attacks by low-skilled attackers

Machine learning allows detection based on behavior and features, even for previously unseen malware.


🎯 Objectives

  • Detect malware in executable files using PE Header analysis
  • Detect malicious URLs using text-based ML methods
  • Provide a terminal-based command-line interface (CLI) for scanning
  • Enable future enhancements like GUI and web integration
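To make "PE Header analysis" concrete, here is a minimal sketch of reading a few COFF header fields from a Windows executable. It uses Python's standard `struct` module rather than `pefile`, purely so that it runs without dependencies; the project itself uses `pefile`, which exposes the same values as `pe.FILE_HEADER.Machine`, `pe.FILE_HEADER.NumberOfSections`, and so on. The function name `read_coff_header` and the choice of fields are illustrative assumptions, not the project's actual feature set.

```python
import struct

def read_coff_header(data: bytes) -> dict:
    """Parse the COFF file header of a PE image held in memory.

    Hypothetical helper for illustration; the project extracts its
    features with pefile instead.
    """
    # Every PE file starts with a DOS header whose magic is "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ magic")

    # e_lfanew (offset 0x3C) points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")

    # The 20-byte COFF file header follows the signature.
    fields = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    names = (
        "Machine", "NumberOfSections", "TimeDateStamp",
        "PointerToSymbolTable", "NumberOfSymbols",
        "SizeOfOptionalHeader", "Characteristics",
    )
    return dict(zip(names, fields))
```

Numeric fields like these can be collected into one row per file and used as the feature vector for a classifier.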

📂 Dataset


🏗️ Architecture

[Figure: architecture diagram]

Workflow:

  1. Extract features from PE headers using pefile
  2. Train Random Forest Classifier for file malware detection
  3. Clean & vectorize URLs using TF-IDF
  4. Train Logistic Regression for malicious URL detection
  5. Apply whitelist filtering to avoid false positives
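Steps 3 and 5 can be sketched with the standard library alone. The example below cleans a URL into tokens and short-circuits the classifier for whitelisted domains. Everything here is an assumption made for illustration: the `WHITELIST` contents, the function names, and the cleaning rules are hypothetical, and the repository's actual implementation may differ. The `predict` argument stands in for whatever trained model is used.

```python
import re
from urllib.parse import urlsplit

# Hypothetical whitelist; the project's real list is not shown in this README.
WHITELIST = {"github.com", "python.org"}

def hostname(url: str) -> str:
    """Return the lower-cased host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url  # makes urlsplit treat the first component as the host
    return (urlsplit(url).hostname or "").lower()

def is_whitelisted(url: str, whitelist=WHITELIST) -> bool:
    """True if the host is a whitelisted domain or one of its subdomains."""
    host = hostname(url)
    return any(host == d or host.endswith("." + d) for d in whitelist)

def tokenize(url: str) -> list:
    """Strip the scheme and a leading 'www.', then split on non-alphanumerics."""
    cleaned = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.strip().lower())
    cleaned = re.sub(r"^www\.", "", cleaned)
    return [tok for tok in re.split(r"[^a-z0-9]+", cleaned) if tok]

def classify_url(url: str, predict, whitelist=WHITELIST) -> str:
    """Skip the model for whitelisted hosts; otherwise defer to `predict`."""
    if is_whitelisted(url, whitelist):
        return "benign"
    return predict(url)
```

Matching on the parsed host rather than on a substring means a lookalike such as `github.com.evil.example` is not treated as whitelisted. A function like `tokenize` could be passed to scikit-learn's `TfidfVectorizer` through its `tokenizer` parameter.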

⚙️ Results & Performance

| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| Random Forest (PE Headers) | 99.37% | 99.20% | 98.90% |
| Logistic Regression (URLs) | 98.46% | 99.18% | 96.25% |
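For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, and recall are derived from confusion-matrix counts. The function name and the counts in the usage example are made up for illustration; they are not the project's actual results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives, with "malicious" as the positive class.
    """
    total = tp + fp + tn + fn
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of everything flagged as malicious, how much really was.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything truly malicious, how much was caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Illustrative counts only, not taken from this project.
example = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

For a malware detector, recall is the figure to watch when missed detections are costly, while precision reflects how often a benign file or URL is flagged by mistake.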

[Figure: ROC curve for the URL detector]

[Figure: Confusion matrix for the PE header detector]

[Figure: Confusion matrix for the URL detector]


🛠️ Requirements

Install dependencies:

pip install -r requirements.txt

Main libraries:

  • scikit-learn

  • pandas

  • numpy

  • pefile

  • joblib

  • pyfiglet (for ASCII art in CLI)

🚀 Installation & Usage

Run Locally

git clone https://github.com/HARSH74561/Malware-Detection-using-Machine-learning.git
cd Malware-Detection-using-Machine-learning
pip install -r requirements.txt
python main.py

Run with Docker

git clone https://github.com/HARSH74561/Malware-Detection-using-Machine-learning.git
cd Malware-Detection-using-Machine-learning
docker build -t py-md .
docker run -ti py-md

🖥️ User Interface (CLI)

  • Terminal-based interface

  • ASCII art on startup

  • Easy input for files and URLs

🔮 Future Enhancements

  • Expand dataset for higher accuracy

  • Create a GUI for Windows/Linux

  • Enable real-time file scanning

  • Add web-based interface for file/URL scanning

🤝 Contributing

  • Fork the repo & create a branch:
git checkout -b feature-branch
  • Make your changes and commit:
git add .
git commit -m "Add new feature"
  • Push and open a Pull Request:
git push origin feature-branch

About the Author

Developed by Harsh (GitHub: HARSH74561), who focuses on cybersecurity, machine learning, and AI-powered tools.
