Development Of This Repository - Dataset Creation / Preprocessing / Model Building & Testing / Evaluation
This project focuses on building reliable and efficient malware detection models using Machine Learning and Deep Learning techniques to improve current detection methods. By exploring several model types, including regression, classification, and neural networks, the aim is to identify the best-performing models for detecting malware accurately.
The project results showed that:
- The Random Forest Classifier achieved a validation accuracy of 98.85%, with a precision of 98.94%, an F1 score of 98.98%, and an R2 score of 95.30.
- The Neural Network Model reached a validation accuracy of 96.32%, with a precision of 96.64%, an F1 score of 96.76%, and an R2 score of 85.02.
- Two Logistic Regression Models were also tested, but their validation accuracy was lower, at 85.54% and 86.1%. The first model had a precision of 83.53%, an F1 score of 87.93%, and an R2 score of 41.09; the second had a precision of 84.09%, an F1 score of 88.37%, and an R2 score of 43.35.
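For readers unfamiliar with how the four reported metrics relate, the following is a minimal sketch in plain Python. It is not the repository's evaluation code: the function name `binary_metrics` is hypothetical, labels are assumed to be encoded as 1 for malware and 0 for benign, and R2 is assumed to be computed on the hard 0/1 predictions rather than on predicted probabilities.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, F1, and R2 for 0/1 labels.

    Assumption: 1 = malware, 0 = benign (hypothetical encoding).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # R2 = 1 - SS_res / SS_tot, treating the 0/1 labels as numeric targets.
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0

    return {"accuracy": accuracy, "precision": precision, "f1": f1, "r2": r2}


# Toy example: 8 samples, 2 of them misclassified.
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 1])
print(metrics)
```

In this toy example, accuracy, precision, and F1 all come out at 0.75 while R2 is 0. Under the assumptions above, R2 penalizes each misclassification more heavily than accuracy does, which would be consistent with the Logistic Regression models showing much lower R2 scores than accuracy scores.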
Based on these findings, the Random Forest Classifier and Neural Network models were selected as the primary models for identifying malware due to their strong accuracy and consistency. The selected models were then integrated into a backend system to create a fully functional web application, giving users an interactive platform where they can upload files for malware detection and receive a detailed report.
This individual project showcases a practical approach to creating effective malware detection models and covers the full ML lifecycle, from data preparation to model deployment in a user-friendly web application.
Web Application: Live On
- Frontend Source Code: malditectist-webapp-frontend
- Backend Source Code: malditectist-webapp-backend
- Click Here to Visit the live web application.
Access the Preprocessed Dataset: Google Drive Link
This project is licensed under the MIT License. See LICENSE for more details.