priyadhanu14/Vulnerability-Detection-Software-Code

🔒 Vulnerability Detection in Source Code

A Deep Learning–Based Project inspired by Russell et al., 2018


📌 Overview

Modern software systems are plagued by hidden vulnerabilities such as buffer overflows, null pointer dereferences, and improper input validation.
This project implements a deep representation learning approach to detect vulnerabilities in C/C++ functions by directly interpreting lexed source code, moving beyond rule-based static analyzers.


🚀 Pipeline

  1. Data Collection

    • C/C++ functions from GitHub, Debian packages, and the SATE IV Juliet Test Suite
    • 12M+ functions curated and labeled using static analyzers
  2. Lexical Representation

    • Custom C/C++ lexer → reduces vocabulary to ~156 tokens
    • Strips comments, normalizes identifiers, standardizes types
  3. Modeling

    • Deep neural network trained on function-level lexed code
    • Learns semantic representations of vulnerable vs. safe code
  4. Evaluation

    • Benchmarked on NIST SATE IV and real-world open-source projects
    • Detects multiple CWE categories (buffer overflows, null pointer errors, input validation flaws, etc.)
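
The lexical representation step (comment stripping, identifier normalization, type standardization) can be illustrated with a minimal sketch. This is not the project's actual lexer: the names `lex`, `KEYWORDS`, `BASE_TYPES`, `TYPE_ALIASES`, the alias choices, and the placeholder tokens `<str>`, `<char>`, `<num>`, and `idN` are all assumptions made for illustration, and the token grammar covers only a small subset of C.

```python
import re

# Hypothetical, deliberately small sets; a real C/C++ lexer would cover far more.
KEYWORDS = {"if", "else", "for", "while", "return", "sizeof", "struct", "const"}
BASE_TYPES = {"void", "char", "short", "int", "long", "float", "double",
              "signed", "unsigned"}
# Assumed aliases that collapse platform-specific types onto base types.
TYPE_ALIASES = {"uint8_t": "char", "int32_t": "int", "uint32_t": "int",
                "size_t": "long"}

TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<char>'(?:\\.|[^'\\])*')
  | (?P<number>0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<op>->|\+\+|--|<=|>=|==|!=|&&|\|\||[-+*/%=<>!&|^~?:;,.(){}\[\]])
  | (?P<ws>\s+)
""", re.VERBOSE | re.DOTALL)


def lex(source):
    """Turn C source text into a normalized token list.

    Comments and whitespace are dropped, literals become placeholders, and
    user-defined identifiers are renamed id0, id1, ... in order of first use.
    Characters the grammar does not recognize (such as '#') are skipped.
    """
    tokens = []
    names = {}  # original identifier -> placeholder
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "ws"):
            continue
        if kind == "string":
            tokens.append("<str>")
        elif kind == "char":
            tokens.append("<char>")
        elif kind == "number":
            tokens.append("<num>")
        elif kind == "ident":
            if text in KEYWORDS or text in BASE_TYPES:
                tokens.append(text)
            elif text in TYPE_ALIASES:
                tokens.append(TYPE_ALIASES[text])
            else:
                names.setdefault(text, f"id{len(names)}")
                tokens.append(names[text])
        else:
            tokens.append(text)  # operators and punctuation are kept as-is
    return tokens


example = "void f(char *src) { char buf[16]; /* copy */ strcpy(buf, src); }"
print(lex(example))
```

Renaming every identifier is the simplest possible choice here. A lexer aimed at vulnerability detection would likely keep the names of common library calls such as `strcpy`, since they carry signal; this sketch does not.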

🔬 Key Findings

  • Deep learning can learn vulnerability signatures directly from lexed source code, without hand-written rules.
  • The learned models outperformed traditional static analyzers and shallow ML baselines.
  • Effective across diverse datasets, showing strong generalization to unseen code.
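
Comparing a learned detector against static analyzers and shallow baselines requires a shared metric computed on the same labeled functions. This README does not state which metrics were used, so the sketch below is an assumption: it computes precision, recall, F1, and the Matthews correlation coefficient (MCC), which are common choices when vulnerable functions are a small minority. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and MCC for 0/1 labels.

    y_true and y_pred are equal-length sequences where 1 means "vulnerable".
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Toy labels: two vulnerable functions, one of them detected, one false alarm.
print(binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

Running the same function over each tool's predictions on one held-out set gives directly comparable numbers.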

📊 Example Results

| CWE Category | Frequency in Dataset | Detection Capability |
|---|---|---|
| Buffer Overflow (CWE-120/121) | 38.2% | High ✅ |
| Memory Bound Errors (CWE-119) | 18.9% | High ✅ |
| NULL Pointer Dereference (CWE-476) | 9.5% | High ✅ |
| Pointer Misuse (CWE-469) | 2.0% | Moderate ⚠️ |
| Input Validation / Misc. | 31.4% | Variable |

🛠️ Tech Stack

  • Languages: Python, C/C++
  • Frameworks: PyTorch / TensorFlow
  • Tools: Custom lexer, static analyzers, NIST SATE IV dataset
  • Concepts: Deep Representation Learning, Token Embeddings, Supervised Classification
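
Token embeddings operate on integer IDs, so each lexed function has to be mapped to a fixed-length ID sequence before it reaches the network. The sketch below shows one common way to do that with padding and truncation. It is an assumption rather than this project's code: `build_vocab`, `encode`, the `<pad>` and `<unk>` entries, and the `max_len` parameter are hypothetical names.

```python
PAD, UNK = "<pad>", "<unk>"


def build_vocab(token_sequences):
    """Assign an integer ID to every distinct token, reserving 0 and 1."""
    vocab = {PAD: 0, UNK: 1}
    for sequence in token_sequences:
        for token in sequence:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to IDs, truncating or right-padding to exactly max_len."""
    ids = [vocab.get(token, vocab[UNK]) for token in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Toy training corpus of one lexed declaration.
vocab = build_vocab([["char", "id0", "[", "<num>", "]", ";"]])
print(encode(["char", "id1", ";"], vocab, max_len=5))  # -> [2, 1, 7, 0, 0]
```

The resulting ID lists can be stacked into a batch and passed to an embedding layer in either PyTorch or TensorFlow. With a lexer vocabulary of about 156 tokens, the embedding table stays small.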

📖 References

  • Rebecca L. Russell, Louis Kim, Lei H. Hamilton, Tomo Lazovich, Jacob A. Harer, Onur Ozdemir, Paul M. Ellingwood, Marc W. McConley
    Automated Vulnerability Detection in Source Code Using Deep Representation Learning.
    arXiv:1807.04320

🙌 Acknowledgments

This project was built as part of my research and academic exploration in secure software engineering. It was inspired by Draper’s work on large-scale ML for vulnerability detection.
