🔒 Vulnerability Detection in Source Code

A Deep Learning–Based Project inspired by Russell et al., 2018

📌 Overview

Modern software systems are plagued by hidden vulnerabilities such as buffer overflows, null pointer dereferences, and improper input validation.
This project implements a deep representation learning approach to detect vulnerabilities in C/C++ functions by directly interpreting lexed source code, moving beyond rule-based static analyzers.

🚀 Pipeline

Data Collection
- C/C++ functions from GitHub, Debian packages, and the SATE IV Juliet Test Suite
- 12M+ functions curated and labeled using static analyzers
Lexical Representation
- Custom C/C++ lexer → reduces vocabulary to ~156 tokens
- Strips comments, normalizes identifiers, standardizes types
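
The lexing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual lexer: the regular expressions, the tiny `KEYWORDS` set, and the placeholder tokens `ID`, `NUM`, and `STR` are all assumptions made for the example. It maps every identifier to a single placeholder and does not attempt type standardization or the ~156-token vocabulary.

```python
import re
from typing import List

# Hypothetical, deliberately small keyword set; a real C/C++ lexer covers the full language.
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof", "NULL"}

# Line and block comments. Simplification: comment markers inside string literals
# are not protected, so this sketch would mangle a literal such as "a//b".
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'                  # string literal
    r"|'(?:\\.|[^'\\])*'"                 # character literal
    r"|\d+(?:\.\d+)?"                     # numeric literal
    r"|[A-Za-z_]\w*"                      # identifier or keyword
    r"|->|\+\+|--|==|!=|<=|>=|&&|\|\|"    # a few two-character operators
    r"|[^\s\w]"                           # any other single punctuation character
)


def lex(source: str) -> List[str]:
    """Strip comments and map identifiers and literals to placeholder tokens."""
    code = COMMENT_RE.sub(" ", source)
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok[0] in "\"'":
            tokens.append("STR")          # string and character literals
        elif tok[0].isdigit():
            tokens.append("NUM")          # numeric literals
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in KEYWORDS else "ID")
        else:
            tokens.append(tok)            # operators and punctuation kept as-is
    return tokens
```

Collapsing identifiers and literals this way is what keeps the vocabulary small: the model sees the shape of the code rather than project-specific names.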
Modeling
- Deep neural network trained on function-level lexed code
- Learns semantic representations of vulnerable vs. safe code
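
The README does not specify the network architecture, so the following is only a sketch under assumptions: it shows the forward pass of a small convolutional classifier over token embeddings (embedding lookup, 1D convolution with ReLU, global max pooling, sigmoid output). The names `make_model` and `predict`, all layer sizes, and the random untrained weights are hypothetical. It uses only the standard library so it runs anywhere; a real implementation would use one of the frameworks listed under Tech Stack and would include training.

```python
import math
import random


def make_model(vocab_size, embed_dim=4, num_filters=3, kernel_size=3, seed=0):
    """Build a tiny, randomly initialized (untrained) model as plain Python lists."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "embed": [[rand() for _ in range(embed_dim)] for _ in range(vocab_size)],
        "filters": [[[rand() for _ in range(embed_dim)] for _ in range(kernel_size)]
                    for _ in range(num_filters)],
        "out_w": [rand() for _ in range(num_filters)],
        "out_b": 0.0,
        "kernel_size": kernel_size,
    }


def predict(model, token_ids):
    """Return a score in (0, 1) for one function given as a list of token ids."""
    k = model["kernel_size"]
    # Pad short sequences with token id 0 so at least one convolution window exists.
    ids = list(token_ids) + [0] * max(0, k - len(token_ids))
    vectors = [model["embed"][i] for i in ids]

    pooled = []
    for filt in model["filters"]:
        best = 0.0  # ReLU then max pooling: the pooled value is never below zero
        for start in range(len(vectors) - k + 1):
            window = vectors[start:start + k]
            act = sum(w * x for row, vec in zip(filt, window) for w, x in zip(row, vec))
            best = max(best, act)
        pooled.append(best)

    logit = sum(w * p for w, p in zip(model["out_w"], pooled)) + model["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))
```

With trained weights, the output would be read as the estimated probability that the function is vulnerable; here it only demonstrates the data flow from token ids to a single score.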
Evaluation
- Benchmarked on NIST SATE IV and real-world open-source projects
- Detects multiple CWE categories (buffer overflows, null pointer errors, input validation flaws, etc.)
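
The README does not state which metrics were used for benchmarking. As an assumption, the sketch below computes precision, recall, and F1, which are commonly reported for imbalanced detection tasks like this one. The function name `precision_recall_f1` and the 0/1 label encoding (1 = vulnerable) are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = vulnerable, 0 = safe)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed vulnerabilities

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Accuracy alone would be misleading here, since a dataset where most functions are safe lets a model score well by flagging nothing.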

🔬 Key Findings

Deep learning can learn vulnerability signatures directly from lexed source code, without hand-written rules.
The model outperformed traditional static analysis and shallow ML baselines.
It remained effective across diverse datasets, suggesting generalization to unseen code.

📊 Example Results

| CWE Category | Frequency in Dataset | Detection Capability |
| --- | --- | --- |
| Buffer Overflow (CWE-120/121) | 38.2% | High ✅ |
| Memory Bound Errors (CWE-119) | 18.9% | High ✅ |
| NULL Pointer Dereference (CWE-476) | 9.5% | High ✅ |
| Pointer Misuse (CWE-469) | 2.0% | Moderate ⚠️ |
| Input Validation / Misc. | 31.4% | Variable |

🛠️ Tech Stack

Languages: Python, C/C++
Frameworks: PyTorch / TensorFlow
Tools: Custom lexer, static analyzers, NIST SATE IV dataset
Concepts: Deep Representation Learning, Token Embeddings, Supervised Classification

📖 References

Rebecca L. Russell, Louis Kim, Lei H. Hamilton, Tomo Lazovich, Jacob A. Harer, Onur Ozdemir, Paul M. Ellingwood, Marc W. McConley.
Automated Vulnerability Detection in Source Code Using Deep Representation Learning.
arXiv:1807.04320, 2018.

🙌 Acknowledgments

This project was built as part of my research/academic exploration in secure software engineering. Inspired by Draper’s work on large-scale ML for vulnerability detection.
