
AI-based threat detection in Large Language Models

Evaluated guardrails, prompt design, and a classification model to assess their effectiveness in mitigating direct prompt injection attacks. Integrated the Model Context Protocol (MCP) to standardize and secure API interaction. Set up a sandbox mode to provide hands-on experience.

My main contributions: everything except Homepage.py was written entirely by me. Homepage.py is inspired by another project and has been extended with a sandbox mode.

Screenshots in the repository show the security evaluation on a custom dataset and the user application for trying out custom security setups.

1 Environment Setup

Note: You will have to install GuardrailsAI manually (see Section 2)

To set up a new Conda environment with Python 3.10 and install the required dependencies:

conda create --name hacking_bot python=3.10 -y
conda activate hacking_bot
pip install -r requirements.txt

2 Manual GuardrailsAI Setup

To install the guardrails_pii validator from the GuardrailsAI Hub, run:

guardrails configure
guardrails hub install hub://guardrails/guardrails_pii

Note: You will need an API key from GuardrailsAI; guardrails configure will prompt you for it.
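
For orientation, here is a minimal sketch of using the installed validator from Python. It is not taken from the repository's code and assumes the hub package exposes a GuardrailsPII class with an entities parameter; check the hub page for the exact import and arguments.

from guardrails import Guard
from guardrails.hub import GuardrailsPII  # assumed class name for hub://guardrails/guardrails_pii

# Build a guard that redacts e-mail addresses and phone numbers (illustrative entity list).
guard = Guard().use(
    GuardrailsPII(entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix")
)

result = guard.validate("Contact me at jane.doe@example.com")
print(result.validation_passed, result.validated_output)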

3 OpenAI API key

Create a file named secrets.toml in the .streamlit/ directory and add your OpenAI API key as OPENAI_API_KEY="..."
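
A Streamlit app can then read the key through st.secrets, which is populated automatically from .streamlit/secrets.toml. A minimal sketch (the repository's actual loading code may differ):

import streamlit as st
from openai import OpenAI

# Streamlit loads .streamlit/secrets.toml into st.secrets at startup.
client = OpenAI(api_key=st.secrets["OPENAI_API_KEY"])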

4 How to Run the Application

streamlit run Homepage.py

5 How to Debug an MCP server

mcp dev src/utils/mcp_server.py

More information: https://github.com/modelcontextprotocol/python-sdk?tab=readme-ov-file. Note that mcp dev can be flaky; if the inspector does not start right away, waiting 10-20 seconds and retrying usually works.
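
For orientation, here is a minimal MCP server in the style of the python-sdk quickstart. This is an illustrative sketch, not the contents of src/utils/mcp_server.py, whose tools differ:

from mcp.server.fastmcp import FastMCP

# Hypothetical example server; the real tools live in src/utils/mcp_server.py.
mcp = FastMCP("demo")

@mcp.tool()
def echo(text: str) -> str:
    """Return the input text unchanged."""
    return text

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport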
