Evaluated guardrails, prompt design and a classification model to assess their effectiveness in mitigating direct prompt injection attacks. Integrated the Model Context Protocol to standardize and secure API interaction. Setup a sandbox mode to provide hands-on experience.
My main contributions: Everyhing except Homepage.py was fully written by me. Homepage.py is inspired by another project and has been extended to include a sandbox mode.
Security-Evaluation on custom dataset:
User-Application to try custom security setups:
Note: You will have to install GuardrailsAI manually (see Section 2)
To set up a new Conda environment with Python 3.10 and install the required dependencies:
conda create --name hacking_bot python=3.10 -y
conda activate hacking_bot
pip install -r requirements.txtTo install guardrail from GuardrailsAI use:
guardrails configure
guardrails hub install hub://guardrails/guardrails_pii
Note: You will have to get an API key from their service.
Create a file secrets.toml in .streamlit/ and add your OpenAI key as OPENAI_API_KEY="..."
streamlit run Homepage.pymcp dev src/utils/mcp_server.pymore information here: https://github.com/modelcontextprotocol/python-sdk?tab=readme-ov-file The mcp dev seems to be buggy from time to time. It worked after waiting 10-20s and just then start.

