Product Requirements Document: AI-Powered Automated Penetration Testing Web App

Introduction

This document outlines the requirements for an AI-driven automated penetration testing website. The product will allow users to easily perform security scans on their web applications by simply entering a domain name. The system will leverage automation and artificial intelligence (AI) to carry out penetration testing tasks and present results in a user-friendly report. Key technologies include a Next.js front-end for the user interface, a Golang back-end for orchestration, a Kali Linux Docker container running security scanning tools, and an OpenAI GPT-5 powered agent (via Codex CLI) to analyze scan results and guide the testing process. The initial release will be free to use (login required), focusing on core functionality and user experience.

Goals and Objectives

  • On-Demand Security Scanning: Enable users to initiate a penetration test by entering a domain, without requiring security expertise. The goal is to make basic web vulnerability assessment accessible with minimal setup.
  • Automation and Efficiency: Automate the end-to-end penetration testing workflow – from reconnaissance to vulnerability analysis – saving users time and effort. The system should run common scanning tools automatically and use AI to drive deeper analysis based on findings.
  • AI-Enhanced Analysis: Leverage an AI agent to intelligently interpret scan outputs, decide next steps (e.g., run additional scans on discovered ports), and summarize findings. This provides more insightful results than raw scanner output, highlighting what matters most.
  • User-Friendly Reporting: Present the results of the pen test in a clear, concise report for the user. Even non-technical users should be able to understand the identified issues and potential security risks.
  • Scalability for Future Growth: Lay the groundwork for future enhancements (e.g. user subscriptions, scheduling scans, saving reports) by using robust technology (Next.js, Golang, containerization) that can scale as the user base grows.

User Workflow

  1. Landing Page & Domain Input: A visitor arrives at the web app and is greeted with a single-page interface. The main element on this page is an input field to enter a target domain (e.g. example.com) and a prominent "Run Security Scan" button.
  2. Authentication via Clerk: If the user is not logged in, clicking "Run Security Scan" will prompt the user to log in or sign up (handled by Clerk authentication service). This ensures scans are tied to user accounts and paves the way for usage tracking or future billing.
  3. Initiating the Scan: Once authenticated, the user can submit their domain for scanning. The front-end will send a request to the back-end with the domain name. The UI will indicate that the scan is in progress (e.g. a loading spinner or progress message).
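Before the back-end acts on the domain submitted in this step, it must validate the input server-side. A minimal sketch in Go; the function name and regular expression are illustrative assumptions, not a spec requirement:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainPattern accepts dot-separated labels of letters, digits, and
// hyphens, ending in an alphabetic TLD of two or more characters
// (which also rejects bare IPv4 addresses).
var domainPattern = regexp.MustCompile(
	`^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$`)

// ValidateDomain rejects URLs, paths, and IP-like input, per the
// "Domain Input & Validation" requirement.
func ValidateDomain(input string) error {
	d := strings.TrimSpace(strings.ToLower(input))
	if strings.Contains(d, "/") || strings.Contains(d, ":") {
		return fmt.Errorf("enter a bare domain, not a URL: %q", input)
	}
	if len(d) == 0 || len(d) > 253 {
		return fmt.Errorf("domain length out of range")
	}
	if !domainPattern.MatchString(d) {
		return fmt.Errorf("not a valid domain: %q", input)
	}
	return nil
}

func main() {
	fmt.Println(ValidateDomain("example.com"))        // <nil>
	fmt.Println(ValidateDomain("http://example.com")) // prints a validation error
}
```

The same check can back a clear error message in the UI when the user mistypes the domain.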
  4. Automated Scanning Process: The Golang back-end triggers a Kali Linux Docker container to perform the security tests on the provided domain. This includes:
     • Reconnaissance: Discover subdomains and related hosts for the domain (using tools like subdomain scanners).
     • Port Scanning: Scan the target host(s) for open ports and services (using tools like Nmap).
     • Vulnerability Scanning: For common services or web ports discovered, run relevant automated checks (e.g. using Nikto or other vulnerability scanners) to find known weaknesses.
     Throughout this process, the AI agent (GPT-5 via Codex CLI) monitors outputs and can instruct further actions. For example, if the AI sees a web server on port 80, it might trigger a deeper web vulnerability scan or attempt to fetch the homepage for analysis.
  5. AI Analysis & Iteration: The AI agent receives the outputs of each tool and decides if additional scanning steps are needed. It acts as a smart orchestrator: analyzing results in real-time and possibly running more commands (via the Codex CLI, which allows the AI to execute commands in the container) to probe deeper based on findings. This iterative cycle continues until the AI determines that enough information is gathered or a predefined scan coverage is reached.
  6. Results Compilation: Once scanning is complete, the AI agent compiles a security report. The agent uses its understanding of the findings to generate a summary of vulnerabilities, issues, and recommendations. The Golang back-end collects this final output (along with any raw data needed) and sends it to the front-end.
  7. Report Display: The front-end displays the results to the user in a clean format. This could include a list of identified vulnerabilities (with severity levels), affected assets (domains/ports), and recommendations for mitigation. The design will emphasize clarity – e.g. using bullet points or sections for different vulnerability categories – so that users can quickly grasp their security posture.
  8. Follow-up Actions: For the initial version, the user can run another scan by inputting a new domain or log out. In future iterations, features like downloading the report, emailing it, or saving it to the user’s account could be added. At this stage, no payment is required for running scans (no billing flow in MVP), but users must be logged in, which helps prevent abuse and allows tracking usage.

Key Features and Functional Requirements

  • Single-Page Interface: The application will be a one-page design where the domain input form and results are shown. Keeping it one page ensures a simple and fast user experience. After login, the same page will transition to show scan status and then results.
  • User Authentication: Integrate Clerk for authentication. Users must log in or register before launching a scan. Clerk will handle password management, social logins (if enabled), and session management. This is crucial for rate limiting and potential future premium features.
  • Domain Input & Validation: The system should validate the domain input (e.g. correct format, not an IP or URL path, etc.). If invalid, it prompts the user to correct it.
  • Scan Orchestration (Back-End): The Golang back-end will handle incoming scan requests. For each request, it will:
     • Spawn or reuse a Kali Linux Docker container preloaded with necessary security tools. The container provides an isolated environment for running potentially dangerous commands safely.
     • Use the AI agent to drive the scanning process (detailed below in AI Integration).
     • Manage timeouts or limits (e.g. limit each scan to a reasonable duration and resource usage to avoid overloading the server).
     • Ensure results from the container (tool outputs, AI findings) are captured for analysis.
  • AI-Driven Scanning Agent: Utilize an AI penetration testing agent powered by OpenAI’s GPT-5 (via Codex CLI) to interpret results and make decisions.
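The agent's decide-and-execute cycle can be sketched as a bounded loop. In this hedged illustration, `askModel` and `runTool` are hypothetical stand-ins for the GPT-5 call and in-container command execution; the real control flow lives inside Codex CLI:

```go
package main

import "fmt"

// Action is the model's decision after seeing tool output: either the
// next command to run, or done with a final report.
type Action struct {
	Done    bool
	Command string
	Report  string
}

// agentLoop drives the decide-and-execute cycle. maxSteps bounds the
// scan so it always terminates, even if the model keeps asking for more.
func agentLoop(domain string, maxSteps int,
	askModel func(history []string) Action,
	runTool func(cmd string) string) string {

	history := []string{"target: " + domain}
	for i := 0; i < maxSteps; i++ {
		a := askModel(history)
		if a.Done {
			return a.Report
		}
		out := runTool(a.Command)
		history = append(history, a.Command+"\n"+out)
	}
	return "scan stopped: step limit reached; partial findings only"
}

func main() {
	// Toy stand-ins: "run nmap once, then report".
	report := agentLoop("example.com", 10,
		func(h []string) Action {
			if len(h) == 1 {
				return Action{Command: "nmap -sV example.com"}
			}
			return Action{Done: true, Report: "1 open port found"}
		},
		func(cmd string) string { return "80/tcp open http" },
	)
	fmt.Println(report)
}
```

The step limit doubles as the "predefined scan coverage" cut-off mentioned in the workflow above.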
    The agent will run within the back-end or directly inside the Kali container environment with appropriate access. Functional requirements for the AI agent include:
     • Reading outputs from scanning tools (e.g. Nmap results) and understanding them.
     • Deciding on next actions: e.g., "If open ports 80/443 are found, run a web scanner; if an FTP service is found, attempt to list anonymous login," etc. This decision logic is not hardcoded but emerges from the AI’s prompt and capabilities (see AI Agent System Prompt below for how the AI is instructed).
     • Executing additional commands via Codex CLI automatically (in Auto or Full Access mode, so it doesn’t require human approval for each step) to perform those next actions.
     • Preventing unsafe actions: the AI’s access will be confined to the container and the scope of the target domain only. The prompt will instruct it to avoid any actions outside the target scope or that could harm the host system.
     • Summarizing all findings in a coherent report format at the end.
  • Progress Feedback: As the scan runs, the user should receive some feedback that things are working. For the MVP, this could be a simple message like "Scanning in progress, this may take a few minutes..." on the page. Optionally, a dynamic status area could list high-level steps (e.g. "Enumerating subdomains... Scanning ports... Running vulnerability analysis..."). Even if detailed progress is not shown in MVP, ensure the front-end periodically checks the scan status so the user isn’t left waiting with no information.
  • Result Report: Once complete, the user sees the results on the same page. The report should include:
     • A summary of the scan (e.g. "Scanned 12 subdomains and 3 services. Found 2 potential vulnerabilities.").
     • A list of identified vulnerabilities or issues, each with a short description. For example, "⚠️ Open Port 22 – The server allows SSH connections on port 22. Ensure strong passwords or key-based auth." or "❗ Outdated Apache Version – Apache/2.4.1 detected on port 80, which has known vulnerabilities. Consider updating to a newer version."
     • Recommendations or next steps for the user (especially for any critical findings).
    The report content will largely come from the AI’s analysis. The front-end can style this content (using icons, colored text for severity, etc.) for better readability.
  • No Payment in MVP: The product will not handle payments or subscription logic in the first version. All users who log in can run scans freely. (The system might impose internal limits, like one scan at a time or a certain number of scans per user per day, to prevent abuse, but there’s no billing flow.) The design should, however, keep in mind that adding a payment wall or premium tier later is likely – e.g., keep user roles or plan info available in the user model from Clerk even if only one default role exists now.
  • Logging and Monitoring: The system should log important events for debugging and security auditing. For example, log when a scan starts and ends, which domain was scanned (to prevent misuse or allow blocking if needed), any errors in the scanning process, etc. This will help troubleshoot issues and also keep an eye on whether users are scanning only permissible targets.

Non-Functional Requirements

  • Security and Ethics: Given the nature of the product, security is paramount. The scanning environment (Kali container) must be isolated from the main system to prevent malicious commands from affecting the host. The AI agent’s scope will be restricted to the target domain provided by the user, and we will include disclaimers that users should only scan domains they own or have permission for. All data in transit should be encrypted (HTTPS for the web interface, secure channels between backend and any external API like OpenAI).
  • Performance: A typical scan should complete within a reasonable time (e.g. 1-5 minutes for a basic scan of a single domain).
    The back-end must handle scans efficiently, possibly by running tasks in parallel (for subdomain scanning, port scanning, etc.) to speed up the process. The AI analysis should be done with a high-performance model (GPT-5) to minimize delay in reasoning. If a scan is too slow or times out (for example, a domain with hundreds of subdomains might be slow), the system should handle this gracefully (perhaps by stopping after a certain limit and reporting partial results).
  • Scalability: The architecture should support multiple users running scans concurrently. Using Golang for the back-end ensures good concurrency handling, and containerizing the scanning environment allows spawning multiple containers if needed. We might impose a cap (like maximum N concurrent scans) initially. In the future, the system can be scaled horizontally by running multiple back-end worker instances and container hosts.
  • Usability: The application should be intuitive. A non-technical user should understand how to use it (enter domain, log in, get results). The output report should avoid overly technical jargon where possible, or at least explain it (since one objective is to make security info accessible). The design will be clean and minimalistic, focusing on the scan function and results.
  • Maintainability: All components should be modular. The front-end (Next.js) and back-end (Go) communicate over a clear API, making it easy to update one without breaking the other. The security tools and AI logic are contained in the Docker environment, which can be versioned and updated (for example, updating to a newer Kali or adding new tools over time). The AI prompt and behavior can be adjusted without affecting the rest of the system, enabling quick improvements to the scanning logic.
  • Compliance: If storing or processing any sensitive data, ensure compliance with privacy laws. For MVP, we are not storing personal data beyond login credentials (handled by Clerk) and scan results. If logs store scan results, they should be protected. Additionally, the Terms of Service should make clear that the user consents to this automated scanning.

Technical Architecture

The system is composed of several integrated components, each with a specific role:

  • Front-End: Built with Next.js (React). This provides the landing page, login interface (via Clerk’s React components), the domain input form, and the results page. Next.js allows for a fast, responsive UI and server-side rendering if needed for faster initial load. The front-end will communicate with the back-end via REST API calls (or GraphQL if chosen, but REST is sufficient for MVP). For example, a POST request to /api/scan with the domain name could initiate the scan.
  • Authentication: The app will use Clerk for user management. Clerk provides pre-built UI components for login/sign-up modals and handles sessions and secure token management. When the front-end calls the back-end API, it will include the user’s auth token so the back-end can verify the user (Clerk provides middleware/SDK for this). This ensures only authenticated requests trigger scans.
  • Back-End: Implemented in Golang, running as the web server and orchestrator. Key responsibilities of the back-end:
     • Expose an endpoint (e.g. POST /scan) to receive scan requests (domain name) from the front-end.
     • Validate and sanitize the input (e.g., ensure it’s a domain format, perhaps prevent obviously disallowed targets).
     • Initiate the scanning process in a new Kali Linux Docker container. This could be done by using the Docker API/SDK in Go to programmatically start a container with the necessary tools available. The container could be pre-built with tools like nmap, dirbuster, nikto, etc., and with the Codex CLI environment set up for the AI agent.
     • Feed the domain to a scanning script or directly to the AI agent which will run inside the container.
       There are two possible approaches: (a) AI-centric approach: launch the Codex CLI AI agent inside the container with a prompt that includes instructions and the target domain, allowing the AI to take over and run tools within the container; (b) Back-end-driven approach: run a sequence of tools and use the AI via API calls to analyze outputs and determine next steps. The design leans towards the AI-centric approach for flexibility, but it requires robust prompt engineering (detailed next).
     • Monitor the scanning progress. The back-end can listen for the AI’s output or status. If using Codex CLI in auto mode, it might continuously output logs of what it’s doing. The back-end can capture these for logging or even to stream back partial info to the front-end (though streaming is an enhancement, not mandatory for MVP).
     • When the AI finishes (or if the process times out), collect the final report generated. Ensure the container is stopped or destroyed afterwards to free resources and to preserve security isolation between scans.
     • Return the results to the front-end (e.g., as JSON containing the report content, which the front-end then displays).
  • AI Integration: The AI agent is powered by OpenAI GPT-5 through the Codex CLI tool. Codex CLI allows the AI to have a presence in the terminal with the ability to run commands. Our implementation will utilize Codex CLI in an automated fashion. The AI will be given a carefully crafted system prompt (see next section) that defines its role and constraints as a penetration tester. Running in "Full Access" mode (with caution) will allow the AI to execute security tools within the container without pausing for human approval on each command. The GPT-5 model provides advanced reasoning to make the scanning efficient and thorough. We will use the OpenAI API (or ChatGPT Enterprise tools) to power this agent; the Codex CLI acts as an interface for the AI to interact with the local environment.
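One simple way for the back-end to launch the per-scan container is to shell out to the docker CLI. A hedged sketch: the image name, resource caps, and the in-container entrypoint script below are illustrative assumptions, not the final design:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// kaliScanArgs builds the docker arguments for one throwaway scan
// container. The image, limits, and /opt/agent/run-scan.sh entrypoint
// are placeholders for whatever the real image ships.
func kaliScanArgs(domain string) []string {
	return []string{
		"run", "--rm", // remove the container when the scan ends
		"--memory", "1g", // cap resources per scan
		"--cpus", "1",
		"kalilinux/kali-rolling", // pre-built image with nmap, nikto, etc.
		"/opt/agent/run-scan.sh", domain,
	}
}

func main() {
	args := kaliScanArgs("example.com")
	cmd := exec.Command("docker", args...)
	// In the real back-end this would be cmd.Start() under a timeout
	// context; here we only show the command that would run.
	fmt.Println("docker", strings.Join(args, " "))
	_ = cmd
}
```

Keeping the argument construction in a pure function makes it easy to unit-test and to audit which flags each scan runs with.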
  • Kali Linux Environment: The Kali Docker container serves as the sandbox for all scanning activities. It comes with a suite of penetration testing tools, which the AI (and by extension the back-end) will rely on to gather information. Some expected tools and their usage:
     • nmap – Port scanning and service detection on the target host(s).
     • dnsutils or specialized scripts – Enumerating subdomains (could also use tools like amass or sublist3r).
     • nikto – Scanning web servers for known vulnerabilities.
     • wpscan – Finding WordPress-specific issues (if the target is identified as a WordPress site, for example).
     • Other utilities as needed (e.g., an ssh client to test for default credentials on an open SSH port).
    The container approach means these tools are pre-installed and configured. The AI agent will choose which ones to run based on the situation (per its prompt and training).
  • Result Storage (Temporary): In MVP, results can be kept in memory or ephemeral storage until delivered to the user. We are not building a full database for persisting reports at this stage. If a user re-runs a scan, it will run fresh each time (no history feature yet). Logs can be stored on the server or in cloud storage for internal reference but not exposed in the UI initially.

AI Agent System Prompt (Codex Agent Instructions)

To ensure the AI behaves as a helpful penetration testing assistant, we define a system prompt that guides its actions. This prompt is given to GPT-5 at the start of the Codex CLI session inside the container. It establishes the AI’s role, the tools available, and safety boundaries. Below is the system prompt draft for the AI agent:

System Prompt (PenTest AI Agent): You are an autonomous penetration testing AI agent operating in a controlled environment (Kali Linux). Your goal is to identify security vulnerabilities in the target provided to you, and report them. Follow these guidelines:
  • The target domain is provided by the user and is the only scope you are allowed to scan. Do not attack or scan any other domains or external systems not related to the target.
  • Use the tools available in Kali Linux to gather information and find vulnerabilities. Common tools at your disposal include port scanners (e.g. nmap), web scanners (e.g. nikto, wpscan), and other command-line utilities.
  • Work step-by-step: first perform reconnaissance (e.g., find subdomains and open ports), then delve into potential vulnerabilities on discovered services.
  • After each tool run, analyze the output thoroughly. If you find hints of a specific technology or service (for example, an Apache server, an SSH service, etc.), decide on the next appropriate test. Only run relevant and necessary commands to investigate further; avoid redundant or overly destructive actions.
  • Maintain a log of actions and findings as you progress, so the user can understand what steps were taken. (e.g. "Found port 80 open, running Nikto for web vulnerabilities...").
  • Safety and legality: Do not perform any scanning or attack beyond informational gathering. Do not exploit vulnerabilities, only identify and report them. Avoid causing any denial of service or damaging the target.
  • Once you have gathered sufficient information about the target’s security posture, stop the scanning process. Then, compile a report of your findings. The report should list the vulnerabilities or issues discovered, explain their significance, and suggest remediation or next steps. If no significant issues are found, provide a summary of what was checked and note that no major problems were detected.
  • Present the final output clearly and concisely, suitable for an end-user. Use markdown formatting (lists, headings) if needed to structure the report. Do not include overly technical jargon without explanation, so the report is understandable even to those with basic technical knowledge.
  • You have full access within this container to execute commands and read files, but you must not access anything outside the container or the target scope. If you are unsure about an action or it violates these guidelines, refrain from doing it. Begin by confirming the target domain and starting reconnaissance.

This system prompt ensures the AI knows its mission and boundaries. It essentially turns GPT-5 into a specialized security analyst that can use tools and iteratively approach the problem. By providing these instructions, we reduce the risk of the AI doing something unintended and increase the usefulness of the results. (The prompt may be refined through testing.)

Future Considerations and Out of Scope (for MVP)

  • Payment & Plans: In the future, we may introduce paid plans (for example, a free tier with basic scans and a premium tier with deeper scans or faster processing). For now, the focus is on delivering value and gathering feedback without a paywall. The architecture with user login and usage tracking will make it easier to integrate billing later (e.g., using Stripe) if demand grows.
  • Report Persistence: A future version might allow users to save scan reports, view past scans, or receive them via email. MVP will not have a user dashboard or history (to keep things simple and focus on core functionality). Implementing a database and more complex back-end logic for report storage is out of scope initially.
  • Additional Scan Types: The initial product targets web application reconnaissance and basic vulnerability scanning. Later on, we could add more specialized tests (for example, credential brute-forcing for common services, malware scanning, or cloud configuration checks if the target is on cloud infrastructure). We deliberately limit MVP scope to keep it safe and manageable.
  • Concurrent Scans & Queue: As usage grows, we may need a queue system or a limit on concurrent scans per user to manage load. The MVP will likely handle scans sequentially per user (and enforce a global maximum concurrent scan count to conserve resources). If demand is high, adding a job queue and background processing would be considered.
  • GUI Improvements: The one-page UI can be expanded into a richer interface over time. For example, after a scan, we might show interactive elements like clicking a vulnerability to see more details or raw tool output. Also, integrating a small tutorial or FAQ on the page could help new users understand how to use the tool effectively.
  • AI Model and Tools Updates: We will keep an eye on improvements in AI and security tools. GPT-5 is assumed for this project; if in reality only GPT-4 is available initially, we’ll use the best available model and upgrade when possible. Similarly, the set of tools in Kali can be updated (or even replaced with our own scanning scripts) based on what provides the best results.

Conclusion

This PRD describes an innovative platform that combines automated security tooling with cutting-edge AI to deliver easy, intelligent penetration testing for users. By logging in and entering a domain, users will receive a tailored security report generated by both traditional scanners and AI analysis. The use of Next.js and Golang ensures a smooth user experience and robust performance, while the integration of a Kali Linux environment and GPT-5 agent provides depth to the scanning process that typical one-size-fits-all scanners lack. The initial version will prove the concept and delight early users with its simplicity and insight. Feedback from this MVP will guide enhancements in security depth, user features, and possibly monetization, as we move forward. The end goal is to make proactive security testing accessible to all developers and admins, empowering them to secure their systems with the help of AI.