CypherNova1337 presents WebRecon-Arsenal, a comprehensive framework for web application reconnaissance. This framework is built upon the foundational work of several respected security researchers, combined with my own techniques and workflow developed over time. It's designed to be a practical, adaptable, and educational resource for penetration testers and bug bounty hunters.
This framework draws inspiration from the methodologies and tools used by many skilled individuals in the infosec community. While it's impossible to list everyone, I'd like to acknowledge the general influence of:
- The broader bug bounty hunting community and their shared knowledge.
- Public writeups and presentations on reconnaissance techniques.
- Developers of the excellent open-source tools used in this framework.
- @Coffinxp
This is a synthesis of best practices, not a copy of any single person's approach.
Many aspiring web application pentesters are drawn to the field by the prospect of quick bug bounty rewards. They often fall into the trap of immediately attempting exploits or running automated vulnerability scanners without a thorough understanding of the target application. This approach rarely leads to significant findings and often results in wasted effort.
Reconnaissance is not just a preliminary step; it is the foundation upon which a successful penetration test is built. A meticulous recon phase can be the deciding factor between identifying critical vulnerabilities and finding nothing of value. It's about building a comprehensive understanding of the target, including:
- Attack Surface: The totality of exposed assets.
- Technology Stack: Identifying the web server, framework, database, etc.
- Functionality: Grasping how the application operates, features, user roles, and data flows.
- Hidden Assets: Uncovering forgotten subdomains, dev environments, and exposed configuration files.
The Time Commitment:
Reconnaissance is not a quick process, especially for medium to large web applications. The recon phase can easily span weeks, or even months. This is an investment: the deeper your understanding of the target, the more effectively you can tailor your attacks. The attack phase itself can also be lengthy, potentially taking weeks or months, and there are thousands of recon methods beyond what any single guide covers.
This guide presents a multi-stage reconnaissance process that blends automated tools and manual analysis. It's thorough, but adaptable. Tailor it to your specific targets.
Key Principles:
- Iterative: As you uncover new information, feed it back into your tools.
- Layered: Use multiple tools and techniques.
- Manual Analysis: Don't rely solely on automation.
The Process (Step-by-Step):

Preparation (One-Time Setup):

- Install the required tools (see install.md for detailed instructions).
- Create or obtain the necessary wordlists:
  - permutation_wordlist.txt: for subdomain permutations (common prefixes, suffixes, and numbers such as dev, test, staging, backup, 1, 2023). A good one to start with: https://gist.github.com/six2dez/ffc2b14d283e8f8eff6ac83e20a3c4b4
  - vhost_wordlist.txt: for virtual host discovery (common hostnames such as www, mail, dev, admin, plus your discovered subdomains). A good one to start with: https://github.com/maverickNerd/wordlists/blob/master/vhost.txt
  - parameter_fuzzing_wordlist.txt: for fuzzing parameters. SecLists is a good source: `git clone https://github.com/danielmiessler/SecLists.git`
  - xss_wordlist.txt: for XSS payloads (e.g. the SecLists Fuzzing/XSS/robot-friendly lists).
  - resolvers.txt: a list of known-good DNS resolvers: `git clone https://github.com/trickest/resolvers.git`
  - /home/USER/Documents/oneListForall/onelistforallshort.txt: your general-purpose directory brute-forcing wordlist. Ensure this path is correct for your system: `git clone https://github.com/six2dez/OneListForAll`
  - /home/USER/Documents/nuclei-templates/: your Nuclei templates directory. Ensure this path is correct.
- I highly recommend using Coffinxp's GF patterns. Download them and copy them into the hidden .gf folder in your home directory (~/.gf), answering yes when asked about merging and replacing: `git clone https://github.com/coffinxp/GFpattren`
Reconnaissance Steps:

Create the target directory and domains.txt:

- `mkdir "TARGET"`
- `cd "TARGET"`
- `nano domains.txt`

Inside nano, enter in-scope domains, one per line, without http://, https://, www, or trailing slashes. Save (Ctrl+O, Enter) and exit (Ctrl+X).
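If your scope list arrives as full URLs, a quick sed pipeline can normalize it into the bare-domain format domains.txt expects. This is a sketch: `raw_scope.txt` is a hypothetical input file standing in for a scope list copied from a program page.

```shell
# Hypothetical input: a scope list copied as URLs
printf 'https://www.example.com/\nhttp://api.example.com/v1\nexample.com\n' > raw_scope.txt

# Strip the scheme, a leading www, and anything after the first slash, then dedupe
sed -E 's#^https?://##; s#^www\.##; s#/.*$##' raw_scope.txt | sort -u > domains.txt
cat domains.txt
```

This leaves `api.example.com` and `example.com`, ready for subfinder's `-dL` input.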
Phase 1: Expanded Subdomain Enumeration

Subfinder (Initial Scan):

`subfinder -dL domains.txt -all -recursive -o subdomains.txt`

- What: Finds subdomains using passive sources. `-dL` reads domains from domains.txt, `-all` uses all sources, `-recursive` finds subdomains of subdomains, and `-o` saves the results to subdomains.txt.
- Why: Expands the attack surface by identifying initial subdomains.
crt.sh (Certificate Transparency):

`curl -s "https://crt.sh/?q=%25.$TARGET&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | anew subdomains.txt`

- What: Queries crt.sh for subdomains. `curl` fetches the data, `jq` extracts the domain names, `sed` removes wildcards, and `anew` appends to subdomains.txt without duplicates.
- Why: Finds subdomains from certificate transparency logs, often revealing hidden ones.
- Note: Do this manually for each domain; it gathers a lot of useful information. If you get an error like `parse error: Invalid numeric literal at line 1, column 10`, just run it again and it should be fine.

Optional Step: If you have specific subdomains, or features like contact forms, that are out of scope, you can filter them out:

`grep -v -E "sub\.domain\.com|contact|sub2\.domain\.com" subdomains.txt > filtered_subdomains.txt`

This removes sub.domain.com, sub2.domain.com, and anything mentioning contact forms, and writes the remaining subdomains to filtered_subdomains.txt. IF YOU DO THIS, you must replace every later use of subdomains.txt (such as in step 3) with filtered_subdomains.txt.
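If `anew` isn't installed, its append-only-new-lines behavior can be approximated with grep. A sketch using throwaway demo files (the real files would be your subdomain list and fresh tool output):

```shell
# Demo files standing in for an existing subdomain list and fresh results
printf 'a.example.com\nb.example.com\n' > subs_demo.txt
printf 'b.example.com\nc.example.com\n' > new_demo.txt

# Append only lines not already present (fixed-string, whole-line match)
grep -vxF -f subs_demo.txt new_demo.txt >> subs_demo.txt
cat subs_demo.txt
```

Unlike `anew` this doesn't echo the newly added lines as it goes, but the resulting file is the same deduplicated union.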
Permutation Scanning (dnsgen + puredns + httpx):

`dnsgen /path/to/your/subdomains.txt -w /path/to/permutations_list.txt > dnsgen_output.txt`
`puredns resolve dnsgen_output.txt -r /path/to/resolvers.txt -w permuted_subdomains.txt --wildcard-tests 5`
`cat subdomains.txt permuted_subdomains.txt | anew subdomains.txt`

- What: `dnsgen` creates subdomain variations, and `puredns` resolves them (checks for valid IPs) using your resolvers.txt list. Finally, the results are combined.
- Why: Finds hidden subdomains with predictable naming patterns.
- Note: This can take some time, but it is worth every second. Also, puredns is a high-bandwidth scan, which might cause issues for other devices on the same network, like a gaming console. If potentially disrupting the network is a no-go, skip this step; you'll still have a good attack surface, just might miss some things. Also, don't combine this scan with other background scans like dirsearch.
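To see roughly what dnsgen does under the hood, here is a heavily simplified sketch that only generates prefix and dash-insert variants. Real dnsgen produces many more mutation types; the file names here are throwaway demo files.

```shell
printf 'app.example.com\n' > subs_demo.txt      # hypothetical known subdomain
printf 'dev\nstaging\n' > perms_demo.txt        # tiny permutation wordlist

while read -r word; do
  # prefix variant, e.g. dev.app.example.com
  awk -v w="$word" '{ print w "." $0 }' subs_demo.txt
  # dash-insert variant, e.g. app-dev.example.com
  awk -v w="$word" '{ sub(/\./, "-" w "."); print }' subs_demo.txt
done < perms_demo.txt | sort -u > dnsgen_demo_out.txt
cat dnsgen_demo_out.txt
```

Even this toy version shows why the candidate list explodes quickly, and why puredns resolution (rather than probing every candidate over HTTP) is the sane next step.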
Phase 2: Subdomain Probing and Filtering

httpx (Live Host Discovery):

`cat subdomains.txt | httpx -ports 80,443,8080,8000,8888 -threads 200 -title -tech-detect -o Tech_Subdomains_alive.txt`
`cat Tech_Subdomains_alive.txt | awk '{print $1}' | sort -u > subdomains_alive.txt`

- What: Checks which subdomains have active web servers. `-ports` specifies common HTTP/HTTPS ports, `-threads` sets concurrency, `-o` saves live subdomains, and `-title` and `-tech-detect` gather extra info.
- Why: Focuses on live targets and provides initial reconnaissance data.
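The second command above just strips httpx's annotation columns down to bare URLs. On sample input it behaves like this (a toy demo with fabricated httpx-style lines):

```shell
# Fabricated httpx-style output: URL plus status/title/tech columns
printf 'https://a.example.com [200] [Login] [nginx]\nhttps://a.example.com:8080 [200] [Admin] [nginx]\nhttps://a.example.com [200] [Login] [nginx]\n' > tech_demo.txt

# Keep only the first column (the URL) and drop duplicates
awk '{print $1}' tech_demo.txt | sort -u > alive_demo.txt
cat alive_demo.txt
```

The duplicate host collapses to one line while the distinct port 8080 entry survives, which is exactly what you want feeding later phases.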
Phase 3: URL Discovery and Content Extraction

Katana (Crawling):

`katana -u subdomains_alive.txt -d 5 -c 50 -kf -jc -fx -ef woff,css,png,svg,jpg,woff2,jpeg,gif >> allurls.txt`

- What: Crawls websites to discover links and resources. `-u` sets the starting URLs, `-d` is the crawl depth, `-c` is concurrency, `-kf` crawls known files (robots.txt, sitemap.xml), `-jc` parses JavaScript for links, `-fx` extracts form endpoints, and `-ef` excludes the listed static-asset extensions.
- Why: Maps the website's structure and content.
GAU (Alternative URL Discovery):

`cat subdomains_alive.txt | gau | anew allurls.txt`

- What: Gathers URLs from AlienVault OTX, the Wayback Machine, and Common Crawl.
- Why: Increases URL coverage from diverse sources.
Waybackurls:

`cat subdomains_alive.txt | waybackurls | anew allurls.txt`

- What: Another tool that pulls historical URLs from the Wayback Machine.
- Why: Adds URLs the other sources missed.
Filter for Potentially Sensitive Files:

`cat allurls.txt | grep -E "\.txt|\.log|\.cache|\.secret|\.db|\.backup|\.yml|\.json|\.gz|\.rar|\.zip|\.config" >> sens1.txt`

- What: Filters URLs based on extensions associated with sensitive data.
- Why: Prioritizes URLs likely to contain sensitive information.
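Note that these patterns match anywhere in the URL, not just at the end, so expect some false positives. A quick toy demo of what the filter keeps, on fabricated URLs:

```shell
# Hypothetical URLs to illustrate the extension filter
printf 'https://t.example.com/assets/app.js\nhttps://t.example.com/dump/db.backup\nhttps://t.example.com/changelog.txt\n' > urls_demo.txt

grep -E "\.txt|\.log|\.backup|\.zip|\.config" urls_demo.txt > sens_demo.txt
cat sens_demo.txt
```

The .backup and .txt URLs survive while the plain JS asset is dropped; review the hits manually before getting excited about any of them.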
Extract JavaScript Files:

`cat allurls.txt | grep -E "\.js.?" | sort -u >> alljs.txt`

- What: Isolates URLs pointing to JavaScript files. Note that this loose pattern also matches extensions like .json and .jsp; tighten it to `"\.js($|\?)"` if you only want .js.
- Why: JS files often contain valuable information for attackers.
Extract URLs from JavaScript (subjs):

`cat alljs.txt | subjs >> js_extracted_urls.txt`
`cat allurls.txt js_extracted_urls.txt | anew allurls.txt`

- What: Parses JS code to extract embedded URLs.
- Why: Finds URLs loaded dynamically by JS, often missed by crawlers.
Unique Parameter Filtering and Potential Additional JS Files (based on @Coffinxp's method):

`cat allurls.txt | gau > params.txt`
`cat params.txt | uro -o filterparams.txt`
`cat filterparams.txt | grep ".js$" | anew alljs.txt`
`cat alljs.txt | uro | anew alljs.txt`

Copy sort.py to the target directory, then run:

`python3 sort.py`

- What: `uro` collapses the URL list down to unique parameter patterns, and the grep pulls any additional JS files into alljs.txt.
- Why: Identifies unique parameters for fuzzing.
Phase 4: Directory and File Brute-Forcing

Dirsearch (General Wordlist):

`dirsearch -l subdomains_alive.txt -x 500,502,429,404,400 -R 5 --random-agent -t 100 -F -o directory.txt -w /home/USER/Documents/oneListForall/onelistforallshort.txt`

- What: Tries common directory/file names using a wordlist. `-l` targets live subdomains, `-x` excludes error codes, `-R` sets recursion depth, `--random-agent` helps avoid detection, `-t` is concurrency, `-F` follows redirects, and `-w` specifies the wordlist.
- Why: Finds hidden content not linked from the website.
- Note: This step can take a significant amount of time, especially with a large wordlist like onelistforallshort.txt and a deep recursion level. However, it can run in the background while you proceed with other reconnaissance steps. The onelistforallshort.txt wordlist is designed to be comprehensive, and using it with dirsearch can effectively map out the entire directory structure of the target application, potentially revealing a goldmine of hidden functionality and sensitive files. Think of it as building your own detailed "blueprint" of the website, which is why the output file is named directory.txt.
Dirsearch (Targeted Extensions - Iterative): Run this loop in your terminal.

`while read -r subdomain; do dirsearch -u "$subdomain" -e conf,config,bak,backup,swp,old,db,sql,asp,aspx,aspx~,asp~,py,py~,rb,rb~,php,php~,bkp,cache,cgi,csv,html,inc,jar,js,json,jsp,jsp~,lock,log,rar,sql.gz,sql.zip,sql.tar.gz,sql~,swp~,tar,tar.bz2,tar.gz,txt,wadl,zip,.log,.xml,.js,.json -x 500,502,429,404,400 -R 2 --random-agent -t 20 -F -o "dirsearch_extensions_${subdomain}.txt"; done < subdomains_alive.txt`

- What: Runs dirsearch for each live subdomain, focusing on specific file extensions (`-e`).
- Why: Targets potentially sensitive file types (configs, backups, etc.).
- Note: This iterative approach, while also potentially time-consuming, can run concurrently with other tasks. It's a more focused attack than the general brute-force, increasing the chances of finding specific types of sensitive files on each subdomain. Because it runs once per subdomain, the output is organized into separate files (dirsearch_extensions_${subdomain}.txt), making analysis easier.
Phase 5: Vulnerability Scanning with Nuclei

Nuclei (JS Exposures):

`cat alljs.txt | nuclei -t /home/USER/Documents/nuclei-templates/http/exposures/ -c 30 -o nuclei_js_exposures.txt`

- What: Scans JS files for common exposures using Nuclei templates.
- Why: Identifies potential security issues in JavaScript code.
Nuclei (General CVEs, OSINT, Tech):

`nuclei -list subdomains_alive.txt -tags cves,osint,tech -o nuclei_general.txt`

- What: Broad scan for known vulnerabilities, OSINT data, and technology fingerprints.
- Why: Identifies known exploits and technology details for targeted attacks.
Nuclei (CORS):

`nuclei -list subdomains_alive.txt -t /home/USER/Documents/nuclei-templates/http/misconfiguration/cors/ -o nuclei_cors.txt`

- What: Checks for Cross-Origin Resource Sharing (CORS) misconfigurations.
- Why: CORS issues can allow unauthorized data access.
Nuclei (CRLF):

`nuclei -list subdomains_alive.txt -t /home/USER/Documents/nuclei-templates/http/crlf/ -o nuclei_crlf.txt`

- What: Checks for CRLF injection vulnerabilities.
- Why: CRLF injection can lead to various attacks, including header manipulation.
Nuclei (LFI - using gf patterns):

`cat allurls.txt | gf lfi | nuclei -tags lfi -o nuclei_lfi.txt`

- What: Uses `gf` patterns to select URLs with potential Local File Inclusion (LFI) parameters, then scans them with Nuclei.
- Why: LFI allows attackers to read arbitrary files from the server.
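`gf` is essentially a wrapper around saved grep patterns; its lfi pattern boils down to selecting URLs with file-ish parameter names. A simplified approximation (not the exact pattern set gf ships):

```shell
# Hypothetical crawled URLs
printf 'https://t.example.com/view?file=report.pdf\nhttps://t.example.com/home?id=7\nhttps://t.example.com/load?page=../../etc/passwd\n' > urls_demo2.txt

# Rough stand-in for `gf lfi`: keep URLs whose parameters suggest file handling
grep -E '(file|path|page|include|template|doc)=' urls_demo2.txt > lfi_candidates_demo.txt
cat lfi_candidates_demo.txt
```

Understanding this makes it easy to write your own patterns in ~/.gf for parameter names specific to a target.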
Phase 6: Specialized Checks and Tools

SecretFinder (JavaScript Secrets): Run this loop in your terminal.

`cat alljs.txt | while read url; do python3 /path/to/SecretFinder/SecretFinder.py -i "$url" -o cli; done >> secret.txt`

- What: Analyzes JS files for hardcoded secrets (API keys, passwords).
- Why: Finds potentially exposed credentials.
Subdomain Takeover (Subzy):

`subzy run --targets subdomains.txt --concurrency 100 --hide_fails --verify_ssl >> subdomaintakeover.txt`

- What: Checks if subdomains are vulnerable to takeover (e.g. dangling CNAMEs pointing at unclaimed services).
- Why: A takeover gives an attacker a foothold on the target's domain; these findings are often directly reportable.
CORS Misconfiguration (Corsy):

`python3 /path/to/Corsy/corsy.py -i subdomains_alive.txt -t 10 --headers "User-Agent: GoogleBot\nCookie: SESSION=VoidSec" >> corsmisconf.txt`

- What: Another check for CORS misconfigurations, using a different tool.
- Why: Provides a second opinion on CORS vulnerabilities.
Open Redirect (OpenRedirex):

`cat allurls.txt | gf redirect | openredirex >> open_redirects.txt`

- What: Identifies potential open redirect vulnerabilities, using `gf` patterns to select candidate URLs.
- Why: Open redirects let attackers bounce victims through a trusted domain to a malicious site, and often chain into phishing or token theft.
XSS (dalfox): Replace your_xss_endpoint_here with your blind XSS collaborator URL.

`cat allurls.txt | dalfox pipe -b your_xss_endpoint_here -o dalfox_xss.txt`

- What: Finds Cross-Site Scripting (XSS) vulnerabilities, including blind XSS via the `-b` callback.
- Why: XSS lets attackers inject malicious scripts into pages viewed by other users.
Phase 7: Port Scan (naabu)

Full Port Scan with Naabu + Nmap:

`naabu -list subdomains.txt -c 50 -nmap-cli 'nmap -sV -sC -Pn' -o naabu-full.txt`

- What: Performs a port scan with service and version detection. `-nmap-cli` hands discovered ports to Nmap for detailed scanning, and `-Pn` skips host discovery (important if ICMP is blocked).
- Why: Identifies open ports and running services, revealing potential attack vectors.
Phase 8: Virtual Host Discovery

Extract IPs from naabu results:

`grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' naabu-full.txt | sort -u > ips.txt`

- What: Pulls the unique IP addresses out of the naabu output.
- Why: These are the targets for virtual host discovery.
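On naabu's host:port style output, the grep/sort pipeline collapses repeated hosts down to unique IPs. A demo on a fabricated sample file:

```shell
# Fabricated naabu-style results: the same IP on several ports
printf '93.184.216.34:443\n93.184.216.34:80\n198.51.100.7:8080\n' > naabu_demo.txt

grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' naabu_demo.txt | sort -u > ips_demo.txt
cat ips_demo.txt
```

Three result lines reduce to two unique IPs. (The regex would also accept invalid octets like 999, which is harmless here since the input comes from a scanner.)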
Vhost Discovery (ffuf): Run this loop.

`while read -r ip; do ffuf -w subdomains.txt:FUZZ -u https://$ip -H "Host: FUZZ.$TARGET" -fs 0 -o vhost_results_"$ip".txt; done < ips.txt`

If you do not have any IPs, or also want to vhost-fuzz domains directly, you can run:

`ffuf -u https://TARGET.com -w path/to/vhost.txt:VHOST -H "Host: VHOST" -o domain_vhost.json -of json`

- What: Tries different Host headers to find virtual hosts on the same IP. `-w` is the wordlist, `-u` is the URL, `-H` sets the Host header, and `-fs 0` filters out responses of size 0.
- Why: Discovers hidden websites hosted on shared infrastructure.
Phase 9: Parameter Fuzzing

Parameter Fuzzing (XSS Example):

`cat allurls.txt | gf xss | xargs -I TARGETURL ffuf -w /path/to/your/xss_wordlist.txt:FUZZ -u TARGETURL -fs 0 -o ffuf_xss_results.txt`

- What: Uses `gf` to find potential XSS parameters, then `ffuf` to fuzz those parameters with your xss_wordlist.txt; `-fs 0` filters responses by size. This is an example; you can adapt it for other vulnerability types (SQLi, LFI, etc.) with different `gf` patterns and wordlists.
- Why: Actively tests how the application handles potentially malicious input in parameters, looking for vulnerabilities.
Phase 10: Screenshotting (aquatone)

Take screenshots of live subdomains:

`cat subdomains_alive.txt | aquatone -out aquatone_screenshots`

- What: Aquatone takes screenshots of websites.
- Why: Visual inspection can reveal interesting features, login panels, or outdated software versions that might not be apparent from automated scans.
Included in this repository is a Python script, sort.py, designed to process the output of parameter discovery. This script performs the following actions:

- Reads: It reads a file named filterparams.txt, which should contain a list of parameters, one per line (typically generated by tools like uro).
- Sorts: It sorts the parameters alphabetically.
- Limits (Optional): If the number of parameters is very large, it truncates the list to the first 100,000 entries as a practical consideration.
- Writes: It writes the sorted (and potentially truncated) list to a new file named sorted_params_100000.txt.
- Credit: @Coffinxp

Why this is useful:

- Organization: Sorting parameters helps in identifying patterns and prioritizing testing efforts.
- Performance: Some tools perform better with a smaller, more focused set of parameters. You can always fall back to the original filterparams.txt if needed.
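If you'd rather not copy the script around, the behavior described above is small enough to reproduce with coreutils. This is a sketch of what sort.py is described to do, not the script itself; the sample input is hypothetical:

```shell
# Hypothetical filterparams.txt as produced by uro
printf 'zeta=1\nalpha=2\nmid=3\n' > filterparams.txt

# Sort alphabetically and keep at most the first 100,000 entries
sort filterparams.txt | head -n 100000 > sorted_params_100000.txt
cat sorted_params_100000.txt
```

`head -n 100000` is a no-op on small lists, so the output matches the sorted input until the cap actually bites.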
Manual Recon (Shodan):

Perform these steps manually in the Shodan web interface:

- Search: `ssl:'target.com' 200`
- Click "More" on top organizations.
- Check the http.title facet.
- Check the http.component facet.
- What: Shodan is a search engine for internet-connected devices.
- Why: Finds publicly exposed services, technologies, and potential vulnerabilities associated with the target's IP addresses.
This reconnaissance process provides a strong foundation for web application penetration testing. However, remember this is my generalized approach and might need adjustments for specific targets. For larger organizations, consider researching business acquisitions and performing recon on those acquired companies' assets. There are countless other recon methods and steps; I adapt my approach based on the target. The key is to be adaptable, creative, and persistent. This is a starting point; constant learning and adaptation are essential in the ever-evolving world of cybersecurity.
This guide and the tools are for educational purposes and authorized testing only. Unauthorized hacking is illegal and unethical. Always obtain explicit, written permission before testing any system. I am not responsible for any misuse of this information.