A Python-based tool for analyzing and evaluating website Terms & Conditions documents, with a focus on privacy and data protection compliance.
Note: This project now uses crawl4ai for web scraping instead of Selenium. See MIGRATION_SUMMARY.md and CRAWL4AI_GUIDE.md for details.
- GDPR compliance scoring
- Privacy and legal compliance analysis
- Web scraping of Terms & Conditions pages (powered by crawl4ai)
- Multi-query document retrieval
- Comprehensive JSON output of privacy metrics
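Multi-query retrieval, in the general sense used by LangChain's `MultiQueryRetriever`, rewrites one question into several variants, retrieves documents for each variant, and merges the unique results. This means a single phrasing is less likely to miss a relevant clause. The sketch below illustrates only that merge step under simplifying assumptions: hand-written query variants stand in for LLM-generated ones, and a hypothetical word-overlap scorer (`make_keyword_retriever`) stands in for the Chroma vector store. It is not this project's implementation.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word set used by the toy retriever below."""
    return set(re.findall(r"[a-z]+", text.lower()))


def make_keyword_retriever(chunks: List[str], k: int = 2) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a vector store: rank chunks by words shared with the query."""
    def retrieve(query: str) -> List[str]:
        q = tokenize(query)
        scored = [(len(q & tokenize(c)), i, c) for i, c in enumerate(chunks)]
        hits = [s for s in scored if s[0] > 0]   # drop chunks with no overlap
        hits.sort(key=lambda s: (-s[0], s[1]))    # best score first, ties by position
        return [c for _, _, c in hits[:k]]
    return retrieve


def multi_query_retrieve(queries: List[str], retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every query variant and return the de-duplicated union in first-seen order."""
    seen: Set[str] = set()
    results: List[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results


if __name__ == "__main__":
    chunks = [
        "We share personal data with advertising partners.",
        "You may request deletion of your data at any time.",
        "Cookies are used to remember your preferences.",
        "These terms are governed by the laws of Delaware.",
    ]
    # In LangChain's MultiQueryRetriever an LLM writes these variants; here they are hand-written.
    queries = [
        "Who do you share personal data with?",
        "Can I request deletion of my data?",
        "Do you use cookies?",
    ]
    for doc in multi_query_retrieve(queries, make_keyword_retriever(chunks)):
        print(doc)
```

With these inputs the merged result contains the first three chunks, while the governing-law clause is never retrieved; asking only "Do you use cookies?" would have surfaced two of them.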
- Python 3.8+
- Internet connection (for first-time browser setup)
- Clone the repository:
```bash
git clone https://github.com/yourusername/privacy-analyzer.git
cd privacy-analyzer
```
- Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Set up environment variables:
  - Create a `.env` file
  - Add your Groq API key:
```
GROQ_API_KEY=your_api_key_here
```
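Projects like this commonly read `.env` with python-dotenv's `load_dotenv()`; whether this project does so is an assumption. The following stdlib-only sketch (`load_env_file` is a hypothetical helper) shows what that step amounts to: parse `KEY=VALUE` lines into `os.environ` without overwriting variables that are already set, which mirrors python-dotenv's default. It is handy for checking that `GROQ_API_KEY` is visible before running an analysis.

```python
import os
from pathlib import Path
from typing import Dict, Union


def load_env_file(path: Union[str, Path] = ".env", override: bool = False) -> Dict[str, str]:
    """Parse KEY=VALUE lines from `path` into os.environ and return the parsed pairs.

    Existing environment variables are kept unless override=True.
    """
    parsed: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # no .env file: leave the environment untouched
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines without '='
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not key:
            continue  # an empty name is not a valid environment variable
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of matching quotes
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


if __name__ == "__main__":
    load_env_file()
    print("GROQ_API_KEY set:", bool(os.environ.get("GROQ_API_KEY")))
```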
Modify the `llm` initialization in `get_json()` to choose a different language model:
- `mixtral-8x7b-32768`
- `llama3-8b-8192`
- Other Groq-supported models
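As an alternative to editing `get_json()` each time, the model name could be read from an environment variable. This is a hedged sketch: `GROQ_MODEL` and `groq_model_name` are hypothetical names, and the default shown is just the first model listed above, not necessarily what `get_json()` uses. If the project builds its model with LangChain's `ChatGroq` (an assumption; the README only says it uses LangChain and the Groq API), the result could be passed as `ChatGroq(model=groq_model_name())`. Groq's model catalogue changes over time, so check their current model list before relying on any specific name.

```python
import os
from typing import Mapping, Optional

# Hypothetical default: the first model listed in this README.
DEFAULT_GROQ_MODEL = "mixtral-8x7b-32768"


def groq_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model name from the hypothetical GROQ_MODEL variable, else the default."""
    env = os.environ if env is None else env
    name = env.get("GROQ_MODEL", "").strip()
    return name or DEFAULT_GROQ_MODEL  # blank or missing value falls back to the default


if __name__ == "__main__":
    print(groq_model_name())
```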
Update `local_model_path` to use a different Hugging Face embedding model.
```python
from main import get_json

# Analyze Terms & Conditions
url = "https://example.com/terms"
results = get_json(url)
```
- LangChain
- Crawl4AI (web scraping)
- Hugging Face Transformers
- Groq API
- Chroma Vector Store
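To keep the analysis produced by `get_json()` in the usage example, the results can be written to disk. The exact return type of `get_json()` is not documented here, so this hypothetical `save_results` helper accepts either a JSON string or an already-parsed dict/list, writes indented JSON, and returns the parsed object.

```python
import json
from pathlib import Path
from typing import Any, Union


def save_results(results: Any, path: Union[str, Path]) -> Any:
    """Write analysis results to `path` as indented JSON and return the parsed object.

    Accepts a JSON string or an already-parsed dict/list, because the return
    type of get_json() is an assumption here.
    """
    data = json.loads(results) if isinstance(results, str) else results
    Path(path).write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data


# Usage with the project's entry point (requires the project to be installed):
# from main import get_json
# save_results(get_json("https://example.com/terms"), "terms_analysis.json")
```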
- Accuracy depends on webpage structure
- Requires internet connection for scraping
- Limited to publicly accessible web pages
This tool provides an automated analysis and should not replace professional legal advice.