An intelligent documentation analyzer that evaluates MoEngage documentation articles and provides actionable improvement suggestions using advanced AI techniques and modern web technologies.
-
Comprehensive Analysis: Evaluates documentation across 4 key criteria:
- Readability for non-technical marketers
- Structure & Flow assessment
- Completeness of information and examples
- Style Guidelines compliance (Microsoft Style Guide principles)
-
Advanced Web Scraping: Uses Playwright for robust extraction of JavaScript-rendered content
-
AI-Powered Insights: Leverages Large Language Models via OpenRouter API for intelligent analysis
-
Vector-Based Processing: Employs FAISS and embeddings for context-aware content analysis
-
Multiple Output Formats: Generates both JSON and HTML reports with timestamps
URL Input โ Playwright Scraping โ Text Chunking โ Vector Store โ AI Analysis โ Structured Reports
- Content Extraction: Playwright with headless Chrome for dynamic content
- Document Processing: LangChain with RecursiveCharacterTextSplitter
- Vector Search: FAISS with HuggingFace embeddings for relevant chunk selection
- AI Analysis: OpenRouter API integration with flexible model support
- Report Generation: Multi-format output with professional styling
- Python 3.8+
- OpenRouter API key
- Chrome/Chromium browser (for Playwright)
- Clone the repository:
git clone <repository-url>
cd documentation-analyzer- Install dependencies:
pip install -r requirements.txt- Install Playwright browsers:
playwright install chromium- Set up environment variables:
Create a
.envfile in the project root:
OPENROUTER_API_KEY=your_openrouter_api_key_here
DEFAULT_MODEL=deepseek/deepseek-r1-0528-qwen3-8b:freefrom main import DocumentationAnalyzer
# Initialize analyzer
analyzer = DocumentationAnalyzer()
# Analyze a MoEngage documentation URL
url = "https://help.moengage.com/hc/en-us/articles/360035738832-Explore-the-Number-of-Notifications-Received-by-Users"
report = analyzer.analyze(url)
print(f"Analysis Status: {report['status']}")python main.pyThis will display a Tkinter window that accepts a documentation URL as text and analyze a sample MoEngage documentation article and generate reports in the output/ directory.
Upon completion, a success window is displayed.
{
"status": "success",
"url": "https://help.moengage.com/hc/en-us/articles/...",
"analysis_type": "all",
"results": {
"readability": [...],
"structure": [...],
"completeness": [...],
"style": [...]
}
}output/report_YYYYMMDD_HHMMSS.json- Machine-readable analysis resultsoutput/report_YYYYMMDD_HHMMSS.html- Human-friendly formatted report
Challenge: MoEngage documentation uses JavaScript rendering, making standard HTTP requests insufficient.
Solution: Implemented Playwright with proper wait states for network idle and complete content loading.
Challenge: Documentation articles exceed LLM context limits.
Solution: Developed intelligent chunking with FAISS vector search to analyze only relevant sections for each criterion.
Challenge: AI responses vary in format and may include markdown formatting.
Solution: Created robust JSON extraction that handles code blocks and malformed responses gracefully.
Challenge: Efficiently performing different analysis types on the same document.
Solution: Designed a mapping system coordinating analysis functions with targeted vector queries.
- Playwright over Selenium: Better performance and modern async/await support
- Headless browsing: Handles JavaScript-heavy documentation sites effectively
- UTF-8 encoding: Ensures proper handling of special characters
- OpenRouter API: Access to multiple LLM providers and models
- Context-aware prompting: Tailored prompts for each analysis criterion
- Structured output parsing: Robust JSON extraction from varied AI responses
- Hybrid readability assessment: Combines Flesch-Kincaid scoring with qualitative AI analysis
- Vector similarity search: Identifies relevant content sections for targeted analysis
- Microsoft Style Guide focus: Emphasizes clarity, conciseness, and action-oriented language
documentation-analyzer/
โโโAnalyzer/
โโโ main.py # Main application and DocumentationAnalyzer class
โโโutils.py # Core utility functions and analysis logic
โโโ requirements.txt # Python dependencies
โโโ .env # Environment variables (create this)
โโโ output/ # Generated reports directory
โโโ README.md # This file
โโโ.env.example
- Fully functional agent analyzing MoEngage documentation URLs
- Implements all 4 required analysis criteria
- Generates structured, actionable improvement suggestions
- Provides comprehensive reporting in multiple formats
- The bonus task for automatic document revision was not attempted
- This was an optional enhancement task
Here are sample analyses from two different MoEngage documentation articles:
- Readability Issues: Identified 3 sentences exceeding recommended length for marketers
- Structure Improvements: Suggested adding numbered steps for complex procedures
- Completeness Gaps: Recommended adding visual examples for dashboard navigation
- Style Issues: Found 2 instances of passive voice needing conversion
- Readability Score: Flesch-Kincaid grade 8.2 (appropriate for target audience)
- Structure Strength: Well-organized with clear headings and bullet points
- Example Quality: Good use of screenshots but missing code examples
- Style Compliance: Mostly action-oriented with minor jargon simplification needed
The system includes comprehensive error handling for:
- Invalid or unreachable URLs
- API rate limits and network timeouts
- Malformed content extraction
- JSON parsing failures
- File system operations
- Implementation of Task 2 (automatic document revision)
- Support for batch processing multiple URLs
- Integration with additional style guides
- Real-time analysis via web interface
- Advanced caching for improved performance
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenRouter for AI model access
- Playwright team for excellent web automation tools
- LangChain community for document processing capabilities
- MoEngage for providing comprehensive documentation to analyze