Implement Process Knowledge Graph with Web Scraper, Neo4j, OpenSearch, and BYOKG RAG#1
Co-authored-by: hongsam14 <69339846+hongsam14@users.noreply.github.com>
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 21793278 | Triggered | Generic Password | a411945 | src/knowledge_graph/neo4j_manager.py |
| 21793305 | Triggered | Generic Password | 1bb143b | docker-compose.yml |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace the secrets and store them safely, following best practices for secret storage.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.
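One way to act on the "replace and store your secrets safely" step is to read the Neo4j credentials from the environment instead of hardcoding them. The following is a minimal sketch, not the PR's code: the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD`, the default URI, and the helper `load_neo4j_settings` are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection settings resolved at runtime rather than stored in source."""
    uri: str
    user: str
    password: str


def load_neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Build settings from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail loudly instead of falling back to a default password in code.
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return Neo4jSettings(
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),  # assumed default
        user=env.get("NEO4J_USER", "neo4j"),                # assumed default
        password=password,
    )
```

The same variables could then be referenced from `docker-compose.yml` rather than written inline, so that neither flagged file contains the password itself.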
- fix LIST_TEMPLATE, FILE_TEMPLATE constants
- fix get_process_list_from_letter logic
- fix get_page_content logic
- delete dll scrap pipeline
- add model.py to manage data models for file.net scraping.
- add prompt model
- add prompt template and schema samples
- add openai llm client
Overview
This PR implements a complete Process Knowledge Graph system that scrapes Windows process and DLL information from file.net, stores it in a knowledge graph, and provides an AI-powered Q&A system using Retrieval-Augmented Generation (RAG).
Implementation
1. Web Scraper for file.net
Implemented a web scraper that reads the alphabetical process listing pages on file.net (for example, https://www.file.net/process/_a.html).
2. Knowledge Graph Storage
Neo4j Integration:
OpenSearch Integration:
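To make the two storage targets in this section concrete, here is a hedged sketch of how one scraped record might be shaped for each store. The `Process` label, the property names, and both helper functions are hypothetical; the PR's actual schema is not shown on this page.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema: one (:Process) node per executable, keyed by name.
# MERGE keeps re-runs of the scraper idempotent.
MERGE_PROCESS = (
    "MERGE (p:Process {name: $name}) "
    "SET p.description = $description, p.source_url = $source_url"
)


def to_cypher_params(record: dict[str, Any]) -> dict[str, Any]:
    """Normalize a scraped record into parameters for MERGE_PROCESS."""
    return {
        "name": record["name"].strip().lower(),
        "description": record.get("description", ""),
        "source_url": record.get("url", ""),
    }


def to_search_document(record: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (document id, body) for a full-text index entry."""
    params = to_cypher_params(record)
    # Reusing the normalized name as the id keeps graph node and document aligned.
    return params["name"], dict(params)
```

With the official clients, the parameters would typically be passed as `session.run(MERGE_PROCESS, **params)` for Neo4j and `client.index(index=..., id=doc_id, body=body)` for OpenSearch. Using parameters instead of string formatting avoids Cypher injection from scraped text.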
3. BYOKG RAG System
Implements a Bring Your Own Knowledge Graph RAG system using:
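The component list is not reproduced on this page, so the following is only a generic sketch of the retrieval-augmented pattern the section names: facts from a graph store and passages from a text index are merged into a single prompt. The function `build_rag_prompt`, the `Retriever` type, and the prompt wording are assumptions, not the PR's implementation.

```python
from typing import Callable, List

# A retriever maps a question to a ranked list of text snippets.
Retriever = Callable[[str], List[str]]


def build_rag_prompt(
    question: str,
    graph_retriever: Retriever,
    text_retriever: Retriever,
    max_items: int = 5,
) -> str:
    """Assemble an LLM prompt from graph facts and full-text passages."""
    facts = graph_retriever(question)[:max_items]
    passages = text_retriever(question)[:max_items]
    # Tag each line with its origin so answers can be traced back to a store.
    lines = [f"[graph] {fact}" for fact in facts]
    lines += [f"[text] {passage}" for passage in passages]
    context = "\n".join(lines) if lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

In a real pipeline the two retrievers would wrap a Cypher query and an OpenSearch query respectively, and the returned string would be sent to the LLM client.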
4. CLI and Deployment
Command-Line Interface:
Docker Support:
Complete containerization with `docker-compose.yml`, including Neo4j, OpenSearch, and the application.
Project Structure
Features
Example Usage
Testing
Documentation
Dependencies
Core technologies:
- `langchain` + `langchain-openai` - RAG framework
- `neo4j` - Graph database
- `opensearch-py` - Search engine
- `beautifulsoup4` + `requests` - Web scraping
- `openai` - GPT models

All dependencies are specified in `requirements.txt`.
Getting Started
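1. Start the backing services: `docker-compose up -d neo4j opensearch`
2. Install dependencies: `pip install -r requirements.txt`
3. Configure `.env` with your OpenAI API key
4. Run the proof of concept: `python examples/03_rag_poc.py`

See QUICKSTART.md for detailed instructions.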
Warning
Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
www.file.net (dns block), triggered by this script:

    import sys
    sys.path.insert(0, '.')
    from src.scraper import FileNetScraper

    # Test basic functionality
    scraper = FileNetScraper(delay=0.5)
    print('Testing scraper...')

    # Test fetching a single letter (smaller test)
    processes = scraper.get_process_list_from_letter('c')
    print(f'Found {len(processes)} processes starting with c')
    if processes:
        print('First 3 processes:')
        for p in processes[:3]:
            print(f'  - {p["name"]}')
    print('\n✓ Scraper test completed successfully')
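Because the blocked script depends on live access to www.file.net, the listing logic can also be exercised offline against a saved HTML string. The sketch below is an assumption-heavy illustration of what a method like `get_process_list_from_letter` might do: the `LIST_TEMPLATE` value is inferred from the example URL https://www.file.net/process/_a.html, the `.exe.html` link filter is a guess about the page layout, and `list_url` and `parse_process_list` are hypothetical names. It uses the standard-library HTML parser so it runs without extra dependencies.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Assumed pattern, inferred from https://www.file.net/process/_a.html
LIST_TEMPLATE = "https://www.file.net/process/_{letter}.html"


def list_url(letter: str) -> str:
    """Return the listing page URL for a single letter."""
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"expected a single letter, got {letter!r}")
    return LIST_TEMPLATE.format(letter=letter.lower())


class _LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_process_list(html: str, base_url: str) -> List[Dict[str, str]]:
    """Extract process entries from a listing page's HTML."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    processes = []
    for href, text in collector.links:
        # Assumption: process detail pages end in ".exe.html".
        if href.lower().endswith(".exe.html") and text:
            processes.append({"name": text, "url": urljoin(base_url, href)})
    return processes
```

Each returned entry carries a `name` key, matching how the script above prints `p["name"]`. Keeping parsing separate from fetching means a firewall or DNS block affects only the network call, not the tests of the parsing logic.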