
Add LinkedIn scraper/chat CLI, Dockerfile, and docker-compose for RAG Gradio UI#1

Open
siva01c wants to merge 1 commit into main from codex/convert-jupyter-notebook-to-python-script

Conversation

@siva01c (Owner) commented Mar 13, 2026

Motivation

  • Convert the previous Jupyter notebook into a reusable CLI Python script to scrape LinkedIn posts and provide a RAG-powered chat UI.
  • Provide a reproducible containerized environment so the scraper and Gradio UI can run consistently across systems.
  • Integrate a retrieval pipeline so scraped posts can be indexed and queried with OpenAI + ChromaDB.
  • Expose a simple developer workflow for running scraping in headless/headed modes and launching the chat UI.

Description

  • Add linkedin_tool.py, which implements LinkedIn login (with cookie save/load), lazy-loading scroll, post extraction, a RagChat class (ChromaDB persistent collection plus OpenAI chat completions), and a Gradio chat UI, exposed through the CLI subcommands scrape, chat, and all.
  • Add Dockerfile that uses python:3.11-slim, installs Chromium and required system libraries, installs Python dependencies from requirements.txt, and sets the default CMD to run the chat server with python linkedin_tool.py chat --host 0.0.0.0 --port 7860.
  • Add docker-compose.yml which defines an osint service built from the repo and a selenium standalone-chrome service, maps ports (7860 for the app and 4444/7900 for Selenium), mounts the project, and injects SELENIUM_REMOTE_URL for remote browser usage.
  • Update README.md with environment variable setup (cp .env.default .env), instructions for running scrape, chat, or all, Docker Compose usage, and notes about storage locations (cookies/, chromadb/, posts.json).
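The CLI surface described above can be sketched with argparse. This is a hypothetical reconstruction, not the PR's actual code: only the subcommand names (scrape, chat, all) and the --host/--port flags are taken from the description; the --headless flag and defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the linkedin_tool.py CLI surface (assumed, not verbatim)."""
    parser = argparse.ArgumentParser(prog="linkedin_tool.py")
    sub = parser.add_subparsers(dest="command", required=True)

    scrape = sub.add_parser("scrape", help="log in and scrape LinkedIn posts")
    scrape.add_argument("--headless", action="store_true",
                        help="run the browser without a visible window")

    chat = sub.add_parser("chat", help="launch the RAG-powered Gradio chat UI")
    chat.add_argument("--host", default="127.0.0.1")
    chat.add_argument("--port", type=int, default=7860)

    run_all = sub.add_parser("all", help="scrape, then launch the chat UI")
    run_all.add_argument("--headless", action="store_true")
    return parser
```

With this layout, the Dockerfile's default CMD (`python linkedin_tool.py chat --host 0.0.0.0 --port 7860`) parses cleanly into the chat subcommand.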

Testing

  • No automated tests were added or executed for this change.

Codex Task

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dbd49a00f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +200 to +202
```python
chroma_id = f"topic_{idx}"
if self.linkedin.get(ids=[chroma_id]).get("ids"):
    continue
```

P1: Derive Chroma document IDs from post IDs instead of indices

The indexing logic uses chroma_id = f"topic_{idx}" and skips any existing ID, which makes the persistent linkedin_posts collection silently stale whenever posts.json changes order/content or a different profile is scraped later. In those cases, rows at the same index are treated as already indexed and never refreshed, so retrieval can return outdated or wrong-person context even though a new scrape was provided.

Useful? React with 👍 / 👎.
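One way to address this suggestion, sketched here rather than taken from the PR (make_chroma_id and the post dict shape are assumptions), is to derive the document ID from stable post content instead of the list index, so that reordering posts.json or scraping a different profile cannot make a new post collide with an already-indexed ID:

```python
import hashlib


def make_chroma_id(post: dict) -> str:
    """Build a ChromaDB document ID that is stable across re-scrapes.

    Prefers a platform-assigned post ID when present; otherwise falls
    back to a hash of the post text, so the ID follows the content
    rather than its position in posts.json.
    """
    post_id = post.get("id")
    if post_id:
        return f"post_{post_id}"
    digest = hashlib.sha256(post["text"].encode("utf-8")).hexdigest()[:16]
    return f"post_{digest}"
```

With content-derived IDs, the existing "skip if already indexed" check remains valid: an unchanged post is skipped, while a new or edited post gets a fresh ID and is indexed.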

```yaml
  - "7860:7860"
depends_on:
  - selenium
command: ["python", "linkedin_tool.py", "chat", "--host", "0.0.0.0", "--port", "7860"]
```

P2: Avoid defaulting the docker-compose command to chat-only mode

The compose service always launches python linkedin_tool.py chat ..., but run_chat hard-fails when posts.json is missing, so docker compose up --build on a fresh checkout exits immediately instead of bringing up a usable stack. This default command requires pre-scraped data that new environments do not have yet, making the advertised startup flow fail unless users run scrape manually first.

Useful? React with 👍 / 👎.
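One minimal way to act on this review note (load_posts here is a hypothetical helper, not code from this PR) is to tolerate a missing posts.json instead of hard-failing, so the default compose command can still bring the UI up on a fresh checkout:

```python
import json
from pathlib import Path


def load_posts(path: str = "posts.json") -> list[dict]:
    """Load scraped posts, degrading gracefully when none exist yet.

    Instead of exiting when posts.json is missing (the failure mode the
    review describes), start the chat UI with an empty corpus and let
    the user run the scrape subcommand later.
    """
    posts_file = Path(path)
    if not posts_file.exists():
        print(f"warning: {path} not found; starting chat with no indexed posts")
        return []
    return json.loads(posts_file.read_text(encoding="utf-8"))
```

An alternative fix at the compose level would be to default the service to the all subcommand (scrape, then chat) rather than chat alone; either approach keeps `docker compose up --build` from exiting immediately.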

