34 changes: 34 additions & 0 deletions Dockerfile
@@ -0,0 +1,34 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    chromium \
    chromium-driver \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libcups2 \
    libdbus-1-3 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libnss3 \
    libx11-xcb1 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxrandr2 \
    xdg-utils \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "linkedin_tool.py", "chat", "--host", "0.0.0.0", "--port", "7860"]
95 changes: 89 additions & 6 deletions README.md
@@ -1,10 +1,93 @@
# OSINT Tools
Before running the project, copy `.env.default` to `.env`, then add your LinkedIn credentials and OpenAI API key to the `.env` file.

## LinkedIn Posts Analyser
The Jupyter notebook `linkedin.ipynb` is a proof of concept for a LinkedIn user post scraper and GenAI analyzer.
This tool is useful for OSINT purposes or for managing your digital footprint: it can retrieve LinkedIn posts from several years back and export them to a JSON file.
This repository now also includes a **Python script version** of the original notebook (`linkedin.ipynb`).

## Chat with GenAI
For the GenAI features, an OpenAI API key is required. The scraped posts JSON is embedded into ChromaDB; a Gradio chat interface then serves an OpenAI chatbot grounded in the gathered LinkedIn posts.
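The retrieval idea can be sketched with plain-Python keyword scoring. This is a simplified stand-in for ChromaDB's embedding search; the function name and post fields are illustrative, not the actual `linkedin_tool.py` API:

```python
import json


def top_posts(query, posts, k=2):
    """Rank posts by naive keyword overlap with the query.

    A toy stand-in for the real flow, where post texts are embedded
    into ChromaDB and the nearest matches are handed to the OpenAI
    chat model as context.
    """
    terms = set(query.lower().split())
    scored = sorted(
        posts,
        key=lambda p: len(terms & set(p["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]


posts = [
    {"text": "Excited to share my new OSINT project"},
    {"text": "Thoughts on LinkedIn privacy settings"},
    {"text": "Hiring Python developers in Berlin"},
]
print(top_posts("python osint", posts, k=1)[0]["text"])
```

In the real pipeline, vector similarity replaces the keyword overlap, but the shape of the step is the same: score, rank, and pass the top results to the chat model.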
## 1) Setup environment variables

Copy `.env.default` to `.env` and fill in your credentials/API key:

```bash
cp .env.default .env
```

Required variables in `.env`:

- `LINKEDIN_USER`
- `LINKEDIN_PASSWORD`
- `LINKEDIN_TARGET_USERNAME`
- `LINKEDIN_TARGET_NAME`
- `OPENAI_API_KEY`

> Keep `.env` private. It is already ignored by git.
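For reference, a filled-in `.env` might look like the following. All values are placeholders, not real credentials; only the variable names come from the list above:

```shell
LINKEDIN_USER=you@example.com
LINKEDIN_PASSWORD=change-me
LINKEDIN_TARGET_USERNAME=target-profile-slug
LINKEDIN_TARGET_NAME="Target Name"
OPENAI_API_KEY=sk-your-key-here
```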

---

## 2) Run as Python script

Main script: `linkedin_tool.py`

### Scrape LinkedIn posts

```bash
python linkedin_tool.py scrape
```

If you need a visible browser window (instead of headless):

```bash
python linkedin_tool.py scrape --headed
```

This generates `posts.json`.

### Start RAG chat UI (Gradio)

```bash
python linkedin_tool.py chat --host 0.0.0.0 --port 7860
```

### Run scrape + chat in sequence

```bash
python linkedin_tool.py all
```

---

## 3) Run with Docker Compose

Build and start the services:

```bash
docker compose up --build
```

Docker Compose now starts:

- `osint` (Python app)
- `selenium` (Selenium + Chrome browser infrastructure)

By default, chat is available at http://localhost:7860 and Selenium Grid at http://localhost:4444. Port 7900 exposes the Selenium container's noVNC viewer, useful for watching the browser session.

### Run scraping inside container

```bash
docker compose run --rm osint python linkedin_tool.py scrape
```

The `osint` container is preconfigured to use the Selenium browser service via `SELENIUM_REMOTE_URL=http://selenium:4444/wd/hub`.
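A sketch of how a script can honor that variable when choosing between the Grid container and a local browser. The helper name is illustrative; `linkedin_tool.py` may implement this differently:

```python
import os


def webdriver_target(env=None):
    """Decide where to run the browser.

    Returns ("remote", url) when SELENIUM_REMOTE_URL is set, so the
    script can attach to the Selenium Grid container (e.g. via
    webdriver.Remote); otherwise ("local", None) for a local Chrome.
    """
    env = os.environ if env is None else env
    url = env.get("SELENIUM_REMOTE_URL")
    return ("remote", url) if url else ("local", None)


print(webdriver_target({"SELENIUM_REMOTE_URL": "http://selenium:4444/wd/hub"}))
```

Keeping the decision in one helper means the same script runs unchanged on a developer machine and inside the Compose stack.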

### Start chat after scraping

```bash
docker compose up
```

---

## Notes

- Cookies are stored under `cookies/`.
- Embedded vector data is stored under `chromadb/`.
- Scraped posts are saved to `posts.json`.
- LinkedIn may trigger checkpoint/verification challenges. If that happens, complete verification and rerun.
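As an illustration of the cookie reuse under `cookies/`, Selenium cookies are plain dicts and round-trip cleanly through JSON. The file layout here is an assumption, not necessarily what the tool writes:

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Persist a list of Selenium cookie dicts so later runs can skip login."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cookies))


def load_cookies(path):
    """Load previously saved cookies; returns an empty list when none exist yet."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else []
```

Reusing a saved session cookie this way is what lets repeat scrapes avoid fresh logins, which in turn reduces the chance of LinkedIn's checkpoint challenges.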
23 changes: 23 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,23 @@
services:
  osint:
    build: .
    container_name: osint-linkedin
    env_file:
      - .env
    environment:
      - SELENIUM_REMOTE_URL=http://selenium:4444/wd/hub
    volumes:
      - ./:/app
    ports:
      - "7860:7860"
    depends_on:
      - selenium
    command: ["python", "linkedin_tool.py", "chat", "--host", "0.0.0.0", "--port", "7860"]
> **P2: Avoid defaulting the docker-compose command to chat-only mode.**
>
> The compose service always launches `python linkedin_tool.py chat ...`, but `run_chat` hard-fails when `posts.json` is missing, so `docker compose up --build` on a fresh checkout exits immediately instead of bringing up a usable stack. The default command requires pre-scraped data that new environments do not yet have, so the advertised startup flow fails unless users run `scrape` manually first.

  selenium:
    image: selenium/standalone-chrome:latest
    container_name: osint-selenium
    shm_size: 2gb
    ports:
      - "4444:4444"
      - "7900:7900"