RawArchive

Train a personalized reply model from your Instagram chats, using a local web app for data handling and Colab GPU fine-tuning of Qwen/Qwen2.5-3B-Instruct.

What This App Does

RawArchive is a full workflow for style-learning from message history:

You export Instagram messages as .json.
You upload them to the local RawArchive web app.
RawArchive parses chats and builds a training bundle (bun_*).
Google Colab fine-tunes a LoRA adapter on top of Qwen 2.5 3B.
You register the resulting adapter.zip and get a model ID (mdl_*).
You chat with responses that follow the learned writing style.

The local PC handles ingestion, control, and inference. Colab handles GPU training only.

Execution Flow Diagram

flowchart LR
    subgraph LOCAL[Local PC]
        A[Instagram JSON Export] --> B[Web UI Upload<br/>app/static/index.html]
        B --> C[API Upload Endpoint<br/>POST /v1/datasets/instagram/upload]
        C --> D[Parser and Normalizer<br/>app/parser.py]
        D --> E[Bundle Builder<br/>app/dataset_builder.py]
        E --> F[Bundle Artifact<br/>bun_*]
        M[Model Register Endpoint<br/>POST /v1/models/register] --> N[Model ID<br/>mdl_*]
        N --> O[Local Inference Chat<br/>scripts/chat_local.py]
    end

    subgraph CLOUD[Google Colab GPU Runtime]
        G[Notebook<br/>colab/train_lora_ultrafast.ipynb] --> H[Download Bundle<br/>GET /v1/bundles/bun_xxx/download]
        H --> I[Load Qwen Base Model<br/>Qwen/Qwen2.5-3B-Instruct]
        I --> J[LoRA Fine-Tuning<br/>colab/train_lora.py]
        J --> K[Export Adapter<br/>adapter.zip]
    end

    F --> G
    K --> M
    O --> P[Style-Matched Response]

How It Executes (Detailed)

Stage 1: Ingestion and Parsing

Input is Instagram export .json.
Upload endpoint stores and validates data.
Parser extracts senders, timestamps, and message text.
Normalization removes invalid/empty records and prepares clean samples.
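
The parsing and normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app/parser.py: the field names `sender_name`, `timestamp_ms`, and `content` are assumptions about the export layout, and `fix_mojibake` and `parse_export` are hypothetical helpers introduced here.

```python
import json
from typing import Any


def fix_mojibake(text: str) -> str:
    """Undo a Latin-1/UTF-8 mix-up (assumed to occur in some exports)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # text was already clean, leave it untouched


def parse_export(raw: str) -> list[dict[str, Any]]:
    """Return normalized message records, sorted oldest first."""
    data = json.loads(raw)
    records = []
    for msg in data.get("messages", []):
        sender = msg.get("sender_name")   # assumed field name
        ts = msg.get("timestamp_ms")      # assumed field name
        text = msg.get("content")         # assumed field name
        # Drop records without a sender, a timestamp, or message text.
        if not sender or not isinstance(ts, int) or not isinstance(text, str):
            continue
        text = fix_mojibake(text).strip()
        if not text:
            continue  # whitespace-only messages carry no style signal
        records.append(
            {"sender": fix_mojibake(sender), "timestamp_ms": ts, "text": text}
        )
    records.sort(key=lambda r: r["timestamp_ms"])
    return records


sample = json.dumps({
    "messages": [
        {"sender_name": "Bob", "timestamp_ms": 2000, "content": "see you at 5"},
        {"sender_name": "Alice", "timestamp_ms": 1000, "content": "coffee later?"},
        {"sender_name": "Bob", "timestamp_ms": 3000, "content": "   "},
        {"sender_name": "Alice", "timestamp_ms": 4000},
    ]
})
records = parse_export(sample)
print(records)
```

Only the two messages with real text survive, and they come back in chronological order regardless of how the export ordered them.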

Stage 2: Bundle Generation

You select target style/user from parsed chats.
Builder creates prompt-response training pairs.
Data is split into train/validation.
Bundle is created with ID bun_*.
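
One plausible way to turn parsed messages into training pairs is sketched below. It assumes a pair is "the other person's message followed by the target user's next reply" and uses a 10% validation split; both choices, and the helper names `build_pairs` and `split_pairs`, are illustrative assumptions rather than the behavior of app/dataset_builder.py.

```python
import random


def build_pairs(records: list[dict], target: str) -> list[dict]:
    """Pair each message from someone else with the target's next reply."""
    pairs = []
    for prev, curr in zip(records, records[1:]):
        if prev["sender"] != target and curr["sender"] == target:
            pairs.append({"prompt": prev["text"], "response": curr["text"]})
    return pairs


def split_pairs(pairs: list[dict], val_fraction: float = 0.1, seed: int = 0):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation sample whenever there is more than one pair.
    n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]


chat = [
    {"sender": "Alice", "text": "coffee later?"},
    {"sender": "Bob", "text": "sure, 5pm?"},
    {"sender": "Alice", "text": "works"},
    {"sender": "Bob", "text": "see you"},
]
pairs = build_pairs(chat, target="Bob")
train, val = split_pairs(pairs)
print(len(pairs), len(train), len(val))
```

Fixing the shuffle seed keeps the train/validation split reproducible between bundle builds on the same data.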

Stage 3: Colab Training

Colab receives:
- BASE_URL = Cloudflare URL to your local API
- BUNDLE_ID = generated bun_*
Notebook downloads bundle and loads Qwen/Qwen2.5-3B-Instruct.
Only the LoRA adapter layers are trained; the base model weights stay frozen.
Output is adapter.zip.
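
To see why training an adapter is much cheaper than training full weights, here is a small arithmetic sketch. For a frozen weight matrix of shape d_out x d_in, LoRA adds two low-rank factors of shape d_out x r and r x d_in. The dimensions and rank below are illustrative assumptions, not values read from colab/train_lora.py or the model config.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def frozen_params(d_in: int, d_out: int) -> int:
    """Parameters in the original weight matrix, which LoRA leaves untouched."""
    return d_in * d_out


# Illustrative numbers only (assumptions, not taken from any real config).
D_IN, D_OUT, RANK = 2048, 2048, 16

trainable = lora_trainable_params(D_IN, D_OUT, RANK)
frozen = frozen_params(D_IN, D_OUT)
print(f"trainable: {trainable:,}  frozen: {frozen:,}  ratio: {trainable / frozen:.2%}")
```

Under these assumed sizes the adapter adds under 2% as many parameters as the matrix it wraps, which is also why the exported adapter.zip is small compared with the base model.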

Stage 4: Registration and Inference

You register the adapter in Step 4 of the UI.
API issues model ID mdl_*.
Local chat script loads base model + adapter and generates replies.

How To Download Instagram Messages as JSON

Use Accounts Center and choose JSON format.

Option A: Instagram App (Phone)

Open Instagram app.
Go to profile > menu > Settings and privacy.
Open Accounts Center.
Go to Your information and permissions.
Open Download your information.
Choose the Instagram account.
Choose date range (or all time), then select Messages.
Set format to JSON.
Submit request and wait for email/notification.
Download the archive and extract .json files.

Option B: Instagram Web

Open Instagram in browser and log in.
Go to More > Settings > Accounts Center.
Open Your information and permissions.
Click Download your information.
Select account and data type Messages.
Choose JSON format and submit request.
Download archive when ready, then extract message .json.

Notes:

Export generation can take time depending on account size.
Use JSON, not HTML, for RawArchive training.

Requirements

Windows + PowerShell
Python 3.11+
Google Colab account
Internet access (for model downloads in Colab)
cloudflared (required if Colab must reach your local API)

Project files used:

app/
colab/
scripts/
tests/
requirements.txt
requirements.inference.txt

Deploy (Local + Colab)

1) Local Setup

py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

2) Start API

.\.venv\Scripts\Activate.ps1
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

3) Start Cloudflare Tunnel (Second Terminal)

.\.venv\Scripts\Activate.ps1
cloudflared tunnel --url http://127.0.0.1:8000

4) Build Bundle in UI

Open http://127.0.0.1:8000 and:

Upload Instagram .json files.
Build bundle.
Copy generated bun_*.

5) Train in Colab

Notebook: colab/train_lora_ultrafast.ipynb

Set:

BASE_URL = https://...trycloudflare.com
BUNDLE_ID = your bun_*

Run all cells, then download adapter.zip.
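
Before running the notebook, it can help to sanity-check the two settings. The sketch below builds the bundle download URL from the endpoint path shown in the flow diagram (GET /v1/bundles/bun_xxx/download) and flags the most common mistakes. The helper names and the example hostname are hypothetical; the notebook itself may do this differently.

```python
from urllib.parse import urlparse


def settings_problems(base_url: str, bundle_id: str) -> list[str]:
    """Return human-readable problems with BASE_URL / BUNDLE_ID (empty if fine)."""
    problems = []
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("BASE_URL must be a full http(s) URL")
    elif parsed.hostname in ("localhost", "127.0.0.1"):
        problems.append("Colab cannot reach localhost; use the tunnel URL")
    if not bundle_id.startswith("bun_"):
        problems.append("BUNDLE_ID should look like bun_*")
    return problems


def bundle_download_url(base_url: str, bundle_id: str) -> str:
    """Build the download URL, raising if the settings look wrong."""
    problems = settings_problems(base_url, bundle_id)
    if problems:
        raise ValueError("; ".join(problems))
    return f"{base_url.rstrip('/')}/v1/bundles/{bundle_id}/download"


# Placeholder values for illustration only.
url = bundle_download_url("https://example.trycloudflare.com/", "bun_abc123")
print(url)
```

Stripping the trailing slash avoids a double slash in the path when the tunnel URL is pasted with one.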

6) Register Adapter

In RawArchive UI Step 4:

Adapter URI: local://C:/Users/{your-username}/{your-location}/data/models/adapter.zip
Validation Loss: numeric value
Style Score: numeric value
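
If registration rejects the URI, a quick local check of its shape can narrow things down. The sketch below only validates the `local://` prefix and the `.zip` suffix and extracts the Windows path; how the API actually resolves the URI is not shown in this README, so treat `adapter_path_from_uri` as a hypothetical helper. The path in the example is a placeholder.

```python
from pathlib import PureWindowsPath

PREFIX = "local://"


def adapter_path_from_uri(uri: str) -> PureWindowsPath:
    """Strip the local:// prefix and return the remaining Windows path."""
    if not uri.startswith(PREFIX):
        raise ValueError("Adapter URI must start with local://")
    raw = uri[len(PREFIX):]
    if not raw:
        raise ValueError("Adapter URI has no path after local://")
    path = PureWindowsPath(raw)
    if path.suffix.lower() != ".zip":
        raise ValueError("Adapter URI should point at a .zip file")
    return path


# Placeholder path for illustration only.
path = adapter_path_from_uri("local://C:/Users/me/project/data/models/adapter.zip")
print(path.drive, path.name)
```

On the machine that holds the file, `pathlib.Path(str(path)).is_file()` is a reasonable follow-up check that the archive really exists at that location.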

7) Chat Locally

.\.venv\Scripts\Activate.ps1
pip install -r requirements.inference.txt
python scripts\chat_local.py --model-id mdl_your_model_id

Exact Run Commands

Run API:

uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

Run tunnel:

cloudflared tunnel --url http://127.0.0.1:8000

Run tests:

pytest -q

Privacy and Safety

Private artifacts are intentionally git-ignored so they never reach the repository:

data/models/adapter.zip
data/datasets/ (raw message content)
data/bundles/ (generated training artifacts)
patterns like messages*.json and conversation*.json

Verify no private artifacts are tracked:

git ls-files | Select-String -Pattern "adapter\.zip|attachment\.zip|messages?.*\.json|conversation.*\.json"

Expected result: no output.
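
The same check can be expressed in Python, which is handy on machines without PowerShell. This sketch mirrors the ignore list above with glob patterns; `tracked_private_files` is a hypothetical helper, and the file list is hard-coded sample data rather than real `git ls-files` output.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the ignore list in this section.
PRIVATE_NAME_PATTERNS = ["adapter.zip", "messages*.json", "conversation*.json"]
PRIVATE_DIRS = ["data/datasets/", "data/bundles/"]


def tracked_private_files(tracked: list[str]) -> list[str]:
    """Return tracked paths that look like private artifacts."""
    hits = []
    for path in tracked:
        name = PurePosixPath(path).name
        in_private_dir = any(path.startswith(d) for d in PRIVATE_DIRS)
        name_matches = any(fnmatchcase(name, p) for p in PRIVATE_NAME_PATTERNS)
        if in_private_dir or name_matches:
            hits.append(path)
    return hits


# Sample input; in practice this would be the lines printed by `git ls-files`.
tracked = [
    "README.md",
    "app/parser.py",
    "data/datasets/messages_1.json",
    "exports/conversation_a.json",
    "data/models/adapter.zip",
]
hits = tracked_private_files(tracked)
print(hits)
```

To run it against a real checkout, the list could be produced with `subprocess.run(["git", "ls-files"], capture_output=True, text=True).stdout.splitlines()`; an empty result corresponds to the "no output" expectation above.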

Troubleshooting

Colab cannot reach localhost on your PC, so always pass the Cloudflare tunnel URL as BASE_URL.
Keep local API + tunnel running during Colab download/training steps.
If registration fails, verify file path and local:// prefix.

Contributors

Yashas VM
Creator & Lead Developer

Made by @Yashas.VM
Co-Powered by Claude
