This project is designed to collect and manage VTuber-related dialogue data, supporting data isolation for multiple live stream rooms.
Current Architecture:
- Data Synchronization: JSON data generated by the crawlers is synchronized to the local environment via file transfer (e.g., SCP; see the example after this list).
- Data Processing: Local scripts read data files and use Pydantic models for validation and cleaning.
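For example, the synchronization step might look like the following (the hostname and remote path are hypothetical; the local target is the data/raw/danmaku directory shown in the project tree below):

scp crawler@example-host:/srv/crawler/output/*.jsonl data/raw/danmaku/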
This project uses uv for package management. Python version >=3.12 is required.
Install dependencies:
uv sync
Install PyTorch (with CUDA support):
Select the appropriate CUDA version for your graphics card driver (the example below is for CUDA 12.4):
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Requirements:
- Python 3.12+
- Pydantic: Used for data structure definition and validation.
- PyTorch: Deep learning framework (CUDA support recommended).
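To verify that the CUDA-enabled build works, a quick sanity check can be run (torch.cuda.is_available() is part of the standard PyTorch API):

uv run python -c "import torch; print(torch.cuda.is_available())"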
- Run tests:
uv run pytest
- Add a dependency:
uv add <package_name>
The core of the project defines two data models (located in src/neuro_sama/models/):
- Stream: Live stream metadata (ID, title, streamer, timestamp, etc.).
- Dialogue: Dialogue data (question, answer, timestamp, confidence, etc.).
These models are used to validate the JSON data format synchronized from the crawler.
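As an illustration, here is a minimal sketch of what these models might look like (field names and types are assumptions inferred from the descriptions above; the authoritative definitions live in src/neuro_sama/models/stream.py and src/neuro_sama/models/dialogue.py):

from datetime import datetime
from pydantic import BaseModel, Field

class Stream(BaseModel):
    # Live stream metadata (hypothetical field set).
    id: str
    title: str
    streamer: str
    timestamp: datetime

class Dialogue(BaseModel):
    # One question/answer pair (hypothetical field set);
    # confidence is assumed to be a score in [0.0, 1.0].
    question: str
    answer: str
    timestamp: datetime
    confidence: float = Field(ge=0.0, le=1.0)

Validating one line of synchronized JSONL data is then a single call, e.g. Dialogue.model_validate(json.loads(line)).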
Neuro-sama
├─ config.py
├─ data
│ ├─ cleaned
│ │ └─ 7589012_pend.jsonl
│ ├─ events
│ │ ├─ alignment
│ │ └─ spam
│ └─ raw
│ ├─ audio
│ └─ danmaku
│ └─ 7589012.jsonl
├─ DEV_LOG.md
├─ Dockerfile
├─ main.py
├─ PROJECT_STRUCTURE.md
├─ pyproject.toml
├─ README.md
├─ ROADMAP.md
├─ src
│ └─ neuro_sama
│ ├─ models
│ │ ├─ dialogue.py
│ │ ├─ stream.py
│ │ └─ __init__.py
│ ├─ parser
│ │ ├─ parse_jsonl.py
│ │ ├─ screen_spam.py
│ │ └─ __init__.py
│ └─ __init__.py
├─ test
│ ├─ test_models.py
│ ├─ test_parser.py
│ └─ test_spam.py
├─ TESTING.md
└─ uv.lock