linnene/Neuro-sama

Neuro-sama Project

This project collects and manages VTuber-related dialogue data and keeps the data for each live stream room isolated.

Current Architecture:

  • Data Synchronization: Synchronize JSON data generated by crawlers to the local environment via file transfer (e.g., SCP).
  • Data Processing: Local scripts read data files and use Pydantic models for validation and cleaning.

📚 Documentation Index

Quick Start

Environment Setup

This project uses uv for package management and requires Python 3.12 or later.

uv sync

Install PyTorch (with CUDA support):

Select the CUDA build that matches your graphics card driver. The example below installs the CUDA 12.4 wheels:

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
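After installation, a short check along these lines can confirm whether PyTorch sees the GPU. It is a sketch only: `describe_torch_backend` is a hypothetical helper, not part of this repository, and it guards the import so the script also runs where PyTorch is missing.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Report whether PyTorch is installed and whether it can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, CPU only"


print(describe_torch_backend())
```

If the script reports CPU only on a machine with an NVIDIA GPU, the installed wheel and the driver's supported CUDA version may not match.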

Core Dependencies

  • Python 3.12+
  • Pydantic: Used for data structure definition and validation.
  • PyTorch: Deep learning framework (CUDA support recommended).

Common Commands

  • Run Tests:

    uv run pytest
  • Add Dependency:

    uv add <package_name>

Data Models

The project defines two core data models (located in src/neuro_sama/models/):

  1. Stream: Live stream metadata (ID, title, streamer, timestamp, etc.).
  2. Dialogue: Dialogue data (question, answer, timestamp, confidence, etc.).

These models validate the format of the JSON data synchronized from the crawler.
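As an illustration of what such models might look like, here is a minimal sketch using the Pydantic v2 API. The field names and types are assumptions derived from the descriptions above; the actual definitions in src/neuro_sama/models/ may differ.

```python
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError


class Stream(BaseModel):
    """Live stream metadata (field names are assumed, not taken from the repo)."""
    stream_id: str
    title: str
    streamer: str
    started_at: datetime


class Dialogue(BaseModel):
    """One question/answer pair (field names are assumed, not taken from the repo)."""
    question: str
    answer: str
    timestamp: datetime
    confidence: float = Field(ge=0.0, le=1.0)  # assumed to be a score in [0, 1]


# A well-formed record parses into typed attributes.
good = Dialogue.model_validate_json(
    '{"question": "q", "answer": "a", '
    '"timestamp": "2024-01-01T12:00:00", "confidence": 0.9}'
)

# An out-of-range confidence is rejected with a ValidationError.
try:
    Dialogue.model_validate_json(
        '{"question": "q", "answer": "a", '
        '"timestamp": "2024-01-01T12:00:00", "confidence": 1.5}'
    )
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())

print(good.timestamp, error_count)
```

Declaring the range constraint on the field keeps the cleaning rule next to the data definition, so every script that loads a `Dialogue` applies the same check.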

Neuro-sama
├─ config.py
├─ data
│  ├─ cleaned
│  │  └─ 7589012_pend.jsonl
│  ├─ events
│  │  ├─ alignment
│  │  └─ spam
│  └─ raw
│     ├─ audio
│     └─ danmaku
│        └─ 7589012.jsonl
├─ DEV_LOG.md
├─ Dockerfile
├─ main.py
├─ PROJECT_STRUCTURE.md
├─ pyproject.toml
├─ README.md
├─ ROADMAP.md
├─ src
│  └─ neuro_sama
│     ├─ models
│     │  ├─ dialogue.py
│     │  ├─ stream.py
│     │  └─ __init__.py
│     ├─ parser
│     │  ├─ parse_jsonl.py
│     │  ├─ screen_spam.py
│     │  └─ __init__.py
│     └─ __init__.py
├─ test
│  ├─ test_models.py
│  ├─ test_parser.py
│  └─ test_spam.py
├─ TESTING.md
└─ uv.lock
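The layout above suggests that data is kept apart per room through file names. The sketch below assumes that the numeric file name (for example 7589012) is a live stream room ID; the helper names are hypothetical and not part of the repository.

```python
from pathlib import Path

DATA_ROOT = Path("data")


def raw_danmaku_path(room_id: int, root: Path = DATA_ROOT) -> Path:
    """Path of the raw danmaku file for one room (assumes one file per room ID)."""
    return root / "raw" / "danmaku" / f"{room_id}.jsonl"


def cleaned_pending_path(room_id: int, root: Path = DATA_ROOT) -> Path:
    """Path of the cleaned file for one room, following the `_pend` naming in the tree."""
    return root / "cleaned" / f"{room_id}_pend.jsonl"


print(raw_danmaku_path(7589012).as_posix())      # data/raw/danmaku/7589012.jsonl
print(cleaned_pending_path(7589012).as_posix())  # data/cleaned/7589012_pend.jsonl
```

Deriving every path from the room ID in one place means scripts for different rooms cannot accidentally read or overwrite each other's files.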
