Merged
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -22,7 +22,7 @@ jobs:
         with:
           python-version-file: '.python-version'
       - name: Install dependencies
-        run: uv sync --locked --all-extras --dev
+        run: uv sync --locked --all-extras --dev --index pytorch-cpu
       - name: Run ruff checks
         run: uv run ruff check
       - name: Run mypy for type checking
@@ -48,6 +48,6 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
-        run: uv sync --locked --all-extras --dev
+        run: uv sync --locked --all-extras --dev --index pytorch-cpu
       - name: Run unit tests
         run: uv run pytest -v
7 changes: 6 additions & 1 deletion CLAUDE.md
@@ -11,7 +11,7 @@ Scribae is a CLI tool that transforms local Markdown notes into structured SEO c
 ## Build & Development Commands

 ```bash
-uv sync --locked --all-extras --dev # Install dependencies
+uv sync --locked --all-extras --dev # Install dependencies (includes PyTorch with CUDA)
 uv run scribae --help # Run CLI
 uv run ruff check # Lint (auto-fix: --fix)
 uv run mypy # Type check
@@ -20,6 +20,11 @@ uv run pytest tests/unit/foo_test.py # Run single test file
 uv run pytest -k "test_name" # Run tests matching pattern
 ```

+For a lighter install (~200MB vs ~2GB), use the CPU-only PyTorch index:
+```bash
+uv sync --locked --all-extras --dev --index pytorch-cpu
+```
+
 **Important:** Always run tests, mypy, and ruff at the end of your task and fix any issues.

 ## Architecture
25 changes: 22 additions & 3 deletions README.md
@@ -38,18 +38,37 @@ keeping placeholders, links, and numbers intact.
 - **NLLB fallback.** When pivoting fails, the pipeline falls back to NLLB. ISO codes like `en`/`de`/`es` are mapped to
 NLLB codes (e.g., `eng_Latn`, `deu_Latn`, `spa_Latn`). You can also pass NLLB codes directly via `--src`/`--tgt`.

+### Translation dependencies
+Translation uses PyTorch and Hugging Face Transformers. Install the translation extra before running
+`scribae translate`:
+```bash
+uv sync --locked --dev --extra translation
+```
+To avoid downloading CUDA libraries (~2GB), use the CPU-only PyTorch index instead:
+```bash
+uv sync --locked --dev --extra translation --index pytorch-cpu
+```
+
 ## Quick start
 1. Install [uv](https://github.com/astral-sh/uv) and sync dependencies (Python 3.12 is managed by uv):
 ```bash
-uv sync --locked --all-extras --dev
+uv sync --locked --dev
 ```
+2. (Optional) Install translation dependencies:
+```bash
+uv sync --locked --dev --extra translation
+```
+Use the CPU-only index if you want to avoid CUDA downloads:
+```bash
+uv sync --locked --dev --extra translation --index pytorch-cpu
+```
-2. (Optional) Point Scribae at your model endpoint:
+3. (Optional) Point Scribae at your model endpoint:
 ```bash
 export OPENAI_BASE_URL="http://localhost:11434/v1"
 export OPENAI_API_KEY="no-key"
 # or use OPENAI_API_BASE if you prefer
 ```
-3. Run the CLI:
+4. Run the CLI:
 ```bash
 uv run scribae --help
 ```
11 changes: 10 additions & 1 deletion pyproject.toml
@@ -36,14 +36,18 @@ dependencies = [
     "pyyaml>=6.0.2",
     "python-frontmatter>=1.1.0",
     "transformers>=4.46.3",
-    "torch>=2.5.1",
     "sentencepiece>=0.2.0",
     "sacremoses>=0.1.1",
     "fast-langdetect>=1.0.0",
     "fasttext-predict==0.9.2.4",
     "tomli>=2.0.0;python_version<'3.11'",
 ]

+[project.optional-dependencies]
+translation = [
+    "torch>=2.5.1",
+]
+
 [project.scripts]
 scribae = "scribae.main:app"

@@ -56,6 +60,11 @@ Changelog = "https://github.com/fmueller/scribae/blob/main/CHANGELOG.md"
 [tool.uv]
 package = true

+[[tool.uv.index]]
+name = "pytorch-cpu"
+url = "https://download.pytorch.org/whl/cpu"
+explicit = true
+
 [dependency-groups]
 dev = [
     "build>=1.2.2.post1",
19 changes: 14 additions & 5 deletions src/scribae/translate/mt.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 from collections.abc import Iterable
+from types import ModuleType
 from typing import TYPE_CHECKING, Any

 from .model_registry import ModelRegistry, RouteStep
@@ -51,17 +52,25 @@ def _pipeline_for(self, model_id: str) -> Pipeline:
         from transformers import pipeline

         if model_id not in self._pipelines:
+            torch = self._require_torch()
             if self.device is None or self.device == "auto":
-                import torch
-
                 device = 0 if torch.cuda.is_available() else -1
                 self._pipelines[model_id] = pipeline("translation", model=model_id, device=device)
             else:
-                self._pipelines[model_id] = pipeline(
-                    "translation", model=model_id, device=self.device
-                )
+                self._pipelines[model_id] = pipeline("translation", model=model_id, device=self.device)
         return self._pipelines[model_id]

+    def _require_torch(self) -> ModuleType:
+        try:
+            import torch
+        except ImportError as exc:
+            raise RuntimeError(
+                "Translation requires PyTorch. Install it with "
+                "`uv sync --extra translation` or "
+                "`uv sync --extra translation --index pytorch-cpu` (CPU-only)."
+            ) from exc
+        return torch
+
     def prefetch(self, steps: Iterable[RouteStep]) -> None:
         """Warm translation pipelines for the provided route steps."""
         for step in steps:
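The `_require_torch` guard and the `auto` device rule in the `mt.py` change can be sketched as a standalone snippet. This is a minimal illustration, not the project's code: `require_module` and `resolve_device` are hypothetical helper names, and `cuda_available` is passed in as a plain boolean (an assumption made here) so the snippet runs without PyTorch installed.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency lazily, or fail with an actionable message.

    Hypothetical helper: generalizes the try/except ImportError pattern
    used by `_require_torch` in the diff.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Chain the original error so the traceback still shows the root cause.
        raise RuntimeError(f"{name} is required for this feature. {install_hint}") from exc


def resolve_device(requested: str | int | None, cuda_available: bool) -> str | int:
    """Apply the diff's rule: None or "auto" picks GPU 0 if CUDA is available, else CPU (-1).

    Hypothetical helper: any other value is passed through unchanged.
    """
    if requested is None or requested == "auto":
        return 0 if cuda_available else -1
    return requested


if __name__ == "__main__":
    # `json` stands in for an optional dependency that happens to be installed.
    json_mod = require_module("json", "It ships with the standard library.")
    print(json_mod.dumps({"device": resolve_device("auto", cuda_available=False)}))
```

In the actual method the boolean would come from `torch.cuda.is_available()`, and the install hint would be the `uv sync --extra translation` message quoted in the diff.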
53 changes: 35 additions & 18 deletions uv.lock

Some generated files are not rendered by default.