Merged
3 changes: 3 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,3 @@
github: MattyB95
ko_fi: mattyb95
thanks_dev: u/gh/MattyB95
4 changes: 2 additions & 2 deletions .github/workflows/codeql.yml
@@ -26,11 +26,11 @@ jobs:
       - uses: actions/checkout@v6

       - name: Initialize CodeQL
-        uses: github/codeql-action/init@v3
+        uses: github/codeql-action/init@v4
         with:
           languages: python

       - name: Perform CodeQL Analysis
-        uses: github/codeql-action/analyze@v3
+        uses: github/codeql-action/analyze@v4
         with:
           category: "/language:python"
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,47 @@ Jabberjay uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---

## [0.0.7] — 2026-03-16

### Added
- **`CITATION.cff`** — machine-readable citation metadata for academic use;
GitHub surfaces this as a "Cite this repository" button; compatible with
Zenodo, Zotero, and other reference managers
- **Sample rate validation** — `detect()` now raises `ValueError` immediately
for zero or negative sample rates when a pre-loaded audio tuple is passed,
preventing silent failures in downstream model inference
- **Test coverage for new behaviour** — 7 new tests covering sample rate
validation, `BytesIO` buffer cleanup on both success and error paths,
figure cleanup on error paths, and `Classical.predict()` bonafide/spoof paths
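
The sample rate validation entry above can be pictured with a short sketch. The helper name `validate_preloaded_audio`, the `(waveform, sample_rate)` tuple layout, and the error message are assumptions made for illustration; the changelog only states that `detect()` raises `ValueError` for zero or negative rates when a pre-loaded audio tuple is passed.

```python
from collections.abc import Sequence


def validate_preloaded_audio(
    audio: tuple[Sequence[float], float],
) -> tuple[Sequence[float], float]:
    """Hypothetical helper: reject non-positive sample rates before inference."""
    y, sr = audio  # assumed layout: (waveform, sample_rate)
    if sr <= 0:
        # Failing here avoids a confusing error (or silent garbage) further downstream.
        raise ValueError(f"Sample rate must be positive, got {sr}")
    return y, sr
```

Checking at the entry point keeps the failure close to the caller's mistake instead of letting it surface inside a model's feature extractor.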

### Changed
- **Dependency minimum versions pinned** — all runtime dependencies now carry
lower-bound constraints (`torch>=2.0`, `transformers>=4.30`,
`huggingface-hub>=0.20`, `librosa>=0.10`, etc.) to prevent silent
incompatibilities on fresh installs
- **Contact and copyright updated** — maintainer email changed to
`Matthew.Boakes@Gmail.com` across `CODE_OF_CONDUCT.md`, `SECURITY.md`, and
`pyproject.toml`; `LICENSE` updated to `2024-2026 Matthew Boakes and The
Alan Turing Institute`
- **`CONTRIBUTING.md` model template corrected** — example `run.py` now uses
`cast()` + `normalize_pipeline_scores()` (matching real handlers) and the
handler example uses `_result_from_scores()` instead of building
`DetectionResult` manually; label normaliser variable names corrected to
`_BONAFIDE_SUBSTR` / `_SPOOF_SUBSTR` / `_BONAFIDE_EXACT` / `_SPOOF_EXACT`
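
The last item above replaces manual `DetectionResult` construction with `_result_from_scores()`. The sketch below shows what such a helper could look like, reconstructed from the manual construction that the old `CONTRIBUTING.md` template used. The type definitions here are simplified stand-ins, not the project's real classes, and the real helper may do more (for example, sorting the scores).

```python
from dataclasses import dataclass
from enum import Enum
from typing import TypedDict


class Model(Enum):
    MyModel = "MyModel"  # placeholder member for illustration only


class PredictionScore(TypedDict):
    # Assumed shape, inferred from the top["label"] / top["score"] accesses in the old template.
    label: str
    score: float


@dataclass
class DetectionResult:
    # Field names taken from the old template's keyword arguments.
    label: str
    is_bonafide: bool
    confidence: float
    model: Model
    scores: list[PredictionScore]


def result_from_scores(scores: list[PredictionScore], model: Model) -> DetectionResult:
    """Hypothetical stand-in: build a result from the first score in the list."""
    top = scores[0]
    return DetectionResult(
        label=top["label"],
        is_bonafide=top["label"] == "Bonafide",
        confidence=top["score"],
        model=model,
        scores=scores,
    )
```

Centralising this logic means a new handler only has to produce a list of scores; the mapping to a result lives in one place.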

### Fixed
- **`VIT/utility.py`** — `BytesIO` buffer is now always closed in the
`finally` block, preventing a resource leak if `Image.open()` or
`img.load()` raises
- **`Classical/run.py`** — renamed `model` variable to `model_path` to
accurately reflect that `download_pretrained_model()` returns a file path,
not a model object
- **`_result_from_scores()`** — parameter type restored to
`list[PredictionScore]`, keeping the type contract consistent end-to-end
from model handlers through to `DetectionResult.scores`
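
The `BytesIO` fix in the first item above follows the usual `try` / `finally` pattern. The sketch below illustrates that pattern under stated assumptions: `decode_from_buffer` and its `decode` callback are hypothetical, with the callback standing in for the `Image.open()` and `img.load()` calls the changelog mentions.

```python
from io import BytesIO
from typing import Callable, TypeVar

T = TypeVar("T")


def decode_from_buffer(data: bytes, decode: Callable[[BytesIO], T]) -> T:
    """Hypothetical helper: run `decode` on an in-memory buffer and always close it."""
    buffer = BytesIO(data)
    try:
        return decode(buffer)
    finally:
        # Runs on both the success path and the error path,
        # so the buffer is released even if `decode` raises.
        buffer.close()
```

A `with BytesIO(data) as buffer:` block would give the same guarantee; the explicit `finally` form is shown because that is the wording the changelog uses.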

---

## [0.0.6] — 2026-03-16

### Added
@@ -117,6 +158,7 @@ Jabberjay uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- Command-line interface (`jabberjay <audio>`)
- GitHub Actions CI workflow and ruff linting

[0.0.7]: https://github.com/MattyB95/Jabberjay/compare/v0.0.6...HEAD
[0.0.6]: https://github.com/MattyB95/Jabberjay/compare/v0.0.5...v0.0.6
[0.0.5]: https://github.com/MattyB95/Jabberjay/compare/v0.0.4...v0.0.5
[0.0.4]: https://github.com/MattyB95/Jabberjay/compare/v0.0.3...v0.0.4
31 changes: 31 additions & 0 deletions CITATION.cff
@@ -0,0 +1,31 @@
cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata below."
type: software
title: Jabberjay
abstract: >
  Jabberjay is a unified Python API and CLI for synthetic voice detection.
  It brings together state-of-the-art deepfake audio detection models —
  including ViT, AST, Wav2Vec2, HuBERT, WavLM, RawNet2, and a classical
  baseline — under a single consistent interface, allowing researchers and
  practitioners to detect AI-generated speech without wrestling with
  individual model dependencies and conventions.
authors:
  - family-names: Boakes
    given-names: Matthew
    email: Matthew.Boakes@Gmail.com
    orcid: "https://orcid.org/0000-0002-9377-6240"
repository-code: "https://github.com/MattyB95/Jabberjay"
url: "https://mattyb95.github.io/Jabberjay"
repository-artifact: "https://pypi.org/project/Jabberjay/"
license: MIT
version: 0.0.7
date-released: "2026-03-16"
keywords:
  - synthetic voice detection
  - deepfake detection
  - anti-spoofing
  - audio classification
  - speech
  - ASVspoof
  - machine learning
  - transformers
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -36,7 +36,7 @@ This Code of Conduct applies within all community spaces, and also applies when

## Enforcement

-Instances of abusive, harassing, or otherwise unacceptable behaviour may be reported to the community leaders responsible for enforcement at **mboakes@turing.ac.uk**. All complaints will be reviewed and investigated promptly and fairly.
+Instances of abusive, harassing, or otherwise unacceptable behaviour may be reported to the community leaders responsible for enforcement at **Matthew.Boakes@Gmail.com**. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

33 changes: 13 additions & 20 deletions CONTRIBUTING.md
@@ -67,12 +67,12 @@ Work on `develop` — **do not target `main` directly**.

New models are the most valuable contribution. The bar for inclusion is:

-| Requirement | Detail |
-|---|---|
-| **Licence** | Apache 2.0 or MIT only |
-| **Task** | Binary bonafide / spoof classification |
+| Requirement      | Detail                                                 |
+|------------------|--------------------------------------------------------|
+| **Licence**      | Apache 2.0 or MIT only                                 |
+| **Task**         | Binary bonafide / spoof classification                 |
 | **Availability** | Publicly available weights (HuggingFace Hub preferred) |
-| **Input** | Raw audio waveform (not pre-extracted features) |
+| **Input**        | Raw audio waveform (not pre-extracted features)        |

### Step-by-step

@@ -88,11 +88,13 @@ class Model(Enum):

```python
 # src/Jabberjay/Models/MyModel/run.py
-import numpy as np
 from typing import cast
+
+import numpy as np
 from loguru import logger
 from transformers import pipeline
-from Jabberjay.Utilities.label_normalizer import normalize_label
+
+from Jabberjay.Utilities.label_normalizer import normalize_pipeline_scores
 from Jabberjay.Utilities.types import PredictionScore

_MODEL_ID = "author/model-id-on-huggingface"
@@ -103,13 +105,10 @@ def predict(y: np.ndarray, sr: float) -> list[PredictionScore]:
     pipe = pipeline("audio-classification", model=_MODEL_ID, sampling_rate=_TARGET_SR)
     logger.debug(f"Running MyModel inference on {len(y)} samples at {int(sr)}Hz")
     raw = cast(list[dict[str, object]], pipe({"raw": y, "sampling_rate": int(sr)}))
-    return [
-        PredictionScore(label=normalize_label(str(s["label"])), score=float(str(s["score"])))
-        for s in raw
-    ]
+    return normalize_pipeline_scores(raw)
```

-If the model uses non-standard labels, `normalize_label()` handles mapping — check `src/Jabberjay/Utilities/label_normalizer.py` and extend `_BONAFIDE_KEYWORDS` / `_SPOOF_KEYWORDS` if needed.
+If the model uses non-standard labels, `normalize_pipeline_scores()` calls `normalize_label()` internally — check `src/Jabberjay/Utilities/label_normalizer.py` and extend `_BONAFIDE_SUBSTR` / `_SPOOF_SUBSTR` (substring matching) or `_BONAFIDE_EXACT` / `_SPOOF_EXACT` (exact matching) if needed.
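
To make the substring versus exact distinction concrete, here is a minimal sketch of how such a normaliser might work. It is not the project's implementation: the functions are given `sketch_` names to make that clear, the contents of the four constants are invented for illustration, and the `"Spoof"` output label and the pass-through fallback are assumptions. Only the constant names, the `"Bonafide"` label, and the `str()` / `float(str())` coercions come from the diff.

```python
# Hypothetical contents; the real sets live in label_normalizer.py.
_BONAFIDE_EXACT = {"real", "human"}
_SPOOF_EXACT = {"fake", "ai"}
_BONAFIDE_SUBSTR = ("bonafide", "genuine")
_SPOOF_SUBSTR = ("spoof", "synthetic")


def sketch_normalize_label(raw: str) -> str:
    """Map a model-specific label to 'Bonafide' or 'Spoof' (illustrative only)."""
    key = raw.strip().lower()
    # Exact matches first: short tokens such as "ai" are unsafe as substrings
    # because they occur inside unrelated words.
    if key in _BONAFIDE_EXACT:
        return "Bonafide"
    if key in _SPOOF_EXACT:
        return "Spoof"
    # Substring matches catch decorated labels such as "label_spoof_1".
    if any(token in key for token in _BONAFIDE_SUBSTR):
        return "Bonafide"
    if any(token in key for token in _SPOOF_SUBSTR):
        return "Spoof"
    return raw  # assumed fallback: pass unknown labels through unchanged


def sketch_normalize_pipeline_scores(raw: list[dict[str, object]]) -> list[dict[str, object]]:
    """Normalise every label in a pipeline result, coercing scores to float."""
    return [
        {"label": sketch_normalize_label(str(s["label"])), "score": float(str(s["score"]))}
        for s in raw
    ]
```

Under these assumptions, a model that emits a label not covered by either list would need a new entry in the matching constant: an exact entry for short or ambiguous tokens, a substring entry for distinctive ones.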

**3. Add a handler and match case** in `src/Jabberjay/jabberjay.py`:

@@ -122,14 +121,8 @@ case Model.MyModel:
 def _mymodel_handler(self, y: np.ndarray, sr: float) -> DetectionResult:
     import Jabberjay.Models.MyModel.run as MyModel
     scores = MyModel.predict(y=y, sr=sr)
-    top = scores[0]
-    return DetectionResult(
-        label=top["label"],
-        is_bonafide=top["label"] == "Bonafide",
-        confidence=top["score"],
-        model=Model.MyModel,
-        scores=scores,
-    )
+    logger.debug(f"MyModel predictions: {scores}")
+    return self._result_from_scores(scores, Model.MyModel)
```

**4. Update tests** — add the new enum value to `TestEnums.test_model_members` in `tests/test_jabberjay.py`.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2024 The Alan Turing Institute
+Copyright (c) 2024-2026 Matthew Boakes and The Alan Turing Institute

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal