21 changes: 21 additions & 0 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
name: Read the Docs PR preview

on:
  pull_request_target:
    types:
      - opened
Comment on lines +5 to +6
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For GitHub Actions pull_request_target workflows, which event types are needed so jobs rerun when new commits are pushed to an existing PR?

💡 Result:

To have a pull_request_target workflow re-run when new commits are pushed to an existing PR, you need the synchronize activity type (that’s the PR action GitHub sends on new commits). (docs.github.com)

If you’re explicitly setting types:, include at least:

on:
  pull_request_target:
    types: [opened, synchronize, reopened]

(If you omit types, GitHub’s default for pull_request_target already includes opened, synchronize, and reopened.) (docs.github.com)

Preview link won't refresh after new PR commits.

Line 6 listens only to opened, so later synchronize updates won't retrigger this workflow and preview data can go stale.

💡 Suggested fix
 on:
   pull_request_target:
     types:
       - opened
+      - synchronize
+      - reopened
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/docs.yml around lines 5 - 6, the workflow only triggers on
new PRs because the pull_request_target trigger's types list contains only
"opened"; update the types: setting to include "synchronize" (and optionally
"reopened") alongside "opened" so subsequent commits to the PR retrigger the
docs preview workflow and keep preview data fresh.

    paths:
      - "docs/**"
      - "mkdocs.yml"
      - ".readthedocs.yaml"

permissions:
  pull-requests: write

jobs:
  documentation-links:
    runs-on: ubuntu-latest
    steps:
      - uses: readthedocs/actions/preview@v1
        with:
          project-slug: "fgmetric"
21 changes: 21 additions & 0 deletions .readthedocs.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
version: 2

build:
  os: ubuntu-24.04
  tools:
    python: "3.12"
  jobs:
    pre_create_environment:
      - asdf plugin add uv
      - asdf install uv latest
      - asdf global uv latest
    create_environment:
      - uv venv "${READTHEDOCS_VIRTUALENV_PATH}"
    install:
      - UV_PROJECT_ENVIRONMENT="${READTHEDOCS_VIRTUALENV_PATH}" uv sync --frozen --group docs
    build:
      html:
        - uv run --frozen mkdocs build --strict --site-dir "${READTHEDOCS_OUTPUT}/html"

mkdocs:
  configuration: mkdocs.yml
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@

Type-validated Python models for delimited data files.

[![Docs](https://readthedocs.org/projects/fgmetric/badge/?version=stable)](https://fgmetric.readthedocs.io/)
[![CI](https://github.com/fulcrumgenomics/fgmetric/actions/workflows/python_package.yml/badge.svg?branch=main)](https://github.com/fulcrumgenomics/fgmetric/actions/workflows/python_package.yml?query=branch%3Amain)
[![Python Versions](https://img.shields.io/badge/python-3.12_|_3.13-blue)](https://github.com/fulcrumgenomics/fgmetric)
[![MyPy Checked](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
7 changes: 7 additions & 0 deletions docs/api/metric.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Metric

::: fgmetric.Metric
    options:
      show_bases: false
      members:
        - read
10 changes: 10 additions & 0 deletions docs/api/metric_writer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# MetricWriter

::: fgmetric.MetricWriter
    options:
      show_bases: false
      members:
        - __init__
        - write
        - writeall
        - close
1 change: 1 addition & 0 deletions docs/contributing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
--8<-- "CONTRIBUTING.md"
148 changes: 148 additions & 0 deletions docs/guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# User Guide

## Defining a Metric

A `Metric` is a Pydantic model that maps to rows in a delimited file.
Define one by subclassing `Metric` and declaring fields with type annotations:

```python
from fgmetric import Metric


class AlignmentMetric(Metric):
    read_name: str
    mapping_quality: int
    is_duplicate: bool = False
```

Each field corresponds to a column in your file.
Pydantic handles type coercion automatically --- string values like `"60"` become `int`, `"true"` becomes `bool`, and so on.
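As an illustration of the kind of coercion Pydantic performs for each annotated field (this is a hypothetical sketch, not fgmetric's or Pydantic's actual implementation — `coerce` is an invented helper):

```python
# Illustrative sketch of string-to-type coercion for a single cell.
# Pydantic's real logic is far more general; this only shows the idea.
def coerce(value: str, target: type) -> object:
    """Convert a raw string cell to the annotated field type."""
    if target is bool:
        # Booleans need special handling: bool("false") would be True,
        # so we compare against known truthy spellings instead.
        return value.strip().lower() in ("true", "1", "yes")
    return target(value)


print(coerce("60", int))     # 60
print(coerce("true", bool))  # True
```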

## Reading Metrics

Use the `read()` class method to iterate over rows in a delimited file:

```python
from pathlib import Path

for metric in AlignmentMetric.read(Path("alignments.tsv")):
    print(f"{metric.read_name}: MQ={metric.mapping_quality}")
```

`read()` yields one `Metric` instance per row, lazily --- so you can process files larger than memory.
If any row fails validation, Pydantic raises a `ValidationError` with details about which field failed and why.

Example input file (`alignments.tsv`):

```tsv
read_name mapping_quality is_duplicate
read1 60 false
read2 30 true
```
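Conceptually, lazy row-by-row reading with per-field coercion works like the stdlib sketch below. This is a hedged illustration of the idea, not fgmetric's implementation — the real `read()` layers full Pydantic validation on top:

```python
import csv
from io import StringIO

# The same example file as above, inlined for a self-contained demo.
TSV = (
    "read_name\tmapping_quality\tis_duplicate\n"
    "read1\t60\tfalse\n"
    "read2\t30\ttrue\n"
)


def read_rows(handle):
    # csv.DictReader consumes one line at a time, so memory use stays
    # constant no matter how large the file is.
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "read_name": row["read_name"],
            "mapping_quality": int(row["mapping_quality"]),
            "is_duplicate": row["is_duplicate"] == "true",
        }


rows = list(read_rows(StringIO(TSV)))
```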

### Custom Delimiters

By default, `read()` expects tab-separated values. Pass a `delimiter` argument for other formats:

```python
# Reading CSV files
for metric in AlignmentMetric.read(Path("data.csv"), delimiter=","):
    ...
```

### Empty Fields

Empty fields in optional columns are automatically converted to `None`:

```python
class QualityMetric(Metric):
    sample: str
    score: float | None  # Empty string in file becomes None
```

## Writing Metrics

Use `MetricWriter` as a context manager to write metrics to a file:

```python
from fgmetric import MetricWriter

metrics = [
    AlignmentMetric(read_name="read1", mapping_quality=60),
    AlignmentMetric(read_name="read2", mapping_quality=30, is_duplicate=True),
]

with MetricWriter(AlignmentMetric, Path("output.tsv")) as writer:
    writer.writeall(metrics)
```
Comment on lines +67 to +77

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && cat -n docs/guide.md | sed -n '67,77p'

Repository: fg-labs/fgmetric

Length of output: 436


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && rg "class AlignmentMetric" --type python

Repository: fg-labs/fgmetric

Length of output: 90


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && rg "class AlignmentMetric"

Repository: fg-labs/fgmetric

Length of output: 313


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && cat -n fgmetric/__init__.py

Repository: fg-labs/fgmetric

Length of output: 240


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && cat -n docs/guide.md | sed -n '1,67p' | tail -20

Repository: fg-labs/fgmetric

Length of output: 614


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && rg "AlignmentMetric" docs/guide.md -B 5 | head -50

Repository: fg-labs/fgmetric

Length of output: 949


Make the writing example self-contained.

Lines 75-76 use AlignmentMetric and Path, but the snippet only imports MetricWriter. Copy/pasting this block alone will fail.

Proposed change
 ```python
+from pathlib import Path
+from fgmetric import Metric
+
+
+class AlignmentMetric(Metric):
+    read_name: str
+    mapping_quality: int
+    is_duplicate: bool = False
+
+
 from fgmetric import MetricWriter
 
 metrics = [

Alternatively, if the intent is to keep the class definition separate, import it from the earlier section (though this assumes readers have already executed that code block):

 ```python
+from pathlib import Path
+from fgmetric import Metric
+
+class AlignmentMetric(Metric):
+    read_name: str
+    mapping_quality: int
+    is_duplicate: bool = False
+
 from fgmetric import MetricWriter
 
 metrics = [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/guide.md` around lines 67 - 77, The example uses AlignmentMetric and
Path but only imports MetricWriter, so make the snippet self-contained by either
(A) adding the missing import "from pathlib import Path", importing or defining
AlignmentMetric (e.g., declare class AlignmentMetric(Metric) with fields
read_name, mapping_quality, is_duplicate=False and import Metric from fgmetric)
before using MetricWriter, or (B) if AlignmentMetric is defined elsewhere, add
an explicit "from <module> import AlignmentMetric" import; ensure the metric
class name AlignmentMetric and the Path import are present so
MetricWriter(AlignmentMetric, Path("output.tsv")) and writer.writeall(metrics)
work when copy/pasted.


The writer automatically outputs the header row based on the Metric's field names.
You can also write one metric at a time with `writer.write(metric)`.
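The header-then-rows pattern resembles the stdlib `csv.DictWriter`, shown below as a rough sketch of what a metric writer does under the hood (an assumption for illustration, not fgmetric's actual code):

```python
import csv
from io import StringIO

# Field names play the role of the Metric's declared fields.
fieldnames = ["read_name", "mapping_quality", "is_duplicate"]

buf = StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t")
writer.writeheader()  # header row comes from the field names
writer.writerow(
    {"read_name": "read1", "mapping_quality": 60, "is_duplicate": False}
)

text = buf.getvalue()
```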

### Custom Delimiters

Just like reading, writing supports custom delimiters:

```python
with MetricWriter(AlignmentMetric, Path("output.csv"), delimiter=",") as writer:
    writer.writeall(metrics)
```

## List Fields

Fields typed as `list[T]` are automatically parsed from and serialized to delimited strings:

```python
class TaggedRead(Metric):
    read_id: str
    tags: list[str]  # "A,B,C" becomes ["A", "B", "C"]
    scores: list[int]  # "1,2,3" becomes [1, 2, 3]
    optional_tags: list[str] | None  # "" becomes None
```

The list delimiter defaults to `,` but can be customized per-metric with the `collection_delimiter` class variable:

```python
class SemicolonMetric(Metric):
    collection_delimiter = ";"
    values: list[int]  # "1;2;3" becomes [1, 2, 3]
```

## Counter Fields

When your file has categorical data with one column per category (e.g. base counts A, C, G, T), you can model them as a single `Counter[StrEnum]` field:

```python
from collections import Counter
from enum import StrEnum

from fgmetric import Metric


class Base(StrEnum):
    A = "A"
    C = "C"
    G = "G"
    T = "T"


class BaseCountMetric(Metric):
    position: int
    counts: Counter[Base]
```

Given an input file:

```tsv
position A C G T
1 10 5 3 2
```

This parses to:

```python
BaseCountMetric(position=1, counts=Counter({Base.A: 10, Base.C: 5, Base.G: 3, Base.T: 2}))
```

The enum members define both the expected column names and the keys in the resulting `Counter`.
When writing, the `Counter` is "pivoted" back into separate columns.
60 changes: 60 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# fgmetric

[![CI](https://github.com/fulcrumgenomics/fgmetric/actions/workflows/python_package.yml/badge.svg?branch=main)](https://github.com/fulcrumgenomics/fgmetric/actions/workflows/python_package.yml?query=branch%3Amain)
[![Python Versions](https://img.shields.io/badge/python-3.12_|_3.13-blue)](https://github.com/fulcrumgenomics/fgmetric)
[![MyPy Checked](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)

**Type-validated Python models for delimited data files.**

`fgmetric` lets you define Python classes ("Metrics") that map directly to rows in CSV/TSV files.
It handles parsing, type coercion (strings to int, float, bool), and validation automatically using [Pydantic](https://docs.pydantic.dev/latest/).

## Installation

=== "pip"

    ```console
    pip install fgmetric
    ```

=== "uv"

    ```console
    uv add fgmetric
    ```

## Quick Example

Define a class to represent each row, then read or write:

```python
from pathlib import Path
from fgmetric import Metric, MetricWriter


class AlignmentMetric(Metric):
    read_name: str
    mapping_quality: int
    is_duplicate: bool = False


# Reading
for metric in AlignmentMetric.read(Path("alignments.tsv")):
    print(f"{metric.read_name}: MQ={metric.mapping_quality}")

# Writing
metrics = [
    AlignmentMetric(read_name="read1", mapping_quality=60),
    AlignmentMetric(read_name="read2", mapping_quality=30, is_duplicate=True),
]
with MetricWriter(AlignmentMetric, Path("output.tsv")) as writer:
    writer.writeall(metrics)
```

## Why fgmetric?

- **vs. csv + dataclasses** --- Automatic type coercion and validation without boilerplate. Built on Pydantic, so custom validators and serializers can be readily added.
- **vs. pandas** --- Processes records lazily, handling files larger than memory. Metrics are type-validated and can be made immutable.
- **vs. Pydantic alone** --- Handles CSV/TSV specifics (header parsing, delimiter configuration) and provides out-of-the-box features like empty value handling and Counter field pivoting.

[Get started with the User Guide](guide.md)
63 changes: 63 additions & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
site_name: fgmetric
site_url: https://fgmetric.readthedocs.io/en/stable/

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

For MkDocs projects hosted on Read the Docs, what is the recommended site_url configuration for versioned builds and canonical URLs?

💡 Result:

For MkDocs projects on Read the Docs (including versioned builds), configure site_url to use Read the Docs’ per-build canonical URL:

# mkdocs.yml
site_url: !ENV READTHEDOCS_CANONICAL_URL

Read the Docs computes and exposes READTHEDOCS_CANONICAL_URL (taking into account your default version like latest/stable and any canonical custom domain) and recommends using it for MkDocs’ site_url, which MkDocs uses to emit canonical <link rel="canonical"> URLs. [1] [2] [3]


Use Read the Docs' canonical URL environment variable instead of hardcoding the stable URL.

Hardcoding site_url to the stable URL produces incorrect canonical links in versioned and preview builds. Read the Docs automatically provides READTHEDOCS_CANONICAL_URL (accounting for the default version and any custom domains), which should be used for site_url so MkDocs emits the correct <link rel="canonical"> tags.

Proposed change
-site_url: https://fgmetric.readthedocs.io/en/stable/
+site_url: !ENV [READTHEDOCS_CANONICAL_URL, "https://fgmetric.readthedocs.io/en/stable/"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mkdocs.yml` at line 2, Replace the hardcoded site_url value with the Read the
Docs canonical URL environment variable: update the mkdocs configuration so that
the site_url key uses the READTHEDOCS_CANONICAL_URL environment variable (e.g.,
set site_url to the environment variable reference) instead of the fixed
"https://fgmetric.readthedocs.io/en/stable/" so MkDocs emits correct canonical
links for versioned and preview builds.

site_description: Type-validated Python models for delimited data files.
repo_name: fulcrumgenomics/fgmetric
repo_url: https://github.com/fulcrumgenomics/fgmetric

nav:
  - Home: index.md
  - User Guide: guide.md
  - API Reference:
      - Metric: api/metric.md
      - MetricWriter: api/metric_writer.md
  - Benchmarks: benchmarks.md
  - Contributing: contributing.md

theme:
  name: material
  palette:
    - media: "(prefers-color-scheme: light)"
      scheme: default
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
  features:
    - content.code.copy
    - navigation.instant
    - navigation.instant.progress
    - navigation.sections
    - navigation.top

markdown_extensions:
  - admonition
  - pymdownx.details
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
  - pymdownx.snippets:
      base_path: ["."]
  - toc:
      permalink: true

plugins:
  - search
  - mkdocstrings:
      default_handler: python
      handlers:
        python:
          options:
            docstring_style: google
            merge_init_into_class: true
            members_order: source
            separate_signature: true
            show_root_heading: true
            show_source: false
            show_symbol_type_heading: true
            show_symbol_type_toc: true
9 changes: 9 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -31,6 +31,7 @@ classifiers = [
"homepage" = "https://github.com/fulcrumgenomics/fgmetric"
"repository" = "https://github.com/fulcrumgenomics/fgmetric"
"Bug Tracker" = "https://github.com/fulcrumgenomics/fgmetric/issues"
"documentation" = "https://fgmetric.readthedocs.io/"

[dependency-groups]
dev = [
@@ -45,6 +46,11 @@ dev = [
ipython = [
    "ipython ~=9.2",
]
docs = [
    "mkdocs>=1.6.1,<2",
    "mkdocs-material ~=9.6",
    "mkdocstrings[python] ~=0.29",
]

[project.optional-dependencies]
benchmark = [
@@ -85,6 +91,9 @@ fix-and-check-all.sequence = [
    "check-tests"
]

docs-build = "mkdocs build --strict"
docs-serve = "mkdocs serve"

benchmark = "pytest --no-cov --benchmark-group-by param:num_rows --override-ini testpaths=tests/benchmarks"

[tool.mypy]