abdenlab · conradbzura · Mar 11, 2026 · Mar 11, 2026 · Mar 11, 2026 · Mar 11, 2026
diff --git a/docs/dialect/aggregation-operators.rst b/docs/dialect/aggregation-operators.rst
@@ -328,4 +328,129 @@ Related Operators
 ~~~~~~~~~~~~~~~~~
 
 - :ref:`CLUSTER <cluster-operator>` - Assign cluster IDs without merging
+- :ref:`COVERAGE <coverage-operator>` - Compute binned genome coverage
 - :ref:`INTERSECTS <intersects-operator>` - Test for overlap between specific pairs
+
+----
+
+.. _coverage-operator:
+
+COVERAGE
+--------
+
+Compute binned genome coverage by tiling the genome into fixed-width bins.
+
+Description
+~~~~~~~~~~~
+
+The ``COVERAGE`` operator tiles the genome into fixed-width bins and aggregates overlapping intervals per bin. It generates a bin grid using ``generate_series`` and joins it against the source table to count (or otherwise aggregate) overlapping features in each bin.
+
+This is useful for:
+
+- Computing read depth or signal coverage across the genome
+- Creating fixed-resolution coverage tracks from interval data
+- Summarising feature density at a user-defined resolution
+
+The operator works as an aggregate function, returning one row per bin with the bin coordinates and the computed statistic.
+
+Syntax
+~~~~~~
+
+.. code-block:: sql
+
+   -- Basic coverage (count overlapping intervals per bin)
+   SELECT COVERAGE(interval, resolution) FROM features
+
+   -- With a named statistic (either := or => syntax)
+   SELECT COVERAGE(interval, 1000, stat := 'mean') FROM features
+   SELECT COVERAGE(interval, 1000, stat => 'mean') FROM features
+
+   -- Aggregate a specific column instead of interval length
+   SELECT COVERAGE(interval, 1000, stat := 'mean', target := 'score') FROM features
+
+   -- Named resolution parameter
+   SELECT COVERAGE(interval, resolution := 500) FROM features
+
+Parameters
+~~~~~~~~~~
+
+**interval**
+   A genomic column.
+
+**resolution**
+   Bin width in base pairs. Can be given as a positional or named parameter.
+
+**stat** *(optional)*
+   Aggregation function applied to overlapping intervals per bin. One of:
+
+   - ``'count'`` — number of overlapping intervals (default)
+   - ``'mean'`` — average interval length of overlapping intervals
+   - ``'sum'`` — total interval length of overlapping intervals
+   - ``'min'`` — minimum interval length of overlapping intervals
+   - ``'max'`` — maximum interval length of overlapping intervals
+
+   When ``target`` is specified, the stat is applied to that column instead of interval length.
+
+**target** *(optional)*
+   Column name to aggregate. When omitted, non-count stats aggregate interval length (``end - start``). When specified, the stat is applied to the named column. For ``'count'``, specifying a target counts non-NULL values of that column instead of ``COUNT(*)``.
+
+Return Value
+~~~~~~~~~~~~
+
+Returns one row per genomic bin:
+
+- ``chrom`` — Chromosome of the bin
+- ``start`` — Start position of the bin
+- ``end`` — End position of the bin
+- ``value`` — The computed aggregate (default alias; use ``AS`` to rename)
+
+Examples
+~~~~~~~~
+
+**Basic Coverage:**
+
+Count the number of features overlapping each 1 kb bin:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000)
+   FROM features
+
+**Mean Coverage:**
+
+Compute the average interval length per 500 bp bin:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 500, stat := 'mean')
+   FROM features
+
+**Named Alias:**
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM reads
+
+**With WHERE Filter:**
+
+Assuming the source table includes a ``score`` column, compute coverage of high-scoring features only:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM features
+   WHERE score > 10
+
+Performance Notes
+~~~~~~~~~~~~~~~~~
+
+- The operator creates one bin per chromosome per step, so smaller resolutions produce more rows
+- A ``LEFT JOIN`` ensures bins with zero coverage are included in the output
+- For very large genomes, consider restricting the query with a ``WHERE`` clause on chromosome
+
+Related Operators
+~~~~~~~~~~~~~~~~~
+
+- :ref:`MERGE <merge-operator>` - Combine overlapping intervals into single regions
+- :ref:`CLUSTER <cluster-operator>` - Assign cluster IDs to overlapping intervals
diff --git a/docs/recipes/coverage.rst b/docs/recipes/coverage.rst
@@ -0,0 +1,173 @@
+Coverage
+========
+
+This section covers patterns for computing genome-wide coverage and signal
+summaries using GIQL's ``COVERAGE`` operator.
+
+Basic Coverage
+--------------
+
+Count Overlapping Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Count the number of features overlapping each 1 kb bin across the genome:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM features
+
+**Sample output:**
+
+.. code-block:: text
+
+   ┌────────┬────────┬────────┬───────┐
+   │ chrom  │ start  │  end   │ depth │
+   ├────────┼────────┼────────┼───────┤
+   │ chr1   │      0 │   1000 │     3 │
+   │ chr1   │   1000 │   2000 │     1 │
+   │ chr1   │   2000 │   3000 │     0 │
+   │ ...    │    ... │    ... │   ... │
+   └────────┴────────┴────────┴───────┘
+
+Each row represents one genomic bin. Bins with no overlapping features appear with a count of zero.
+
+**Use case:** Compute read depth or feature density at a fixed resolution.
+
+Custom Bin Size
+~~~~~~~~~~~~~~~
+
+Use a finer resolution of 100 bp:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 100) AS depth
+   FROM reads
+
+**Use case:** High-resolution coverage tracks for visualisation.
+
+Coverage Statistics
+-------------------
+
+Mean Interval Length per Bin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute the average length of intervals overlapping each bin:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000, stat := 'mean') AS avg_len
+   FROM features
+
+Sum of Interval Lengths per Bin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute the total interval length in each bin:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000, stat := 'sum') AS total_len
+   FROM features
+
+Maximum Interval Length per Bin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Find the longest interval overlapping each bin:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000, stat := 'max') AS max_len
+   FROM features
+
+Aggregating a Specific Column
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute the mean score of overlapping features per bin instead of summarising interval length:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000, stat := 'mean', target := 'score') AS avg_score
+   FROM features
+
+**Use case:** Signal tracks from a numeric column (e.g. ChIP-seq score, p-value).
+
+Filtered Coverage
+-----------------
+
+Strand-Specific Coverage
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute coverage for each strand separately by filtering:
+
+.. code-block:: sql
+
+   -- Plus strand
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM features
+   WHERE strand = '+'
+
+.. code-block:: sql
+
+   -- Minus strand
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM features
+   WHERE strand = '-'
+
+**Use case:** Strand-specific signal tracks for RNA-seq or stranded assays.
+
+Coverage of High-Scoring Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Restrict coverage to features above a quality threshold:
+
+.. code-block:: sql
+
+   SELECT COVERAGE(interval, 1000) AS depth
+   FROM features
+   WHERE score > 10
+
+5' End Counting
+~~~~~~~~~~~~~~~
+
+To count only the 5' ends of features (e.g. TSS or read starts), first
+create a view or CTE that trims each interval to its 5' end, then apply
+``COVERAGE``:
+
+.. code-block:: sql
+
+   WITH five_prime AS (
+       SELECT chrom, start, start + 1 AS end
+       FROM features
+       WHERE strand = '+'
+       UNION ALL
+       SELECT chrom, end - 1 AS start, end
+       FROM features
+       WHERE strand = '-'
+   )
+   SELECT COVERAGE(interval, 1000) AS tss_count
+   FROM five_prime
+
+Normalised Coverage
+-------------------
+
+RPM Normalisation
+~~~~~~~~~~~~~~~~~
+
+Normalise bin counts to reads per million (RPM) by dividing by the total
+number of reads:
+
+.. code-block:: sql
+
+   WITH bins AS (
+       SELECT COVERAGE(interval, 1000) AS depth
+       FROM reads
+   ),
+   total AS (
+       SELECT COUNT(*) AS n FROM reads
+   )
+   SELECT
+       bins.chrom,
+       bins.start,
+       bins.end,
+       bins.depth * 1000000.0 / total.n AS rpm
+   FROM bins, total
diff --git a/docs/recipes/index.rst b/docs/recipes/index.rst
@@ -19,6 +19,10 @@ Recipe Categories
    Clustering overlapping intervals, distance-based clustering,
    merging intervals, and aggregating cluster statistics.
 
+:doc:`coverage`
+   Binned genome coverage, coverage statistics, strand-specific coverage,
+   normalisation, and 5' end counting.
+
 :doc:`advanced`
    Multi-range matching, complex filtering with joins, aggregate statistics,
    window expansions, and multi-table queries.

diff --git a/src/giql/dialect.py b/src/giql/dialect.py
@@ -14,6 +14,7 @@
 from giql.expressions import Contains
 from giql.expressions import GIQLCluster
 from giql.expressions import GIQLDistance
+from giql.expressions import GIQLCoverage
 from giql.expressions import GIQLMerge
 from giql.expressions import GIQLNearest
 from giql.expressions import Intersects
@@ -54,6 +55,7 @@ class Parser(Parser):
         FUNCTIONS = {
             **Parser.FUNCTIONS,
             "CLUSTER": GIQLCluster.from_arg_list,
+            "COVERAGE": GIQLCoverage.from_arg_list,
             "MERGE": GIQLMerge.from_arg_list,
             "DISTANCE": GIQLDistance.from_arg_list,
             "NEAREST": GIQLNearest.from_arg_list,

diff --git a/src/giql/expressions.py b/src/giql/expressions.py
@@ -142,6 +142,55 @@ def from_arg_list(cls, args):
         return cls(**kwargs)
 
 
+class GIQLCoverage(exp.Func):
+    """COVERAGE aggregate function for binned genome coverage.
+
+    Tiles the genome into fixed-width bins and aggregates overlapping
+    intervals per bin using generate_series and JOIN + GROUP BY.
+
+    Examples:
+        COVERAGE(interval, 1000)
+        COVERAGE(interval, 500, stat := 'mean')
+        COVERAGE(interval, resolution := 1000)
+        COVERAGE(interval, 1000, stat := 'mean', target := 'score')
+    """
+
+    arg_types = {
+        "this": True,  # genomic column
+        "resolution": True,  # bin width (positional or named)
+        "stat": False,  # aggregation: 'count', 'mean', 'sum', 'min', 'max'
+        "target": False,  # column to aggregate (default: interval length)
+    }
+
+    @classmethod
+    def from_arg_list(cls, args):
+        """Parse argument list, handling named parameters.
+
+        :param args: List of arguments from parser
+        :return: GIQLCoverage instance with properly mapped arguments
+        """
+        kwargs = {}
+        positional_args = []
+
+        # Separate named (PropertyEQ for :=, Kwarg for =>) and positional arguments
+        for arg in args:
+            if isinstance(arg, (exp.PropertyEQ, exp.Kwarg)):
+                param_name = (
+                    arg.this.name if hasattr(arg.this, "name") else str(arg.this)
+                )
+                kwargs[param_name.lower()] = arg.expression
+            else:
+                positional_args.append(arg)
+
+        # Map positional arguments
+        if len(positional_args) > 0:
+            kwargs["this"] = positional_args[0]
+        if len(positional_args) > 1:
+            kwargs["resolution"] = positional_args[1]
+
+        return cls(**kwargs)
+
+
 class GIQLDistance(exp.Func):
     """DISTANCE function for calculating genomic distances between intervals.