
alpha-bhu

alpha-bhu is a Python toolkit for clustering and classifying AlphaEarth Foundation embedding rasters for land-use analysis.

The repository is built with nbdev and jupytext: the checked-in source of truth lives in the Jupytext Markdown files under nbs/, notebook files are paired/generated on demand, and the importable package is generated into alpha_bhu/.

What it does

  • Loads 64-band AlphaEarth Foundation embedding rasters from GeoTIFF/COG files.
  • Reshapes and validates embeddings for clustering workflows.
  • Runs FAISS-based clustering across multiple k values.
  • Evaluates segmentations with nesting and spatial quality metrics.
  • Organizes multiple segmentations with the SegSet abstraction.
  • Assigns colors and exports cluster rasters plus legends for downstream mapping.
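The load-and-reshape step boils down to flattening a (bands, height, width) raster into a (pixels, bands) matrix, the layout clustering libraries expect. A minimal numpy sketch of that pattern (the actual helpers live in alpha_bhu.data; the array here is synthetic):

```python
import numpy as np

# Synthetic stand-in for a 64-band AlphaEarth embedding raster,
# shaped (bands, height, width) as rasterio reads it.
bands, height, width = 64, 4, 5
embeddings = np.random.default_rng(0).normal(size=(bands, height, width))

# Flatten to (pixels, bands) so each row is one pixel's embedding vector.
embeddings_flat = embeddings.reshape(bands, height * width).T

print(embeddings_flat.shape)  # (20, 64)
```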

Repository layout

  • alpha_bhu/: generated Python package.
  • nbs/: checked-in Jupytext Markdown notebook sources for nbdev.
  • data/: local example data, including aef_3.5k_roi_cog.tif.
  • pyproject.toml: package metadata and tool configuration.
  • settings.ini: nbdev project settings.

Requirements

  • Python >=3.12
  • uv recommended for environment management

Core dependencies include faiss-cpu, rasterio, geopandas, numpy, polars, scikit-learn, altair, and ipyleaflet.

Installation

Create the environment and install the package with development dependencies:

uv sync --extra dev

If you only want the runtime package:

uv sync

Quick start

This is the highest-level workflow currently exposed by the package:

from pathlib import Path

from alpha_bhu.segset_workflow import SegSetWorkflow

cog_path = Path("data/aef_3.5k_roi_cog.tif")

workflow = SegSetWorkflow.from_cog(cog_path)
best_k = workflow.select_optimal_k(
    low_k_range=[8, 10, 12, 15, 18, 20],
    high_k_range=[40, 50, 60, 70, 80, 90],
)

results = workflow.export_results(Path("outputs"))
print("Best k:", best_k)
print("Exported files:", results["exported_files"])

For lower-level usage:

from pathlib import Path

from alpha_bhu.data import load_aef_embeddings, reshape_for_clustering
from alpha_bhu.segset import SegSet

embeddings, metadata = load_aef_embeddings(Path("data/aef_3.5k_roi_cog.tif"))
embeddings_flat = reshape_for_clustering(embeddings)

segset = SegSet.from_embeddings(embeddings_flat, metadata["shape"])
segset = segset.with_kmeans_range([8, 10, 12], random_state=42, verbose=True)

quality = segset.spatial_quality("k10_s42")
print(quality)
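The README does not spell out what spatial_quality computes, but a common spatial quality signal for segmentations is neighbor agreement: the fraction of horizontally and vertically adjacent pixel pairs that share a label. A hypothetical numpy sketch of that idea (not alpha-bhu's actual metric):

```python
import numpy as np


def neighbor_agreement(labels_2d: np.ndarray) -> float:
    """Fraction of 4-adjacent pixel pairs with the same cluster label.

    Higher values mean spatially smoother segmentations. This is a
    hypothetical stand-in for a spatial quality score, for illustration.
    """
    horiz = labels_2d[:, :-1] == labels_2d[:, 1:]
    vert = labels_2d[:-1, :] == labels_2d[1:, :]
    return (horiz.sum() + vert.sum()) / (horiz.size + vert.size)


# A 4x4 map split into two clean blocks scores high...
blocky = np.zeros((4, 4), dtype=int)
blocky[:, 2:] = 1
print(neighbor_agreement(blocky))

# ...while a checkerboard (maximally fragmented) scores 0.
checker = np.indices((4, 4)).sum(axis=0) % 2
print(neighbor_agreement(checker))
```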

Main modules

  • alpha_bhu.data: loading and reshaping embedding rasters (load_aef_embeddings, reshape_for_clustering).
  • alpha_bhu.segset: the SegSet abstraction for organizing and evaluating multiple segmentations.
  • alpha_bhu.segset_workflow: the high-level SegSetWorkflow entry point shown in the quick start.

Data notes

The repository currently ships local sample assets under data/, including:

  • data/aef_3.5k_roi_cog.tif
  • data/cluster_animation/

Development workflow

Because this project uses nbdev with Jupytext pairing (ipynb,md), edit the Markdown notebook sources in nbs/, not the generated files in alpha_bhu/.

In practice, the workflow is:

  • keep the Jupytext Markdown files in nbs/ under version control
  • generate or sync notebook .ipynb files when needed for notebook work
  • export Python modules from the notebook sources with nbdev
  • run blacken-docs on the checked-in Markdown sources, not on .ipynb notebooks

Typical development loop:

uv sync --extra dev
uv run nbdev_export
uv run nbdev_test
uv run blacken-docs .
uv run ruff check .
uv run mypy alpha_bhu

Useful commands:

uv run jupytext --sync nbs/*.md   # sync paired notebook files on demand
uv run nbdev_export          # regenerate package code from notebooks
uv run nbdev_docs            # build docs into _docs/
uv run blacken-docs README.md nbs/*.md   # format code examples in checked-in Markdown sources
uv run jupyter lab           # work directly in notebooks

Current status

This repository is still in an early stage:

  • package metadata is alpha-quality
  • tests are not yet set up as a first-class workflow
  • some features are notebook-oriented and assume local data availability

License

Apache 2.0. See LICENSE.