
alpha-bhu

alpha-bhu is a Python toolkit for clustering and classifying AlphaEarth Foundation embedding rasters for land-use analysis.

The repository is built with nbdev and jupytext: the checked-in source of truth lives in the Jupytext Markdown files under nbs/, notebook files are paired/generated on demand, and the importable package is generated into alpha_bhu/.

What it does

  • Loads 64-band AlphaEarth Foundation embedding rasters from GeoTIFF/COG files.
  • Reshapes and validates embeddings for clustering workflows.
  • Runs FAISS-based clustering across multiple k values.
  • Evaluates segmentations with nesting and spatial quality metrics.
  • Organizes multiple segmentations with the SegSet abstraction.
  • Assigns colors and exports cluster rasters plus legends for downstream mapping.
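The load-and-reshape step boils down to flattening a (bands, height, width) raster into a (pixels, bands) matrix, the layout clustering libraries expect. A minimal numpy sketch of that pattern (the actual helpers live in alpha_bhu.data; the array here is synthetic):

```python
import numpy as np

# Synthetic stand-in for a 64-band AlphaEarth embedding raster,
# shaped (bands, height, width) as rasterio reads it.
bands, height, width = 64, 4, 5
embeddings = np.random.default_rng(0).normal(size=(bands, height, width))

# Flatten to (pixels, bands) so each row is one pixel's embedding vector.
embeddings_flat = embeddings.reshape(bands, height * width).T

print(embeddings_flat.shape)  # (20, 64)
```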

Repository layout

  • alpha_bhu/: generated Python package.
  • nbs/: checked-in Jupytext Markdown notebook sources for nbdev.
  • data/: local example data, including aef_3.5k_roi_cog.tif.
  • pyproject.toml: package metadata and tool configuration.
  • settings.ini: nbdev project settings.

Requirements

  • Python >=3.12
  • uv recommended for environment management

Core dependencies include faiss-cpu, rasterio, geopandas, numpy, polars, scikit-learn, altair, and ipyleaflet.

Installation

Create the environment and install the package with development dependencies:

uv sync --extra dev

If you only want the runtime package:

uv sync

Quick start

This is the highest-level workflow currently exposed by the package:

from pathlib import Path

from alpha_bhu.segset_workflow import SegSetWorkflow

cog_path = Path("data/aef_3.5k_roi_cog.tif")

workflow = SegSetWorkflow.from_cog(cog_path)
best_k = workflow.select_optimal_k(
    low_k_range=[8, 10, 12, 15, 18, 20],
    high_k_range=[40, 50, 60, 70, 80, 90],
)

results = workflow.export_results(Path("outputs"))
print("Best k:", best_k)
print("Exported files:", results["exported_files"])

For lower-level usage:

from pathlib import Path

from alpha_bhu.data import load_aef_embeddings, reshape_for_clustering
from alpha_bhu.segset import SegSet

embeddings, metadata = load_aef_embeddings(Path("data/aef_3.5k_roi_cog.tif"))
embeddings_flat = reshape_for_clustering(embeddings)

segset = SegSet.from_embeddings(embeddings_flat, metadata["shape"])
segset = segset.with_kmeans_range([8, 10, 12], random_state=42, verbose=True)

quality = segset.spatial_quality("k10_s42")
print(quality)
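The README does not spell out what spatial_quality computes, but a common spatial quality signal for segmentations is neighbor agreement: the fraction of horizontally and vertically adjacent pixel pairs that share a label. A hypothetical numpy sketch of that idea (not alpha-bhu's actual metric):

```python
import numpy as np


def neighbor_agreement(labels_2d: np.ndarray) -> float:
    """Fraction of 4-adjacent pixel pairs with the same cluster label.

    Higher values mean spatially smoother segmentations. This is a
    hypothetical stand-in for a spatial quality score, for illustration.
    """
    horiz = labels_2d[:, :-1] == labels_2d[:, 1:]
    vert = labels_2d[:-1, :] == labels_2d[1:, :]
    return (horiz.sum() + vert.sum()) / (horiz.size + vert.size)


# A 4x4 map split into two clean blocks scores high...
blocky = np.zeros((4, 4), dtype=int)
blocky[:, 2:] = 1
print(neighbor_agreement(blocky))

# ...while a checkerboard (maximally fragmented) scores 0.
checker = np.indices((4, 4)).sum(axis=0) % 2
print(neighbor_agreement(checker))
```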

Main modules

  • alpha_bhu.data: loading and reshaping embedding rasters (load_aef_embeddings, reshape_for_clustering).
  • alpha_bhu.segset: the SegSet abstraction for organizing and evaluating multiple segmentations.
  • alpha_bhu.segset_workflow: the high-level SegSetWorkflow entry point shown in the quick start.

Data notes

The repository currently ships local sample assets under data/, including:

  • data/aef_3.5k_roi_cog.tif
  • data/cluster_animation/

Development workflow

Because this project uses nbdev with Jupytext pairing (ipynb,md), edit the Markdown notebook sources in nbs/, not the generated files in alpha_bhu/.

In practice, the workflow is:

  • keep the Jupytext Markdown files in nbs/ under version control
  • generate or sync notebook .ipynb files when needed for notebook work
  • export Python modules from the notebook sources with nbdev
  • run blacken-docs on the checked-in Markdown sources, not on .ipynb notebooks

Typical development loop:

uv sync --extra dev
uv run nbdev_export
uv run nbdev_test
uv run blacken-docs .
uv run ruff check .
uv run mypy alpha_bhu

Useful commands:

uv run jupytext --sync nbs/*.md   # sync paired notebook files on demand
uv run nbdev_export          # regenerate package code from notebooks
uv run nbdev_docs            # build docs into _docs/
uv run blacken-docs README.md nbs/*.md   # format code examples in checked-in Markdown sources
uv run jupyter lab           # work directly in notebooks

Current status

This repository is still in an early stage:

  • package metadata is alpha-quality
  • tests are not yet set up as a first-class workflow
  • some features are notebook-oriented and assume local data availability

License

Apache 2.0. See LICENSE.