From 25c0545fc3ecbfbfc6c030c3c3543603155a36a3 Mon Sep 17 00:00:00 2001 From: Jared Lewis Date: Wed, 30 Oct 2024 13:51:40 +1100 Subject: [PATCH 1/3] docs: Package structure RFC --- text/0001-package-structure.md | 121 +++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 text/0001-package-structure.md diff --git a/text/0001-package-structure.md b/text/0001-package-structure.md new file mode 100644 index 0000000..0ff9911 --- /dev/null +++ b/text/0001-package-structure.md @@ -0,0 +1,121 @@ +- Feature Name: `package-structure` +- Start Date: 2024-10-24 +- RFC PR: [CMIP-REF/rfcs#0001](https://github.com/CMIP-REF/rfcs/pull/0001) + +# Summary +[summary]: #summary + +This RFC proposes a standard package structure for REF packages. +The goal is to make it easier for developers to find and understand the contents of a REF package. + +# Motivation +[motivation]: #motivation + +* At the start of this project we will be in a state of flux until we have a working prototype. + APIs will change, and we will be adding and removing features. +* Each metrics package will come with its own set of dependencies. Odds are that they will clash with other packages. +* We want something that is easy to extend in future, incorporating new metrics providers/metrics. +* Consistent tooling across the project + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +It is proposed that we start with a mono-repo structure consisting of multiple packages, +each with their own set of dependencies. +The tooling and CI will be shared across all projects. +There will be a `packages` directory at the root of the repo, with each package in its own subdirectory. + +Each package will have a `README.md` file that describes the purpose of the package and how to use it, +its own `pyproject.toml` describing the package and it's dependencies, +and their own set of tests. +The test suite for each package is run with only the dependencies required for that package. + +An example has been implemented in [CMIP-REF/cmip-ref#1](https://github.com/CMIP-REF/cmip-ref/pull/0001), +which contains a `ref-core` package and an example `ref-metrics-example` package. + +The purpose of this structure is to split the project into smaller, more manageable parts +and to differentiate between the "library" part of the project which is reusable +and the "application" which builds ontop of the library + metrics packages + additional services. + +An additional benefit is that it will be easier to refactor code if it is all accessible in the same repository. + +## Packages +### `ref-core` +This package will be a library containing the core functionality of the REF. +This package will be a dependency of all other packages as it will describe the interfaces that metrics providers must implement. +This allows us to keep the core functionality separate from the metrics providers. + +### `ref-metrics-*` +Each metric provider will implement a separate package that implements the interfaces described by the `ref-core` package. + +The purpose of these packages are to provide a Python-based hook to execute a metric and return the results. +This will include (hopefully thin) wrappers around the metric providers existing APIs +or build the required CMEC configuration files. + +These packages will maintain their own dependencies (python and otherwise) and packaged up as a docker image. + +The package will be named `ref-metrics-` where `` is the name of the provider. + +### `ref-api` (TBD) +This application package will bring together the core and metrics packages to provide a service that +responds to new data submissions, +tracks the execution of metrics and the results. +The package may have indirect dependencies on the metrics packages to avoid conflicting dependencies. +That adds additional complexity, but will be more flexible in the long run. + +This is the service that will be deployed to ESGF. + +Whether this application can be done solely via a CLI-based tool is up for debate. + +# Drawbacks +[drawbacks]: #drawbacks + +* More people working in the same repository may lead to more merge conflicts. +* Larger repository with more moving parts + + +Another drawback is that it will be harder to manage different package managers. +The project currently uses [uv](https://docs.astral.sh/uv/) as a package manager as the core and framework +will not require any complex to install dependencies, +but science packages may require more complex dependencies. +There is nothing stopping the use of something like [pixi](https://pixi.sh/dev/) when conda-based +dependencies are required. + +The CI times may become longer. This may be somewhat insidious +as at the beginning the developer experience will be great. +This will gradually degrade as the number of packages/providers increases. +The test for each provider can be run as a separate job. +We may also choose to target a single Python version if it becomes an issue. + + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- Splitting out each package to a separate repository is a possibility +and is a very common approach in the Python community. +The tooling for managing mono-repos is more mature than it was a few years ago which enables the proposed approach. +The downside of that is that changes across multiple repositories will be required as we add new features. +- A single package with extras. +The downside of this is that it will be harder to incorporate new providers as they will be incorporated +into the main package. +The framework will be an application rather than a library so pip installing it will not be as useful. + +# Prior art +[prior-art]: #prior-art + +I've tried this out in another [recent project](https://github.com/climate-resource/bookshelf) which +published multiple packages to PyPI from a single repository. +It worked well for that project as it enabled clearer separations and reuse of parts of the codebase +without depending on a much larger package. + + +# Unresolved questions +[unresolved-questions]: #unresolved-questions +None + +# Future possibilities +[future-possibilities]: #future-possibilities + +Once the project has matured, we can consider splitting the packages into separate repositories. +Each package can then have its own tooling and versioning. +That would depend on packaging and publishing the `ref-core` package to PyPI and conda-forge. From 8efd4f46ea133f1e21f1e207431ac2afa2094f79 Mon Sep 17 00:00:00 2001 From: Jared Lewis Date: Wed, 30 Oct 2024 14:01:43 +1100 Subject: [PATCH 2/3] chore: Tweaks --- text/0001-package-structure.md | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/text/0001-package-structure.md b/text/0001-package-structure.md index 0ff9911..62b6a59 100644 --- a/text/0001-package-structure.md +++ b/text/0001-package-structure.md @@ -24,14 +24,15 @@ It is proposed that we start with a mono-repo structure consisting of multiple p each with their own set of dependencies. The tooling and CI will be shared across all projects. There will be a `packages` directory at the root of the repo, with each package in its own subdirectory. +An example has been implemented in [CMIP-REF/cmip-ref#1](https://github.com/CMIP-REF/cmip-ref/pull/0001), +which contains a `ref-core` package and an example `ref-metrics-example` package. Each package will have a `README.md` file that describes the purpose of the package and how to use it, its own `pyproject.toml` describing the package and it's dependencies, and their own set of tests. The test suite for each package is run with only the dependencies required for that package. -An example has been implemented in [CMIP-REF/cmip-ref#1](https://github.com/CMIP-REF/cmip-ref/pull/0001), -which contains a `ref-core` package and an example `ref-metrics-example` package. +A common set of integration tests will also be run to ensure that the packages work together. The purpose of this structure is to split the project into smaller, more manageable parts and to differentiate between the "library" part of the project which is reusable @@ -52,6 +53,9 @@ The purpose of these packages are to provide a Python-based hook to execute a me This will include (hopefully thin) wrappers around the metric providers existing APIs or build the required CMEC configuration files. +The repository will set up a CI pipeline that will run the unit tests for each package. +This includes fetching some sample data. + These packages will maintain their own dependencies (python and otherwise) and packaged up as a docker image. The package will be named `ref-metrics-` where `` is the name of the provider. @@ -63,7 +67,7 @@ tracks the execution of metrics and the results. The package may have indirect dependencies on the metrics packages to avoid conflicting dependencies. That adds additional complexity, but will be more flexible in the long run. -This is the service that will be deployed to ESGF. +This is the service that will be deployed to ESGF alongside any additional services that are required. Whether this application can be done solely via a CLI-based tool is up for debate. @@ -71,7 +75,7 @@ Whether this application can be done solely via a CLI-based tool is up for debat [drawbacks]: #drawbacks * More people working in the same repository may lead to more merge conflicts. -* Larger repository with more moving parts +* Larger repository with more moving parts/concepts Another drawback is that it will be harder to manage different package managers. @@ -79,7 +83,7 @@ The project currently uses [uv](https://docs.astral.sh/uv/) as a package manager will not require any complex to install dependencies, but science packages may require more complex dependencies. There is nothing stopping the use of something like [pixi](https://pixi.sh/dev/) when conda-based -dependencies are required. +dependencies are required for a particular package. The CI times may become longer. This may be somewhat insidious as at the beginning the developer experience will be great. @@ -87,10 +91,13 @@ This will gradually degrade as the number of packages/providers increases. The test for each provider can be run as a separate job. We may also choose to target a single Python version if it becomes an issue. - # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives +A mono-repo structure will enable the rapid development of the project while it is immature +and in a state of flux. +It will be easier to refactor code as it is all in the same repository. + - Splitting out each package to a separate repository is a possibility and is a very common approach in the Python community. The tooling for managing mono-repos is more mature than it was a few years ago which enables the proposed approach. @@ -108,10 +115,10 @@ published multiple packages to PyPI from a single repository. It worked well for that project as it enabled clearer separations and reuse of parts of the codebase without depending on a much larger package. - # Unresolved questions [unresolved-questions]: #unresolved-questions -None + +* An example that uses `pixi` to install conda packages would be useful as a proof of concept. # Future possibilities [future-possibilities]: #future-possibilities From d79d176c9328565af7ee9aa38522d11755ed19cc Mon Sep 17 00:00:00 2001 From: Jared Lewis Date: Wed, 30 Oct 2024 16:59:28 +1100 Subject: [PATCH 3/3] docs: Tweak following comments by Forrest --- text/0001-package-structure.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/text/0001-package-structure.md b/text/0001-package-structure.md index 62b6a59..f78e326 100644 --- a/text/0001-package-structure.md +++ b/text/0001-package-structure.md @@ -5,8 +5,10 @@ # Summary [summary]: #summary -This RFC proposes a standard package structure for REF packages. -The goal is to make it easier for developers to find and understand the contents of a REF package. +This RFC proposes how we should structure the git repository to enable the rapid development of the project while it is immature. +The core of the project will have some reuse outside of the REF frontend +so it should be publishable as a separate package. +The goal is to make it easier for users to extend the project with new metrics providers. # Motivation [motivation]: #motivation @@ -48,10 +50,15 @@ This allows us to keep the core functionality separate from the metrics provider ### `ref-metrics-*` Each metric provider will implement a separate package that implements the interfaces described by the `ref-core` package. +These will very thin packages that wrap around the existing APIs of the benchmarking packages or use CMEC to execute. + +An example package `ref-metrics-example` has been implemented in [CMIP-REF/cmip-ref#5](https://github.com/CMIP-REF/cmip-ref/pull/5). +See the [code](https://github.com/CMIP-REF/cmip-ref/tree/basic-interface/packages/ref-metrics-example) for more details. The purpose of these packages are to provide a Python-based hook to execute a metric and return the results. This will include (hopefully thin) wrappers around the metric providers existing APIs or build the required CMEC configuration files. +It should require little to no change to the existing codebase benchmarking packages. The repository will set up a CI pipeline that will run the unit tests for each package. This includes fetching some sample data.