(github-intro)=
# pyOpenSci Infrastructure


pyOpenSci uses GitHub to manage almost all of its infrastructure, from community processes to website rendering.
This page provides a **high-level overview** of our infrastructure, focusing on how our core repositories work together and contribute to the website and community operations.

For detailed information about specific infrastructure components, see the [Learn more](#learn-more) section below.

## What is pyOpenSci infrastructure?

pyOpenSci infrastructure encompasses:

* **GitHub repositories:** All code, content, and documentation repositories
* **Website and documentation:** Main website and sub-sites (handbook, guides, lessons)
* **Data processing:** Automated collection and processing of contributor and peer review data
* **Continuous Integration (CI):** GitHub Actions workflows for testing, building, and deploying
* **Access and permissions:** Repository access management and team structures
* **Issue and pull request workflows:** Processes for managing contributions and reviews

## Infrastructure overview diagrams

The diagrams below illustrate two key aspects of our infrastructure:

### Data flow and processing

The first diagram shows how peer review data is extracted from GitHub issues and passed through our automated processing system to update the website:

```{figure} /images/diagrams/website-diagram.svg
:name: website-diagram

pyOpenSci infrastructure data flow diagram showing how peer review issues are processed through pyosMeta to update the website.
```

This diagram illustrates the automated workflow: peer review happens in GitHub issues, which are parsed by scripts in the `pyosMeta` package to generate YAML files that automatically update the website's package and contributor pages.

### Website structure

The second diagram shows how the main pyOpenSci website connects to its sub-sites:

```{figure} /images/diagrams/website-repositories-structure.svg
:name: website-repositories-structure

pyOpenSci website structure diagram showing the main website and its sub-sites (Handbook, Python Package Guide, Software Peer Review Guide, Lessons, and Metrics).
```

All sub-sites are built separately but served under the `pyopensci.org` domain, with the main website (`pyopensci.github.io`) serving as the central hub.

## Data flow and continuous integration

In simple terms: pyOpenSci uses automated workflows to collect data from GitHub and automatically update our website.

pyOpenSci uses a set of **Continuous Integration (CI)** jobs (GitHub Actions) to:


The [`pyosMeta`](https://github.com/pyOpenSci/pyosMeta) package is a Python package that **parses review and contributor data** and transforms it into **machine-readable YAML files** used by our website.
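
To make the parsing step more concrete, here is a minimal sketch of turning a review issue's Markdown header into a package record. It is not `pyosMeta`'s actual code: the simplified `Key: value` issue format, the sample data, and the helper names (`parse_review_header`, `to_package_record`, `dump_records`) are hypothetical. To stay within the standard library it writes JSON, which YAML 1.2 parsers can also load; a real pipeline would more likely use a YAML library.

```python
import json
import re

# Hypothetical, simplified review issue header: one "Key: value" pair per line.
# Real pyOpenSci issue templates contain more fields and free-form text.
ISSUE_BODY = """\
Submitting Author: Jane Doe (@janedoe)
Package Name: examplepkg
Repository Link: https://github.com/example-org/examplepkg
Editor: @editor1
Reviewer 1: @reviewer1
Reviewer 2: @reviewer2
"""

# The key stops at the first colon, so URLs in the value stay intact.
FIELD_RE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.*?)\s*$")
# GitHub handles written as @username.
HANDLE_RE = re.compile(r"@([A-Za-z0-9-]+)")


def parse_review_header(body: str) -> dict[str, str]:
    """Collect 'Key: value' lines into a dict with snake_case keys."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        match = FIELD_RE.match(line)
        if match:
            key = match.group(1).strip().lower().replace(" ", "_")
            fields[key] = match.group(2)
    return fields


def to_package_record(fields: dict[str, str]) -> dict[str, object]:
    """Keep the fields a package page needs and extract GitHub handles."""
    reviewers = [
        handle
        for key, value in fields.items()
        if key.startswith("reviewer")
        for handle in HANDLE_RE.findall(value)
    ]
    return {
        "package_name": fields.get("package_name", ""),
        "repository_link": fields.get("repository_link", ""),
        "submitting_author": HANDLE_RE.findall(fields.get("submitting_author", "")),
        "editor": HANDLE_RE.findall(fields.get("editor", "")),
        "reviewers": reviewers,
    }


def dump_records(records: list[dict[str, object]]) -> str:
    """Serialize records as JSON, which YAML 1.2 parsers can also load."""
    return json.dumps(records, indent=2)


record = to_package_record(parse_review_header(ISSUE_BODY))
print(dump_records([record]))
```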

### How data flows through our system

* `pyosMeta` parses the **Markdown data** within review issues in the [`software-review`](https://github.com/pyOpenSci/software-review) GitHub repository. It:
  * Gathers the GitHub usernames of review editors, reviewers, and maintainers, and uses the GitHub API to retrieve contributor names, emails, and other public GitHub profile information
  * Extracts the GitHub URL of each reviewed package and retrieves basic repository statistics (number of forks, stars, and contributors)
  * Stores this peer review information in `packages.yml`

* `pyosMeta` also parses **contributor data** from across all pyOpenSci repositories. It:
  * Parses `all-contributors` bot files to compile a list of contributors and their associated repositories/projects
  * Parses peer review metadata to populate roles such as reviewers, editors, and other contributor roles within our organization
  * Stores this contributor information in `contributors.yml`

* The `packages.yml` and `contributors.yml` files generated by `pyosMeta` are updated **daily** via a GitHub Actions **cron job** in the [`pyopensci.github.io`](https://github.com/pyOpenSci/pyopensci.github.io/tree/main/_data) repository. This data is used to populate:
  * The **Our Community** page
  * The **Packages** page
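
The contributor step can be sketched the same way. The example below assumes the JSON layout used by `.all-contributorsrc` files (a `contributors` list whose entries carry `login` and `name` keys) and merges in peer review roles, such as those parsed from review issues. The sample data, the `role:package` labels, and `compile_contributors` are hypothetical; `pyosMeta` structures `contributors.yml` in its own way.

```python
import json
from collections import defaultdict

# Hypothetical .all-contributorsrc contents, keyed by repository name.
ALL_CONTRIBUTORS_FILES = {
    "python-package-guide": json.dumps(
        {"contributors": [{"login": "janedoe", "name": "Jane Doe", "contributions": ["doc"]}]}
    ),
    "pyosMeta": json.dumps(
        {
            "contributors": [
                {"login": "janedoe", "name": "Jane Doe", "contributions": ["code"]},
                {"login": "reviewer1", "name": "Rev One", "contributions": ["review"]},
            ]
        }
    ),
}

# Hypothetical peer review roles: (GitHub login, package, role).
REVIEW_ROLES = [
    ("reviewer1", "examplepkg", "reviewer"),
    ("editor1", "examplepkg", "editor"),
]


def compile_contributors(
    files: dict[str, str], review_roles: list[tuple[str, str, str]]
) -> dict[str, dict[str, object]]:
    """Merge all-contributors entries and review roles into one record per login."""
    people = defaultdict(lambda: {"name": "", "projects": set(), "roles": set()})
    for repo, raw in files.items():
        for entry in json.loads(raw).get("contributors", []):
            person = people[entry["login"]]
            # Keep the first non-empty display name we see.
            person["name"] = person["name"] or entry.get("name", "")
            person["projects"].add(repo)
    for login, package, role in review_roles:
        people[login]["roles"].add(f"{role}:{package}")
    # Sort logins and sets so the generated file is stable between runs.
    return {
        login: {
            "name": person["name"],
            "projects": sorted(person["projects"]),
            "roles": sorted(person["roles"]),
        }
        for login, person in sorted(people.items())
    }


print(json.dumps(compile_contributors(ALL_CONTRIBUTORS_FILES, REVIEW_ROLES), indent=2))
```

Sorting the output is a deliberate choice: a daily job that regenerates the file only produces a diff when the underlying data actually changes.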

For more detailed information about data collection and processing, see the [Data Workflows](data-process) page.
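
The GitHub API lookups described above (public profile information and repository statistics) can be sketched with the GitHub REST API and the Python standard library. This is an illustrative sketch, not `pyosMeta`'s implementation: it assumes the `GET /users/{username}` and `GET /repos/{owner}/{repo}` endpoints, and the helper names are hypothetical. Unauthenticated requests are heavily rate limited, so a CI job would normally pass a token.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"


def repo_api_url(repository_link: str) -> str:
    """Turn https://github.com/owner/repo[/...] into the REST URL for that repo."""
    owner, repo = urlparse(repository_link).path.strip("/").split("/")[:2]
    return f"{API_ROOT}/repos/{owner}/{repo.removesuffix('.git')}"


def user_api_url(username: str) -> str:
    """REST URL for a user's public profile."""
    return f"{API_ROOT}/users/{username}"


def fetch_json(url: str, token: Optional[str] = None) -> dict:
    """GET a GitHub REST endpoint; passing a token raises the rate limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def repo_stats(repo_json: dict) -> dict[str, int]:
    """Pick star and fork counts out of a GET /repos/{owner}/{repo} response."""
    return {"stars": repo_json["stargazers_count"], "forks": repo_json["forks_count"]}


def profile_fields(user_json: dict) -> dict[str, str]:
    """Pick public profile fields; GitHub returns null for unset ones."""
    return {key: user_json.get(key) or "" for key in ("name", "email", "blog", "location")}


# Example usage (needs network access):
# stats = repo_stats(fetch_json(repo_api_url("https://github.com/pyOpenSci/pyosMeta")))
# profile = profile_fields(fetch_json(user_api_url("janedoe")))
```

Contributor counts are not part of the repository payload; they come from the separate, paginated `GET /repos/{owner}/{repo}/contributors` endpoint.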

### Website publishing

* The **Python Package Guide**, **Peer Review Guide**, and **Handbook** are all **Sphinx books**. These books are built separately but are served under the `pyopensci.org` domain.
* All Sphinx books use the [`pyos-sphinx-theme`](https://github.com/pyOpenSci/pyos-sphinx-theme), a Sphinx theme built on top of `pydata_sphinx_theme`.
* The final site is published at [pyopensci.org](https://www.pyopensci.org) using **GitHub Pages**.

## Learn more

This page provides a high-level overview. For detailed information about specific infrastructure components, see:

* **[All repositories](our-repositories):** Complete list and description of all pyOpenSci GitHub repositories
* **[Data workflows](data-process):** Detailed information about data collection and processing
* **[Continuous Integration](continuous-integration):** CI/CD workflows and GitHub Actions
* **[Permissions](permissions):** Repository access management and team structures
* **[Pull requests](pull-requests):** How to work with pull requests in pyOpenSci repos
* **[Issues](issues):** Issue management and labeling workflows