Merged
33 changes: 27 additions & 6 deletions docs/GettingStarted/Environment.md
@@ -7,16 +7,19 @@ description: How to set up environment variables for DevLake
This document explains how to set environment variables for Apache DevLake and what environment variables can be set.

## Environment Variables

### ENABLE_SUBTASKS_BY_DEFAULT

This environment variable enables or disables the execution of individual subtasks.

#### How to set

The value is a comma-separated list of entries in the form `plugin_name:subtask_name:enabled_value`, for example `plugin_name1:subtask_name1:enabled_value,plugin_name2:subtask_name2:enabled_value,plugin_name3:subtask_name3:enabled_value`.

Guidance on locating the [plugin_name and subtask_name](https://github.com/apache/incubator-devlake/blob/release-v1.0/backend/plugins/jira/tasks/issue_changelog_collector.go#L41):

- plugin_name: Represents the plugin's name, such as 'jira' for the Jira plugin.
- subtask_name: Denotes the subtask's name, like 'collectIssueChangelogs' for the Jira plugin.

Example 1: Enable some subtasks that are disabled by default

@@ -25,18 +28,36 @@ ENABLE_SUBTASKS_BY_DEFAULT="jira:collectIssueChangelogs:true,jira:extractIssueCh
```

Example 2: Disable some subtasks that run by default

```shell
ENABLE_SUBTASKS_BY_DEFAULT="github_graphql:Collect Job Runs:false,github_graphql:Extract Job Runs:false,github_graphql:Convert Job Runs:false"
```
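
Because one misplaced colon or comma can make an entry unreadable, it may help to check a value before setting it. The following is a minimal sketch, not part of DevLake: `check_subtasks` is a hypothetical helper that splits a value on commas and prints each `plugin_name:subtask_name:enabled_value` entry. It assumes `enabled_value` is literally `true` or `false`, as in the examples above; DevLake itself may accept other spellings.

```shell
# Hypothetical helper, not shipped with DevLake: print every entry of an
# ENABLE_SUBTASKS_BY_DEFAULT value and report entries that do not match
# plugin_name:subtask_name:true|false (an assumption based on the examples).
check_subtasks() {
  value=$1
  status=0
  old_ifs=$IFS
  IFS=','   # split on commas only, so subtask names may contain spaces
  set -f    # disable glob expansion while splitting
  for entry in $value; do
    case $entry in
      *:*:true|*:*:false)
        plugin=${entry%%:*}   # text before the first colon
        rest=${entry#*:}      # everything after the first colon
        printf 'plugin=%s subtask=%s enabled=%s\n' \
          "$plugin" "${rest%:*}" "${rest##*:}"
        ;;
      *)
        printf 'malformed entry: %s\n' "$entry" >&2
        status=1
        ;;
    esac
  done
  set +f
  IFS=$old_ifs
  return "$status"
}
```

For instance, `check_subtasks "$ENABLE_SUBTASKS_BY_DEFAULT"` prints one line per entry and returns a non-zero status if any entry is malformed.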

### GITHUB_GRAPHQL_JOB\_...

This set of environment variables is used to configure and fine-tune the behavior of the GitHub GraphQL Job Runs collection process.

| Environment Variable | Description | Default Value |
| --------------------------------------- | ------------------------------------------------------------------------------------- | ------------- |
| GITHUB_GRAPHQL_JOB_COLLECTION_MODE | Specifies the mode of job collection. Possible values are `BATCHING` and `PAGINATING` | `BATCHING` |
| GITHUB_GRAPHQL_JOB_BATCHING_INPUT_STEP | Defines the step size for batching mode. | `10` |
| GITHUB_GRAPHQL_JOB_BATCHING_PAGE_SIZE | Defines the limit of jobs to collect in a batch for each run. | `20` |
| GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE | Defines the page size for paginating mode. | `50` |

#### When to Use

These environment variables are most useful for large repositories with a significant number of job runs. Adjusting them lets you tune the data collection process to your needs and infrastructure capabilities, and helps avoid timeouts on the GitHub GraphQL API caused by overly large requests.

- Use `BATCHING` for `GITHUB_GRAPHQL_JOB_COLLECTION_MODE` when your workflow runs typically have fewer than 20 jobs and you want to minimize the number of API calls to GitHub.
- Adjust `GITHUB_GRAPHQL_JOB_BATCHING_INPUT_STEP` and `GITHUB_GRAPHQL_JOB_BATCHING_PAGE_SIZE` to control how many jobs are collected in each batch. **NOTE:** Increasing these values can lead to timeouts if the requests become too large.
- Use `PAGINATING` for `GITHUB_GRAPHQL_JOB_COLLECTION_MODE` when your workflow runs have a large number of jobs (e.g., more than 50). This mode queries only one workflow run at a time and paginates through its jobs, reducing the risk of timeouts.
- Adjust `GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE` to control how many jobs are fetched per page. A smaller page size can help avoid timeouts but may increase the total number of API calls.

TL;DR: `BATCHING` is more efficient for smaller workflows, while `PAGINATING` will guarantee complete collection of jobs for larger workflows.
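
As a rough illustration of the guidance above, the sketch below switches to paginating mode for a repository whose workflow runs contain many jobs. The page size of `25` is an illustrative value below the default of `50`, not a recommendation from DevLake, and exporting the variables in a shell is only one assumed way of supplying them; use whatever mechanism your deployment already uses for environment variables.

```shell
# Illustrative values only: query one workflow run at a time and fetch
# its jobs in smaller pages than the default of 50.
export GITHUB_GRAPHQL_JOB_COLLECTION_MODE=PAGINATING
export GITHUB_GRAPHQL_JOB_PAGINATING_PAGE_SIZE=25
```

A smaller page size trades more API calls for a lower risk of timeouts, so it is worth raising the value again if collection becomes slow without timing out.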

## How to take effect

After setting an environment variable, restart the DevLake service for the change to take effect.

- For Docker Compose, run `docker-compose down` and `docker-compose up -d`.
- For Helm, run `helm upgrade devlake devlake/devlake --recreate-pods`.
1 change: 1 addition & 0 deletions docs/Plugins/github.md
@@ -62,6 +62,7 @@ Metrics that can be calculated based on the data collected from GitHub:

- Configuring GitHub via [Config UI](/Configuration/GitHub.md)
- Configuring GitHub via Config UI's [advanced mode](/Configuration/AdvancedMode.md#1-github).
- Configuring GitHub GraphQL job collection via [Environment Variables](/GettingStarted/Environment.md#github_graphql_job_...).

## API Sample Request
