Commit 37be210

Merge pull request #70 from nasaharvest/clean-generate
Version 0.1.0
2 parents 39f3a33 + 709bdad commit 37be210

72 files changed

Lines changed: 867 additions & 4367 deletions

.github/workflows/buildings-example-test.yaml

Lines changed: 1 addition & 3 deletions
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f
 
 - name: Integration test - Project

.github/workflows/crop-mask-example-test.yaml

Lines changed: 1 addition & 3 deletions
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f
 
 - name: Integration test - Project

.github/workflows/maize-example-test.yaml

Lines changed: 1 addition & 3 deletions
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f
 
 - name: Integration test - Project

README.md

Lines changed: 12 additions & 14 deletions
@@ -88,19 +88,22 @@ After all configuration is set, the following project structure will be generate
 
 └─── data
 │ raw_labels/ # User added labels
-│ processed_labels/ # Labels standardized to common format
-│ features/ # Labels combined with satellite data
-│ compressed_features.tar.gz # Allows faster features downloads
-│ models/ # Models trained using features
+│ datasets/ # ML ready datasets (labels + earth observation data)
+│ models/ # Models trained using datasets
 | raw_labels.dvc # Reference to a version of raw_labels/
-| processed_labels.dvc # Reference to a version of processed_labels/
-│ compressed_features.tar.gz.dvc # Reference to a version of features/
+| datasets.dvc # Reference to a version of datasets/
 │ models.dvc # Reference to a version of models/
 
 ```
 
 This project contains all the code necessary for: Adding data ➞ Training a model ➞ Creating a map.
 
+**Important:** When code is pushed to the repository a Github action will be run to verify project configuration, data integrity, and script functionality. This action will pull data using dvc and thereby needs access to remote storage (your Google Drive). To allow the Github action to access the data add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
+- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
+- In step 6, enter the value in .dvc/tmp/gdrive-user-creditnals.json (in your repository)
+
+After this the Github action should successfully run.
+
 
 ## Adding data [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/new_data.ipynb)
 
@@ -134,25 +137,20 @@ datasets = [
 ...
 ]
 ```
-Run feature creation:
+Run dataset creation:
 ```bash
 earthengine authenticate # For getting new earth observation data
 gcloud auth login # For getting cached earth observation data
 
-openmapflow create-features # Initiatiates or checks progress of features creation
+openmapflow create-dataset # Initiatiates or checks progress of dataset creation
 openmapflow datasets # Shows the status of datasets
 
 dvc commit && dvc push # Push new data to data version control
 
 git add .
-git commit -m'Created new features'
+git commit -m'Created new dataset'
 git push
 ```
-**Important:** When new data is pushed to the repository a Github action will be run to verify data integrity. This action will pull data using dvc and thereby needs access to remote storage (your Google Drive). To allow the Github action to access the data add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
-- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
-- In step 6, enter the value in .dvc/tmp/gdrive-user-creditnals.json (in your repository)
-
-After this the Github action should successfully run if the data is valid.
 
 
 ## Training a model [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/train.ipynb)
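
The README text in this diff describes adding the `GDRIVE_CREDENTIALS_DATA` repository secret through the GitHub web UI. As a possible alternative, the same secret could be set from a terminal. The sketch below is not part of this commit: it assumes the GitHub CLI (`gh`) is installed and authenticated for the repository, and `set_gdrive_secret` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: upload the DVC Google Drive credentials as a repository secret.
# Assumes the GitHub CLI (`gh`) is installed and authenticated; the helper
# name `set_gdrive_secret` is hypothetical and not part of openmapflow.
set_gdrive_secret() {
    cred_file="$1"
    # Refuse to continue if the credentials file is missing.
    if [ ! -f "$cred_file" ]; then
        echo "credentials file not found: $cred_file" >&2
        return 1
    fi
    # `gh secret set NAME` reads the secret value from standard input.
    gh secret set GDRIVE_CREDENTIALS_DATA < "$cred_file"
}
```

To use it, call `set_gdrive_secret` with the path to the credentials file under `.dvc/tmp/` from inside the repository. The README spells the file name `gdrive-user-creditnals.json`, so check the actual name on disk before running it.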

buildings-example/data/.gitignore

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,3 @@
+/datasets
 /raw_labels
-/processed_labels
-/compressed_features.tar.gz
 /models
-/features

buildings-example/data/compressed_features.tar.gz.dvc

Lines changed: 0 additions & 4 deletions
This file was deleted.
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+outs:
+- md5: db853058c80b597bb44bfc0ecf37866f.dir
+  size: 121467360
+  nfiles: 2
+  path: datasets
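
The pointer file added above records the tracked directory's hash, total size in bytes, and file count. As a rough illustration of reading those fields, the sketch below recreates the five lines in a temporary file and summarizes them with `awk`. The temporary file and the summary format are assumptions made for this example, and the field matching is a simplification that only handles a single `outs` entry.

```shell
#!/bin/sh
# Recreate the .dvc pointer file from the diff in a temporary location
# (illustration only; in the repository the file already exists).
dvc_file="$(mktemp)"
cat > "$dvc_file" <<'EOF'
outs:
- md5: db853058c80b597bb44bfc0ecf37866f.dir
  size: 121467360
  nfiles: 2
  path: datasets
EOF

# Collect size, file count and path, then report the size in MiB
# (1 MiB = 1048576 bytes).
summary="$(awk '
    $1 == "size:"   { size = $2 }
    $1 == "nfiles:" { nfiles = $2 }
    $1 == "path:"   { path = $2 }
    END { printf "%s: %d files, %.1f MiB\n", path, nfiles, size / 1048576 }
' "$dvc_file")"

echo "$summary"   # prints: datasets: 2 files, 115.8 MiB
rm -f "$dvc_file"
```

The result shows that the single `dvc pull` of `DATASETS` in the updated workflows fetches roughly 116 MiB across two files.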

buildings-example/data/duplicates.txt

Lines changed: 0 additions & 4 deletions
This file was deleted.

buildings-example/data/missing.txt

Lines changed: 0 additions & 133 deletions
This file was deleted.

buildings-example/data/processed_labels.dvc

Lines changed: 0 additions & 5 deletions
This file was deleted.
