Four-band semantic segmentation using Raster Vision (pdf) to detect impervious surfaces in the Highlands Region.
An example output is the 2020 Impervious Surface of the NJ Highlands Region:
Highlands-Rastervision adheres to cloud-native geospatial processing, thanks in part to NJGIN hosting its aerial imagery in Cloud Optimized GeoTIFF (COG) format on S3.
These steps are deployed to AWS on EC2 Spot Instances, with the generated data written to an S3 bucket. Use the EC2 Spot Instance Advisor to verify that the selected instance type has a low frequency of interruption.
Other AWS services may be better suited to running Raster Vision (AWS Batch and SageMaker, for example), but this process proved cost- and time-effective.
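Alongside the Spot Instance Advisor, you can sanity-check recent spot pricing from the CLI before launching; the instance type below is only an example, not necessarily the one used by this project:

# check recent spot prices for a candidate instance type (p3.2xlarge is an assumption)
aws ec2 describe-spot-price-history --instance-types p3.2xlarge --product-descriptions "Linux/UNIX" --max-items 5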
You will need an AWS account, plus Python and the AWS CLI installed locally.
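A quick way to confirm the local prerequisites are in place:

# verify local tooling and that the CLI can reach your account
python3 --version
aws --version
aws sts get-caller-identity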
- Create labels and validation scenes from target sets of imagery. For each image, the label file must match the name of the grid id.
- You can optionally identify an Area of Interest for each image with the naming convention <grid_id>_aoi.geojson.
- Once labels have been created for an image, save them to the ./labels/ directory with the naming convention <grid_id>_labels.geojson.
- The grid ID should then be added to the associated env file's TRAIN_IDS or VAL_IDS section (see the example layout below).
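For illustration, a labeled project might look like the following; the grid IDs shown and the env variable format are assumptions, not values from this project:

# hypothetical layout after labeling two training scenes and one validation scene
# ./labels/10702_labels.geojson
# ./labels/10702_aoi.geojson
# ./labels/10703_labels.geojson
# ./labels/10704_labels.geojson

# corresponding entries in the year's env file (variable format is an assumption)
TRAIN_IDS="10702,10703"
VAL_IDS="10704"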
Once the labels have been created, train the model in AWS using the following commands:
# edit the shell script in aws/train/<year>/userdata.sh if desired, then encode the script as base64:
openssl base64 -A -in ./aws/train/2020/userdata.sh -out ./aws/train/2020/userdata.txt
openssl base64 -A -in ./aws/train/2015/userdata.sh -out ./aws/train/2015/userdata.txt
openssl base64 -A -in ./aws/train/2012/userdata.sh -out ./aws/train/2012/userdata.txt
openssl base64 -A -in ./aws/train/2007/userdata.sh -out ./aws/train/2007/userdata.txt
openssl base64 -A -in ./aws/train/2002/userdata.sh -out ./aws/train/2002/userdata.txt
# copy the contents of the output text file to the associated spec.json file's "userdata" property, then launch an EC2 instance to do the training
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/train/2020/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/train/2015/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/train/2012/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/train/2007/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/train/2002/spec.json
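To find the instance to connect to, one option is to query for running spot instances; the key pair name and login user below are assumptions that depend on your spec.json:

# list running spot instances and their public IPs
aws ec2 describe-instances --filters "Name=instance-lifecycle,Values=spot" "Name=instance-state-name,Values=running" --query "Reservations[].Instances[].[InstanceId,PublicIpAddress]" --output text
# then SSH in (key pair and AMI user are assumptions)
ssh -i <your-key>.pem ec2-user@<public-ip>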
# connect to the EC2 instance and monitor the process with
sudo tail -f /var/log/cloud-init-output.log

You can continue training a model for additional epochs by increasing the epoch count in src/impervious.py if desired.
To retrain from scratch, delete the contents of the train directory and re-run the above command.
Predictions are run in AWS on EC2 Spot Instances, with the vectorized output written to an S3 bucket.
# edit the shell script in aws/predict/<year>/userdata.sh if desired, then encode the script as base64:
openssl base64 -A -in ./aws/predict/2020/userdata.sh -out ./aws/predict/2020/userdata.txt
openssl base64 -A -in ./aws/predict/2015/userdata.sh -out ./aws/predict/2015/userdata.txt
openssl base64 -A -in ./aws/predict/2012/userdata.sh -out ./aws/predict/2012/userdata.txt
openssl base64 -A -in ./aws/predict/2007/userdata.sh -out ./aws/predict/2007/userdata.txt
openssl base64 -A -in ./aws/predict/2002/userdata.sh -out ./aws/predict/2002/userdata.txt
# copy the contents of the output text file to the associated spec.json file's "userdata" property, then launch EC2 instances to do the prediction
# note the instance count is set higher here; it can be scaled to whatever number is desired for faster processing
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 10 --type "one-time" --launch-specification file://aws/predict/2020/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 10 --type "one-time" --launch-specification file://aws/predict/2015/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 10 --type "one-time" --launch-specification file://aws/predict/2012/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 10 --type "one-time" --launch-specification file://aws/predict/2007/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 20 --type "one-time" --launch-specification file://aws/predict/2002/spec.json
# connect to the EC2 instance and monitor the process with
sudo tail -f /var/log/cloud-init-output.log

Check prediction progress by monitoring the object count in the target S3 path. There should be 1651 objects for each year when completed.
aws s3 ls s3://njhighlands/geobia/impervious/2020/predicted/ | wc -l
aws s3 ls s3://njhighlands/geobia/impervious/2015/predicted/ | wc -l
aws s3 ls s3://njhighlands/geobia/impervious/2012/predicted/ | wc -l
aws s3 ls s3://njhighlands/geobia/impervious/2007/predicted/ | wc -l
aws s3 ls s3://njhighlands/geobia/impervious/2002/predicted/ | wc -l

This process should take roughly 30-60 minutes.
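Rather than re-running the count by hand, you can poll it; the 60-second interval is arbitrary:

# re-count the 2020 outputs every 60 seconds until all 1651 objects appear
watch -n 60 'aws s3 ls s3://njhighlands/geobia/impervious/2020/predicted/ | wc -l'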
Once all predictions are complete, the per-tile outputs are aggregated into a single compiled file, again on EC2 Spot Instances.

# encode the userdata scripts
openssl base64 -A -in ./aws/aggregate/2020/userdata.sh -out ./aws/aggregate/2020/userdata.txt
openssl base64 -A -in ./aws/aggregate/2015/userdata.sh -out ./aws/aggregate/2015/userdata.txt
openssl base64 -A -in ./aws/aggregate/2012/userdata.sh -out ./aws/aggregate/2012/userdata.txt
openssl base64 -A -in ./aws/aggregate/2007/userdata.sh -out ./aws/aggregate/2007/userdata.txt
openssl base64 -A -in ./aws/aggregate/2002/userdata.sh -out ./aws/aggregate/2002/userdata.txt
# copy the contents of the output text file to the associated spec.json file's "userdata" property, then launch an EC2 instance to do the aggregation
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/aggregate/2020/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/aggregate/2015/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/aggregate/2012/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/aggregate/2007/spec.json
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/aggregate/2002/spec.json

After the above process is finished, you can copy the final compiled output (gzipped) to your local directory:
aws s3 cp s3://njhighlands/geobia/impervious/2020/predicted.gpkg.gz predicted-2020.gpkg.gz
aws s3 cp s3://njhighlands/geobia/impervious/2015/predicted.gpkg.gz predicted-2015.gpkg.gz
aws s3 cp s3://njhighlands/geobia/impervious/2012/predicted.gpkg.gz predicted-2012.gpkg.gz
aws s3 cp s3://njhighlands/geobia/impervious/2007/predicted.gpkg.gz predicted-2007.gpkg.gz
aws s3 cp s3://njhighlands/geobia/impervious/2002/predicted.gpkg.gz predicted-2002.gpkg.gz
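The downloads are gzipped GeoPackages, so decompress them before loading into a desktop GIS, for example:

# decompress the 2020 output
gunzip predicted-2020.gpkg.gz

Vector tiles can then be generated from the compiled outputs using the same spot-instance pattern.

# encode the userdata scripts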
openssl base64 -A -in ./aws/mvt/2020/userdata.sh -out ./aws/mvt/2020/userdata.txt
openssl base64 -A -in ./aws/mvt/2015/userdata.sh -out ./aws/mvt/2015/userdata.txt
openssl base64 -A -in ./aws/mvt/2012/userdata.sh -out ./aws/mvt/2012/userdata.txt
openssl base64 -A -in ./aws/mvt/2007/userdata.sh -out ./aws/mvt/2007/userdata.txt
openssl base64 -A -in ./aws/mvt/2002/userdata.sh -out ./aws/mvt/2002/userdata.txt
# copy the contents of the output text file to the associated spec.json file's "userdata" property, then launch an EC2 instance to do the tile generation
aws ec2 request-spot-instances --spot-price "0.4" --instance-count 1 --type "one-time" --launch-specification file://aws/mvt/2020/spec.json
aws ec2 request-spot-instances --spot-price "0.4" --instance-count 1 --type "one-time" --launch-specification file://aws/mvt/2015/spec.json
aws ec2 request-spot-instances --spot-price "0.4" --instance-count 1 --type "one-time" --launch-specification file://aws/mvt/2012/spec.json
aws ec2 request-spot-instances --spot-price "0.4" --instance-count 1 --type "one-time" --launch-specification file://aws/mvt/2007/spec.json
aws ec2 request-spot-instances --spot-price "0.4" --instance-count 1 --type "one-time" --launch-specification file://aws/mvt/2002/spec.json

The 2002 imagery is missing a spatial reference. This pre-processing step assigns the projection before running the training and prediction steps.
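The actual logic lives in aws/project/2002/userdata.sh; conceptually it amounts to something like the following GDAL call, where the EPSG code (NJ State Plane, feet) and file name are assumptions:

# hypothetical sketch: stamp a CRS onto a tile in place (EPSG:3424 is an assumption)
gdal_edit.py -a_srs EPSG:3424 <grid_id>.tif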
# encode the userdata scripts
openssl base64 -A -in ./aws/project/2002/userdata.sh -out ./aws/project/2002/userdata.txt
# copy the contents of the output text file to the associated spec.json file's "userdata" property, then launch an EC2 instance to do the projection assignment
aws ec2 request-spot-instances --spot-price "0.8" --instance-count 1 --type "one-time" --launch-specification file://aws/project/2002/spec.json
# confirm 1651 images are present
aws s3 ls s3://njhighlands/imagery/2002/cog/ | wc -l
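To spot-check that the projection was applied, you can read one tile's metadata directly from S3 with GDAL's virtual file system; the exact key below is illustrative:

# confirm the CRS is now present on a processed tile
gdalinfo /vsis3/njhighlands/imagery/2002/cog/<grid_id>.tif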