Recreation model cloud infrastructure
For many years the Rec model server process has run on a Google Cloud VM.
The VM hosts the endless process started by natcap.invest.recreation.recmodel_server.execute.
- See invest/scripts/recreation_server/launch_recserver.sh for starting the process.
The VM was set up following the steps detailed in invest/scripts/recreation_server/setup_vm.sh.
- It may be best to run the commands in this shell script interactively.
As of March 2025, this VM is running two recmodel_server processes:
- process is started using /usr/local/recreation-server/launch_recserver.sh
- uses a natcap.invest installation in conda env: /usr/local/recreation-server/invest/env
- logs to /usr/local/recreation-server/nohup_recmodel_server.txt
- writes local cache data to /usr/local/recreation-server/recserver_cache_py36
- cron_find_rm_cached_workspaces.sh: a cron job for deleting week-old workspaces from recserver_cache_py36.
- invest/ is a clone of the invest repo. invest/env is a mamba environment with invest installed.
- server/ is a workspace for:
  - volume/: a gcs-fuse mounted GCS bucket (see Data section below)
  - flickr/: a local directory containing a build of the global flickr quadtree.
  - flickr/local: a local cache workspace for flickr quadtree queries
  - twitter/: a local directory containing just the index of the global twitter quadtree.
  - twitter/local: a local cache workspace for twitter quadtree queries
  - pyro.log: a logfile for the server process
- invest/scripts/recreation_server/launch_recserver.sh starts the recmodel_server process.
- server/cron_find_rm_cached_workspaces.sh: a cron job for deleting week-old workspaces from flickr/local and twitter/local.
  - This is a local copy of invest/scripts/recreation_server/cron_find_rm_cached_workspaces.sh
- This process shares all the same resources as the 3.15.0 - 3.18.0 process, except:
  - natcap.invest is installed into invest_3_19_0/env from source code in invest_3_19_0. Currently, this installs davemfish/bugfix/REC-1950-smaller-data-transfers.
  - invest_3_19_0/scripts/recreation_server/launch_recserver.sh starts the recmodel_server process.
  - The server logs to server/pyro_invest319.log
- Cache workspaces for this server are the same as for the 3.15.0 - 3.18.0 server, so the same cron job will clear them.
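The cron jobs described above are shell scripts, and their contents are not reproduced here. As a rough illustration of what "deleting week-old workspaces" amounts to, the following is a minimal Python sketch. The function name remove_old_workspaces and the choice to compare directory modification times are assumptions made for this example, not a description of cron_find_rm_cached_workspaces.sh.

```python
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def remove_old_workspaces(cache_dir, max_age_days=7, now=None):
    """Delete subdirectories of cache_dir not modified in max_age_days.

    Hypothetical helper: the real cron job is a shell script.
    Returns the sorted names of the directories that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for entry in Path(cache_dir).iterdir():
        # Only real workspace directories are candidates; loose files
        # and symlinks are left alone.
        if entry.is_symlink() or not entry.is_dir():
            continue
        if entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)
```

Pointing such a function at a cache directory like flickr/local or twitter/local from a scheduled job would give behavior similar in spirit to the cron job, under the assumptions stated above.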
- GCP Project: NatCap Servers
- Name: recreation-server-3
- CPUs: 2
- Memory: 7.5 GB
- Boot Disk: 20 GB
- Additional Disk: 200 GB
- OS: Debian 12 bookworm
The recmodel server process depends on two large datasets: geotagged Flickr metadata and geotagged Twitter metadata.
recmodel_server.py is designed to build a filesystem-based quadtree spatial index from each dataset. Whenever the server process starts, it checks for existing quadtrees on the filesystem and rebuilds them from the raw data if they are missing and the raw data are available.
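To make the "check for an existing index, otherwise rebuild" behavior concrete, here is a minimal sketch of a point quadtree with a load-or-build step. It is not the recmodel_server implementation: the real index is filesystem-based, while this one lives in memory as nested dicts, and every name here (new_node, insert, query, load_or_build) is hypothetical. Points are assumed to fall inside the root bounds.

```python
import os
import pickle


def new_node(bounds, depth=0):
    """bounds is (min_x, min_y, max_x, max_y)."""
    return {'bounds': bounds, 'depth': depth, 'points': [], 'children': None}


def _midpoint(bounds):
    min_x, min_y, max_x, max_y = bounds
    return (min_x + max_x) / 2, (min_y + max_y) / 2


def _child_for(node, x, y):
    mid_x, mid_y = _midpoint(node['bounds'])
    # Children are ordered: SW, SE, NW, NE.
    return node['children'][(x >= mid_x) + 2 * (y >= mid_y)]


def insert(node, x, y, capacity=4, max_depth=16):
    while node['children'] is not None:
        node = _child_for(node, x, y)
    node['points'].append((x, y))
    # max_depth stops endless splitting when many points coincide.
    if len(node['points']) > capacity and node['depth'] < max_depth:
        min_x, min_y, max_x, max_y = node['bounds']
        mid_x, mid_y = _midpoint(node['bounds'])
        quadrants = [(min_x, min_y, mid_x, mid_y), (mid_x, min_y, max_x, mid_y),
                     (min_x, mid_y, mid_x, max_y), (mid_x, mid_y, max_x, max_y)]
        node['children'] = [new_node(q, node['depth'] + 1) for q in quadrants]
        points, node['points'] = node['points'], []
        for px, py in points:
            insert(node, px, py, capacity, max_depth)


def query(node, box):
    """Return all points inside box = (min_x, min_y, max_x, max_y)."""
    q_min_x, q_min_y, q_max_x, q_max_y = box
    min_x, min_y, max_x, max_y = node['bounds']
    # Skip whole subtrees whose bounds do not touch the query box.
    if q_max_x < min_x or q_min_x > max_x or q_max_y < min_y or q_min_y > max_y:
        return []
    if node['children'] is None:
        return [(x, y) for x, y in node['points']
                if q_min_x <= x <= q_max_x and q_min_y <= y <= q_max_y]
    found = []
    for child in node['children']:
        found.extend(query(child, box))
    return found


def load_or_build(index_path, points, bounds):
    """Reuse a pickled index if one exists; otherwise build and save it."""
    if os.path.exists(index_path):
        with open(index_path, 'rb') as index_file:
            return pickle.load(index_file)
    root = new_node(bounds)
    for x, y in points:
        insert(root, x, y)
    with open(index_path, 'wb') as index_file:
        pickle.dump(root, index_file)
    return root
```

A call such as query(root, (-10, -10, 40, 30)) only descends into quadrants that overlap the box, which is what makes a quadtree practical for bounding-box lookups over very large point sets.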
Before version 3.15.0, we only used the Flickr dataset. A copy of this data lives locally on the VM:
- /usr/local/recreation-server/photos_2005-2017_odlla.csv (and compressed: photos_2005-2017_odlla.tgz)
- 320 million points
- EDIT: none of the active servers use these data. They could be safely removed from this VM because we also store them on GCS (see below).
Since version 3.15.0, we also use the Twitter dataset, which is roughly 100 times larger. It is so large that it is not feasible to have the recmodel_server process build the quadtree on the VM described above. Instead, the quadtree was constructed on Sherlock and then copied to a GCS bucket. The VM running the recmodel_server mounts this bucket using gcs-fuse, and the args_dict passed to recmodel_server.execute() points to the quadtree index file (pickle) on the mounted volume.
- GCS Bucket: natcap-recreation
- The quadtree index file: natcap-recreation/twitter_quadtree/global_twitter_qt.pickle
  - indexes ~20 billion points
- invest/scripts/recreation_server/build_twitter_quadtree.sh (SLURM script)
- invest/scripts/recreation_server/build_twitter_quadtree.py (quadtree construction)
- invest/scripts/recreation_server/copy_quadtree_to_gcs.sh (copy files from Sherlock scratch to GCS)
- This bucket now also contains the Flickr CSV file mentioned above, and the post-3.15.0 recmodel_server processes point to this file on the mounted volume instead of the local file on the VM. (see invest/scripts/recreation_server/launch_recserver_twitter.py)
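Because the servers read their data through the mounted bucket, a missing mount is an easy failure to overlook. The sketch below shows one way to check, before launching, that the mount is present and the index file is reachable. The function name find_index_problems is hypothetical, and the example paths assume that the bucket root is mounted at server/volume; neither is confirmed by the launch scripts.

```python
import os


def find_index_problems(mount_dir, index_relpath, require_mount=True):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical preflight check, not part of natcap.invest.
    """
    problems = []
    if require_mount and not os.path.ismount(mount_dir):
        problems.append(f'{mount_dir} is not a mount point')
    index_path = os.path.join(mount_dir, index_relpath)
    if not os.path.isfile(index_path):
        problems.append(f'{index_path} does not exist')
    elif os.path.getsize(index_path) == 0:
        problems.append(f'{index_path} is empty')
    return problems


# Example call (paths are assumptions, see the note above):
# find_index_problems(
#     'server/volume', 'twitter_quadtree/global_twitter_qt.pickle')
```

Returning a list of problems rather than raising on the first one lets a launch script log everything that is wrong in a single pass.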