
Recreation model cloud infrastructure

Dave Fisher edited this page Apr 1, 2026 · 7 revisions

Google Cloud VM

For many years the Rec model server process has run on a Google Cloud VM.

The VM hosts the long-running process started by natcap.invest.recreation.recmodel_server.execute.

  • See invest/scripts/recreation_server/launch_recserver.sh for starting the process.

The VM was set up following the steps detailed in invest/scripts/recreation_server/setup_vm.sh.

  • It may be best to run the commands in this shell script interactively.

As of March 2025, this VM is running two recmodel_server processes:

A pre-3.15.0 version (no longer running):

  • process is started using /usr/local/recreation-server/launch_recserver.sh
  • uses a natcap.invest installation in conda env: /usr/local/recreation-server/invest/env
  • logs to /usr/local/recreation-server/nohup_recmodel_server.txt
  • writes local cache data to /usr/local/recreation-server/recserver_cache_py36
  • cron_find_rm_cached_workspaces.sh: a cron job for deleting week-old workspaces from recserver_cache_py36.

3.15.0 - 3.18.0 - all resources are under /usr/local/recreation-server/invest_3_15_0/

  • invest/ is a clone of the invest repo. invest/env is a mamba environment with invest installed
  • server/ is a workspace containing:
    • volume/: a gcs-fuse mounted GCS bucket (see Data section below)
    • flickr/: a local directory containing a build of the global flickr quadtree.
    • flickr/local: a local cache workspace for flickr quadtree queries
    • twitter/: a local directory containing just the index of the global twitter quadtree.
    • twitter/local: a local cache workspace for twitter quadtree queries
    • pyro.log: a logfile for the server process
  • invest/scripts/recreation_server/launch_recserver.sh starts the recmodel_server process.
  • server/cron_find_rm_cached_workspaces.sh: a cron job for deleting week-old workspaces from flickr/local and twitter/local.
    • This is a local copy of invest/scripts/recreation_server/cron_find_rm_cached_workspaces.sh
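The cleanup itself is a shell script run from cron. To spell out the logic, here is a rough Python equivalent. It is a sketch under assumptions, not a copy of cron_find_rm_cached_workspaces.sh: it treats each immediate subdirectory of a cache directory as one workspace, and it treats "week-old" as a directory modification time more than 7 days in the past. The function name remove_stale_workspaces is hypothetical.

```python
import shutil
import time
from pathlib import Path

# Assumption: "week-old" means the workspace directory's mtime is more than 7 days old.
MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def remove_stale_workspaces(cache_dirs, max_age_seconds=MAX_AGE_SECONDS,
                            now=None, dry_run=False):
    """Remove immediate subdirectories of each cache dir older than max_age_seconds.

    Returns the paths that were removed (or, with dry_run=True, would be removed).
    """
    now = time.time() if now is None else now
    stale = []
    for cache_dir in map(Path, cache_dirs):
        if not cache_dir.is_dir():
            continue  # tolerate a cache dir that does not exist on this host
        for workspace in sorted(cache_dir.iterdir()):
            # Only real directories count as workspaces; skip files and symlinks.
            if workspace.is_symlink() or not workspace.is_dir():
                continue
            if now - workspace.stat().st_mtime > max_age_seconds:
                if not dry_run:
                    shutil.rmtree(workspace)
                stale.append(workspace)
    return stale


if __name__ == "__main__":
    server = Path("/usr/local/recreation-server/invest_3_15_0/server")
    caches = [server / "flickr" / "local", server / "twitter" / "local"]
    # dry_run=True only reports; drop it to actually delete.
    for path in remove_stale_workspaces(caches, dry_run=True):
        print("would remove", path)
```

One caveat with using mtime: a directory's modification time changes only when entries directly inside it are added, removed, or renamed, so a workspace whose nested files were recently touched can still look old.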

3.19.0 - ? - all resources are also under /usr/local/recreation-server/invest_3_15_0/

  • This process shares all the same resources as the 3.15.0 - 3.18.0 process, except:
    • natcap.invest is installed into invest_3_19_0/env from the source code in invest_3_19_0. Currently, that source is the davemfish/bugfix/REC-1950-smaller-data-transfers branch.
    • invest_3_19_0/scripts/recreation_server/launch_recserver.sh starts the recmodel_server process.
    • The server logs to server/pyro_invest319.log
  • Cache workspaces for this server are the same as for the 3.15.0 - 3.18.0 server, so the same cron job will clear them.

VM Details:

  • GCP Project: NatCap Servers
  • Name: recreation-server-3
  • CPUs: 2
  • Memory: 7.5 GB
  • Boot Disk: 20 GB
  • Additional Disk: 200 GB
  • OS: Debian 12 bookworm

Data

The recmodel server process depends on two large datasets: geotagged Flickr metadata and geotagged Twitter metadata.
recmodel_server.py is designed to build a filesystem-based quadtree spatial index from each dataset. Whenever the server process starts, it checks for existing quadtrees on the filesystem and rebuilds them from the raw data if needed and available.
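To illustrate the data structure, here is a minimal in-memory point quadtree in Python. It is not the natcap.invest implementation: the server's index lives on the filesystem so it can grow beyond RAM, whereas this sketch keeps every node in memory. The class name QuadTreeNode and the capacity and max_depth parameters are illustrative choices. Each leaf holds up to capacity points; when it overflows it splits into four equal quadrants, and a bounding-box query descends only into quadrants that overlap the box.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTreeNode:
    """A minimal in-memory point quadtree (illustration only)."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float
    capacity: int = 4    # max points a leaf holds before it splits
    depth: int = 0
    max_depth: int = 16  # stop splitting, e.g. when many points are identical
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def intersects(self, qxmin, qymin, qxmax, qymax):
        return not (qxmax < self.xmin or qxmin > self.xmax or
                    qymax < self.ymin or qymin > self.ymax)

    def insert(self, x, y):
        """Insert a point; return False if it lies outside this node."""
        if not self.contains(x, y):
            return False
        if self.children:
            # Bounds are inclusive, so the first quadrant that accepts the point wins.
            return any(child.insert(x, y) for child in self.children)
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def _split(self):
        xmid = (self.xmin + self.xmax) / 2
        ymid = (self.ymin + self.ymax) / 2
        quadrants = [
            (self.xmin, self.ymin, xmid, ymid),
            (xmid, self.ymin, self.xmax, ymid),
            (self.xmin, ymid, xmid, self.ymax),
            (xmid, ymid, self.xmax, self.ymax),
        ]
        self.children = [
            QuadTreeNode(*bounds, capacity=self.capacity,
                         depth=self.depth + 1, max_depth=self.max_depth)
            for bounds in quadrants
        ]
        # Push this leaf's points down into the new quadrants.
        for x, y in self.points:
            for child in self.children:
                if child.insert(x, y):
                    break
        self.points = []

    def query(self, qxmin, qymin, qxmax, qymax):
        """Return all points inside the query box, visiting only overlapping nodes."""
        if not self.intersects(qxmin, qymin, qxmax, qymax):
            return []
        if self.children:
            found = []
            for child in self.children:
                found.extend(child.query(qxmin, qymin, qxmax, qymax))
            return found
        return [(x, y) for x, y in self.points
                if qxmin <= x <= qxmax and qymin <= y <= qymax]


if __name__ == "__main__":
    tree = QuadTreeNode(-180.0, -90.0, 180.0, 90.0)  # lon/lat extent
    for lon, lat in [(-122.17, 37.43), (2.35, 48.86), (139.69, 35.69),
                     (-0.13, 51.51), (151.21, -33.87)]:
        tree.insert(lon, lat)
    print(sorted(tree.query(-10.0, 40.0, 10.0, 60.0)))
    # [(-0.13, 51.51), (2.35, 48.86)]
```

Because each point is stored in exactly one leaf, a query never returns duplicates, and whole quadrants outside the query box are skipped without examining their points.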

Before version 3.15.0, we only used the Flickr dataset. A copy of this data lives locally on the VM:

  • /usr/local/recreation-server/photos_2005-2017_odlla.csv (and compressed: photos_2005-2017_odlla.tgz)
  • 320 million points
  • EDIT: none of the active servers use these data. They could be safely removed from this VM because we also store them on GCS (see below).

Since version 3.15.0, we also use the Twitter dataset, which is roughly 100 times larger: so large that it is not feasible for the recmodel_server process to build the quadtree on the VM described above. Instead, the quadtree was constructed on Sherlock and then copied to a GCS bucket. The VM running recmodel_server mounts this bucket using gcs-fuse, and the args_dict passed to recmodel_server.execute() points to the quadtree index file (a pickle) on the mounted volume.

  • GCS Bucket: natcap-recreation
  • The quadtree index file: natcap-recreation/twitter_quadtree/global_twitter_qt.pickle
  • indexes ~20 billion points
  • invest/scripts/recreation_server/build_twitter_quadtree.sh (SLURM script)
  • invest/scripts/recreation_server/build_twitter_quadtree.py (quadtree construction)
  • invest/scripts/recreation_server/copy_quadtree_to_gcs.sh (copy files from Sherlock scratch to GCS)
  • This bucket now also contains the Flickr CSV file mentioned above, and the post-3.15.0 recmodel_server processes point to this file on the mounted volume instead of the local copy on the VM (see invest/scripts/recreation_server/launch_recserver_twitter.py).
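The decision described in this section (use a prebuilt index if one exists, rebuild from raw data only if that data is present) can be sketched as follows. choose_quadtree_source is a hypothetical helper, not a function in natcap.invest, and the example path assumes the natcap-recreation bucket is mounted at the root of server/volume/. The mount check matters for the Twitter data: with no raw points on the VM, a missing gcs-fuse mount should fail loudly rather than look like a missing index.

```python
import os
from pathlib import Path


def choose_quadtree_source(index_path, raw_csv_path=None):
    """Decide how a quadtree could be obtained (hypothetical helper).

    Returns ("load", path) when a prebuilt index exists, ("build", path) when
    only raw point data is available, and raises FileNotFoundError otherwise.
    """
    index_path = Path(index_path)
    if index_path.is_file():
        return ("load", index_path)
    if raw_csv_path is not None and Path(raw_csv_path).is_file():
        return ("build", Path(raw_csv_path))
    raise FileNotFoundError(
        f"no quadtree index at {index_path} and no raw data to rebuild from")


if __name__ == "__main__":
    # Assumption: the natcap-recreation bucket is mounted at server/volume/.
    volume = Path("/usr/local/recreation-server/invest_3_15_0/server/volume")
    if not os.path.ismount(str(volume)):
        raise SystemExit(f"{volume} is not a mount point; is gcs-fuse running?")
    print(choose_quadtree_source(
        volume / "twitter_quadtree" / "global_twitter_qt.pickle"))
```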
