This repository contains instructions for setting up a new Nero instance. First, you'll create a new instance with the gcloud command. Then, you'll run a script to install and set up tools that we commonly use.
NOTE: These instructions are in bash and thus for Mac and Linux users. If you are a Windows user, you'll either need to adapt these instructions for PowerShell or use Windows Subsystem for Linux (WSL).
For example, to create my-instance on the Nero project som-nero-phi-my-project, set the bash variables INSTANCE_NAME and PROJECT_ID:
# CHANGE THIS TO THE NAME YOU WANT FOR YOUR INSTANCE
INSTANCE_NAME="my-instance"
PROJECT_ID="som-nero-phi-my-project"
Additionally, set ZONE, MACHINE_TYPE, DISK_SIZE, IMAGE_NAME, and IMAGE_PROJECT, changing any values you want to adjust.
ZONE="us-west1-c"
# see all machine types with:
# gcloud compute machine-types list --zones="$ZONE"
# 8 vCPUs (4 cores) and 30 GB RAM
MACHINE_TYPE="n1-standard-8"
DISK_SIZE="200" # in GB
# Recommended as the setup script assumes this OS
IMAGE_NAME="ubuntu-2404-noble-amd64-v20241004"
IMAGE_PROJECT="ubuntu-os-cloud"
Then, run the gcloud compute instances create command below:
# Create instance with above specs
gcloud compute instances create "$INSTANCE_NAME" \
--project="$PROJECT_ID" \
--zone="$ZONE" \
--machine-type="$MACHINE_TYPE" \
--network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY \
--maintenance-policy=MIGRATE \
--provisioning-model=STANDARD \
--scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform \
--tags=ssh \
--create-disk=auto-delete=yes,boot=yes,device-name="$INSTANCE_NAME",image="$IMAGE_NAME",image-project="$IMAGE_PROJECT",mode=rw,size="$DISK_SIZE",type=pd-balanced \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--labels=goog-ec-src=vm_add-gcloud \
--reservation-affinity=any
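Once the create command returns, one way to confirm that the instance is up is to query its status. The helper below is a minimal sketch: the function name instance_status is hypothetical, and it assumes INSTANCE_NAME, ZONE, and PROJECT_ID are still set in your shell.

```shell
# Print the lifecycle status of the instance (for example RUNNING or TERMINATED).
# instance_status is an illustrative name, not part of the setup script.
instance_status() {
  gcloud compute instances describe "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --project="$PROJECT_ID" \
    --format='value(status)'
}

# Example call:
#   instance_status
```

A status of RUNNING means the VM has booted, although the SSH service may still need a little longer before it accepts connections.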
Note that it may take a moment for the server to initialize before you can connect. Then connect to the server via ssh with:
gcloud compute ssh --zone "$ZONE" "$INSTANCE_NAME" --project "$PROJECT_ID"
When you've successfully SSH'd into the server, run the installation script:
curl -fsSL https://github.com/StanfordHPDS/gcp_setup_script/releases/download/v1.1.2/setup.sh | bash
This process will take several minutes to run.
After the script has completed, the server will reboot to finish updating the Linux kernel and to pick up the updated paths for all the new software. The reboot will disconnect you.
It will take a few moments for the server to reboot.
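If you would rather not guess when the reboot has finished, a small retry loop can poll the instance until SSH answers again. This is only a sketch: wait_for_ssh is a hypothetical helper, the retry count and delay are arbitrary defaults, and it assumes INSTANCE_NAME, ZONE, and PROJECT_ID are set in your local shell.

```shell
# Retry a no-op SSH command until the instance accepts connections again.
# Arguments (both optional): maximum attempts, seconds to wait between attempts.
wait_for_ssh() {
  local max_tries="${1:-20}"
  local delay="${2:-15}"
  local attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    if gcloud compute ssh --zone "$ZONE" "$INSTANCE_NAME" --project "$PROJECT_ID" \
        --command "true" >/dev/null 2>&1; then
      echo "Instance is reachable again."
      return 0
    fi
    echo "Not reachable yet (attempt $attempt of $max_tries); retrying in ${delay}s..."
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Gave up after $max_tries attempts." >&2
  return 1
}

# Example call:
#   wait_for_ssh
```

The loop only checks that a trivial remote command succeeds; it does not verify that RStudio Server or VS Code have finished starting.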
Log back in with the ports for VS Code (8080) and RStudio Server (8787) forwarded:
gcloud compute ssh --zone "$ZONE" "$INSTANCE_NAME" --project "$PROJECT_ID" \
-- -L 8787:localhost:8787 -L 8080:localhost:8080
To update the software on an existing instance, SSH into your server and run:
curl -fsSL https://github.com/StanfordHPDS/gcp_setup_script/releases/download/v1.1.2/update.sh | bash
This will update system packages, R, Quarto, RStudio Server, VS Code, DuckDB, and development tools. Unlike the setup script, no reboot is required.
Use uv to manage Python versions on a per-project basis. See the Using the instance section below for more information.
You may also want to manage R on a per-project basis with rig and the renv package.
You can also update specific components only:
# Update only RStudio Server
curl -fsSL https://github.com/StanfordHPDS/gcp_setup_script/releases/download/v1.1.2/update.sh | bash -s -- --rstudio
# See all options
curl -fsSL https://github.com/StanfordHPDS/gcp_setup_script/releases/download/v1.1.2/update.sh | bash -s -- --help
We recommend stopping the instance when you are not using it to save costs.
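Stopping and starting can be done from your local machine with gcloud compute instances stop and start. The wrappers below are a sketch; the names stop_instance and start_instance are hypothetical, and they assume INSTANCE_NAME, ZONE, and PROJECT_ID are set as in the creation step.

```shell
# Stop the instance when you are done working.
stop_instance() {
  gcloud compute instances stop "$INSTANCE_NAME" --zone="$ZONE" --project="$PROJECT_ID"
}

# Start it again when you need it.
start_instance() {
  gcloud compute instances start "$INSTANCE_NAME" --zone="$ZONE" --project="$PROJECT_ID"
}

# Example calls:
#   stop_instance
#   start_instance
```

A stopped instance no longer accrues vCPU and memory charges, though its persistent disk is still billed.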
Each instance has the most recent versions of Python and R available for Ubuntu 24. Both pip and install.packages() use Posit Public Package Manager to install binaries for packages. Additionally, the instance has Quarto, Docker, conda, ruff, sqlfluff, uv, duckdb, gh, TinyTeX, and Rust installed, as well as a number of common system libraries used in data science packages.
If you think another tool should be included in the default setup, please file an issue or pull request.
Authorize your GitHub credentials with
gh auth login
And tell git who you are:
git config --global user.name "Jane Doe"
git config --global user.email "jane@example.com"
You should be able to connect to BigQuery on the instance without authorization. For new code, prefer connecting without explicit authorization.
However, if older code you are running expects a credentials file, you can create one with:
gcloud auth application-default login
Note where the file is created in case you need to reference it.
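For older code that looks up credentials through the GOOGLE_APPLICATION_CREDENTIALS environment variable, a sketch like the following may help. It assumes the file was written to the usual Linux location under ~/.config/gcloud, so check the path that the login command actually printed; use_adc_file is a hypothetical helper name.

```shell
# Point GOOGLE_APPLICATION_CREDENTIALS at the credentials file, if it exists.
# The path below is the usual default on Linux; yours may differ.
use_adc_file() {
  local adc_file="$HOME/.config/gcloud/application_default_credentials.json"
  if [ -f "$adc_file" ]; then
    export GOOGLE_APPLICATION_CREDENTIALS="$adc_file"
    echo "GOOGLE_APPLICATION_CREDENTIALS=$adc_file"
  else
    echo "No credentials file found at $adc_file" >&2
    return 1
  fi
}

# Example call:
#   use_adc_file
```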
We use uv as our primary Python package manager. It's fast and automatically manages virtual environments. Here's how to use it:
# Create a new project
uv init my-project
cd my-project
# Pin a specific Python version (creates .python-version file)
uv python pin 3.12
# Add packages
uv add pandas numpy scikit-learn
# Run Python scripts
uv run script_name.py
# Run Quarto documents
uv run quarto render document.qmd
Each project automatically gets its own isolated environment; no manual activation is needed. The Python version is controlled by the .python-version file in your project directory.
To see available Python versions:
uv python list
Note: Conda is still installed on the instance for older projects that require it. The base conda environment is set not to auto-activate.
VS Code (Browser) (http://localhost:8080/)
VS Code should now be running. If you open http://localhost:8080/, you'll get a startup message that tells you where the credential file is. You can see the password with
cat /path/to/the/file/code-server/config.yaml
Make sure to replace the path with the path in the startup message.
The Python, Quarto, and Jupyter extensions are already installed.
We also recommend activating a Python interpreter for your session, ideally matching the uv environment for your project. This allows the different spaces (Quarto, IPython, etc.) to use the same Python interpreter.
First, use CMD/CTRL + Shift + P to open the command palette and search for the Python interpreter option from the Python extension.
Then pick the environment and interpreter you want. For uv projects, look for the .venv directory in your project folder.
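If the interpreter picker does not list your project's environment, you can paste the path in manually. The helper below is a sketch that prints the interpreter path for a uv project; project_python is a hypothetical name, and it assumes uv has already created the default .venv directory in the project folder.

```shell
# Print the Python interpreter inside a uv project's .venv, if it exists.
# Argument (optional): project directory, defaulting to the current directory.
project_python() {
  local project_dir="${1:-.}"
  local interpreter="$project_dir/.venv/bin/python"
  if [ -x "$interpreter" ]; then
    echo "$interpreter"
  else
    echo "No .venv found in $project_dir; run 'uv sync' there first." >&2
    return 1
  fi
}

# Example call:
#   project_python ~/my-project
```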
RStudio Server (http://localhost:8787/)
RStudio Server should now be running. You'll need to add a user for yourself. Run
sudo adduser your_username
and follow the prompts. Then, visit http://localhost:8787/ and enter the credentials you just created.
RStudio is configured to run R in a blank slate by default.
Instead of using the browser-based IDEs, you can connect your local VS Code or Positron to the instance via SSH.
First, set your default project and configure SSH for your GCP instances:
gcloud config set project "$PROJECT_ID"
gcloud compute config-ssh
This adds all your instances to ~/.ssh/config. You can then connect using the hostname format: instance-name.zone.project-id.
Note that by default, GCP assigns ephemeral external IP addresses to instances. That means when you stop and restart an instance, it will likely get a new external IP address. To keep the SSH connection working whenever you restart your instance, you will need to edit the ~/.ssh/config file:
- Find your instance in the format instance-name.zone.project-id
- Remove all but the IdentityFile fields
- Add this command (replacing YOUR_PROJECT_ID and YOUR_ZONE with the appropriate project ID and zone, respectively)
ProxyCommand gcloud compute ssh instance-name --command "nc 0.0.0.0 22" --project YOUR_PROJECT_ID --zone YOUR_ZONE
Your config entry should then look something like this:
Host instance-name.zone.project-id
IdentityFile /Users/your_user_name/.ssh/google_compute_engine
ProxyCommand gcloud compute ssh instance-name --command "nc 0.0.0.0 22" --project YOUR_PROJECT_ID --zone YOUR_ZONE
Then, in your local IDE:
VS Code: Use the built-in SSH feature to connect to the hostname in the config file generated by gcloud.
Positron: Use the built-in Remote SSH feature to connect to the same hostname.
Both IDEs handle port forwarding automatically, giving you the same development experience as working locally. While they both include built-in SSH tools, they use different ones and so work slightly differently. See the documentation.
gcloud storage allows you to work with Cloud Storage, including buckets. To download data from a bucket, use gcloud storage cp:
gcloud storage cp gs://name_of_bucket/path/to/data ~/path/to/data/on/instance
Find the name of the disk for your instance using gcloud:
gcloud compute disks list --project="${PROJECT_ID}"
Then, resize it with gcloud compute disks resize. For instance, to change the disk your-disk-name to be 234 GB, I would run this command:
gcloud compute disks resize your-disk-name --size=234GB --zone="${ZONE}"
See the gcloud documentation for more details.
After you've resized the disk, you may need to resize the filesystem on the instance, too. SSH into your instance, then run this command in the terminal for information on your disks:
df -h
If the disk doesn't have approximately the same size you resized to, run:
sudo resize2fs /name/of/disk
Where /name/of/disk is the name listed in df -h.
Run df -h again to confirm the disk is resized. You may need to stop and restart the instance for the changes to take effect.
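To pick out the right row from the df -h output, it can help to print only the entry for the root filesystem. The helper below is a sketch run on the instance itself; fs_summary is a hypothetical name, and it assumes the resized disk is the boot disk mounted at /.

```shell
# Print the device, total size, and available space for a mount point.
# Argument (optional): mount point, defaulting to the root filesystem.
fs_summary() {
  df -hP "${1:-/}" | awk 'NR == 2 { print $1, $2, $4 }'
}

# Example call, before and after resizing:
#   fs_summary
```

The first field is the device name to pass to resize2fs, and the second field should roughly match the new disk size once the resize has taken effect.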
The default value for the MACHINE_TYPE is "n1-standard-8", a machine that has 8 vCPUs (4 cores) and 30 GB RAM. You can change the machine type after creation by stopping the instance, running gcloud compute instances set-machine-type, and restarting the instance.
For instance, if "n1-standard-8" suits most of your needs but you occasionally need to do more intensive computation, you could temporarily change the instance to use "n1-highmem-32", which has 32 vCPUs and 208 GB RAM.
INSTANCE_NAME="your-instance-name"
ZONE="your-instance-zone"
# Example: Change to n1-highmem-32
NEW_MACHINE_TYPE="n1-highmem-32"
# Stop the instance
gcloud compute instances stop "$INSTANCE_NAME" --zone="$ZONE"
# Change the machine type
gcloud compute instances set-machine-type "$INSTANCE_NAME" \
--zone="$ZONE" \
--machine-type="$NEW_MACHINE_TYPE"
# Start the instance
gcloud compute instances start "$INSTANCE_NAME" --zone="$ZONE"
If you've set the machine type to a high-compute type for a temporary computation, be sure to change it back to the original one when you are done to save costs.
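To check which machine type an instance currently has, for example before deciding whether to switch it back, you can read the machineType field from gcloud compute instances describe. This is a sketch: current_machine_type is a hypothetical helper, and it assumes INSTANCE_NAME and ZONE are set and that your default project is configured.

```shell
# Print the short machine type name (for example n1-standard-8).
# The API reports machineType as a full URL, so keep only the last path segment.
current_machine_type() {
  local url
  url="$(gcloud compute instances describe "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --format='value(machineType)')" || return 1
  echo "${url##*/}"
}

# Example call:
#   current_machine_type
```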







