Reference materials for Node Summit 2017 talk 'Tightly Packed Parallelization To Get Happy Hour Back'.
The video lives here.
These materials are a reference for building a minimal Node.js task scheduler with the 'child_process' module, and for building a basic distributed system that does work in parallel with Node.js and Kubernetes. This is about as simple as it gets for getting started with parallel work in Node.js.
Our basic Node.js task scheduler uses the 'child_process' module to work on multiple tasks in parallel. This version of the task scheduler forks Node.js child processes off of the parent, but it could easily be modified to spawn other executables or run commands as child processes.
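As a rough illustration of the forking approach described above, the sketch below forks one Node.js child per task module and waits for all of them to exit. The names runTask and runAll are hypothetical and are not taken from task_scheduler.js; the real scheduler also tracks resources and dependencies, which this sketch leaves out.

```javascript
const { fork } = require('child_process');

// Hypothetical helper (not from task_scheduler.js): fork one Node.js child
// for the given module and resolve with its exit code once it exits.
function runTask(modulePath, args = []) {
  return new Promise((resolve, reject) => {
    const child = fork(modulePath, args);
    // 'error' fires if the child could not be spawned at all.
    child.once('error', reject);
    // 'exit' reports the exit code, or the signal that killed the child.
    child.once('exit', (code, signal) => resolve({ modulePath, code, signal }));
  });
}

// Hypothetical helper: start every task at once and wait for all of them.
// Results come back in the same order as the input paths.
function runAll(modulePaths) {
  return Promise.all(modulePaths.map((modulePath) => runTask(modulePath)));
}
```

To run other executables instead of Node.js modules, spawn from the same 'child_process' module could stand in for fork here.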
The task scheduler can be found at task_scheduler/task_scheduler.js. The task_scheduler/ directory also includes an example that imports the task scheduler and tasks to be run and then executes those tasks in parallel.
Try it out:
cd ./task_scheduler/examples
npm install
(the task_scheduler uses immutable as a dependency)
node example1
The scheduler should fork child processes to run tasks while resources are available. As tasks exit, resources are refilled and dependencies are honored. Check out the task_scheduler/examples/tasks directory to view or modify the tasks that run in parallel.
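One way to picture the resource and dependency handling is the sketch below. It is not the logic in task_scheduler.js: the task shape ({ name, cost, dependsOn }), the notion of a numeric capacity, and the function names pickRunnable and simulate are all assumptions made for illustration. It also simplifies by finishing tasks in waves, whereas the description above refills resources as each task exits.

```javascript
// Assumed task shape (hypothetical): { name, cost, dependsOn: [names] }.
// Pick tasks that have not started, whose dependencies have all finished,
// and whose cost still fits in the remaining free slots.
function pickRunnable(tasks, finished, running, freeSlots) {
  const picked = [];
  let slots = freeSlots;
  for (const task of tasks) {
    const started = finished.has(task.name) || running.has(task.name);
    const ready = task.dependsOn.every((dep) => finished.has(dep));
    if (!started && ready && task.cost <= slots) {
      picked.push(task.name);
      slots -= task.cost;
    }
  }
  return picked;
}

// Simplified simulation: every picked task finishes before the next wave,
// so the full capacity is available again at the start of each wave.
function simulate(tasks, capacity) {
  const finished = new Set();
  const waves = [];
  while (finished.size < tasks.length) {
    const wave = pickRunnable(tasks, finished, new Set(), capacity);
    if (wave.length === 0) {
      throw new Error('no runnable task: unmet dependency or cost above capacity');
    }
    waves.push(wave);
    wave.forEach((name) => finished.add(name));
  }
  return waves;
}

// Example task graph (hypothetical): c needs a and b, d needs a.
const exampleTasks = [
  { name: 'a', cost: 1, dependsOn: [] },
  { name: 'b', cost: 1, dependsOn: [] },
  { name: 'c', cost: 2, dependsOn: ['a', 'b'] },
  { name: 'd', cost: 1, dependsOn: ['a'] },
];
console.log(simulate(exampleTasks, 2)); // [ [ 'a', 'b' ], [ 'c' ], [ 'd' ] ]
```

With a capacity of 2, a and b run together, c then takes both slots, and d waits until c has released them.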
- Allow for retries if child_process exits with error.
- Persist stats associated with tasks run and optimize on next run.
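The retry idea in the list above could look roughly like the following. This is only a sketch of one possible approach, not something the scheduler currently does: forkOnce, runWithRetries, and the default of three attempts are all hypothetical choices.

```javascript
const { fork } = require('child_process');

// Hypothetical helper: fork a module once and resolve with its exit code.
// A spawn failure or a signal-terminated child is reported as exit code 1.
function forkOnce(modulePath) {
  return new Promise((resolve) => {
    const child = fork(modulePath);
    child.once('error', () => resolve(1));
    child.once('exit', (code) => resolve(code === null ? 1 : code));
  });
}

// Call runOnce until it yields exit code 0 or maxAttempts is used up.
// runOnce is any function returning a Promise of an exit code.
async function runWithRetries(runOnce, maxAttempts = 3) {
  let code = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    code = await runOnce(attempt);
    if (code === 0) return { code, attempts: attempt };
  }
  // Every attempt failed: report the last exit code seen.
  return { code, attempts: maxAttempts };
}

// Usage (the task path is a placeholder, not a file from this repository):
// runWithRetries(() => forkOnce('./tasks/some_task.js'), 3).then(console.log);
```

Keeping the retry loop separate from the forking helper means the same loop could wrap spawned executables as well.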
The basic distributed worker system is built to run on GKE. The following steps set up and run the distributed scheduler on GKE:
gcloud - https://cloud.google.com/sdk/gcloud/
gcloud -v
docker - https://docs.docker.com/engine/installation/
docker -v
kubectl version
kubetail (optional) - this boss script to tail Kubernetes pod logs (https://github.com/johanhaleby/kubetail)
curl https://raw.githubusercontent.com/johanhaleby/kubetail/master/kubetail > kubetail
chmod +x kubetail
https://cloud.google.com/resource-manager/docs/creating-managing-projects
gcloud config set project PROJECT_ID
Make sure the Google Compute Engine API is enabled for the project.
- From console menu select 'API Manager'.
- Click blue enable API button.
- Search for 'Google Compute Engine API' and select.
- Click blue enable button.
Make sure the Google Container Registry API is enabled for the project.
- From console menu select 'API Manager'.
- Click blue enable API button.
- Search for 'Google Container Registry API' and select.
- Click blue enable button.
Update the image tag in distributed/jobs/filler.yaml and distributed/jobs/worker.yaml, replacing [PROJECT_ID] with the ID of your gcloud project.
(To list available zones run gcloud compute zones list)
gcloud container clusters create CLUSTER_NAME --zone [ZONE]
docker build -t filler -f ./distributed/jobs/Dockerfile.filler ./distributed/jobs/
docker build -t worker -f ./distributed/jobs/Dockerfile.worker ./distributed/jobs/
docker tag filler gcr.io/PROJECT_ID/filler
docker tag worker gcr.io/PROJECT_ID/worker
gcloud docker -- push gcr.io/PROJECT_ID/filler
gcloud docker -- push gcr.io/PROJECT_ID/worker
kubectl create -f ./distributed/redis-service.yaml
kubectl create -f ./distributed/redis-master.yaml
kubectl create -f ./distributed/jobs/filler.yaml
kubectl create -f ./distributed/jobs/worker.yaml
./kubetail worker -k pod
kubectl delete jobs/worker
kubectl delete jobs/filler
kubectl delete services/redis
kubectl delete pods/redis
gcloud container clusters delete CLUSTER_NAME --zone [ZONE]