js2-gateway technical overview and onboarding #38
Knguyen-dev started this conversation in General
Of course there could be some inaccuracies, and it could definitely use some visuals like graphs, but this should work for onboarding new people on what js2-gateway truly is. I just figured out that the new js2-gateway runs on the jet_stream remote: go there, navigate to a folder called "Calloway", and you should find it. I should note that after you make a commit and the build finishes, you'll go to jet_stream, pull down the changes, and run:

```shell
# Destroys the current stack and redeploys it. I don't think this is what
# Calloway intended, but it works. I resort to this because Watchtower should
# in theory poll every 5 minutes for new builds, but after waiting 15 minutes
# nothing happened.
docker stack rm js2-gateway-stack
make deploy
```
Js2-Gateway Introduction and Overview
Introduction
What is JS2-Gateway?
JS2-Gateway is a web app that contains various pieces of software ("calculators") created by the Zhu Laboratory to help other geochemical-modeling researchers. There are many calculators, but the most important ones are "Supcrtbl" (Super Critical Table) and probably "Phreeqc". Here you'll be maintaining the website and adding new features.
Project Setup
- Run `uv sync` on the backend. This should set up your backend.
- Run `npm install` on the frontend. This should set up your frontend.
- You should get the `.env` and `.env.dev` files from someone via IU Secure.

General tips working with build tools
Backend and Frontend build tools
For the backend you're using the uv build tool. I recommend watching a YouTube video to get familiar with how to use it. Other than that there's a `pyproject.toml` file, which you're unlikely to touch. Right now it just contains things like formatting configs for the Python formatter that we use, Ruff.

The frontend should be pretty simple. You're just running the scripts defined in `package.json` (not `package-lock.json`, which is the auto-generated dependency lockfile). If you're running a development environment, remember to open `127.0.0.1:4000` instead of `localhost:4000`, since the former opens the site in a dev/admin environment with no need to log in to use protected routes.

Makefile
Before we talk about containerization, we should talk about the `Makefile`. You're going to run commands defined in this file to run the project, as it's the centralized place to spin up the application, look at logs, exec into containers, even start the frontend, etc. If you haven't worked with Makefiles before, I recommend reading up on them. It's really helpful to understand not only what each command is doing, but also to be able to write your own commands if need be.

Connecting to MySQL Database with Ivanti Secure
Essentially, if you try to connect to the MySQL database from your local machine, you're going to get an error saying something along the lines of "Service Not Found". The reason is that the MySQL database lives on IU's internal intranet rather than the broader internet, so it isn't reachable from outside. Solve this by connecting to IU's VPN. After connecting to IU's VPN, you will be able to access the js2 MySQL database:
- VPN: `vpn.uits.iu.edu`
- Database host: `sasrdsmp01.uits.iu.edu`

After doing this, you should have database capabilities for the backend, which is really helpful. You should get the database credentials from the person who onboarded you.
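As a rough sketch (the environment-variable names here are assumptions — the real ones live in the `.env`/`.env.dev` files you get via IU Secure), the backend would read its database settings from the environment like this:

```python
import os


def mysql_settings() -> dict:
    """Assemble MySQL connection settings from the environment.

    The host defaults to the IU-internal server, which is only
    reachable while connected to the IU VPN.
    """
    return {
        "host": os.environ.get("DB_HOST", "sasrdsmp01.uits.iu.edu"),
        "port": int(os.environ.get("DB_PORT", "3306")),
        "user": os.environ["DB_USER"],          # credentials come via IU Secure
        "password": os.environ["DB_PASSWORD"],
        "database": os.environ["DB_NAME"],
    }
```

Whatever MySQL driver the backend actually uses, the point is the same: credentials never live in the repo, only in the `.env` files.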
Other onboarding info and standards
System Architecture Overview
High Level Architecture
Backend Architecture
API Routes
The API structure is subject to change, but you can always navigate to `http://127.0.0.1:4000/docs` to see the endpoints that the application has:

- `/api/co2`: Handles CO2 calculations.
- `/api/h2s`: Handles H2S calculations.
- `/api/phreeqc/*`: Handles scheduling Phreeqc simulations, checking status, getting the results, and downloading output files.
- `/api/rate/calculate`: Handles rate calculator calculations.
- `/api/supcrtbl`: Handles scheduling Supcrtbl jobs, getting results, downloading output files, and getting status. There's also an endpoint for seeing the logs, but it's not really for public use, more so for debugging. We may remove it or at least make it better.
- `/auth/*`: OIDC and authentication routes. There's also `/callback`, which you shouldn't change because it's what's registered with CILogon, our OIDC platform.
- `/api/species`: For getting mineral species, which is needed in multiple places. Right now it's needed for Supcrtbl and RateCalculator.

Database Schema and User Management
Here are the main tables that you need to know:
There are of course other tables that we use. For example, in the minerals API, we access MySQL tables like `geosci_consolidated_tables1` and its associates. Honestly it's not that important to know these, as they mostly have scientific meanings, and you'd ask the scientific staff at Zhu Lab if we need to use them. The main table you need to know is the `user_details` table, which we have shown above.

Most of the user management, such as the authentication flow and onboarding, will be covered in a different section.
Managing Calculators
There are different things to note about the calculators:
Frontend Architecture
Page Structure
For the calculators, each calculator has a landing page that describes what the calculator is about, with other links (e.g. `CO2Calculator`). Then there's the actual user-input page where the user enters parameters into the calculator (e.g. `CO2CalculatorOnline`). For calculators that have long-running tasks, you'd also have a page where the user does short-polling to get the status of the job, and eventually the result of the job.

- `CO2Calculator`: Files for the CO2 calculator
- `H2SCalculator`: Files for the H2S calculator
- `Phreeqc`: Files for the Phreeqc calculator
- `RateCalculator`: Files for the rate calculator
- `Supcrtbl`: Files for the Supcrtbl calculator
- `AdminPage`: Page for the admin dashboard.
- `Home`: Home page
- `Onboarding`: Page where the user sees their onboarding information and confirms their profile.
- `RateScripts`: Page linking to some Visual Basic rate scripts. This isn't a calculator, just a page linking to other scientific material.
- `AuthContext`: Defines the `AuthProvider`, which holds the authentication state of the user and handles persistence.

There's also a `/components` folder where we store React components that we re-use across the application.

Frontend Libraries and Standards
Async Jobs: Handling Long Running Tasks in a Web Service
A Phreeqc simulation can take anywhere from a couple of seconds up to 20 minutes to yield results, depending on the inputs. This is an issue because HTTP requests typically time out after a matter of seconds, so holding the user's request open for that long would just time it out. To fix this, we have to use techniques beyond a basic synchronous request-response flow.
When the user does `POST /api/phreeqc`, assuming their input is valid, we run the binary asynchronously. So even after the request finishes, the simulation is still running on our server. After we start the Phreeqc job, we make sure to return the `experiment_id` associated with the experiment. This is crucial later: it identifies the specific simulation that was run, and also the location of the files related to that experiment. The app then redirects the user to a results page like `/phreeqc/results/:experiment_id`. There, the client occasionally polls our backend via the `check_status` endpoint to see whether the simulation has finished. Once the Phreeqc process is listed in a finished state, we send the data back to that page. That's the entire high-level overview. For other long-running binaries and simulations, we use the same method. This is all orchestrated in `phreeqc_interceptor`, the high-level route handler, but let's explore how we run Phreeqc asynchronously within `process_manager.py`.

When starting a simulation, we first create a `.lockfile` that will contain the status of the currently running process. We use multi-threading, creating a separate thread to run `run_phreeqc_process`. This thread handles updating the experiment's status to things like "running" (by modifying the `.lockfile`), and also handles piping data to the subprocess. Based on various scenarios, we write different statuses to that `.lockfile`; for `phreeqc`, the timeout is about 20 minutes.

We keep a dictionary `running_processes` that tracks all running Phreeqc simulations. It contains the reference to each subprocess, when the simulation started, and the location of the `.lockfile` associated with the experiment. Its main use is that, by holding references to all running Phreeqc subprocesses, we can kill and clean up simulations that run longer than expected. In the code where we start Phreeqc as a subprocess, if it takes more than 20 minutes to complete, the `subprocess` library will kill the process, and we remove its entry from `running_processes`. However, if we hit an unrelated error and exit the function early, the binary keeps running and never reaches that timeout catch block. At least we still have the process recorded in our dictionary! To cover cases like this, we periodically run a cleanup job that kills any long-running Phreeqc processes:

1. Using the process's start time, if it has been running longer than 20 minutes, kill it.
2. Update the lock file to indicate the process timed out.
3. Remove the entry from `running_processes`, so the dictionary only tracks processes that are actually running. Its purpose is different from the `.lockfile`, which tracks the status of the simulation even after it has completed or timed out.

Once the simulation is started, we expose 3 important endpoints to the client:

- `GET /api/phreeqc/status/:experiment_id`: Gives the frontend a way to check the status of a job. This is the main consumer of the `.lockfile` that we keep as a record of a given simulation.
- `GET /api/phreeqc/result/:experiment_id`: Gives the frontend the results of Phreeqc, again by reading the experiment's `.lockfile`.
- `GET /api/phreeqc/download/:experiment_id`: Downloads the zip file that was created for this experiment.

That's the whole technical explanation of how we handle long-running requests involving these binaries. We use the exact same technique for Supcrtbl.
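To make the moving pieces concrete, here is a minimal, self-contained sketch of the pattern described above. The names and the `.lockfile` layout are simplified assumptions, not the actual code in `process_manager.py`:

```python
import json
import subprocess
import threading
import time
import uuid
from pathlib import Path

TIMEOUT_SECONDS = 20 * 60  # kill simulations that run longer than ~20 minutes

# experiment_id -> {"process": Popen, "started_at": float, "lockfile": Path}
running_processes = {}


def write_status(lockfile: Path, status: str) -> None:
    """Record the experiment's current status in its .lockfile."""
    lockfile.write_text(json.dumps({"status": status, "updated_at": time.time()}))


def start_simulation(workdir: Path, cmd: list) -> str:
    """Launch the binary asynchronously; return the experiment_id immediately."""
    experiment_id = uuid.uuid4().hex
    lockfile = workdir / f"{experiment_id}.lockfile"
    write_status(lockfile, "queued")

    def run():
        write_status(lockfile, "running")
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        running_processes[experiment_id] = {
            "process": proc, "started_at": time.time(), "lockfile": lockfile,
        }
        try:
            proc.wait(timeout=TIMEOUT_SECONDS)
            write_status(lockfile, "completed" if proc.returncode == 0 else "failed")
        except subprocess.TimeoutExpired:
            proc.kill()
            write_status(lockfile, "timed_out")
        finally:
            running_processes.pop(experiment_id, None)

    # daemon=True: the worker must not block interpreter exit
    threading.Thread(target=run, daemon=True).start()
    return experiment_id


def check_status(workdir: Path, experiment_id: str) -> str:
    """Roughly what GET /api/phreeqc/status/:experiment_id would read."""
    lockfile = workdir / f"{experiment_id}.lockfile"
    return json.loads(lockfile.read_text())["status"]


def cleanup_stale_processes() -> None:
    """Periodic job: kill anything that escaped the per-process timeout."""
    now = time.time()
    for experiment_id, entry in list(running_processes.items()):
        if now - entry["started_at"] > TIMEOUT_SECONDS:
            entry["process"].kill()                       # 1. kill the process
            write_status(entry["lockfile"], "timed_out")  # 2. update the lock file
            running_processes.pop(experiment_id, None)    # 3. drop the dict entry
```

The key design point is the split of responsibilities: the `.lockfile` is the durable status record the status endpoint reads, while `running_processes` only exists so that stray subprocesses can be found and killed.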
Note:

- `thread.daemon = True`: By default, threads in Python are non-daemon, meaning they prevent the program from exiting until they finish. That's useful when threads do critical work that must complete before the program ends, but often you don't want the program to wait on background threads. Setting `daemon=True` tells the Python interpreter to allow the main program to exit even while daemon threads are still running, which is great for background tasks. In our code, the cleanup thread that periodically checks long-running processes is exactly that kind of independent background task: it's not critical to the main program flow, so we set it as a daemon so it doesn't block exit. TLDR: set `daemon=True` so background threads don't prevent the program from exiting.
- The `.lock` and any experiment-related files are stored inside the container's own filesystem. There are no bind mounts or volumes for persistence, so experiment-related files are deleted after a container stops running.
- Next: maybe clean up `TDB_workdirs` once a week or something.

Authentication System
What is OIDC and CILogon
OIDC (OpenID Connect) allows users to log in through third-party identity providers (like Google), but most major providers require you to register and sometimes pay. If you aren't familiar, watch "OAuth 2.0 and OpenID Connect (in plain English)" by OktaDev before you proceed.
Now, CILogon is a free identity platform designed for researchers, universities, and federally recognized institutions. What makes CILogon special is that it supports over 5,000 identity providers, including Google, Microsoft, and various universities and research institutions. However, to use CILogon, your app must be affiliated with an approved institution (like a university or research organization). So it's free, but it comes with that restriction.
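To make the redirect step concrete, here's a minimal sketch of building the CILogon authorization URL for the standard OIDC authorization-code flow. The `client_id` and `redirect_uri` values are placeholders; our actual routes are described under "Authentication Infrastructure and User Journey" below:

```python
from urllib.parse import urlencode


def cilogon_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the redirect URL that sends the user to CILogon to log in."""
    params = {
        "response_type": "code",   # standard OIDC authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must match what's registered with CILogon
        "scope": "openid profile email org.cilogon.userinfo",
        "state": state,            # anti-CSRF token, echoed back on the callback
    }
    return "https://cilogon.org/authorize?" + urlencode(params)
```

After the user logs in, CILogon redirects back to `redirect_uri` with a one-time `code` that the backend exchanges for tokens.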
Scopes and Claims
When you authenticate with CILogon, you typically request the `openid` scope. Minimal claims guaranteed under `openid`:

- `sub`: Unique user ID across the provider
- `iss`: The issuer (CILogon)

Optional scopes:

- `email`: May provide the user's email (if available)
- `org.cilogon.userinfo`: Adds CILogon-specific claims:
  - `idp`: Identity provider's unique ID (e.g., Google, IU)
  - `idp_name`: Human-readable name of the provider

These optional claims may not always be present, which is why our app has an onboarding stage where users fill out these fields when they aren't there. We also have other mechanisms that check for fields that our app requires, such as `email`.

Authentication Infrastructure and User Journey
1. User Login Flow
User clicks "Sign In" → redirected to `/auth`.

- `/auth`: Begins the OIDC process, redirecting to `https://cilogon.org/authorize`.
- Requested scopes: `profile`, `email`, `org.cilogon.userinfo`.

2. Callback Handling (`/callback`)
/callback)Exchange authorization code for access token.
Use token to fetch user info.
On success:
3. Authentication Check (`/auth/me`)

Read token from cookie; if missing → reject.
Validate token with CILogon userinfo endpoint.
Extract and verify user email.
Query DB for the user and build the `user_auth` object (`approved: false`, etc.), then return the response.

4. Logout (`/auth/logout`)

Onboarding Flow
Post-login, users who aren't onboarded are redirected to `/onboarding`. The onboarding form submits to `/api/user/onboard`.

Onboarding Handler (`/api/user/onboard`)

Admin Approval Flow
Admins can update users via `/api/admin/users/:userId` (fields like `admin`, `approved`, etc.). When a user becomes `approved`, notify relevant parties.

Email Notifications
Deployment Overview
Basics: Containerization & Reverse Proxy
- The app listens on port `8000` inside the container.
- We reverse-proxy to port `8000` inside the app container using Caddy, forwarding traffic from `js2-gateway.ear180013.projects.jetstream-cloud.org` to port 8000, which is where our containers are located.

Caddy + Docker Swarm (Production Setup)
Caddy is a modern web server acting as a reverse proxy with automatic HTTPS/TLS certificate management (no manual setup).
Caddy is configured via container labels and used only in production (via `stack.yml`). Docker Swarm is what runs the production stack.
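For orientation, label-driven Caddy config in a Swarm stack file typically looks something like this. This is a hypothetical fragment in the style of the caddy-docker-proxy project — the service name, image, and exact labels are illustrative, not copied from our `stack.yml`:

```yaml
services:
  app:
    image: ghcr.io/example/js2-gateway:latest   # illustrative image name
    deploy:
      labels:
        # caddy-docker-proxy style: Caddy watches these labels and
        # reverse-proxies the public hostname to port 8000 in the container.
        caddy: js2-gateway.ear180013.projects.jetstream-cloud.org
        caddy.reverse_proxy: "{{upstreams 8000}}"
```

The benefit of label-based config is that the routing rules live next to the service definition, so adding a service to the proxy is just adding labels in `stack.yml`.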
GitHub Actions (CI)
Located in `.github/workflows/docker-publish.yml`.

Trigger: Runs on push to `main` or when relevant files are changed.
latestDocker Swarm + Watchtower (CD)
Deployment file:
stack.ymlWatchtower monitors running containers and:
Swarm deploy config ensures:
--
Deployment Flow Summary
mainlatestimageBeta Was this translation helpful? Give feedback.
All reactions