Supcrtbl Notes: Asynchronous workflow and Quirks #24
Knguyen-dev started this conversation in General.
Supcrtbl Notes
Folder Structure
```
supcrtbl_fastapi/
├── main.py              # FastAPI application
├── uploads/             # Temporary directory for uploaded files and calculations; unique experiment folders are created here
└── supcrtfiles/         # Directory containing the SUPCRTBL binary and data files
    ├── supcrtbl         # The SUPCRTBL binary itself
    ├── dpronsbl.dat     # Data file
    └── dpronsbl_all.dat # Data file
```
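A minimal sketch of how a unique experiment directory could be prepared from this layout. The function name `prepare_job_dir` and the exact path constants are assumptions for illustration, not the actual implementation:

```python
import shutil
import uuid
from pathlib import Path

UPLOADS = Path("uploads")          # per-job working directories live here
SUPCRTFILES = Path("supcrtfiles")  # binary + .dat files shipped with the app

def prepare_job_dir() -> Path:
    """Create a unique experiment directory mirroring the layout above:
    a bin/ copy of the data files, a copy of the binary, and an empty output/."""
    job_dir = UPLOADS / f"job_{uuid.uuid4().hex}"
    (job_dir / "output").mkdir(parents=True)

    # Copy the data files into bin/ inside the job directory.
    bin_dir = job_dir / "bin"
    bin_dir.mkdir()
    for dat in ("dpronsbl.dat", "dpronsbl_all.dat"):
        shutil.copy(SUPCRTFILES / dat, bin_dir / dat)

    # Copy the binary itself into the job directory.
    shutil.copy(SUPCRTFILES / "supcrtbl", job_dir / "supcrtbl")
    return job_dir
```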
Inside `uploads/`, you'll have a directory for each unique supcrtbl job/experiment. "Job" and "experiment" mean the same thing here, so if you see `job_dir` referred to as the "experiment directory", the "unique experiment directory", or the "working directory", they all refer to the same thing. Each unique experiment directory contains a `bin/` folder holding a copy of the data files from `supcrtfiles/`, a copy of the `supcrtbl` binary, and an `output/` directory containing all the generated files, such as zip files and the files produced by the binary.

How it all works
Supcrtbl can take up to 2 minutes to run; anything over that should be treated as invalid. In the original implementation, users would submit their files and wait for the response. While most requests are quick, sometimes they'd wait a while; in Phreeqc, that could be up to 20 minutes. In our new implementation, here's the workflow:
1. `POST /api/supcrtbl`: mostly validates user input and schedules a supcrtbl job to run asynchronously, so the user immediately gets feedback that their job was submitted. A `.lock` file is created within the job directory; it contains the status of the job, a message related to that status, and the timestamp of when the `.lock` file was last updated. We run the binary and update the lock file at key points in the process: if the process fails, we set the status to "error" alongside an appropriate error message; if it succeeds, we update the `.lock` file accordingly. A separate thread is created for the job, so we're multithreading. We also keep a dictionary of the subprocesses currently running, which is also used to clean up any processes that slipped through the cracks and are somehow still running. You could argue it isn't needed if the code is airtight, but you never know.
2. `GET /api/supcrtbl/status/<experiment_id>` (short polling): if the job is still running, the client keeps polling; once the job reaches a terminal state (timeout, completed, error), it stops.
3. If the job completed successfully (`completed`), the client calls `GET /api/supcrtbl/results/<experiment_id>`.
4. `GET /api/supcrtbl/download/<experiment_id>` to download the results.
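The `.lock` file handling described above can be sketched as follows. The field names (`status`, `message`, `updated_at`) and helper names are assumptions for illustration:

```python
import json
import time
from pathlib import Path

# States at which the client stops polling the status endpoint.
TERMINAL_STATES = {"completed", "error", "timeout"}

def update_lock(job_dir: Path, status: str, message: str = "") -> None:
    """Overwrite the job's .lock file with the current status, a
    human-readable message, and a timestamp of this update."""
    lock = {"status": status, "message": message, "updated_at": time.time()}
    (job_dir / ".lock").write_text(json.dumps(lock))

def read_status(job_dir: Path) -> dict:
    """What the status endpoint would read back on each poll."""
    return json.loads((job_dir / ".lock").read_text())

def is_done(job_dir: Path) -> bool:
    """True once the job has reached a terminal state."""
    return read_status(job_dir)["status"] in TERMINAL_STATES
```

The job thread would call `update_lock` at each key point (scheduled, running, error/completed), and the status endpoint only ever reads the file, so the two sides stay decoupled.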
Details About Supcrtbl Binary

The Supcrtbl binary is a program originally developed in 1991 and remastered in 2016. It's a menu-driven program with many branches and scenarios you can go down, which makes interfacing with it, and handling it in general, a little tricky. At the time of writing we don't have the source code, but here's what I've learned about it.
It's not a dynamic executable, which makes it different from executables like the CO2 or Phreeqc binaries. As a result, we don't need any extra shared libraries to run it.
It's very specific about the folder structure and environment you run it in. Ideally you run the binary from inside a job directory, which lets it generate its output files there. If you run the binary from a directory above where the binary is located, it has trouble finding files such as `dpronsbl.dat` and `dpronsbl_all.dat`. The usual explanation would be that it interprets file names relative to the directory you run it from, not relative to where the binary or the data files are located. But if that were the case, why did the first scenario not work? Without the source code there's only so much we can infer, but my leading hypothesis is that this old program expects the `.dat` files to be in the working directory or somewhere below it. The original implementation made things work by running the binary in the current job directory and by setting up the data files within the `bin/` directory inside the job directory. I should also mention that the binary doesn't accept inputs containing paths; it expects bare file names.
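Under that hypothesis, the safe way to drive the binary is to make the job directory the working directory and feed menu answers (bare file names only) through stdin. A sketch, with the wrapper name `run_in_job_dir` being an assumption:

```python
import subprocess
from pathlib import Path

def run_in_job_dir(cmd: list[str], job_dir: Path, stdin_text: str,
                   timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run a command with the job directory as the working directory.

    With cwd=job_dir, any relative file names the program opens (e.g.
    dpronsbl.dat in bin/) resolve inside the job directory, matching the
    hypothesis that supcrtbl looks for data files at or below where it
    runs from. Scripted menu answers are fed via stdin.
    """
    return subprocess.run(
        cmd,
        cwd=job_dir,       # relative lookups resolve here
        input=stdin_text,  # scripted answers to the menu prompts
        capture_output=True,
        text=True,
        timeout=timeout,   # hard cap; supcrtbl should finish within ~2 minutes
    )
```

A call might look like `run_in_job_dir(["./supcrtbl"], job_dir, answers)`, where `answers` uses bare names like `dpronsbl` rather than paths.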
As well as this, the binary generates a `zero.dat`, a generic data file that contains 0. Just ignore it; it's a byproduct, and frankly I don't know what it does yet. The binary can also run into errors every once in a while due to user input. In particular, if the `tempRange` is invalid, the program keeps asking for a valid `tempRange` and gets stuck. The reactions the user entered could also be wrong or unsupported; our program doesn't handle that case yet (in the for loop).
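Because the binary can loop forever re-prompting for a valid `tempRange`, the job runner needs a timeout-and-kill path, and this is also where the dictionary of running subprocesses fits in. A sketch; the names `RUNNING` and `run_with_timeout` are assumptions:

```python
import subprocess

# Hypothetical registry of running subprocesses, keyed by experiment id,
# used to clean up anything that slips through the cracks.
RUNNING: dict[str, subprocess.Popen] = {}

def run_with_timeout(experiment_id: str, cmd, job_dir, stdin_text: str,
                     timeout: float = 120.0) -> tuple[str, str]:
    """Return a (status, message) pair for the .lock file.

    If the binary gets stuck re-prompting for valid input (e.g. an
    invalid tempRange), communicate() times out and we kill the process
    instead of letting it hang forever."""
    proc = subprocess.Popen(cmd, cwd=job_dir, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            text=True)
    RUNNING[experiment_id] = proc
    try:
        out, err = proc.communicate(stdin_text, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed process
        return "timeout", "supcrtbl did not finish in time (possibly stuck re-prompting)"
    finally:
        RUNNING.pop(experiment_id, None)
    if proc.returncode != 0:
        return "error", err.strip() or "supcrtbl exited with a non-zero status"
    return "completed", out
```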
Questions and uncertainties

The use of the data files in Supcrtbl is strange. In the original implementation they had this code:
The strange conflict comes when you realize the frontend only sends over `slopFile` values of `dpronsbl` or `dpronsbl_ree`, so I don't know how `dpronsbl_all` fits into this equation. I also don't know why we strip off the `.dat` file extension, though again, in Linux (and in general) file extensions don't matter. But that first issue is still very critical: which data file should we actually use?

Elements and SupcrtblOnline
The dropdown in SupcrtblOnline may be missing some elements that Dr. Zhu wants; I know he asked me why H2O isn't listed. From my investigation, the queries are fine and they align with the old implementation of the application; the entry simply didn't exist in the database. Just noting it here for historical reasons.
Extra stuff: Error handling
Don't forget that the Supcrtbl and Phreeqc Fortran binaries can throw errors when given bad input. I know for sure that I forgot to send back a status 500 when this happens. Basically, in supcrtbl_process.py, after the job has been scheduled and started running, make sure the actual error messages are logged and that we only return human-readable error messages to the user.
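One way that logging/sanitizing split could look. The helper name, the log format, and the specific Fortran error strings matched here are assumptions; the endpoint would pair the returned message with an HTTP 500 (e.g. FastAPI's `HTTPException`):

```python
import logging

logger = logging.getLogger("supcrtbl")

def to_user_message(experiment_id: str, raw_stderr: str) -> str:
    """Log the raw binary error for debugging, but return only a short
    human-readable message suitable for the client (and the .lock file)."""
    logger.error("supcrtbl job %s failed: %s", experiment_id, raw_stderr)

    # Hypothetical pattern match on typical Fortran failure output.
    if "STOP" in raw_stderr or "Fortran runtime error" in raw_stderr:
        return "supcrtbl rejected the input; please check your reactions and temperature range."
    return "supcrtbl failed unexpectedly; please try again or contact support."
```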