- Authors: Ruben Peters, Welmoed Tjepkema, Eric Wolters, Ingmar Loohuis
- Contact: datascience@umcutrecht.nl
This repo contains the code for the No-Show prediction model, currently implemented at the UMC Utrecht and developed by the AI For Health team. For more information on implemented AI-tools, see https://research.umcutrecht.nl/ai-applications-in-use
Note that all internal development, including PRs, is done in a separate private repo and synced to this repo when a new release is published.
We welcome issues and pull requests! The easiest way to use this repo in your own organisation is to fork it. You can then adapt the data pipelines to fit your organisation. If you need help, either open an issue or send an e-mail to AI for Health.
To install the noshow package use:
    pip install -r requirements.txt

Or, better, use uv, a modern Python package manager that simplifies dependency management and ensures reproducibility:
    uv sync

To run the entire pipeline from data export to model training, you can use the train_no_show command (or python src/noshow/train_pipeline.py):
    train_no_show --skip-export  # skip the export step if you already have the data

For more information on the data used, check the dataset card.
Deployment of the API and Streamlit dashboard is handled by the deploy.sh script. Create a .env file with the required variables (see .example.env for reference).
    . .env
    . deploy.sh

Deployment is done through the manifest files.
The prediction API is a FastAPI application that runs every two hours and returns predictions for all input appointments from a given start date. The API expects each patient's complete appointment history in order to construct the features, but only returns predictions for appointments on or after start_date.
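The filtering behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `Appointment` class, its field names and the `filter_predictions` helper are hypothetical, and the risk values are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Appointment:
    # Hypothetical, simplified record; the real API schema may differ.
    patient_id: str
    appointment_date: date
    predicted_risk: float


def filter_predictions(scored: list[Appointment], start_date: date) -> list[Appointment]:
    """Keep only predictions for appointments on start_date or later."""
    return [a for a in scored if a.appointment_date >= start_date]


# The full history is needed to build features, but only upcoming
# appointments are returned to the caller.
history = [
    Appointment("p1", date(2024, 1, 5), 0.10),  # past appointment, features only
    Appointment("p1", date(2024, 3, 1), 0.35),  # on the start date, returned
    Appointment("p1", date(2024, 3, 8), 0.20),  # after the start date, returned
]
upcoming = filter_predictions(history, start_date=date(2024, 3, 1))
print([a.appointment_date.isoformat() for a in upcoming])
```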
The API also saves each prediction, along with information about the request, to a database. In addition, it deletes all previously stored sensitive information (name, birthdate, phone number) and stores sensitive information only for that day's predictions. This way, sensitive information is kept only for the day on which the patient needs to be called. All other information is retained and used to validate the results.
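One way to picture this storage pattern is the sketch below, which uses an in-memory SQLite database. The table names, columns and the `store_request` function are assumptions made for illustration; the actual database and schema used by the API are not shown here.

```python
import sqlite3
from datetime import date


def store_request(conn, today, predictions, sensitive):
    """Append predictions and replace all stored sensitive info with today's."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO prediction (request_date, patient_id, risk) VALUES (?, ?, ?)",
            [(today.isoformat(), pid, risk) for pid, risk in predictions],
        )
        # Drop sensitive rows from earlier days before adding today's.
        conn.execute("DELETE FROM sensitive_info")
        conn.executemany(
            "INSERT INTO sensitive_info (patient_id, name, birthdate, phone) "
            "VALUES (?, ?, ?, ?)",
            sensitive,
        )


# Hypothetical schema: predictions are kept, sensitive info is short-lived.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prediction (request_date TEXT, patient_id TEXT, risk REAL)")
conn.execute(
    "CREATE TABLE sensitive_info (patient_id TEXT, name TEXT, birthdate TEXT, phone TEXT)"
)

store_request(conn, date(2024, 3, 1), [("p1", 0.4)],
              [("p1", "A. Example", "1980-01-01", "0600000000")])
store_request(conn, date(2024, 3, 2), [("p2", 0.2)],
              [("p2", "B. Example", "1975-05-05", "0611111111")])

n_predictions = conn.execute("SELECT COUNT(*) FROM prediction").fetchone()[0]
sensitive_ids = [r[0] for r in conn.execute("SELECT patient_id FROM sensitive_info")]
print(n_predictions, sensitive_ids)
```

After the second call, both predictions remain available for validation, while only the most recent day's sensitive rows are left.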
To run the API locally run:
    python run/app.py

The calling dashboard is a Streamlit dashboard used by the person who calls the patients. It shows the predictions for appointments 3 working days ahead, sorted by decreasing predicted risk, and also includes other appointments of those patients between 3 and 10 days ahead. This way we make sure that a patient is not called multiple times per week. The outcome of each call is also stored in the dashboard and is used to track who still needs to be called, as well as to validate the outcomes.
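The selection and ordering of the call list could look roughly like the sketch below. It is an illustration under stated assumptions: `add_working_days` and `call_list` are hypothetical helpers, working days are taken to be Monday to Friday with public holidays ignored, and predictions are plain dictionaries rather than the dashboard's real data model. The 3 to 10 day lookahead for other appointments is left out.

```python
from datetime import date, timedelta


def add_working_days(start: date, n: int) -> date:
    """Return the date n working days (Mon-Fri) after start, ignoring holidays."""
    current = start
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            n -= 1
    return current


def call_list(predictions: list[dict], today: date) -> list[dict]:
    """Appointments 3 working days ahead, highest predicted risk first."""
    target = add_working_days(today, 3)
    due = [p for p in predictions if p["date"] == target]
    return sorted(due, key=lambda p: p["risk"], reverse=True)


predictions = [
    {"patient_id": "p1", "date": date(2024, 3, 12), "risk": 0.2},
    {"patient_id": "p2", "date": date(2024, 3, 12), "risk": 0.7},
    {"patient_id": "p3", "date": date(2024, 3, 13), "risk": 0.9},  # not due yet
]
# Thursday 7 March 2024 plus 3 working days skips the weekend.
to_call = call_list(predictions, today=date(2024, 3, 7))
print([p["patient_id"] for p in to_call])
```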
To run the dashboard locally run:
    streamlit run run/calling_dash.py

The orchestration of the data flows is handled by Apache NiFi, a data integration tool that automates the movement and transformation of data between systems. The NiFi flow requests new data from the data platform, adds the authentication API key as a header, and sends the request to the prediction API. For more information on Apache NiFi, see the official documentation.
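For local testing without NiFi, the request that the flow sends can be approximated in Python. The sketch below only builds the request and does not send it. The URL, the `X-API-Key` header name and the payload shape are all assumptions for illustration; check the API code for the actual route, header and schema.

```python
import json
from urllib.request import Request


def build_prediction_request(url: str, api_key: str,
                             appointments: list, start_date: str) -> Request:
    """Build a POST request carrying the API key as a header."""
    # Hypothetical payload layout; the real API schema may differ.
    body = json.dumps(
        {"start_date": start_date, "appointments": appointments}
    ).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumed header name
        },
    )


req = build_prediction_request(
    "http://localhost:8000/predict",  # assumed local address and route
    api_key="secret",
    appointments=[],
    start_date="2024-03-01",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, assuming the API is running locally at that address.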