Healthcare-associated infection (HCAI) prediction

This repository contains files about the article "Healthcare-associated infection prediction on hospitalized COVID-19 patients" submitted to the Brazilian Congress on Biomedical Engineering. In this repository, you can find a DatasetsConstruction folder containing the files used in the datasets construction. In the leading directory of this repository, there is the "ML_notebook," which provides training and testing of machine learning models and SHAP analysis.

RawDatasets

The raw data is available in three ".csv" files, containing data from patients who were tested for COVID-19 from 2020-02-26 to 2020-12-29:

(1) "HSL_Pacientes_3.csv": Spreadsheet with anonymized data about patients;
(2) "HSL_Exames_3.csv": Respective laboratory test results, including the anonymized patient identifier and a service identifier;
(3) "HSL_Desfechos_3.csv": Outcomes - each record includes describes a patient's care, and the corresponding outcome, when applicable.

This data was downloaded from the COVID-19 Data Sharing FAPESP open repository (https://repositoriodatasharingfapesp.uspdigital.usp.br/).

DatasetsConstruction

In the "DatasetConstruction" directory, you can check the notebooks used to connect the raw datasheets. These notebooks are:

"Covid_exploratory_analysis.ipynb": In this file, all patients diagnosed with COVID-19 via PCR are identified and a first patient filter is performed. At the end of the code, the following spreadsheet is generated: "lista_covid_positivito.xlsx";
"Patients_selection.ipynb": This file checks positive and non-positive patients for hospital infections. At the end of the code, the following spreadsheets are generated: "pacientes_positivos.xlsx" and "pacientes_negativos.xlsx";
"Datasets_construction.ipynb": In this dataset, we select the desired variables for the research datasets. First, we discarded variables with more than 50% null values. Next, we perform manual attribute selection. We did a manual selection with a doctor specialized in the field. In manual selection, we discard variables that are correlated with each other. In the end, two datasets were generated: "original_dataset.xlsx" and "discreted_dataset.xlsx".

The "DatasetsConstruction" directory also has the following files:

"exames_COVID19_considerados.xlsx" and "exames_infeccoes.xlsx": These are manually constructed spreadsheets that contain the list of tests considered for the COVID-19 and hospital infection diagnosis, respectively;
"negative_patients.xlsx" and "positive_patients.xlsx": these files were generated in "Patients_selection.ipynb" and refer to patients' labels as negative and positive, respectively;
"agrupado.xlsx" is a spreadsheet containing the reference values for each exam. These values were taken from the original data, and it is used to date discretization in the "Datasets_construction.ipynb";
"describe.xlsx": This spreadsheet contains a statistical description of the variables selected in the "Datasets_construction.ipynb";
"describe_2.xlsx": It is the summarized spreadsheet of "describe.xlsx".

Citation

@InProceedings{10.1007/978-3-031-94934-0_7, author="Loyola Lopes, Filipe and Ferreira, P. R. A. and Lorena, A. C.", editor="Soares, Alcimar Barbosa and Leoni, Renata Ferranti and Cardoso, George Cunha", title="Healthcare-Associated Infection Prediction on Hospitalized COVID-19 Patients", booktitle="XXIX Brazilian Congress on Biomedical Engineering - Volume 3: Biomedical Informatics, and Biomedical Signal and Image Processing ", year="2025", publisher="Springer Nature Switzerland", address="Cham", pages="64--74", isbn="978-3-031-94934-0" }

Available on: https://link.springer.com/chapter/10.1007/978-3-031-94934-0_7

Acknowledgements

This research was supported in part by the Coordenação de Aperfeiçoamento de Pessoalde Nível Superior - Brasil (CAPES) - Finance Code 001. The authors gratefully acknowledge the Brazilian funding agency FAPESP (Fundação Amparo à Pesquisa do Estado de São Paulo) for the COVID-19 Data Sharing repository.

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
DatasetsConstruction		DatasetsConstruction
LICENSE		LICENSE
ML_notebook.ipynb		ML_notebook.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Healthcare-associated infection (HCAI) prediction

RawDatasets

DatasetsConstruction

Citation

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Languages

License

filipeloyola/HCAIprediction

Folders and files

Latest commit

History

Repository files navigation

Healthcare-associated infection (HCAI) prediction

RawDatasets

DatasetsConstruction

Citation

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Languages

Packages