Welcome to the IFMBE Scientific Challenge Competition at the IUPESM World Congress on Medical Physics and Biomedical Engineering 2025 (IUPESM 2025). In this edition, participants are required to predict patients' length of stay and the corresponding clinical outcome from a set of parameters collected at the time of hospital admission.
The development and validation of practical and reliable risk assessment tools are central to clinical decision-making. At an early stage, such tools can assist clinicians in stratification strategies, guide the type of intervention, optimize hospital bed monitoring, and match the intensity of therapy to the individual patient’s risk. A precise risk assessment model or score is decisive in identifying high-risk patients, in whom invasive strategies may improve the outcome, and low-risk patients, who are expected to gain little or no benefit from potentially hazardous and costly treatments.
In the present challenge, participants should develop a risk tool to estimate the length of stay (in days) and the type of hospital discharge (death or survival).
Participants must follow the registration steps described at the end of this page to join the challenge.
The dataset was collected under the Project Regula SESAP-RN/FUNCERN, grant number 69/2021, from October 2021 to January 2024, by the Laboratory of Technological Innovation in Health (LAIS) at the Federal University of Rio Grande do Norte (UFRN) in cooperation with the Secretariat of Public Health of Rio Grande do Norte, Brazil.
For this challenge, a subset of N = 13,415 records was considered, each containing 20 input features and two target variables (outputs). The features fall into four groups: i) Request, ii) Admission, iii) Patient data, and iv) Clinical information (collected at hospital admission) (see dataset description).
Group | Feature | Description / Units |
---|---|---|
Request | Request date | Date on which the bed request was registered |
Request | Request type | Age group of the bed requested for the patient: {Adult, Pediatric} |
Request | Requested bed type | Type of occupancy: {Ward, Intensive Care Unit} |
Admission | Admission date | Date on which the patient was admitted |
Admission | Admission bed type | Type of bed: {Ward, Intensive Care Unit} |
Admission | Admission health unit | Hospital admitting the patient |
Patient | Gender | {Female, Male} |
Patient | Age | Patient age |
Patient | Patient’s federal unit | Acronym of the Brazilian federal unit |
Clinical | ICD code | ICD-10 diagnosis code |
Clinical | Blood pressure | Systolic/Diastolic (mmHg) |
Clinical | Glasgow Coma Scale | Score from 3 to 15 |
Clinical | Hematocrit | % |
Clinical | Hemoglobin | g/dL |
Clinical | Leukocytes | cells/mm³ |
Clinical | Lymphocytes | % |
Clinical | Urea | mg/dL |
Clinical | Creatinine | mg/dL |
Clinical | Platelets | 10³/μL |
Clinical | Diuresis | mL/day |
Note: Some variable values in the dataset may be inaccurate. The reliability and accuracy of the data are not guaranteed.
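Several features arrive as text that most models cannot use directly. As a minimal sketch, assuming pandas and the hypothetical column names `BloodPressure`, `RequestDate`, and `AdmissionDate` (the actual CSV headers may differ), the blood pressure field can be split into numeric systolic and diastolic values, and the two dates can be turned into an illustrative waiting-time feature. Malformed values, which the note above warns about, are coerced to missing values instead of raising an error.

```python
import pandas as pd


def split_blood_pressure(df: pd.DataFrame, col: str = "BloodPressure") -> pd.DataFrame:
    """Split a 'Systolic/Diastolic' text column into two numeric columns (mmHg).

    `col` is a hypothetical column name; unparseable entries become NaN.
    """
    out = df.copy()
    parts = out[col].astype(str).str.split("/", n=1, expand=True)
    parts = parts.reindex(columns=[0, 1])  # guarantee both columns exist
    out["Systolic"] = pd.to_numeric(parts[0], errors="coerce")
    out["Diastolic"] = pd.to_numeric(parts[1], errors="coerce")
    return out.drop(columns=[col])


def add_waiting_days(df: pd.DataFrame,
                     request_col: str = "RequestDate",
                     admission_col: str = "AdmissionDate") -> pd.DataFrame:
    """Add days between bed request and admission (hypothetical derived feature)."""
    out = df.copy()
    requested = pd.to_datetime(out[request_col], errors="coerce")
    admitted = pd.to_datetime(out[admission_col], errors="coerce")
    out["WaitingDays"] = (admitted - requested).dt.days  # NaN if either date is invalid
    return out
```

The two helpers can be chained, for example `add_waiting_days(split_blood_pressure(df))`, before the remaining categorical columns are encoded.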
The target variables are:
Group | Feature | Description / Units |
---|---|---|
Target | Outcome type | Possible values are: {Death, Survival} |
Target | Length of stay | Number of days in hospital |
Disclaimer: The dataset used in this challenge is for research purposes only and should not be used for commercial applications.
Model performance is assessed based on three scores:
DTscore, computed from the F1-score of the death/survival classification:

$$
\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}
$$

$$
\text{DTscore} = 10 \cdot (1 - \text{F1-score})
$$

where:

- TP: the number of death discharges that the model correctly classifies as death.
- FP: the number of survival discharges that the model incorrectly classifies as death.
- FN: the number of death discharges that the model incorrectly classifies as survival.
- TN: the number of survival discharges that the model correctly classifies as survival.

Note: For a completely failed model, the DTscore is 10; for a perfect stratification, it is 0.
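The DTscore definition can be sketched in Python as follows. This is a minimal illustration, assuming discharges are encoded as integers with 1 marking death (the positive class); the function name `dt_score` is hypothetical, and treating an undefined precision or recall as 0 is an assumption chosen so that a model that never predicts death scores 10.

```python
import numpy as np


def dt_score(y_true, y_pred, positive=1):
    """DTscore = 10 * (1 - F1), with death (`positive`) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # deaths predicted as death
    fp = np.sum((y_pred == positive) & (y_true != positive))  # survivals predicted as death
    fn = np.sum((y_pred != positive) & (y_true == positive))  # deaths predicted as survival
    # Assumption: an undefined ratio (0/0) counts as 0
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return float(10 * (1 - f1))
```

As a cross-check, scikit-learn's `f1_score(y_true, y_pred, pos_label=1, zero_division=0)` should yield the same F1 value under this encoding.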
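LSscore, the mean absolute error of the length-of-stay estimates, capped at 10:

$$
\text{LSscore} = \min\left(10,\ \frac{1}{N}\sum_{i=1}^{N} \left|\text{True\_LS}_i - \text{Estimated\_LS}_i\right|\right)
$$

where True_LS are the true lengths of stay, Estimated_LS are the estimated lengths of stay, and N is the number of records.

Note: Whenever the mean absolute error exceeds 10, the LSscore is 10; for a perfect match, it is 0.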
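A minimal sketch of the LSscore, assuming both vectors are given in days and in the same record order (the function name `ls_score` is hypothetical):

```python
import numpy as np


def ls_score(true_ls, estimated_ls, cap=10.0):
    """LSscore = min(cap, mean absolute error in days)."""
    true_ls = np.asarray(true_ls, dtype=float)
    estimated_ls = np.asarray(estimated_ls, dtype=float)
    if true_ls.shape != estimated_ls.shape:
        raise ValueError("true and estimated vectors must have the same length N")
    mae = np.mean(np.abs(true_ls - estimated_ls))  # (1/N) * sum of |true - estimated|
    return float(min(cap, mae))
```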
GLscore, the global score:

$$
\text{GLscore} = \text{DTscore} + \text{LSscore}
$$

For a perfect match, the GLscore is 0; if the predictions fail completely, it is 20.
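As a worked illustration with hypothetical numbers (not taken from any real submission), suppose a model reaches an F1-score of 0.80 on death discharges and a mean absolute error of 3.5 days on length of stay:

$$
\begin{aligned}
\text{DTscore} &= 10 \cdot (1 - 0.80) = 2.0\\
\text{LSscore} &= \min(10,\ 3.5) = 3.5\\
\text{GLscore} &= 2.0 + 3.5 = 5.5
\end{aligned}
$$

If the mean absolute error were instead 12.4 days, the LSscore would be capped at 10 and the GLscore would be 12.0.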
The team with the lowest GLscore wins the competition.
Final submissions must include a report and an executable or prediction file.
To read the training, validation, and test datasets, assume the following folder structure:

- `…\IUPESM2025\trainData.csv`
- `…\IUPESM2025\valData.csv`
- `…\IUPESM2025\testData.csv`

Note: The trainData.csv and valData.csv files will be provided in CSV format.
The testData.csv dataset has the same structure as valData.csv but contains a different number of examples (records). The testData.csv file will not be available to participants and will be used to compute the global score.
We assume that your main program is in the same folder as the data files.
Using trainData.csv and valData.csv, you should develop a model to estimate the length of stay and the type of discharge. The output of your program must consist of the trained model and two vectors of dimension N, saved in CSV format in the same folder, as in the following example:
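```python
import pandas as pd
import numpy as np
import joblib
from sklearn.linear_model import LogisticRegression, LinearRegression

# Data loading: training and validation files combined for the final fit
df = pd.concat([pd.read_csv("trainData.csv"),
                pd.read_csv("valData.csv")], ignore_index=True)

# Input features and target variables
# Placeholder selection: every non-target column, assumed already numerically
# encoded (raw dates, strings, and missing values must be preprocessed first)
features = [c for c in df.columns if c not in ("Outcome", "LengthOfStay")]
X = df[features]
y_DT = df['Outcome']        # assumed integer-coded, e.g. 1 = death, 0 = survival
y_LS = df['LengthOfStay']

# Model training
model_DT = LogisticRegression().fit(X, y_DT)
model_LS = LinearRegression().fit(X, y_LS)

# Save the trained models so the prediction script can reload them
joblib.dump(model_DT, "discharge_model.pkl")
joblib.dump(model_LS, "lengthstay_model.pkl")

# Model results (estimates on the combined training data)
np.savetxt("DTestimation.csv", model_DT.predict(X), fmt='%d', delimiter=',')
np.savetxt("LSestimation.csv", model_LS.predict(X).round(0), fmt='%d', delimiter=',')
```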
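A linear regression such as the one in the example can return negative or fractional lengths of stay. A minimal post-processing sketch, assuming LSestimation.csv should contain whole, non-negative day counts (the function name and the optional `max_days` cap are hypothetical), clips and rounds the raw outputs before saving:

```python
import numpy as np


def postprocess_length_of_stay(raw_predictions, max_days=None):
    """Turn raw regression outputs into whole, non-negative day counts."""
    ls = np.asarray(raw_predictions, dtype=float)
    ls = np.clip(ls, 0, max_days)   # a stay cannot be negative; optional upper cap
    return np.rint(ls).astype(int)  # round to the nearest whole day
```

`np.rint` rounds halves to the nearest even integer, matching the `.round(0)` call in the training script: 7.5 becomes 8 and 2.5 becomes 2.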
Your program should be placed in the …\IUPESM2025\ folder, the same folder where the testData.csv file is located. The following script reloads the saved models and writes the predictions:
```python
import pandas as pd
import numpy as np
import joblib

# Data loading
df = pd.read_csv("testData.csv")
# Use the same (encoded) input columns as in training; placeholder selection
features = [c for c in df.columns if c not in ("Outcome", "LengthOfStay")]
X = df[features]

# Model loading
model_DT = joblib.load("discharge_model.pkl")
model_LS = joblib.load("lengthstay_model.pkl")

# Model predictions: one value per test record
np.savetxt("DTestimation.csv", model_DT.predict(X), fmt='%d', delimiter=',')
np.savetxt("LSestimation.csv", model_LS.predict(X).round(0), fmt='%d', delimiter=',')
```
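Before submitting, it can help to confirm that both output files hold exactly one value per record, as required by the "two vectors of dimension N" rule. This is a hedged sketch: the function name is hypothetical, and it assumes discharges are written as 0/1 integers and lengths of stay as non-negative integers.

```python
import numpy as np


def check_prediction_files(n_records: int,
                           dt_path: str = "DTestimation.csv",
                           ls_path: str = "LSestimation.csv") -> bool:
    """Sanity-check the two prediction vectors written by the prediction script."""
    dt = np.loadtxt(dt_path, delimiter=",", dtype=int, ndmin=1)
    ls = np.loadtxt(ls_path, delimiter=",", dtype=int, ndmin=1)
    # Each file must hold one value per record (a vector of dimension N)
    if dt.shape != (n_records,) or ls.shape != (n_records,):
        raise ValueError(f"expected {n_records} values, got {dt.shape[0]} and {ls.shape[0]}")
    # Assumed label encoding: 0 = survival, 1 = death
    if not np.isin(dt, [0, 1]).all():
        raise ValueError("discharge predictions must be 0 or 1")
    if (ls < 0).any():
        raise ValueError("lengths of stay must be non-negative")
    return True
```

Locally, `n_records` can be set to the number of rows in whichever file was predicted on, for instance valData.csv.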
If you have any questions, contact us at:
Email: sciupesm2025@dei.uc.pt
Official Website: IUPESM 2025
Anyone affiliated with an academic or research institution can participate. Teams can have up to 5 members.
Send an email to sciupesm2025@dei.uc.pt with your team name, full names of all members, institutional emails & affiliations, and confirmation of agreement to participate.
The registration deadline is July 18, 2025.
The data becomes available as soon as the team's application is accepted. Participants will receive the data by email.
Submissions will be evaluated using three scores: DTscore (based on F1-score for outcome prediction), LSscore (mean absolute error for length of stay), and GLscore (the sum of both).
Good Luck!