Scientific Challenge 2025
Predicting Length of Stay and Patient Outcome

Challenge Description

Welcome to the IFMBE Scientific Challenge Competition at the IUPESM World Congress on Medical Physics and Biomedical Engineering 2025 (IUPESM 2025). In this edition, participants are required to predict patients' length of stay and the corresponding clinical outcome based on a set of parameters collected at the time of hospital admission.

The development and validation of practical and reliable risk assessment tools is central to clinical decision-making. At an early stage, such tools can assist clinicians in stratification strategies, guide the type of intervention, optimize hospital bed monitoring, and match the intensity of therapy to the individual patient's risk. A precise risk assessment model or score is decisive for identifying high-risk patients, in whom invasive strategies may improve the outcome, and low-risk patients, for whom potentially hazardous and costly treatments are expected to bring little or no benefit.

In the present challenge, participants should develop a risk tool to estimate the length of stay (in days) and the type of hospital discharge (death or survival).

How to Participate

Participants must follow these steps to join the challenge:

1| Challenge Registration

Register through the challenge website by providing your username, email address, and team name. [Registration Link]
Teams can have up to 5 members, affiliated with academic or research institutions.
Upon registration, you will receive an email with a link to download the training and validation datasets.
Develop your predictive model(s) using Python (preferably version 3.11 or 3.12) to estimate the length of stay and discharge outcome.
Submit your Python script, ensuring it runs correctly on the test dataset. The final score will be computed based on the evaluation metrics.

2| Conference Submission

Submit an overview paper detailing your solution and methodology to the IUPESM 2025 Conference. [Conference Website]
Attend the conference and present your work.
At least one team member must be registered and attend the conference.

Important Notes:

The registered user is responsible for ensuring that the downloaded datasets are not disclosed to third parties.
Participants must not submit their solutions to other conferences or journals before the end of the IUPESM 2025 conference.
A review article summarizing the best solutions will be submitted to a prestigious journal, co-authored by challenge participants.
By registering, you confirm your agreement to participate in the challenge and adhere to all rules and guidelines:

"As a representative of the team [Your Team Name], I confirm our agreement to participate in the IUPESM Scientific Challenge 2025. We acknowledge and agree to adhere to all the rules and guidelines set forth by the organizers. Our team consists of [Number of Members] members affiliated with [Your Institution]. We understand that the dataset provided is for research purposes only and will not be disclosed to third parties."

Dataset Description

The dataset was collected under the Project Regula SESAP-RN/FUNCERN, grant number 69/2021, from October 2021 to January 2024, by the Laboratory of Technological Innovation in Health (LAIS) at the Federal University of Rio Grande do Norte (UFRN), in cooperation with the Secretariat of Public Health of Rio Grande do Norte, Brazil.

For this challenge, a subset of N = 13,415 records was considered, each containing 20 input features and two target variables (outputs). The features are organized into four main groups: i) Request, ii) Admission, iii) Patient data, and iv) Clinical information (collected at hospital admission) (see dataset description).

Group	Feature	Description / Units
Request	Request date	Date on which a bed request was registered
Request	Request type	Age group of bed requested for the patient: {Adult, Pediatric}
Request	Requested bed type	Type of occupancy: {Ward, Intensive Care Unit}
Admission	Admission date	Date when patient was admitted
Admission	Admission bed type	Type of bed: {Ward, Intensive Care Unit}
Admission	Admission health unit	Hospital admitting the patient
Patient	Gender	{Female, Male}
Patient	Age	Patient age
Patient	Patient’s federal unit	Acronym for the federal unit
Clinical	ICD code	ICD-10 diagnosis code
Clinical	Blood pressure	Systolic/Diastolic (mmHg)
Clinical	Glasgow Coma Scale	Scale from 3 to 15
Clinical	Hematocrit	%
Clinical	Hemoglobin	g/dL
Clinical	Leukocytes	cells/mm³
Clinical	Lymphocytes	%
Clinical	Urea	mg/dL
Clinical	Creatinine	mg/dL
Clinical	Platelets	10³/μL
Clinical	Diuresis	mL/day
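
The Blood pressure feature combines two readings in a single field. A minimal sketch for splitting it into two numeric columns, assuming the values are stored as text of the form "120/80"; the function name `split_blood_pressure` and the output column names are hypothetical:

```python
import pandas as pd


def split_blood_pressure(bp: pd.Series) -> pd.DataFrame:
    """Split 'Systolic/Diastolic' text (e.g. '120/80') into two numeric columns.

    Assumption: one '/' separates the readings. Missing or malformed
    entries become NaN instead of raising an error.
    """
    pattern = r"^\s*(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)\s*$"
    parts = bp.astype(str).str.extract(pattern)  # two capture groups -> two columns
    parts.columns = ["systolic_mmHg", "diastolic_mmHg"]
    return parts.astype(float)


print(split_blood_pressure(pd.Series(["120/80", "90 / 60", None, "abc"])))
```

Unparseable entries end up as NaN, so they can go through the same imputation step as any other missing value.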

Note: Some of the variable values in the dataset may be inaccurate. The reliability and accuracy of the data are not guaranteed.
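
Given this note, it may help to flag values that fall outside the ranges implied by the feature table, such as a Glasgow Coma Scale outside 3 to 15 or a percentage outside 0 to 100. A sketch, assuming hypothetical column names (`GlasgowComaScale`, `Hematocrit`, `Lymphocytes`) that must be replaced with the actual CSV headers:

```python
import pandas as pd

# Hypothetical column names: replace them with the actual CSV headers.
# Bounds follow the feature table: GCS ranges from 3 to 15, percentages from 0 to 100.
PLAUSIBLE_RANGES = {
    "GlasgowComaScale": (3, 15),
    "Hematocrit": (0, 100),
    "Lymphocytes": (0, 100),
}


def flag_out_of_range(df: pd.DataFrame, ranges=PLAUSIBLE_RANGES) -> pd.DataFrame:
    """Return a boolean DataFrame marking values outside their plausible range.

    Missing values are not flagged, and columns absent from df are skipped.
    """
    flags = pd.DataFrame(index=df.index)
    for col, (low, high) in ranges.items():
        if col in df.columns:
            values = pd.to_numeric(df[col], errors="coerce")
            flags[col] = values.notna() & ~values.between(low, high)
    return flags
```

Flagged values could then be set to missing with `df[flags.columns] = df[flags.columns].mask(flags)`; whether to drop, impute, or keep them is a modelling choice left to each team.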

The target variables are:

Group	Feature	Description / Units
Target	Outcome type	Possible values are: {Death, Survival}
Target	Length of stay	Number of days in hospital

Disclaimer: The dataset used in this challenge is for research purposes only and should not be used for commercial applications.

Evaluation Criteria

Model performance is assessed based on three scores:

DTscore (Discharge Type Score): The DTscore evaluates the classifier's performance on the type of discharge, with death treated as the positive class. It returns a real value in the interval [0, 10] and is based on the F1-score, as follows:

F1-score = 2 * (Precision * Recall) / (Precision + Recall)
where:
  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)

DTscore = 10 * (1 - F1-score)

TP: The number of death discharges that the model correctly classifies as death.
FP: The number of survival discharges that the model incorrectly classifies as death.
FN: The number of death discharges that the model incorrectly classifies as survival.
TN: The number of survival discharges that the model correctly classifies as survival.

Note: For a completely failed model (F1-score = 0), the DTscore is 10; for a perfect classification (F1-score = 1), the DTscore is zero.
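
The DTscore can be computed directly from the TP, FP, and FN counts defined above. A minimal pure-Python sketch, assuming the labels are the strings "Death" and "Survival"; the function name `dt_score` is illustrative:

```python
from typing import Sequence


def dt_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Death") -> float:
    """Return DTscore = 10 * (1 - F1-score), treating death discharges as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # deaths predicted as death
    fp = sum(t != positive and p == positive for t, p in pairs)  # survivals predicted as death
    fn = sum(t == positive and p != positive for t, p in pairs)  # deaths predicted as survival
    # 2PR / (P + R) equals 2TP / (2TP + FP + FN). If that is 0/0 (no deaths and
    # none predicted), the F1-score is undefined and assumed to be 0 here.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return 10 * (1 - f1)


print(dt_score(["Death", "Survival", "Death", "Survival"],
               ["Death", "Death", "Survival", "Survival"]))  # → 5.0
```

Using the simplified form 2TP / (2TP + FP + FN) avoids dividing by zero when the model never predicts death (Precision would be 0/0).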

LSscore (Length of Stay Score): The length of stay score (LSscore) is a real value in the interval [0, 10]. It is computed from the mean absolute error (in days), capped at 10, as follows:
```
LSscore = min(10, (1/N) * Σ |True_LS - Estimated_LS|)

True_LS: true values for the length of stay
Estimated_LS: estimated values for the length of stay
N: number of records
```
Note: Whenever the mean absolute error exceeds 10 days, the LSscore is 10; for a perfect match, the LSscore is zero.
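
A matching sketch for the LSscore, assuming both vectors hold numbers of days in the same record order; `ls_score` is a hypothetical helper name:

```python
from typing import Sequence


def ls_score(true_ls: Sequence[float], estimated_ls: Sequence[float]) -> float:
    """Return LSscore = min(10, mean absolute error in days)."""
    if len(true_ls) != len(estimated_ls) or len(true_ls) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - e) for t, e in zip(true_ls, estimated_ls)) / len(true_ls)
    return min(10.0, mae)  # errors above 10 days are capped at the maximum score


print(ls_score([3, 5, 10, 2], [4, 5, 7, 2]))  # → 1.0
```

Here the absolute errors are 1, 0, 3, and 0 days, so the mean absolute error and the LSscore are both 1.0.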

Global Score (GLscore): The sum of the DTscore and the LSscore, calculated as:
```
GLscore = DTscore + LSscore
```
The GLscore therefore lies in the interval [0, 20]: it is zero for a perfect match and 20 if the predictions fail completely.

The team with the lowest GLscore wins the competition.
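
To cross-check all three scores, the same quantities can be obtained from scikit-learn's `f1_score` (with `pos_label="Death"`) and `mean_absolute_error`. This is a sketch, not the organizers' scoring script; the function name `challenge_scores` is hypothetical, and `zero_division=0` is an assumption for the case where the F1-score is undefined:

```python
from sklearn.metrics import f1_score, mean_absolute_error


def challenge_scores(dt_true, dt_pred, ls_true, ls_pred) -> dict:
    """Return DTscore, LSscore, and GLscore computed with scikit-learn metrics."""
    # Death is the positive class, matching the TP/FP/FN definitions.
    f1 = f1_score(dt_true, dt_pred, pos_label="Death", zero_division=0)
    dt = 10 * (1 - f1)
    ls = min(10.0, mean_absolute_error(ls_true, ls_pred))  # capped mean absolute error
    return {"DTscore": float(dt), "LSscore": float(ls), "GLscore": float(dt + ls)}


scores = challenge_scores(["Death", "Survival", "Death", "Survival"],
                          ["Death", "Death", "Survival", "Survival"],
                          [3, 5, 10, 2], [4, 5, 7, 2])
print(scores)
```

In this example the F1-score is 0.5 (DTscore 5.0) and the mean absolute error is 1.0 day (LSscore 1.0), giving a GLscore of 6.0.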

Final submissions must include a report and an executable or prediction file.

Dataset Files and Examples (Training / Validation / Testing)

1| Description

When reading the training, validation, and testing datasets, assume the following folder structure:

…\IUPESM2025\trainData.csv
…\IUPESM2025\valData.csv
…\IUPESM2025\testData.csv

Note: The trainData.csv and valData.csv files are provided in CSV format.

The testData.csv dataset has the same structure as valData.csv but contains a different number of examples (records). The testData.csv file will not be available to participants and will be used to compute the global score.

2| Example in Python

We assume that your main program is in the same folder as the data files.

2.1| Training and Validation

Using trainData.csv and valData.csv, you should develop a model to estimate the length of stay and the type of discharge. The output of your program must consist of the model used and two vectors of dimension N, in CSV format, saved in the same folder as:

DTestimation.csv: Vector of dimension N with the estimated discharge type (Survival, Death).
LSestimation.csv: Vector of dimension N with the estimated length of stay (days).
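
A sketch of writing the two vectors with a few basic checks before saving. The checks (length N, labels limited to Survival/Death, non-negative whole days) and the helper name `save_estimations` are assumptions rather than official requirements; rounding mirrors the `.round(0)` in the example below, although the LSscore formula would also accept fractional days:

```python
import os

import numpy as np

VALID_DT = {"Survival", "Death"}


def save_estimations(dt_pred, ls_pred, n_records: int, folder: str = ".") -> None:
    """Write DTestimation.csv and LSestimation.csv, one value per line."""
    dt = np.asarray(dt_pred, dtype=str)
    # Round to whole days and clip negative estimates to zero.
    ls = np.clip(np.rint(np.asarray(ls_pred, dtype=float)), 0, None).astype(int)
    if dt.shape != (n_records,) or ls.shape != (n_records,):
        raise ValueError(f"expected two vectors of length {n_records}")
    unexpected = set(dt.tolist()) - VALID_DT
    if unexpected:
        raise ValueError(f"unexpected discharge labels: {sorted(unexpected)}")
    np.savetxt(os.path.join(folder, "DTestimation.csv"), dt, fmt="%s")
    np.savetxt(os.path.join(folder, "LSestimation.csv"), ls, fmt="%d")
```

With fitted models, a call might look like `save_estimations(model_DT.predict(X), model_LS.predict(X), len(X))`.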


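            import joblib
            import numpy as np
            import pandas as pd
            from sklearn.compose import ColumnTransformer, make_column_selector
            from sklearn.impute import SimpleImputer
            from sklearn.linear_model import LinearRegression, LogisticRegression
            from sklearn.pipeline import make_pipeline
            from sklearn.preprocessing import OneHotEncoder

            # Data Loading (training and validation sets combined)
            df = pd.concat([pd.read_csv("trainData.csv"),
                            pd.read_csv("valData.csv")], ignore_index=True)

            # Input features and target variables
            # (target column names as in this example; adjust them to the CSV headers)
            y_DT = df['Outcome']
            y_LS = df['LengthOfStay']
            X = df.drop(columns=['Outcome', 'LengthOfStay'])

            # Minimal preprocessing: impute numeric columns, one-hot encode text columns
            def make_preprocessor():
                numeric = SimpleImputer(strategy="median")
                categorical = make_pipeline(SimpleImputer(strategy="most_frequent"),
                                            OneHotEncoder(handle_unknown="ignore"))
                return ColumnTransformer([
                    ("num", numeric, make_column_selector(dtype_include="number")),
                    ("cat", categorical, make_column_selector(dtype_exclude="number"))])

            # Model Training and Validation (models saved for the testing step)
            model_DT = make_pipeline(make_preprocessor(), LogisticRegression(max_iter=1000)).fit(X, y_DT)
            model_LS = make_pipeline(make_preprocessor(), LinearRegression()).fit(X, y_LS)
            joblib.dump(model_DT, "discharge_model.pkl")
            joblib.dump(model_LS, "lengthstay_model.pkl")

            # Models Results (discharge labels are text, so they are written with '%s')
            np.savetxt("DTestimation.csv", model_DT.predict(X), fmt='%s', delimiter=',')
            np.savetxt("LSestimation.csv", model_LS.predict(X).clip(min=0).round(0), fmt='%d', delimiter=',')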

2.2| Testing

Your program should be placed in the ..\IUPESM2025\ folder, the same folder where the testData.csv data file is located.

1. Your script must be able to train the model on trainData.csv and valData.csv (or load a previously trained model) and make predictions using the features of the test file.
2. Your software should compute two vectors of dimension N, where N is the number of records in the test dataset.
3. The vectors should be saved in CSV format in the same folder: LSestimation.csv and DTestimation.csv.
4. The organizers will compute the GLscore from these vectors.
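
Step 1 allows either retraining or loading a previously trained model. One way to support both is a load-or-train helper built on `joblib`, which the example scripts already use. The helper name `load_or_train` and its `train_fn` callback are hypothetical:

```python
import os
from typing import Any, Callable

import joblib


def load_or_train(path: str, train_fn: Callable[[], Any]) -> Any:
    """Load a model saved with joblib, or train it with train_fn and save it."""
    if os.path.exists(path):
        return joblib.load(path)
    model = train_fn()  # e.g. fit on trainData.csv and valData.csv
    joblib.dump(model, path)
    return model
```

For example, `model_DT = load_or_train("discharge_model.pkl", train_discharge_model)`, where `train_discharge_model` would be a hypothetical function that fits the classifier. Deleting the .pkl file forces retraining on the next run.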


            import joblib
            import numpy as np
            import pandas as pd
            
            # Data Loading
            df = pd.read_csv("testData.csv")
            
            # Model Loading (models saved by the training script)
            model_DT = joblib.load("discharge_model.pkl")
            model_LS = joblib.load("lengthstay_model.pkl")
            
            # Select the input columns seen during training
            # (feature_names_in_ exists for scikit-learn models fitted on a DataFrame)
            X = df[model_DT.feature_names_in_]
            
            # Model Predictions
            np.savetxt("DTestimation.csv", model_DT.predict(X), fmt='%s', delimiter=',')
            np.savetxt("LSestimation.csv", model_LS.predict(X).clip(min=0).round(0), fmt='%d', delimiter=',')

Contact Information

If you have any questions, contact us at:

Email: sciupesm2025@dei.uc.pt

Official Website: IUPESM 2025

Frequently Asked Questions (FAQ)

Anyone affiliated with an academic or research institution can participate. Teams can have up to 5 members.

Send an email to sciupesm2025@dei.uc.pt with your team name, full names of all members, institutional emails & affiliations, and confirmation of agreement to participate.

The registration deadline is July 18, 2025.

The data becomes available as soon as the team's application is accepted. Participants will receive the data by email.

Submissions will be evaluated using three scores: DTscore (based on F1-score for outcome prediction), LSscore (mean absolute error for length of stay), and GLscore (the sum of both).

Good Luck!
