
Lessons Learned Building Early Warning Models



I've spent much of the past decade building and evaluating statistical models that predict rapid deterioration in hospitalized patients (early warning score or EWS models), and I've learned quite a bit along the way. Here is a rundown of some of the key recommendations I make for this type of project. If you are starting out on your own EWS model building journey, I would love to brainstorm with you!


1. Carefully consider your target outcome.

Specifically, you need to define the event that the EWS model will be trained to predict so that when it is deployed in a real-world setting, and it works, it will actually lead to improvement. Here is an example of what I mean: it can be tempting, when developing EWS models, to train the model to predict ICU transfers, because patients who deteriorate rapidly often get transferred to the ICU. The thinking goes: if the model is good at predicting ICU transfers, it must be good at predicting deteriorating patients. To an extent this is true, but consider some downsides to this approach:

  • If your model is trained to predict historical transfers, it is only learning to reproduce the transfer decisions that are already being made. Such a model may not add much value over what you are already doing.

  • Patients sometimes go to the ICU for reasons other than deterioration. Often, patients will be transferred to the ICU to get a procedure (e.g., endoscopy) for diagnostic purposes and will then return to the floor. Unless you can filter these procedural transfers out of your model training data, they will be a source of error.


ICU transfer events are the result of clinical decision-making, and if we want to build an EWS model that will improve decision-making, we should avoid using past decisions as the outcome that the model is trained to predict. Clinical decompensation is a physiological process, and for this reason physiological markers (e.g., low systolic blood pressure, elevated white blood cell count) make more sense as an outcome. This will keep the model focused on the state of the patient in a way that provides a valuable contribution to clinical decision-making.
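
To make this concrete, here is a minimal sketch (in Python with pandas) of how a physiological outcome label might be constructed from timestamped observations. The column names, the specific thresholds, and the 12-hour prediction horizon are illustrative assumptions, not a validated definition - the actual criteria should come from your clinical stakeholders.

```python
import pandas as pd

# Illustrative criteria only -- the real outcome definition should be set with clinicians.
def meets_criteria(row) -> bool:
    """Flag an observation as physiological 'deterioration' (assumed thresholds)."""
    return (row["sbp"] < 90) or (row["lactate"] >= 4.0) or (row["spo2"] < 88)

def label_prediction_times(obs: pd.DataFrame, horizon_hours: int = 12) -> pd.DataFrame:
    """Label each observation 1 if any deterioration criterion is met within `horizon_hours`.

    Assumes `obs` has columns: encounter_id, charted_at (datetime), sbp, lactate, spo2.
    """
    obs = obs.sort_values(["encounter_id", "charted_at"]).copy()
    obs["is_event"] = obs.apply(meets_criteria, axis=1)

    horizon = pd.Timedelta(hours=horizon_hours)
    labels = []
    for _, enc in obs.groupby("encounter_id"):
        event_times = enc.loc[enc["is_event"], "charted_at"]
        labels.append(
            enc["charted_at"].apply(
                lambda t: int(((event_times > t) & (event_times <= t + horizon)).any())
            )
        )
    obs["label"] = pd.concat(labels)
    return obs
```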


2. Choose model inputs wisely.

Inputs are the variables that the model will use to try to predict when patient deterioration will occur. We live in a data-driven world, which means that we often have a lot of options available for inputs. This creates many opportunities to go down dead-end roads in the model building process. A lot of hospital data is collected to facilitate billing, and much of this data is revised and cleaned throughout the hospital encounter and beyond. Using this historical data to train a model can create problems if the stored data differs from what will be available to the model during the patient encounter. Model building teams need to understand the data archiving process so that they can build a training dataset that accurately represents the information that was available at the time the model would have been making a prediction.


Along these same lines, data that summarizes the patient encounter or that wasn't available until after the encounter ended should never be used as input data to a model. An obvious example of data in this class is overall length of stay. A less obvious, but equally important example is diagnosis information. Again, diagnoses are primarily used for billing purposes, and the coded diagnoses associated with a patient encounter are generally not available until after the encounter ends and billing is finalized.


The most valuable inputs for an EWS model are those that are useful for prediction, that are collected regularly for all patients, and that exist in your historical archive unchanged from the way they appeared "in the moment" during the patient encounter.
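
As a sketch of what "point-in-time" feature construction can look like, the snippet below only uses values that had become visible in the record by the prediction time and drops encounter-level summary fields. The column names (`available_at`, `length_of_stay`, etc.) are assumptions for illustration; your EHR extract will almost certainly name things differently.

```python
import pandas as pd

# Fields that only exist after the encounter ends -- never valid model inputs.
# (These names are illustrative; your archive will differ.)
POST_ENCOUNTER_FIELDS = {"length_of_stay", "discharge_diagnoses", "final_drg"}

def features_as_of(obs: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    """Return the most recent value of each input that was visible at `prediction_time`.

    Assumes `obs` has one row per charted value with columns:
    encounter_id, variable, value, available_at (when the value first became visible in the EHR).
    """
    known = obs[obs["available_at"] <= prediction_time]
    known = known[~known["variable"].isin(POST_ENCOUNTER_FIELDS)]
    latest = (
        known.sort_values("available_at")
             .groupby(["encounter_id", "variable"], as_index=False)
             .last()
    )
    # One row per encounter, one column per input, holding the last value known at prediction time.
    return latest.pivot(index="encounter_id", columns="variable", values="value")
```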


3. Maximize the clinical value of predictions.

Making an accurate prediction is essential, but it is also very important to provide clinical information about WHY a prediction is being made. If a clinician is to be alerted about a deteriorating patient, they need information about exactly what is going wrong so that they can figure out what to do about it as quickly as possible. For this reason, providing a clinical justification for the model's prediction is essential.


This is the point in the modeling process where it may make the most sense to invest in informative plots or graphics that can quickly, concisely, and accurately convey to the alert recipients what the model has noticed that has led to the alert. We don't want clinicians to have to wade through statistical jargon or complicated data in order to understand the alert. Stoplight (red / yellow / green) designations for model inputs, along with trend data, can convey a lot of useful information at a glance when an alert is triggered.
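
One possible way to generate those stoplight designations is sketched below. The reference ranges are placeholders I made up for illustration - the real ranges (and which variables get displayed) should come from clinical guidance.

```python
# Illustrative reference ranges only -- real ranges must come from clinical guidance.
REFERENCE_RANGES = {
    "heart_rate":       {"green": (60, 100),  "yellow": (50, 120)},  # beats per minute
    "systolic_bp":      {"green": (100, 140), "yellow": (90, 160)},  # mmHg
    "respiratory_rate": {"green": (12, 20),   "yellow": (10, 24)},   # breaths per minute
}

def stoplight(variable: str, value: float) -> str:
    """Return 'green', 'yellow', or 'red' for a displayed model input (assumed ranges)."""
    green_lo, green_hi = REFERENCE_RANGES[variable]["green"]
    if green_lo <= value <= green_hi:
        return "green"
    yellow_lo, yellow_hi = REFERENCE_RANGES[variable]["yellow"]
    if yellow_lo <= value <= yellow_hi:
        return "yellow"
    return "red"

# Example: a heart rate of 118 shows as 'yellow' next to the alert.
print(stoplight("heart_rate", 118))
```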


4. Set clear expectations

Before implementing any alert in a hospital setting, the intended recipients of EWS alerts need a grounded understanding of how and when the model will trigger alerts if they are to react appropriately and remain supportive of its continued use. Consider the following: clinical deterioration events happen very infrequently - maybe once or twice per week, depending on the size of the clinical unit. For the sake of illustration, let's say that actual deterioration happens in about 1% of all patient-days. If your model evaluates a patient as having a 20% risk of deterioration, it is still (according to the model) very likely that the patient will NOT deteriorate. At the same time, that patient's risk of deterioration is 20 times higher than baseline, which deserves clinical attention. We would likely consider a patient with a 20-fold increase in risk to be 'high risk', and a decent model will only rarely produce a score this high. Yet if an alert fires at this threshold, most alerts will be false alarms. This may seem paradoxical, but it is what we should expect even from a very good predictive model when the outcome being predicted is sufficiently rare (be on the lookout for my forthcoming technical post about evaluating models that predict rare events). This state of affairs underscores the importance of training and collaboration with clinical stakeholders prior to implementation, as they must have clear expectations about what the model is telling them in order to respond appropriately.
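
To put rough numbers on this, here is a tiny worked example using the illustrative 1% baseline rate and the 20% risk estimate from above (and assuming the model is well calibrated).

```python
# Illustrative numbers only: a ~1% baseline event rate and a well-calibrated model.
baseline_rate = 0.01          # deterioration in about 1% of patient-days (assumed)
risk_at_alert = 0.20          # the model's estimated risk when this alert fires

relative_risk = risk_at_alert / baseline_rate      # 20x baseline -- clinically noteworthy
false_alarm_fraction = 1 - risk_at_alert           # 80% of these alerts will be false alarms

print(f"Relative risk at alert time: {relative_risk:.0f}x baseline")
print(f"Expected false alarms among these alerts: {false_alarm_fraction:.0%}")
```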


5. Involve Clinical Stakeholders from the Start

The prior two sections underscore the importance of building clinical support for an EWS modeling process from the get-go. Clinical stakeholders will point you to the data that is likely to be most useful as model input, they will help define the most appropriate outcomes to predict, they will tell you when the model's evaluation of risk makes sense (and when it doesn't), they will let you know if the output that the model produces is useful, and most importantly, they will champion support for the alerting process and channel feedback from the end users when the model eventually goes live. Even a very accurate predictive model may struggle to gain support if clinicians are not bought into the process early, so it cannot be overstated how important it is to have clinical champions on board and engaged early in the model development process.


6. The ultimate goal is an accurate assessment of risk.

Ideally, we would have hundreds of historical examples of patients who looked nearly identical (physiologically) to the patients hospitalized at any given time. Then, we could simply calculate the percentage of these historical patients that went on to deteriorate in the coming hours, and we would have a highly accurate predictive model for our EWS. However, we don't have this data, and we won't any time soon. The number of unique combinations of vital signs and lab values that can exist in a patient at any given time is astronomically high, and combinations of abnormal values - the situations where the risk of deterioration is likely to be most pronounced - tend to be the rarest. So our EWS model is left to make the best evaluation possible about the risk of deterioration using a large amount of historical data that probably contains very few (if any) examples of historical patients who were in a nearly identical physiological state.


This task is manageable, but proper expectations need to be set. EWS models will not produce certainty; they will produce probabilities. Useful models will convey justifications for these probabilities, and will provide clinicians with actionable information about what is wrong and what they can do to help the patient.



7. Evaluating EWS Models

There are two types of evaluation that are essential in EWS model building: 1) Retrospective Evaluation - to be conducted as input to the decision about whether to move forward with using the model in a live production setting, and 2) Prospective Evaluation - to be conducted in an ongoing manner after a model is deployed for use.


Retrospective evaluation should provide information about how the model is likely to perform in production so that stakeholders can make an informed decision about whether or not to use it. There are a few key statistics that I have found to be very informative in retrospective analyses:

  • Calibration: This tells us whether the probabilities produced by the model are accurate. In other words, if we look at all the times the model said that an outcome had a 20% probability, did the outcome actually happen about 20% of the time? A well-calibrated model is an essential input to the clinical decision-making process (see my prior post on the Brier Skill Score for model calibration).

  • Positive Predictive Value (PPV): Also known as precision, this is a measure of how often the outcome actually occurs when an alert is triggered. Keep in mind the note from earlier - clinical decompensation is a rare event. Thus, although it would be great if all of our predictions told us with a high degree of certainty whether clinical deterioration would or would not happen, this level of confidence is unlikely in reality. For really rare events, a PPV of 15%-30% can be pretty good, and is likely to be a big improvement over simply knowing the baseline event rate. Also consider that even if deterioration did not occur, there still may be secondary benefits from evaluating patients that trigger an alarm: updates to the plan of care, or extra diagnostic checks that may ultimately help improve patient outcomes.

  • Sensitivity: Also known as recall, this measures the proportion of all outcome events that the model is expected to detect. Clearly, we want this to be as high as possible, but we also don't want the model firing constantly. That is why we also pay attention to...

  • Fire rate: This is the frequency with which we expect the alert to go off. When determining this measure using historical data, be sure to simulate the real world as closely as possible. If your model will have a lockout mechanism (i.e., once it fires, it can't fire again for X number of hours...), be sure to incorporate this into the evaluation process, as in the sketch below.
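
As a rough sketch of how these retrospective numbers might be computed from a table of historical, time-stamped predictions, consider the following. The column names, risk bins, and the 6-hour lockout are assumptions for illustration, and the sensitivity shown here is prediction-level rather than event-level.

```python
import pandas as pd

def retrospective_metrics(scored: pd.DataFrame, threshold: float, lockout_hours: int = 6) -> dict:
    """Compute calibration, PPV, sensitivity, and fire rate from historical predictions.

    Assumes `scored` has one row per prediction with columns:
    encounter_id, scored_at (datetime), risk (model probability), outcome (0/1).
    """
    scored = scored.sort_values(["encounter_id", "scored_at"]).copy()

    # Simple calibration check: observed outcome rate within coarse predicted-risk bins.
    bins = pd.cut(scored["risk"], bins=[0, 0.05, 0.10, 0.20, 0.50, 1.0])
    calibration = scored.groupby(bins, observed=True)["outcome"].mean()

    # Simulate the lockout: once an alert fires, suppress further alerts for `lockout_hours`.
    fires = []
    for _, enc in scored.groupby("encounter_id"):
        last_fire = None
        for _, row in enc.iterrows():
            fire = row["risk"] >= threshold and (
                last_fire is None
                or row["scored_at"] - last_fire >= pd.Timedelta(hours=lockout_hours)
            )
            if fire:
                last_fire = row["scored_at"]
            fires.append(fire)
    scored["alert"] = fires

    alerts = scored[scored["alert"]]
    return {
        "calibration_by_risk_bin": calibration,
        "ppv": alerts["outcome"].mean(),  # outcome rate among fired alerts
        # Prediction-level recall; an event-level definition may be more appropriate clinically.
        "sensitivity": scored.loc[scored["outcome"] == 1, "alert"].mean(),
        "fire_rate_per_100_predictions": 100 * scored["alert"].mean(),
    }
```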

I will go into more technical detail about these evaluation metrics in a future post. The key is that these metrics should give you an accurate picture of what will happen when your model is promoted to production use, and your decision-making team needs to be on board with this set of expectations. It is also important to point out that when your model produces a probability of the outcome, you can trade off PPV, sensitivity, and fire rate against one another by adjusting the threshold probability at which the model fires an alert. Raising the threshold to a higher probability will typically improve your PPV and reduce your fire rate, but will likely reduce your sensitivity as well. Conversely, lowering the threshold will improve your sensitivity at the expense of PPV, and the alert will fire more frequently.
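
The threshold tradeoff can be explored with a simple sweep over candidate thresholds; this sketch uses synthetic scores and ignores lockouts to keep the illustration short.

```python
import numpy as np
import pandas as pd

def threshold_sweep(risk: np.ndarray, outcome: np.ndarray,
                    thresholds=(0.05, 0.10, 0.20, 0.30)) -> pd.DataFrame:
    """Show how PPV, sensitivity, and fire rate move as the alert threshold changes."""
    rows = []
    for t in thresholds:
        alert = risk >= t
        rows.append({
            "threshold": t,
            "ppv": outcome[alert].mean() if alert.any() else float("nan"),
            "sensitivity": alert[outcome == 1].mean(),
            "fire_rate": alert.mean(),
        })
    return pd.DataFrame(rows)

# Synthetic example: ~1% baseline rate, scores loosely separated by outcome status.
rng = np.random.default_rng(0)
outcome = (rng.random(10_000) < 0.01).astype(int)
risk = np.clip(0.01 + 0.25 * outcome + rng.normal(0, 0.05, 10_000), 0, 1)
print(threshold_sweep(risk, outcome))
```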


Prospective evaluation is tricky - by definition, our model is intended to help prevent the outcomes that it is built to predict, and if it does this well, we can't use the occurrence of those events as evidence that the model made a 'correct' prediction. Additionally, the model is only sounding the alarm; the clinical intervention that occurs after the alarm goes off is what truly alters the trajectory of the patient. And, as was pointed out earlier, the model is likely to fire often in cases where decompensation would not have occurred anyway. So how can we tell if our model is adding value? For this, we need to examine long-term trends in the incidence of the outcome measure, as well as other key metrics such as mortality and morbidity, across all of the patient encounters that were evaluated by our model and monitored by our alerting system. The project team needs to think clearly about what type of long-term trends an effective model would produce. Most importantly, stakeholders will need to buy in to the fact that, because we are dealing in rare events, we will likely need a lot of time (months or even years) to determine if the model is truly helping to improve patient outcomes. This can be a tough sell when there is often pressure on clinical leadership to produce rapid improvement in care, but to be fair - any intervention would require the same evaluation period when the outcome happens infrequently. The key is that improvement may be occurring; it will just take time to quantify it.
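
For the prospective side, a very simple starting point is a run chart of the outcome over time. The sketch below computes a monthly rate per 1,000 patient-days under assumed column names; the harder questions (case mix, seasonality, concurrent interventions) are left to the project team.

```python
import pandas as pd

def monthly_outcome_rate(unit_months: pd.DataFrame) -> pd.Series:
    """Monthly deterioration events per 1,000 patient-days for monitored encounters.

    Assumes `unit_months` has columns: month, patient_days (exposure), events (outcome count).
    """
    totals = unit_months.groupby("month")[["events", "patient_days"]].sum()
    return 1000 * totals["events"] / totals["patient_days"]
```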


8. Summary

If done correctly, implementing an alerting system that is based on an EWS model can add considerable value to your patient care process. To build an effective model, your data needs to be readily available and accessible in real-time (or near real-time) from your EHR. When an alert fires, your model needs to provide insight about what is wrong so that clinicians can attempt to intervene. Most importantly, the clinical team that will respond to alerts must be involved from the start of the model building process and appropriate expectations must be set about the likelihood of false positives and the time that will be required to conduct an appropriate prospective evaluation of the model once it has been implemented.


Hopefully you found this post helpful. I would love to hear your thoughts, and would be happy to discuss your EWS project in more detail if you are interested!








