Data-Driven Strategies for Disease Management in Patients Admitted for Heart Failure

Ankita Agarwal, Wright State University

Abstract

Heart failure is a syndrome that adversely affects a patient’s quality of life. It can be caused by different underlying conditions or abnormalities and involves both cardiovascular and non-cardiovascular comorbidities. Heart failure cannot be cured, but a patient’s quality of life can be improved through effective treatment with medication and surgery, together with lifestyle management. Because effective treatment incurs costs for patients and requires resource allocation by hospitals, predicting the length of stay for each hospitalization is important. Heart failure can be classified into two types: left-sided and right-sided. Left-sided heart failure is further divided into systolic heart failure, or heart failure with reduced ejection fraction (HFrEF), and diastolic heart failure, or heart failure with preserved ejection fraction (HFpEF). Because right-sided heart failure develops as a result of left-sided heart failure, predicting which of these two ejection-fraction-based types a patient has is important for managing the disease. Electronic Health Records (EHRs) contain the diagnostic codes, procedure reports, physiological vitals, medications administered, and discharge summary for each hospitalization. These EHRs can be leveraged to build models that predict outcomes such as length of stay and type of heart failure (HFrEF or HFpEF). However, such predictive models can be demographically biased and can therefore lead to unfair decisions. It is thus necessary to mitigate these biases without degrading the models’ performance on downstream tasks. To this end, I first used the diagnostic codes and procedure reports of heart failure patients from each hospitalization to identify their clinical phenotypes through a probabilistic framework, Latent Dirichlet Allocation (LDA).
I found 12 clinical phenotypes, expressed as themes, in the diagnostic codes and procedure reports. I used these themes and their percentage contributions to predict length of stay and type of heart failure, i.e., HFrEF or HFpEF. Specifically, I predicted length of stay with an accuracy of 61.1%, and HFrEF and HFpEF with accuracies of 66.1% and 67.5%, respectively. Finally, I used these clinical phenotypes to measure gender- and ethnicity-related bias in the representation space. Second, I proposed a novel debiasing deep learning framework called Debias-CLR. Debias-CLR uses contrastive learning to obtain debiased representations, so that sensitive attributes do not affect predictions on downstream tasks. I developed two Debias-CLR models: one to mitigate gender-related bias and the other to mitigate ethnicity-related bias in the representation space. For each model, I first devised a novel method to generate counterfactual examples for each sample. This method generalizes to domains beyond healthcare and can accommodate multiple data modalities, such as text and numerical features. I then trained a contrastive learning framework on these counterfactual examples to obtain debiased representations. Finally, I evaluated the fairness of the debiased model with a modified version of the Single-Category Word Embedding Association Test (SC-WEAT) metric. The modified metric calculates the effect size of the association between the feature embeddings and the clinical phenotypes identified in diagnostic codes and procedure reports, which allowed me to compare the effect sizes obtained with the biased feature embeddings against those obtained with the debiased feature embeddings.
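As an illustration of the kind of effect size involved, the following is a minimal sketch of the standard SC-WEAT computation, not the modified metric used in this work. It assumes a single target vector and two sets of attribute vectors, and computes the difference in mean cosine similarity divided by the standard deviation over all similarities. The function names, the toy vectors, the assignment of phenotypes and demographic groups to the target and attribute roles, and the use of the sample standard deviation are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sc_weat_effect_size(target, attrs_a, attrs_b):
    """Standard SC-WEAT effect size for one target vector.

    Positive values mean the target is closer to attrs_a,
    negative values mean it is closer to attrs_b.
    Assumption: the sample standard deviation is used.
    """
    sims_a = [cosine(target, a) for a in attrs_a]
    sims_b = [cosine(target, b) for b in attrs_b]
    return (mean(sims_a) - mean(sims_b)) / stdev(sims_a + sims_b)


# Hypothetical 2-D toy vectors: one phenotype vector and the
# embeddings of two demographic groups (illustrative values only).
phenotype = [1.0, 0.0]
group_a = [[1.0, 0.1], [0.9, 0.2]]
group_b = [[0.1, 1.0], [0.2, 0.9]]

effect = sc_weat_effect_size(phenotype, group_a, group_b)
print(f"effect size: {effect:.3f}")
```

An effect size near zero indicates that the target is about equally associated with both groups, which is the direction a debiasing method aims for.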
I found that for gender-related bias, the SC-WEAT effect size decreased from 0.8 to 0.3 for the association between feature embeddings and diagnostic code phenotypes, and from 0.4 to 0.2 for the association between feature embeddings and procedure report phenotypes. Similarly, for ethnicity-related bias, the SC-WEAT effect size changed from 1 to 0.5 for the association between feature embeddings and diagnostic code phenotypes, and from -1 to 0.3 for the association between feature embeddings and procedure report phenotypes, a decrease in magnitude in both cases. These results indicate that the proposed Debias-CLR framework yielded fairer models for gender and ethnicity. Finally, I tested the representativeness of Debias-CLR by measuring its performance on three downstream tasks (predicting length of stay, HFrEF, and HFpEF) and by implementing feature regularization using a cutout strategy. I refer to the debiased framework with feature regularization as Debias-CLR-R. I found that as the proportion of training data increased, Debias-CLR reduced the SC-WEAT effect size, and Debias-CLR-R reduced it even further, indicating a fairer model with feature regularization. Additionally, using feature representations obtained through Debias-CLR or Debias-CLR-R to predict length of stay and type of heart failure (HFrEF or HFpEF) did not reduce predictive performance compared with using the raw embeddings before debiasing. Instead, prediction on these downstream tasks improved even when the outcomes were unbalanced, as shown by the values of the Matthews Correlation Coefficient and Cohen’s Kappa score. Debiased representations also reduced the computational time complexity of predicting the downstream tasks with a linear classifier such as logistic regression. I therefore concluded that the proposed bias mitigation framework led to representations that are both fairer and representative.
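To illustrate why the Matthews Correlation Coefficient and Cohen’s Kappa are informative for unbalanced outcomes, the following is a minimal sketch for binary labels using the standard confusion-matrix formulas. The helper names and the toy labels are hypothetical, and returning 0.0 when a score is undefined is a convention assumed here, not something taken from this work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the denominator is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def kappa_binary(y_true, y_pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Assumed convention: report 0.0 when chance agreement is 1.
    return 0.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)


# Hypothetical unbalanced outcome: 90 negatives and 10 positives,
# scored against a classifier that always predicts the majority class.
y_true = [0] * 90 + [1] * 10
majority_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, majority_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
print(f"MCC:      {mcc_binary(y_true, majority_pred):.2f}")
print(f"kappa:    {kappa_binary(y_true, majority_pred):.2f}")
```

In this toy case the majority-class predictor reaches high accuracy while both chance-corrected scores stay at zero, which is why these metrics are reported alongside accuracy when outcomes are unbalanced.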