Using Natural Language Processing and Machine Learning for Analyzing Clinical Notes in Sickle Cell Disease Patients
Tanvi Banerjee (Advisor), Michelle Cheatham (Committee Member), Mateen Rizki (Committee Member)
Master of Science (MS)
Sickle Cell Disease (SCD) is a hereditary disorder of the red blood cells that can lead to excruciating pain episodes. SCD causes normal red blood cells to distort into a sickle shape. The distorted cells become inflexible and stick to the walls of the blood vessels, obstructing the free flow of blood and eventually depriving tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person's spleen, brain, kidneys, eyes, and bones. Sickling of red blood cells (RBCs) can be triggered by a number of conditions such as dehydration, acidity, low oxygen levels, stress, and changes in temperature. There is no specific medication for a pain crisis, and the signs and symptoms vary from person to person, which makes it difficult both to understand the disease and to provide a common treatment. It is believed that 90,000 to 100,000 Americans are affected by SCD. Numerous studies have sought to better understand the disease and to predict pain crises and pain levels. These studies help people mitigate or prevent pain crises by taking precautions. However, no study has used clinical notes to predict pain score and pain sentiment. Clinical notes provide patient-specific information, including procedures and medication, and can therefore help in predicting scores accurately. Our study focuses on four research problems using SCD data: patient informative, pain informative, pain sentiment, and pain score. Many notes are taken for a patient during hospitalization, but only a few provide useful information; the patient informative and pain informative tasks therefore help healthcare professionals identify, among all the clinical notes maintained, those that provide valuable information. The pain sentiment and pain score tasks predict the change in pain and the pain level for a particular note.
Our study experimented with two feature sets: first, features obtained from cTAKES, a Natural Language Processing (NLP) tool, and second, features obtained from the text using NLP techniques. Four supervised machine learning models, namely Logistic Regression, Random Forest, Support Vector Machines, and Multinomial Naive Bayes, were built on these feature sets. The results show that the cTAKES features perform well on all four research problems, with F1 scores ranging from 0.40 to 0.86. This indicates that there is promise in using NLP techniques on clinical notes as a means to better understand pain in SCD patients.
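To make the modeling setup concrete, the following is a minimal sketch of one of the four model families named above, a Multinomial Naive Bayes classifier over bag-of-words counts, together with the F1 score used for evaluation. It is not the thesis implementation: the example notes, the labels `informative` and `other`, the whitespace tokenizer, and all function and class names are hypothetical and chosen only for illustration, and no cTAKES features are used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                num = self.word_counts[c][tok] + self.alpha
                score += math.log(num / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def f1_score(y_true, y_pred, positive):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy notes, invented for illustration only (not thesis data).
train_notes = [
    "patient reports severe pain in chest",
    "pain score decreased after morphine",
    "patient complains of pain in legs",
    "family visited patient in afternoon",
    "diet order updated for patient",
    "patient resting comfortably watching television",
]
train_labels = ["informative"] * 3 + ["other"] * 3

model = MultinomialNB().fit(train_notes, train_labels)
test_notes = ["patient reports pain in back", "diet updated in afternoon"]
test_labels = ["informative", "other"]
predictions = [model.predict(note) for note in test_notes]
score = f1_score(test_labels, predictions, positive="informative")
```

In practice, a library implementation with cross-validation would be preferable to this hand-written version; the sketch only shows how note text becomes counts, how the classifier scores each class, and how F1 is derived from precision and recall.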
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2018, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.