Publication Date

2022

Document Type

Dissertation

Committee Members

Michael Raymer, Ph.D. (Committee Co-Chair); Tim Crawford, Ph.D. (Committee Co-Chair); Reza Sadeghi, Ph.D. (Committee Member); Lingwei Chen, Ph.D. (Committee Member); Mahdiyeh Zabihimayvan, Ph.D. (Committee Member); Thomas Wischgoll, Ph.D. (Committee Member)

Degree Name

Doctor of Philosophy (PhD)

Abstract

This dissertation focuses on disambiguation of language use on Twitter about drug use, consumption types of drugs, drug legalization, ontology-enhanced approaches, and prediction analysis of data-driven by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improve knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of the drug’s legalization and the sentiment analysis of drug consumption on Twitter. We collected 7.5 million data from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. In addition, we aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. In addition, the study included aim four, a use case study defined by tweets content analyzing PLWH, medication patterns, and identifying keyword trends via Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University’s Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K data from February 2022 to July 2022 with the U.S.-based HIV knowledge domain recruited via the Twitter API streaming platform. For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent NLP and semantic web advances have contributed to extending the domain knowledge of medical terms. These techniques required a bag of seeds for medical knowledge discovery. Various initiate seeds create irrelevant data to the noise and negatively impact the prediction analysis performance. The methodology of aim one, PatRDis classifier, applied for noisy and ambiguous issues, and aim two, DsOn Ontology model, applied for semantic parsing and enriching the online medical to classify the data for HIV care medications engagement and symptom detection from Twitter. By applying the methodology of aims 2 and 3, we solved the challenges of ambiguity and explored more than 1500 cannabis and cannabinoid slang terms. Sentiments measured preceding the election, such as states with high levels of positive sentiment preceding the election who were engaged in enhancing their legalization status. we also used the same dataset for prediction analysis for marijuana legalization and consumption trend analysis (Ohio public polling data). In Aim 4, we applied three experiments, ensemble-learning, the RNN-LSM, the NNBERT-CNN models, and five techniques to determine the tweets associated with medication adherence and HIV symptoms. The long short-term memory (LSTM) model and the CNN for sentence classification produce accurate results and have been recently used in NLP tasks. CNN models use convolutional layers and maximum pooling or max-overtime pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between word sequences hence are better used for text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT techniques with an additional attention mechanism. We trained the model using NNBERT to evaluate the proposed model's performance. The test results showed that the proposed models produce more accurate classification results, and BERT obtained higher recall and F1 scores than MLP or LSTM models. In addition, We developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence.

Page Count

101

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2022

Copyright

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

ORCID ID

0000-0003-3661-6766

Download

Request Accessible Version

Included in

Computer Engineering Commons, Computer Sciences Commons

COinS

Browse all Theses and Dissertations

Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter

Publication Date

Document Type

Committee Members

Degree Name

Abstract

Page Count

Department or Program

Year Degree Awarded

Copyright

Creative Commons License

ORCID ID

Included in

Search

Browse

About

Browse all Theses and Dissertations

Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter

Author

Publication Date

Document Type

Committee Members

Degree Name

Abstract

Page Count

Department or Program

Year Degree Awarded

Copyright

Creative Commons License

ORCID ID

Included in

Share

Search

Browse

About