Document Type
Article
Publication Date
1-1-2024
Identifier/URL
42356345 (Pure)
Abstract
Monitoring public sentiment via social media is potentially helpful during health crises such as the COVID-19 pandemic. However, traditional frequency-based, data-driven neural network-based approaches can miss newly relevant content due to the evolving nature of language in a dynamically evolving environment. Human-curated symbolic knowledge sources, such as lexicons for standard language and slang terms, can potentially elevate social media signals in evolving language. We introduce a neurosymbolic method that integrates neural networks with symbolic knowledge sources, enhancing the detection and interpretation of mental health-related tweets relevant to COVID-19. Our method was evaluated using a corpus of large datasets (approximately 12 billion tweets, 2.5 million subreddit data, and 700k news articles) and multiple knowledge graphs. This method dynamically adapts to evolving language, outperforming purely data-driven models with an F1 score exceeding 92%. This approach also showed faster adaptation to new data and lower computational demands than fine-tuning pre-trained large language models (LLMs). This study demonstrates the benefit of neurosymbolic methods in interpreting text in a dynamic environment for tasks such as health surveillance.
Repository Citation
Khandelwal, V.,
Gaur, M.,
Kursuncu, U.,
Shalin, V. L.,
& Sheth, A. P.
(2024). A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19. arXiv, 959-968.
https://corescholar.libraries.wright.edu/psychology/642
DOI
10.48550/arXiv.2411.0716
Comments
Pre-Print.
This work is licensed under CC BY 4.0

Accepted for publication in the 2024 IEEE International Conference on Big Data (BigData): 10.1109/BigData62323.2024.10825174