Publication Date
2023
Document Type
Thesis
Committee Members
Krishnaprasad Thirunarayan, Ph.D. (Advisor); Shu Schiller, Ph.D. (Committee Member); Michael Raymer, Ph.D. (Committee Member)
Degree Name
Master of Science (MS)
Abstract
Obtaining accurate inferences from deep neural networks is difficult when models are trained on instances with conflicting labels. Algorithmic recognition of online hate speech illustrates this. No human annotator is perfectly reliable, so multiple annotators evaluate and label online posts in a corpus. Labeling scheme limitations, differences in annotators' beliefs, and limits to annotators' honesty and carefulness cause some labels to disagree. Consequently, decisive and accurate inferences become less likely. Some practical applications, such as social research, can tolerate some indecisiveness. However, an online platform using an indecisive classifier for automated content moderation could create more problems than it solves. Disagreements can be addressed in training by using the label a majority of annotators assigned (majority vote), by training only with unanimously annotated cases (clean filtering), or by representing training labels as probabilities (soft labeling). This study shows that clean filtering occasionally outperforms majority voting, and that soft labeling outperforms both.
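The three strategies named in the abstract can be sketched as follows. This is a minimal illustration, not code from the thesis; the toy annotations and function names are hypothetical, assuming a binary hate-speech labeling task where each post is rated by several annotators.

```python
# Hypothetical sketch of three label-aggregation strategies for
# disagreeing annotations (1 = hate speech, 0 = not hate speech).
# The posts and votes below are illustrative, not thesis data.
from collections import Counter

annotations = {
    "post_a": [1, 1, 1],  # unanimous agreement
    "post_b": [1, 1, 0],  # 2-1 disagreement
    "post_c": [0, 0, 0],  # unanimous agreement
}

def majority_vote(labels):
    """Hard label: the label assigned by the most annotators."""
    return Counter(labels).most_common(1)[0][0]

def clean_filter(all_annotations):
    """Keep only posts whose annotators agreed unanimously."""
    return {post: labels[0]
            for post, labels in all_annotations.items()
            if len(set(labels)) == 1}

def soft_label(labels):
    """Probabilistic label: fraction of annotators who chose 1."""
    return sum(labels) / len(labels)

hard_labels = {p: majority_vote(ls) for p, ls in annotations.items()}
clean_set = clean_filter(annotations)            # drops post_b entirely
soft_labels = {p: soft_label(ls) for p, ls in annotations.items()}
```

Under majority voting every post keeps a hard 0/1 label; clean filtering shrinks the training set to the unanimous cases; soft labeling retains every post but encodes the 2-1 split on `post_b` as a probability, which is the representation the study found most effective.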
Page Count
58
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2023
Copyright
Copyright 2023, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
ORCID ID
0000-0003-3332-4485