Publication Date

2023

Document Type

Thesis

Committee Members

Krishnaprasad Thirunarayan, Ph.D. (Advisor); Shu Schiller, Ph.D. (Committee Member); Michael Raymer, Ph.D. (Committee Member)

Degree Name

Master of Science (MS)

Abstract

Obtaining accurate inferences from deep neural networks is difficult when models are trained on instances with conflicting labels. Algorithmic recognition of online hate speech illustrates this problem. Because no human annotator is perfectly reliable, multiple annotators evaluate and label each online post in a corpus. Limitations of the labeling scheme, differences in annotators' beliefs, and limits to annotators' honesty and carefulness cause some labels to disagree, making decisive and accurate inferences less likely. Some practical applications, such as social research, can tolerate a degree of indecisiveness; however, an online platform using an indecisive classifier for automated content moderation could create more problems than it solves. Disagreements can be addressed during training by using the label a majority of annotators assigned (majority voting), by training only on unanimously annotated instances (clean filtering), or by representing training labels as probabilities (soft labeling). This study shows that clean filtering occasionally outperforms majority voting and that soft labeling outperforms both.
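
The three aggregation strategies named in the abstract can be illustrated with a minimal sketch. The sketch below assumes each post's annotations arrive as a list of binary labels (1 for hate speech, 0 otherwise); the function names and the binary labeling scheme are illustrative assumptions, not the thesis's actual implementation.

```python
from collections import Counter

def majority_vote(labels: list[int]) -> int:
    """Return the label assigned by the most annotators.

    Assumes an odd number of annotators so no tie occurs.
    """
    return Counter(labels).most_common(1)[0][0]

def clean_filter(dataset: list[list[int]]) -> list[list[int]]:
    """Keep only instances whose annotators were unanimous."""
    return [labels for labels in dataset if len(set(labels)) == 1]

def soft_label(labels: list[int]) -> float:
    """Represent the label as the fraction of annotators choosing 1."""
    return sum(labels) / len(labels)

if __name__ == "__main__":
    annotations = [1, 1, 0]                       # three annotators disagree
    print(majority_vote(annotations))             # -> 1
    print(soft_label(annotations))                # -> 0.666...
    print(clean_filter([[1, 1, 0], [0, 0, 0]]))   # -> [[0, 0, 0]]
```

Under soft labeling, the fractional value would serve as the target probability during training (for example, with a cross-entropy loss against a 0.667 target) rather than being rounded to a hard class.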

Page Count

58

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2023

ORCID ID

0000-0003-3332-4485
