Document Type

Article

Publication Date

5-1-2021

City

Corvallis

State

OR

Abstract

Machine learning algorithms have become popular tools for automated classification of text; however, their performance varies and depends on several factors. We examined how a subjective labeling process based on a human factors taxonomy influences human, as well as automated, classification of safety incident reports from aviation. To evaluate these effects, we trained a machine learning classifier on a subset of 17,253 incident reports from the NASA Aviation Safety Reporting System using multi-label classification, and collected labels from six human annotators for a representative subset of 400 incident reports each, for a total of 2,400 individual annotations. Results showed that, in general, the reliability of human annotation for the set of incident reports selected in this study was comparatively low. Performance of machine learning annotation followed patterns of human agreement on labels. Suggestions for improving the data collection and labeling process are provided.
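
The sketch below illustrates, in broad strokes, the kind of multi-label text classification and per-label agreement check the abstract describes; it is not the authors' pipeline. The incident narratives, taxonomy labels, model choice (TF-IDF with one-vs-rest logistic regression), and annotator judgments are hypothetical placeholders, and the ASRS data itself is not reproduced here.

    # Minimal multi-label classification sketch (assumed setup, not the study's code).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import cohen_kappa_score
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    # Toy incident narratives, each with one or more hypothetical human-factors labels.
    reports = [
        "Crew fatigue contributed to a missed checklist item during descent.",
        "Miscommunication with ATC led to a runway incursion.",
        "Distraction in the cockpit caused an altitude deviation.",
        "Unclear handoff between controllers resulted in a loss of separation.",
    ]
    labels = [
        ["fatigue", "procedural_error"],
        ["communication"],
        ["distraction"],
        ["communication", "procedural_error"],
    ]

    # Binarize label sets so each taxonomy category becomes its own output column.
    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)

    # TF-IDF features with an independent binary classifier per label.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(reports)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X, y)

    # Predict label sets for a new narrative.
    new_report = ["Fatigued crew misread the clearance from ATC."]
    pred = clf.predict(vectorizer.transform(new_report))
    print(mlb.inverse_transform(pred))

    # Annotation reliability can be checked per label, e.g. Cohen's kappa between
    # two annotators' binary judgments for one category (values are made up).
    annotator_a = [1, 0, 0, 1, 1, 0]
    annotator_b = [1, 0, 1, 1, 0, 0]
    print(cohen_kappa_score(annotator_a, annotator_b))

In a multi-label setting like this, each taxonomy category is predicted independently, so both classifier performance and annotator agreement can be examined label by label, which is consistent with the per-label patterns the abstract reports.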

