Guozhu Dong, Ph.D. (Advisor); Keke Chen, Ph.D. (Committee Member); Hemant Purohit, Ph.D. (Committee Member); Michael Raymer, Ph.D. (Committee Member); Krishnaprasad Thirunarayan, Ph.D. (Committee Member)
Doctor of Philosophy (PhD)
Classification is an important branch of machine learning that impacts many areas of modern life. Many classification algorithms (classifiers, for short) have been developed, and they differ widely in sophistication and classification accuracy; classification problems likewise differ widely in hardness and complexity. Practitioners of classification modeling need a better understanding of these algorithms in order to select the best algorithm for a given classification problem, and researchers need new insight into where given classifiers are weak and how they can be improved by correcting their classification errors. This dissertation introduces new tools and concepts for analyzing classifier weakness and provides new insights on classifier weakness and classifier error correctability. Three tools are introduced to discover such insights. (i) The primary tool is a novel algorithm called Pattern-Aided Mixed-Type Modeling (PAMM). This tool produces a structural model revealing the shape and structure of a classifier's error space, opening new analytical possibilities. (ii) Based on the structural model thus produced, new weakness metrics are introduced that incorporate structural properties of the error space and the correctability of classification errors. (iii) This study uses Corrective Method Sets (CMS), which are sets of popular, simple classifiers, to characterize a classifier's weakness by how much of the classifier's errors the CMS can correct. Using these three tools, two families of valuable insights about 11 popular classifiers are obtained. (i) The 11 popular classifiers are ranked by how structured their error spaces are and how correctable their classification errors are, giving insight into classifier weakness and correctability. These rankings are also compared against purely accuracy-based classifier rankings, shedding light on the relationship between poor classifier accuracy and classifier error correctability. (ii) Top-ranked CMS of three types are provided: those applicable to all 11 classifiers, those applicable to each given classifier, and those applicable to each given classifier on data sets with certain characteristics. In summary, this dissertation offers insights on how many opportunities classifiers leave on the table and how easily their classification errors can be corrected using simple corrective methods.
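The CMS-based correctability idea described above can be illustrated with a minimal toy sketch: given a main classifier's errors on labeled data, correctability is the fraction of those errors that at least one simple classifier in the set predicts correctly. The data, threshold rules, and names below are hypothetical illustrations, not the dissertation's actual classifiers or metrics.

```python
# Hypothetical sketch of a CMS-style correctability measure.
# A "main" classifier's errors are checked against a small set of
# simple corrective rules; correctability = fraction of the main
# classifier's errors that at least one simple rule gets right.

# Toy 1-D labeled data: (feature, true label)
data = [(0.1, 0), (0.3, 0), (0.45, 1), (0.6, 1), (0.8, 1), (0.9, 0)]

# Main classifier: a single threshold rule (illustrative only)
main = lambda x: 0 if x < 0.5 else 1

# A hypothetical CMS: two simple threshold classifiers
cms = [
    lambda x: 0 if x < 0.4 else 1,   # simple rule A
    lambda x: 0 if x < 0.55 else 1,  # simple rule B
]

# Errors of the main classifier, then those some CMS member corrects
errors = [(x, y) for x, y in data if main(x) != y]
corrected = [(x, y) for x, y in errors if any(c(x) == y for c in cms)]
correctability = len(corrected) / len(errors)

print(f"errors: {len(errors)}, correctability: {correctability:.2f}")
# On this toy data the main rule errs on (0.45, 1) and (0.9, 0);
# rule A corrects the first, neither rule corrects the second,
# so correctability is 0.50.
```

The dissertation's CMS analysis operates on real classifiers and data sets; this sketch only shows the shape of the computation.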
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
Copyright 2021, all rights reserved. My ETD will be available under the "Fair Use" terms of copyright law.