Publication Date

2021

Document Type

Thesis

Committee Members

Yong Pei, Ph.D. (Committee Co-Chair); Paul J. Hershberger, Ph.D. (Committee Co-Chair); Jack S. Jean, Ph.D. (Committee Member)

Degree Name

Master of Science in Computer Engineering (MSCE)

Abstract

Development of professional communication skills, such as motivational interviewing, often requires experiential learning through expert instructor-guided role-plays between the trainee and a standard patient/actor. Due to the growing demand for such skills in practices, e.g., for health care providers in the management of mental health challenges, chronic conditions, substance misuse disorders, etc., there is an urgent need to improve the efficacy and scalability of such role-play based experiential learning, which are often bottlenecked by the time-consuming performance assessment process. WSU is developing ReadMI (Real-time Assessment of Dialogue in Motivational Interviewing) to address this challenge, a mobile AI solution aiming to provide automated performance assessment based on ASR and NLP. The main goal of this thesis research is to investigate current commercially available speaker diarization capabilities and evaluate their performance in separating the speeches between the trainee and the standard patient/actor in an in-person role-play training environment where the crosstalk could interfere with the operation and performance of ReadMI. Specifically, this thesis research has: 1.) identified the major commercially-available speaker diarization systems, such as those from Google, Amazon, IBM, and Rev.ai; 2.) designed and implemented corresponding evaluation systems that integrate these commercially available cloud services for operating in the in-person role-play training environments; and, 3.) completed an experimental study that evaluated and compared the performance of the speaker diarization services from Google and Amazon. The main finding of this thesis is that the current speaker diarization capabilities alone are not able to provide sufficient performance for our particular use case when integrating them into ReadMI for operating in in-person role-play training environments. But this thesis research potentially provides a clear baseline reference to future developers for integrating future speaker diarization capabilities into similar applications.

Page Count

45

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2021


Share

COinS