Publication Date
2017
Document Type
Thesis
Committee Members
Amit P. Sheth (Thesis Director), Mateen M. Rizki (Department Chair), Keke Chen (Committee Member), Brandon Minnery (Committee Member), Barry Milligan (Committee Member)
Degree Name
Master of Science (MS)
Abstract
Collecting diversified opinions is the key to achieving "the Wisdom of Crowds". In this work, we propose a novel multi-view clustering method to group the crowd so that diversified opinions can be effectively sampled from different groups of people. Clustering is the process of dividing input data into subsets such that the elements (entities) within each subset are related by some similarity measure. For example, a set of social media users can be clustered by their locations or common interests. However, real-world data is often best represented by multiple views (dimensions). For example, a set of social media users has a friend/follower network as well as a conversation network, which differs from the follower network. Multiple views enable a better understanding of the data: cross-verification across views improves knowledge accuracy, and integrating the views improves performance. Multi-view clustering makes both possible. Clustering quality, clustering agreement (consensus), and scalability are the three essential qualities for achieving higher correspondence between the clusters and the real underlying groups in multi-view clustering. Existing algorithms either lack scalability or reach cluster convergence (consistent clusters across the views) very slowly. Most existing and recent multi-view clustering algorithms use spectral clustering, which is known for high accuracy but is computationally costly because it requires eigenvector computation.
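To make the eigenvector cost concrete, the embedding step of normalized spectral clustering (in the style of Ng, Jordan, and Weiss) can be sketched as below. This is a minimal illustration, assuming one view is given as a dense, symmetric, nonnegative NumPy affinity matrix with no isolated users; the function name `spectral_embedding` is hypothetical, and the final k-means step on the rows is omitted.

```python
import numpy as np


def spectral_embedding(A, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} A D^{-1/2} (Ng-Jordan-Weiss style)."""
    d = A.sum(axis=1)                                   # node degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # normalized affinity matrix
    # Dense symmetric eigendecomposition: O(n^3) time and O(n^2) memory,
    # which is the step that limits spectral clustering on large views.
    eigvals, eigvecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    U = eigvecs[:, -k:]                                 # eigenvectors of the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)
```

For a view made of two disconnected groups of users, rows from the same group map to the same unit vector and rows from different groups are orthogonal, so any subsequent k-means step separates the groups trivially; the expensive part is the `eigh` call, not the clustering of the rows.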
To address this gap, we propose a clustering mechanism based on a co-training approach that achieves all three qualities. The two main contributions of our work are as follows: (1) a learning method that uses power-iteration clustering to cluster a single data view, and (2) an efficient and scalable update method that uses cluster label information to update the other data views iteratively, achieving convergence (clustering agreement) and cluster quality. The proposed method is evaluated on two real-world datasets, and the results show that it outperforms existing approaches in terms of clustering quality and consensus. We evaluate the clustering quality in the context of a Wisdom of Crowds application. Specifically, we use clustering to identify groups of similar users (crowd members) based on their social media conversations (Tweets) about a particular topic, in this case fantasy sports (specifically, Fantasy Premier League soccer). We then form virtual groups of diverse and non-diverse users based on the identified clusters. Our results show that diverse crowds outperform non-diverse crowds in a typical fantasy sports task (picking a team captain), thereby validating the quality of our clusters.
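The single-view power-iteration clustering (PIC) step in contribution (1) can be sketched as follows. This is a minimal sketch, assuming dense NumPy affinity matrices and following the general PIC formulation of Lin and Cohen (normalized-degree start vector, early stop once the per-iteration change stops changing, then k-means on the 1-D embedding); it is not the thesis's implementation. The abstract does not specify the label-based update of contribution (2), so `reweight_view` (a multiplicative boost controlled by a hypothetical parameter `alpha`) and the `co_train` loop are illustrative assumptions, as are all function names.

```python
import numpy as np


def power_iteration_embedding(A, max_iter=1000):
    """One-dimensional PIC embedding of a nonnegative n x n affinity matrix A."""
    n = A.shape[0]
    degree = A.sum(axis=1)
    W = A / degree[:, None]              # row-normalized affinity, W = D^-1 A
    v = degree / degree.sum()            # start from the normalized degree vector
    tol = 1e-5 / n
    delta_prev = np.zeros(n)
    for _ in range(max_iter):
        v_next = W @ v
        v_next /= np.abs(v_next).sum()   # L1 normalization
        delta = np.abs(v_next - v)       # per-node change ("velocity")
        v = v_next
        # Stop once the velocity has stopped changing ("acceleration" ~ 0),
        # before v collapses to the constant vector.
        if np.abs(delta - delta_prev).max() < tol:
            break
        delta_prev = delta
    return v


def kmeans_1d(x, k, n_iter=100):
    """Lloyd's k-means on a 1-D array with a deterministic, evenly spread init."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([x[labels == c].mean() if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def pic(A, k):
    """Cluster a single view: PIC embedding followed by 1-D k-means."""
    return kmeans_1d(power_iteration_embedding(A), k)


def rand_agreement(a, b):
    """Fraction of user pairs on which two labelings agree (Rand index)."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


def reweight_view(A, labels, alpha):
    """HYPOTHETICAL update: boost affinities of pairs co-clustered in the other view."""
    same = labels[:, None] == labels[None, :]
    return A * (1.0 + alpha * same)


def co_train(A1, A2, k, alpha=1.0, max_rounds=10):
    """Alternate between two views until their clusterings agree or rounds run out."""
    labels1, labels2 = pic(A1, k), pic(A2, k)
    for _ in range(max_rounds):
        if rand_agreement(labels1, labels2) == 1.0:
            break
        labels2 = pic(reweight_view(A2, labels1, alpha), k)
        labels1 = pic(reweight_view(A1, labels2, alpha), k)
    return labels1, labels2, rand_agreement(labels1, labels2)
```

PIC replaces the eigendecomposition with repeated matrix-vector products, which is the source of its scalability. One known caveat of the degree-vector start: on a regular graph (every user with the same degree) the start vector is already constant, so the iteration yields no cluster signal and a random start would be needed instead.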
Page Count
53
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2017
Copyright
© 2017, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.
ORCID ID
http://orcid.org/0000-0001-8142-5964