Publication Date

2017

Document Type

Thesis

Committee Members

Amit P. Sheth (Thesis Director), Mateen M. Rizki (Department Chair), Keke Chen (Committee Member), Brandon Minnery (Committee Member), Barry Milligan (Committee Member)

Degree Name

Master of Science (MS)

Abstract

Collecting diversified opinions is the key to achieving "the Wisdom of Crowds". In this work, we propose a novel multi-view clustering method to group the crowd so that diversified opinions can be effectively sampled from different groups of people. Clustering is the process of dividing input data into subsets, where the elements (entities) in each subset are considered related by some similarity measure. For example, a set of social media users can be clustered by location or common interests. However, real-world data is often best represented by multiple views/dimensions. For example, a set of social media users has a friend/follower network as well as a conversation network (distinct from the follower network). Multiple views enable a better understanding of the data: cross-verification across views improves the accuracy of what is learned, and integrating the views improves clustering performance. Multi-view clustering enables this. Clustering quality, clustering agreement (consensus), and scalability are the three essential qualities for achieving higher correspondence between the clusters and the real underlying groups in multi-view clustering. Existing algorithms either lack scalability or achieve cluster convergence (consistent clusters across the views) very slowly. Most existing multi-view clustering algorithms, including recent ones, rely on spectral clustering, which ensures higher accuracy but is computationally costly because of the eigenvector computation.
To address this gap, in this thesis we propose a clustering mechanism based on a co-training approach that achieves the three qualities. The two main contributions of our work are as follows: (1) a learning method that uses power-iteration clustering to cluster a single data view, and (2) an efficient and scalable update method that uses the cluster label information to update the other data views iteratively, achieving convergence (clustering agreement) and cluster quality. The proposed method is evaluated on two real-world datasets to show that it outperforms existing approaches in terms of clustering quality and consensus. We evaluate the clustering quality in the context of a Wisdom of Crowds application. Specifically, we use clustering to identify groups of similar users (crowd members) based on their social media conversations (Tweets) related to a particular topic, in this case fantasy sports (Fantasy Premier League soccer in particular). We then form virtual groups of diverse and non-diverse users based on the clusters identified. Our results show that diverse crowds outperform non-diverse crowds in a typical fantasy sports task (picking a team captain), thereby validating the quality of our clusters.
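
As a rough illustration of the single-view step named in contribution (1), the following is a minimal sketch of generic power-iteration clustering. It is not the thesis's implementation. It covers one view only and omits the co-training update across views. The function names, the tolerance, the toy affinity matrix, and the simple 1-D k-means are all assumptions made for this example, and every node is assumed to have a positive degree.

```python
def pic_embedding(affinity, max_iter=100, tol=1e-5):
    """Truncated power iteration on the row-normalised affinity matrix."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]  # assumed > 0 for every node
    # Row-normalised affinity matrix W = D^-1 A.
    W = [[a / degree[i] for a in affinity[i]] for i in range(n)]
    total = sum(degree)
    v = [d / total for d in degree]  # degree-based start vector
    prev_delta = None
    for _ in range(max_iter):
        nxt = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in nxt)
        nxt = [x / norm for x in nxt]  # L1 normalisation
        delta = [abs(x - y) for x, y in zip(nxt, v)]
        v = nxt
        # Stop early once the per-step change itself stops changing,
        # i.e. before the vector flattens out to a constant.
        if prev_delta is not None and max(
            abs(d - p) for d, p in zip(delta, prev_delta)
        ) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, n_iter=50):
    """Small deterministic 1-D k-means (illustrative helper)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: abs(x - centers[j])) for x in values
        ]
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = sum(members) / len(members)
    return labels


def power_iteration_clustering(affinity, k):
    """Cluster one data view from its pairwise affinity matrix."""
    return kmeans_1d(pic_embedding(affinity), k)


# Toy view: two groups of users with strong within-group affinity
# and weak cross-group affinity (made-up numbers).
groups = [0, 0, 0, 0, 1, 1, 1]
affinity = [
    [0.0 if i == j else (1.0 if groups[i] == groups[j] else 0.01)
     for j in range(len(groups))]
    for i in range(len(groups))
]
labels = power_iteration_clustering(affinity, k=2)
```

The sketch needs only repeated matrix-vector products rather than a full eigendecomposition, which is the cost that the abstract attributes to spectral clustering.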

Page Count

53

Department or Program

Department of Computer Science and Engineering

Year Degree Awarded

2017

ORCID ID

http://orcid.org/0000-0001-8142-5964

