Publication Date
2017
Document Type
Thesis
Committee Members
Derek Doran (Committee Co-Chair), Amit Sheth (Committee Co-Chair), Krishnaprasad Thirunarayan (Committee Member)
Degree Name
Master of Science (MS)
Abstract
The crime and violence that street gangs introduce into neighborhoods are a growing epidemic in cities around the world. Today, over 1.4 million people, belonging to more than 33,000 gangs, are active in the United States, of whom 88% identify themselves as members of a street gang. With the recent popularity of social media, street gang members have established online presences coinciding with their physical occupation of neighborhoods. Recent studies report that approximately 45% of gang members participate in online offending activities such as threatening or harassing individuals, posting violent videos, or attacking someone on the street for something they said on social media platforms. Thus, their social media posts may be useful to social workers and law enforcement agencies to discover clues about recent crimes or to anticipate ones that may occur in a community. Finding these posts, however, requires a method to discover gang member social media profiles. This is a challenging task since gang members represent a very small population compared to the active social media user base. This thesis studies the problem of automatically identifying street gang member profiles on Twitter, a popular social media platform that street gang members commonly use to promote their online gang-related activities. It outlines a process to curate one of the largest sets of verifiable gang member Twitter profiles that have ever been studied. A review of these profiles establishes differences in the language, profile and cover images, YouTube links, and emoji shared on Twitter by gang members compared to the rest of the Twitter population.
Beyond earlier efforts in Twitter profile identification that use features derived from the profile and tweet text, this thesis uses additional heterogeneous sets of features from emoji usage, profile images, and links to YouTube videos reflecting gang-related music culture to solve the gang member profile identification problem. Features from this review are used to train a series of supervised machine learning classifiers, which are further improved by using word embeddings learned over a large corpus of tweets. Experimental results demonstrate that heterogeneous features enabled our classifiers to achieve low false positive rates and promising F1-scores.
Page Count
67
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2017
Copyright
Copyright 2017, some rights reserved. My ETD may be copied and distributed only for non-commercial purposes and may not be modified. All use must give me credit as the original author.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
ORCID ID
0000-0002-2884-4032