Kno.e.sis Publications

A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Mohammadreza Rezvan, Wright State University - Main Campus
Saeedeh Shekarpour
Lakshika Balasuriya, Wright State University - Main Campus
Valerie L. Shalin, Wright State University - Main Campus
Amit P. Sheth, Wright State University - Main Campus

Document Type

Conference Proceeding

Publication Date

2018

Abstract

A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets for the five types of harassment content and is available on the Git repository.

Repository Citation

Rezvan, M., Shekarpour, S., Balasuriya, L., Shalin, V. L., & Sheth, A. P. (2018). A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research. WebSci '18: Proceedings of the 10th ACM Conference on Web Science, 33-36.
https://corescholar.libraries.wright.edu/knoesis/1151

DOI

10.1145/3201064.3201103

