Kno.e.sis Publications

Language Independent Authorship Attribution using Character Level Language Models

Document Type

Conference Proceeding

Publication Date

4-2003

Abstract

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state-of-the-art performance in each of these cases. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
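
The general idea in the abstract can be illustrated with a minimal sketch: train one character-level n-gram model per candidate author, then attribute a disputed text to the author whose model assigns it the lowest cross-entropy. This is not the authors' implementation. The function names (`train_char_ngram`, `cross_entropy`, `attribute`), the trigram default, and the add-one smoothing are all assumptions made for illustration; the abstract does not state which n-gram order or smoothing method the paper uses.

```python
import math
from collections import Counter


def train_char_ngram(text, n=3):
    """Count character n-grams and their (n-1)-character contexts.

    Hypothetical helper: the text is left-padded with spaces so that
    every character has a full-length context.
    """
    padded = " " * (n - 1) + text
    ngrams = Counter(padded[i:i + n] for i in range(len(text)))
    contexts = Counter(padded[i:i + n - 1] for i in range(len(text)))
    return ngrams, contexts, set(text)


def cross_entropy(model, text, n=3):
    """Average bits per character of `text` under `model`.

    Uses add-one smoothing (an assumption chosen for brevity).
    `text` must be non-empty.
    """
    ngrams, contexts, alphabet = model
    vocab = len(alphabet) + 1  # one extra slot shared by unseen characters
    padded = " " * (n - 1) + text
    total = 0.0
    for i in range(len(text)):
        gram = padded[i:i + n]
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab)
        total -= math.log2(prob)
    return total / len(text)


def attribute(models, text, n=3):
    """Return the author whose model gives `text` the lowest cross-entropy."""
    return min(models, key=lambda author: cross_entropy(models[author], text, n))
```

Because the models operate on raw characters, a sketch like this needs no tokenizer or language-specific feature extraction, which is the property the abstract highlights. The same value of `n` should be used for training and scoring.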

Comments

Presented at the 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary, April 12-17, 2003.

Repository Citation

Peng, F., Schuurmans, D., Wang, S., & Keselj, V. (2003). Language Independent Authorship Attribution using Character Level Language Models. Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, 1, 267-274.
https://corescholar.libraries.wright.edu/knoesis/1017

DOI

10.3115/1067807.1067843
