Kno.e.sis Publications

Language and Task Independent Text Categorization with Simple Language Models

Fuchun Peng
Dale Schuurmans
Shaojun Wang, Wright State University - Main Campus

Document Type

Conference Proceeding

Publication Date

2003

Abstract

We present a simple method for language independent and task independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information theoretic principles and achieves effective performance across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To demonstrate the language and task independence of the proposed technique, we present experimental results on several languages--Greek, English, Chinese and Japanese--in several text categorization problems--language identification, authorship attribution, text genre classification, and topic detection. Our experimental results show that the simple approach achieves state of the art performance in each case.
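To make the idea concrete, here is a minimal sketch of classification with character-level n-gram language models: one model is trained per category, and a document is assigned to the category whose model gives it the lowest per-character cross-entropy. The abstract does not state the n-gram order or the smoothing method, so the trigram order, the add-one smoothing, and all names here (`CharNgramClassifier`, `fit`, `cross_entropy`, `predict`) are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """Toy classifier: one character n-gram model per category.

    Assumption: add-one smoothing, chosen only for brevity.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngrams = defaultdict(Counter)    # category -> n-gram counts
        self.contexts = defaultdict(Counter)  # category -> (n-1)-gram context counts
        self.vocab = set()                    # characters seen in training

    def _pad(self, text):
        # Pad with a start symbol so the first characters have a full context.
        return "\x02" * (self.n - 1) + text

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            padded = self._pad(text)
            self.vocab.update(text)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngrams[label][context + padded[i]] += 1
                self.contexts[label][context] += 1
        return self

    def cross_entropy(self, text, label):
        # Average negative log2-probability per character under one category model.
        padded = self._pad(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            numerator = self.ngrams[label][context + padded[i]] + 1
            denominator = self.contexts[label][context] + v
            total -= math.log2(numerator / denominator)
        return total / max(len(text), 1)

    def predict(self, text):
        # Choose the category whose model finds the text least surprising.
        return min(self.ngrams, key=lambda label: self.cross_entropy(text, label))


# Hypothetical toy data for a language identification task.
train_texts = [
    "the cat sat on the mat",
    "the dog and the bird",
    "el gato duerme en la casa",
    "el perro come en la calle",
]
train_labels = ["en", "en", "es", "es"]

clf = CharNgramClassifier(n=3).fit(train_texts, train_labels)
print(clf.predict("the bird sat"))
print(clf.predict("el gato come"))
```

Because the model works on raw characters, the same code applies without tokenization or feature selection to any language or labeling scheme; only the training texts and labels change between tasks such as language identification and authorship attribution.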

Comments

Presented at the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, Alberta, Canada, May 27-June 1, 2003.

Repository Citation

Peng, F., Schuurmans, D., & Wang, S. (2003). Language and Task Independent Text Categorization with Simple Language Models. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 1, 110-117.
https://corescholar.libraries.wright.edu/knoesis/1019

DOI

10.3115/1073445.1073470
