Kno.e.sis Publications

Combining Statistical Language Models via the Latent Maximum Entropy Principle

Shaojun Wang, Wright State University - Main Campus
Dale Schuurmans
Fuchun Peng
Yunxin Zhao

Document Type

Article

Publication Date

9-2005

Abstract

We present a unified probabilistic framework for statistical language modeling that can simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Our approach is based on a recent statistical inference principle we have proposed, the latent maximum entropy principle, which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which allow only observed features to be modeled. The ability to conveniently incorporate hidden variables allows us to extend the expressiveness of language models while reducing the need to preprocess the data to obtain explicitly observed features. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then use these techniques to combine two standard forms of language models: local lexical models (Markov N-gram models) and global document-level semantic models (probabilistic latent semantic analysis). Our experimental results on the Wall Street Journal corpus show an 18.5% reduction in perplexity compared to the baseline trigram model with Good-Turing smoothing.
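The abstract reports its main result as a relative reduction in perplexity. The sketch below shows the standard definition of perplexity and what a percentage reduction means. Perplexity is the exponentiated average negative log-probability that a model assigns to the test tokens. This is not the authors' evaluation code. The function names and the baseline perplexity of 200 are hypothetical and chosen only for illustration. They are not figures from the paper.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N test tokens.

    token_probs[i] is the probability the model assigns to the i-th test
    token given its context (for a trigram model, the two preceding words).
    Hypothetical helper, for illustration only.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 for p in token_probs):
        # Smoothing (e.g. Good-Turing) exists precisely to avoid zero probabilities.
        raise ValueError("probabilities must be positive")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def relative_reduction(baseline_pp: float, new_pp: float) -> float:
    """Percentage reduction of new_pp relative to baseline_pp (hypothetical helper)."""
    return 100.0 * (baseline_pp - new_pp) / baseline_pp


# A model that gives every token probability 1/200 has perplexity 200,
# i.e. it is as uncertain as a uniform choice among 200 words.
base = perplexity([1 / 200] * 10)
print(round(base, 6))  # 200.0

# Made-up numbers: if a baseline had perplexity 200, an 18.5% reduction
# would bring it down to 163.
print(relative_reduction(200.0, 163.0))  # 18.5
```

Because perplexity is a geometric mean of inverse token probabilities, a given percentage reduction corresponds to the combined model assigning consistently higher probability to the held-out text than the baseline does.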

Repository Citation

Wang, S., Schuurmans, D., Peng, F., & Zhao, Y. (2005). Combining Statistical Language Models via the Latent Maximum Entropy Principle. Machine Learning, 60 (1-3), 229-250.
https://corescholar.libraries.wright.edu/knoesis/108

DOI

10.1007/s10994-005-0928-7
