Kno.e.sis Publications

A Scalable Distributed Syntactic, Semantic and Lexical Language Model

Ming Tan, Wright State University - Main Campus
Wenli Zhou, Wright State University - Main Campus
Lei Zheng, Wright State University - Main Campus
Shaojun Wang, Wright State University - Main Campus

Document Type

Article

Publication Date

2012

Abstract

This paper presents an attempt at building a large-scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by the BLEU score and the "readability" of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.

Repository Citation

Tan, M., Zhou, W., Zheng, L., & Wang, S. (2012). A Scalable Distributed Syntactic, Semantic and Lexical Language Model. Computational Linguistics, 38 (3), 631-671.
https://corescholar.libraries.wright.edu/knoesis/1009

DOI

10.1162/COLI_a_00107

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons
