Publication Date
2011
Document Type
Thesis
Committee Members
Keke Chen (Committee Member), Shaojun Wang (Advisor), Xinhui Zhang (Committee Member)
Degree Name
Master of Science in Computer Engineering (MSCE)
Abstract
The language model is a crucial component of a statistical machine translation system. The basic language model is the N-gram model, which predicts the next word from the previous N-1 words. It has been used in state-of-the-art commercial machine translation systems for years. However, the N-gram model ignores the rich syntactic and semantic structure of natural languages. We propose a composite semantic N-gram language model that combines the probabilistic latent semantic analysis (PLSA) model with the N-gram model as a generative model. We implemented the proposed composite language model on a supercomputer with a thousand processors and trained it on a corpus of 1.3 billion tokens. Compared with the simple N-gram model, the large-scale composite language model achieves a significant perplexity reduction and BLEU score improvement on an n-best list re-ranking task for machine translation.
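For orientation, one standard way such a composition can be written is sketched below; this is illustrative only, the exact factorization used in the thesis may differ, and the topic variable g and document variable d are assumptions of the sketch:

N-gram:     p(w_1, \dots, w_T) = \prod_{k=1}^{T} p(w_k \mid w_{k-N+1}, \dots, w_{k-1})
PLSA:       p(w \mid d) = \sum_{g} p(w \mid g)\, p(g \mid d)
Composite:  p(w_k \mid w_{k-N+1}, \dots, w_{k-1}, d) = \sum_{g} p(w_k \mid w_{k-N+1}, \dots, w_{k-1}, g)\, p(g \mid d)

Under this reading, the N-gram component supplies local lexical context while the PLSA topic variable injects document-level semantic information into each prediction.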
Page Count
38
Department or Program
Department of Computer Science and Engineering
Year Degree Awarded
2011
Copyright
Copyright 2011, all rights reserved. This open access ETD is published by Wright State University and OhioLINK.