Computer Science and Engineering Faculty Publications

Visual Entailment Task for Visually-Grounded Language Learning

Document Type

Article

Publication Date

1-2019

Abstract

We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is defined by an image rather than a natural language sentence. We propose a novel dataset for VE tasks, SNLI-VE (publicly available at https://github.com/necla-ml/SNLI-VE), based on the Stanford Natural Language Inference corpus and Flickr30k. We introduce a differentiable architecture called the Explainable Visual Entailment model (EVE) to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, facilitating grounded language understanding and providing insights into how modern VQA-based models perform.
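As a rough illustration of the task described in the abstract, the sketch below models a single VE example as an image premise paired with a text hypothesis and a label. The three-way label set follows the Stanford Natural Language Inference corpus; the names `Label`, `VEExample`, `image_id`, and `accuracy` are hypothetical and do not reflect the actual SNLI-VE file format, and plain accuracy is assumed here as the evaluation measure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Label(Enum):
    """Three-way label set used by SNLI."""
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    """One visual entailment example (hypothetical layout, not the SNLI-VE schema)."""
    image_id: str    # assumed identifier of the premise image
    hypothesis: str  # natural language hypothesis to judge against the image
    label: Label     # gold relation between the image premise and the hypothesis


def accuracy(predictions: Sequence[Label], examples: Sequence[VEExample]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if len(predictions) != len(examples):
        raise ValueError("predictions and examples must have the same length")
    if not examples:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(pred == ex.label for pred, ex in zip(predictions, examples))
    return correct / len(examples)
```

In a TE setting the premise field would hold a sentence; here it holds a reference to an image, which is the only structural difference the sketch is meant to show.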

Repository Citation

Xie, N., Lai, F., Doran, D., & Kadav, A. (2019). Visual Entailment Task for Visually-Grounded Language Learning.
https://corescholar.libraries.wright.edu/cse/514

Included in

Computer Sciences Commons, Engineering Commons
