Type

Dissertation | Text

Advisor

Choi, Yejin | Fodor, Paul | Borodin, Yevgen | Mooney, Raymond

Date

2015-08-01

Keywords

image descriptions, natural language generation, natural language processing | Computer science

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for the completion of the degree.

Identifier

http://hdl.handle.net/11401/77293

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

We study the task of image description generation, which has applications in image search, web accessibility research, story illustration, and more. Rather than concentrating on precise but robotic descriptions, we aim to generate captions that are human-like yet still relevant to the image content. Human-written text is nontrivial in structure and vocabulary: a purely bottom-up approach, relying only on a vision detection vocabulary, would struggle to generate a description such as "A cute squirrel having a feast under a tree." To generate descriptions that approach human writing in complexity and richness, we exploit the vast amount of human-written text available on the Internet and use a dataset of images paired with captions written by users of the website Flickr. Based on various aspects of the target image, we collect a set of matching images. From the human-written captions of the obtained images we elicit candidate phrases associated with the matching aspects. We then selectively glue the extracted phrases together into plausible descriptions, guided by linguistic patterns and parse tree structure. We tackle this non-trivial task by modeling it as an Integer Linear Programming problem and introducing a novel tree-driven phrase composition framework. As an optional preprocessing step to the generation process, we introduce the task of image caption generalization, whose aim is to remove extraneous information from image captions written by Flickr users. Evaluation results show that, when generalized captions are used as a new source of candidate phrases, we are able to generate descriptions of better quality in terms of relevance, while preserving the expressiveness and linguistic sophistication of the resulting output. | 146 pages
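The abstract frames phrase composition as an Integer Linear Programming problem. Purely as an illustration of that general idea, and not the dissertation's actual tree-driven formulation, the sketch below uses the open-source PuLP library to pick one candidate phrase per syntactic slot so as to maximize total relevance under a token budget; the slot names, phrase lists, scores, and budget are all invented for this example.

# Illustrative sketch only: ILP-based phrase selection with PuLP.
# All data below is hypothetical and does not come from the dissertation.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Candidate phrases per slot: (phrase, relevance score, token length).
candidates = {
    "np": [("a cute squirrel", 0.9, 3), ("a small animal", 0.6, 3)],
    "vp": [("having a feast", 0.8, 3), ("sitting quietly", 0.5, 2)],
    "pp": [("under a tree", 0.7, 3), ("in the park", 0.4, 3)],
}
MAX_TOKENS = 9  # toy length budget on the composed description

prob = LpProblem("phrase_composition", LpMaximize)

# One binary variable per candidate phrase: 1 if the phrase is selected.
x = {
    (slot, i): LpVariable(f"x_{slot}_{i}", cat="Binary")
    for slot, opts in candidates.items()
    for i in range(len(opts))
}

# Objective: maximize the total relevance of the selected phrases.
prob += lpSum(candidates[slot][i][1] * var for (slot, i), var in x.items())

# Exactly one phrase per slot, and respect the overall length budget.
for slot, opts in candidates.items():
    prob += lpSum(x[(slot, i)] for i in range(len(opts))) == 1
prob += lpSum(candidates[slot][i][2] * var for (slot, i), var in x.items()) <= MAX_TOKENS

prob.solve()
chosen = [candidates[slot][i][0] for (slot, i), var in x.items() if var.value() > 0.5]
print(" ".join(chosen))  # e.g. "a cute squirrel having a feast under a tree"

The full model in the dissertation additionally reasons over parse tree structure and phrase compatibility; this toy version only shows how binary selection variables, a linear objective, and linear constraints fit together in an ILP.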
