Type
Text
Type
Dissertation
Advisor
Choi, Yejin | Fodor, Paul | Borodin, Yevgen | Mooney, Raymond.
Date
2015-08-01
Keywords
image descriptions, natural language generation, natural language processing | Computer science
Department
Department of Computer Science.
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/77293
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
We study the task of image description generation, which has applications in image search, web accessibility research, story illustration, and more. Rather than concentrating on precise but robotic descriptions, we aim to generate captions that are human-like yet still relevant to the image content. Human-generated text is nontrivial in structure and vocabulary; a purely bottom-up approach, relying only on a vision detection vocabulary, would struggle to generate a description such as "A cute squirrel having a feast under a tree". To generate descriptions that approach human-like complexity and richness, we exploit the vast amount of human-written text available on the Internet, using a dataset of images paired with captions written by users of the web-site Flickr. Based on various aspects of the target image, we collect a set of matching images. From the human-written captions of these images we elicit candidate phrases associated with the matching aspects. We then selectively glue the extracted phrases together into plausible descriptions, guided by linguistic patterns and parse tree structure. We tackle this non-trivial task by modeling it as an Integer Linear Programming problem and introducing a novel tree-driven phrase composition framework. As an optional preprocessing step to the generation process, we introduce the task of image caption generalization, whose aim is to remove extraneous information from image captions written by Flickr users. Evaluation results show that, when generalized captions are used as a new source of candidate phrases, we generate descriptions of better relevance while preserving the expressiveness and linguistic sophistication of the resulting output. | 146 pages
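To give a flavor of the phrase-composition idea described in the abstract, the following is a minimal toy sketch, not the dissertation's actual method: candidate phrases (here invented examples with made-up relevance scores) are grouped by syntactic slot, and one phrase per slot is chosen to maximize total relevance. The real system formulates this selection as an Integer Linear Program with tree-structure constraints; the toy version below simply enumerates combinations.

```python
from itertools import product

# Hypothetical candidate phrases elicited from captions of visually
# matching images, grouped by syntactic slot. The scores are invented
# stand-ins for visual-relevance estimates.
candidates = {
    "np": [("a cute squirrel", 0.9), ("a small animal", 0.6)],
    "vp": [("having a feast", 0.8), ("sitting quietly", 0.5)],
    "pp": [("under a tree", 0.7), ("on the grass", 0.4)],
}

def compose_description(candidates):
    """Pick one phrase per slot to maximize summed relevance.

    The dissertation solves a richer version of this selection problem
    with Integer Linear Programming and parse-tree constraints; for this
    tiny example, brute-force enumeration over all combinations suffices.
    """
    slots = list(candidates)
    best_score, best_choice = float("-inf"), None
    for choice in product(*(candidates[s] for s in slots)):
        score = sum(weight for _, weight in choice)
        if score > best_score:
            best_score, best_choice = score, choice
    return " ".join(phrase for phrase, _ in best_choice), best_score

description, score = compose_description(candidates)
```

In an ILP formulation, each (slot, phrase) pair would become a 0/1 variable, with constraints enforcing exactly one phrase per slot plus grammatical-compatibility constraints derived from parse trees.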
Recommended Citation
Kuznetsova, Polina, "Composing Image Descriptions in Natural Language" (2015). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3114.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3114