Authors

Song Feng

Type

Text

Type

Dissertation

Advisor

Choi, Yejin | Skiena, Steven | Ramakrishnan, I.V. | Mihalcea, Rada

Date

2014-12-01

Keywords

Deception, Graph-based Algorithms, Information Retrieval, Natural Language Processing, Sentiment Analysis | Computer science

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77280

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

In natural-language texts, certain information intended by the author, such as connotation, deception, sarcasm, or humor, may not be stated explicitly. Recognizing such authorial intention is one of the keys to truly understanding human communication. There is rapidly increasing interest in uncovering the intention embedded in textual content for real-life applications such as opinion mining, deception detection, news gathering, text generation, and educational testing. However, identifying the intended information computationally can be very challenging, as it usually requires appropriate syntactic and semantic schemes for interpretation or inference and, sometimes, world knowledge. Previous work has addressed authorial intention from different perspectives, such as linguistics, rhetoric, psychology, and sociology, showing the potential of computational linguistic techniques for detecting implicit intention; however, the topic remains largely uncharted. This thesis describes our focused and in-depth study of how to automatically identify authorial intention in textual content. In particular, our study focuses on two types of applications that have not been explored much so far. One is learning general connotation, which essentially means identifying the nuanced sentiment that is not necessarily expressed or strictly implied in the text. We aim to exploit algorithms suited to leveraging large-scale text data with minimal world knowledge or human guidance. Therefore, we develop approaches in light of various linguistic insights and learn general connotation in a nearly unsupervised manner. We present the first large-scale connotation lexicon over a network of words and senses. The other is detecting the intent to deceive in writing, which potentially helps suppress the rampant deceptive behavior in online communities.
In this work, we extract salient and discriminating linguistic features from the text and apply supervised learning to predict intended deception in the writing. In addition, this work investigates the efficacy of assorted informative cues and provides insights based on web resources using computational linguistic techniques. Furthermore, to generalize our study, we develop automated approaches to collect corpora for deception detection. | 182 pages
