Type
Text
Type
Dissertation
Advisor
Choi, Yejin | Skiena, Steven | Ramakrishnan, I.V. | Mihalcea, Rada.
Date
2014-12-01
Keywords
Deception, Graph-based Algorithms, Information Retrieval, Natural Language Processing, Sentiment Analysis | Computer science
Department
Department of Computer Science.
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/77280
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
In natural-language texts, certain information intended by the author, such as connotation, deception, sarcasm, or humor, may not be stated explicitly. Recognizing such authorial intention is one of the keys to truly understanding human communication. There is rapidly increasing interest in uncovering the intention embedded in textual content for real-life applications such as opinion mining, deception detection, news gathering, text generation, and educational testing. However, identifying the intended information computationally can be very challenging, as it usually requires appropriate syntactic and semantic schemes for interpretation or inference, and sometimes world knowledge. Previous work has addressed authorial intention from different perspectives such as linguistics, rhetoric, psychology, and sociology, showing the potential of computational linguistic techniques for detecting implicit intention; however, the topic remains largely uncharted. This thesis describes our focused, in-depth study of how to automatically identify authorial intention in textual content. In particular, our study focuses on two types of applications that have not been explored much so far. One is learning the general connotation, which is essentially to identify the nuanced sentiment that is not necessarily expressed or strictly implied in the text. We aim to exploit algorithms that are suitable for leveraging large-scale text data with minimal world knowledge or human guidance. Therefore, we develop approaches in light of various linguistic insights and learn the general connotation in a nearly unsupervised manner. We present the first large-scale connotation lexicon over a network of words and senses. The other is detecting the intent to deceive in writing, which potentially helps suppress the rampant deceptive behavior in online communities.
In this work, we extract salient and discriminating linguistic features from the text and apply supervised learning to predict intended deception in the writing. In addition, this work investigates the efficacy of assorted informative cues and provides insights based on web resources using computational linguistic techniques. Furthermore, to generalize our study, we develop automated approaches to collect corpora for deception detection. | 182 pages
Recommended Citation
Feng, Song, "Learning the Intention Embedded in the Natural Language Texts: Focused Studies on Connotation and Deception" (2014). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3101.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3101