Authors

Manoj Harpalani

Type

Text

Type

Thesis

Advisor

Choi, Yejin | Johnson, Rob | Skiena, Steve

Date

2010-12-01

Keywords

Computer Science | Machine Learning, Natural Language Processing, Vandalism Detection, Wikipedia, Wikipedia Vandalism Detection, Wiki Vandalysis

Department

Department of Computer Science

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/70926

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Wikipedia describes itself as "The free encyclopedia that anyone can edit". Alongside the helpful volunteers who improve its articles, a great number of malicious users abuse the open nature of Wikipedia by vandalizing articles. Wikipedia editors fight vandalism both manually and with automated bots that use regular expressions and other simple rules to recognize malicious edits [Carter]. Researchers have also proposed machine learning algorithms for vandalism detection [Smets et al., 2008; Potthast et al., 2008a], but these algorithms are still in their infancy and have much room for improvement. This paper presents an approach to fighting vandalism using natural language processing and machine learning techniques. In addition to basic features of the edit, such as edit distance, edit type, and counts of abnormal patterns and slang words, we use features related to information about the editor, the past revision history of the article, the change in sentiment of the article, and the PCFG sentence parser score. We achieve an area under the ROC curve (AUC) of 0.94 and an F1 score of 0.53 using LogitBoost in a 10-fold cross-validation setting on a training set [Potthast, 2010] of 32444 human-annotated edits. We also analyze the performance of our features by building separate classifiers for inserts or changes, deletes, and template edits in both balanced and unbalanced corpus settings.
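
As a rough illustration of the basic edit features named in the abstract, the following is a minimal sketch and not the thesis's implementation. The slang list, the definition of an "abnormal pattern" as a run of four or more identical characters, and the names SLANG_WORDS, levenshtein, and edit_features are all assumptions made for this example.

```python
import re

# Hypothetical slang lexicon for illustration; the thesis's actual list is not given here.
SLANG_WORDS = {"lol", "omg", "wtf"}


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def count_slang(text: str) -> int:
    """Count tokens that appear in the (assumed) slang lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for t in tokens if t in SLANG_WORDS)


def edit_features(old: str, new: str) -> dict:
    """Compute a few simple features of one edit (old revision -> new revision)."""
    return {
        "edit_distance": levenshtein(old, new),
        # Slang introduced by the edit, never negative.
        "slang_added": max(0, count_slang(new) - count_slang(old)),
        # Assumed notion of "abnormal pattern": any character repeated 4+ times in a row.
        "repeated_char_runs": len(re.findall(r"(.)\1{3,}", new)),
    }


features = edit_features("The cat sat.", "The cat sat. lol omg!!!!!")
```

A feature dictionary of this kind would be one row of input to a classifier; the editor, revision-history, sentiment, and parser-score features described in the abstract would need additional data sources and are not sketched here.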
