Authors

Xin Feng

Type

Text

Type

Dissertation

Advisor

Lincoln Stein | Wei Lin | Michael Zhang | Zhiping Weng

Date

2011-05-01

Keywords

Bioinformatics | ChIP-seq, peak calling, sequencing

Department

Department of Biomedical Engineering

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/71599

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

The mechanisms that regulate how information encoded in DNA is translated into gene expression have been intensively investigated since the last century. A large portion of this effort has concentrated on characterizing the proteins that bind to specific chromatin or DNA regions. These proteins play important roles in the regulatory hierarchy. Until the beginning of the 21st century, studies probing these chromatin-binding proteins were generally conducted at the scale of a single gene or a limited region of the genome. Recent advances in next-generation sequencing have provided a revolutionary method, ChIP-seq, that accurately generates genome-wide profiles of chromatin-binding proteins. The modENCODE project has generated genome-wide protein binding sites for a large number of chromatin-binding proteins in the model organisms D. melanogaster and C. elegans. It is thus possible to investigate the spatial distribution of these proteins at the genome scale. To achieve this goal, an algorithm is needed to find protein binding sites across the genome. Although many existing algorithms meet this basic need, none can resolve binding sites that lie close to each other without sacrificing other desired properties, such as specificity. In this thesis, I present my work in designing a ChIP-seq peak calling algorithm called PeakRanger, which addresses these concerns. PeakRanger, along with other accessory programs, is used to analyze the datasets generated by the modENCODE project. With these tools, genome-wide binding sites for a large selection of chromatin-binding proteins are generated for both D. melanogaster and C. elegans. The distributions of D. melanogaster insulator binding proteins were analyzed in detail, showing their global correlation with the regulation of gene expression.
The properties of binding sites that lie close to each other are also characterized; this is the first report of doublet binding sites in D. melanogaster. Doublet binding sites are shown to be preferred regions for histone markers of promoters.
