Authors

Jason O'Rawe

Type

Text

Type

Dissertation

Advisor

Lyon, Gholson | Patro, Robert | Rest, Joshua | Mason, Christopher | Ferson, Scott

Date

2016-12-01

Keywords

Genetics

Department

Department of Genetics

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77607

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

High-throughput DNA sequencing technologies have given us the power to understand genetic disease at extraordinarily detailed resolution. It is now possible to sequence a person’s whole genome and search for the genetic markers that contribute to a specific disease, or even markers that contribute to the risk of developing one in the future. However, understanding and sifting through billions of data points is not a trivial task. Diverse statistical, algorithmic and practical implementation challenges must be met before we can accurately and reliably analyze the vast swaths of data that come from human DNA sequences. Indeed, strategies for detecting human sequence variation in exome and whole genome sequencing data are myriad, but the reliability of these methods, even when applied to the same underlying sequencing data, is unclear. Furthermore, given the imperfect agreement among the results of these various methods, powerful strategies for assessing and recovering true, but missed, sequence variation have yet to be devised; most research effort has focused on mitigating false detection. It is in this context that high-throughput sequencing technologies are used in both research and clinical investigations. In the medical genomics realm, these technologies have advanced our understanding of the genetic origins of human disease, but unreliable analyses have led to a number of false positive research findings. The community has since recognized the need for robust and comprehensive sequencing and analysis methods, particularly when only a small number of samples from probands or affected families are available. In the clinical realm, most agree that these technologies have enormous potential to transform clinical care, but the practicality of their use is currently understudied, particularly for individual patients within complex cohorts, such as those with psychiatric disorders.
To move the field of human genetics research forward and to contribute toward the successful implementation of genomics-guided medical care, several key advancements are needed: a characterization of the reliability of current high-throughput analysis methods; methods for recovering missed sequence variants from discordant detection sets; an understanding of current infrastructural deficiencies for implementation; general guidance on how to use diverse sets of analysis results to establish robust relationships between human sequence variation and disease; and new methodological approaches for generating sequence analysis results that accurately characterize uncertainties in the underlying data, so that the reliability of their inferences remains robust throughout the lifetime of their use.
214 pages
