Type

Text

Type

Dissertation

Advisor

Chowdhury, Rezaul A. | Skiena, Steven | Ramakrishnan, I.V. | Bender, Michael | Harrison, Robert.

Date

2015-12-01

Keywords

Computer science | Cache-oblivious recursive divide and conquer, Cache-adaptivity, Robustness, Portability, Energy efficiency, Cache-oblivious wavefront, Hgh-performance, Dynamic programming, Efficient Viterbi algorithm, Lockfree breadth-first search, Molecular dynamics, Parallel Algorithms, Bioinformatics

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77316

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

Since the beginning of the last decade, plateauing of the clock speed of computer processors has forced us to invest more in parallelism — for both hardware and software. This resulted in improvements in computing technologies that have favored parallelism over increased clock speed. Taking advantage of these improvements requires designing algorithms that can leverage parallelism well. In this dissertation, we show how to take advantage of several algorithm design techniques to harness modern heterogeneous parallel architectures for solving problems in bioinformatics efficiently. Our main goal while designing algorithms is to achieve high-performance in terms of running time and scalability. Other desirable goals include energy efficiency, portability and adaptivity. We solve bioinformatics problems on grids (dynamic programming problems), on graphs (breadth-first search), and problems that can be solved using spatial trees (Molecular Dynamics using octrees). We present many novel algorithms, algorithm engineering techniques, theoretical analyses and performance evaluations on a range of state-of-the-art parallel architectures including multicores, manycores, and special purpose accelerators. Although we mainly target problems in bioinformatics, the algorithmic techniques we use to solve those problems have general applicability. For many dynamic programming problems, we show that if we use a cache-oblivious recursive divide-and-conquer technique to solve them, the resulting algorithms become highly optimizable, cache-optimal and often have asymptotically better parallelism than their standard iterative counterparts. These algorithms not only have good theoretical bounds but also, perform better than standard iterative and tiled-loop algorithms in terms of running time, scalability, energy efficiency, cache-adaptivity and portability. Furthermore, it is often possible to improve the parallelism of these recursive algorithms without sacrificing cache-optimality by removing artificial dependencies among the tasks introduced by the recursive structure of the algorithm. Breadth-first search is a popular graph traversal algorithm that has many applications in bioinformatics. We show how to use lock- and atomic instruction-free optimistic parallelization to improve parallelism and load balancing in a shared-memory parallel breadth-first search (BFS) algorithm. We present several work-efficient parallel BFS algorithms (including one that uses recursive divide and conquer) along with their theoretical and empirical performance analyses on state-of-the-art multicore and manycore architectures. Spatial trees (e.g. | quadtree, octree, k-D tree) are recursive space partitioning data structures that are often used to store biological molecules efficiently. We present octree-based distributed and distributed-shared-memory hybrid near-far approximation algorithms to compute molecular polarization energy. These algorithms outperform all other state-of-the-art Molecular Dynamics packages by orders of magnitude on multicores and clusters of multicores. We conclude by discussing implications of our work for future parallel algorithm design, and ways to extend our research to other domains. | Since the beginning of the last decade, plateauing of the clock speed of computer processors has forced us to invest more in parallelism — for both hardware and software. This resulted in improvements in computing technologies that have favored parallelism over increased clock speed. Taking advantage of these improvements requires designing algorithms that can leverage parallelism well. In this dissertation, we show how to take advantage of several algorithm design techniques to harness modern heterogeneous parallel architectures for solving problems in bioinformatics efficiently. Our main goal while designing algorithms is to achieve high-performance in terms of running time and scalability. Other desirable goals include energy efficiency, portability and adaptivity. We solve bioinformatics problems on grids (dynamic programming problems), on graphs (breadth-first search), and problems that can be solved using spatial trees (Molecular Dynamics using octrees). We present many novel algorithms, algorithm engineering techniques, theoretical analyses and performance evaluations on a range of state-of-the-art parallel architectures including multicores, manycores, and special purpose accelerators. Although we mainly target problems in bioinformatics, the algorithmic techniques we use to solve those problems have general applicability. For many dynamic programming problems, we show that if we use a cache-oblivious recursive divide-and-conquer technique to solve them, the resulting algorithms become highly optimizable, cache-optimal and often have asymptotically better parallelism than their standard iterative counterparts. These algorithms not only have good theoretical bounds but also, perform better than standard iterative and tiled-loop algorithms in terms of running time, scalability, energy efficiency, cache-adaptivity and portability. Furthermore, it is often possible to improve the parallelism of these recursive algorithms without sacrificing cache-optimality by removing artificial dependencies among the tasks introduced by the recursive structure of the algorithm. Breadth-first search is a popular graph traversal algorithm that has many applications in bioinformatics. We show how to use lock- and atomic instruction-free optimistic parallelization to improve parallelism and load balancing in a shared-memory parallel breadth-first search (BFS) algorithm. We present several work-efficient parallel BFS algorithms (including one that uses recursive divide and conquer) along with their theoretical and empirical performance analyses on state-of-the-art multicore and manycore architectures. Spatial trees (e.g. | quadtree, octree, k-D tree) are recursive space partitioning data structures that are often used to store biological molecules efficiently. We present octree-based distributed and distributed-shared-memory hybrid near-far approximation algorithms to compute molecular polarization energy. These algorithms outperform all other state-of-the-art Molecular Dynamics packages by orders of magnitude on multicores and clusters of multicores. We conclude by discussing implications of our work for future parallel algorithm design, and ways to extend our research to other domains. | 183 pages

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.