Type

Text

Type

Dissertation

Advisor

Johnson, Rob | Bender, Michael A | Gao, Jie | Farach-Colton, Martin.

Date

2014-12-01

Keywords

Algorithms, Data structures, Lower bounds, Membership | Computer science

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77300

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

There is an increasing need for data structures that efficiently manipulate huge amounts of data. When the data to be represented does not fit into main memory all at once, data structures designed without taking this constraint into account fail to perform as expected. Data structures conceived specifically for this purpose fall into two, not necessarily exclusive, categories. The first consists of data structures designed and analyzed under a model of computation where the cost is given by the amount of data transferred between external and internal memory. The second comprises data structures that make the most out of the internal memory by representing the data using as few space as possible, without hurting the performance of the operations they support. This dissertation falls in the intersection of both categories: It studies representations that use space close to the information-theoretic lower bound while, at the same time, provide good performance guarantees in terms of the number of memory transfers. The resulting data structures are not only of theoretical interest, but can also be (and in most cases have been) easily turned into robust, efficient, well tested, and usable implementations, where the constants and low order terms hidden by Big O notation do not take over the running time in practice. The focus of this dissertation is on data structures that dynamically maintain a set in order to efficiently answer range and membership queries. Specifically, this work presents two compact data structures: the level-based packed memory array, for range and membership queries, and the quotient filter, for approximate membership queries. Furthermore, three extension to this last data structure are also described: the buffered quotient filter, the cascade filter, and the counting quotient filter. Finally, this dissertation shows that the standard B-tree data structure is asymptotically optimal for the batched predecessor problem in external memory if the space is a constrain. The tradeoff that quantifies the minimum space required for a given query cost is also presented. | 108 pages

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.