Type

Text

Type

Dissertation

Advisor

Chiueh, Tzi-cker | Zadok, Erez | Porter, Donald | Aguilera, Marcos.

Date

2014-12-01

Keywords

Beluga, BOSC, Cloud Storage, deduplication, Quality of Service, Sungem | Computer science

Department

Department of Computer Science.

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/77313

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

A fundamental building block of an IaaS (Infrastructure-as-a-Service) cloud service such as Amazon's EC2 is a storage virtualization system that provides block-level storage services to individual virtual machines over the network. This dissertation addresses four major problems in such a block-level cloud storage system, in the context of an end-to-end IaaS solution called ITRI Cloud OS. First, to effectively eliminate redundancies in stored data blocks, we proposed a scalable block-level deduplication engine called Sungem, which uses both sampling and prefetching to minimize the performance overhead of fingerprint accesses, and features a storage block garbage collection algorithm whose runtime overhead is proportional only to the size of the delta between consecutive backup operations. Second, to efficiently flush metadata updates associated with large-scale block-level storage management, we developed a novel storage system architecture called BOSC (Batching mOdifications with Sequential Commit), which commits updates to disk using largely sequential writes and is thus able to sustain high-throughput, low-latency metadata updates even though the updates themselves are largely random. Third, as part of the BOSC architecture, we invented a high-throughput, low-latency disk logging system called Beluga, which uses a carefully tuned disk write pipeline to sustain, on an array of three commodity 7200 RPM SATA disks, nearly 5 million fine-grained (64-byte) disk logging operations per second, close to the maximum bandwidth the commodity disks can deliver, while keeping the latency of each logging operation under 1 msec. Finally, we devised a set of techniques for supporting software-defined storage services on a distributed and replicated storage architecture.
Specifically, we developed a distributed storage QoS guarantee system called Cheetah, which can provide a bandwidth guarantee to each virtual disk attached to a virtual machine while keeping the load on the distributed storage nodes balanced and preserving, as much as possible, the locality of the access stream associated with each virtual disk.

260 pages
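The abstract does not describe how Sungem's sampling and prefetching work. As a hedged illustration only, the sketch below uses a swapped-in technique in the style of sparse indexing: only a deterministic sample of fingerprints is kept in an in-memory index, and a hit on a sampled fingerprint prefetches the fingerprints of the whole container it belongs to, so neighboring duplicates are then found without further index reads. The class `SampledIndex`, the `sample_rate` parameter, the use of SHA-1, and the container layout are all hypothetical, not Sungem's design.

```python
import hashlib
from typing import Dict, List, Optional, Set


def fingerprint(block: bytes) -> bytes:
    """Content fingerprint of a data block (SHA-1 chosen here for illustration)."""
    return hashlib.sha1(block).digest()


class SampledIndex:
    """Hypothetical sampled fingerprint index with container-level prefetching."""

    def __init__(self, sample_rate: int = 8) -> None:
        self.sample_rate = sample_rate
        self.sparse: Dict[bytes, int] = {}            # sampled fingerprint -> container id (RAM)
        self.containers: Dict[int, List[bytes]] = {}  # container id -> fingerprints (stands in for disk)
        self.cache: Set[bytes] = set()                # fingerprints prefetched into memory
        self.container_reads = 0                      # simulated metadata reads from disk

    def is_sample(self, fp: bytes) -> bool:
        # Deterministic sampling on the fingerprint value, so a repeated block
        # is always either sampled or not sampled.
        return fp[0] % self.sample_rate == 0

    def store_container(self, cid: int, fps: List[bytes]) -> None:
        self.containers[cid] = list(fps)
        for fp in fps:
            if self.is_sample(fp):
                self.sparse[fp] = cid

    def is_duplicate(self, fp: bytes) -> bool:
        if fp in self.cache:
            return True                   # served from prefetched fingerprints
        cid: Optional[int] = self.sparse.get(fp)
        if cid is None:
            return False                  # new, or an unsampled duplicate that is missed
        self.container_reads += 1         # one read brings in the whole container
        self.cache.update(self.containers[cid])
        return True
```

The trade-off is visible in `is_duplicate`: an unsampled duplicate that arrives before any sampled neighbor is missed, which is the price of keeping only a fraction of the fingerprints in RAM.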
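The abstract states that Sungem's garbage collector has a runtime overhead proportional only to the size of the delta between consecutive backups, but it does not say how that bound is achieved. One simple way to obtain such a bound, shown here purely as an assumption, is incremental reference counting: each backup reports only the block fingerprints it added and dropped relative to the previous backup, and only those counts are touched. `DeltaGC` and `apply_delta` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List


class DeltaGC:
    """Hypothetical delta-driven reference counting for deduplicated blocks."""

    def __init__(self) -> None:
        self.refs: Counter = Counter()  # fingerprint -> reference count

    def apply_delta(self, added: Iterable[str], removed: Iterable[str]) -> List[str]:
        """Update counts from one backup's delta; return blocks that became garbage.

        Work is O(len(added) + len(removed)), independent of how many blocks
        are stored in total.
        """
        for fp in added:
            self.refs[fp] += 1
        freed: List[str] = []
        for fp in removed:
            if self.refs[fp] <= 0:  # Counter returns 0 for unknown keys
                raise ValueError(f"block {fp!r} removed but not referenced")
            self.refs[fp] -= 1
            if self.refs[fp] == 0:
                del self.refs[fp]
                freed.append(fp)    # no backup references this block any more
        return freed
```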
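The core idea the abstract attributes to BOSC, turning largely random metadata updates into largely sequential disk writes, can be illustrated with a toy model. In this hypothetical sketch (not BOSC's actual design), each update is first appended to a sequential log and buffered in memory per target block; when the batch fills, the buffered updates are applied in sorted block order so that each dirty block is written once per batch. `BatchedUpdater`, `batch_size`, and the in-memory `table` standing in for on-disk metadata are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class BatchedUpdater:
    """Hypothetical toy model of batching modifications with a sequential commit."""

    def __init__(self, batch_size: int = 4) -> None:
        self.batch_size = batch_size
        self.log: List[Tuple[int, int, int]] = []  # append-only log (sequential writes)
        self.pending: Dict[int, List[Tuple[int, int]]] = defaultdict(list)  # block -> updates
        self.table: Dict[int, Dict[int, int]] = {}  # stands in for on-disk metadata blocks
        self.block_writes = 0                       # simulated random block writes

    def update(self, block: int, offset: int, value: int) -> None:
        self.log.append((block, offset, value))      # a real system would persist this record
        self.pending[block].append((offset, value))  # grouped by target block
        if len(self.log) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Apply buffered updates in ascending block order; each dirty block is
        # written once, however many updates it received in this batch.
        for block in sorted(self.pending):
            entries = self.table.setdefault(block, {})
            for offset, value in self.pending[block]:
                entries[offset] = value  # later updates to the same offset win
            self.block_writes += 1
        self.pending.clear()
        self.log.clear()  # log space reclaimed once updates are applied
```

In this model, four updates touching two blocks cost two block writes instead of four; a real system would also replay the log after a crash, which the sketch omits.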
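As a quick consistency check on the Beluga figures in the abstract, counting only the 64-byte payloads (any per-record headers or padding are ignored) and taking 1 MB as 10^6 bytes:

```latex
\[
5 \times 10^{6}\,\frac{\text{ops}}{\text{s}} \times 64\,\frac{\text{B}}{\text{op}}
  = 3.2 \times 10^{8}\,\frac{\text{B}}{\text{s}} \approx 320\ \text{MB/s},
\qquad
\frac{320\ \text{MB/s}}{3\ \text{disks}} \approx 107\ \text{MB/s per disk}.
\]
```

Because the abstract gives "close to" 5 million operations per second, these values are upper estimates of the payload bandwidth.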
