Authors

Maohua Lu

Type

Text

Type

Dissertation

Advisor

Jie Gao | Korach, Chad | Nakamura, Toshio | Robert Johnson | David D. Chambliss

Date

2010-08-01

Keywords

Continuous Data Protection, De-duplication, File system consistency, Flash Translation Layer, Metadata Update, versioning file system | Computer Science

Department

Department of Computer Science

Language

en_US

Source

This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.

Identifier

http://hdl.handle.net/11401/72596

Publisher

The Graduate School, Stony Brook University: Stony Brook, NY.

Format

application/pdf

Abstract

The simple read/write interface exposed by traditional disk I/O systems is inadequate for low-locality, update-intensive workloads. It limits the flexibility of the disk I/O system in scheduling disk access requests, and it results in inefficient use of buffer memory and disk bandwidth. We propose a novel disk I/O subsystem architecture called Batching mOdifications with Sequential Commit (BOSC), which is optimized for workloads characterized by intensive random updates. BOSC improves sustained disk update throughput by effectively aggregating disk update operations and committing them to disk sequentially.

We demonstrate the benefits of BOSC by applying it to three different storage systems. The first is a continuous data protection system called Mariner. Mariner is an iSCSI-based storage system designed to provide comprehensive data protection on commodity hardware while offering the same performance as systems without such protection. With BOSC handling metadata updates, Mariner's throughput degrades by less than 10% compared with its throughput without metadata updates.

Flash-based storage is the second storage system to which we applied BOSC. Because of the physics underlying flash memory technology and the coarse address-mapping granularity used in the on-board flash translation layer (FTL), commodity flash disks exhibit poor random write performance. We designed LFSM, a Log-structured Flash Storage Manager, to eliminate this random write performance problem by employing data logging and using BOSC for metadata updates. Under standard benchmarks, LFSM reduces the average write latency of a commodity flash disk by a factor of more than 6.

As a third example, we applied BOSC to a scalable data de-duplication system based on incremental backups. Each input block is de-duplicated by comparing its fingerprint, a collision-free hash value, with existing fingerprints. A range-based block group, called a segment, is the basic unit for preserving data locality in incremental backups. We propose four novel techniques that improve de-duplication throughput with minimal impact on the data de-duplication ratio (DDR). BOSC is employed to eliminate the performance bottleneck caused by committing segment updates to disk.
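The batching idea summarized in the abstract (aggregate random updates in memory, then commit them to disk in one sequential pass) can be illustrated with a minimal sketch. It assumes a toy in-memory "disk" (a dict mapping block addresses to fixed-size byte buffers), a hypothetical class BatchingUpdater, and a hypothetical commit threshold batch_limit. None of these names come from the dissertation, this is not the author's implementation, and it leaves out the durability and crash-recovery mechanisms that a real disk I/O subsystem needs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical block size for the toy in-memory "disk" (not from the dissertation).
BLOCK_SIZE = 16


class BatchingUpdater:
    """Illustrative sketch only: queues small random updates per block and
    commits them in ascending block-address order, so that each dirty block
    is read and written once per commit instead of once per update."""

    def __init__(self, disk: Dict[int, bytearray], batch_limit: int = 4):
        self.disk = disk
        self.batch_limit = batch_limit  # hypothetical threshold: pending updates before a commit
        self.pending: Dict[int, List[Tuple[int, bytes]]] = defaultdict(list)
        self.count = 0
        self.commit_order: List[int] = []  # block addresses in the order they were written

    def update(self, block: int, offset: int, data: bytes) -> None:
        # Queue the modification instead of doing a read-modify-write right away.
        if offset < 0 or offset + len(data) > BLOCK_SIZE:
            raise ValueError("update must fit inside one block")
        self.pending[block].append((offset, data))
        self.count += 1
        if self.count >= self.batch_limit:
            self.commit()

    def commit(self) -> None:
        # Sweep dirty blocks in address order: one read and one write per block,
        # applying all queued updates for that block in arrival order.
        for block in sorted(self.pending):
            buf = bytearray(self.disk.get(block, bytes(BLOCK_SIZE)))
            for offset, data in self.pending[block]:
                buf[offset:offset + len(data)] = data
            self.disk[block] = buf
            self.commit_order.append(block)
        self.pending.clear()
        self.count = 0


if __name__ == "__main__":
    disk: Dict[int, bytearray] = {}
    updater = BatchingUpdater(disk, batch_limit=4)
    updater.update(9, 0, b"a")
    updater.update(2, 1, b"b")
    updater.update(9, 0, b"c")  # later update to the same bytes of block 9 wins
    updater.update(5, 3, b"d")  # fourth update reaches batch_limit and triggers a commit
    print(updater.commit_order)  # [2, 5, 9]
    print(bytes(disk[9][:1]))    # b'c'
```

In this sketch, the two updates to block 9 cost a single read-modify-write, and the scattered addresses 9, 2, 5 are written as one ascending sweep (2, 5, 9). That is the kind of aggregation and sequential commit the abstract credits for BOSC's throughput gains on random-update workloads.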
