Type
Text
Type
Thesis
Advisor
Zadok, Erez | Johnson, Rob | Porter, Donald
Date
2012-05-01
Keywords
Block Layer, Context-aware, device mapper, In-line Deduplication | Computer science
Department
Department of Computer Science
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.
Identifier
http://hdl.handle.net/11401/71354
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
The context of data is important for optimal performance of data management systems like deduplication. In typical operating systems, the block layer of the I/O stack is unaware of the context of the data it is operating on. Thanks to the simplicity and modularity of the block layer interface, it is one of the best places to implement data deduplication. We designed an interface between file systems and the block layer that allows a file system to pass the context of the data to the underlying deduplication system at the block layer. This context takes the form of a "hint" that conveys information useful to the block-layer deduplication system, so that it can optimize its operation. For example, a hint can indicate what data is worthy of deduplication, what data should not be deduplicated at all, or that an impending set of I/O operations is likely to generate a lot of duplicates. With hints, we observed a 1.5x to 2x reduction in I/Os and a 10% improvement in CPU utilization for metadata-intensive workloads, compared to a context-unaware deduplication system at the block layer. Our hinting system degraded the deduplication ratio by only 3% to 5%. To implement hints, we had to change fewer than 0.6% of the Linux kernel, and we changed approximately 600 LoC of file system code in two file systems (Ext3 and NILFS2). Our block-layer deduplication system is about 4,000 LoC of standalone kernel code. | 55 pages
Recommended Citation
Mudrankit, Amar, "A Context Aware Block Layer: The Case for Block Layer Deduplication" (2012). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 560.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/560