Type
Text
Type
Dissertation
Advisor
Zadok, Erez | Chiueh, Tzi-cker | Stoller, Scott | Cheng, Yueqiang
Date
2017-08-01
Keywords
Bounded Latency | Computer science | Fault Tolerance | Group Fault Tolerance | Protection | Virtualized Servers | Virtual Machines
Department
Department of Computer Science.
Language
en_US
Source
This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of the degree.
Identifier
http://hdl.handle.net/11401/78196
Publisher
The Graduate School, Stony Brook University: Stony Brook, NY.
Format
application/pdf
Abstract
Buggy drivers and hardware failures are two major threats to the reliability of modern virtualized systems. This dissertation proposes SIDE (streamlined isolated driver execution), which protects both a virtual machine monitor and the guest VMs running on top of it from buggy device drivers by isolating the execution of these drivers in a separate protection domain. To enable a guest VM to tolerate any hardware failure of the physical machine it runs on, this dissertation also proposes Cuju, an industrial-strength, virtualization-based fault tolerance system that allows an individual VM or a group of VMs to continue running despite any failure in the underlying hardware.

SIDE protects an OS from buggy device drivers without requiring any modifications to the drivers and while changing the kernel code as little as possible. It exploits virtual memory hardware to set up a driver execution environment that is compatible with existing device drivers yet fully isolated from the kernel. A driver fault is contained, and SIDE can reload the faulty driver without rebooting the kernel. Augmented with a series of optimizations that reduce the number of protection domain crossings between an isolated driver and the kernel, SIDE runs an unmodified Gigabit Ethernet NIC driver with a latency and throughput penalty of under 1%.

Cuju is based on an epoch-based execution model that holds back a VM's network and disk outputs within an epoch and releases them only at the end of each epoch, and it features a pipelined implementation that maximizes the overlap between consecutive epochs. The current Cuju implementation supports dirty page tracking, dirty page compression, disk I/O retry, bounded-latency execution, network packet handling that minimizes the TCP throughput penalty caused by epoch-based execution, and VM execution on multiple virtual CPUs.
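The epoch-based output buffering described above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not Cuju's implementation: the class `EpochOutputBuffer` and its methods `emit`, `end_epoch`, and `commit` are hypothetical, outputs are plain strings, and the checkpoint transfer to the backup is abstracted into a call to `commit` that the caller is assumed to make once the backup has acknowledged an epoch.

```python
from typing import Callable, Dict, List


class EpochOutputBuffer:
    """Toy model of epoch-based output buffering with pipelined epochs.

    Hypothetical sketch: outputs produced while epoch N runs are held back and
    released only after epoch N's checkpoint is acknowledged, while the VM may
    already be executing epoch N+1.
    """

    def __init__(self, release: Callable[[str], None]) -> None:
        self.release = release            # delivers one output to the outside world
        self.current = 0                  # epoch the VM is currently executing
        self.committed = -1               # last epoch whose checkpoint was acknowledged
        self.pending: Dict[int, List[str]] = {0: []}

    def emit(self, output: str) -> None:
        # A network packet or disk write issued by the VM is buffered under
        # the epoch in which it was produced instead of being sent right away.
        self.pending[self.current].append(output)

    def end_epoch(self) -> int:
        # Close the running epoch and let the VM continue in the next one
        # while the closed epoch's state is (conceptually) sent to the backup.
        closed = self.current
        self.current += 1
        self.pending[self.current] = []
        return closed

    def commit(self, epoch: int) -> List[str]:
        # Called once the backup acknowledges the checkpoint for `epoch`.
        # Epochs commit in order, and only closed epochs can commit.
        if epoch != self.committed + 1 or epoch >= self.current:
            raise ValueError(f"cannot commit epoch {epoch}")
        released = self.pending.pop(epoch)
        for out in released:
            self.release(out)
        self.committed = epoch
        return released


if __name__ == "__main__":
    sent: List[str] = []
    buf = EpochOutputBuffer(sent.append)
    buf.emit("pkt-a")            # produced in epoch 0
    e0 = buf.end_epoch()         # VM moves on to epoch 1
    buf.emit("pkt-b")            # produced in epoch 1, still held
    buf.commit(e0)               # backup acknowledged epoch 0
    print(sent)                  # ['pkt-a']
```

Keying held outputs by the epoch that produced them is what keeps the overlap safe in this model: committing epoch 0 never releases an output generated during epoch 1, even though the VM is already running epoch 1.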
In addition, Cuju supports a "fate sharing" approach that provides the same level of fault tolerance for a group of communicating VMs that together form a single service, reducing the latency penalty that would arise if each of these VMs were protected separately by the single-VM Cuju implementation. With a 10 ms epoch, Cuju incurs a 17.4% performance overhead for kernel compilation running in a VM, a workload with a high page-dirtying rate. For a group of VMs running SPECweb 2009, throughput under Cuju's protection is almost the same as without any fault tolerance protection. | 149 pages
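The abstract does not spell out how fate sharing reduces latency. One plausible reading, assumed here and not confirmed by the text above, is that the VMs of a group are checkpointed as a unit, so messages exchanged among members need no output-commit delay and only traffic leaving the group is held until every member's checkpoint is acknowledged. The hypothetical `FateSharingGroup` class below models that reading; its names, methods, and string payloads are illustrative only.

```python
from typing import Callable, Dict, List, Set, Tuple


class FateSharingGroup:
    """Hypothetical sketch of group output commit under fate sharing.

    Assumption (not stated in the abstract): group members are checkpointed
    as one unit, so traffic between members is delivered immediately, while
    traffic leaving the group is held until every member has acknowledged
    its checkpoint for the current epoch.
    """

    def __init__(self, members: Set[str], release: Callable[[str, str], None]) -> None:
        self.members = set(members)
        self.release = release                        # (destination, payload) -> None
        self.inboxes: Dict[str, List[str]] = {m: [] for m in self.members}
        self.external: List[Tuple[str, str]] = []     # outputs held for this epoch
        self.acked: Set[str] = set()

    def send(self, src: str, dst: str, payload: str) -> None:
        if src not in self.members:
            raise ValueError(f"{src} is not a group member")
        if dst in self.members:
            # Intra-group message: the group shares fate, so no commit delay.
            self.inboxes[dst].append(payload)
        else:
            # Message leaves the group: hold it until the group epoch commits.
            self.external.append((dst, payload))

    def ack_checkpoint(self, vm: str) -> bool:
        # Record that `vm`'s checkpoint reached the backup. Returns True when
        # every member has acknowledged and the held outputs were released.
        if vm not in self.members:
            raise ValueError(f"{vm} is not a group member")
        self.acked.add(vm)
        if self.acked != self.members:
            return False
        for dst, payload in self.external:
            self.release(dst, payload)
        self.external.clear()
        self.acked.clear()
        return True


if __name__ == "__main__":
    out: List[Tuple[str, str]] = []
    g = FateSharingGroup({"web", "db"}, lambda dst, p: out.append((dst, p)))
    g.send("web", "db", "query")          # delivered at once
    g.send("web", "client", "response")   # held until the group commits
    print(g.inboxes["db"], out)           # ['query'] []
    g.ack_checkpoint("web")
    g.ack_checkpoint("db")
    print(out)                            # [('client', 'response')]
```

Under this assumed design, a request that bounces between the web and database VMs several times pays the output-commit delay only once, when the final response leaves the group, rather than once per internal hop.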
Recommended Citation
Sun, Yifeng, "Protection Mechanisms for Virtual Machines on Virtualized Servers" (2017). Stony Brook Theses and Dissertations Collection, 2006-2020 (closed to submissions). 3691.
https://commons.library.stonybrook.edu/stony-brook-theses-and-dissertations-collection/3691