Title: Resource Management in Enterprise Cluster and Storage Systems
Speaker: Jianzhe Tai
Date: Tuesday July 15th, 2014 at 12:15pm
- Ningfang Mi (Advisor, Northeastern University)
- Kaushik Chowdhury (Northeastern University)
- Peter Desnoyers (Northeastern University)
- Prashant Dhamdhere (VMware)
In this thesis, we present our work on resource management in large-scale systems, especially enterprise cluster and storage systems. Large-scale cluster systems have become popular among broad communities of users by offering a variety of resources. Such systems require complex resource management schemes for multi-objective optimization, and these schemes must be tailored to different system requirements, such as varying system architectures and heterogeneous underlying hardware. In addition, burstiness is often found in enterprise workloads and is a key factor in performance degradation. Managing heterogeneous resources (e.g., computing, networking, and storage) in such large-scale systems under bursty conditions, while providing performance guarantees and cost efficiency, is therefore an extremely challenging problem.
To solve this problem, we first investigate the issues of classic load balancers under bursty workloads and explore new algorithms for effective resource allocation in cluster systems. We demonstrate that burstiness in user demands diminishes the benefits of some existing load balancing algorithms, such as Join the Shortest Queue (JSQ). Motivated by this observation, we develop a new class of burstiness-aware load balancing algorithms. First, we present a static version of our new load balancer, named ArA, which tunes load balancing by adjusting the degree of randomness and greediness in the selection of computing sites. We have also developed an online version of our ArA scheme, which predicts the beginning and end of workload bursts and automatically adjusts the load balancer to compensate. The experimental results show that this new load balancer adapts quickly to changes in user demands and thus improves performance in both simulations and real experiments.
Second, we work on data management in enterprise storage systems. Tiered storage architectures have become attractive in enterprise data centers because they provide shared storage resources to a wide variety of applications, which may demand different service level agreements (SLAs). Furthermore, any user query from a data-intensive application can easily trigger a burst of disk I/Os to the back-end storage system, which eventually causes disastrous performance degradation. Therefore, we present a new approach for automated data movement in multi-tiered storage systems, aiming to support multiple SLAs for applications with dynamic workloads at minimal cost.
In addition, Flash technology can be leveraged in virtualized environments as a second-level, host-side cache for I/O acceleration. However, conventional caching policies may not fully exploit the outstanding performance of Flash or justify its high cost per GB. In this dissertation, we present a new Flash Resource Manager, named vFRM, which aims to maximize the utilization of Flash resources at minimal I/O cost. It borrows the ideas of heating and cooling from thermodynamics to identify the data blocks that benefit most from being placed on Flash, and it lazily and asynchronously migrates data blocks between Flash and spinning disks. Further, we investigate the benefits of a global version of vFRM, named G-vFRM, for managing Flash resources among multiple heterogeneous VMs. Experimental evaluation shows that both vFRM and G-vFRM achieve better cost-effectiveness than traditional caching solutions, while consuming orders of magnitude less memory and I/O bandwidth.
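The heating-and-cooling idea behind vFRM can be sketched as follows. Class and parameter names here (`cooling_factor`, epoch-boundary remapping) are illustrative assumptions, not the thesis's actual design: accesses heat a block, all temperatures decay each epoch, and only at epoch boundaries is the Flash-resident set recomputed, so migrations can happen lazily and asynchronously instead of on every miss.

```python
from collections import defaultdict

class FlashResourceManager:
    """Toy sketch of a heat-based Flash manager in the spirit of vFRM."""

    def __init__(self, flash_capacity, cooling_factor=0.5):
        self.flash_capacity = flash_capacity  # capacity in blocks
        self.cooling_factor = cooling_factor  # per-epoch temperature decay
        self.temperature = defaultdict(float)
        self.on_flash = set()

    def record_access(self, block):
        # Each access heats the block; no migration happens here.
        self.temperature[block] += 1.0

    def end_epoch(self):
        # Recompute the Flash-resident set: the hottest blocks that fit.
        hottest = sorted(self.temperature, key=self.temperature.get, reverse=True)
        target = set(hottest[:self.flash_capacity])
        promote = target - self.on_flash
        evict = self.on_flash - target
        self.on_flash = target
        # Cool every block so stale heat fades over time.
        for b in self.temperature:
            self.temperature[b] *= self.cooling_factor
        return promote, evict  # migrations to perform lazily/asynchronously
```

Because remapping is batched per epoch, a block that is hot across many epochs stays on Flash, while a one-off burst cools off before it can displace genuinely hot data.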