Top
Back to All Events

The “what”, The “from”, and The “to”: The Migration Games in Deduplicated Systems

The “what”, The “from”, and The “to”: The Migration Games in Deduplicated Systems

Gala Yadgar

Gala Yadgar
Assistant Professor
Computer Science Department
Technion — Israel Institute of Technology

Tuesday, August 2, 2022
11:00AM – 12:30PM (ET)

40 St. George St, Toronto, ON M5S 2E4
Bahen Centre for Information Technology — BA5205

Abstract:

Deduplication reduces the size of the data stored in large-scale storage systems by replacing duplicate data blocks with references to their unique copies. This creates dependencies between files that contain similar content, and complicates the management of data in the system. In this talk, I will describe our work addressing the problem of data migration, where files are remapped between different volumes as a result of system expansion or maintenance. The challenge of determining which files and blocks to migrate has been studied extensively for systems without deduplication. In the context of deduplicated storage, however, only simplified migration scenarios were considered.

We began by formulating the general migration problem for deduplicated systems as an optimization problem: the objective is to minimize the system’s size while ensuring that the storage load is evenly distributed between the system’s volumes, and that the network traffic required for the migration does not exceed its allocation.

We then constructed and compared three algorithms for generating effective migration plans, each based on a different approach and representing a different tradeoff between computation time and migration efficiency. Our greedy algorithm provides modest space savings, but is appealing thanks to its exceptionally short runtime. Our theoretically optimal algorithm formulates the migration problem as an ILP (integer linear programming) instance. Its migration plans consistently result in smaller and more balanced systems than those of the greedy approach, although its runtime is long and, as a result, the theoretical optimum is not always found. Our clustering algorithm enjoys the best of both worlds: its migration plans are comparable to those generated by the ILP-based algorithm, but its runtime is shorter, sometimes by an order of magnitude.

Biography:

Gala Yadgar is an assistant professor at the Computer Science Department at the Technion—Israel Institute of Technology, where she received her Ph.D in 2012.  She is an associate editor for ACM Transactions on Storage, and served as the co-chair for FAST (2021), HotStorage (2019), and SYSTOR (2018). Her research is directed at methods for improving performance and reliability of large-scale storage systems, focusing on complex hierarchies and enhanced storage interfaces. Her works on caching, content distribution, optimizations for flash-based storage, erasure coding, deduplication, workload characterization, and improved analysis tools were published in FAST, USENIX ATC, ICDCS, ACM ToS, ACM ToCS, HotEdge, and HotStorage.