OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS

Hello, I’m Jiajun Mao, a BS/MS student at the University of Chicago studying Computer Science. I will be spending this summer working on the project OpenMLEC: Open-source MLEC implementation with HDFS on top of ZFS, under the mentorship of Meng Wang and Anjus George, as described in my proposal.

How to increase the durability and reliability of data while decreasing storage cost has long been an interesting research topic. In recent years, erasure coded storage systems have been seen as strong candidates to replace replication for colder storage tiers. In the paper “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers”, the authors used theory and simulation to explore how a multi-level erasure coded system can outperform systems using single-level erasure codes in areas such as encoding throughput and network bandwidth consumed for repair, addressing a few pain points in adopting erasure coded storage systems. I will put the paper’s theoretical and simulation results into practice by building an implementation on top of HDFS and ZFS and benchmarking the system’s performance.
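To make the storage-cost argument concrete, here is a minimal sketch of how the raw storage overhead of a two-level erasure code composes, compared with 3-way replication. The function name and the (10+2)/(17+3) parameters are hypothetical and chosen only for illustration; they are not taken from the paper or from the OpenMLEC code.

```python
from fractions import Fraction


def storage_overhead(k_net, p_net, k_local, p_local):
    """Raw bytes stored per byte of user data for a two-level erasure code.

    k_net/p_net: data and parity chunks at the upper (network) level.
    k_local/p_local: data and parity chunks at the lower (local) level.
    Every network-level chunk is encoded again locally, so the two
    per-level overheads multiply.
    """
    network = Fraction(k_net + p_net, k_net)
    local = Fraction(k_local + p_local, k_local)
    return network * local


# Hypothetical parameters, for illustration only.
mlec = storage_overhead(10, 2, 17, 3)
replication = Fraction(3, 1)  # 3-way replication stores three full copies

print(f"two-level EC overhead: {float(mlec):.3f}x")
print(f"3-way replication:     {float(replication):.3f}x")
```

With these assumed parameters the two-level code stores about 1.41 bytes per byte of user data, against 3 bytes for 3-way replication, which is why erasure coding is attractive for colder tiers.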

The project aims to achieve the following:

  • HDFS understands the underlying characteristics of ZFS as its filesystem
  • HDFS understands failure reports from ZFS and uses new, MLEC-specific repair logic to execute parity repair
  • ZFS accepts repair data from HDFS to repair a pool suspended by catastrophic data corruption
Jiajun Mao
OSRE 2024 Participant/Researcher
Meng Wang
PhD Student, University of Chicago

Meng Wang is a Ph.D. candidate in the Department of Computer Science at the University of Chicago. His research focuses on enhancing the performance and dependability of storage systems in cloud, HPC, and ML environments.

Anjus George
HPC Storage R&D staff, Oak Ridge National Laboratory

Dr. George works with the Technology Integration team at Oak Ridge National Laboratory (ORNL), which powers some of the world’s fastest supercomputers, such as Summit and Frontier.