🧪 Open Source Research Experience
Open Source Incubator Fellowship
🎓 Open Source Education
FlashNet: Towards Reproducible Data Science for Storage Systems
The Data Storage Research Vision 2025, organized in an NSF workshop, calls for more “AI for storage” research. However, performing ML-for-storage research can be a daunting task for new storage researchers.
Haryadi S. Gunawi
Reproducible Evaluation of Multi-level Erasure Coding
Massive storage systems rely heavily on erasure coding (EC) to protect data from drive failures and provide data durability. Existing storage systems mostly adopt single-level erasure coding (SLEC), performing EC at either the network level or the local level.
CephFS is a distributed file system on top of Ceph. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality under continually changing workloads.
HDF5 is a unique technology suite that makes it possible to manage extremely large and complex data collections. The HDF5 technology suite includes a versatile data model that can represent very complex data objects and a wide variety of metadata.