reproducibility
Applying MLOps to overcome reproducibility barriers in machine learning research
Topics: machine learning, MLOps, reproducibility
Skills: Python, machine learning, GitOps, systems, Linux, data, Docker
Difficulty: Hard
Size: Large (350 hours)
Mentors: Fraida Fund and Mohamed Saeed
Fraida Fund
FairFace
FairFace: Reproducible Bias Evaluation in Facial AI Models via Controlled Skin Tone Manipulation
Bias in facial AI models remains a persistent issue, particularly concerning skin tone disparities. Many studies report that AI models perform differently on lighter vs. …
James Davis
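The disparity such studies describe is usually quantified by comparing a model's performance across groups. A minimal sketch, assuming each prediction is tagged with a skin-tone group (in a controlled-manipulation setup, the group would be the skin-tone level applied to an otherwise identical image); the function names and toy data are hypothetical, not FairFace's actual evaluation code:

```python
from collections import defaultdict

def per_group_accuracy(records):
    """records: iterable of (group, y_true, y_pred) tuples.
    Returns {group: accuracy} in first-seen group order."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

def accuracy_gap(records):
    """Largest difference in accuracy between any two groups."""
    acc = per_group_accuracy(records)
    return max(acc.values()) - min(acc.values())

if __name__ == "__main__":
    # Hypothetical toy predictions; real labels would come from the dataset.
    records = [
        ("lighter", 1, 1), ("lighter", 0, 0), ("lighter", 1, 1), ("lighter", 0, 1),
        ("darker", 1, 0), ("darker", 0, 0), ("darker", 1, 1), ("darker", 0, 1),
    ]
    print(per_group_accuracy(records))  # {'lighter': 0.75, 'darker': 0.5}
    print(accuracy_gap(records))        # 0.25
```

Because the manipulation holds everything except skin tone fixed, a nonzero gap in such a setup is easier to attribute to tone than a gap measured across unrelated images.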
Enhancing Reproducibility in Distributed AI Training: Leveraging Checkpointing and Metadata Analytics
Reproducibility in distributed AI training is a crucial challenge due to several sources of uncertainty, including stragglers, data variability, and inherent randomness. Stragglers—slower processing nodes in a distributed system—can introduce timing discrepancies that affect the synchronization of model updates, leading to inconsistent states across training runs.
Luanzheng "Lenny" Guo
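Of the uncertainty sources listed, inherent randomness is the easiest to pin down with checkpointing: if a checkpoint stores the random number generator state alongside the model parameters, a resumed run draws exactly the same random numbers as an uninterrupted one. A minimal single-process sketch, assuming a toy noisy update in place of real training; `train_step`, `run`, and the checkpoint fields are hypothetical, and the sketch does not address straggler timing:

```python
import random

def train_step(params, rng):
    # Hypothetical update: a random nudge standing in for stochastic
    # effects such as minibatch sampling or dropout.
    return [p + rng.gauss(0.0, 0.1) for p in params]

def run(num_steps, seed=0, checkpoint=None, save_at=None):
    """Train for num_steps total steps, optionally resuming from a checkpoint
    and optionally saving one after step save_at."""
    if checkpoint is not None:
        # Restore parameters *and* RNG state so the random stream continues exactly.
        step = checkpoint["step"]
        params = list(checkpoint["params"])
        seed = checkpoint["seed"]
        rng = random.Random()
        rng.setstate(checkpoint["rng_state"])
    else:
        step = 0
        params = [0.0, 0.0, 0.0]
        rng = random.Random(seed)
    saved = None
    while step < num_steps:
        params = train_step(params, rng)
        step += 1
        if step == save_at:
            saved = {
                "step": step,
                "params": list(params),
                "seed": seed,                  # metadata kept for provenance
                "rng_state": rng.getstate(),  # includes the cached gauss value
            }
    return params, saved

if __name__ == "__main__":
    full, ckpt = run(10, seed=42, save_at=5)
    resumed, _ = run(10, checkpoint=ckpt)
    print(full == resumed)  # True: resuming reproduces the uninterrupted run
```

Restoring only the seed at resume time would restart the random stream from the beginning and silently diverge; saving the full generator state avoids that.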
Enhancing Reproducibility in RAG Frameworks for Scientific Workflows
Retrieval-Augmented Generation (RAG) frameworks, which merge the capabilities of retrieval systems and generative models, significantly enhance the relevance and accuracy of responses produced by large language models (LLMs). These frameworks retrieve relevant documents from a large corpus and use these documents to inform the generative process, thereby improving the contextuality and precision of the generated content.
Luanzheng "Lenny" Guo
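The retrieve-then-generate flow can be sketched with a toy term-overlap retriever. This is a minimal sketch, assuming cosine similarity over word counts in place of a real embedding model and stopping at prompt assembly rather than calling an LLM; all names are hypothetical. Note the explicit tie-break on document id, one small source of run-to-run variation that a reproducible pipeline has to fix:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the ids of the k most similar documents."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in corpus.items()]
    # Score descending, then doc_id ascending, so ties break identically every run.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def build_prompt(query, corpus, k=2):
    """Assemble the retrieved documents into the context passed to a generator."""
    ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

if __name__ == "__main__":
    corpus = {  # hypothetical toy corpus
        "doc1": "HDF5 supports parallel I/O for scientific data.",
        "doc2": "MPI is the dominant programming model for HPC.",
        "doc3": "Checkpointing saves training state for reproducibility.",
    }
    print(retrieve("How does HDF5 handle parallel I/O?", corpus, k=2))  # ['doc1', 'doc2']
```

Logging the retrieved ids with each generated answer is a cheap way to make a RAG run auditable, since the generator's output depends on exactly which documents it was given.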
Exploration of I/O Reproducibility with HDF5
Parallel I/O is a critical component in high-performance computing (HPC), allowing multiple processes to read and write data concurrently from a shared storage system. HDF5—a widely adopted data model and library for managing complex scientific data—supports parallel I/O but introduces challenges in I/O reproducibility, where repeated executions do not always produce identical results.
Luanzheng "Lenny" Guo, Wei Zhang
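One way to see how concurrent writes can break I/O reproducibility is to simulate several ranks finishing in a different order each run. The sketch below uses only the Python standard library, not HDF5 or MPI, and its names and chunk size are hypothetical: writing each rank's data at a rank-determined offset (similar in spirit to each process selecting a fixed region of a shared dataset) yields byte-identical output, while appending in completion order does not. Hashing the output is a simple way to compare repeated executions.

```python
import hashlib
import random

CHUNK = 4  # bytes written per rank (hypothetical)

def rank_data(rank):
    return bytes([rank]) * CHUNK

def completion_order(num_ranks, seed):
    # Simulate nondeterministic timing: ranks finish in a seed-dependent order.
    order = list(range(num_ranks))
    random.Random(seed).shuffle(order)
    return order

def write_by_offset(num_ranks, seed):
    """Each rank writes to its own fixed offset; arrival order does not matter."""
    buf = bytearray(num_ranks * CHUNK)
    for rank in completion_order(num_ranks, seed):
        start = rank * CHUNK
        buf[start:start + CHUNK] = rank_data(rank)
    return bytes(buf)

def write_by_append(num_ranks, seed):
    """Each rank appends when it finishes; the layout depends on timing."""
    buf = bytearray()
    for rank in completion_order(num_ranks, seed):
        buf += rank_data(rank)
    return bytes(buf)

def digest(data):
    return hashlib.sha256(data).hexdigest()

if __name__ == "__main__":
    runs = range(5)
    print(len({digest(write_by_offset(8, s)) for s in runs}))  # 1: identical every run
    print(len({digest(write_by_append(8, s)) for s in runs}))  # usually > 1
```

Byte-level hashes are strict: two files can hold the same logical data yet hash differently because of metadata or layout, which is part of why I/O reproducibility needs a precise definition of "identical".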
Assessing and Enhancing CC-Snapshot for Reproducible Experiment Environments
A critical challenge in computer systems research reproducibility is establishing and sharing experimental environments. While open testbeds like Chameleon provide access to hardware resources, researchers still face significant barriers when attempting to recreate the precise software configurations, dependencies, and system states needed for reproducible experiments.
Mark Powers, Michael Sherman
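Recreating an environment starts with recording it precisely. A minimal sketch of capturing and diffing a Python-level environment description, assuming that interpreter, OS, and installed-package versions are the facts of interest; it is a much narrower, hypothetical illustration, not CC-Snapshot's mechanism, and the function names are made up:

```python
import json
import platform
import sys
from importlib import metadata

def capture_environment():
    """Record interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }

def diff_environments(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    diffs = {}
    for key in ("python", "implementation", "os", "os_release", "machine"):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    pa, pb = a.get("packages", {}), b.get("packages", {})
    for name in sorted(set(pa) | set(pb)):
        if pa.get(name) != pb.get(name):
            diffs[f"package:{name}"] = (pa.get(name), pb.get(name))
    return diffs

if __name__ == "__main__":
    snapshot = capture_environment()
    summary = {k: v for k, v in snapshot.items() if k != "packages"}
    print(json.dumps(summary, indent=2))
    print(len(snapshot["packages"]), "packages recorded")
```

Storing such a record next to an experiment's results lets a later run be checked against it with `diff_environments` before any results are compared.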
Chameleon Trovi Support for Complex Experiment Appliances
Poor discoverability and accessibility of research artifacts remain a significant barrier to reproducibility in computer science research. While digital libraries index research papers, they rarely provide direct access to the artifacts needed to reproduce experiments, especially complex multi-node systems.
Kate Keahey, Mark Powers
Contextualization – Extending Chameleon’s Orchestration for One-Click Experiment Deployment
Reproducibility in computer systems research is often hindered by incomplete or low-quality artifact descriptions and by the complexity of establishing experimental environments. When experiments involve multiple interconnected components, researchers struggle with hardcoded configurations, inadequate documentation of setup processes, and missing validation steps that would verify that the environment was established correctly.
Paul Marshall
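Hardcoded configurations and missing validation can both be addressed by rendering configuration from runtime context and checking the environment before the experiment runs. A minimal sketch, assuming a hypothetical two-field config whose values (a server IP and a worker count) would be supplied by the orchestration layer; the template, field names, and checks are illustrative, not Chameleon's contextualization API:

```python
import shutil
from string import Template

# Hypothetical config template: values are filled in at deploy time, not hardcoded.
CONFIG_TEMPLATE = Template(
    "server_address = $server_ip\n"
    "worker_count = $num_workers\n"
)

def render_config(context):
    """Fill the template from runtime context; raises KeyError if a value is missing."""
    return CONFIG_TEMPLATE.substitute(context)

def validate_environment(context, required_commands=()):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    for cmd in required_commands:
        if shutil.which(cmd) is None:
            problems.append(f"missing command: {cmd}")
    if int(context.get("num_workers", 0)) < 1:
        problems.append("num_workers must be at least 1")
    return problems

if __name__ == "__main__":
    context = {"server_ip": "10.0.0.1", "num_workers": 2}  # hypothetical values
    problems = validate_environment(context, required_commands=("sh",))
    if problems:
        print("Environment not ready:", problems)
    else:
        print(render_config(context), end="")
```

Using `substitute` rather than `safe_substitute` is deliberate: a missing value fails loudly at deploy time instead of leaving a literal `$server_ip` in the generated file.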
MPI Appliance for HPC Research on Chameleon
Message Passing Interface (MPI) is the dominant programming model for high-performance computing (HPC), enabling applications to scale efficiently across thousands of processing cores. In reproducibility initiatives for HPC research, MPI implementations are critical because they manage the complex communications that underpin parallel scientific applications.
Ken Raffenetti
Smart Environments – An AI System for Reproducible Custom Computing Environments
The complexity of environment setup and the expertise required to configure specialized software stacks often hinder efforts to reproduce important scientific results in HPC and systems studies. Researchers frequently struggle with incomplete or ambiguous artifact descriptions that assume “common knowledge” which is actually specialized domain expertise.
Paul Marshall