Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster

Hi! I’m Martin, and I will be working on Reproducible Performance Benchmarking for Genomics Workflows on HPC Cluster under the mentorship of In Kee Kim. Our work is driven by the scale of computing systems that hosts data commons – we believe that performance characterization of genomics workload should be done rapidly and at the scale similar to production settings. Feel free to check our proposal for more details!

We propose GenScale, a genomics workload benchmarking tool which can achieve both the scale and speed necessary for characterizing performance under large-scale settings. GenScale will be built on top of industrial-grade cluster manager (e.g. Kubernetes), metrics collection & monitoring systems (e.g. Prometheus), and will support comprehensive set of applications used in state-of-art genomics workflows. Initial version developed during this project will include DNA and RNA alignment workflows.

Finally, we believe that open access and reproducible research will greatly accelerate the pace of scientific discovery. We aim to package our artefacts and generated datasets in ways that makes it easiest to replicate, analyze, and build upon. I personally look forward to learn from & contribute to the open source community!

Martin L. Putra
Martin L. Putra
Ph.D. Student, University of Chicago

Martin Putra is a Ph.D. student in the Department of Computer Science at the University of Chicago. His research focuses on designing and building novel systems support for next-generation genomics data processing.