Final Report: Deriving Realistic Performance Benchmarks for Python Interpreters

Hi, I am Mrigank. As a Summer of Reproducibility 2024 fellow, I have been working on deriving realistic performance benchmarks for Python interpreters with Ben Greenman from the University of Utah. In particular, we want to benchmark Meta’s Static Python interpreter (which is part of their Cinder project) and compare its performance with CPython at different levels of typing. In this post, I will share what I have worked on since my midterm report; this post also forms my final report for the Summer of Reproducibility 2024.

Since Last Time: Typing Django Files

Based on the profiling results from load testing a Wagtail blog site, I identified three modules in Django that were performance bottlenecks and added shallow types to them (a simplified sketch of what that looks like follows the list below). These are available in our GitHub repository.

  1. django.db.backends.sqlite3._functions
  2. django.utils.functional
  3. django.views.debug
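
As a rough, hypothetical illustration (the function below is made up, not the actual Django code), shallow typing under Static Python means the module opts in with `import __static__` and its functions carry ordinary annotations that the compiler then checks:

```python
import __static__  # opts this module in to Static Python; requires Cinder, not stock CPython

# Hypothetical helper in the spirit of django.utils.functional;
# the real code and signatures differ.
def with_prefix(value: str, prefix: str = "") -> str:
    """Shallow types: plain annotations on parameters and the return value."""
    return prefix + value
```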

I also wrote a script to mix untyped, shallow-typed, and advanced-typed versions of a Python module and create a series of such gradually typed versions; a minimal sketch of the idea appears below.
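
Here is a minimal sketch of the mixing idea, shown for two variants and assuming function-level granularity and matching top-level function names; the actual script in our repository handles more than this:

```python
# Sketch: generate every mix of two module variants at function granularity.
import ast
import itertools

def load_functions(path: str) -> dict[str, ast.FunctionDef]:
    """Map each top-level function name in a module to its AST node."""
    with open(path) as f:
        tree = ast.parse(f.read())
    return {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}

def mix(untyped_path: str, typed_path: str) -> list[str]:
    """Return one module source per subset of functions drawn from the typed variant."""
    untyped = load_functions(untyped_path)
    typed = load_functions(typed_path)
    names = sorted(untyped)
    versions = []
    for mask in itertools.product([False, True], repeat=len(names)):
        body = [typed[n] if use_typed else untyped[n]
                for n, use_typed in zip(names, mask)]
        versions.append(ast.unparse(ast.Module(body=body, type_ignores=[])))
    return versions
```

Extending this to three variants (untyped, shallow, advanced) follows the same pattern, with each function drawn from one of three sources.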

Summary of Experience and Contributions

  1. I tried to set up different versions of Zulip to make them work with Static Python. My setup scripts are available in our repository. Unfortunately, Zulip’s Zerver did not run with Static Python because some of the Django modules it depends on were incompatible. A few non-Django modules also threw errors at first due to a bug in Cinder, but I was able to work around it with a hack (described in the linked GitHub issue I opened on Cinder’s repository).

  2. I created a Locust version of the small Django-related benchmarks available in pyperformance and Skybison. This helped me confirm that Django itself is compatible with Static Python, and it helped me get started with Locust. This too is available in our repository.

  3. As described in the midterm report, I created a complete pipeline with Locust to simulate real-world load on a Wagtail blog site; a minimal sketch of such a locustfile follows this list. The instructions and scripts for running these load tests, as well as for profiling the Django codebase, are available (like everything else!) in our repository.

  4. We added shallow types to the three Django modules mentioned above, and I created scripts to mix untyped, shallow-typed, and advanced-typed versions of a Python module into a series of gradually typed versions to be benchmarked for performance. We found that advanced-typed code can often be structurally incompatible with shallow-typed code and are still looking for a solution; we are tracking some examples of this in a GitHub issue, and a simplified illustration also follows this list.
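
For the load tests, this is a minimal sketch of the kind of locustfile we use; the actual paths, task weights, and user counts in our repository differ, and the blog slug below is hypothetical.

```python
# locustfile.py — a minimal sketch; real paths and weights differ.
from locust import HttpUser, task, between

class BlogReader(HttpUser):
    wait_time = between(1, 3)  # simulated think time between requests

    @task(3)
    def view_index(self):
        self.client.get("/")  # blog landing page

    @task(1)
    def view_post(self):
        self.client.get("/blog/hello-world/")  # hypothetical post slug
```

Running `locust -f locustfile.py --host http://127.0.0.1:8000 --headless -u 50 -r 5` then drives 50 simulated readers against the site.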
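
To make the structural incompatibility concrete, here is one hypothetical kind of mismatch (simplified and made up for illustration; the real examples live in the issue). Static Python’s advanced types include unboxed primitives such as int64, which change a function’s calling convention, so an advanced-typed function body is not a drop-in replacement for its shallow-typed counterpart.

```python
import __static__  # requires Cinder's Static Python, not stock CPython
from __static__ import int64, box

# Advanced-typed variant: works on an unboxed primitive.
def double_advanced(x: int64) -> int64:
    return x * 2

# Shallow-typed variant: same logic with a plain annotation.
def double_shallow(x: int) -> int:
    return x * 2

# Swapping one body for the other is not mechanical: callers of the
# advanced variant deal in int64 values and must box() results before
# handing them to shallow-typed code.
def caller() -> int:
    return box(double_advanced(int64(21)))
```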

Going Forward

I had a great time exploring Static Python, typing in Python, load testing, and all other aspects of this project. I was also fortunate to have a helpful mentor along with other amazing team members in the group. During this project we hit several roadblocks, such as the challenges of setting up real-world applications with Static Python and the difficulty of adding advanced types, but we are managing to work around them. I will continue working on this project until we have a complete set of benchmarks and a comprehensive report on the performance of Static Python.

Our work will continue to be open source and available in our GitHub repository for anyone interested in following along or contributing.

Mrigank Pawagi
Mathematics and Computing Student at the Indian Institute of Science

Mrigank is pursuing a degree in mathematics and computing at the Indian Institute of Science. He is interested in software engineering, programming languages, and applications of LLMs for code generation.