AIDRIN Privacy-Centric Enhancements: Backend & UX Upgrades

Harish Balaji Boominathan, Jean Luca Bez, Suren Byna

Jul 25, 2025 Google Summer of Code, GSoC 2025, Open Source, AI Readiness

AIDRIN - building privacy-first AI data pipelines

⏱️ Reading time: 5–6 minutes

Hey everyone,

If you’ve ever wondered what it takes to make AI data pipelines not just smarter, but safer and more transparent, you’re in the right place. The last few weeks working on AIDRIN for GSoC have been a deep dive into the engine room of privacy and backend systems that power the AIDRIN project. My focus has been on building out the core privacy infrastructure and backend features that power AIDRIN’s ability to give users real, actionable insights about their data. It’s been challenging, sometimes messy, but incredibly rewarding to see these changes make a tangible difference.

Having Dr. Jean Luca Bez and Prof. Suren Byna as mentors, along with the support of the entire team, has truly made all the difference. Their guidance, encouragement, and collaborative spirit have been a huge part of this journey, whether I’m brainstorming new ideas or just trying to untangle a tricky bug.

Privacy Metrics: Making Data Safer

A major part of my work has been putting data privacy at the front and center in AIDRIN. I focused on integrating essential privacy metrics like k-anonymity, l-diversity, t-closeness, and more, making sure they’re not just theoretical checkboxes, but real tools that users can interact with and understand. Now, these metrics are fully wired up in the backend and visualized in AIDRIN, so privacy risks are no longer just a vague concern. They are something AI data preparers can actually see and act on. Getting these metrics to work seamlessly with different datasets and ensuring their accuracy took some serious backend engineering, but the payoff has been worth it.

Speeding Things Up (So You Don’t Have To Wait Around)

As AIDRIN started handling bigger datasets, some of the calculations can be time-consuming because data has to be accessed every time a metric is computed. To address this, I added caching for previously computed metrics, like class imbalance and privacy checks, and set up asynchronous execution with Celery and Redis. This should make the app super responsive. Rather than waiting for heavy computations to finish, one can start taking notes about other metrics or explore different parts of the app while their results are loading in the background. It’s a small change, but it helps keep the workflow moving smoothly.

Small Touch Ups That (Hopefully) Make a Big Difference

I also spent time on the details that make the app easier to use. Tooltips now explain what the privacy metrics actually mean, error messages are clearer, and there’s a new cache info page where you can see and clear your cached data. The sensitive attribute dropdown is less confusing now, especially if you’re working with quasi-identifiers. These tweaks might seem minor, but they add up and make the app friendlier for everyone.

Docs, Docs, Docs

I’m a big believer that good documentation is just as important as good code. I updated the docs to cover all the new features, added citations for the privacy metrics, and made the install process a bit more straightforward. Hopefully, this means new users and contributors can get up to speed without too much hassle.

Huge Thanks to My Mentors and the Team

I really want to shine a light on Dr. Bez, Prof. Byna, and the entire AIDRIN team here. Their encouragement, practical advice, and collaborative spirit have been a huge part of my progress. Whether I’m stuck on a bug, brainstorming a new feature, or just need a second opinion, there’s always someone ready to help me think things through. Their experience and support have shaped not just the technical side of my work, but also how I approach problem-solving and teamwork.

What’s Next?

Looking ahead, I’m planning to expand AIDRIN’s support for multimodal datasets and keep refining the privacy and fairness modules. There’s always something new to learn or improve, and I’m excited to keep building. If you’re interested in data quality, privacy, or open-source AI tools, I’d love to connect and swap ideas.

Thanks for reading and for following along with my GSoC journey. I’ll be back soon with more updates!

This is the second post in my 3-part GSoC series with AIDRIN. Stay tuned for the final update.

AIDRIN AI readiness data quality dataset privacy privacy metrics k-anonymity l-diversity t-closeness backend development performance optimization GSoC2025 hpc LBNL Lawrence Berkeley Lab LBL open source osre25 UCB uc University of California Berkeley

AIDRIN Privacy-Centric Enhancements: Backend & UX Upgrades

Privacy Metrics: Making Data Safer

Speeding Things Up (So You Don’t Have To Wait Around)

Small Touch Ups That (Hopefully) Make a Big Difference

Docs, Docs, Docs

Huge Thanks to My Mentors and the Team

What’s Next?

Harish Balaji Boominathan

Master’s student, New York University

Jean Luca Bez

Research Scientist, Lawrence Berkeley National Laboratory

Suren Byna

Professor, The Ohio State University