Developing Trustworthy Large Language Models

Towards building Responsible AI

Hi! Thanks for stopping by.

In this first blog post of a series of three, I'd like to introduce myself, my mentor, and my project.

My name is Nikhil. I am an ML researcher who works at the intersection of NLP, ML, and HCI. I previously worked as a Machine Learning Engineer II at VMware and spent some wonderful summers interning with ML teams at NVIDIA and IIT Bombay. I also recently graduated with honors in Computer Science from the University of Southern California (USC), where I completed a master's thesis.

This year at Google Summer of Code (GSoC 24), I will be working on developing trustworthy large language models. I'm very grateful to be mentored by Leilani H. Gilpin at the AIEA lab, UC Santa Cruz. I truly admire the flexibility and ownership she allows me in pursuing my ideas independently within this project. Please feel free to peruse my accepted GSoC proposal here.

Project: The tangible outcome of my project is an open-source, end-to-end, full-stack web app with a hybrid trustworthy LLM in the backend.

This open-source web app will be a lightweight tool that not only takes diverse textual prompts and connects to several LLMs and a database, but also gathers qualitative and quantitative user feedback. Users will be able to see how this feedback affects the LLMs' responses and shapes their reasoning and explanations (xAI). The tool will be thoroughly tested, with all unit tests passing and complete code coverage.
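To make the planned architecture a bit more concrete, here is a minimal sketch of what such a backend could look like. FastAPI is a hypothetical framework choice, and the route names, the query_llm helper, and the feedback fields are illustrative assumptions rather than the project's actual API.

# Hypothetical sketch of the backend: accept a prompt, query an LLM,
# store the response, and record qualitative + quantitative user feedback.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
responses = []     # stand-in for a real database table of LLM responses
feedback_log = []  # stand-in for a real database table of user feedback

class PromptRequest(BaseModel):
    prompt: str
    model_name: str = "some-llm"  # placeholder model identifier

class FeedbackRequest(BaseModel):
    response_id: int
    rating: int        # quantitative feedback, e.g. 1-5
    comment: str = ""  # qualitative feedback

def query_llm(prompt: str, model_name: str) -> str:
    """Placeholder for a call to an actual LLM provider."""
    return f"[{model_name}] response to: {prompt}"

@app.post("/prompt")
def submit_prompt(req: PromptRequest):
    answer = query_llm(req.prompt, req.model_name)
    responses.append({"prompt": req.prompt, "answer": answer})
    return {"response_id": len(responses) - 1, "answer": answer}

@app.post("/feedback")
def submit_feedback(req: FeedbackRequest):
    # Logged feedback could later be folded back into prompting or fine-tuning,
    # so users can see how it shifts the model's responses and explanations.
    feedback_log.append(req.dict())
    return {"status": "recorded", "total_feedback": len(feedback_log)}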

At the moment, we are investigating LLMs and making them more trustworthy on constraint satisfaction tasks such as logical reasoning and misinformation detection. However, our work applies to other areas of Responsible AI as well, such as Social Norms (toxicity detection and cultural insensitivity), Reliability (misinformation, hallucination, and inconsistency), Explainability & Reasoning (lack of interpretability, limited logical and causal reasoning), Safety (privacy violation and violence), and Robustness (prompt attacks and distribution shifts).
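To give a flavor of what trustworthiness on a constraint satisfaction task could mean in practice, here is a small, hypothetical sketch: an LLM's answer to a logical-reasoning prompt is only accepted once it passes a programmatic consistency check. The ask_llm stub and the transitivity check are illustrative assumptions, not our actual method.

# Hypothetical sketch: check an LLM's answer against a hard logical
# constraint before trusting it, and request a revision if it fails.

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a real LLM; returns a canned string here."""
    return "Since A > B and B > C, it follows that A > C."

def satisfies_transitivity(answer: str) -> bool:
    """Toy constraint: given A > B and B > C, the answer must conclude A > C."""
    return "A > C" in answer

prompt = "If A > B and B > C, what is the relation between A and C?"
answer = ask_llm(prompt)

if not satisfies_transitivity(answer):
    # Feed the violated constraint back to the model as part of the explanation,
    # so the user can see how the reasoning is revised.
    answer = ask_llm(prompt + "\nYour previous answer violated transitivity; please revise.")

print(answer)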

Impact:

  • Responsible AI research teams across industry and academia can use this as a boilerplate for their user study projects.
  • Diverse PhD students and academic researchers studying how users interact with LLMs will find this useful.
  • LLM alignment researchers and practitioners will find this a useful resource, since user feedback affects the underlying reward model of the internal LLMs.
  • Explainable AI (xAI) researchers can find value in the explanations that this tool generates, which reveal interpretable insights into how modern LLMs think and use their memory.

These are just a few use cases; there are several others that we look forward to describing in the upcoming posts.

This is the first blog in my series of three for the UC OSPO. Stay tuned for the upcoming posts: the next will detail my progress at the halfway mark, and the final one will conclude my work.

If you find this work interesting and would love to share your thoughts, I am happy to chat! :) Feel free to connect on LinkedIn and mention that you are reaching out from this blog post.

It is great to meet the UC OSPO community, and thanks for reading. Bye for now.

Nikhil Wani
Graduate, University of Southern California (USC)

Nikhil Wani is an ML researcher who works at the intersection of NLP, ML, and HCI.

Leilani H. Gilpin
Assistant Professor, UC Santa Cruz

Leilani H. Gilpin is an Assistant Professor in Computer Science and Engineering and an affiliate of the Science & Justice Research Center at UC Santa Cruz. She is part of the AI group at UCSC and leads the AI Explainability and Accountability (AIEA) Lab.