This project addresses non-determinism in Retrieval-Augmented Generation (RAG) systems. We aim to develop a suite of tools, benchmarks, and best practices that make scientific workflows built on Large Language Models (LLMs) reliable, transparent, and reproducible.