Smart Batching for Large Language Models

Sequence tokenization is a crucial step in Large Language Model training, fine-tuning, and inference. User prompts and training data are tokenized and zero-padded before being fed to the model in batches. Tokenization allows models to process human language by breaking sentences down into token units that are mapped to numerical IDs in the model's vocabulary. However, padding every sequence so that each batch has uniform dimensions can introduce significant overhead when batching is done naively.
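To make the padding overhead concrete, the minimal sketch below tokenizes a small batch with a HuggingFace tokenizer and counts how many token slots are padding; the checkpoint name and example sentences are placeholders, not part of the project itself.

```python
from transformers import AutoTokenizer

# Hypothetical example: any HuggingFace checkpoint with a tokenizer works here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentences = [
    "Short prompt.",
    "A somewhat longer prompt with a few more tokens in it.",
    "A much longer prompt that forces every shorter sequence in the batch to be "
    "padded up to its length, wasting compute on pad tokens.",
]

# padding=True pads every sequence to the longest one in the batch.
batch = tokenizer(sentences, padding=True, return_tensors="pt")

total = batch["attention_mask"].numel()    # all token slots in the batch
real = int(batch["attention_mask"].sum())  # non-padding tokens
print(f"padded slots: {total - real} of {total} "
      f"({100 * (total - real) / total:.1f}% wasted)")
```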
In this project, we introduce Smart Batching, which dynamically groups the sequences of a fine-tuning dataset into batches by their token lengths. This method aims to minimize the amount of zero padding required during batching, which can improve fine-tuning and inference speed. We also compare Smart Batching against other commonly used batching practices (Longest Sequence, Random Shuffling) on key metrics such as runtime and model accuracy.
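As a rough sketch of the idea (using made-up sequence lengths rather than a real dataset), sorting sequences by length before slicing them into batches already removes most of the padding:

```python
import random

def padded_slots(batches):
    """Total token slots when each batch is padded to its longest sequence."""
    return sum(len(batch) * max(batch) for batch in batches)

def make_batches(lengths, batch_size):
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

# Toy sequence lengths standing in for a tokenized fine-tuning dataset.
lengths = [random.randint(8, 512) for _ in range(1024)]
batch_size = 32

random_batches = make_batches(lengths, batch_size)         # arbitrary order
smart_batches = make_batches(sorted(lengths), batch_size)  # grouped by length

real_tokens = sum(lengths)
for name, batches in [("random", random_batches), ("smart", smart_batches)]:
    slots = padded_slots(batches)
    print(f"{name:>6}: {slots - real_tokens} padding tokens "
          f"({100 * (slots - real_tokens) / slots:.1f}%)")
```

A practical implementation typically shuffles at the batch level so that training does not always proceed strictly from the shortest sequences to the longest.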
Project Details
- Topics: Large Language Models, Fine-Tuning, AI, Transformers
- Skills: Python, PyTorch, Large Language Models
- Difficulty: Moderate
- Size: Large (350 hours)
- Mentors: Daniel Wong, Luanzheng “Lenny” Guo
Project Tasks and Milestones
- Implement an open-source smart batching framework on top of HuggingFace that dynamically groups sequences of similar token length into batches (see the sketch after this list)
- Compare runtime, padding overhead, and model accuracy between smart batching and other commonly used batching practices
- Apply smart batching to distributed fine-tuning and evaluate the resulting large language model outputs
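One possible starting point for the first milestone is a custom PyTorch batch sampler paired with a collate function that pads only to each batch's longest sequence. This is a minimal sketch under assumed names (`SmartBatchSampler`, `pad_collate`, `TokenizedDataset` are all hypothetical), not the final framework:

```python
import random
import torch
from torch.utils.data import DataLoader, Dataset, Sampler

class SmartBatchSampler(Sampler):
    """Yield batches of indices grouped by similar token length."""

    def __init__(self, lengths, batch_size, shuffle=True):
        self.lengths = lengths
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort indices by sequence length, slice into batches of similar length.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches = [order[i:i + self.batch_size]
                   for i in range(0, len(order), self.batch_size)]
        if self.shuffle:
            random.shuffle(batches)  # keep length grouping but randomize batch order
        yield from batches

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size

def pad_collate(batch, pad_id=0):
    """Pad each batch only to its own longest sequence."""
    max_len = max(len(ids) for ids in batch)
    input_ids = torch.full((len(batch), max_len), pad_id, dtype=torch.long)
    attention_mask = torch.zeros((len(batch), max_len), dtype=torch.long)
    for row, ids in enumerate(batch):
        input_ids[row, :len(ids)] = torch.tensor(ids)
        attention_mask[row, :len(ids)] = 1
    return {"input_ids": input_ids, "attention_mask": attention_mask}

class TokenizedDataset(Dataset):
    """Placeholder dataset of pre-tokenized sequences (lists of token IDs)."""
    def __init__(self, sequences):
        self.sequences = sequences
    def __len__(self):
        return len(self.sequences)
    def __getitem__(self, idx):
        return self.sequences[idx]

# Hypothetical usage with fake token IDs of varying length.
sequences = [[1] * random.randint(8, 512) for _ in range(256)]
dataset = TokenizedDataset(sequences)
sampler = SmartBatchSampler([len(s) for s in sequences], batch_size=16)
loader = DataLoader(dataset, batch_sampler=sampler, collate_fn=pad_collate)

for batch in loader:
    print(batch["input_ids"].shape)  # width varies per batch instead of being fixed
    break
```

The same loader can be timed and instrumented with padded-token counts for the comparison milestone, and a HuggingFace-based version could build on the library's related `group_by_length` option before being dropped into a distributed fine-tuning setup for the final milestone.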