Smart Batching for Large Language Models

Sequence tokenization is a crucial step in Large Language Model training, fine-tuning, and inference. User prompts and training data are tokenized and padded to a common length before being fed to the model in batches. Tokenization lets models process human language by breaking sentences into token units, each mapped to a numeric ID in the model's vocabulary. However, the padding used to equalize sequence lengths within a batch can introduce unnecessary overhead if batches are not formed carefully.

In this project, we introduce Smart Batching, which dynamically groups the sequences in a fine-tuning dataset into batches according to their token lengths. This minimizes the amount of padding required within each batch, which can improve fine-tuning and inference speed. We also compare this method against other commonly used batching practices (longest-sequence padding, random shuffling) on metrics such as runtime and model accuracy.
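
To make the padding overhead concrete, here is a minimal sketch (not part of the project deliverables) that compares randomly shuffled batches against length-sorted batches; the sequence lengths and batch size are illustrative assumptions.

```python
import random

def padding_overhead(lengths, batch_size):
    """Count pad tokens needed when each batch is padded to its longest sequence."""
    pad = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        pad += sum(max(batch) - length for length in batch)
    return pad

random.seed(0)
lengths = [random.randint(5, 512) for _ in range(1024)]   # hypothetical per-sequence token counts

shuffled = random.sample(lengths, len(lengths))            # random-shuffle batching
length_sorted = sorted(lengths)                            # smart (length-sorted) batching

print("pad tokens, random batching:", padding_overhead(shuffled, 32))
print("pad tokens, smart batching: ", padding_overhead(length_sorted, 32))
```

Because each batch is padded only up to its own longest sequence, grouping sequences of similar length drastically shrinks the total number of pad tokens processed.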

Project Details

  • Topics: Large Language Models, Fine-Tuning, AI, Transformers
  • Skills: Python, PyTorch, Large Language Models
  • Difficulty: Moderate
  • Size: Large (350 hours)
  • Mentors: Daniel Wong, Luanzheng "Lenny" Guo

Project Tasks and Milestones

  • Implement an open-source smart batching framework based on HuggingFace that dynamically groups sequences of similar token lengths into batches (see the sketch after this list)
  • Compare runtime, padding overhead, and model accuracy between smart batching and other commonly used batching practices
  • Apply smart batching with distributed fine-tuning and observe large language model outputs
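
One possible starting point, shown below as a hedged sketch rather than the project's actual framework, is HuggingFace's built-in `group_by_length` option combined with per-batch dynamic padding via `DataCollatorWithPadding`; the model checkpoint and dataset slice are placeholder choices.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

# Placeholder checkpoint and dataset; any tokenizer/model pair would do.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb", split="train[:1%]")  # small illustrative slice
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="smart-batching-demo",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    group_by_length=True,   # sample batches of sequences with similar token lengths
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),  # pad each batch only to its longest sequence
)
trainer.train()
```

The project's framework could generalize this idea beyond the Trainer API, for example through custom DataLoader samplers usable in distributed fine-tuning setups.
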
Daniel Wong
Associate Professor of Electrical and Computer Engineering

His research interests include Computer Architecture, High Performance Computing, and Warehouse-scale Computing.

Luanzheng "Lenny" Guo
Luanzheng "Lenny" Guo
Computer Scientist, Pacific Northwest National Laboratory

Luanzheng (Lenny) Guo is a computer scientist in Pacific Northwest National Laboratory's Future Computing Technologies group, working at the intersection of scientific computing, data management, large-scale systems (e.g., HPC, CPS, Cloud, Edge), and machine learning (ML).