The fine-tuning of pre-trained models has become ubiquitous in generative AI, computer vision, and robotics. Although much attention has been paid to improving the efficiency of fine-tuning models, there has been less scholarship around fine-tuning specifically for improved model performance. To remedy this gap, we present PROFIT, one of the first optimizers designed to incrementally fine-tune converged models on new tasks and/or datasets. Unlike traditional optimizers such as SGD or Adam, which make minimal assumptions because training typically starts from a random initialization, PROFIT explicitly takes the properties of a converged model into account to regularize the optimization process. Employing a temporal gradient-orthogonalization process, PROFIT outperforms standard fine-tuning methods across a range of tasks, from image classification to multimodal language model training to large-scale motion prediction. Moreover, PROFIT is encapsulated as a modular optimizer, making it easy to integrate directly into any training pipeline with minimal engineering effort.
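To give a flavor of what regularizing against a converged model can look like, here is a minimal NumPy sketch of a gradient-orthogonalization update. This is an illustration, not the PROFIT algorithm itself: the function name `orthogonalized_step` and the specific rule of projecting out the gradient component along the drift from the converged reference weights are our assumptions for exposition; the paper defines the actual temporal orthogonalization procedure.

```python
import numpy as np

def orthogonalized_step(w, w_ref, grad, lr=1e-2):
    """Illustrative SGD step that removes the gradient component
    pointing along the drift away from a converged reference model.

    w      -- current (flattened) parameter vector
    w_ref  -- parameters of the converged reference model
    grad   -- gradient of the fine-tuning loss at w
    lr     -- learning rate
    """
    # Drift of the fine-tuned weights away from the converged reference.
    drift = w - w_ref
    n = np.linalg.norm(drift)
    if n > 1e-12:
        d = drift / n
        # Keep only the gradient component orthogonal to the drift,
        # so the update does not push the model further from the reference.
        grad = grad - np.dot(grad, d) * d
    return w - lr * grad
```

With `w_ref = [1, 0]`, `w = [2, 0]`, and `grad = [1, 1]`, the drift is along the first axis, so only the second gradient component survives and the applied update is orthogonal to the drift. A production version would instead be written as a `torch.optim.Optimizer` subclass so it drops into existing training loops.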
As machine learning shifts to fine-tuning, our tools must evolve too. PROFIT is a critical step towards a new class of optimizers designed for adapting the most powerful foundational models. Future work will prioritize making PROFIT memory-efficient enough to handle large foundational models.
@inproceedings{chakravarthy2025profit,
author = {Anirudh S Chakravarthy and Shuai Kyle Zheng and Xin Huang and Sachithra Hemachandra and Xiao Zhang and Yuning Chai and Zhao Chen},
title = {{PROFIT: A Specialized Optimizer for Deep Fine Tuning}},
booktitle = {NeurIPS},
year = {2025},
url = {https://arxiv.org/abs/2412.01930}
}
We thank Carl Vondrick, Greg Meyer, Eric Wolff, Siddhartha Srinivasa, Hongge Chen, David Hayden, Yifeng Zeng, Navaneeth Bodla, Ajaya h s Rao, Ankit Raj, Annie Liu, Gweltaz Lever, Raghid Mardini, and Pratik Agarwal for their helpful feedback and insightful discussions.