Introducing vision to the fine-tuning API
Since we first introduced fine-tuning on GPT-4o, hundreds of thousands of developers have customized our models using text-only datasets to improve performance on specific tasks. However, in many cases, fine-tuning on text alone doesn't deliver the performance boost developers expect.
Developers can now customize GPT-4o to have stronger image understanding capabilities, enabling applications such as enhanced visual search, improved object detection for autonomous vehicles and smart cities, and more accurate medical image analysis.
How it works
Vision fine-tuning follows a similar process to fine-tuning with text: developers prepare their image datasets in the proper format and then upload that dataset to our platform. They can improve GPT-4o's performance on vision tasks with as few as 100 images, and drive even higher performance with larger volumes of text and image data.
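As a rough illustration, here is a minimal sketch of what preparing and submitting a vision fine-tuning job might look like with the OpenAI Python SDK. The training file is JSONL, with one chat-format example per line whose user message can mix text and image URLs; the file name, prompts, image URL, and model snapshot below are hypothetical placeholders, and the exact schema is defined in the fine-tuning docs.

```python
import json
from openai import OpenAI

client = OpenAI()

# One training example per JSONL line, in chat format. The user message
# combines text with an image; the assistant message is the target output.
# The URL and labels here are hypothetical placeholders.
example = {
    "messages": [
        {"role": "system", "content": "You are an assistant that identifies road signs."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What sign is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/stop-sign.jpg"}},
            ],
        },
        {"role": "assistant", "content": "A stop sign."},
    ]
}

# Write the dataset (a real dataset would repeat this for ~100+ examples).
with open("vision_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the dataset, then start a fine-tuning job against it.
training_file = client.files.create(
    file=open("vision_dataset.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed snapshot name; check current docs
)
print(job.id)
```

Once the job completes, the resulting fine-tuned model can be used with the standard chat completions endpoint like any other model.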
