Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and highly realistic synthetic images, but they have also found success in other applications such as drug design and continuous control. The key idea behind diffusion models is to iteratively transform random noise into a sample, such as an image or protein structure. This is typically motivated as a maximum likelihood estimation problem, where the model is trained to generate samples that match the training data as closely as possible.
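To make the iterative idea concrete, here is a minimal, self-contained sketch of a DDPM-style reverse (denoising) process. It is an illustration only: `predict_noise` stands in for a trained noise-prediction network and is purely hypothetical, and real systems like Stable Diffusion use a learned U-Net operating in a latent space rather than this toy setup.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=1000,
                beta_start=1e-4, beta_end=0.02, seed=0):
    """Toy DDPM-style reverse process: start from Gaussian noise and
    iteratively denoise it into a sample. `predict_noise(x, t)` is a
    stand-in for the trained noise-prediction network."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)          # pure noise at t = T
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)           # model's estimate of the noise in x
        # Mean of the reverse step, using the standard DDPM parameterization.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                           # add noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Example usage with a dummy "network" that always predicts zero noise.
sample = ddpm_sample(lambda x, t: np.zeros_like(x), shape=(8, 8))
```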
However, most use cases of diffusion models are not directly concerned with matching the training data, but instead with a downstream objective. We don't just want an image that looks like existing images, but one that has a specific type of appearance; we don't just want a drug molecule that is physiologically plausible, but one that is as effective as possible. In this post, we show how diffusion models can be trained on these downstream objectives directly using reinforcement learning (RL). To do this, we finetune Stable Diffusion on a variety of objectives, including image compressibility, human-perceived aesthetic quality, and prompt-image alignment. The last of these objectives uses feedback from a large vision-language model to improve the model's performance on unusual prompts, demonstrating how powerful AI models can be used to improve each other without any humans in the loop.
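At a high level, the RL framing treats the denoising process as a sequential decision-making problem: each denoising step is an action, and the downstream objective (compressibility, aesthetic score, prompt alignment) is only observed as a reward on the final sample. The sketch below shows a REINFORCE-style update under that framing with a one-dimensional toy policy; the names `sample_trajectory` and `reward_fn`, the toy policy, and the specific update rule are illustrative assumptions, not the actual finetuning method or the Stable Diffusion API.

```python
import torch

# Toy "diffusion policy": each denoising step adds a learned mean shift plus noise.
mean_shift = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([mean_shift], lr=1e-2)

def sample_trajectory(num_steps=10):
    """Roll out a toy denoising trajectory and accumulate its log-probability."""
    x = torch.randn(1)                      # start from pure noise
    log_prob = torch.zeros(1)
    for _ in range(num_steps):
        dist = torch.distributions.Normal(x + mean_shift, 1.0)
        x = dist.sample()                   # one "denoising" step, treated as an action
        log_prob = log_prob + dist.log_prob(x)
    return x, log_prob

def reward_fn(x):
    """Hypothetical downstream reward, e.g. closeness to a target value."""
    return -(x - 3.0) ** 2

for step in range(200):
    x, log_prob = sample_trajectory()
    reward = reward_fn(x).detach()
    # REINFORCE: raise the log-probability of trajectories that earn high reward.
    loss = -(reward * log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this toy setup the policy gradually learns a mean shift that steers final samples toward the reward's optimum, which mirrors the intuition behind finetuning a diffusion model on a downstream objective rather than on likelihood alone.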