New 'Paint by Inpaint' Method Adds Objects to Images
A new method detailed in a research paper called “Paint by Inpaint” demonstrates how diffusion models can add objects to images by first learning to remove them. The technique enables sophisticated, chained editing workflows. It highlights a trend toward combining specialized models for iterative and non-destructive creative work.
- The "Paint by Inpaint" paper was authored by researchers from the Weizmann Institute of Science and the Technion - Israel Institute of Technology, including Navve Wasserman, Noam Rotstein, Roy Ganz, and Ron Kimmel. Their method was presented at the CVPR 2025 conference.
- The technique relies on a newly created large-scale dataset named PIPE (Paint by InPaint Edit), which contains roughly 1 million image pairs. The dataset was generated by taking images from existing segmentation datasets and using a frozen inpainting model to remove the segmented objects. A new model is then trained on the resulting pairs to learn the reverse process: adding the object back.
- To generate varied, natural-language instructions for adding objects, the researchers chained multiple models together. A Vision-Language Model (VLM) extracts detailed descriptions of the removed objects, and a Large Language Model (LLM) then converts these details into text prompts for the editing model.
- Traditional inpainting fills a user-defined mask. This approach instead learns to add an object from a text prompt alone, with no pre-drawn area required. It also differs from generative fill tools in that it is trained specifically on pairs of images with and without the object, which is intended to keep the source and the target consistent everywhere except the added object.
- The chaining of multiple specialized AI systems (inpainting models, VLMs, LLMs) exemplifies a growing trend in creative workflows where users orchestrate a series of AI tools to move from high-level concepts to refined production assets.
- This method contributes to the ongoing debate on creative agency and authorship in AI-assisted art. By automating not just object generation but also its seamless integration into a scene, the AI performs more of the substantive work, blurring the line between the user's prompt and the algorithm's creative contribution.
- The process of training a model by first having it learn to *remove* a concept has precedents in AI safety and moderation research, where models are fine-tuned to "erase" concepts like specific artistic styles or NSFW content.
- Such non-destructive, instruction-based editing tools are becoming more common in professional creative software, where AI functions as a collaborative partner, allowing for iterative feedback and refinement rather than replacing human creative judgment.
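
The remove-then-add idea behind the dataset can be illustrated with a minimal Python sketch. It is not the authors' code: every name here (`EditPair`, `build_pair`, `toy_remover`, `toy_describe`, `toy_phrase`) is hypothetical, images are toy nested lists, and the three helper functions are simple stand-ins for the frozen inpainting model, the VLM, and the LLM described above. What it shows is only the direction of each training pair: the object-free image is the input, the original image is the target, and the instruction is derived from a description of the removed object.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]   # toy grayscale image (assumption, for illustration only)
Mask = List[List[bool]]   # True where the object to remove is located


@dataclass
class EditPair:
    source: Image      # image with the object removed (model input)
    target: Image      # original image containing the object (model output)
    instruction: str   # natural-language "add" instruction


def build_pair(
    image: Image,
    mask: Mask,
    class_name: str,
    remover: Callable[[Image, Mask], Image],
    describe: Callable[[Image, Mask, str], str],
    phrase: Callable[[str], str],
) -> EditPair:
    """Build one training pair: remove the object, then describe how to add it back."""
    source = remover(image, mask)                 # stand-in for the frozen inpainting model
    details = describe(image, mask, class_name)   # stand-in for the VLM
    instruction = phrase(details)                 # stand-in for the LLM
    return EditPair(source=source, target=image, instruction=instruction)


def toy_remover(image: Image, mask: Mask) -> Image:
    """Hypothetical remover: fill masked pixels with the mean of the unmasked ones."""
    kept = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]
    fill = sum(kept) // len(kept) if kept else 0
    return [[fill if m else p for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


def toy_describe(image: Image, mask: Mask, class_name: str) -> str:
    """Hypothetical describer: a real system would query a VLM about the object."""
    return f"bright {class_name}"


def toy_phrase(details: str) -> str:
    """Hypothetical phraser: a real system would ask an LLM for varied wording."""
    return f"add a {details}"


image = [[10, 10], [10, 200]]
mask = [[False, False], [False, True]]
pair = build_pair(image, mask, "ball", toy_remover, toy_describe, toy_phrase)
print(pair.instruction)
```

Because the target is always the untouched original image, any imperfection introduced by the removal step ends up on the input side of the pair rather than in the image the model is trained to produce.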