New 'Paint by Inpaint' Method Adds Objects to Images

A new method detailed in a research paper called “Paint by Inpaint” demonstrates how diffusion models can add objects to images by first learning to remove them. The technique enables sophisticated, chained editing workflows. It highlights a trend toward combining specialized models for iterative and non-destructive creative work.

- The "Paint by Inpaint" paper was authored by researchers from the Weizmann Institute of Science and the Technion - Israel Institute of Technology, including Navve Wasserman, Noam Rotstein, Roy Ganz, and Ron Kimmel. Their method was presented at the CVPR 2025 conference.
- The technique relies on a newly created large-scale dataset named PIPE (Paint by InPaint Edit), which contains roughly 1 million image pairs. The dataset was built by taking images from existing segmentation datasets and using a frozen inpainting model to remove objects from them. A new model is then trained on the resulting pairs to learn the reverse process: adding the object back.
- To generate varied, natural-language instructions for adding objects, the researchers chained multiple models together. A Vision-Language Model (VLM) extracts detailed descriptions of the removed objects, and a Large Language Model (LLM) then converts these details into text prompts for the editing model.
- Unlike traditional inpainting, which fills a user-defined mask, the model adds an object based only on a text prompt, with no pre-drawn area required. It also differs from generative fill tools in that it is trained specifically on pairs of images with and without objects, which promotes high consistency between the source and the target.
- The chaining of multiple specialized AI systems (inpainting models, VLMs, LLMs) exemplifies a growing trend in creative workflows where users orchestrate a series of AI tools to move from high-level concepts to refined production assets.
- The method contributes to the ongoing debate on creative agency and authorship in AI-assisted art. Because it automates not just object generation but also the object's seamless integration into a scene, the AI performs more of the substantive work, blurring the line between the user's prompt and the algorithm's creative contribution.
- Training a model by first having it learn to *remove* a concept has precedents in AI safety and moderation research, where models are fine-tuned to "erase" concepts such as specific artistic styles or NSFW content.
- Such non-destructive, instruction-based editing tools are becoming more common in professional creative software, where AI functions as a collaborative partner, allowing for iterative feedback and refinement rather than replacing human creative judgment.
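
To make the data-generation idea concrete, here is a minimal Python sketch of how one "remove first, then learn to add" training example might be assembled. It is an illustration under stated assumptions, not the authors' code: the names `EditPair`, `toy_inpaint`, `toy_describe`, `toy_instruction`, and `build_pair` are all hypothetical, images are tiny grayscale grids, and the three model stages (frozen inpainter, VLM, LLM) are replaced by trivial stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]   # toy grayscale image: rows of pixel values
Mask = List[List[bool]]   # True where the object to be removed sits


@dataclass
class EditPair:
    source: Image      # image with the object removed (editing model input)
    instruction: str   # text asking to add the object back
    target: Image      # original image (what the editing model should produce)


def toy_inpaint(image: Image, mask: Mask) -> Image:
    """Stand-in for a frozen inpainting model (assumption): fills masked
    pixels with the rounded mean of the unmasked pixels."""
    background = [p for row, mrow in zip(image, mask)
                  for p, m in zip(row, mrow) if not m]
    fill = round(sum(background) / len(background)) if background else 0
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def toy_describe(label: str) -> str:
    """Stand-in for the VLM step (assumption): a real VLM would look at the
    object and return a richer description than the class label."""
    return f"a {label}"


def toy_instruction(description: str) -> str:
    """Stand-in for the LLM step (assumption): turns a description into an
    editing instruction."""
    return f"Add {description} to the image"


def build_pair(
    image: Image,
    mask: Mask,
    label: str,
    inpaint: Callable[[Image, Mask], Image] = toy_inpaint,
    describe: Callable[[str], str] = toy_describe,
    instruct: Callable[[str], str] = toy_instruction,
) -> EditPair:
    """Remove the object, then flip the direction: the object-free image
    becomes the source and the untouched original becomes the target."""
    removed = inpaint(image, mask)
    return EditPair(source=removed,
                    instruction=instruct(describe(label)),
                    target=image)


if __name__ == "__main__":
    image = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]   # bright "object" in the center
    mask = [[False, False, False], [False, True, False], [False, False, False]]
    pair = build_pair(image, mask, "dog")
    print(pair.instruction)
    print(pair.source)
```

The point of the sketch is the direction flip in `build_pair`: removal is the easier operation, so it is used to manufacture examples, and the editing model is then trained to map `source` plus `instruction` to `target`. Because the target is a real, unedited image, everything outside the object is identical between the two images by construction.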
