Existing tools such as Stable Diffusion and Midjourney offer only limited control over image generation, because users steer them through text prompts alone; as the saying "a picture is worth a thousand words" suggests, a short prompt cannot specify everything an image should contain. We design a framework called ControlNOLA that gives users finer-grained control over the content of generated product images.
Pre-trained LVMs cannot directly generate images of specific products that are absent from their training data. To address this, we design algorithms that customize LVMs for particular products and scenes. This customization typically takes only a few hours, after which the customized LVM can generate product visuals in seconds.
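To make the two-phase workflow concrete (hours of offline customization, then second-scale generation), the sketch below shows how a customized model might be used at inference time. It assumes a LoRA-style adapter trained offline on product photos; the base model ID, adapter path, and the `sks` placeholder token are illustrative stand-ins, not part of the ControlNOLA design itself.

```python
# Minimal sketch: loading a product-specific adapter onto a pre-trained
# diffusion model and generating a product visual. All names below
# (model ID, adapter path, "sks" token) are hypothetical examples.
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained LVM (Stable Diffusion as a stand-in for the base model).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach low-rank weights produced by the offline customization step,
# which teaches the model the specific product's appearance.
pipe.load_lora_weights("path/to/product_lora")

# Once the adapter is trained, inference takes only seconds.
image = pipe(
    "a photo of sks product on a marble kitchen counter",
    num_inference_steps=30,
).images[0]
image.save("product_visual.png")
```

Separating the slow, one-time customization from fast, repeated inference is what lets a single few-hour training run serve many subsequent generation requests.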