OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide inference code so that everyone can explore more functionalities of OmniGen.
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, we believe that the future image generation paradigm should be more simple and flexible, that is, generating various images directly through arbitrarily multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.
Minimum of Large Pro+ or Ultra recommended - Ultra works out cheaper.
First text-to-image generation: ~200 seconds on Large Pro+.
Subsequent generations: ~90 seconds each.
Subsequent img2img generations after running text-to-image (different workflow): ~185-190 seconds.
Use the correct workflow for text-to-image or img2img.
Refer to the image number in the prompt - you do NOT need the full bracket syntax used in the app version, just the image reference (e.g. image_1).
Made from - https://github.com/AIFSH/OmniGen-ComfyUI
Example

In the prompt text, you only need to use image_1 (for image 1), instead of the full `<img><|image_1|></img>` syntax.
text | image_1 | image_2 | image_3 | out_img
---|---|---|---|---
A curly-haired man in a red shirt is drinking tea. | -- | -- | -- | |
The woman in image_1 waves her hand happily in the crowd | | -- | -- | |
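Under the hood, the node expands the simplified references back into OmniGen's full placeholder syntax before inference. A minimal sketch of that substitution (the function name is hypothetical; the actual node's implementation may differ):

```python
import re

def expand_image_refs(prompt: str) -> str:
    # Hypothetical helper: rewrite bare references like "image_1" into
    # OmniGen's full placeholder syntax "<img><|image_1|></img>".
    return re.sub(r"\bimage_(\d+)\b", r"<img><|image_\1|></img>", prompt)

# "image_1 wears image_2." -> "<img><|image_1|></img> wears <img><|image_2|></img>."
print(expand_image_refs("The woman in image_1 waves her hand happily in the crowd"))
```

This is why the table above only needs the short form: the expansion is mechanical, so there is no reason to type the brackets yourself.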
Tips
text2img Example Prompt: "A white cat resting on a picnic table."
Image Editing Example Prompt: "image_1 The umbrella should be red."
Segmentation Example Prompt: "Find the lamp in picture image_1 and color it blue."
Try-On Example Prompt: "image_1 wears image_2."
Pose Example Prompt: "Detect the skeleton of human in image_1."
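Since prompts like the Try-On example reference several images, each image_N must have a matching input connected to the node. A small sketch of a sanity check you could run on a prompt before queueing it (these helpers are illustrative, not part of the node):

```python
import re

def referenced_images(prompt: str) -> list[int]:
    # Collect the distinct image numbers referenced in a prompt,
    # e.g. "image_1 wears image_2." -> [1, 2].
    return sorted({int(n) for n in re.findall(r"\bimage_(\d+)\b", prompt)})

def check_inputs(prompt: str, num_inputs: int) -> bool:
    # A prompt referencing up to image_k needs at least k input images;
    # a pure text-to-image prompt needs none.
    refs = referenced_images(prompt)
    return not refs or max(refs) <= num_inputs

print(check_inputs("image_1 wears image_2.", 2))  # both images supplied
```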