Introduction
Gemini 2.0 Flash EXP, Google's latest multimodal large language model, can generate and edit images through multi-turn dialogue, an important step forward for LLMs and intelligent human-computer interaction. A creator has optimized its integration with ComfyUI (based on tatookan's original project) and fixed image conversion and API key security issues, so users can explore Gemini 2.0's image capabilities directly in ComfyUI. More image features will be added in the future to further improve creative workflows.
This feature works especially well for e-commerce and greatly lowers the barrier to using ComfyUI. You can get e-commerce product display images with one click in about ten seconds, without building a multi-module Flux Fill + Redux pipeline for product transfer and repair. Because the model is called through the API rather than run locally, it also needs far less VRAM.
https://github.com/tatookan/comfyui_ssl_gemini_EXP/tree/main
Recommended machine: Large-Pro
Workflow Overview
How to use this workflow
This workflow has two parts: first get a Google API key, then add the key to the workflow and run it.
Part 1: Get a Google API key
Step 1: Visit Google AI Studio
https://aistudio.google.com/apikey?hl=zh-cn
If you cannot open this link, use the link given in the author's README in the GitHub repository linked in the introduction.
Step 2: Create an API key
Click the boxed area to create your own API key. The key then appears in the list below.
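Once the key exists, one common way to keep it out of saved workflow files and screenshots is to read it from an environment variable. The sketch below only illustrates that idea and is not how the plugin itself stores keys; the variable name `GEMINI_API_KEY` and both helper names are assumptions made for this example.

```python
import os


def load_api_key(env=None, var_name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `var_name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; create a key in Google AI Studio first")
    return key


def mask_key(key, visible=4):
    """Hide all but the last `visible` characters so the key can be logged safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling `mask_key(load_api_key())` gives a value that is safe to print while checking that the key was picked up.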
Part 2: Img2Img
Step 1: Upload reference image
Step 2: Input the Prompt
Prompts can be written in Chinese or English, and Chinese prompts are understood better than English ones.
You can use this plugin for text-to-image (bypass the Load Image node), image editing, watermark removal, line drawings, e-commerce promotional images, and background replacement.
To keep a character consistent across edits, say so explicitly in the prompt.
Step 3: Add the API key you just created
This workflow includes a temporary API key that MimicPC designers use for testing. It may stop working later, so it is best to create your own key by following Part 1.
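For readers curious about what the three steps amount to outside ComfyUI, the sketch below builds the kind of JSON body the Gemini REST API accepts for an image request. It is an illustration under assumptions, not the plugin's actual code: the model id `gemini-2.0-flash-exp`, the `v1beta` endpoint, and the helper name `build_request` are chosen for this example and should be checked against the current Gemini API documentation. No network call is made.

```python
import base64
import json

# Assumed values for illustration; verify against the current Gemini API docs.
MODEL = "gemini-2.0-flash-exp"
ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_request(prompt, image_bytes=None, mime_type="image/png"):
    """Build the JSON body for an image-generation or image-editing request.

    With image_bytes=None this is a text-to-image request (the equivalent of
    bypassing the Load Image node); otherwise the image is sent inline.
    """
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    return {
        "contents": [{"parts": parts}],
        # Ask for image output in addition to text.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


body = build_request(
    "Replace the background with a white studio backdrop; keep the product unchanged.",
    image_bytes=b"\x89PNG placeholder bytes",  # stand-in for a real image file
)
print(ENDPOINT.format(model=MODEL))
print(json.dumps(body)[:80])
```

To actually send the request, the body would be POSTed to the printed URL with the API key from Part 1 supplied in the `x-goog-api-key` header. Because generation runs on Google's side, the local machine only encodes and decodes images.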