Introduction
Google's latest multimodal large language model, Gemini 2.0 Flash, can generate and edit images through continuous dialogue, an important step toward LLM-driven, intelligent human-computer interaction. A creator has optimized its integration with ComfyUI (based on CY-CHENYUE's original project) and fixed image-conversion and API-key security issues, so users can seamlessly explore Gemini 2.0's image capabilities inside ComfyUI and easily combine it with plug-ins such as HD Zoom. More image features will be added in the future to further improve the creative experience.
https://github.com/CY-CHENYUE/ComfyUI-Gemini-API
Recommended machine: Large-Pro
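For context, the same capability can be reached directly through Google's google-genai Python SDK outside ComfyUI. The sketch below is a minimal, hedged example of text-to-image generation; the model ID and SDK calls follow Google's public documentation at the time of writing and are not the ComfyUI-Gemini-API node's own code, so treat the model name, prompt, and file name as placeholders.

```python
# Minimal sketch, assuming the google-genai SDK is installed (pip install google-genai).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio (see Part 1)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",  # image-capable Gemini 2.0 Flash model
    contents="A watercolor illustration of a lighthouse at dawn",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response mixes text and image parts; save any returned image bytes to disk.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("generated.png", "wb") as f:
            f.write(part.inline_data.data)
```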
Workflow Overview
How to use this workflow
This workflow has two parts: first obtain a Google API key, then set the key in the workflow and run it.
Part 1: Get a Google API Key
Step 1: Visit Google AI Studio
https://aistudio.google.com/apikey?hl=zh-cn
If you cannot open this link, refer to the README of the GitHub repository given in the introduction above.
Step 2: Apply for an API Key
Click the boxed area to create your own API key; the key will then be shown in the list below it.
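If you want to confirm the key works before putting it into the workflow, a quick sanity check such as the hypothetical sketch below (using the google-genai SDK, not part of this workflow) will list the models the key can access.

```python
from google import genai

# Hypothetical sanity check: a valid key can list the models it is allowed to call.
client = genai.Client(api_key="YOUR_API_KEY")
for model in client.models.list():
    print(model.name)
```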
Part 2: Img2Img
Step 1: Upload reference image
Step 2: Input the Prompt
Both Chinese and English prompts are supported; Chinese prompts are understood better than English ones.
You can use this plug-in for text-to-image generation (bypass the Load Image node), image editing, watermark removal, line-drawing extraction, e-commerce promotional images, and background replacement.
If you want to keep a character consistent, state that explicitly in the prompt.
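As a rough illustration of what happens with the reference image and prompt, the hedged sketch below sends both to Gemini 2.0 Flash through the google-genai SDK; the file name, prompt, and model ID are placeholders, and the actual node may build its request differently.

```python
from PIL import Image
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
reference = Image.open("reference.png")  # the uploaded reference image

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    # Reference image plus an instruction prompt; a Chinese prompt works as well.
    contents=[reference, "Remove the watermark but keep the character unchanged"],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
```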
Step 3: Add the API Key You Just Generated
This workflow comes with a temporary API key used by MimicPC's designers for testing. To avoid it failing later, it is best to generate your own key by following Part 1.
Step 4: Get Image
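For readers curious about the image-conversion fix mentioned in the introduction, the returned image bytes typically have to be turned into a ComfyUI IMAGE tensor. The sketch below shows the usual ComfyUI convention (a float tensor of shape 1 x H x W x C with values in [0, 1]); it is an assumption about how such a node generally works, not the plug-in's actual code.

```python
import io

import numpy as np
import torch
from PIL import Image

def bytes_to_comfy_image(data: bytes) -> torch.Tensor:
    """Convert raw image bytes into the tensor layout ComfyUI expects."""
    img = Image.open(io.BytesIO(data)).convert("RGB")
    arr = np.asarray(img).astype(np.float32) / 255.0  # H x W x C, values in [0, 1]
    return torch.from_numpy(arr).unsqueeze(0)         # 1 x H x W x C batch tensor
```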
Step 5: View the Log
The custom node's log output is only available in Chinese; other languages are not supported.