ComfyUI MCP – Gemini 2.5 Pro
Okay, let’s create a Python script using Flask to act as a server that interacts with the ComfyUI API. This script will:
- Run an HTTP Server: Listen for incoming requests (using Flask).
- Define an Endpoint (/generate): Accept POST requests with JSON data containing a prompt (text) and optionally an image (base64 encoded).
- Interact with ComfyUI API:
  - Upload the input image (if provided).
  - Load a predefined ComfyUI workflow (in API JSON format).
  - Modify the workflow to include the user’s prompt and the uploaded image filename.
  - Queue the prompt for execution.
  - Monitor the execution progress via WebSocket.
  - Retrieve the resulting image/video.
- Return the Output: Send the generated image/video back to the client.
Prerequisites:
- ComfyUI Running: You need a ComfyUI instance running and accessible, with its API reachable (usually at http://127.0.0.1:8188). CORS is enforced by browsers, so server-to-server calls like the ones this script makes do not need it; start ComfyUI with the --enable-cors-header flag only if a browser page on a different origin will call the ComfyUI API directly.
- Python Environment: Python 3.6+ installed.
- Required Libraries: Install them using pip:
  pip install Flask requests Pillow websocket-client
- ComfyUI Workflow (API Format):
  - Create a workflow in the ComfyUI interface that takes a text prompt and optionally an image input (e.g., using LoadImage) and produces an output (e.g., using SaveImage or a video node).
  - Crucially: Click “Save (API Format)” in ComfyUI and save this JSON file (e.g., workflow_api.json).
  - Identify Node IDs/Titles: You need to know which nodes in your workflow correspond to:
    - The positive text prompt input (e.g., a CLIPTextEncode node). Let’s assume its title is “Positive Prompt”.
    - The input image loader (e.g., a LoadImage node). Let’s assume its title is “Input Image”.
    - The final output node (e.g., a SaveImage or SaveAnimatedWEBP node). Let’s assume its title is “Final Output”. You’ll need its class_type too.
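To identify the node IDs, class types, and titles mentioned above, it can help to print them from the exported file. The following is a minimal sketch: the helper name list_workflow_nodes and the two-node sample_workflow are hypothetical, and it assumes the export stores each node's title under _meta.title, which is the key the script below reads.

```python
import json


def list_workflow_nodes(workflow):
    """Return (node_id, class_type, title) for each node in an API-format workflow dict."""
    rows = []
    for node_id, node in workflow.items():
        # Title is assumed to live under _meta.title; fall back to an empty string
        title = node.get("_meta", {}).get("title", "")
        rows.append((node_id, node.get("class_type", ""), title))
    return rows


# Hypothetical two-node workflow, shaped like an API-format export.
sample_workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""},
          "_meta": {"title": "Positive Prompt"}},
    "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "out"},
          "_meta": {"title": "Final Output"}},
}

for node_id, class_type, title in list_workflow_nodes(sample_workflow):
    print(f"{node_id:>4}  {class_type:<20}  {title}")
```

For a real export, load the file first, for example with json.load(open("workflow_api.json")), and pass the resulting dict to the helper.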
Python Script (mcp_server.py):
import websocket # websocket-client library
import uuid
import json
import urllib.request
import urllib.parse
import random
import os
import time
from PIL import Image
import io
import requests
from flask import Flask, request, jsonify, send_file
# --- Configuration ---
COMFYUI_ADDRESS = "127.0.0.1:8188" # Replace with your ComfyUI address if different
SERVER_ADDRESS = "0.0.0.0" # Listen on all network interfaces
SERVER_PORT = 9090 # Port for this MCP server
CLIENT_ID = str(uuid.uuid4()) # Unique ID for this client
WORKSPACE_DIR = "mcp_workspace" # Directory to store temp files and outputs
WORKFLOW_API_JSON = "workflow_api.json" # Path to your ComfyUI workflow in API JSON format
# Node titles/types in your workflow_api.json to modify (ADJUST THESE)
NODE_TITLE_POSITIVE_PROMPT = "Positive Prompt" # Title of the node receiving the positive text prompt
NODE_TITLE_LOAD_IMAGE = "Input Image" # Title of the node loading the input image (if used)
NODE_TITLE_NEGATIVE_PROMPT = "Negative Prompt" # Optional: Title for negative prompt
# NODE_TITLE_OUTPUT = "Final Output" # Title of the final output node (useful for finding output)
OUTPUT_NODE_CLASS_TYPE = "SaveImage" # Class type of the *final* output node (e.g., SaveImage, SaveAnimatedWEBP, etc.)
# --- ComfyUI API Interaction Functions ---
def queue_prompt(prompt_workflow, client_id, comfyui_address):
    """Sends the workflow to ComfyUI for queuing."""
    p = {"prompt": prompt_workflow, "client_id": client_id}
    data = json.dumps(p).encode('utf-8')
    req = urllib.request.Request(f"http://{comfyui_address}/prompt", data=data)
    try:
        response = urllib.request.urlopen(req)
        return json.loads(response.read())
    except urllib.error.URLError as e:
        print(f"Error queuing prompt: {e}")
        print(f"Response body (if any): {e.read().decode() if hasattr(e, 'read') else 'N/A'}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred during queueing: {e}")
        return None


def get_image(filename, subfolder, folder_type, comfyui_address):
    """Fetches an image from ComfyUI's /view endpoint."""
    data = {"filename": filename, "subfolder": subfolder, "type": folder_type}
    url_values = urllib.parse.urlencode(data)
    url = f"http://{comfyui_address}/view?{url_values}"
    print(f"Fetching image from: {url}")
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except Exception as e:
        print(f"Error fetching image {filename}: {e}")
        return None


def get_history(prompt_id, comfyui_address):
    """Retrieves the execution history for a given prompt ID."""
    try:
        with urllib.request.urlopen(f"http://{comfyui_address}/history/{prompt_id}") as response:
            return json.loads(response.read())
    except Exception as e:
        print(f"Error getting history for {prompt_id}: {e}")
        return None


def upload_image(image_data, comfyui_address, filename_prefix="input_image"):
    """Uploads an image to ComfyUI's /upload/image endpoint."""
    try:
        # Use PIL to determine format and potentially convert if needed
        img = Image.open(io.BytesIO(image_data))
        image_format = img.format.lower() if img.format else 'png'  # Default to png if format unknown
        # Ensure filename has a valid extension
        filename = f"{filename_prefix}_{uuid.uuid4()}.{image_format}"
        # Prepare multipart/form-data
        files = {'image': (filename, image_data, f'image/{image_format}')}
        data = {'overwrite': 'true'}  # Overwrite if filename exists (UUID makes this unlikely)
        response = requests.post(f"http://{comfyui_address}/upload/image", files=files, data=data)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
        upload_data = response.json()
        print(f"Image uploaded: {upload_data}")
        # Expected response format: {"name": "input_image_uuid.png", "subfolder": "", "type": "input"}
        return upload_data['name'], upload_data.get('subfolder', ''), upload_data.get('type', 'input')
    except requests.exceptions.RequestException as e:
        print(f"Error uploading image via requests: {e}")
        if e.response is not None:
            print(f"Response status: {e.response.status_code}")
            print(f"Response body: {e.response.text}")
        return None, None, None
    except Exception as e:
        print(f"An unexpected error occurred during image upload: {e}")
        return None, None, None


def find_node_id_by_title(workflow, title):
    """Finds the node ID in the workflow based on its title."""
    for node_id, node_data in workflow.items():
        if node_data.get("_meta", {}).get("title") == title:
            return node_id
    return None


def find_node_id_by_class_type(workflow, class_type):
    """Finds the node ID in the workflow based on its class type."""
    for node_id, node_data in workflow.items():
        if node_data.get("class_type") == class_type:
            return node_id
    return None


def get_final_outputs(prompt_id, comfyui_address, output_node_class_type):
    """
    Waits for execution to finish by monitoring the WebSocket, then reads
    the final output image/video data from the History API.
    """
    import mimetypes  # Local import so this function stays self-contained

    start_time = time.time()
    # --- WebSocket Monitoring ---
    ws = websocket.WebSocket()
    ws.settimeout(120)  # Otherwise recv() can block forever if the finish message was missed
    ws_url = f"ws://{comfyui_address}/ws?clientId={CLIENT_ID}"
    print(f"Connecting to WebSocket: {ws_url}")
    try:
        ws.connect(ws_url)
        print("WebSocket connected.")
        while True:
            out = ws.recv()
            if isinstance(out, str):
                message = json.loads(out)
                # print(f"WS Received: {message}")  # Debug: print all messages
                if message['type'] == 'status':
                    status_data = message['data']['status']
                    exec_info = status_data.get('exec_info')
                    if exec_info and exec_info.get('queue_remaining') is not None:
                        print(f"Queue remaining: {exec_info['queue_remaining']}")
                elif message['type'] == 'progress':
                    progress_data = message['data']
                    print(f"Progress: {progress_data['value']}/{progress_data['max']}")
                elif message['type'] == 'executing':
                    data = message['data']
                    if data.get('prompt_id') == prompt_id:
                        if data['node'] is None:
                            print(f"Execution finished for prompt {prompt_id}")
                            break  # Execution is finished
                        print(f"Executing node: {data['node']}")
                elif message['type'] == 'executed':
                    data = message['data']
                    if data.get('prompt_id') == prompt_id:
                        # Per-node results arrive under the 'output' key; they are
                        # collected from the History API once execution finishes.
                        print(f"Node {data['node']} produced output: {data.get('output')}")
            else:
                # Binary messages are typically preview images; not needed here
                print("Received binary message (unhandled)")
            # Timeout check
            if time.time() - start_time > 120:  # 2-minute timeout
                print("WebSocket timeout waiting for execution finish.")
                break
    except websocket.WebSocketException as e:
        print(f"WebSocket Error: {e}")
    except Exception as e:
        print(f"Error processing WebSocket message: {e}")
    finally:
        if ws.connected:
            ws.close()
            print("WebSocket closed.")

    # --- Read outputs from the History API ---
    history = get_history(prompt_id, comfyui_address)
    if not history or prompt_id not in history:
        print(f"Error: Could not retrieve history for prompt {prompt_id}")
        return None
    prompt_history = history[prompt_id]
    if 'outputs' not in prompt_history:
        print(f"Error: 'outputs' not found in history for prompt {prompt_id}")
        return None
    # History 'outputs' entries hold only result lists (e.g. {'images': [...]}),
    # so the class type is looked up in the workflow stored with the history entry.
    prompt_entry = prompt_history.get('prompt', [])
    workflow_nodes = prompt_entry[2] if len(prompt_entry) > 2 else {}
    output_data = None
    for node_id, node_output in prompt_history['outputs'].items():
        class_type = workflow_nodes.get(node_id, {}).get('class_type')
        if class_type == output_node_class_type:
            print(f"Found output in history from node {node_id} ({output_node_class_type}): {node_output}")
            output_data = node_output  # Use the first match found in history
            break  # Assuming one main output node of this type
    if output_data is None:
        print(f"Error: Output node with class type '{output_node_class_type}' not found in history outputs.")
        print(f"Available history outputs: {prompt_history['outputs']}")
        return None

    # --- Process the identified outputs ---
    # Output data structure varies (e.g., {'images': [...]}, {'gifs': [...]})
    key_found = None
    for key in ['images', 'gifs', 'videos']:  # Add other possible keys if needed
        if key in output_data:
            key_found = key
            break
    if not key_found:
        print(f"Error: Could not find expected output key (images, gifs, videos) in node output: {output_data}")
        return None
    results = []
    for output_item in output_data[key_found]:
        image_data = get_image(output_item['filename'], output_item.get('subfolder', ''), output_item['type'], comfyui_address)
        if image_data:
            # Guess the MIME type from the file extension; fall back to a generic type
            content_type = mimetypes.guess_type(output_item['filename'])[0] or 'application/octet-stream'
            results.append({
                "data": image_data,
                "filename": output_item['filename'],
                "content_type": content_type
            })
        else:
            print(f"Warning: Failed to retrieve data for output item {output_item['filename']}")
    return results


# --- Flask App ---
app = Flask(__name__)


@app.route('/generate', methods=['POST'])
def generate():
    import base64
    import binascii
    import traceback

    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400
    data = request.get_json()
    prompt_text = data.get('prompt')
    negative_prompt_text = data.get('negative_prompt', '')  # Optional negative prompt
    base64_image = data.get('image')  # Optional base64 encoded image
    if not prompt_text:
        return jsonify({"error": "Missing 'prompt' in request data"}), 400

    # --- 1. Load Workflow ---
    try:
        with open(WORKFLOW_API_JSON, 'r') as f:
            prompt_workflow = json.load(f)
    except FileNotFoundError:
        return jsonify({"error": f"Workflow file '{WORKFLOW_API_JSON}' not found."}), 500
    except json.JSONDecodeError:
        return jsonify({"error": f"Invalid JSON in workflow file '{WORKFLOW_API_JSON}'."}), 500
    except Exception as e:
        return jsonify({"error": f"Error loading workflow: {e}"}), 500

    # --- 2. Handle Image Upload (if provided) ---
    uploaded_filename = None
    if base64_image:
        try:
            image_data = base64.b64decode(base64_image)
            filename, subfolder, img_type = upload_image(image_data, COMFYUI_ADDRESS)
            if not filename:
                return jsonify({"error": "Failed to upload image to ComfyUI"}), 500
            uploaded_filename = filename
            print(f"Uploaded image filename: {uploaded_filename}")
        except binascii.Error:
            return jsonify({"error": "Invalid base64 image data"}), 400
        except Exception as e:
            return jsonify({"error": f"Error processing image upload: {e}"}), 500

    # --- 3. Modify Workflow ---
    # Initialized up front so the logging in step 4 can test them directly
    neg_prompt_node_id = None
    load_image_node_id = None
    try:
        # Find Positive Prompt Node
        pos_prompt_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_POSITIVE_PROMPT)
        if not pos_prompt_node_id:
            print(f"Warning: Node with title '{NODE_TITLE_POSITIVE_PROMPT}' not found in workflow.")
            # If not found by title, try finding a common text input node type
            possible_types = ["CLIPTextEncode", "CLIPTextEncodeSDXL"]
            for node_id, node_data in prompt_workflow.items():
                if node_data.get("class_type") in possible_types and "Positive" in node_data.get("_meta", {}).get("title", ""):
                    pos_prompt_node_id = node_id
                    print(f"Found potential positive prompt node by type/partial title: {node_id}")
                    break
        if not pos_prompt_node_id:
            return jsonify({"error": f"Could not find Positive Prompt node ('{NODE_TITLE_POSITIVE_PROMPT}') in workflow."}), 500

        # Update Positive Prompt Text
        pos_node = prompt_workflow[pos_prompt_node_id]
        # The prompt text is usually within inputs -> text
        if 'text' in pos_node['inputs']:
            pos_node['inputs']['text'] = prompt_text
        # Sometimes it might be in widgets_values (older ComfyUI versions?)
        elif isinstance(pos_node.get('widgets_values'), list):
            # Find the widget by name (usually 'text' or similar)
            widget_index = next((i for i, w_name in enumerate(pos_node.get('widgets_info', [])) if w_name == 'text'), -1)
            if widget_index != -1 and widget_index < len(pos_node['widgets_values']):
                pos_node['widgets_values'][widget_index] = prompt_text
            else:
                print(f"Warning: Could not find 'text' widget input for node {pos_prompt_node_id}. Prompt may not be set.")
        else:
            print(f"Warning: Could not find 'text' input or 'widgets_values' for node {pos_prompt_node_id}. Prompt may not be set.")
        print(f"Updated Positive Prompt Node '{pos_prompt_node_id}' with text: '{prompt_text[:50]}...'")

        # Find and Update Negative Prompt Node (Optional)
        if negative_prompt_text and NODE_TITLE_NEGATIVE_PROMPT:
            neg_prompt_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_NEGATIVE_PROMPT)
            if neg_prompt_node_id:
                if 'text' in prompt_workflow[neg_prompt_node_id]['inputs']:
                    prompt_workflow[neg_prompt_node_id]['inputs']['text'] = negative_prompt_text
                    print(f"Updated Negative Prompt Node '{neg_prompt_node_id}' with text: '{negative_prompt_text[:50]}...'")
                else:
                    print(f"Warning: Could not find 'text' input for negative prompt node {neg_prompt_node_id}.")
            else:
                print(f"Warning: Negative prompt provided, but node '{NODE_TITLE_NEGATIVE_PROMPT}' not found.")

        # Find Load Image Node and Update Filename (if image was uploaded)
        if uploaded_filename:
            load_image_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_LOAD_IMAGE)
            if not load_image_node_id:
                # Fallback: Try finding by class type if title doesn't match
                load_image_node_id = find_node_id_by_class_type(prompt_workflow, "LoadImage")
                if load_image_node_id:
                    print(f"Found LoadImage node by class type: {load_image_node_id}")
                else:
                    return jsonify({"error": f"Input image provided, but Load Image node ('{NODE_TITLE_LOAD_IMAGE}' or type 'LoadImage') not found in workflow."}), 500
            # Update the 'image' input field
            if 'image' in prompt_workflow[load_image_node_id]['inputs']:
                prompt_workflow[load_image_node_id]['inputs']['image'] = uploaded_filename
                print(f"Updated Load Image Node '{load_image_node_id}' with filename: {uploaded_filename}")
            else:
                return jsonify({"error": f"Could not find 'image' input field for Load Image node '{load_image_node_id}'."}), 500
    except Exception as e:
        print(f"Error modifying workflow: {e}")
        traceback.print_exc()
        return jsonify({"error": f"Internal server error during workflow modification: {e}"}), 500

    # --- 4. Queue Prompt ---
    print("\nModified Workflow Snippet (Prompt Nodes):")
    if pos_prompt_node_id in prompt_workflow:
        print(f" {pos_prompt_node_id}: {prompt_workflow[pos_prompt_node_id]['inputs']}")
    if neg_prompt_node_id and neg_prompt_node_id in prompt_workflow:
        print(f" {neg_prompt_node_id}: {prompt_workflow[neg_prompt_node_id]['inputs']}")
    if load_image_node_id and load_image_node_id in prompt_workflow:
        print(f" {load_image_node_id}: {prompt_workflow[load_image_node_id]['inputs']}")
    print("\nQueuing prompt...")
    queued_data = queue_prompt(prompt_workflow, CLIENT_ID, COMFYUI_ADDRESS)
    if not queued_data or 'prompt_id' not in queued_data:
        return jsonify({"error": "Failed to queue prompt in ComfyUI"}), 500
    prompt_id = queued_data['prompt_id']
    print(f"Prompt queued successfully. Prompt ID: {prompt_id}")

    # --- 5. Get Output ---
    print(f"Waiting for output for prompt {prompt_id}...")
    output_results = get_final_outputs(prompt_id, COMFYUI_ADDRESS, OUTPUT_NODE_CLASS_TYPE)
    if not output_results:
        return jsonify({"error": "Failed to retrieve output from ComfyUI execution"}), 500

    # --- 6. Return Output ---
    if len(output_results) > 1:
        # Handle multiple outputs (e.g., return JSON with list of base64 or links?)
        # For simplicity, let's just return the first one for now
        print(f"Warning: Workflow produced multiple outputs ({len(output_results)}). Returning the first one.")
    # Return single image/video directly
    result = output_results[0]
    # Clean up temp workspace file (optional)
    # local_filepath = os.path.join(WORKSPACE_DIR, result["filename"])
    # if os.path.exists(local_filepath):
    #     os.remove(local_filepath)
    return send_file(
        io.BytesIO(result["data"]),
        mimetype=result["content_type"],
        as_attachment=False,  # Display inline if possible
        download_name=result["filename"]
    )


# --- Main Execution ---
if __name__ == '__main__':
    # Create workspace directory if it doesn't exist
    os.makedirs(WORKSPACE_DIR, exist_ok=True)
    print("MCP Server starting...")
    print(f" - ComfyUI API: http://{COMFYUI_ADDRESS}")
    print(f" - Listening on: http://{SERVER_ADDRESS}:{SERVER_PORT}")
    print(f" - Using Workflow: {WORKFLOW_API_JSON}")
    print(f" - Positive Prompt Node Title: '{NODE_TITLE_POSITIVE_PROMPT}'")
    print(f" - Negative Prompt Node Title: '{NODE_TITLE_NEGATIVE_PROMPT}' (Optional)")
    print(f" - Load Image Node Title: '{NODE_TITLE_LOAD_IMAGE}' (Optional)")
    print(f" - Final Output Node Type: '{OUTPUT_NODE_CLASS_TYPE}'")
    app.run(host=SERVER_ADDRESS, port=SERVER_PORT, debug=False)  # Keep debug=False in production
How to Use:
- Save the script: Save the code above as mcp_server.py.
- Get Workflow JSON:
  - Design your ComfyUI workflow (e.g., LoadImage -> CLIPTextEncode (Prompt) -> KSampler -> SaveImage).
  - Make sure the nodes you want to modify have recognizable titles (e.g., “Positive Prompt”, “Input Image”). You can set titles by right-clicking a node -> Title.
  - Click “Save (API Format)” in ComfyUI and save the file as workflow_api.json in the same directory as the script.
- Configure the Script:
  - Edit the COMFYUI_ADDRESS if your ComfyUI is not running on 127.0.0.1:8188.
  - Crucially: Adjust NODE_TITLE_POSITIVE_PROMPT, NODE_TITLE_LOAD_IMAGE, NODE_TITLE_NEGATIVE_PROMPT (if used), and OUTPUT_NODE_CLASS_TYPE to match the exact titles/type you used in your ComfyUI workflow. If titles don’t work reliably, you might need to hardcode the node IDs from the workflow_api.json file instead.
- Run ComfyUI: Start your ComfyUI instance (the --enable-cors-header flag is only needed when a browser on another origin calls ComfyUI directly; this server-to-server setup does not require it).
- Run the Script:
  python mcp_server.py
- Send Requests: Use a tool like curl, Postman, or another script to send POST requests to http://<your_server_ip>:9090/generate.
Example curl requests:
- Text-only:
  curl -X POST -H "Content-Type: application/json" \
    -d '{"prompt": "A photo of an astronaut riding a horse on the moon"}' \
    http://127.0.0.1:9090/generate --output output.png
- Text and Image:
  - First, base64 encode your image (e.g., input.jpg):
    # On Linux:
    IMAGE_B64=$(base64 -w 0 input.jpg)
    # On macOS, if your base64 does not accept -w:
    # IMAGE_B64=$(base64 -i input.jpg)
    # On Windows (PowerShell):
    # $IMAGE_B64 = [Convert]::ToBase64String([IO.File]::ReadAllBytes("input.jpg"))
  - Then send the request:
    curl -X POST -H "Content-Type: application/json" \
      -d '{ "prompt": "Make this image look like a watercolor painting", "image": "'"$IMAGE_B64"'" }' \
      http://127.0.0.1:9090/generate --output output_watercolor.png
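The same requests can be sent from Python instead of curl. The sketch below uses only the standard library; the helper names build_generate_request and save_response are hypothetical, and the default URL assumes the server is running locally on port 9090 as configured in the script.

```python
import base64
import json
import urllib.request


def build_generate_request(prompt, image_bytes=None, negative_prompt=None,
                           url="http://127.0.0.1:9090/generate"):
    """Build (without sending) a JSON POST request for the /generate endpoint."""
    payload = {"prompt": prompt}
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if image_bytes is not None:
        # The server expects the image as a base64 string under 'image'
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_response(req, out_path):
    """Send the request and write the returned image/video bytes to out_path."""
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


# Example (requires mcp_server.py and ComfyUI to be running):
# with open("input.jpg", "rb") as f:
#     req = build_generate_request("Make this image look like a watercolor painting", f.read())
# save_response(req, "output_watercolor.png")
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the network.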
This script is a starting point rather than a finished service. You will likely need to adapt the node-finding logic, error handling, or output processing to the specifics of your workflows and desired behavior. Check the console output of both this script and ComfyUI for debugging information.
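As one example of adapting the output processing: the script returns only the first file when a workflow produces several. A possible alternative is to return every output in a single JSON body. This is a sketch only; the function name outputs_to_json_payload and the field names count, outputs, and data_base64 are hypothetical, and the input is assumed to be the list of dicts with "data" and "filename" keys that get_final_outputs returns.

```python
import base64
import mimetypes


def outputs_to_json_payload(output_results):
    """Convert a list of {'data': bytes, 'filename': str} dicts into a JSON-serializable dict."""
    items = []
    for result in output_results:
        # Guess the MIME type from the file extension; fall back to a generic type
        content_type = mimetypes.guess_type(result["filename"])[0] or "application/octet-stream"
        items.append({
            "filename": result["filename"],
            "content_type": content_type,
            "data_base64": base64.b64encode(result["data"]).decode("ascii"),
        })
    return {"count": len(items), "outputs": items}
```

Inside the Flask handler, the result could be returned with jsonify(outputs_to_json_payload(output_results)) in place of send_file. Base64 inflates the payload by roughly a third, so this suits a handful of images better than long videos.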