Comfyui MCP – Gemini 2.5 Pro

Okay, let’s create a Python script using Flask to act as a server that interacts with the ComfyUI API. This script will:

Run an HTTP Server: Listen for incoming requests (using Flask).
Define an Endpoint (/generate): Accept POST requests with JSON data containing a prompt (text) and optionally an image (base64 encoded).
Interact with ComfyUI API:
- Upload the input image (if provided).
- Load a predefined ComfyUI workflow (in API JSON format).
- Modify the workflow to include the user’s prompt and the uploaded image filename.
- Queue the prompt for execution.
- Monitor the execution progress via WebSocket.
- Retrieve the resulting image/video.
Return the Output: Send the generated image/video back to the client.

Prerequisites:

ComfyUI Running: You need a ComfyUI instance running and reachable from this script (usually at http://127.0.0.1:8188). CORS only matters when a browser page on a different origin calls ComfyUI directly, so this server-to-server setup should not need it; if you do need it, the flag is --enable-cors-header. If ComfyUI runs on another machine, start it with --listen so it accepts non-local connections.
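Python Environment: Python 3.6+ installed (the script uses f-strings). Note that send_file's download_name argument needs Flask 2.0 or later, and recent Flask releases require a newer Python than 3.6.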
Required Libraries: Install them using pip:
pip install Flask requests Pillow websocket-client
ComfyUI Workflow (API Format):
- Create a workflow in the ComfyUI interface that takes a text prompt and optionally an image input (e.g., using LoadImage) and produces an output (e.g., using SaveImage or a video node).
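- Crucially: Click “Save (API Format)” in ComfyUI and save this JSON file (e.g., workflow_api.json). If the button is not visible, you may need to enable the dev mode options in ComfyUI’s settings; newer frontends may offer this as an “Export (API)” menu entry instead. The regular workflow save format will not work with the /prompt endpoint.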
- Identify Node IDs/Titles: You need to know which nodes in your workflow correspond to:
  - The positive text prompt input (e.g., a CLIPTextEncode node). Let’s assume its title is “Positive Prompt”.
  - The input image loader (e.g., a LoadImage node). Let’s assume its title is “Input Image”.
  - The final output node (e.g., a SaveImage or SaveAnimatedWEBP node). Let’s assume its title is “Final Output”. You’ll need its class_type too.
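
To make the node identification step concrete, here is a minimal sketch that lists each node's ID, class type, and title from an API-format workflow. The SAMPLE_WORKFLOW dictionary and the list_nodes and load_workflow helpers are hypothetical names for illustration, and the sample is heavily trimmed. It assumes the export stores titles under _meta.title, which is what the script below relies on; older exports may lack _meta, in which case only the class type is available.

```python
import json

# Hypothetical, trimmed API-format workflow: node ID -> node definition.
SAMPLE_WORKFLOW = {
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["4", 1]},
          "_meta": {"title": "Positive Prompt"}},
    "10": {"class_type": "LoadImage",
           "inputs": {"image": "example.png"},
           "_meta": {"title": "Input Image"}},
    "9": {"class_type": "SaveImage",
          "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]},
          "_meta": {"title": "Final Output"}},
}


def list_nodes(workflow):
    """Return (node_id, class_type, title) tuples for every node."""
    rows = []
    for node_id, node in workflow.items():
        title = node.get("_meta", {}).get("title", "")
        rows.append((node_id, node.get("class_type", ""), title))
    return rows


def load_workflow(path):
    """Load an API-format workflow file saved from ComfyUI."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Print a small table; swap SAMPLE_WORKFLOW for load_workflow("workflow_api.json").
for node_id, class_type, title in list_nodes(SAMPLE_WORKFLOW):
    print(f"{node_id:>4}  {class_type:<16} {title}")
```

Running this against your own workflow_api.json gives you the exact titles and class types to put in the configuration section of the script.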

Python Script (mcp_server.py):

import websocket # websocket-client library
import uuid
import json
import urllib.request
import urllib.parse
import random
import os
import time
from PIL import Image
import io
import requests
from flask import Flask, request, jsonify, send_file

# --- Configuration ---
COMFYUI_ADDRESS = "127.0.0.1:8188" # Replace with your ComfyUI address if different
SERVER_ADDRESS = "0.0.0.0"         # Listen on all network interfaces
SERVER_PORT = 9090                 # Port for this MCP server
CLIENT_ID = str(uuid.uuid4())      # Unique ID for this client
WORKSPACE_DIR = "mcp_workspace"    # Directory to store temp files and outputs
WORKFLOW_API_JSON = "workflow_api.json" # Path to your ComfyUI workflow in API JSON format

# Node titles/types in your workflow_api.json to modify (ADJUST THESE)
NODE_TITLE_POSITIVE_PROMPT = "Positive Prompt"  # Title of the node receiving the positive text prompt
NODE_TITLE_LOAD_IMAGE = "Input Image"          # Title of the node loading the input image (if used)
NODE_TITLE_NEGATIVE_PROMPT = "Negative Prompt"  # Optional: Title for negative prompt
# NODE_TITLE_OUTPUT = "Final Output"         # Title of the final output node (useful for finding output)
OUTPUT_NODE_CLASS_TYPE = "SaveImage" # Class type of the *final* output node (e.g., SaveImage, SaveAnimatedWEBP, etc.)


# --- ComfyUI API Interaction Functions ---

def queue_prompt(prompt_workflow, client_id, comfyui_address):
    """Sends the workflow to ComfyUI for queuing."""
    p = {"prompt": prompt_workflow, "client_id": client_id}
    data = json.dumps(p).encode('utf-8')
    req = urllib.request.Request(f"http://{comfyui_address}/prompt", data=data)
    try:
        response = urllib.request.urlopen(req)
        return json.loads(response.read())
    except urllib.error.URLError as e:
        print(f"Error queuing prompt: {e}")
        print(f"Response body (if any): {e.read().decode() if hasattr(e, 'read') else 'N/A'}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred during queueing: {e}")
        return None


def get_image(filename, subfolder, folder_type, comfyui_address):
    """Fetches an image from ComfyUI's /view endpoint."""
    data = {"filename": filename, "subfolder": subfolder, "type": folder_type}
    url_values = urllib.parse.urlencode(data)
    url = f"http://{comfyui_address}/view?{url_values}"
    print(f"Fetching image from: {url}")
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except Exception as e:
        print(f"Error fetching image {filename}: {e}")
        return None

def get_history(prompt_id, comfyui_address):
    """Retrieves the execution history for a given prompt ID."""
    try:
        with urllib.request.urlopen(f"http://{comfyui_address}/history/{prompt_id}") as response:
            return json.loads(response.read())
    except Exception as e:
        print(f"Error getting history for {prompt_id}: {e}")
        return None

def upload_image(image_data, comfyui_address, filename_prefix="input_image"):
    """Uploads an image to ComfyUI's /upload/image endpoint."""
    try:
        # Use PIL to determine format and potentially convert if needed
        img = Image.open(io.BytesIO(image_data))
        image_format = img.format.lower() if img.format else 'png' # Default to png if format unknown

        # Ensure filename has a valid extension
        filename = f"{filename_prefix}_{uuid.uuid4()}.{image_format}"

        # Prepare multipart/form-data
        files = {'image': (filename, image_data, f'image/{image_format}')}
        data = {'overwrite': 'true'} # Overwrite if filename exists (UUID makes this unlikely)

        response = requests.post(f"http://{comfyui_address}/upload/image", files=files, data=data)
        response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

        upload_data = response.json()
        print(f"Image uploaded: {upload_data}")
        # Expected response format: {"name": "input_image_uuid.png", "subfolder": "", "type": "input"}
        return upload_data['name'], upload_data.get('subfolder', ''), upload_data.get('type', 'input')

    except requests.exceptions.RequestException as e:
        print(f"Error uploading image via requests: {e}")
        if e.response is not None:
            print(f"Response status: {e.response.status_code}")
            print(f"Response body: {e.response.text}")
        return None, None, None
    except Exception as e:
        print(f"An unexpected error occurred during image upload: {e}")
        return None, None, None


def find_node_id_by_title(workflow, title):
    """Finds the node ID in the workflow based on its title."""
    for node_id, node_data in workflow.items():
        if node_data.get("_meta", {}).get("title") == title:
            return node_id
    return None

def find_node_id_by_class_type(workflow, class_type):
    """Finds the node ID in the workflow based on its class type."""
    for node_id, node_data in workflow.items():
        if node_data.get("class_type") == class_type:
            return node_id
    return None

def get_final_outputs(prompt_id, comfyui_address, output_node_class_type):
    """
    Gets the final output image/video data by monitoring WebSocket
    or falling back to history if WebSocket fails.
    """
    output_data = None
    start_time = time.time()
    ws_failed = False

    # --- WebSocket Monitoring ---
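    ws = websocket.WebSocket()
    # Without a socket timeout, ws.recv() can block forever and the timeout check at
    # the bottom of the loop is never reached (for example, if the prompt finished
    # before this socket connected). A timed-out recv() raises
    # WebSocketTimeoutException, which is handled below as a WebSocket failure.
    ws.settimeout(120)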
    ws_url = f"ws://{comfyui_address}/ws?clientId={CLIENT_ID}"
    print(f"Connecting to WebSocket: {ws_url}")
    try:
        ws.connect(ws_url)
        print("WebSocket connected.")
        while True:
            out = ws.recv()
            if isinstance(out, str):
                message = json.loads(out)
                # print(f"WS Received: {message}") # Debug: print all messages
                if message['type'] == 'status':
                    status_data = message['data']['status']
                    # ComfyUI reports the queue length under 'exec_info'
                    exec_info = status_data.get('exec_info')
                    if exec_info and exec_info.get('queue_remaining') is not None:
                        print(f"Queue remaining: {exec_info['queue_remaining']}")

                elif message['type'] == 'progress':
                    progress_data = message['data']
                    print(f"Progress: {progress_data['value']}/{progress_data['max']}")

                elif message['type'] == 'executing':
                    data = message['data']
                    if data['node'] is None and data['prompt_id'] == prompt_id:
                        print(f"Execution finished for prompt {prompt_id}")
                        break # Execution is finished
                    elif data['prompt_id'] == prompt_id:
                         print(f"Executing node: {data['node']}")

                elif message['type'] == 'executed':
                    data = message['data']
                    # 'executed' messages carry the node's result under 'output' (singular).
                    # History is only written once the prompt has finished, so the node's
                    # class type cannot be looked up there at this point. Just log the
                    # event; the outputs are collected after execution completes.
                    if data.get('prompt_id') == prompt_id and data.get('output'):
                        print(f"Node {data['node']} produced output: {data['output']}")
            else:
                # Handle binary messages if necessary (less common for status)
                print("Received binary message (unhandled)")
                # If the final output is expected as binary via WS, handle it here

            # Timeout check
            if time.time() - start_time > 120: # 2-minute timeout
                print("WebSocket timeout waiting for execution finish.")
                ws_failed = True
                break

    except websocket.WebSocketException as e:
        print(f"WebSocket Error: {e}")
        ws_failed = True
    except Exception as e:
        print(f"Error processing WebSocket message: {e}")
        ws_failed = True
    finally:
        if ws.connected:
            ws.close()
            print("WebSocket closed.")

    # --- Fallback to History API if WebSocket failed or didn't find output ---
    if output_data is None:
        print("WebSocket did not yield output or failed, trying History API...")
        history = get_history(prompt_id, comfyui_address)
        if not history or prompt_id not in history:
            print(f"Error: Could not retrieve history for prompt {prompt_id}")
            return None

        prompt_history = history[prompt_id]
        if 'outputs' not in prompt_history:
            print(f"Error: 'outputs' not found in history for prompt {prompt_id}")
            return None

        # History outputs contain only result data (e.g. {'images': [...]}), not the
        # node's class type. The class type is read from the submitted workflow, which
        # ComfyUI is assumed to store as the third element of the entry's 'prompt'
        # list; verify this layout against your ComfyUI version.
        prompt_info = prompt_history.get('prompt', [])
        submitted_workflow = prompt_info[2] if len(prompt_info) > 2 else {}

        # Find the output node in the history
        for node_id, node_output in prompt_history['outputs'].items():
            node_class = submitted_workflow.get(node_id, {}).get('class_type')
            if node_class == output_node_class_type:
                print(f"Found output in history from node {node_id} ({output_node_class_type}): {node_output}")
                output_data = node_output # Use the first match found in history
                break # Assuming one main output node of this type

        if output_data is None:
            print(f"Error: Output node with class type '{output_node_class_type}' not found in history outputs.")
            print(f"Available history outputs: {prompt_history['outputs']}")
            return None


    # --- Process the identified outputs ---
    results = []
    if output_data:
        # Output data structure varies (e.g., {'images': [...]}, {'gifs': [...]})
        key_found = None
        for key in ['images', 'gifs', 'videos']: # Add other possible keys if needed
             if key in output_data:
                 key_found = key
                 break

        if not key_found:
             print(f"Error: Could not find expected output key (images, gifs, videos) in node output: {output_data}")
             return None

        import mimetypes  # Local import, in the same style as the base64 import in generate()

        for output_item in output_data[key_found]:
            image_data = get_image(output_item['filename'], output_item.get('subfolder',''), output_item['type'], comfyui_address)
            if image_data:
                # Core output items typically carry 'filename', 'subfolder' and 'type'
                # but no 'format' key, so derive the MIME type from the file extension.
                guessed_type, _ = mimetypes.guess_type(output_item['filename'])
                results.append({
                    "data": image_data,
                    "filename": output_item['filename'],
                    "content_type": guessed_type or 'application/octet-stream'
                })
            else:
                print(f"Warning: Failed to retrieve data for output item {output_item['filename']}")

    return results

# --- Flask App ---

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    data = request.get_json()
    prompt_text = data.get('prompt')
    negative_prompt_text = data.get('negative_prompt', '') # Optional negative prompt
    base64_image = data.get('image') # Optional base64 encoded image

    if not prompt_text:
        return jsonify({"error": "Missing 'prompt' in request data"}), 400

    # --- 1. Load Workflow ---
    try:
        with open(WORKFLOW_API_JSON, 'r') as f:
            prompt_workflow = json.load(f)
    except FileNotFoundError:
        return jsonify({"error": f"Workflow file '{WORKFLOW_API_JSON}' not found."}), 500
    except json.JSONDecodeError:
        return jsonify({"error": f"Invalid JSON in workflow file '{WORKFLOW_API_JSON}'."}), 500
    except Exception as e:
         return jsonify({"error": f"Error loading workflow: {e}"}), 500

    # --- 2. Handle Image Upload (if provided) ---
    uploaded_filename = None
    if base64_image:
        try:
            import base64
            image_data = base64.b64decode(base64_image)
            filename, subfolder, img_type = upload_image(image_data, COMFYUI_ADDRESS)
            if not filename:
                return jsonify({"error": "Failed to upload image to ComfyUI"}), 500
            uploaded_filename = filename
            print(f"Uploaded image filename: {uploaded_filename}")
        except base64.binascii.Error:
            return jsonify({"error": "Invalid base64 image data"}), 400
        except Exception as e:
            return jsonify({"error": f"Error processing image upload: {e}"}), 500

    # --- 3. Modify Workflow ---
    try:
        # Find Positive Prompt Node
        pos_prompt_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_POSITIVE_PROMPT)
        if not pos_prompt_node_id:
            print(f"Warning: Node with title '{NODE_TITLE_POSITIVE_PROMPT}' not found in workflow.")
            # If not found by title, try finding a common text input node type
            possible_types = ["CLIPTextEncode", "CLIPTextEncodeSDXL"]
            for node_id, node_data in prompt_workflow.items():
                if node_data["class_type"] in possible_types and "Positive" in node_data.get("_meta", {}).get("title", ""):
                     pos_prompt_node_id = node_id
                     print(f"Found potential positive prompt node by type/partial title: {node_id}")
                     break
            if not pos_prompt_node_id:
                 return jsonify({"error": f"Could not find Positive Prompt node ('{NODE_TITLE_POSITIVE_PROMPT}') in workflow."}), 500

        # Update Positive Prompt Text
        # The prompt text is usually within inputs -> text
        if 'text' in prompt_workflow[pos_prompt_node_id]['inputs']:
             prompt_workflow[pos_prompt_node_id]['inputs']['text'] = prompt_text
        # Sometimes it might be in widgets_values (older ComfyUI versions?)
        elif 'widgets_values' in prompt_workflow[pos_prompt_node_id] and isinstance(prompt_workflow[pos_prompt_node_id]['widgets_values'], list):
             # Find the widget by name (usually 'text' or similar)
             widget_index = next((i for i, w_name in enumerate(prompt_workflow[pos_prompt_node_id].get('widgets_info', [])) if w_name == 'text'), -1)
             if widget_index != -1 and widget_index < len(prompt_workflow[pos_prompt_node_id]['widgets_values']):
                 prompt_workflow[pos_prompt_node_id]['widgets_values'][widget_index] = prompt_text
             else:
                  print(f"Warning: Could not find 'text' widget input for node {pos_prompt_node_id}. Prompt may not be set.")
        else:
            print(f"Warning: Could not find 'text' input or 'widgets_values' for node {pos_prompt_node_id}. Prompt may not be set.")

        print(f"Updated Positive Prompt Node '{pos_prompt_node_id}' with text: '{prompt_text[:50]}...'")

        # Find and Update Negative Prompt Node (Optional)
        if negative_prompt_text and NODE_TITLE_NEGATIVE_PROMPT:
            neg_prompt_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_NEGATIVE_PROMPT)
            if neg_prompt_node_id:
                 if 'text' in prompt_workflow[neg_prompt_node_id]['inputs']:
                    prompt_workflow[neg_prompt_node_id]['inputs']['text'] = negative_prompt_text
                    print(f"Updated Negative Prompt Node '{neg_prompt_node_id}' with text: '{negative_prompt_text[:50]}...'")
                 else:
                     print(f"Warning: Could not find 'text' input for negative prompt node {neg_prompt_node_id}.")
            else:
                 print(f"Warning: Negative prompt provided, but node '{NODE_TITLE_NEGATIVE_PROMPT}' not found.")


        # Find Load Image Node and Update Filename (if image was uploaded)
        if uploaded_filename:
            load_image_node_id = find_node_id_by_title(prompt_workflow, NODE_TITLE_LOAD_IMAGE)
            if not load_image_node_id:
                # Fallback: Try finding by class type if title doesn't match
                load_image_node_id = find_node_id_by_class_type(prompt_workflow, "LoadImage")
                if load_image_node_id:
                     print(f"Found LoadImage node by class type: {load_image_node_id}")
                else:
                     return jsonify({"error": f"Input image provided, but Load Image node ('{NODE_TITLE_LOAD_IMAGE}' or type 'LoadImage') not found in workflow."}), 500

            # Update the 'image' input field
            if 'image' in prompt_workflow[load_image_node_id]['inputs']:
                prompt_workflow[load_image_node_id]['inputs']['image'] = uploaded_filename
                print(f"Updated Load Image Node '{load_image_node_id}' with filename: {uploaded_filename}")
            else:
                 return jsonify({"error": f"Could not find 'image' input field for Load Image node '{load_image_node_id}'."}), 500

    except Exception as e:
        print(f"Error modifying workflow: {e}")
        import traceback
        traceback.print_exc()
        return jsonify({"error": f"Internal server error during workflow modification: {e}"}), 500

    # --- 4. Queue Prompt ---
    print("\nModified Workflow Snippet (Prompt Nodes):")
    if pos_prompt_node_id in prompt_workflow: print(f"  {pos_prompt_node_id}: {prompt_workflow[pos_prompt_node_id]['inputs']}")
    if 'neg_prompt_node_id' in locals() and neg_prompt_node_id in prompt_workflow: print(f"  {neg_prompt_node_id}: {prompt_workflow[neg_prompt_node_id]['inputs']}")
    if uploaded_filename and load_image_node_id in prompt_workflow: print(f"  {load_image_node_id}: {prompt_workflow[load_image_node_id]['inputs']}")
    print("\nQueuing prompt...")

    queued_data = queue_prompt(prompt_workflow, CLIENT_ID, COMFYUI_ADDRESS)
    if not queued_data or 'prompt_id' not in queued_data:
        return jsonify({"error": "Failed to queue prompt in ComfyUI"}), 500

    prompt_id = queued_data['prompt_id']
    print(f"Prompt queued successfully. Prompt ID: {prompt_id}")

    # --- 5. Get Output ---
    print(f"Waiting for output for prompt {prompt_id}...")
    output_results = get_final_outputs(prompt_id, COMFYUI_ADDRESS, OUTPUT_NODE_CLASS_TYPE)

    if not output_results:
        return jsonify({"error": "Failed to retrieve output from ComfyUI execution"}), 500

    # --- 6. Return Output ---
    if len(output_results) > 1:
        # Multiple outputs could be returned as JSON (base64 or links);
        # for simplicity, only the first one is returned for now.
        print(f"Warning: Workflow produced multiple outputs ({len(output_results)}). Returning the first one.")

    result = output_results[0]
    return send_file(
        io.BytesIO(result["data"]),
        mimetype=result["content_type"],
        as_attachment=False, # Display inline if possible
        download_name=result["filename"]
    )

# --- Main Execution ---

if __name__ == '__main__':
    # Create workspace directory if it doesn't exist
    if not os.path.exists(WORKSPACE_DIR):
        os.makedirs(WORKSPACE_DIR)

    print(f"MCP Server starting...")
    print(f" - ComfyUI API: http://{COMFYUI_ADDRESS}")
    print(f" - Listening on: http://{SERVER_ADDRESS}:{SERVER_PORT}")
    print(f" - Using Workflow: {WORKFLOW_API_JSON}")
    print(f"   - Positive Prompt Node Title: '{NODE_TITLE_POSITIVE_PROMPT}'")
    print(f"   - Negative Prompt Node Title: '{NODE_TITLE_NEGATIVE_PROMPT}' (Optional)")
    print(f"   - Load Image Node Title: '{NODE_TITLE_LOAD_IMAGE}' (Optional)")
    print(f"   - Final Output Node Type: '{OUTPUT_NODE_CLASS_TYPE}'")

    app.run(host=SERVER_ADDRESS, port=SERVER_PORT, debug=False) # Keep debug off; Flask's built-in server is intended for development use
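
A note on output lookup: it depends on how ComfyUI lays out a history entry. As far as I can tell, the per-node outputs in /history/<prompt_id> hold only result data, while the class type of each node lives in the submitted workflow stored under the entry's prompt field. The sketch below filters outputs by class type under that assumption. The SAMPLE_ENTRY data and the outputs_for_class_type helper are hypothetical, and the index 2 for the workflow is an assumption you should verify against your ComfyUI version.

```python
def outputs_for_class_type(history_entry, class_type):
    """Return {node_id: outputs} for nodes whose class_type matches."""
    prompt_info = history_entry.get("prompt", [])
    # Assumption: prompt_info is [number, prompt_id, workflow, extra_data, ...].
    workflow = prompt_info[2] if len(prompt_info) > 2 else {}
    matches = {}
    for node_id, node_output in history_entry.get("outputs", {}).items():
        if workflow.get(node_id, {}).get("class_type") == class_type:
            matches[node_id] = node_output
    return matches


# Hypothetical, trimmed history entry for one finished prompt.
SAMPLE_ENTRY = {
    "prompt": [
        0,
        "abc-123",
        {
            "9": {"class_type": "SaveImage", "inputs": {}},
            "12": {"class_type": "PreviewImage", "inputs": {}},
        },
        {},
        ["9", "12"],
    ],
    "outputs": {
        "9": {"images": [{"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"}]},
        "12": {"images": [{"filename": "tmp_00001_.png", "subfolder": "", "type": "temp"}]},
    },
}

print(outputs_for_class_type(SAMPLE_ENTRY, "SaveImage"))
```

Filtering this way keeps preview nodes out of the result even though they also report images.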

How to Use:

Save the script: Save the code above as mcp_server.py.
Get Workflow JSON:
- Design your ComfyUI workflow (e.g., LoadImage -> CLIPTextEncode (Prompt) -> KSampler -> SaveImage).
- Make sure the nodes you want to modify have recognizable titles (e.g., “Positive Prompt”, “Input Image”). You can set titles by right-clicking a node -> Title.
- Click “Save (API Format)” in ComfyUI and save the file as workflow_api.json in the same directory as the script.
Configure the Script:
- Edit the COMFYUI_ADDRESS if your ComfyUI is not running on 127.0.0.1:8188.
- Crucially: Adjust NODE_TITLE_POSITIVE_PROMPT, NODE_TITLE_LOAD_IMAGE, NODE_TITLE_NEGATIVE_PROMPT (if used), and OUTPUT_NODE_CLASS_TYPE to match the exact titles/type you used in your ComfyUI workflow. If titles don’t work reliably, you might need to hardcode the node IDs from the workflow_api.json file instead.
Run ComfyUI: Start your ComfyUI instance. The CORS flag (--enable-cors-header) is only needed when a browser page on another origin talks to ComfyUI directly, so it should not be required for this server-to-server setup.
Run the Script:
python mcp_server.py
Send Requests: Use a tool like curl, Postman, or another script to send POST requests to http://<your_server_ip>:9090/generate.
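
Before starting the server, it can help to check that the configured titles and class type actually exist in your workflow file. Here is a minimal sketch of such a check. The check_workflow_config helper and the sample workflow are hypothetical; in practice you would load workflow_api.json and pass in the values you set for the NODE_TITLE_* constants and OUTPUT_NODE_CLASS_TYPE.

```python
def check_workflow_config(workflow, required_titles, output_class_type):
    """Return a list of problems; an empty list means the config matches the workflow."""
    titles = {node.get("_meta", {}).get("title") for node in workflow.values()}
    class_types = {node.get("class_type") for node in workflow.values()}
    problems = []
    for title in required_titles:
        if title not in titles:
            problems.append(f"no node titled '{title}'")
    if output_class_type not in class_types:
        problems.append(f"no node with class_type '{output_class_type}'")
    return problems


# Hypothetical, trimmed workflow that has no "Input Image" node.
sample_workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""},
          "_meta": {"title": "Positive Prompt"}},
    "9": {"class_type": "SaveImage", "inputs": {},
          "_meta": {"title": "Final Output"}},
}

for problem in check_workflow_config(
        sample_workflow, ["Positive Prompt", "Input Image"], "SaveImage"):
    print("Config problem:", problem)
```

If the check reports a missing title, either rename the node in ComfyUI and export again, or adjust the constant in the script.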

Example curl requests:

Text-only:
  curl -X POST -H "Content-Type: application/json" \
    -d '{"prompt": "A photo of an astronaut riding a horse on the moon"}' \
    http://127.0.0.1:9090/generate --output output.png
Text and Image:
- First, base64 encode your image (e.g., input.jpg):
  # On Linux (GNU coreutils):
  IMAGE_B64=$(base64 -w 0 input.jpg)
  # On macOS (the bundled base64 may not accept -w):
  # IMAGE_B64=$(base64 -i input.jpg)
  # On Windows (PowerShell):
  # $IMAGE_B64 = [Convert]::ToBase64String([IO.File]::ReadAllBytes("input.jpg"))
- Then send the request:
  curl -X POST -H "Content-Type: application/json" \
    -d '{ "prompt": "Make this image look like a watercolor painting", "image": "'"$IMAGE_B64"'" }' \
    http://127.0.0.1:9090/generate --output output_watercolor.png
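
If you would rather call the endpoint from Python than from curl, the requests above can be sketched as follows. The build_generate_payload and generate helpers are hypothetical names; the field names (prompt, negative_prompt, image) are the ones the /generate handler reads. The placeholder bytes are not a valid image and only illustrate the encoding step.

```python
import base64
import json


def build_generate_payload(prompt, image_bytes=None, negative_prompt=None):
    """Build the JSON body that the /generate endpoint expects."""
    payload = {"prompt": prompt}
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if image_bytes is not None:
        # The server base64-decodes the "image" field before uploading it to ComfyUI.
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload


def generate(server_url, payload, out_path, timeout=300):
    """POST the payload and write the returned image bytes to out_path."""
    import requests  # third-party; installed in the prerequisites step

    response = requests.post(f"{server_url}/generate", json=payload, timeout=timeout)
    response.raise_for_status()  # error responses from the server are JSON with 4xx/5xx
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path


# Placeholder bytes stand in for a real file read with open("input.jpg", "rb").read().
payload = build_generate_payload(
    "Make this image look like a watercolor painting", image_bytes=b"\x89PNG")
print(json.dumps(payload)[:80])

# Example call (requires mcp_server.py and ComfyUI to be running):
# generate("http://127.0.0.1:9090", payload, "output_watercolor.png")
```

Sending the image inside the JSON body this way also avoids the shell's command-line length limit, which large base64 strings can hit with curl.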

This script provides a solid foundation. You might need to adapt the node finding logic, error handling, or output processing depending on the specifics of your workflows and desired behavior. Remember to check the console output of both this script and ComfyUI for debugging information.
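
As one example of adapting the output processing: the script returns only the first file when a workflow produces several. A possible alternative, sketched here under the assumption that your client can handle JSON, is to return every output as base64. The outputs_to_json helper is a hypothetical name, and the sample list mimics the shape of the results that get_final_outputs builds (data, filename, content_type).

```python
import base64


def outputs_to_json(results):
    """Convert get_final_outputs()-style results into a JSON-serializable dict."""
    return {
        "count": len(results),
        "outputs": [
            {
                "filename": item["filename"],
                "content_type": item["content_type"],
                "data_base64": base64.b64encode(item["data"]).decode("ascii"),
            }
            for item in results
        ],
    }


# Hypothetical results with placeholder bytes instead of real image data.
sample_results = [
    {"data": b"one", "filename": "a.png", "content_type": "image/png"},
    {"data": b"two", "filename": "b.png", "content_type": "image/png"},
]
body = outputs_to_json(sample_results)
print(body["count"], [o["filename"] for o in body["outputs"]])
```

Inside the Flask handler you could then return jsonify(outputs_to_json(output_results)) when there is more than one result, and keep send_file for the single-output case.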
