Overview

Generate images directly in chat using AI models. The tool supports both generating new images from text prompts and iteratively editing existing ones.
Generating an image from a text prompt in chat

Quick Start

Toggle in chat.config.ts:
ai: {
  tools: {
    image: {
      enabled: true, // Requires BLOB_READ_WRITE_TOKEN
    },
  },
}
Image generation requires BLOB_READ_WRITE_TOKEN since generated images are uploaded to Vercel Blob storage.
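The token can be set in .env.local; the value below is a placeholder, not a real token:

```shell
# Required for image generation: generated images are uploaded to Vercel Blob.
BLOB_READ_WRITE_TOKEN="<your-vercel-blob-token>"
```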
Configure the default image model:
ai: {
  tools: {
    image: {
      default: "google/gemini-3-pro-image",
    },
  },
}

Modes

The tool operates in two modes based on context:
Mode       Trigger                                      Behavior
generate   Text prompt only                             Creates new image from scratch
edit       Prompt + attachments or previous generation  Uses existing images as input
Mode is determined automatically:
const mode = imageParts.length > 0 || lastGeneratedImage ? "edit" : "generate";

Iterative Editing

Users can iterate on generated images without re-uploading. The system automatically tracks the last generated image in the conversation.

How It Works

  1. Extraction: Before each request, the chat agent scans recent messages for the last generated image:
app/(chat)/api/chat/get-recent-generated-image.ts
export function getRecentGeneratedImage(
  messages: ChatMessage[],
): { imageUrl: string; name: string } | null {
  const lastAssistantMessage = messages.findLast(
    (message) => message.role === "assistant",
  );

  if (lastAssistantMessage?.parts && lastAssistantMessage.parts.length > 0) {
    for (const part of lastAssistantMessage.parts) {
      if (
        part.type === "tool-generateImage" &&
        part.state === "output-available" &&
        part.output?.imageUrl
      ) {
        return {
          imageUrl: part.output.imageUrl,
          name: `generated-image-${part.toolCallId}.png`,
        };
      }
    }
  }

  return null;
}
  2. Injection: The extracted image is passed to the tool factory:
lib/ai/core-chat-agent.ts
const lastGeneratedImage = getRecentGeneratedImage(messages);

const tools = getTools({
  // ...other params
  lastGeneratedImage,
});
  3. Edit mode: When lastGeneratedImage exists, the tool fetches it and includes it as input:
lib/ai/tools/generate-image.ts
const inputImages = await collectEditImages({ imageParts, lastGeneratedImage });
promptInput = { text: prompt, images: inputImages };

User Experience

  • User: “Generate a sunset over mountains”
  • AI: generates image
  • User: “Add a lake in the foreground”
  • AI: edits previous image (no re-upload needed)

Image Sources

Edit mode combines images from multiple sources:
Source              Description
lastGeneratedImage  Most recent generated image in the conversation
attachments         User-uploaded images in the current message
Both are fetched and passed to the model:
async function collectEditImages({ imageParts, lastGeneratedImage }) {
  return await Promise.all([
    ...(lastGeneratedImage
      ? [fetchImageBuffer(lastGeneratedImage.imageUrl)]
      : []),
    ...imageParts.map((p) => fetchImageBuffer(p.url)),
  ]);
}
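The fetchImageBuffer helper referenced above is not shown in this doc. A minimal sketch of what it might look like — the error handling and the PNG fallback are assumptions, not the actual implementation:

```typescript
// Hypothetical shape of the fetchImageBuffer helper used by
// collectEditImages: download an image URL and return its bytes
// plus media type for use as model input.
async function fetchImageBuffer(
  url: string,
): Promise<{ data: Uint8Array; mediaType: string }> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch image (${response.status}): ${url}`);
  }
  // Fall back to PNG when the server omits a content type.
  const mediaType = response.headers.get("content-type") ?? "image/png";
  const data = new Uint8Array(await response.arrayBuffer());
  return { data, mediaType };
}
```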

Architecture

Follows the Tool Part pattern:
lib/ai/tools/generate-image.ts → components/part/generate-image.tsx

Tool Output

return { imageUrl: result.url, prompt };
The generated image is uploaded to Vercel Blob storage and its URL is returned in the tool output.
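A sketch of that upload step. The uploader is injected as a parameter so the sketch runs without a token; in the app it would be @vercel/blob's put, and the file naming here is illustrative:

```typescript
// `Uploader` mirrors the shape of @vercel/blob's `put`; injecting it
// keeps this sketch runnable without a BLOB_READ_WRITE_TOKEN.
type Uploader = (
  pathname: string,
  body: Buffer,
  options: { access: "public"; contentType: string },
) => Promise<{ url: string }>;

async function uploadGeneratedImage(
  base64Image: string,
  toolCallId: string,
  putBlob: Uploader,
): Promise<{ imageUrl: string }> {
  // Models on the image-model path return base64; decode before upload.
  const buffer = Buffer.from(base64Image, "base64");
  const result = await putBlob(`generated-image-${toolCallId}.png`, buffer, {
    access: "public",
    contentType: "image/png",
  });
  return { imageUrl: result.url };
}
```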

UI States

State             Shows
input-available   Skeleton + "Generating image…" label
output-available  Image + copy button + prompt

Configuration

Image Model

chat.config.ts
ai: {
  tools: {
    image: {
      default: "google/gemini-3-pro-image",
    },
  },
}

Model Selection Logic

The tool supports two types of models:
Type         Description                                       Example
Image model  Standalone image generation models                google/gemini-3-pro-image
Multimodal   Language models with image generation capability  google/gemini-2.0-flash-exp
Model selection is done in resolveImageModel(selectedModel) in lib/ai/tools/generate-image.ts:
  1. If the user’s selected chat model supports image output (per app model registry, model.output.image), use it
  2. Otherwise, fall back to config.ai.tools.image.default
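The two steps above can be sketched as follows; the registry entry shape is a simplified assumption (the real model.output.image flag lives in the app's model registry):

```typescript
// Simplified registry entry: `output.image` marks models that can emit images.
type ModelEntry = { id: string; output?: { image?: boolean } };

// Default from config.ai.tools.image (see Configuration above).
const imageToolDefault = "google/gemini-3-pro-image";

// Prefer the user's selected chat model when it supports image output,
// otherwise fall back to the configured default image model.
function resolveImageModel(selectedModel: ModelEntry | undefined): string {
  if (selectedModel?.output?.image) {
    return selectedModel.id;
  }
  return imageToolDefault;
}
```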

Image Model vs Multimodal Generation

The tool uses different generation paths depending on the model type.
Image model (generateImage from the AI SDK):
  • Uses standalone image models via getImageModel()
  • Supports edit mode with image buffers as input
  • Returns base64-encoded images
Multimodal (generateText with image output):
  • Uses language models via getMultimodalImageModel()
  • Passes images as URL references in message content
  • Requires responseModalities: ["TEXT", "IMAGE"] for Google models
  • Extracts generated image from response files
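A sketch of how the provider options for the multimodal path might be assembled. The responseModalities requirement comes from the source above; the provider-prefix check is an illustrative heuristic, not the tool's actual logic:

```typescript
// Build providerOptions for the multimodal path. Google models must have
// both modalities enabled, or they respond with text only.
function buildMultimodalOptions(modelId: string): Record<string, unknown> {
  const options: Record<string, unknown> = {};
  if (modelId.startsWith("google/")) {
    options.google = { responseModalities: ["TEXT", "IMAGE"] };
  }
  return options;
}
```

These options would be passed as providerOptions to generateText, with the generated image then read from the response's files.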