Gemini Image Analysis

AI image analysis via Gemini. Object detection, segmentation, captioning, classification, and visual Q&A.

POST /v1/media/queue

~$0.02 est.

Overview

Image understanding via Gemini. Object detection, segmentation, captioning, classification, and visual Q&A.

Property	Value
Model ID	`google/gemini-image-analysis`
Context Window	1,000,000 tokens
Billing	Token-based (dynamic)
Input Price	$0.20 / 1M tokens
Output Price	$0.80 / 1M tokens
Tokens per Image	~258 tokens (at 384px or smaller)
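As a rough illustration of how the token prices above turn into a per-request cost, the sketch below multiplies token counts by the listed rates. `estimateCostUsd` is a hypothetical helper, not part of the API, and the token counts in the example are made up; actual billing may differ.

```javascript
// Prices copied from the table above (USD per 1M tokens).
const INPUT_PRICE_PER_MILLION = 0.20;
const OUTPUT_PRICE_PER_MILLION = 0.80;

// Hypothetical helper: estimates the cost of one request from its token counts.
function estimateCostUsd(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION;
  return inputCost + outputCost;
}

// Illustrative numbers only: one small image (258 tokens), a 42-token prompt,
// and a 500-token answer.
const exampleCost = estimateCostUsd(258 + 42, 500);
console.log(`Estimated cost: $${exampleCost.toFixed(6)}`);
```

Output tokens cost four times as much as input tokens, so long answers tend to dominate the bill for single-image requests.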

Usage

All media models use the async job queue. Submit a job, then poll for the result.

Step 1: Submit Job

const res = await fetch('https://api.yepapi.com/v1/media/queue', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-image-analysis',
    prompt: 'What objects are in this image? Return bounding boxes.',
    imageData: {
      mimeType: 'image/png',
      base64: '<base64-encoded-image>',
    },
  }),
});
if (!res.ok) throw new Error(`Submit failed: HTTP ${res.status}`);
const { data } = await res.json();
// data.jobId: use this to poll for results

curl -X POST https://api.yepapi.com/v1/media/queue \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-image-analysis", "prompt": "What objects are in this image?", "imageData": {"mimeType": "image/png", "base64": "<base64>"}}'

Step 2: Poll for Result

const statusRes = await fetch(`https://api.yepapi.com/v1/media/status/${data.jobId}`, {
  headers: { 'x-api-key': 'YOUR_API_KEY' },
});
const { data: job } = await statusRes.json();
// job.status is "pending" | "processing" | "completed" | "failed"
// Repeat this request until job.status is "completed" or "failed".
// job.result.text holds the analysis text once completed

curl https://api.yepapi.com/v1/media/status/JOB_ID \
  -H "x-api-key: YOUR_API_KEY"
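The status request above checks the job once. A minimal polling loop might look like the sketch below. `pollJob`, the 2-second interval, and the 30-attempt cap are assumptions for illustration, not documented API behavior; `fetchImpl` is injectable only so the loop can be exercised without network access.

```javascript
// Hypothetical helper: polls the status endpoint until the job finishes.
// intervalMs and maxAttempts are illustrative defaults, not documented limits.
async function pollJob(jobId, apiKey, { intervalMs = 2000, maxAttempts = 30, fetchImpl = fetch } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`https://api.yepapi.com/v1/media/status/${jobId}`, {
      headers: { 'x-api-key': apiKey },
    });
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);

    const { data: job } = await res.json();
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(`Job ${jobId} failed`);

    // Still "pending" or "processing": wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

Usage, given the `data.jobId` from Step 1: `const job = await pollJob(data.jobId, 'YOUR_API_KEY');` then read `job.result.text`.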

Request Body

Parameter	Type	Required	Description	Default
`model`	`string`	Yes	`google/gemini-image-analysis`	—
`prompt`	`string`	Yes	Analysis instruction or question	—
`imageData.mimeType`	`string`	Yes	MIME type of the image (e.g. `image/png`)	—
`imageData.base64`	`string`	Yes	Base64-encoded image data	—
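To show how the four fields fit together, here is a sketch that builds the request body from raw image bytes in Node.js. `buildRequestBody` and the extension-to-MIME map are hypothetical; the map covers the formats listed under Supported Formats, and inferring the MIME type from the file extension is an assumption of this example rather than something the API requires.

```javascript
// Extension-to-MIME map for the formats listed under Supported Formats.
const MIME_BY_EXTENSION = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  heic: 'image/heic',
  heif: 'image/heif',
};

// Hypothetical helper: turns image bytes plus a file name into a request body.
function buildRequestBody(imageBytes, fileName, prompt) {
  const extension = fileName.split('.').pop().toLowerCase();
  const mimeType = MIME_BY_EXTENSION[extension];
  if (!mimeType) throw new Error(`Unsupported image extension: ${extension}`);

  return {
    model: 'google/gemini-image-analysis',
    prompt,
    imageData: {
      mimeType,
      // Buffer is a Node.js global; it encodes the raw bytes as base64.
      base64: Buffer.from(imageBytes).toString('base64'),
    },
  };
}
```

The returned object can be passed to `JSON.stringify` and sent as the `body` of the submit request in Step 1.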

Token Consumption

Each image at 384px or smaller costs 258 tokens
Larger images are tiled into 768x768 sections (258 tokens each)
Up to 3,600 images per request
Maximum inline request size is 10MB
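The tiling rule can be turned into a rough token estimate. The sketch below assumes tiles are counted as a simple grid, rounding each dimension up to a multiple of 768; the text above does not spell out the exact tiling algorithm, so treat the result as an approximation. `estimateImageTokens` is a hypothetical helper.

```javascript
const TOKENS_PER_TILE = 258;
const SMALL_IMAGE_MAX_PX = 384;
const TILE_SIZE_PX = 768;

// Hypothetical helper: approximates the token cost of one image.
// Assumption: larger images are split into a grid of 768x768 tiles,
// with partial tiles counted as full tiles.
function estimateImageTokens(width, height) {
  if (width <= SMALL_IMAGE_MAX_PX && height <= SMALL_IMAGE_MAX_PX) {
    return TOKENS_PER_TILE;
  }
  const tilesAcross = Math.ceil(width / TILE_SIZE_PX);
  const tilesDown = Math.ceil(height / TILE_SIZE_PX);
  return tilesAcross * tilesDown * TOKENS_PER_TILE;
}

console.log(estimateImageTokens(384, 384));  // one small image: 258
console.log(estimateImageTokens(1536, 768)); // 2 x 1 tiles: 516
```

Under this assumption, downscaling an image before upload reduces its token cost roughly in proportion to the number of tiles removed.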

Supported Formats

PNG, JPEG, WEBP, HEIC/HEIF.

Under the Hood