Gemini Image Analysis

AI image analysis via Gemini: object detection, segmentation, captioning, classification, and visual Q&A.

POST /v1/media/queue
$0.02/call

Overview

The model takes an image and a text prompt and returns a text analysis. Supported tasks include object detection, segmentation, captioning, classification, and visual Q&A.

Property          Value
Model ID          google/gemini-image-analysis
Context Window    1,000,000 tokens
Input Price       $0.20 / 1M tokens
Output Price      $0.80 / 1M tokens
Tokens per Image  ~258 tokens (at 384px or smaller)

Usage

All media models use the async job queue. Submit a job, then poll for the result.

Step 1: Submit Job

const res = await fetch('https://api.yepapi.com/v1/media/queue', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-image-analysis',
    prompt: 'What objects are in this image? Return bounding boxes.',
    imageData: {
      mimeType: 'image/png',
      base64: '<base64-encoded-image>',
    },
  }),
});
const { data } = await res.json();
// data.jobId: use this to poll for results

curl -X POST https://api.yepapi.com/v1/media/queue \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-image-analysis", "prompt": "What objects are in this image?", "imageData": {"mimeType": "image/png", "base64": "<base64>"}}'

Step 2: Poll for Result

const status = await fetch(`https://api.yepapi.com/v1/media/status/${data.jobId}`, {
  headers: { 'x-api-key': 'YOUR_API_KEY' },
});
const { data: job } = await status.json();
// job.status: "pending" | "processing" | "completed" | "failed"
// job.result.text: analysis text when completed

curl https://api.yepapi.com/v1/media/status/JOB_ID \
  -H "x-api-key: YOUR_API_KEY"
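
The two steps above can be combined into a helper that keeps polling until the job reaches a terminal state. This is a minimal sketch, not an official client: the helper name `waitForJob`, the 2-second interval, and the attempt cap are assumptions rather than documented values. Only the status endpoint, the `x-api-key` header, and the four status strings come from the snippets above. The `fetchImpl` option is a hypothetical hook that lets the function be exercised without network access.

```javascript
// Poll the status endpoint until the job completes or fails.
// intervalMs and maxAttempts are illustrative defaults, not documented limits.
async function waitForJob(jobId, apiKey, opts = {}) {
  const { intervalMs = 2000, maxAttempts = 30, fetchImpl = fetch } = opts;
  const url = `https://api.yepapi.com/v1/media/status/${jobId}`;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, { headers: { 'x-api-key': apiKey } });
    if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);

    const { data: job } = await res.json();
    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(`Job ${jobId} failed`);

    // Still "pending" or "processing": wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

With the `data.jobId` from Step 1, a call such as `const result = await waitForJob(data.jobId, 'YOUR_API_KEY')` would resolve to the job's `result` object, whose `text` field holds the analysis.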

Request Body

Parameter           Type    Required  Description                              Default
model               string  Yes       Must be google/gemini-image-analysis     -
prompt              string  Yes       Analysis instruction or question         -
imageData.mimeType  string  Yes       MIME type of the image (e.g. image/png)  -
imageData.base64    string  Yes       Base64-encoded image data                -
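
As an illustration of how these four fields fit together, the sketch below builds the JSON request body from raw image bytes. The helper name `buildQueueBody` is hypothetical, and the example assumes a Node.js runtime, where `Buffer` is available as a global.

```javascript
// Build the JSON body for POST /v1/media/queue from raw image bytes.
// buildQueueBody is a hypothetical helper, not part of any official SDK.
function buildQueueBody(prompt, imageBytes, mimeType) {
  if (!prompt) throw new Error('prompt is required');
  if (!mimeType) throw new Error('mimeType is required');
  return JSON.stringify({
    model: 'google/gemini-image-analysis',
    prompt,
    imageData: {
      mimeType,
      // Buffer.from accepts a Buffer, Uint8Array, or ArrayBuffer.
      base64: Buffer.from(imageBytes).toString('base64'),
    },
  });
}
```

In Node.js the bytes could come from `fs.promises.readFile('photo.png')`, and the returned string can be passed as the `body` of the `fetch` call shown in Step 1.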

Token Consumption

  • 258 tokens per image at 384px or smaller
  • Larger images are tiled into 768x768 sections (258 tokens each)
  • Up to 3,600 images per request
  • Max 10MB inline request size
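
The tiling rule above can be turned into a rough token estimate. The sketch below rests on two assumptions that this page does not confirm: that "384px or smaller" applies to both width and height, and that larger images are split on a plain 768x768 grid with partial tiles counted as full ones. The service's actual tiling may differ, so treat the result as an approximation. The function name `estimateImageTokens` is hypothetical.

```javascript
const TOKENS_PER_TILE = 258;

// Rough input-token estimate for one image.
// Assumes a simple grid of 768x768 tiles; the real tiling may differ.
function estimateImageTokens(width, height) {
  if (width <= 384 && height <= 384) return TOKENS_PER_TILE;
  const tiles = Math.ceil(width / 768) * Math.ceil(height / 768);
  return tiles * TOKENS_PER_TILE;
}

console.log(estimateImageTokens(300, 200));   // 258 (small image, flat rate)
console.log(estimateImageTokens(1920, 1080)); // 1548 (3 x 2 tiles under this assumption)
```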

Supported Formats

PNG, JPEG, WEBP, HEIC/HEIF.
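
For callers who need to fill `imageData.mimeType` from a filename, the sketch below maps common extensions for these formats to their standard MIME types. The helper name `mimeTypeForFile` is hypothetical, and the example assumes the API accepts the standard MIME strings; only `image/png` appears on this page.

```javascript
// Standard MIME types for the formats listed above, keyed by file extension.
const MIME_BY_EXTENSION = new Map([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['webp', 'image/webp'],
  ['heic', 'image/heic'],
  ['heif', 'image/heif'],
]);

// Return the MIME type for a filename, or throw if the format is not listed.
function mimeTypeForFile(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
  const mime = MIME_BY_EXTENSION.get(ext);
  if (!mime) throw new Error(`Unsupported image format: ${filename}`);
  return mime;
}
```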

Under the Hood

Requests are served directly by Google's Gemini API.
