Gemini 3.5 Flash

Google's fast, cost-efficient multimodal model — strong reasoning and instruction following at low latency, with a 1M+ token context window.

POST /v1/ai/chat
$0.01/call

Overview

| Property | Value |
| --- | --- |
| Model ID | google/gemini-3.5-flash |
| Context Window | 1,048,576 tokens |
| Max Output | 65,536 tokens |
| Input Price | $1.50 / 1M tokens |
| Output Price | $9.00 / 1M tokens |
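
As a rough illustration of the per-token prices above, the sketch below estimates the token cost of a single call from its token counts. It assumes the listed input and output prices apply linearly. This page also lists a $0.01/call price and does not say how the two combine, so treat the result as an estimate only. `estimateTokenCostUsd` is a hypothetical helper, not part of the API.

```javascript
// Prices copied from the table above (USD per 1M tokens).
const INPUT_USD_PER_MILLION = 1.5;
const OUTPUT_USD_PER_MILLION = 9.0;

// Hypothetical helper: estimates token cost, assuming linear per-token pricing.
function estimateTokenCostUsd({ promptTokens, completionTokens }) {
  const inputCost = (promptTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (completionTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// 16 prompt tokens and 245 completion tokens, as in the sample response on this page.
const cost = estimateTokenCostUsd({ promptTokens: 16, completionTokens: 245 });
console.log(cost.toFixed(6)); // about 0.002229
```

Output tokens cost six times as much as input tokens here, so capping `maxTokens` is likely the more effective lever for limiting spend.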

Usage

const res = await fetch('https://api.yepapi.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-3.5-flash',
    messages: [{ role: 'user', content: 'Summarize the trade-offs between REST and GraphQL for a mobile backend.' }],
  }),
});
const { data } = await res.json();
console.log(data.message.content);

curl -X POST https://api.yepapi.com/v1/ai/chat \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-3.5-flash", "messages": [{"role": "user", "content": "Summarize the trade-offs between REST and GraphQL for a mobile backend."}]}'

Request Body

| Parameter | Type | Required | Description | Default |
| --- | --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g. google/gemini-3.5-flash) | |
| messages | Message[] | Yes | Array of { role, content } objects | |
| maxTokens | number | No | Maximum tokens in the response | Model default |
| temperature | number | No | Sampling temperature (0.0–2.0) | 1.0 |
| topP | number | No | Nucleus sampling threshold | 1.0 |
| frequencyPenalty | number | No | Penalize repeated tokens | 0 |
| presencePenalty | number | No | Penalize tokens already present | 0 |
| stream | boolean | No | Enable SSE streaming | false |

Info: All AI models use the /v1/ai/chat endpoint. Specify the model with the model field.
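
To show how the optional parameters fit together, here is a minimal sketch of a request-body builder. `buildChatBody` is a hypothetical helper, and its checks (a non-empty `messages` array, `temperature` within 0.0 to 2.0) are assumptions drawn from the parameter table, not documented server-side validation.

```javascript
// Hypothetical helper: builds a request body from the parameters in the table above.
// The validation rules are assumptions based on the table, not documented API behavior.
function buildChatBody(model, messages, options = {}) {
  if (typeof model !== 'string' || model.length === 0) {
    throw new TypeError('model is required');
  }
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError('messages must be a non-empty array');
  }
  const { temperature } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new RangeError('temperature must be between 0.0 and 2.0');
  }
  // Copy only the known optional fields; anything left unset uses the listed default.
  const allowed = ['maxTokens', 'temperature', 'topP', 'frequencyPenalty', 'presencePenalty', 'stream'];
  const body = { model, messages };
  for (const key of allowed) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}

const body = buildChatBody(
  'google/gemini-3.5-flash',
  [{ role: 'user', content: 'Explain SSE in one sentence.' }],
  { maxTokens: 256, temperature: 0.2 },
);
console.log(JSON.stringify(body));
```

The resulting object can be passed to `JSON.stringify` and sent as the `body` of the `fetch` call shown under Usage.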

Response

{
  "ok": true,
  "data": {
    "model": "google/gemini-3.5-flash",
    "message": {
      "role": "assistant",
      "content": "REST is simpler to cache and operate but tends to over- or under-fetch, forcing extra round trips on mobile. GraphQL lets clients request exactly the fields they need in one query, reducing payloads on flaky networks, at the cost of more complex caching, server resolvers, and rate limiting."
    },
    "usage": {
      "promptTokens": 16,
      "completionTokens": 245,
      "totalTokens": 261
    }
  }
}
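
A minimal sketch of reading this response shape follows. `readChatResponse` is a hypothetical helper. It assumes a failed call comes back with `"ok"` set to something other than `true`; the error format is not described on this page, so the failure branch is a guess.

```javascript
// Hypothetical helper: pulls the reply text and token usage out of a parsed response.
// Assumption: failed calls do not have "ok": true; the error shape is not documented here.
function readChatResponse(json) {
  if (!json || json.ok !== true || !json.data) {
    throw new Error('Chat request failed: ' + JSON.stringify(json));
  }
  const { model, message, usage } = json.data;
  return { model, text: message.content, usage };
}

// Sample object with the same shape as the response above (content shortened).
const sample = {
  ok: true,
  data: {
    model: 'google/gemini-3.5-flash',
    message: { role: 'assistant', content: 'REST is simpler to cache.' },
    usage: { promptTokens: 16, completionTokens: 245, totalTokens: 261 },
  },
};
const reply = readChatResponse(sample);
console.log(reply.text, reply.usage.totalTokens);
```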

Streaming

Set "stream": true to receive Server-Sent Events. Each chunk contains a delta object:

data: {"delta":{"content":"REST"},"model":"google/gemini-3.5-flash","index":0}
data: [DONE]
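
The chunk format above can be reassembled into the full reply with a small parser. The sketch below is a hypothetical helper, not an official client. It assumes each `data:` line carries one complete JSON payload and that `[DONE]` marks the end of the stream. The second chunk in the sample is made up for illustration.

```javascript
// Hypothetical helper: concatenates delta.content from SSE lines until [DONE].
// Assumptions: one JSON payload per "data:" line, and "[DONE]" ends the stream.
function collectStreamText(sseText) {
  let text = '';
  for (const rawLine of sseText.split('\n')) {
    const line = rawLine.trim();
    if (!line.startsWith('data:')) continue; // skip blank lines and other SSE fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') break;
    const chunk = JSON.parse(payload);
    if (chunk.delta && typeof chunk.delta.content === 'string') {
      text += chunk.delta.content;
    }
  }
  return text;
}

const sampleStream = [
  'data: {"delta":{"content":"REST"},"model":"google/gemini-3.5-flash","index":0}',
  'data: {"delta":{"content":" is simple."},"model":"google/gemini-3.5-flash","index":1}',
  'data: [DONE]',
].join('\n');
console.log(collectStreamText(sampleStream)); // prints "REST is simple."
```

On a live connection the network can split a chunk in the middle of a line, so a real client would buffer incoming bytes and parse only complete lines.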

Under the Hood

We handle auth, billing, and response normalization — you just send messages.
