Gemini 3.5 Flash

Google's fast, cost-efficient multimodal model — strong reasoning and instruction following at low latency, with a 1M+ token context window.

POST /v1/ai/chat

$0.01/call

Overview

Property	Value
Model ID	`google/gemini-3.5-flash`
Context Window	1,048,576 tokens
Max Output	65,536 tokens
Input Price	$1.50 / 1M tokens
Output Price	$9.00 / 1M tokens
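
As a rough illustration of how the listed token prices translate into a per-request figure, the sketch below multiplies token counts by the input and output rates from the table above. The helper name `estimateTokenCost` is hypothetical, and the calculation assumes tokens are billed linearly at the listed rates. It does not account for the per-call price shown at the top of this page, so treat the result as an estimate rather than a billing statement.

```javascript
// Rates copied from the table above (USD per 1M tokens).
const INPUT_USD_PER_MILLION = 1.5;
const OUTPUT_USD_PER_MILLION = 9.0;

// Hypothetical helper: estimates token cost in USD, assuming linear billing.
function estimateTokenCost(promptTokens, completionTokens) {
  const inputCost = (promptTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (completionTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example: 16 prompt tokens and 245 completion tokens.
console.log(estimateTokenCost(16, 245).toFixed(6)); // prints 0.002229
```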

Usage

const res = await fetch('https://api.yepapi.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-3.5-flash',
    messages: [{ role: 'user', content: 'Summarize the trade-offs between REST and GraphQL for a mobile backend.' }],
  }),
});
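// Stop early on a non-2xx HTTP status instead of parsing an error body as a reply.
if (!res.ok) throw new Error(`Request failed with HTTP status ${res.status}`);
// The reply text is at data.message.content (see the Response section).
const { data } = await res.json();
console.log(data.message.content);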

curl -X POST https://api.yepapi.com/v1/ai/chat \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-3.5-flash", "messages": [{"role": "user", "content": "Summarize the trade-offs between REST and GraphQL for a mobile backend."}]}'

Request Body

Parameter	Type	Required	Description	Default
`model`	`string`	Yes	Model ID (e.g. `google/gemini-3.5-flash`)	—
`messages`	`Message[]`	Yes	Array of `{ role, content }` objects	—
`maxTokens`	`number`	No	Maximum tokens in the response	Model default
`temperature`	`number`	No	Sampling temperature (0.0–2.0)	`1.0`
`topP`	`number`	No	Nucleus sampling threshold	`1.0`
`frequencyPenalty`	`number`	No	Penalize repeated tokens	`0`
`presencePenalty`	`number`	No	Penalize tokens already present	`0`
`stream`	`boolean`	No	Enable SSE streaming	`false`
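
The optional parameters sit alongside `model` and `messages` in the same JSON body. The sketch below shows one way to assemble that body; `buildChatBody` is a hypothetical helper, not part of any SDK. It checks `temperature` against the documented 0.0–2.0 range and leaves out any option the caller did not set, on the assumption that the API then applies the defaults listed in the table.

```javascript
// Hypothetical helper: assembles a request body from the documented parameters.
function buildChatBody(messages, options = {}) {
  const { temperature, maxTokens, topP, stream } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new RangeError('temperature must be between 0.0 and 2.0');
  }
  const body = { model: 'google/gemini-3.5-flash', messages };
  // Only include options the caller set, so the documented defaults apply otherwise.
  for (const [key, value] of Object.entries({ temperature, maxTokens, topP, stream })) {
    if (value !== undefined) body[key] = value;
  }
  return body;
}

const body = buildChatBody(
  [{ role: 'user', content: 'Explain idempotency keys in two sentences.' }],
  { temperature: 0.2, maxTokens: 256 },
);
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be passed to `JSON.stringify` and sent as the `body` of the `fetch` call shown under Usage.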

Info

All AI models use the `/v1/ai/chat` endpoint. Specify the model with the `model` field.

Response

{
  "ok": true,
  "data": {
    "model": "google/gemini-3.5-flash",
    "message": {
      "role": "assistant",
      "content": "REST is simpler to cache and operate but tends to over- or under-fetch, forcing extra round trips on mobile. GraphQL lets clients request exactly the fields they need in one query, reducing payloads on flaky networks, at the cost of more complex caching, server resolvers, and rate limiting."
    },
    "usage": {
      "promptTokens": 16,
      "completionTokens": 245,
      "totalTokens": 261
    }
  }
}
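
A minimal sketch of reading this response shape follows. `extractReply` is a hypothetical helper, and the sample payload is made up for illustration. The error format is not shown on this page, so the check below only assumes that a failed request would not carry `"ok": true`.

```javascript
// Hypothetical helper: pulls the reply text and token usage out of a parsed response.
function extractReply(payload) {
  // Assumption: anything other than ok === true means the request did not succeed.
  if (!payload || payload.ok !== true) {
    throw new Error('Chat request did not succeed');
  }
  const { model, message, usage } = payload.data;
  return { model, text: message.content, totalTokens: usage.totalTokens };
}

// Made-up sample in the documented shape.
const sample = {
  ok: true,
  data: {
    model: 'google/gemini-3.5-flash',
    message: { role: 'assistant', content: 'REST is simpler to cache.' },
    usage: { promptTokens: 16, completionTokens: 8, totalTokens: 24 },
  },
};
console.log(extractReply(sample).text); // prints REST is simpler to cache.
```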

Streaming

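Set `"stream": true` in the request body to receive Server-Sent Events. Each event carries a `delta` object with the next piece of content, and the stream ends with a `[DONE]` sentinel: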

data: {"delta":{"content":"REST"},"model":"google/gemini-3.5-flash","index":0}
data: [DONE]
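
The events above can be reassembled into the full reply by concatenating each `delta.content` until `[DONE]` arrives. The sketch below is one way to do that, assuming every event is a single `data:` line in the format shown. `createStreamCollector` is a hypothetical helper. It buffers partial lines because network chunks can split an event anywhere.

```javascript
// Hypothetical helper: accumulates text from SSE lines in the format shown above.
function createStreamCollector() {
  let buffer = ''; // holds a partial line between network chunks
  let text = '';
  let done = false;

  return {
    // Feed a decoded chunk of the response body; chunks may split lines anywhere.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // last element is an incomplete line (or '')
      for (const line of lines) {
        const trimmed = line.trim();
        if (done || !trimmed.startsWith('data:')) continue;
        const payload = trimmed.slice('data:'.length).trim();
        if (payload === '[DONE]') {
          done = true;
          continue;
        }
        const event = JSON.parse(payload);
        text += event.delta?.content ?? '';
      }
    },
    get text() { return text; },
    get done() { return done; },
  };
}

const collector = createStreamCollector();
collector.push('data: {"delta":{"content":"RE');
collector.push('ST"},"model":"google/gemini-3.5-flash","index":0}\n');
collector.push('data: [DONE]\n');
console.log(collector.text, collector.done); // prints REST true
```

In a real request, the chunks would typically come from reading `res.body` with `getReader()` and decoding each value with a `TextDecoder` using `{ stream: true }`, then passing the decoded string to `push`.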

Under the Hood

We handle auth, billing, and response normalization — you just send messages.