Gemini 3.5 Flash
Google's fast, cost-efficient multimodal model — strong reasoning and instruction following at low latency, with a 1M+ token context window.
POST /v1/ai/chat ($0.01/call)
Overview
| Property | Value |
|---|---|
| Model ID | google/gemini-3.5-flash |
| Context Window | 1,048,576 tokens |
| Max Output | 65,536 tokens |
| Input Price | $1.50 / 1M tokens |
| Output Price | $9.00 / 1M tokens |
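To make the per-token prices concrete, the following sketch estimates the token cost of a single call from a `usage` object shaped like the one in the Response section. The helper name `estimateCostUsd` is hypothetical, and the calculation assumes billing is linear in tokens at the rates in the table; it does not account for the per-call price listed next to the endpoint, since the page does not say how the two combine.

```javascript
// Prices copied from the table above, in USD per 1M tokens.
const INPUT_PRICE_PER_M = 1.5;
const OUTPUT_PRICE_PER_M = 9.0;

// Hypothetical helper (not part of the API): estimates token cost from a
// usage object. Assumes linear per-token billing and ignores any per-call fee.
function estimateCostUsd(usage) {
  const inputCost = (usage.promptTokens / 1e6) * INPUT_PRICE_PER_M;
  const outputCost = (usage.completionTokens / 1e6) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// Token counts taken from the sample response on this page:
// 16 * $1.50/1M + 245 * $9.00/1M
const sampleCost = estimateCostUsd({ promptTokens: 16, completionTokens: 245 });
console.log(sampleCost.toFixed(6)); // prints "0.002229"
```

Output tokens cost six times as much as input tokens here, so under these assumptions capping `maxTokens` tends to matter more for cost than trimming the prompt.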
Usage
const res = await fetch('https://api.yepapi.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-3.5-flash',
    messages: [{ role: 'user', content: 'Summarize the trade-offs between REST and GraphQL for a mobile backend.' }],
  }),
});
const { data } = await res.json();
console.log(data.message.content);

curl -X POST https://api.yepapi.com/v1/ai/chat \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "google/gemini-3.5-flash", "messages": [{"role": "user", "content": "Summarize the trade-offs between REST and GraphQL for a mobile backend."}]}'

Request Body
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| model | string | Yes | Model ID (e.g. google/gemini-3.5-flash) | - |
| messages | Message[] | Yes | Array of { role, content } objects | - |
| maxTokens | number | No | Maximum tokens in the response | Model default |
| temperature | number | No | Sampling temperature (0.0–2.0) | 1.0 |
| topP | number | No | Nucleus sampling threshold | 1.0 |
| frequencyPenalty | number | No | Penalize repeated tokens | 0 |
| presencePenalty | number | No | Penalize tokens already present | 0 |
| stream | boolean | No | Enable SSE streaming | false |
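The parameter table can be turned into a small client-side check before a request is sent. The sketch below is illustrative only: `buildChatRequest` is a hypothetical helper, not something the API provides, and it validates only what the table states (the two required fields and the 0.0–2.0 temperature range). It also assumes that leaving an optional field out of the body makes the server apply the documented default.

```javascript
// Optional parameters listed in the table above.
const OPTIONAL_FIELDS = [
  'maxTokens', 'temperature', 'topP',
  'frequencyPenalty', 'presencePenalty', 'stream',
];

// Hypothetical helper (not part of the API): checks the required fields and
// the documented temperature range, then returns a JSON-serializable body.
function buildChatRequest(model, messages, options = {}) {
  if (typeof model !== 'string' || model.length === 0) {
    throw new TypeError('model is required and must be a string');
  }
  if (!Array.isArray(messages)) {
    throw new TypeError('messages is required and must be an array');
  }
  for (const m of messages) {
    if (m == null || m.role === undefined || m.content === undefined) {
      throw new TypeError('each message needs role and content fields');
    }
  }
  const { temperature } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new RangeError('temperature must be between 0.0 and 2.0');
  }
  // Copy only the options that were set; anything omitted is left for the
  // server to default (an assumption based on the Default column).
  const body = { model, messages };
  for (const key of OPTIONAL_FIELDS) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}

const exampleBody = buildChatRequest(
  'google/gemini-3.5-flash',
  [{ role: 'user', content: 'Hello' }],
  { temperature: 0.2, maxTokens: 256 },
);
console.log(JSON.stringify(exampleBody));
```

The returned object can be passed straight to `JSON.stringify` as the `body` of the `fetch` call shown under Usage.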
Info
All AI models use the /v1/ai/chat endpoint. Specify the model with the model field.
Response
{
  "ok": true,
  "data": {
    "model": "google/gemini-3.5-flash",
    "message": {
      "role": "assistant",
      "content": "REST is simpler to cache and operate but tends to over- or under-fetch, forcing extra round trips on mobile. GraphQL lets clients request exactly the fields they need in one query, reducing payloads on flaky networks, at the cost of more complex caching, server resolvers, and rate limiting."
    },
    "usage": {
      "promptTokens": 16,
      "completionTokens": 245,
      "totalTokens": 261
    }
  }
}

Streaming
Set "stream": true to receive Server-Sent Events. Each chunk contains a delta object:
data: {"delta":{"content":"REST"},"model":"google/gemini-3.5-flash","index":0}
data: [DONE]

Under the Hood
We handle auth, billing, and response normalization — you just send messages.
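For the streaming format described in the Streaming section, a client needs to strip the `data:` prefix from each event, stop at the `[DONE]` sentinel, and concatenate the `delta.content` pieces. The sketch below shows that logic on an already-buffered string. `collectStreamText` is a hypothetical helper, the second event in the sample is made up for illustration, and the code assumes every event fits on a single `data:` line as in the sample on this page.

```javascript
// Hypothetical helper (not part of the API): folds the text of an SSE stream
// into the full reply. Assumes one `data: ...` line per event.
function collectStreamText(sseText) {
  let text = '';
  for (const rawLine of sseText.split('\n')) {
    const line = rawLine.trim();
    if (!line.startsWith('data:')) continue; // skip blank lines and other SSE fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.delta?.content ?? ''; // tolerate chunks without content
  }
  return text;
}

// First event copied from the Streaming section; the second is illustrative.
const sampleStream = [
  'data: {"delta":{"content":"REST"},"model":"google/gemini-3.5-flash","index":0}',
  '',
  'data: {"delta":{"content":" is simpler"},"model":"google/gemini-3.5-flash","index":0}',
  '',
  'data: [DONE]',
].join('\n');

console.log(collectStreamText(sampleStream)); // prints "REST is simpler"
```

A live response arrives in network chunks that can end in the middle of a line, so a real client would buffer incoming bytes and pass only complete lines to logic like this.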