GLM 5.1
Zhipu's latest flagship model. Strong reasoning and multilingual capabilities with a 203K context window.
POST /v1/ai/chat
$0.01/call
Overview
| Property | Value |
|---|---|
| Model ID | z-ai/glm-5.1 |
| Context Window | 203,000 tokens |
| Max Output | 8,192 tokens |
| Input Price | $1.26 / 1M tokens |
| Output Price | $3.96 / 1M tokens |
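The per-token prices in the table can be turned into a rough cost estimate for a single response. The sketch below is illustrative only: `estimateTokenCostUsd` is a hypothetical helper, not part of the API, and it assumes billing is linear in the `promptTokens` and `completionTokens` counts reported in the response. It does not account for the per-call price listed for the endpoint, since this page does not say how the two combine.

```javascript
// Per-token prices from the table above, in USD per 1M tokens.
const INPUT_PRICE_PER_M = 1.26;
const OUTPUT_PRICE_PER_M = 3.96;

// Hypothetical helper: estimates the token-based cost of one response
// from its `usage` object. It ignores any per-call fee (assumption).
function estimateTokenCostUsd(usage) {
  const inputCost = (usage.promptTokens / 1_000_000) * INPUT_PRICE_PER_M;
  const outputCost = (usage.completionTokens / 1_000_000) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// Usage figures taken from the sample response on this page.
const cost = estimateTokenCostUsd({ promptTokens: 14, completionTokens: 168 });
console.log(cost.toFixed(6)); // 0.000683
```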
Usage
const res = await fetch('https://api.yepapi.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'z-ai/glm-5.1',
    messages: [{ role: 'user', content: 'Explain the concept of attention mechanisms in neural networks.' }],
  }),
});
const { data } = await res.json();
console.log(data.message.content);

curl -X POST https://api.yepapi.com/v1/ai/chat \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "z-ai/glm-5.1", "messages": [{"role": "user", "content": "Explain the concept of attention mechanisms in neural networks."}]}'
Request Body
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| model | string | Yes | Model ID (e.g. z-ai/glm-5.1) | — |
| messages | Message[] | Yes | Array of { role, content } objects | — |
| maxTokens | number | No | Maximum tokens in the response | Model default |
| temperature | number | No | Sampling temperature (0.0–2.0) | 1.0 |
| topP | number | No | Nucleus sampling threshold | 1.0 |
| frequencyPenalty | number | No | Penalize repeated tokens | 0 |
| presencePenalty | number | No | Penalize tokens already present | 0 |
| stream | boolean | No | Enable SSE streaming | false |
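The parameters in the table can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `buildChatBody` and `OPTIONAL_FIELDS` are hypothetical names, and rejecting an empty `messages` array is an assumption. Only the temperature range (0.0–2.0) comes from the table. Options the caller leaves unset are omitted from the body so that the documented defaults apply.

```javascript
// Optional parameter names, as listed in the Request Body table.
const OPTIONAL_FIELDS = [
  'maxTokens', 'temperature', 'topP',
  'frequencyPenalty', 'presencePenalty', 'stream',
];

// Hypothetical helper: builds a request body for /v1/ai/chat, keeping only
// the options the caller actually set.
function buildChatBody(model, messages, options = {}) {
  if (typeof model !== 'string' || model.length === 0) {
    throw new TypeError('model must be a non-empty string');
  }
  // Assumption: an empty messages array is treated as a client-side error.
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError('messages must be a non-empty array');
  }
  const { temperature } = options;
  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new RangeError('temperature must be between 0.0 and 2.0');
  }
  const body = { model, messages };
  for (const field of OPTIONAL_FIELDS) {
    if (options[field] !== undefined) body[field] = options[field];
  }
  return body;
}

const body = buildChatBody(
  'z-ai/glm-5.1',
  [{ role: 'user', content: 'Summarize attention in one sentence.' }],
  { temperature: 0.2, maxTokens: 256 },
);
console.log(JSON.stringify(body));
```

The result can be passed to `JSON.stringify` as the `body` of the `fetch` call shown under Usage.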
Info
All AI models use the /v1/ai/chat endpoint. Specify the model with the model field.
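Because every model shares one endpoint, a single wrapper can serve all of them, with the model ID as an argument. The sketch below is one possible shape for such a wrapper. `chat` and `fetchImpl` are hypothetical names, and the treatment of failures is an assumption: this page only shows a successful response with `"ok": true`, so any other value is reported as a generic error without inspecting the payload.

```javascript
const CHAT_URL = 'https://api.yepapi.com/v1/ai/chat';

// Hypothetical helper: sends one chat request for any model ID and returns
// the assistant's text. `fetchImpl` is injectable so the helper can be
// exercised without network access.
async function chat(model, messages, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(CHAT_URL, {
    method: 'POST',
    headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages }),
  });
  const payload = await res.json();
  // Assumption: a failed call is signalled by `ok` not being true. The error
  // payload shape is not documented here, so it is not inspected.
  if (payload.ok !== true) {
    throw new Error(`Chat request for ${model} failed (HTTP ${res.status})`);
  }
  return payload.data.message.content;
}

// Example call (inside an async function); only the model ID changes
// between models:
// const text = await chat('z-ai/glm-5.1', [{ role: 'user', content: 'Hi' }], 'YOUR_API_KEY');
```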
Response
{
  "ok": true,
  "data": {
    "model": "z-ai/glm-5.1",
    "message": {
      "role": "assistant",
      "content": "Attention mechanisms allow neural networks to dynamically focus on the most relevant parts of an input sequence when producing each element of the output, computing weighted sums based on learned relevance scores."
    },
    "usage": {
      "promptTokens": 14,
      "completionTokens": 168,
      "totalTokens": 182
    }
  }
}
Streaming
Set "stream": true to receive Server-Sent Events. Each chunk contains a delta object:
data: {"delta":{"content":"Attention"},"model":"z-ai/glm-5.1","index":0}
data: [DONE]
Under the Hood
We handle auth, billing, and response normalization — you just send messages.
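When streaming is enabled, the client still has to reassemble the `delta` chunks described in the Streaming section. Here is a minimal sketch of that step, assuming each event arrives as a `data: ...` line terminated by a newline and that the stream ends with `data: [DONE]`, as in the sample above. `createStreamReader` is a hypothetical helper, not part of the API, and treating chunks without `delta.content` as empty text is an assumption.

```javascript
// Hypothetical helper: accumulates streamed text from raw SSE chunks.
// Network chunks can split a line in two, so incomplete lines are buffered.
function createStreamReader() {
  let buffer = '';
  let text = '';
  let done = false;

  function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (done || !trimmed.startsWith('data:')) continue;
      const payload = trimmed.slice('data:'.length).trim();
      if (payload === '[DONE]') {
        done = true;
        continue;
      }
      const event = JSON.parse(payload);
      // Assumption: chunks without delta.content carry no text.
      text += event.delta?.content ?? '';
    }
    return { text, done };
  }

  return { push };
}

// The first event is deliberately split across two chunks.
const reader = createStreamReader();
reader.push('data: {"delta":{"content":"Atten');
reader.push('tion"},"model":"z-ai/glm-5.1","index":0}\n\n');
const result = reader.push('data: [DONE]\n\n');
console.log(result); // { text: 'Attention', done: true }
```

In a real client, the strings passed to `push` would typically come from decoding the response body with a `TextDecoder` as it arrives.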