API Documentation
OpenAI-compatible API gateway to top Chinese AI models — DeepSeek, Qwen, GLM, MiniMax. Use your existing SDK, just change the base URL.
Quick Start
1. Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.tunanapi.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
2. cURL
curl https://api.tunanapi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
3. Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-YOUR_API_KEY',
  baseURL: 'https://api.tunanapi.com/v1'
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
Authentication
All requests require an API key in the Authorization header:
Authorization: Bearer sk-YOUR_API_KEY
To get a key, sign up with email at tunanapi.com, then copy the API key from the dashboard.
Store the key in an environment variable rather than hard-coding it:
export OPENAI_API_KEY="sk-..."
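Once the key is exported, application code can read it at startup instead of embedding it in source. The following is a minimal sketch under that assumption; the helper name client_settings is hypothetical and not part of any SDK.

```python
import os

BASE_URL = "https://api.tunanapi.com/v1"


def client_settings(env=os.environ):
    """Return keyword arguments for OpenAI(...), reading the key from the environment."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"api_key": api_key, "base_url": BASE_URL}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_settings())
```

Failing fast on a missing key tends to give a clearer error than a 401 from the first request.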
Base URL & SDK Configuration
| Environment | Variable | Value |
|---|---|---|
| OpenAI SDK | base_url / OPENAI_BASE_URL | https://api.tunanapi.com/v1 |
| LangChain | base_url | https://api.tunanapi.com/v1 |
| cURL / HTTP | — | https://api.tunanapi.com/v1/... |
Only two settings change: base_url and api_key. Everything else stays the same.
Models & Pricing
All prices per 1M tokens. Input = prompt tokens, Output = completion tokens. Billed per token, no minimums.
Flagship Models
| Model ID | Provider | Input | Output | Context | Best For |
|---|---|---|---|---|---|
| deepseek-chat | DeepSeek | $0.20 | $0.40 | 128K | General tasks, best value |
| deepseek-reasoner | DeepSeek | $2.50 | $5.00 | 128K | Complex reasoning, math, code |
| qwen3.7-max | Qwen | $1.80 | $5.40 | 128K | High-quality generation |
| minimax-m3 | MiniMax | $0.43 | $3.31 | 1M | Ultra-long context |
Fast & Affordable
| Model ID | Provider | Input | Output | Context |
|---|---|---|---|---|
| qwen3.5-flash | Qwen | $0.07 | $0.22 | 128K |
| glm-4-flash | GLM | $0.07 | $0.22 | 128K |
| qwen3.7-plus | Qwen | $0.58 | $1.74 | 128K |
| glm-4-plus | GLM | $1.80 | $5.40 | 128K |
Specialized Models
| Model ID | Provider | Input | Output | Context | Type |
|---|---|---|---|---|---|
| minimax-m2.5 | MiniMax | $0.22 | $1.65 | 197K | Long-context |
| minimax-m2.7 | MiniMax | $0.29 | $1.73 | 205K | Long-context |
| qwen-coder-plus | Qwen | $0.58 | $1.74 | 128K | Code generation |
| qwen-math-plus | Qwen | $0.58 | $1.74 | 128K | Math & science |
Not sure where to start? Try deepseek-chat: it's the best all-rounder and the lowest-priced flagship model. Use deepseek-reasoner for hard problems, qwen3.5-flash for speed, and minimax-m3 when you need a 1M-token context.
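Because prices are quoted per 1M tokens, the cost of a single request is easy to estimate from its token counts. The sketch below copies a few prices from the tables above; the PRICES dictionary and the estimate_cost helper are illustrative names, not part of the API.

```python
# Prices in USD per 1M tokens as (input, output), copied from the tables above.
PRICES = {
    "deepseek-chat": (0.20, 0.40),
    "deepseek-reasoner": (2.50, 5.00),
    "qwen3.5-flash": (0.07, 0.22),
    "minimax-m3": (0.43, 3.31),
}


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000
```

For example, estimate_cost("deepseek-chat", 8, 9) prices the small response shown in the Chat Completions section; the token counts come from the usage field of a response.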
Chat Completions
Create a model response for a conversation. Fully compatible with OpenAI's chat completions endpoint.
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID from the table above |
| messages | array | Yes | — | Array of message objects with role and content |
| temperature | number | No | 1 | 0–2. Higher = more creative, lower = more deterministic |
| max_tokens | integer | No | auto | Maximum tokens in the completion |
| stream | boolean | No | false | Stream partial results via SSE |
| top_p | number | No | 1 | Nucleus sampling threshold |
| stop | string/array | No | null | Stop sequences (max 4) |
| frequency_penalty | number | No | 0 | -2 to 2. Penalize repeated tokens |
| presence_penalty | number | No | 0 | -2 to 2. Encourage new topics |
| n | integer | No | 1 | Number of completions to generate |
| response_format | object | No | — | JSON mode: {"type": "json_object"} |
Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1718900000,
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  }
}
Example: Multi-turn conversation
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to reverse a linked list"},
    ]
)
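To continue a conversation past the first turn, each request carries the earlier messages along with the new one. This sketch assumes, as with OpenAI's endpoint, that the server keeps no conversation state between requests; add_turn is a hypothetical helper and the assistant text is a placeholder.

```python
def add_turn(history, user_text, assistant_text):
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history = [{"role": "system", "content": "You are a helpful coding assistant."}]
history = add_turn(
    history,
    "Write a Python function to reverse a linked list",
    "def reverse(head): ...",  # placeholder for the model's first reply
)

# The follow-up question is sent together with everything said so far.
messages = history + [{"role": "user", "content": "Now make it recursive"}]
# response = client.chat.completions.create(model="deepseek-chat", messages=messages)
```

Since the whole history is resent, prompt token counts grow with every turn, which is worth keeping in mind for long chats.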
Example: JSON mode
response = client.chat.completions.create(
    model="deepseek-chat",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "Return a JSON object with name, age, city"}]
)
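The reply in JSON mode still arrives as a string in message.content, so it needs to be parsed before use. The helper below is a small defensive sketch; parse_json_reply is a hypothetical name, and treating a non-object reply as a failure is a design choice made here, not documented gateway behavior.

```python
import json


def parse_json_reply(content):
    """Parse the model's reply; return None if it is not a valid JSON object."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


# Usage:
#   person = parse_json_reply(response.choices[0].message.content)
#   if person is None: handle the malformed reply (for example, retry the request)
```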
List Models
List all available models on TunanAPI.
Request
curl https://api.tunanapi.com/v1/models \
  -H "Authorization: Bearer sk-YOUR_API_KEY"
Response
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
    {"id": "qwen3.7-max", "object": "model", "owned_by": "qwen"},
    ...
  ]
}
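A listing in this shape can be filtered client-side, for instance to check that a model ID exists before sending a request. The sketch below works on the parsed JSON body; model_ids is a hypothetical helper and the sample data repeats the two entries shown above.

```python
def model_ids(listing, owner=None):
    """Return model IDs from a /v1/models response, optionally filtered by owned_by."""
    return [
        model["id"]
        for model in listing["data"]
        if owner is None or model.get("owned_by") == owner
    ]


sample = {
    "object": "list",
    "data": [
        {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
        {"id": "qwen3.7-max", "object": "model", "owned_by": "qwen"},
    ],
}
available = model_ids(sample)
```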
Embeddings
Generate text embeddings for semantic search, clustering, or classification.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | text-embedding-v3 (Qwen, 1024d) or embedding-3 (GLM, 2048d) |
| input | string/array | Yes | Text or array of texts to embed |
Example
response = client.embeddings.create(
    model="text-embedding-v3",
    input="Hello world"
)
print(response.data[0].embedding[:5])  # [0.0023, -0.0094, ...]
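For semantic search, embeddings are usually compared with cosine similarity. The function below is a dependency-free sketch of that comparison; in practice a numerical library would be faster, and returning 0.0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report no similarity
    return dot / (norm_a * norm_b)


# Usage with two embedding responses:
#   score = cosine_similarity(resp_a.data[0].embedding, resp_b.data[0].embedding)
```

Only compare vectors produced by the same model: the two embedding models listed above have different dimensions.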
Vision
Analyze images using vision models. Same endpoint — just add image content to messages.
Available Vision Models
| Model ID | Context | Best For |
|---|---|---|
| qwen-vl-max | 128K | High-quality image understanding |
| qwen-vl-plus | 128K | Fast & affordable vision |
| glm-4v | 128K | GLM vision model |
Example: Image URL
response = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }]
)
Example: Base64 image
response = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
        ]
    }]
)
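The data URL in the Base64 example can be built from raw image bytes with the standard library. This is a minimal sketch; to_data_url is a hypothetical helper, and the file name in the usage comment is a placeholder.

```python
import base64


def to_data_url(raw_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with a local file (photo.jpg is a placeholder path):
#   with open("photo.jpg", "rb") as f:
#       url = to_data_url(f.read())
#   content = [{"type": "image_url", "image_url": {"url": url}}]
```

Base64 inflates the payload by about a third, so a hosted image URL may be preferable for large files.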
Streaming
Set stream: true to receive partial results as Server-Sent Events (SSE). Essential for chat interfaces and long responses.
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a poem about the sea"}],
    stream=True
)
for chunk in stream:
    # Skip chunks that carry no choices or no text (for example, the final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Raw SSE format (cURL)
curl https://api.tunanapi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-chat","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Each chunk arrives as:
#   data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}
#   data: [DONE]
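When consuming the raw stream without an SDK, each data: line has to be decoded by hand. The parser below is a sketch based only on the chunk format shown above; iter_sse_content is a hypothetical name, and it takes any iterable of text lines so it can be fed from an HTTP client's line iterator.

```python
import json


def iter_sse_content(lines):
    """Yield content fragments from an iterable of raw SSE text lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_sse_content(sample_lines))
```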
Fallback & Routing
How it works
When you call a model like deepseek-chat, TunanAPI routes your request to the primary provider. If that provider returns a 5xx error or times out, the system automatically retries on a configured backup channel. You get:
- Higher uptime — one provider going down doesn't break your app
- Zero code changes — just call the model as usual
- Transparent billing — you're billed by the model that actually served the request
Backup channels are configured per model (for example, deepseek-chat has backup channels) to maximize uptime.
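The routing rule described above can be pictured as an ordered list of channels tried in turn. The code below is only an illustration of that idea, not TunanAPI's implementation: UpstreamError, route, and the channel callables are all hypothetical, and passing 4xx errors straight through is an assumption.

```python
class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status):
        super().__init__(f"upstream returned {status}")
        self.status = status


def route(request, channels):
    """Try each (name, send) channel in order; fall through on 5xx errors or timeouts."""
    last_error = None
    for name, send in channels:
        try:
            return name, send(request)
        except TimeoutError as exc:
            last_error = exc
        except UpstreamError as exc:
            if exc.status < 500:
                raise  # assumed: client errors are not retried on another channel
            last_error = exc
    raise RuntimeError("all channels failed") from last_error
```

Returning the channel name alongside the result mirrors the billing note above: the caller can see which channel actually served the request.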
Rate Limits
| Limit | Free Tier | Paid |
|---|---|---|
| Requests per minute (RPM) | 60 | 60 |
| Tokens per minute (TPM) | ~500K | ~500K |
| Max concurrent requests | 5 | 10 |
Rate limit headers
Every response includes headers to help you track your usage:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Your RPM limit |
| X-RateLimit-Remaining | Requests remaining in this window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
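A client can use these headers to pause before it runs into a 429. The sketch below assumes the header values arrive as strings in a dictionary with exactly the names in the table; seconds_until_reset is a hypothetical helper, and the now parameter exists only to make it testable.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0.0 if none is needed."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # there is still budget in the current window
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)


# Usage: time.sleep(seconds_until_reset(response_headers)) before the next call
```

A plain dict is case-sensitive, so normalize header names first if the HTTP library in use does not already do so.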
Error Codes
| Status | Type | Description | Action |
|---|---|---|---|
| 400 | Bad Request | Invalid parameters or malformed JSON | Check request body format |
| 401 | Unauthorized | Invalid or missing API key | Verify your API key |
| 402 | Insufficient Quota | Not enough credits | Top up at tunanapi.com |
| 404 | Not Found | Model not available or endpoint doesn't exist | Check model ID spelling |
| 429 | Rate Limited | Too many requests | Slow down, use exponential backoff |
| 500 | Server Error | Internal error | Retry after a moment |
| 503 | Unavailable | Upstream provider down | Retry; fallback may activate |
Error response format
{
  "error": {
    "message": "Insufficient quota. Please top up your account.",
    "type": "insufficient_quota",
    "code": 402
  }
}
Retry strategy
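The error table recommends retrying 429, 500, and 503 responses, with exponential backoff for rate limits. One way to do that is sketched below; call_with_retry and its send callable (returning a status code and body) are hypothetical, and the delays and attempt count are illustrative defaults rather than documented values.

```python
import random
import time

RETRYABLE = {429, 500, 503}  # statuses the error table marks as worth retrying


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
        sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

Errors such as 400, 401, 402, and 404 are returned immediately, since repeating the same request would not change the outcome. The OpenAI Python SDK also accepts a max_retries argument on the client, which may cover simple cases without custom code.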
Billing & Quota
How billing works
You're billed per token, per the pricing table above. There are no subscriptions or minimums — pay only for what you use.
Check your balance
# Using the API
curl https://api.tunanapi.com/api/user/self \
  -H "Authorization: Bearer sk-YOUR_API_KEY"

# Response includes:
# {"data": {"quota": 500000, "used_quota": 12345, ...}}
Quota is reported in units: 1,000,000 units = $2, so 1 unit = $0.000002 (the cost of one token at $0.002 per 1K tokens).
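Converting the reported units to dollars is a single division. The sketch below derives the rate from the stated equivalence of 1,000,000 units = $2; the constant and function names are illustrative.

```python
UNITS_PER_DOLLAR = 500_000  # derived from 1,000,000 units = $2


def quota_to_usd(units):
    """Convert quota units from the balance endpoint into US dollars."""
    return units / UNITS_PER_DOLLAR


# With the sample response above: quota 500000 and used_quota 12345
balance_usd = quota_to_usd(500000)
used_usd = quota_to_usd(12345)
```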
Top up
Visit tunanapi.com dashboard → Top Up. We accept credit cards and PayPal.
Pricing Tiers
| Tier | Price | Credits | Bonus |
|---|---|---|---|
| Starter | $5 | $5.00 | — |
| Growth | $20 | $21.00 | +5% |
| Business | $50 | $55.00 | +10% |
| Enterprise | $100 | $115.00 | +15% |
Integrations
LangChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",
    api_key="sk-YOUR_API_KEY",
    base_url="https://api.tunanapi.com/v1"
)
response = llm.invoke("Explain quantum computing in one paragraph")
LlamaIndex
from llama_index.llms.openai import OpenAI as LlamaOpenAI

llm = LlamaOpenAI(
    model="deepseek-chat",
    api_key="sk-YOUR_API_KEY",
    api_base="https://api.tunanapi.com/v1"
)
CrewAI
import os

from crewai import Agent, Task, Crew

os.environ["OPENAI_API_KEY"] = "sk-YOUR_API_KEY"
os.environ["OPENAI_API_BASE"] = "https://api.tunanapi.com/v1"

agent = Agent(role="Researcher", goal="Find insights", backstory="Expert researcher")
AutoGen
import autogen

config_list = [{
    "model": "deepseek-chat",
    "api_key": "sk-YOUR_API_KEY",
    "base_url": "https://api.tunanapi.com/v1"
}]
assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
Migration from OpenAI
Change only base_url and api_key. All other code stays the same.
from openai import OpenAI

# Before (OpenAI)
client = OpenAI(api_key="sk-openai-...")  # uses api.openai.com

# After (TunanAPI)
client = OpenAI(
    api_key="sk-YOUR_TUNANAPI_KEY",
    base_url="https://api.tunanapi.com/v1"
)
# Everything else is identical!
Model mapping
| OpenAI Model | TunanAPI Equivalent | Savings |
|---|---|---|
| gpt-4o | deepseek-chat | ~97% |
| gpt-4o-mini | qwen3.5-flash | ~90% |
| o1 | deepseek-reasoner | ~95% |
| gpt-4-turbo | qwen3.7-max | ~93% |
Changelog
2026-06-14
- Launched comprehensive API documentation
- Added Fallback & Routing documentation
- Added Vision API and Embeddings documentation
- Added integration guides (LangChain, LlamaIndex, CrewAI, AutoGen)
2026-06-09
- Pricing V2.1 launched: blended gross-margin strategy, 8 models online
- MiniMax M3 (1M context) added
2026-06-07
- I Ching Oracle product launch at oracle.tunanapi.com
- Terms of Service & Privacy Policy published