FAQ
The questions users ask most often. If yours isn't covered, please reach out via the official group.
Billing & balance
How does pricing work?
You're charged per actual token usage at the upstream rate, with a discount applied — no subscription or monthly fee. Every call's input/output tokens, model, and actual cost are visible in Console → Logs.
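As a rough illustration of per-token billing, the sketch below estimates the cost of one call. The rates, the discount value, and the function name are hypothetical, and it assumes the discount is a simple multiplier on the upstream price; the authoritative figure is always the one shown in Console → Logs.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_m, output_rate_per_m, discount=1.0):
    """Estimate the cost of a single call.

    Rates are prices per one million tokens (hypothetical values, not
    the service's real price list). `discount` is a multiplier, so 0.8
    means 20% off the upstream price (an assumption about how the
    discount is applied).
    """
    upstream_cost = (input_tokens * input_rate_per_m
                     + output_tokens * output_rate_per_m) / 1_000_000
    return upstream_cost * discount


# Example with made-up rates: 1,000 input and 500 output tokens.
cost = estimate_cost(1_000, 500, input_rate_per_m=3.0,
                     output_rate_per_m=15.0, discount=0.8)
```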
How do I check my balance?
Your remaining balance is shown at the top of the console home page (in USD-equivalent units).
Do you provide invoices?
Not at the moment. If you need one, please contact us via the group.
Are top-ups refundable?
Refunds are not supported except for clear service failures — please top up only what you need.
API Key & tokens
Can I create multiple tokens?
Yes. We recommend creating separate tokens per app or environment (e.g. app-prod, app-dev) so you can monitor usage independently and isolate risk.
How should I set the token quota?
A quota of 0 means unlimited (still bounded by account balance). If you're worried about a buggy script burning through your balance, set a per-token cap.
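The interaction between token quota and account balance can be sketched as follows. This is only an illustration of the rule described above; the function name is hypothetical, and it assumes the quota and the balance are expressed in the same units.

```python
def effective_limit(token_quota: float, account_balance: float) -> float:
    """Return how much a token can still spend.

    A quota of 0 means "no per-token cap", so only the account balance
    limits spending. Otherwise the smaller of the two values applies.
    Assumes both values use the same units.
    """
    if token_quota == 0:
        return account_balance
    return min(token_quota, account_balance)
```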
What if a token is leaked?
Delete the token immediately in Console → Tokens, then create a new one. Your balance stays in the account.
Picking a model
Which model should I pick?
- Daily chat / coding assistance: deepseek-chat, claude-haiku-4-5 (best cost-performance)
- Complex reasoning / long documents: claude-sonnet-4-6, claude-opus-4-7, deepseek-reasoner
- Math / chain-of-thought reasoning: deepseek-reasoner, claude-opus-4-7 (strong reasoning)
- Multimodal (images): claude-sonnet-4-6, claude-opus-4-7
What's each model's context window?
We follow each provider's official limits (Claude 200K, DeepSeek 64K, etc.). See the Model list for details.
Performance & rate limits
Are there concurrency limits?
The default is 100 concurrent requests per token; account-level limits are scheduled dynamically based on upstream quotas. Contact support if you need a higher cap.
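One way to stay under the per-token concurrency limit on the client side is a semaphore. The sketch below uses Python's asyncio; `run_bounded` is a hypothetical helper name, and the request functions are supplied by the caller rather than taken from any SDK.

```python
import asyncio

MAX_IN_FLIGHT = 100  # default per-token concurrency limit stated above


async def run_bounded(factories, limit=MAX_IN_FLIGHT):
    """Run async request factories with at most `limit` in flight.

    `factories` is a list of zero-argument callables that each return
    a coroutine (for example, a function that sends one API request).
    Results are returned in the same order as `factories`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))
```

Choosing a `limit` somewhat below 100 leaves headroom for other processes that share the same token.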
What about occasional timeouts?
- Set the client HTTP timeout to at least 60s (180s for reasoning models)
- Add automatic retries with exponential backoff for idempotent operations (max 3 attempts)
- For non-critical paths, fall back across models (e.g. retry on claude-haiku-4-5 when claude-sonnet-4-6 fails)
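The three recommendations above can be combined in one small helper. This is a minimal sketch, not an official client: `send(model, timeout)` is a hypothetical caller-supplied function that performs the request and raises `TimeoutError` on timeout, and treating only deepseek-reasoner as a reasoning model is an assumption. Use it only for idempotent operations.

```python
import time

# Assumption: which models need the longer 180s timeout.
REASONING_MODELS = {"deepseek-reasoner"}


def timeout_for(model):
    """Pick the HTTP timeout in seconds: 180 for reasoning models, else 60."""
    return 180 if model in REASONING_MODELS else 60


def call_with_retry(send, models, max_attempts=3, base_delay=1.0,
                    sleep=time.sleep):
    """Try each model in order, retrying timeouts with exponential backoff.

    Each model gets up to `max_attempts` tries. The wait before retry
    number n is base_delay * 2**(n - 1) seconds: 1s, 2s, ... If every
    attempt on a model times out, the next model in `models` is tried.
    """
    if not models:
        raise ValueError("at least one model is required")
    last_error = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return send(model, timeout_for(model))
            except TimeoutError as exc:
                last_error = exc
                # No sleep after the final attempt on this model.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `["claude-sonnet-4-6", "claude-haiku-4-5"]` as `models` gives the fallback order from the example above.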
Is it slower than going to the upstream provider directly?
Our hosting is in mainland China, so from inside China access is usually faster than calling the upstream directly (which has to route around the Great Firewall). Access from outside China may be slightly slower than direct.
Common developer errors
401 invalid_api_key
Check: 1) the Authorization header spelling; 2) whether the token has been disabled or deleted; 3) whether the header includes the Bearer prefix.
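To rule out a header formatting mistake, it can help to build the header in one place. The helper below is a hypothetical sketch; `sk-example` is a placeholder, not a real token format.

```python
def auth_headers(token: str) -> dict:
    """Build the Authorization header with exactly one Bearer prefix.

    Strips stray whitespace and tolerates a token that was pasted with
    the "Bearer " prefix already attached.
    """
    token = token.strip()
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    if not token:
        raise ValueError("API token is empty")
    return {"Authorization": f"Bearer {token}"}


headers = auth_headers("sk-example")  # placeholder token
```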
404 model_not_found
Model name typo, or the model has been retired. Check the latest available names at main site → Models.
402 insufficient_quota
Account balance or token quota exhausted. Top up the account, or raise the token's cap in its settings.
429 rate_limit_exceeded
Rate limit hit. Lower concurrency or add exponential-backoff retries on the client side.
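The four errors above call for different reactions, and only the rate-limit error is worth retrying automatically. A client might encode that as a small lookup; the action names here are hypothetical labels chosen for illustration.

```python
# Map each documented HTTP status to a suggested client-side action.
ACTIONS = {
    401: "check_credentials",    # invalid_api_key
    402: "top_up_or_raise_cap",  # insufficient_quota
    404: "fix_model_name",       # model_not_found
    429: "back_off_and_retry",   # rate_limit_exceeded
}


def next_step(status_code: int) -> str:
    """Return the suggested action for a status code."""
    return ACTIONS.get(status_code, "inspect_response")


def is_retryable(status_code: int) -> bool:
    """Of the errors listed above, only 429 is fixed by waiting and retrying."""
    return status_code == 429
```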
Other
Do you provide Embedding / TTS / Whisper?
Not yet — this is on the roadmap. The OpenAI-family Embedding (text-embedding-3-*), TTS (tts-1), Whisper, etc. will be exposed once the OpenAI channel goes live, with the same calling convention as upstream OpenAI. If you have a strong need, please raise it in the group so we can prioritize.
Do you support image generation (DALL-E / Midjourney)?
Not yet — this is on the roadmap. We plan to enable dall-e-3, gpt-image-1, doubao-seedream-4-0 and similar image-generation models; Midjourney has no firm timeline because it requires a special integration. Please don't depend on image generation in production right now.
Are requests logged or used for training?
This site logs only metadata (timestamp, model, token usage); we do not store message content. Each upstream provider's own privacy policy applies to the data it handles.