API Documentation
Complete reference for the RooyaLLM API. OpenAI-compatible interface that works with every major AI provider.
https://internal-api.automationss.online

Overview
RooyaLLM provides a unified, OpenAI-compatible API that routes requests to 100+ AI models from all major providers (OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek, and more).
| Feature | Description |
|---|---|
| Unified API | Single endpoint for every LLM — OpenAI-compatible interface |
| Streaming | Full SSE streaming support with real-time token delivery |
| Cost Tracking | Automatic per-request cost calculation and budget enforcement |
| Rate Limiting | Per-key RPM + TPD limits with Redis-backed sliding windows |
| 3-Tier Auth Cache | In-memory → Redis → Database key validation (<1μs hot path) |
| Model Health | Auto-synced model catalog with live health monitoring |
Authentication
All API requests (except /v1/models and /health) require a valid API key via the Authorization header using the Bearer scheme.
Authorization: Bearer sk-your-api-key

Obtaining an API Key
- Sign up at the registration page
- Navigate to Dashboard → API Keys
- Click Create Key, set a name and optional budget limit
- Copy the key immediately — it is shown only once
Key Format
sk-<random-alphanumeric-string>

Quick Start
The fastest way to get started — just point your OpenAI SDK to your gateway URL and set your API key:
from openai import OpenAI
client = OpenAI(
base_url="https://internal-api.automationss.online/v1",
api_key="sk-your-api-key"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

That's it. Every OpenAI SDK feature works — streaming, function calling, vision, and more.
Chat Completions
Creates a chat completion for a conversation. This is the primary endpoint for interacting with LLMs.
/v1/chat/completions

Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Model ID (e.g., gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) |
| messages | array | ✓ | — | Array of message objects with role and content |
| temperature | number | — | 1.0 | Sampling temperature between 0 and 2 |
| top_p | number | — | 1.0 | Nucleus sampling between 0 and 1 |
| n | integer | — | 1 | Number of completions to generate (1–128) |
| stream | boolean | — | false | Enable Server-Sent Events streaming |
| stop | string / array | — | null | Up to 4 stop sequences |
| max_tokens | integer | — | — | Maximum tokens to generate |
| presence_penalty | number | — | 0 | Penalty for new topics (−2 to 2) |
| frequency_penalty | number | — | 0 | Penalty for repetition (−2 to 2) |
| user | string | — | — | Unique end-user identifier |
Example Request
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 500
}'

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1714000000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses qubits instead of traditional bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 150,
"total_tokens": 178
}
}

Completions (Legacy)
Creates a text completion. This is the legacy completions endpoint for non-chat models.
/v1/completions

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Model ID |
| prompt | string / array | ✓ | — | The prompt(s) to complete |
| suffix | string | — | null | Text after the completion |
| max_tokens | integer | — | 16 | Maximum tokens to generate |
| temperature | number | — | 1.0 | Sampling temperature (0–2) |
| top_p | number | — | 1.0 | Nucleus sampling (0–1) |
| n | integer | — | 1 | Number of completions (1–128) |
| stream | boolean | — | false | Enable SSE streaming |
| stop | string / array | — | null | Up to 4 stop sequences |
curl https://internal-api.automationss.online/v1/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Write a haiku about programming:",
"max_tokens": 50,
"temperature": 0.8
}'

Embeddings
Generate vector embeddings for text input.
/v1/embeddings

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Embedding model (e.g., text-embedding-3-small) |
| input | string / array | ✓ | — | Text(s) to embed |
| encoding_format | string | — | float | float or base64 |
| user | string | — | — | End-user identifier |
curl https://internal-api.automationss.online/v1/embeddings \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}'

{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, ...]
}
],
"model": "text-embedding-3-small",
"usage": { "prompt_tokens": 9, "total_tokens": 9 }
}

List Models
Returns a list of all available models. No authentication required.
/v1/models

{
"object": "list",
"data": [
{ "id": "gpt-4o", "object": "model", "created": 1714000000, "owned_by": "openai" },
{ "id": "claude-3.5-sonnet", "object": "model", "created": 1714000000, "owned_by": "anthropic" }
]
}

Get Model
Retrieve details for a specific model. No authentication required.
/v1/models/{model_id}

| Parameter | Description |
|---|---|
| model_id | The ID of the model (e.g., gpt-4o) |
{
"id": "gpt-4o",
"object": "model",
"created": 1714000000,
"owned_by": "openai"
}

Model Health Status
Returns the full model catalog with live health status, availability, and latency data. No authentication required.
/v1/models/status

{
"object": "list",
"data": [
{
"id": "uuid",
"model_id": "gpt-4o",
"provider": "openai",
"model_type": "chat",
"is_available": true,
"health_status": "healthy",
"last_health_latency_ms": 245
}
],
"total": 150, "healthy": 142, "down": 3, "unknown": 5
}

Health Check
Server health endpoint for monitoring and load balancers.
/health

{
"status": "ok",
"timestamp": "2026-04-25T02:00:00.000Z",
"uptime": 86400,
"memory": { "rss": 128, "heap": 64 }
}

Streaming (SSE)
Set "stream": true to receive responses as Server-Sent Events in real time.
How It Works
- The server responds with Content-Type: text/event-stream
- Each chunk is a data: line containing a JSON delta
- The stream terminates with data: [DONE]
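
Python Example

The OpenAI Python SDK parses the SSE chunks for you. A minimal sketch, using the same gateway base URL and placeholder key as in Quick Start:

from openai import OpenAI

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

# With stream=True the SDK returns an iterator of chat.completion.chunk objects
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta; role-only and final chunks have no content
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)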
cURL Example
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Count to 5"}],
"stream": true
}'

Rate Limiting
The gateway enforces rate limits at two levels:
1. IP-Based Rate Limit
120 requests per minute per IP address. Applies to all requests regardless of authentication.
2. Per-Key Rate Limits
Each API key has configurable limits set at creation time:
| Limit | Free | Pro | Enterprise |
|---|---|---|---|
| Requests per minute (RPM) | 60 | 300 | 10,000 |
| Tokens per day (TPD) | 100,000 | 1,000,000 | 100,000,000 |
Rate Limit Headers
X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 55
X-RateLimit-Limit-TPD: 100000
X-RateLimit-Remaining-TPD: 98500

When Rate Limited
{
"error": {
"message": "Rate limit exceeded",
"type": "rate_limit_error"
}
}

HTTP Status: 429 Too Many Requests · Header: Retry-After: <seconds>
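
A minimal sketch of honoring Retry-After with the requests library (the endpoint, key placeholder, and payload shape follow the examples above; the helper name is illustrative):

import time
import requests

def post_with_retry_after(payload, max_retries=3):
    url = "https://internal-api.automationss.online/v1/chat/completions"
    headers = {"Authorization": "Bearer sk-your-api-key"}
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Wait as long as the gateway asks; fall back to 1 second if the header is missing
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    raise RuntimeError("Still rate limited after retries")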
Budget Controls
API keys can have an optional budget limit to cap spending.
- Set during key creation in the dashboard
- Enforced in real time via Redis
- When exceeded, requests return 402 Payment Required
{
"error": {
"message": "Budget limit exceeded for this API key",
"type": "budget_exceeded"
}
}

Error Handling
All errors follow a consistent format:
{
"error": {
"message": "Human-readable error description",
"type": "error_type"
}
}

Error Types & HTTP Status Codes
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body or invalid parameters |
| 401 | invalid_request_error | Missing, invalid, inactive, or expired API key |
| 402 | budget_exceeded | API key budget limit reached |
| 429 | rate_limit_error | RPM/TPD or IP rate limit exceeded |
| 500 | server_error | Internal server error |
| 504 | timeout_error | Request timed out (30s default, 300s for streams) |
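
With the official Python SDK these statuses surface as typed exceptions: 429 raises RateLimitError, 401 raises AuthenticationError, and other non-2xx responses (including the 402 budget error) raise an APIStatusError carrying status_code. A minimal sketch of distinguishing them:

from openai import OpenAI, APIStatusError, AuthenticationError, RateLimitError

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except AuthenticationError as e:
    # 401: missing, invalid, inactive, or expired key
    print("Check your API key:", e.message)
except RateLimitError as e:
    # 429: RPM/TPD or IP limit hit; back off and retry
    print("Rate limited:", e.message)
except APIStatusError as e:
    # Any other non-2xx status, e.g. 402 budget_exceeded
    print(f"HTTP {e.status_code}: {e.message}")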
SDK Integration
Python
pip install openai

from openai import OpenAI
client = OpenAI(
base_url="https://internal-api.automationss.online/v1",
api_key="sk-your-api-key"
)
# Chat completion
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
# Embeddings
embedding = client.embeddings.create(
model="text-embedding-3-small",
input="Hello world"
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")JavaScript / TypeScript
npm install openai

import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://internal-api.automationss.online/v1',
apiKey: 'sk-your-api-key',
});
// Chat completion
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Explain REST APIs' }],
max_tokens: 500,
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

cURL
# Chat completion
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# List models (no auth required)
curl https://internal-api.automationss.online/v1/models
# Embeddings
curl https://internal-api.automationss.online/v1/embeddings \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "Hello world"
}'

Best Practices
1. Use Streaming for Long Responses
Streaming provides a better UX and avoids timeouts for long completions. The stream timeout is 300 seconds vs 30 seconds for non-streaming.
2. Set Budget Limits
Protect against runaway costs by setting a budget limit on each API key in the dashboard.
3. Handle Rate Limits Gracefully
Implement exponential backoff when you receive 429 responses:
import time
from openai import RateLimitError

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            wait = 2 ** attempt
            time.sleep(wait)
    raise Exception("Max retries exceeded")

4. Monitor Your Usage
Check the Dashboard for real-time cost tracking, per-model breakdowns, and request volume charts.
5. Rotate Keys Periodically
Create new keys and deactivate old ones regularly. Manage all keys from the dashboard without downtime.
6. Check Model Health Before Calling
Use GET /v1/models/status to check model availability before making requests.
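
A minimal sketch of that pre-flight check with the requests library, based on the response schema shown in the Model Health Status section:

import requests

# No authentication is required for /v1/models/status
resp = requests.get("https://internal-api.automationss.online/v1/models/status")
resp.raise_for_status()

# Collect the model IDs currently reported as healthy
healthy = {m["model_id"] for m in resp.json()["data"] if m["health_status"] == "healthy"}

model = "gpt-4o"
if model in healthy:
    print(f"{model} is healthy, sending the request")
else:
    print(f"{model} is degraded or down, falling back to another model")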
Supported Models
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-pro |
| Meta | llama-3.1-70b, llama-3.1-8b |
| Mistral | mistral-large, mistral-medium, mistral-small |
| DeepSeek | deepseek-chat, deepseek-coder |
The full, up-to-date catalog is available via GET /v1/models.

Timeouts

| Request Type | Timeout |
|---|---|
| Non-streaming requests | 30 seconds |
| Streaming requests | 300 seconds (5 minutes) |
Response Headers

| Header | Description |
|---|---|
| X-RateLimit-Limit-RPM | Your RPM limit for this key |
| X-RateLimit-Remaining-RPM | Remaining requests this minute |
| X-RateLimit-Limit-TPD | Your TPD limit for this key |
| X-RateLimit-Remaining-TPD | Remaining tokens today |
| X-Request-ID | Unique request identifier for debugging |
| Retry-After | Seconds to wait (on 429 responses) |
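
These headers are easiest to inspect with a plain HTTP client. A minimal sketch using the requests library (same endpoint and key placeholder as the examples above):

import requests

resp = requests.post(
    "https://internal-api.automationss.online/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]},
)

# Log the request ID alongside the remaining quota for this key
print("Request ID:", resp.headers.get("X-Request-ID"))
print("RPM remaining:", resp.headers.get("X-RateLimit-Remaining-RPM"))
print("Tokens left today:", resp.headers.get("X-RateLimit-Remaining-TPD"))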
RooyaLLM — One API for Every AI Model