API Documentation

Complete reference for the RooyaLLM API. OpenAI-compatible interface that works with every major AI provider.

v1 · HTTPS · JSON
Base URL: https://internal-api.automationss.online

Overview

RooyaLLM provides a unified, OpenAI-compatible API that routes requests to 100+ AI models from all major providers (OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek, and more).

| Feature | Description |
| --- | --- |
| Unified API | Single endpoint for every LLM — OpenAI-compatible interface |
| Streaming | Full SSE streaming support with real-time token delivery |
| Cost Tracking | Automatic per-request cost calculation and budget enforcement |
| Rate Limiting | Per-key RPM + TPD limits with Redis-backed sliding windows |
| 3-Tier Auth Cache | In-memory → Redis → Database key validation (<1 μs hot path) |
| Model Health | Auto-synced model catalog with live health monitoring |
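The 3-tier key-validation lookup can be pictured with a short sketch. This in-memory version is illustrative only; the `redis_get` and `db_get` callables stand in for the gateway's real Redis and database tiers and are not part of its actual code:

```python
import time

class TieredKeyCache:
    """Sketch of a 3-tier lookup: process memory -> Redis -> database."""

    def __init__(self, redis_get, db_get, ttl_seconds=60):
        self._local = {}              # tier 1: in-memory, per-process
        self._redis_get = redis_get   # tier 2: shared cache (stand-in)
        self._db_get = db_get         # tier 3: source of truth (stand-in)
        self._ttl = ttl_seconds

    def lookup(self, key_hash):
        entry = self._local.get(key_hash)
        if entry is not None and entry[1] > time.time():
            return entry[0]           # hot path: no network round-trip
        value = self._redis_get(key_hash)
        if value is None:
            value = self._db_get(key_hash)
        if value is not None:
            self._local[key_hash] = (value, time.time() + self._ttl)
        return value
```

Repeated lookups for the same key hash are served from process memory until the TTL expires, which is what keeps the hot path sub-microsecond.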

Authentication

All API requests require a valid API key sent in the Authorization header using the Bearer scheme, except the read-only model endpoints (/v1/models, /v1/models/{model_id}, /v1/models/status) and /health.

Authorization: Bearer sk-your-api-key

Obtaining an API Key

  1. Sign up at the registration page
  2. Navigate to Dashboard → API Keys
  3. Click Create Key, set a name and optional budget limit
  4. Copy the key immediately — it is shown only once
⚠️ Security: API keys are hashed with SHA-256 before storage; the plaintext key cannot be recovered, so store it securely.
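For illustration, SHA-256 hashing of a key looks like this (a sketch of the concept, not the gateway's actual storage code):

```python
import hashlib

def hash_api_key(plaintext_key: str) -> str:
    """SHA-256 hex digest of an API key; only this digest would be stored."""
    return hashlib.sha256(plaintext_key.encode("utf-8")).hexdigest()

# The digest is one-way: 64 hex characters, with no way back to the key.
print(len(hash_api_key("sk-example-key")))  # 64
```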

Key Format

sk-<random-alphanumeric-string>

Quick Start

The fastest way to get started — just point your OpenAI SDK to your gateway URL and set your API key:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

That's it. Every OpenAI SDK feature works — streaming, function calling, vision, and more.

Chat Completions

Creates a chat completion for a conversation. This is the primary endpoint for interacting with LLMs.

POST /v1/chat/completions

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | Model ID (e.g., gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) |
| messages | array | Yes | | Array of message objects with role and content |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2 |
| top_p | number | No | 1.0 | Nucleus sampling between 0 and 1 |
| n | integer | No | 1 | Number of completions to generate (1–128) |
| stream | boolean | No | false | Enable Server-Sent Events streaming |
| stop | string \| array | No | null | Up to 4 stop sequences |
| max_tokens | integer | No | | Maximum tokens to generate |
| presence_penalty | number | No | 0 | Penalty for new topics (−2 to 2) |
| frequency_penalty | number | No | 0 | Penalty for repetition (−2 to 2) |
| user | string | No | | Unique end-user identifier |

Example Request

```bash
curl https://internal-api.automationss.online/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```
Example Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits instead of traditional bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 150,
    "total_tokens": 178
  }
}
```
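The usage block at the end of the response is the basis for cost tracking; a small helper to read it (illustrative, not part of any SDK):

```python
def summarize_usage(response: dict) -> str:
    """Format the usage block of a chat completion response."""
    u = response["usage"]
    return f"{u['prompt_tokens']} in + {u['completion_tokens']} out = {u['total_tokens']} total"

print(summarize_usage({"usage": {"prompt_tokens": 28, "completion_tokens": 150, "total_tokens": 178}}))
# 28 in + 150 out = 178 total
```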

Completions (Legacy)

Creates a text completion. This is the legacy completions endpoint for non-chat models.

POST /v1/completions

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | Model ID |
| prompt | string \| array | Yes | | The prompt(s) to complete |
| suffix | string | No | null | Text after the completion |
| max_tokens | integer | No | 16 | Maximum tokens to generate |
| temperature | number | No | 1.0 | Sampling temperature (0–2) |
| top_p | number | No | 1.0 | Nucleus sampling (0–1) |
| n | integer | No | 1 | Number of completions (1–128) |
| stream | boolean | No | false | Enable SSE streaming |
| stop | string \| array | No | null | Up to 4 stop sequences |
```bash
curl https://internal-api.automationss.online/v1/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Write a haiku about programming:",
    "max_tokens": 50,
    "temperature": 0.8
  }'
```

Embeddings

Generate vector embeddings for text input.

POST /v1/embeddings

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | Embedding model (e.g., text-embedding-3-small) |
| input | string \| array | Yes | | Text(s) to embed |
| encoding_format | string | No | float | float or base64 |
| user | string | No | | End-user identifier |
```bash
curl https://internal-api.automationss.online/v1/embeddings \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}
```
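Embedding vectors are typically compared with cosine similarity; a minimal helper (illustrative, not part of the API):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors (the data[].embedding arrays above)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Values near 1.0 mean the embedded texts are semantically close; near 0.0 means unrelated.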

List Models

Returns a list of all available models. No authentication required.

GET /v1/models

```json
{
  "object": "list",
  "data": [
    { "id": "gpt-4o", "object": "model", "created": 1714000000, "owned_by": "openai" },
    { "id": "claude-3.5-sonnet", "object": "model", "created": 1714000000, "owned_by": "anthropic" }
  ]
}
```

Get Model

Retrieve details for a specific model. No authentication required.

GET /v1/models/{model_id}

| Parameter | Description |
| --- | --- |
| model_id | The ID of the model (e.g., gpt-4o) |

```json
{
  "id": "gpt-4o",
  "object": "model",
  "created": 1714000000,
  "owned_by": "openai"
}
```

Model Health Status

Returns the full model catalog with live health status, availability, and latency data. No authentication required.

GET /v1/models/status

```json
{
  "object": "list",
  "data": [
    {
      "id": "uuid",
      "model_id": "gpt-4o",
      "provider": "openai",
      "model_type": "chat",
      "is_available": true,
      "health_status": "healthy",
      "last_health_latency_ms": 245
    }
  ],
  "total": 150, "healthy": 142, "down": 3, "unknown": 5
}
```

Health Check

Server health endpoint for monitoring and load balancers.

GET /health

```json
{
  "status": "ok",
  "timestamp": "2026-04-25T02:00:00.000Z",
  "uptime": 86400,
  "memory": { "rss": 128, "heap": 64 }
}
```

Streaming (SSE)

Set "stream": true to receive responses as Server-Sent Events in real time.

How It Works

  1. The server responds with Content-Type: text/event-stream
  2. Each chunk is a data: line containing a JSON delta
  3. The stream terminates with data: [DONE]
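The chunk format above can be consumed without an SDK; a minimal sketch that parses `data:` lines (the `choices[0].delta` shape follows the OpenAI streaming format):

```python
import json

def iter_deltas(sse_lines):
    """Yield text deltas from the 'data:' lines of an SSE stream."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # ignore blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print("".join(iter_deltas(sample)))  # Hello
```

In practice the OpenAI SDKs do this parsing for you (see the SDK Integration section); the sketch is only to show what is on the wire.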

cURL Example

```bash
curl https://internal-api.automationss.online/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

Rate Limiting

The gateway enforces rate limits at two levels:

1. IP-Based Rate Limit

120 requests per minute per IP address. Applies to all requests regardless of authentication.

2. Per-Key Rate Limits

Each API key has configurable limits set at creation time:

| Limit | Free | Pro | Enterprise |
| --- | --- | --- | --- |
| Requests per minute (RPM) | 60 | 300 | 10,000 |
| Tokens per day (TPD) | 100,000 | 1,000,000 | 100,000,000 |
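A sliding-window RPM check like the one described above can be sketched in memory (the gateway's version is Redis-backed so it works across processes; this class only shows the algorithm, and its name is illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """In-memory sliding-window request limiter."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # api key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and q[0] <= now - self.window:
            q.popleft()               # drop hits that slid out of the window
        if len(q) >= self.limit:
            return False              # over the limit; caller returns 429
        q.append(now)
        return True
```

Unlike a fixed-minute counter, the window slides continuously, so a burst at the end of one minute cannot combine with a burst at the start of the next.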

Rate Limit Headers

```
X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 55
X-RateLimit-Limit-TPD: 100000
X-RateLimit-Remaining-TPD: 98500
```

When Rate Limited

```json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error"
  }
}
```

HTTP Status: 429 Too Many Requests · Header: Retry-After: <seconds>
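A client can combine the Retry-After header with exponential backoff; a sketch (the helper name is illustrative):

```python
def retry_delay(headers, attempt):
    """Wait time after a 429: honor Retry-After when present, else back off exponentially."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return float(min(2 ** attempt, 60))  # 1s, 2s, 4s, ... capped at 60s

print(retry_delay({"Retry-After": "12"}, 0))  # 12.0
print(retry_delay({}, 3))                     # 8.0
```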

Budget Controls

API keys can have an optional budget limit to cap spending.

  • Set during key creation in the dashboard
  • Enforced in real time via Redis
  • When exceeded, requests return 402 Payment Required

```json
{
  "error": {
    "message": "Budget limit exceeded for this API key",
    "type": "budget_exceeded"
  }
}
```

Error Handling

All errors follow a consistent format:

```json
{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type"
  }
}
```

Error Types & HTTP Status Codes

| Status | Type | Description |
| --- | --- | --- |
| 400 | invalid_request_error | Malformed request body or invalid parameters |
| 401 | invalid_request_error | Missing, invalid, inactive, or expired API key |
| 402 | budget_exceeded | API key budget limit reached |
| 429 | rate_limit_error | RPM/TPD or IP rate limit exceeded |
| 500 | server_error | Internal server error |
| 504 | timeout_error | Request timed out (30s default, 300s for streams) |
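One way a client might branch on these statuses (the action labels are illustrative, not part of the API):

```python
def next_action(status):
    """Suggest a client reaction for each error status in the table above."""
    if status == 429:
        return "retry-with-backoff"       # honor Retry-After
    if status == 402:
        return "raise-budget-or-new-key"
    if status == 401:
        return "check-api-key"
    if status >= 500:
        return "retry-later"              # 500/504: transient server side
    return "fix-request"                  # 400: correct body or parameters

print(next_action(429))  # retry-with-backoff
```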

SDK Integration

Python

```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

# Chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)

# Embeddings
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")
```

JavaScript / TypeScript

```bash
npm install openai
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://internal-api.automationss.online/v1',
  apiKey: 'sk-your-api-key',
});

// Chat completion
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain REST APIs' }],
  max_tokens: 500,
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

cURL

```bash
# Chat completion
curl https://internal-api.automationss.online/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List models (no auth required)
curl https://internal-api.automationss.online/v1/models

# Embeddings
curl https://internal-api.automationss.online/v1/embeddings \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello world"
  }'
```

Best Practices

1. Use Streaming for Long Responses

Streaming provides a better UX and avoids timeouts for long completions. The stream timeout is 300 seconds vs 30 seconds for non-streaming.

2. Set Budget Limits

Protect against runaway costs by setting a budget limit on each API key in the dashboard.

3. Handle Rate Limits Gracefully

Implement exponential backoff when you receive 429 responses:

```python
import time

from openai import RateLimitError

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the 429 to the caller
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
```

4. Monitor Your Usage

Check the Dashboard for real-time cost tracking, per-model breakdowns, and request volume charts.

5. Rotate Keys Periodically

Create new keys and deactivate old ones regularly. Manage all keys from the dashboard without downtime.

6. Check Model Health Before Calling

Use GET /v1/models/status to check model availability before making requests.
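A sketch of filtering the /v1/models/status payload (shape shown earlier) down to callable models; the helper name is illustrative:

```python
def healthy_models(status_payload):
    """Model IDs from /v1/models/status that are available and healthy."""
    return [
        m["model_id"]
        for m in status_payload["data"]
        if m["is_available"] and m["health_status"] == "healthy"
    ]

sample = {"data": [
    {"model_id": "gpt-4o", "is_available": True, "health_status": "healthy"},
    {"model_id": "mistral-large", "is_available": False, "health_status": "down"},
]}
print(healthy_models(sample))  # ['gpt-4o']
```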

Supported Models

| Provider | Models |
| --- | --- |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-pro |
| Meta | llama-3.1-70b, llama-3.1-8b |
| Mistral | mistral-large, mistral-medium, mistral-small |
| DeepSeek | deepseek-chat, deepseek-coder |
💡 The full list is available via GET /v1/models.

Timeouts

| Request Type | Timeout |
| --- | --- |
| Non-streaming requests | 30 seconds |
| Streaming requests | 300 seconds (5 minutes) |

Response Headers

| Header | Description |
| --- | --- |
| X-RateLimit-Limit-RPM | Your RPM limit for this key |
| X-RateLimit-Remaining-RPM | Remaining requests this minute |
| X-RateLimit-Limit-TPD | Your TPD limit for this key |
| X-RateLimit-Remaining-TPD | Remaining tokens today |
| X-Request-ID | Unique request identifier for debugging |
| Retry-After | Seconds to wait (on 429 responses) |

RooyaLLM — One API for Every AI Model