API Documentation
Complete reference for the RooyaLLM API. OpenAI-compatible interface that works with every major AI provider.
https://internal-api.automationss.online

Overview
RooyaLLM provides a unified, OpenAI-compatible API that routes requests to 100+ AI models from all major providers (OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek, and more).
| Feature | Description |
|---|---|
| Unified API | Single endpoint for every LLM — OpenAI-compatible interface |
| Streaming | Full SSE streaming support with real-time token delivery |
| Cost Tracking | Automatic per-request cost calculation and budget enforcement |
| Rate Limiting | Per-key RPM + TPD limits with Redis-backed sliding windows |
| 3-Tier Auth Cache | In-memory → Redis → Database key validation (<1μs hot path) |
| Model Health | Auto-synced model catalog with live health monitoring |
Authentication
All API requests (except /v1/models and /health) require a valid API key via the Authorization header using the Bearer scheme.
Authorization: Bearer sk-your-api-key

Obtaining an API Key
- Sign up at the registration page
- Navigate to Dashboard → API Keys
- Click Create Key, set a name and optional budget limit
- Copy the key immediately — it is shown only once
Key Format
sk-<random-alphanumeric-string>

Quick Start
The fastest way to get started — just point your OpenAI SDK to your gateway URL and set your API key:
from openai import OpenAI
client = OpenAI(
base_url="https://internal-api.automationss.online/v1",
api_key="sk-your-api-key"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

That's it. Every OpenAI SDK feature works — streaming, function calling, vision, and more.
Chat Completions
Creates a chat completion for a conversation. This is the primary endpoint for interacting with LLMs.
/v1/chat/completions

Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Model ID (e.g., gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) |
| messages | array | ✓ | — | Array of message objects with role and content |
| temperature | number | — | 1.0 | Sampling temperature between 0 and 2 |
| top_p | number | — | 1.0 | Nucleus sampling between 0 and 1 |
| n | integer | — | 1 | Number of completions to generate (1–128) |
| stream | boolean | — | false | Enable Server-Sent Events streaming |
| stop | string / array | — | null | Up to 4 stop sequences |
| max_tokens | integer | — | — | Maximum tokens to generate |
| presence_penalty | number | — | 0 | Penalty for new topics (−2 to 2) |
| frequency_penalty | number | — | 0 | Penalty for repetition (−2 to 2) |
| user | string | — | — | Unique end-user identifier |
Example Request
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 500
}'

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1714000000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses qubits instead of traditional bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 150,
"total_tokens": 178
}
}

Completions (Legacy)
Creates a text completion. This is the legacy completions endpoint for non-chat models.
/v1/completions

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Model ID |
| prompt | string / array | ✓ | — | The prompt(s) to complete |
| suffix | string | — | null | Text after the completion |
| max_tokens | integer | — | 16 | Maximum tokens to generate |
| temperature | number | — | 1.0 | Sampling temperature (0–2) |
| top_p | number | — | 1.0 | Nucleus sampling (0–1) |
| n | integer | — | 1 | Number of completions (1–128) |
| stream | boolean | — | false | Enable SSE streaming |
| stop | string / array | — | null | Up to 4 stop sequences |
curl https://internal-api.automationss.online/v1/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Write a haiku about programming:",
"max_tokens": 50,
"temperature": 0.8
}'

Embeddings
Generate vector embeddings for text input.
/v1/embeddings

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | — | Embedding model (e.g., text-embedding-3-small) |
| input | string / array | ✓ | — | Text(s) to embed |
| encoding_format | string | — | float | float or base64 |
| user | string | — | — | End-user identifier |
curl https://internal-api.automationss.online/v1/embeddings \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog"
}'

{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, ...]
}
],
"model": "text-embedding-3-small",
"usage": { "prompt_tokens": 9, "total_tokens": 9 }
}

List Models
Returns a list of all available models. No authentication required.
/v1/models

{
"object": "list",
"data": [
{ "id": "gpt-4o", "object": "model", "created": 1714000000, "owned_by": "openai" },
{ "id": "claude-3.5-sonnet", "object": "model", "created": 1714000000, "owned_by": "anthropic" }
]
}

Get Model
Retrieve details for a specific model. No authentication required.
/v1/models/{model_id}

| Parameter | Description |
|---|---|
| model_id | The ID of the model (e.g., gpt-4o) |
{
"id": "gpt-4o",
"object": "model",
"created": 1714000000,
"owned_by": "openai"
}

Model Health Status
Returns the full model catalog with live health status, availability, and latency data. No authentication required.
/v1/models/status

{
"object": "list",
"data": [
{
"id": "uuid",
"model_id": "gpt-4o",
"provider": "openai",
"model_type": "chat",
"is_available": true,
"health_status": "healthy",
"last_health_latency_ms": 245
}
],
"total": 150, "healthy": 142, "down": 3, "unknown": 5
}

Health Check
Server health endpoint for monitoring and load balancers.
/health

{
"status": "ok",
"timestamp": "2026-04-25T02:00:00.000Z",
"uptime": 86400,
"memory": { "rss": 128, "heap": 64 }
}

Streaming (SSE)
Set "stream": true to receive responses as Server-Sent Events in real time.
How It Works
- The server responds with Content-Type: text/event-stream
- Each chunk is a data: line containing a JSON delta
- The stream terminates with data: [DONE]
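
Python Example

The OpenAI Python SDK parses the SSE chunks for you. A minimal sketch, using the same gateway base URL and placeholder key as in Quick Start:

from openai import OpenAI

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

# With stream=True the SDK returns an iterator of chat.completion.chunk objects
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
)

for chunk in stream:
    # Each chunk carries an incremental delta; role-only and final chunks have no content
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)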
cURL Example
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Count to 5"}],
"stream": true
}'

Rate Limiting
The gateway enforces rate limits at two levels:
1. IP-Based Rate Limit
120 requests per minute per IP address. Applies to all requests regardless of authentication.
2. Per-Key Rate Limits
Each API key has configurable limits set at creation time:
| Limit | Free | Pro | Enterprise |
|---|---|---|---|
| Requests per minute (RPM) | 60 | 300 | 10,000 |
| Tokens per day (TPD) | 100,000 | 1,000,000 | 100,000,000 |
Rate Limit Headers
X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 55
X-RateLimit-Limit-TPD: 100000
X-RateLimit-Remaining-TPD: 98500

When Rate Limited
{
"error": {
"message": "Rate limit exceeded",
"type": "rate_limit_error"
}
}

HTTP Status: 429 Too Many Requests · Header: Retry-After: <seconds>
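
A minimal sketch of honoring Retry-After with the requests library (the endpoint, key placeholder, and payload shape follow the examples above; the helper name is illustrative):

import time
import requests

def post_with_retry_after(payload, max_retries=3):
    url = "https://internal-api.automationss.online/v1/chat/completions"
    headers = {"Authorization": "Bearer sk-your-api-key"}
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Wait as long as the gateway asks; fall back to 1 second if the header is missing
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    raise RuntimeError("Still rate limited after retries")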
Budget Controls
API keys can have an optional budget limit to cap spending.
- Set during key creation in the dashboard
- Enforced in real time via Redis
- When exceeded, requests return 402 Payment Required
{
"error": {
"message": "Budget limit exceeded for this API key",
"type": "budget_exceeded"
}
}

Error Handling
All errors follow a consistent format:
{
"error": {
"message": "Human-readable error description",
"type": "error_type"
}
}

Error Types & HTTP Status Codes
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body or invalid parameters |
| 401 | invalid_request_error | Missing, invalid, inactive, or expired API key |
| 402 | budget_exceeded | API key budget limit reached |
| 429 | rate_limit_error | RPM/TPD or IP rate limit exceeded |
| 500 | server_error | Internal server error |
| 504 | timeout_error | Request timed out (30s default, 300s for streams) |
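
With the official Python SDK these statuses surface as typed exceptions: 429 raises RateLimitError, 401 raises AuthenticationError, and other non-2xx responses (including the 402 budget error) raise an APIStatusError carrying status_code. A minimal sketch of distinguishing them:

from openai import OpenAI, APIStatusError, AuthenticationError, RateLimitError

client = OpenAI(
    base_url="https://internal-api.automationss.online/v1",
    api_key="sk-your-api-key"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except AuthenticationError as e:
    # 401: missing, invalid, inactive, or expired key
    print("Check your API key:", e.message)
except RateLimitError as e:
    # 429: RPM/TPD or IP limit hit; back off and retry
    print("Rate limited:", e.message)
except APIStatusError as e:
    # Any other non-2xx status, e.g. 402 budget_exceeded
    print(f"HTTP {e.status_code}: {e.message}")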
SDK Integration
Python
pip install openai

from openai import OpenAI
client = OpenAI(
base_url="https://internal-api.automationss.online/v1",
api_key="sk-your-api-key"
)
# Chat completion
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
# Embeddings
embedding = client.embeddings.create(
model="text-embedding-3-small",
input="Hello world"
)
print(f"Dimensions: {len(embedding.data[0].embedding)}")JavaScript / TypeScript
npm install openai

import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://internal-api.automationss.online/v1',
apiKey: 'sk-your-api-key',
});
// Chat completion
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Explain REST APIs' }],
max_tokens: 500,
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

cURL
# Chat completion
curl https://internal-api.automationss.online/v1/chat/completions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# List models (no auth required)
curl https://internal-api.automationss.online/v1/models
# Embeddings
curl https://internal-api.automationss.online/v1/embeddings \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "Hello world"
}'

Best Practices
1. Use Streaming for Long Responses
Streaming provides a better UX and avoids timeouts for long completions. The stream timeout is 300 seconds vs 30 seconds for non-streaming.
2. Set Budget Limits
Protect against runaway costs by setting a budget limit on each API key in the dashboard.
3. Handle Rate Limits Gracefully
Implement exponential backoff when you receive 429 responses:
import time
from openai import RateLimitError

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            wait = 2 ** attempt
            time.sleep(wait)
    raise Exception("Max retries exceeded")

4. Monitor Your Usage
Check the Dashboard for real-time cost tracking, per-model breakdowns, and request volume charts.
5. Rotate Keys Periodically
Create new keys and deactivate old ones regularly. Manage all keys from the dashboard without downtime.
6. Check Model Health Before Calling
Use GET /v1/models/status to check model availability before making requests.
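
A minimal sketch of that pre-flight check with the requests library, based on the response schema shown in the Model Health Status section:

import requests

# No authentication is required for /v1/models/status
resp = requests.get("https://internal-api.automationss.online/v1/models/status")
resp.raise_for_status()

# Collect the model IDs currently reported as healthy
healthy = {m["model_id"] for m in resp.json()["data"] if m["health_status"] == "healthy"}

model = "gpt-4o"
if model in healthy:
    print(f"{model} is healthy, sending the request")
else:
    print(f"{model} is degraded or down, falling back to another model")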
Supported Models
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo |
| Anthropic | claude-3.5-sonnet, claude-3-opus, claude-3-haiku |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-pro |
| Meta | llama-3.1-70b, llama-3.1-8b |
| Mistral | mistral-large, mistral-medium, mistral-small |
| DeepSeek | deepseek-chat, deepseek-coder |
The full, up-to-date catalog is available via GET /v1/models.

Timeouts

| Request Type | Timeout |
|---|---|
| Non-streaming requests | 30 seconds |
| Streaming requests | 300 seconds (5 minutes) |
Response Headers

| Header | Description |
|---|---|
| X-RateLimit-Limit-RPM | Your RPM limit for this key |
| X-RateLimit-Remaining-RPM | Remaining requests this minute |
| X-RateLimit-Limit-TPD | Your TPD limit for this key |
| X-RateLimit-Remaining-TPD | Remaining tokens today |
| X-Request-ID | Unique request identifier for debugging |
| Retry-After | Seconds to wait (on 429 responses) |
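
These headers are easiest to inspect with a plain HTTP client. A minimal sketch using the requests library (same endpoint and key placeholder as the examples above):

import requests

resp = requests.post(
    "https://internal-api.automationss.online/v1/chat/completions",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]},
)

# Log the request ID alongside the remaining quota for this key
print("Request ID:", resp.headers.get("X-Request-ID"))
print("RPM remaining:", resp.headers.get("X-RateLimit-Remaining-RPM"))
print("Tokens left today:", resp.headers.get("X-RateLimit-Remaining-TPD"))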
RooyaLLM — One API for Every AI Model