API Documentation

AITokenPass issues real Venice AI API keys. Use them directly with Venice's OpenAI-compatible API — no proxy, no middleware.

How It Works

AITokenPass is a marketplace for discounted Venice AI API credits. When you purchase credits, we generate a real Venice INFERENCE API key for you via Venice's key management API. You then call Venice's API directly — we are not a proxy.

Flow

  1. Buy credits on AITokenPass (choose diem/day and dates)
  2. Receive a Venice API key with your diem consumption limit
  3. Call Venice's API at api.venice.ai
  4. Key auto-expires at the end of your purchased dates

Authentication

Use the Venice API key you received from AITokenPass in the Authorization header as a Bearer token.

Header
Authorization: Bearer YOUR_VENICE_API_KEY

Your key has a diem consumption limit set by your purchase. Venice tracks usage automatically. The key expires at the end of your last purchased date.
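
As a minimal sketch of the header above, the helper below builds the request headers in Python. The function name venice_headers and the environment variable VENICE_API_KEY are illustrative assumptions, not names defined by AITokenPass or Venice.

```python
import os


def venice_headers(api_key=None):
    """Return the HTTP headers for a Venice API request."""
    # VENICE_API_KEY is a hypothetical environment variable name chosen
    # for this sketch; store your key wherever suits your deployment.
    key = api_key or os.environ.get("VENICE_API_KEY", "")
    if not key:
        raise ValueError("No Venice API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control, which matters because the key carries your purchased diem allowance.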

Base URL

All API calls go directly to Venice. The base URL below belongs to Venice, not to AITokenPass.

https://api.venice.ai/api/v1

If you are using the OpenAI SDK, set the base_url (Python) or baseURL (Node.js) configuration option.

Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_VENICE_API_KEY",
    base_url="https://api.venice.ai/api/v1"
)

Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_VENICE_API_KEY',
  baseURL: 'https://api.venice.ai/api/v1',
});

Chat Completions

Create a chat completion. The request format is identical to the OpenAI Chat Completions API.

cURL
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
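
For readers not using an SDK, the same request can be sketched with Python's standard library. The helper names (build_chat_request, extract_reply, chat) are hypothetical, and the parsing assumes the response has the shape shown above.

```python
import json
import urllib.request

BASE_URL = "https://api.venice.ai/api/v1"


def build_chat_request(api_key, messages, model="llama-3.3-70b", **options):
    """Build (but do not send) a POST request for /chat/completions."""
    # Extra keyword arguments such as temperature or max_tokens are
    # passed through unchanged into the JSON body.
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response):
    """Return (assistant_text, total_tokens) from a parsed response body."""
    text = response["choices"][0]["message"]["content"]
    total = response.get("usage", {}).get("total_tokens")
    return text, total


def chat(api_key, messages, **options):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(api_key, messages, **options)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return extract_reply(json.load(resp))[0]
```

Calling chat("YOUR_VENICE_API_KEY", [{"role": "user", "content": "What is the capital of France?"}], max_tokens=256) would send the same request as the cURL example, minus the system message.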

Models

Venice offers a variety of open-source and proprietary models. Use the models endpoint to list available options.

Request
GET https://api.venice.ai/api/v1/models
Authorization: Bearer YOUR_VENICE_API_KEY

Popular models include llama-3.3-70b and deepseek-r1-671b. Check Venice's documentation for the full list.
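
A minimal Python sketch of the models request follows. It assumes the endpoint returns an OpenAI-style list body of the form {"data": [{"id": "..."}]}; that shape is not confirmed on this page, so check a real response before relying on it. The function names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://api.venice.ai/api/v1/models"


def model_ids(models_response):
    """Return the sorted model IDs from a parsed /models response body."""
    # Assumption: OpenAI-style body, {"data": [{"id": "..."}, ...]}.
    return sorted(item["id"] for item in models_response.get("data", []))


def list_models(api_key):
    """Fetch the models endpoint and return the available model IDs."""
    req = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return model_ids(json.load(resp))
```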

Streaming

Set "stream": true to receive responses as Server-Sent Events (SSE).

cURL
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
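
To show what consuming the stream involves, here is a small Python sketch that extracts text from SSE lines. It assumes Venice follows the OpenAI streaming convention: each event is a line starting with "data:" that carries a JSON chunk with choices[].delta.content, and the stream ends with "data: [DONE]". The function name iter_stream_text is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel (OpenAI convention)
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece
```

An HTTP response object from urllib.request.urlopen can be iterated line by line, so it can be passed straight to iter_stream_text and each fragment printed as it arrives.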

Rate Limits

Rate limits are managed by Venice based on your API key type. Your key has a diem consumption limit set at purchase time.

Limit               Details
Diem consumption    Per your purchase
Key expiry          End of last purchased date
Request limits      Set by Venice
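
Because the request limits are set by Venice and not listed here, a client may want to back off when it receives a 429 response. The sketch below generates exponential backoff delays with optional jitter. The retry count, base delay, and cap are illustrative assumptions, not values published by Venice or AITokenPass.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=True):
    """Yield wait times in seconds: exponential growth, capped at `cap`."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # With jitter, pick a random wait between 0 and the computed delay
        # so that many clients do not retry at the same moment.
        yield random.uniform(0, delay) if jitter else delay
```

A caller would sleep for each yielded delay between attempts and give up once the generator is exhausted. Retrying does not help once the diem consumption limit is reached, since that limit is fixed by the purchase.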

Error Codes

Venice uses standard HTTP status codes. Error responses include a JSON body with details.

Code    Description
401     Invalid or missing API key
403     Key expired or consumption limit reached
429     Rate limit exceeded
500     Internal server error
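
One way to act on these codes in client code is a small lookup, sketched below in Python. The descriptions come from the table above. Treating 429 and 500 as retryable and 401 and 403 as permanent is an assumption of this sketch, not a policy stated by Venice.

```python
ERROR_MEANINGS = {
    401: "Invalid or missing API key",
    403: "Key expired or consumption limit reached",
    429: "Rate limit exceeded",
    500: "Internal server error",
}

# Assumption: only these codes are worth retrying. A 401 or 403 will not
# succeed until the key itself is fixed or renewed.
RETRYABLE = {429, 500}


def classify_error(status):
    """Return (description, retryable) for an HTTP status code."""
    description = ERROR_MEANINGS.get(status, "Unexpected status")
    return description, status in RETRYABLE
```

With urllib, a failed request raises urllib.error.HTTPError, whose code attribute can be passed to classify_error to decide whether to retry or to surface the problem to the user.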

Ready to get started?

Buy discounted Venice AI API credits and start making calls in minutes.