API Documentation

AITokenPass issues real Venice AI API keys. Use them directly with Venice's OpenAI-compatible API — no proxy, no middleware.

How It Works

AITokenPass is a marketplace for discounted Venice AI API credits. When you purchase credits, we generate a real Venice INFERENCE API key for you via Venice's key management API. You then call Venice's API directly — we are not a proxy.

Flow

1. Buy credits on AITokenPass (choose a diem-per-day amount and a date range).
2. Receive a Venice API key capped at your diem consumption limit.
3. Call Venice's API at api.venice.ai.
4. The key auto-expires at the end of your purchased dates.

Authentication

Use the Venice API key you received from AITokenPass in the Authorization header as a Bearer token.

Header

Authorization: Bearer YOUR_VENICE_API_KEY

Your key has a diem consumption limit set by your purchase. Venice tracks usage automatically. The key expires at the end of your last purchased date.
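
As a minimal sketch of the header described above, the helper below builds the Authorization header for a JSON request. The function name auth_headers and the VENICE_API_KEY environment variable mentioned in the usage note are illustrative assumptions, not part of Venice's or AITokenPass's API.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers carrying the key as a Bearer token.

    auth_headers is a hypothetical helper name used only for this sketch.
    """
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

One way to use it is to read the key from an environment variable, for example auth_headers(os.environ["VENICE_API_KEY"]), so the key never appears in source code.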

Base URL

All API calls go directly to Venice. The base URL below belongs to Venice, not AITokenPass.

https://api.venice.ai/api/v1

If you are using the OpenAI SDK, set the base_url (Python) or baseURL (Node.js) configuration option to this URL.

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_VENICE_API_KEY",
    base_url="https://api.venice.ai/api/v1"
)

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_VENICE_API_KEY',
  baseURL: 'https://api.venice.ai/api/v1',
});

Chat Completions

Create a chat completion. The request format is identical to the OpenAI Chat Completions API.

cURL

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
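
The request and response above can be handled in Python with plain dictionaries. The following is a sketch using only the standard library; the helper names build_chat_payload and extract_reply are hypothetical, and the field layout is taken from the example request and response shown above.

```python
def build_chat_payload(
    user_text: str,
    system_text: str = "You are a helpful assistant.",
    model: str = "llama-3.3-70b",
    temperature: float = 0.7,
    max_tokens: int = 256,
) -> dict:
    """Assemble a request body with the same fields as the cURL example."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict) -> tuple[str, int]:
    """Return (assistant text, total tokens) from a parsed response body."""
    first_choice = response["choices"][0]
    return first_choice["message"]["content"], response["usage"]["total_tokens"]
```

Serialize the payload with json.dumps before sending it, and pass the parsed JSON body of the response to extract_reply.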

Models

Venice offers a variety of open-source and proprietary models. Use the models endpoint to list available options.

Request

GET https://api.venice.ai/api/v1/models
Authorization: Bearer YOUR_VENICE_API_KEY

Popular models include llama-3.3-70b and deepseek-r1-671b. Check Venice's documentation for the full list.
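
A possible way to issue the models request from Python with the standard library is sketched below. It assumes the response follows the OpenAI-style list shape, {"data": [{"id": ...}, ...]}, which this page does not show; confirm it against Venice's documentation. The function names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://api.venice.ai/api/v1/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the authenticated GET request."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of a parsed response.

    Assumption: an OpenAI-style body with a top-level "data" list.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key: str) -> list[str]:
    """Send the request and return the model IDs (performs a network call)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```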

Streaming

Set "stream": true in the request body to receive the response incrementally as Server-Sent Events (SSE).

cURL

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
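
On the client side, a streamed response arrives as lines of text that need to be decoded. The sketch below assumes the stream follows the OpenAI streaming convention: each event is a line of the form "data: {json}", the text sits in choices[].delta.content, and the stream ends with "data: [DONE]". This page does not confirm those details, so treat them as assumptions; iter_stream_text is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from already-decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments with "".join(...) reconstructs the full assistant message.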

Rate Limits

Rate limits are managed by Venice based on your API key type. Your key has a diem consumption limit set at purchase time.

Limit	Details
Diem consumption	Per your purchase
Key expiry	End of last purchased date
Request limits	Set by Venice

Error Codes

Venice uses standard HTTP status codes. Error responses include a JSON body with details.

Code	Description
401	Invalid or missing API key
403	Key expired or consumption limit reached
429	Rate limit exceeded
500	Internal server error
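
The table above can be turned into a small lookup for client code. This is a sketch only: the descriptions come from the table, while the choice to treat 429 and 500 as retryable and the exponential backoff parameters are assumptions rather than Venice guidance. The names describe_status and backoff_delay are hypothetical.

```python
# Description text comes from the error table; the retryable flags are assumptions.
STATUS_HINTS = {
    401: ("Invalid or missing API key", False),
    403: ("Key expired or consumption limit reached", False),
    429: ("Rate limit exceeded", True),
    500: ("Internal server error", True),
}


def describe_status(status: int) -> tuple[str, bool]:
    """Return (description, retryable) for an HTTP status code."""
    return STATUS_HINTS.get(status, (f"Unexpected status {status}", False))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), doubling up to a cap."""
    return min(cap, base * (2 ** attempt))
```

A caller might retry only when describe_status reports the code as retryable, sleeping for backoff_delay(attempt) seconds between attempts.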

Ready to get started?

Buy discounted Venice AI API credits and start making calls in minutes.
