
API Overview

Vectorcache provides a RESTful API for semantic caching of LLM responses.

Base URL

https://api.vectorcache.ai

Authentication

All API requests require authentication using an API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Get your API key from the dashboard.

Endpoints

Method  Endpoint          Description
POST    /v1/cache/query   Query the semantic cache

Request Format

All requests must include:

  • Content-Type: application/json header
  • JSON request body with required parameters
  • Bearer token authentication
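As a rough illustration of these three requirements, the sketch below assembles a request object with Python's standard library without sending it. The helper name `build_query_request` is hypothetical, and the body fields (`prompt`, `model`, `similarity_threshold`) are taken from the Quick Example on this page rather than from a full parameter reference.

```python
import json
import urllib.request

# Base URL and endpoint path as listed on this page.
API_URL = "https://api.vectorcache.ai/v1/cache/query"


def build_query_request(
    api_key: str, prompt: str, model: str, similarity_threshold: float
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a cache query request."""
    # JSON request body; field names follow the Quick Example on this page.
    body = json.dumps(
        {
            "prompt": prompt,
            "model": model,
            "similarity_threshold": similarity_threshold,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token authentication
            "Content-Type": "application/json",    # required content type
        },
        method="POST",
    )


req = build_query_request("YOUR_API_KEY", "What is machine learning?", "gpt-4o", 0.85)
```

Passing `req` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.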

Response Format

Successful responses return a JSON object with the following fields (types are shown in place of values):

{
  "cache_hit": boolean,
  "response": string,
  "similarity_score": number | null,
  "cost_saved": number,
  "llm_provider": string
}
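One way to consume this shape is to map it onto a typed record. The sketch below is a minimal example, assuming the five fields above are always present; the `CacheQueryResult` class, the `parse_query_response` helper, and the sample values are hypothetical and not part of the Vectorcache API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheQueryResult:
    """Hypothetical container mirroring the documented response fields."""
    cache_hit: bool
    response: str
    similarity_score: Optional[float]  # documented as number | null
    cost_saved: float
    llm_provider: str


def parse_query_response(raw: str) -> CacheQueryResult:
    """Parse a response body; raises KeyError if a documented field is missing."""
    data = json.loads(raw)
    score = data["similarity_score"]
    return CacheQueryResult(
        cache_hit=bool(data["cache_hit"]),
        response=str(data["response"]),
        similarity_score=None if score is None else float(score),
        cost_saved=float(data["cost_saved"]),
        llm_provider=str(data["llm_provider"]),
    )


# Illustrative payload only; the values are made up for this example.
sample = (
    '{"cache_hit": false, "response": "An answer.", '
    '"similarity_score": null, "cost_saved": 0, "llm_provider": "openai"}'
)
result = parse_query_response(sample)
```

Keeping `similarity_score` as `Optional[float]` forces callers to handle the `null` case explicitly instead of comparing `None` against a threshold.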

Rate Limits

  • Free tier: 100 requests/minute
  • Pro tier: 1,000 requests/minute
  • Enterprise: Custom limits

Rate limit headers are included in responses:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1640000000
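A client can use these headers to pause before it exhausts its quota. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but this page does not state; the `RateLimit` class and both helper functions are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimit:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset (assumed: Unix seconds)


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimit:
    """Read the three rate limit headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimit(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
    )


def seconds_to_wait(rate_limit: RateLimit, now: Optional[float] = None) -> float:
    """Return 0 while requests remain, else the time left until the reset."""
    if rate_limit.remaining > 0:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, rate_limit.reset_epoch - current)


# Header values copied from the example above.
rate_limit = parse_rate_limit(
    {
        "X-RateLimit-Limit": "100",
        "X-RateLimit-Remaining": "95",
        "X-RateLimit-Reset": "1640000000",
    }
)
```

Calling `time.sleep(seconds_to_wait(rate_limit))` before the next request is one simple way to apply the result.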

Versioning

The API is versioned via the URL path (/v1/). Breaking changes will be released under a new version path.

Current version: v1

Error Handling

All errors return a JSON response with a detail field:

{
  "detail": "Error message describing what went wrong"
}

See Error Handling for the complete list of error codes and how to handle them.
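For logging or user-facing messages, a client might pull the `detail` field out of an error body as sketched below. The `extract_error_detail` helper is hypothetical, and the 401 status and message in the usage line are illustrative rather than taken from the Vectorcache error list. The fallbacks cover bodies that are not JSON, such as an error page returned by an intermediate proxy.

```python
import json


def extract_error_detail(status: int, body: bytes) -> str:
    """Hypothetical helper: format an error message from a raw response body."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # The body was not valid UTF-8 JSON.
        return f"HTTP {status}: (unparseable error body)"
    if isinstance(payload, dict) and "detail" in payload:
        return f"HTTP {status}: {payload['detail']}"
    # Valid JSON, but not the documented error shape.
    return f"HTTP {status}: (no detail field)"


# Illustrative status code and message only.
message = extract_error_detail(401, b'{"detail": "Invalid API key"}')
```

With `urllib`, the status and body would typically come from a caught `urllib.error.HTTPError` through its `code` attribute and `read()` method.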

Quick Example

curl -X POST "https://api.vectorcache.ai/v1/cache/query" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-4o",
    "similarity_threshold": 0.85
  }'

Next Steps