# API Overview
Vectorcache provides a RESTful API for semantic caching of LLM responses.
## Base URL
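All endpoints are relative to:

```
https://api.vectorcache.ai
```

Endpoint paths include the API version, e.g. `/v1/cache/query` (see Versioning below).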
## Authentication
All API requests require authentication using an API key passed as a bearer token in the `Authorization` header:
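```
Authorization: Bearer YOUR_API_KEY
```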
Get your API key from the dashboard.
## Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/cache/query` | Query the semantic cache |
## Request Format
All requests must include:
- `Content-Type: application/json` header
- JSON request body with required parameters
- Bearer token authentication
## Response Format
All successful responses return JSON with:
```
{
  "cache_hit": boolean,
  "response": string,
  "similarity_score": number | null,
  "cost_saved": number,
  "llm_provider": string
}
```
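For example, a cache hit might look like this (values are illustrative, not from a live response):

```json
{
  "cache_hit": true,
  "response": "Machine learning is a branch of AI that...",
  "similarity_score": 0.92,
  "cost_saved": 0.0031,
  "llm_provider": "openai"
}
```

On a cache miss, `cache_hit` is `false`; the `null` case of `similarity_score` presumably corresponds to misses, matching the `number | null` type above.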
## Rate Limits
- Free tier: 100 requests/minute
- Pro tier: 1,000 requests/minute
- Enterprise: Custom limits
Rate limit headers are included in responses:
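As a sketch, a conventional `X-RateLimit-*` set might look like the following (the specific header names and values here are assumptions, not confirmed by this page):

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 97
X-RateLimit-Reset: 1735689600
```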
## Versioning
The API is versioned via the URL path (/v1/). Breaking changes will result in a new version.
Current version: v1
## Error Handling
All errors return a JSON response with a `detail` field:
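For example (the message text is illustrative):

```json
{
  "detail": "Invalid API key"
}
```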
See Error Handling for complete error codes and handling.
## Quick Example
```bash
curl -X POST "https://api.vectorcache.ai/v1/cache/query" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-4o",
    "similarity_threshold": 0.85
  }'
```
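The same request in Python, as a minimal sketch using the `requests` library; the endpoint, body fields, and response fields follow the Request and Response Format sections above, and `YOUR_API_KEY` is a placeholder:

```python
import requests

# Send a cache query (same payload as the curl example above).
# requests sets the Content-Type: application/json header automatically
# when the json= argument is used.
resp = requests.post(
    "https://api.vectorcache.ai/v1/cache/query",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "What is machine learning?",
        "model": "gpt-4o",
        "similarity_threshold": 0.85,
    },
    timeout=10,
)
resp.raise_for_status()

data = resp.json()
if data["cache_hit"]:
    # Served from the semantic cache; similarity_score reports how close the match was.
    print(f"hit ({data['similarity_score']}): {data['response']}")
else:
    # Cache miss: the response came from the underlying LLM provider.
    print(f"miss via {data['llm_provider']}: {data['response']}")
```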
## Next Steps
- Authentication - API key management
- Cache Endpoints - Detailed endpoint documentation
- Error Handling - Error codes and handling