Vectorcache Documentation

Welcome to Vectorcache - the intelligent semantic caching layer for LLM applications.

What is Vectorcache?

Vectorcache is an AI-powered caching solution that uses semantic similarity to cache and retrieve LLM responses. Instead of exact-match caching, Vectorcache understands the meaning of queries, dramatically improving cache hit rates and reducing API costs.

Key Features

  • 🎯 Semantic Matching - Uses vector embeddings to match similar queries, not just identical ones
  • 💰 Cost Reduction - Save up to 90% on LLM API costs with intelligent caching
  • ⚡ Fast Response Times - Serve cached responses in milliseconds instead of seconds
  • 🔒 Secure & Private - Your data is encrypted and isolated per project
  • 🛠 Easy Integration - Drop-in SDK for JavaScript/TypeScript and Python
  • 📊 Analytics Dashboard - Track cache performance, costs, and usage metrics

Quick Example

JavaScript/TypeScript:

import { VectorcacheClient } from 'vectorcache';

const client = new VectorcacheClient({
  apiKey: 'your_api_key',
  baseUrl: 'https://api.vectorcache.ai'
});

const result = await client.query({
  prompt: 'What is machine learning?',
  model: 'gpt-4o',
  similarityThreshold: 0.85
});

console.log(`Cache hit: ${result.cache_hit}`);
console.log(`Response: ${result.response}`);

Python:

import requests

api_key = "your_api_key"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

data = {
    "prompt": "What is machine learning?",
    "model": "gpt-4o",
    "similarity_threshold": 0.85
}

response = requests.post(
    "https://api.vectorcache.ai/v1/cache/query",
    json=data,
    headers=headers
)

result = response.json()
print(f"Cache hit: {result['cache_hit']}")
curl -X POST "https://api.vectorcache.ai/v1/cache/query" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is machine learning?",
    "model": "gpt-4o",
    "similarity_threshold": 0.85
  }'
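
All three examples surface the same response fields. The shape below is illustrative: cache_hit and response appear in the examples on this page, while the similarity field is an assumption, not a documented part of the payload:

{
  "cache_hit": true,
  "similarity": 0.93,
  "response": "Machine learning is a field of AI in which models learn patterns from data..."
}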

How It Works

  1. Query Submission - Your application sends a prompt to Vectorcache
  2. Semantic Search - Vectorcache searches for semantically similar cached queries
  3. Cache Hit/Miss - Returns the cached response if the similarity score exceeds your threshold, otherwise calls your LLM (a miss-handling pattern is sketched after this list)
  4. Cost Savings - Track savings and performance in the real-time dashboard
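
If your integration makes the provider call itself rather than letting Vectorcache proxy it, a common miss-handling pattern is query → call → write back. Here is a minimal sketch: the /v1/cache/store endpoint and the call_your_llm helper are hypothetical illustrations, since only /v1/cache/query is documented on this page.

import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.vectorcache.ai"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

def call_your_llm(prompt: str, model: str) -> str:
    # Placeholder for your provider call (OpenAI, Anthropic, etc.).
    raise NotImplementedError

def cached_completion(prompt: str, model: str = "gpt-4o") -> str:
    # 1. Ask Vectorcache for a semantically similar cached response.
    result = requests.post(
        f"{BASE_URL}/v1/cache/query",
        headers=HEADERS,
        json={"prompt": prompt, "model": model, "similarity_threshold": 0.85},
    ).json()
    if result["cache_hit"]:
        return result["response"]

    # 2. Miss: call the LLM yourself.
    answer = call_your_llm(prompt, model)

    # 3. Write the fresh answer back so similar prompts hit next time.
    #    NOTE: /v1/cache/store is an assumption, not documented above.
    requests.post(
        f"{BASE_URL}/v1/cache/store",
        headers=HEADERS,
        json={"prompt": prompt, "model": model, "response": answer},
    )
    return answer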

Use Cases

  • Customer Support Chatbots - Cache common questions and responses
  • Educational Platforms - Reduce costs for frequently asked educational queries
  • Documentation Search - Serve similar documentation queries from cache
  • Content Generation - Cache similar content requests
  • Data Analysis - Reuse responses for similar analytical queries

Why Vectorcache?

Traditional caching only works for exact matches. If a user asks "What is ML?" after someone asked "What is machine learning?", traditional caching misses. Vectorcache understands these are the same question and serves the cached response.

Result: 5-10x higher cache hit rates compared to traditional caching.
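
To make the matching concrete, here is a minimal sketch of embedding-based similarity using the sentence-transformers library. The model choice is arbitrary and Vectorcache's actual embedding model is not documented here; the 0.85 threshold reuses the value from the Quick Example.

from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model works for illustration; this is not
# necessarily what Vectorcache uses internally.
model = SentenceTransformer("all-MiniLM-L6-v2")

cached_prompt = "What is machine learning?"
incoming_prompt = "What is ML?"

# Embed both prompts and compare with cosine similarity.
emb = model.encode([cached_prompt, incoming_prompt], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()

# 0.85 mirrors the similarityThreshold used in the Quick Example.
print(f"similarity={score:.2f}:", "cache hit" if score >= 0.85 else "cache miss")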

Support

Need help? We're here for you.


Ready to reduce your LLM costs? Get started now →