# Vectorcache Documentation
Welcome to Vectorcache - the intelligent semantic caching layer for LLM applications.
## What is Vectorcache?
Vectorcache is an AI-powered caching solution that uses semantic similarity to cache and retrieve LLM responses. Instead of exact-match caching, Vectorcache understands the meaning of queries, dramatically improving cache hit rates and reducing API costs.
## Key Features
- 🎯 **Semantic Matching** - Uses vector embeddings to match similar queries, not just identical ones
- 💰 **Cost Reduction** - Save up to 90% on LLM API costs with intelligent caching
- ⚡ **Fast Response Times** - Serve cached responses in milliseconds instead of seconds
- 🔒 **Secure & Private** - Your data is encrypted and isolated per project
- 🛠 **Easy Integration** - Drop-in SDK for JavaScript/TypeScript and Python
- 📊 **Analytics Dashboard** - Track cache performance, costs, and usage metrics
## Quick Example
**JavaScript/TypeScript**

```javascript
import { VectorcacheClient } from 'vectorcache';

const client = new VectorcacheClient({
  apiKey: 'your_api_key',
  baseUrl: 'https://api.vectorcache.ai'
});

const result = await client.query({
  prompt: 'What is machine learning?',
  model: 'gpt-4o',
  similarityThreshold: 0.85
});

console.log(`Cache hit: ${result.cache_hit}`);
console.log(`Response: ${result.response}`);
```
**Python (REST API)**

```python
import requests

api_key = "your_api_key"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
data = {
    "prompt": "What is machine learning?",
    "model": "gpt-4o",
    "similarity_threshold": 0.85
}

response = requests.post(
    "https://api.vectorcache.ai/v1/cache/query",
    json=data,
    headers=headers
)
result = response.json()

print(f"Cache hit: {result['cache_hit']}")
print(f"Response: {result['response']}")
```
## How It Works
1. **Query Submission** - Your application sends a prompt to Vectorcache
2. **Semantic Search** - Vectorcache searches for semantically similar cached queries
3. **Cache Hit/Miss** - Returns the cached response if similarity exceeds your threshold; otherwise forwards the prompt to your LLM
4. **Cost Savings** - Track savings and performance in the real-time dashboard
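Conceptually, the hit/miss decision in step 3 is a nearest-neighbor search over embeddings of previously cached prompts. The sketch below illustrates the idea in plain Python; the cache layout and function names here are illustrative assumptions, not Vectorcache's actual internals.

```python
# Illustrative sketch of a semantic cache lookup -- not Vectorcache internals.
# Each cache entry is assumed to hold a prompt embedding and a stored response.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lookup(prompt_embedding: list[float], cache: list[dict], threshold: float = 0.85):
    """Return the cached response whose embedding is most similar to the
    incoming prompt, but only if that similarity clears the threshold."""
    if not cache:
        return None
    best = max(cache, key=lambda e: cosine_similarity(prompt_embedding, e["embedding"]))
    if cosine_similarity(prompt_embedding, best["embedding"]) >= threshold:
        return best["response"]  # cache hit
    return None                  # cache miss: the prompt goes to your LLM instead
```

Raising the threshold trades hit rate for precision: 0.85, the value used in the examples above, accepts close paraphrases while rejecting loosely related prompts.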
## Getting Started
- **Quick Start** - Get up and running in 5 minutes
- **Installation** - Install the SDK for your platform
- **API Reference** - Complete API documentation
- **FAQ** - Common questions and answers
## Use Cases
- **Customer Support Chatbots** - Cache common questions and responses
- **Educational Platforms** - Reduce costs for frequently asked educational queries
- **Documentation Search** - Serve similar documentation queries from cache
- **Content Generation** - Cache similar content requests
- **Data Analysis** - Reuse responses for similar analytical queries
## Why Vectorcache?
Traditional caching only works for exact matches. If a user asks "What is ML?" after someone asked "What is machine learning?", traditional caching misses. Vectorcache understands these are the same question and serves the cached response.
Result: 5-10x higher cache hit rates compared to traditional caching.
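You can replay that paraphrase pair end to end against the REST endpoint from the Quick Example. The `query_cache` helper below is a hypothetical convenience wrapper, and the printed output is illustrative:

```python
import requests

API_KEY = "your_api_key"

def query_cache(prompt: str) -> dict:
    """Send a prompt to the cache/query endpoint shown in the Quick Example."""
    response = requests.post(
        "https://api.vectorcache.ai/v1/cache/query",
        json={"prompt": prompt, "model": "gpt-4o", "similarity_threshold": 0.85},
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    return response.json()

first = query_cache("What is machine learning?")  # cold cache: expect cache_hit == False
second = query_cache("What is ML?")               # paraphrase: expect cache_hit == True
print(first["cache_hit"], second["cache_hit"])    # illustrative output: False True
```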
## Support
Need help? We're here for you:
- 📧 Email: support@vectorcache.com
- 💬 Discord: Join our community
- 🐛 Issues: GitHub Issues
Ready to reduce your LLM costs? Get started now →