Cost Optimization¶
Maximize your return on investment (ROI) from Vectorcache by raising your cache hit rate and tracking the LLM spend it actually saves.
Understanding Costs¶
LLM API Costs¶
Typical LLM API costs per query:
| Model | Cost per Query | Annual Cost (1000 queries/day) |
|---|---|---|
| GPT-4o | $0.002-0.005 | $730-1,825 |
| GPT-4o-mini | $0.0003-0.001 | $110-365 |
| Claude 3.5 Sonnet | $0.003-0.008 | $1,095-2,920 |
| Claude 3.5 Haiku | $0.0005-0.001 | $183-365 |
Vectorcache Savings¶
With 50% cache hit rate:
| Model | Without Cache | With Vectorcache (50% hits) | Annual Savings |
|---|---|---|---|
| GPT-4o | $1,825/year | $913/year | $912/year |
| Claude 3.5 Sonnet | $2,920/year | $1,460/year | $1,460/year |
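Figures use the upper cost estimate for each model at 1,000 queries/day. Annual savings are gross LLM savings, before the Vectorcache subscription. At $29/month ($348/year), net savings are about $564/year for GPT-4o and $1,112/year for Claude 3.5 Sonnet.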
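The figures in the tables above follow from simple arithmetic. The sketch below reproduces them under two assumptions: a flat per-query price and a cache hit that costs nothing on the LLM side. `annualLlmCost` and `annualNetSavings` are illustrative helper names, not part of the Vectorcache SDK.

```typescript
// Illustrative helpers (not part of the Vectorcache SDK).

/** Yearly LLM spend for a flat per-query price, assuming 365 days per year. */
function annualLlmCost(costPerQuery: number, queriesPerDay: number): number {
  return costPerQuery * queriesPerDay * 365;
}

/**
 * Yearly savings after paying the subscription.
 * Assumes a cache hit avoids the full LLM cost of that query.
 */
function annualNetSavings(
  costPerQuery: number,
  queriesPerDay: number,
  cacheHitRate: number,
  monthlySubscription: number
): number {
  const grossSavings = annualLlmCost(costPerQuery, queriesPerDay) * cacheHitRate;
  return grossSavings - monthlySubscription * 12;
}

// GPT-4o at the upper estimate: $0.005/query, 1,000 queries/day, 50% hits, $29/month.
const gpt4oAnnualNet = annualNetSavings(0.005, 1000, 0.5, 29);
console.log(`Net annual savings: $${gpt4oAnnualNet.toFixed(2)}`); // Net annual savings: $564.50
```

Passing the subscription price as a parameter keeps the helper usable if your plan costs something other than $29/month.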
Calculating ROI¶
Formula¶
ROI = (Cost Saved - Vectorcache Cost) / Vectorcache Cost × 100%
Cost Saved = (Cache Hits × Avg Query Cost)
Example Calculation¶
interface ROICalculation {
  monthlyQueries: number;
  cacheHitRate: number;
  avgQueryCost: number;
  vectorcacheCost: number;
}

function calculateROI(params: ROICalculation): number {
  const cacheHits = params.monthlyQueries * params.cacheHitRate;
  const costSaved = cacheHits * params.avgQueryCost;
  const netSavings = costSaved - params.vectorcacheCost;
  const roi = (netSavings / params.vectorcacheCost) * 100;
  return roi;
}

// Example: 30,000 queries/month, 50% hit rate, $0.003/query, $29/month
const roi = calculateROI({
  monthlyQueries: 30000,
  cacheHitRate: 0.50,
  avgQueryCost: 0.003,
  vectorcacheCost: 29
});
console.log(`ROI: ${roi.toFixed(0)}%`); // ~55% ROI
Maximizing Cache Hit Rate¶
1. Optimize Similarity Threshold¶
A lower threshold raises the hit rate, but it also raises the risk of serving a cached answer that does not match the question:
// Test different thresholds
const thresholds = [0.80, 0.85, 0.90];
for (const threshold of thresholds) {
  const metrics = await testThreshold(threshold, testQueries);
  console.log(`Threshold ${threshold}: ${metrics.hitRate}% hits, $${metrics.saved}`);
}
// Choose threshold with best ROI
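One way to turn such measurements into a decision is to take the threshold with the highest hit rate among those that still meet an accuracy floor. The sketch below assumes each test run reports a hit rate and the share of cache hits that a reviewer judged correct. The `ThresholdResult` shape, `pickThreshold`, and the sample numbers are all hypothetical and are not part of the Vectorcache API.

```typescript
// Hypothetical result shape for one threshold test run.
interface ThresholdResult {
  threshold: number; // similarity threshold that was tested
  hitRate: number;   // fraction of test queries served from cache (0..1)
  accuracy: number;  // fraction of cache hits judged correct (0..1)
}

/** Returns the run with the highest hit rate that meets the accuracy floor, or null. */
function pickThreshold(
  results: ThresholdResult[],
  minAccuracy: number
): ThresholdResult | null {
  let best: ThresholdResult | null = null;
  for (const candidate of results) {
    if (candidate.accuracy < minAccuracy) continue; // too many wrong answers
    if (best === null || candidate.hitRate > best.hitRate) best = candidate;
  }
  return best;
}

// Made-up numbers, for illustration only.
const sampleRuns: ThresholdResult[] = [
  { threshold: 0.80, hitRate: 0.62, accuracy: 0.91 },
  { threshold: 0.85, hitRate: 0.55, accuracy: 0.97 },
  { threshold: 0.90, hitRate: 0.41, accuracy: 0.99 }
];
const chosenRun = pickThreshold(sampleRuns, 0.95);
console.log(chosenRun ? chosenRun.threshold : 'no threshold meets the floor'); // 0.85
```

Treating accuracy as a hard floor rather than trading it against savings keeps a cheap but wrong answer from ever looking like a win.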
2. Use Context Segmentation¶
Segment cache by use case for better matches:
// Separate caches for different contexts
await client.query({
  prompt: 'Reset password',
  context: 'customer-support-auth',
  model: 'gpt-4o'
});

await client.query({
  prompt: 'Reset password',
  context: 'admin-documentation',
  model: 'gpt-4o'
});
3. Normalize User Input¶
Preprocess queries for better matching:
function normalizePrompt(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^\w\s]/g, ' ') // Replace punctuation with spaces
    .replace(/\s+/g, ' ')     // Collapse runs of whitespace
    .trim();                  // Trim last, so spaces left by punctuation are removed
}

const result = await client.query({
  prompt: normalizePrompt(userInput),
  model: 'gpt-4o'
});
Cost Tracking¶
Track Actual Savings¶
class CostTracker {
  private totalSaved = 0;
  private totalQueries = 0;
  private cacheHits = 0;

  async query(request: CacheQueryRequest) {
    const result = await client.query(request);
    this.totalQueries++;
    if (result.cache_hit) {
      this.cacheHits++;
      this.totalSaved += result.cost_saved;
    }
    return result;
  }

  getMetrics() {
    // Guard against division by zero before any queries or hits are recorded
    const hitRate = this.totalQueries > 0
      ? (this.cacheHits / this.totalQueries) * 100
      : 0;
    const avgSaved = this.cacheHits > 0 ? this.totalSaved / this.cacheHits : 0;
    return {
      totalQueries: this.totalQueries,
      cacheHits: this.cacheHits,
      hitRate: `${hitRate.toFixed(1)}%`,
      totalSaved: `$${this.totalSaved.toFixed(2)}`,
      avgSavedPerHit: `$${avgSaved.toFixed(4)}`
    };
  }
}
Monthly Cost Analysis¶
function analyzeMonthCosts(metrics: CacheMetrics) {
  const vectorcacheCost = 29; // Monthly subscription
  const costSaved = metrics.totalCostSaved;
  const netSavings = costSaved - vectorcacheCost;
  const roi = (netSavings / vectorcacheCost) * 100;
  return {
    vectorcacheCost: `$${vectorcacheCost}`,
    llmCostSaved: `$${costSaved.toFixed(2)}`,
    netSavings: `$${netSavings.toFixed(2)}`,
    roi: `${roi.toFixed(0)}%`,
    breakEven: netSavings >= 0
  };
}
When Vectorcache Makes Sense¶
✅ Great Fit¶
- High query volume: 10,000+ queries/month
- Repetitive queries: Customer support, FAQs, documentation
- Expensive models: GPT-4o, Claude 3.5 Sonnet
- Similar user questions: Educational platforms, chatbots
⚠️ May Not Be Worth It¶
- Low query volume: <1,000 queries/month
- Unique queries: Each query is completely different
- Cheap models only: Using only GPT-4o-mini or similar
- Real-time data: Queries require latest information
Break-Even Analysis¶
Minimum queries needed to break even (at different hit rates):
| Model | Cost/Query | 30% Hit Rate | 50% Hit Rate | 70% Hit Rate |
|---|---|---|---|---|
| GPT-4o | $0.003 | ~32,000 | ~19,000 | ~14,000 |
| GPT-4o-mini | $0.0005 | ~193,000 | ~116,000 | ~83,000 |
| Claude 3.5 Sonnet | $0.005 | ~19,000 | ~12,000 | ~8,000 |
Monthly queries needed to break even at $29/month
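The break-even volumes follow from setting the cost saved equal to the subscription price, which gives queries = subscription / (hit rate × cost per query). A small sketch of that calculation, with `breakEvenQueries` as an illustrative name:

```typescript
/**
 * Monthly queries at which LLM savings equal the subscription price.
 * Returns Infinity when nothing is saved per query (zero hit rate or zero cost).
 */
function breakEvenQueries(
  monthlySubscription: number,
  cacheHitRate: number,
  costPerQuery: number
): number {
  const savedPerQuery = cacheHitRate * costPerQuery;
  if (savedPerQuery <= 0) return Infinity;
  return Math.ceil(monthlySubscription / savedPerQuery);
}

// GPT-4o at $0.003/query with a 50% hit rate and a $29/month plan.
const gpt4oBreakEven = breakEvenQueries(29, 0.5, 0.003);
console.log(gpt4oBreakEven); // 19334, which the table rounds to ~19,000
```

Because the hit rate sits in the denominator, halving it doubles the volume you need, which is why the 30% column is so much higher than the 70% column.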
Cost Optimization Strategies¶
1. Use Cheaper Models for Cache Misses¶
async function smartQuery(prompt: string) {
  // Pick the model before querying. If client.query forwards a cache miss
  // to the LLM (as the cost figures on this page assume), a miss on 'gpt-4o'
  // has already been billed, so retrying with a cheaper model afterwards
  // would add cost instead of saving it.
  const model = isSimpleQuery(prompt)
    ? 'gpt-4o-mini' // Cheaper alternative for simple queries
    : 'gpt-4o';
  return await client.query({
    prompt,
    model,
    similarityThreshold: 0.85
  });
}
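`isSimpleQuery` is not defined on this page. One possible heuristic, purely as an assumption for illustration, is to treat short prompts without code or reasoning keywords as simple. The 20-word limit and the keyword list are arbitrary starting points and should be tuned against real traffic.

```typescript
// Hypothetical heuristic; the keyword list and word limit are arbitrary.
const COMPLEX_HINTS = /\b(explain|analy[sz]e|compare|prove|refactor|debug|step by step)\b/i;

function isSimpleQuery(prompt: string, maxWords: number = 20): boolean {
  const words = prompt.trim().split(/\s+/).filter(w => w.length > 0);
  if (words.length === 0 || words.length > maxWords) return false;
  if (prompt.indexOf('`') !== -1) return false; // inline code suggests a technical task
  return !COMPLEX_HINTS.test(prompt);
}

console.log(isSimpleQuery('What are your opening hours?'));               // true
console.log(isSimpleQuery('Explain the difference between TCP and UDP')); // false
```

A heuristic like this misclassifies some prompts, so it is worth logging which model served each query and spot-checking answer quality on the cheaper one.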
2. Batch Similar Queries¶
Group related queries to maximize cache hits:
async function batchQuery(prompts: string[]) {
  // Group similar prompts
  const groups: string[][] = groupSimilarPrompts(prompts);
  // Query each group once, reuse for similar prompts.
  // The explicit type avoids an implicit any[] under strict compiler settings.
  const results: {
    prompt: string;
    result: Awaited<ReturnType<typeof client.query>>;
  }[] = [];
  for (const group of groups) {
    if (group.length === 0) continue; // Skip empty groups
    const result = await client.query({
      prompt: group[0], // Use first as representative
      model: 'gpt-4o'
    });
    // Reuse result for all similar prompts
    for (const prompt of group) {
      results.push({ prompt, result });
    }
  }
  return results;
}
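`groupSimilarPrompts` is not defined on this page. A minimal stand-in is sketched below, assuming that prompts which are identical after lowercasing and stripping punctuation belong in the same group. Real semantic grouping would need embeddings, which this sketch does not attempt.

```typescript
// Hypothetical stand-in: groups prompts that match after light normalization.
function groupSimilarPrompts(prompts: string[]): string[][] {
  // Null-prototype object so keys such as '__proto__' behave like any other key.
  const groups: { [key: string]: string[] } = Object.create(null);
  const order: string[] = []; // preserves first-seen order of groups
  for (const prompt of prompts) {
    const key = prompt
      .toLowerCase()
      .replace(/[^\w\s]/g, ' ')
      .replace(/\s+/g, ' ')
      .trim();
    let group = groups[key];
    if (group === undefined) {
      group = [];
      groups[key] = group;
      order.push(key);
    }
    group.push(prompt);
  }
  return order.map(key => groups[key]);
}

const grouped = groupSimilarPrompts(['Reset password', 'reset password?', 'Billing help']);
console.log(grouped.length); // 2
```

The first prompt in each group becomes the representative, so the response is phrased for that wording. This is usually acceptable for near-duplicates and much less so for loosely related prompts.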
3. Implement Smart Caching Logic¶
async function intelligentCache(prompt: string, userContext: any) {
  // Don't cache unique/time-sensitive queries
  if (isTimeSensitive(prompt) || isUserSpecific(prompt, userContext)) {
    return await directLLMCall(prompt);
  }
  // Use cache for general queries
  return await client.query({
    prompt,
    model: 'gpt-4o',
    similarityThreshold: 0.85
  });
}

function isTimeSensitive(prompt: string): boolean {
  // Match whole words only, so 'now' does not fire on 'know' or 'snow'
  return /\b(today|now|current(ly)?|latest|recent(ly)?)\b/i.test(prompt);
}
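`isUserSpecific` is also left undefined. A rough sketch, assuming the user context carries an optional ID and email address: the prompt counts as user-specific if it mentions one of those identifiers or uses a phrase such as "my order". The `UserContext` shape and the phrase list are hypothetical and should be adapted to your own data.

```typescript
// Hypothetical context shape; adapt to whatever your application passes in.
interface UserContext {
  userId?: string;
  email?: string;
}

// Phrases that usually refer to one user's own data (arbitrary starting list).
const PERSONAL_PHRASES = /\bmy (account|order|invoice|subscription|balance|bill)\b/i;

function isUserSpecific(prompt: string, userContext: UserContext): boolean {
  if (PERSONAL_PHRASES.test(prompt)) return true;
  const lowered = prompt.toLowerCase();
  const identifiers = [userContext.userId, userContext.email];
  for (const id of identifiers) {
    // Skip missing identifiers; match the rest case-insensitively.
    if (id && lowered.indexOf(id.toLowerCase()) !== -1) return true;
  }
  return false;
}

console.log(isUserSpecific('Where is my order?', {}));          // true
console.log(isUserSpecific('How do I reset my password?', {})); // false
```

When a prompt might expose one user's data to another, a false positive only costs one uncached call, so it is reasonable to lean towards bypassing the cache.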
Monitoring ROI¶
Dashboard Metrics¶
Track these metrics in your dashboard:
interface ROIDashboard {
  period: string;
  totalQueries: number;
  cacheHits: number;
  hitRate: string;
  llmCostSaved: string;
  vectorcacheCost: string;
  netSavings: string;
  roi: string;
}

function generateROIDashboard(metrics: CacheMetrics): ROIDashboard {
  const vectorcacheCost = 29;
  const netSavings = metrics.totalCostSaved - vectorcacheCost;
  const roi = (netSavings / vectorcacheCost) * 100;
  // Avoid NaN in the dashboard when no queries have been recorded yet
  const hitRate = metrics.totalQueries > 0
    ? (metrics.cacheHits / metrics.totalQueries) * 100
    : 0;
  return {
    period: 'This Month',
    totalQueries: metrics.totalQueries,
    cacheHits: metrics.cacheHits,
    hitRate: `${hitRate.toFixed(1)}%`,
    llmCostSaved: `$${metrics.totalCostSaved.toFixed(2)}`,
    vectorcacheCost: `$${vectorcacheCost}`,
    netSavings: `$${netSavings.toFixed(2)}`,
    roi: `${roi.toFixed(0)}%`
  };
}
Alerts for Poor ROI¶
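function checkROI(metrics: CacheMetrics) {
  const vectorcacheCost = 29;
  const netSavings = metrics.totalCostSaved - vectorcacheCost;
  // Only alert once there is enough traffic for the figure to be meaningful
  if (netSavings < 0 && metrics.totalQueries > 1000) {
    // `alert` stands for your own notification helper (email, Slack, pager),
    // not the browser's window.alert, which accepts a single argument
    alert('Negative ROI detected', {
      netSavings: `$${netSavings.toFixed(2)}`,
      suggestion: 'Consider lowering similarity threshold or increasing query volume'
    });
  }
}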
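The check above fires only when net savings turn negative. To also flag an ROI that is positive but below a target, the status can be computed as a pure function and passed to whatever notifier you use. `classifyROI`, its status labels, and the 100% target in the example are assumptions for illustration.

```typescript
type RoiStatus = 'ok' | 'below-target' | 'negative' | 'insufficient-data';

/**
 * Classifies a month's ROI against a target percentage.
 * Months with too little traffic are reported as 'insufficient-data'.
 */
function classifyROI(
  totalCostSaved: number,
  totalQueries: number,
  monthlySubscription: number,
  minRoiPercent: number,
  minQueries: number = 1000
): RoiStatus {
  if (totalQueries <= minQueries) return 'insufficient-data';
  const roiPercent =
    ((totalCostSaved - monthlySubscription) / monthlySubscription) * 100;
  if (roiPercent < 0) return 'negative';
  return roiPercent < minRoiPercent ? 'below-target' : 'ok';
}

// $45 saved on 30,000 queries at $29/month is ~55% ROI, under a 100% target.
console.log(classifyROI(45, 30000, 29, 100)); // below-target
```

Keeping the classification separate from the notification makes the thresholds easy to unit test and to change per environment.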
Real-World Examples¶
Example 1: Customer Support Chatbot¶
Scenario:
- 50,000 queries/month
- 60% cache hit rate
- GPT-4o ($0.003/query)
- $29/month Vectorcache
Calculation:
Cache hits: 50,000 × 0.60 = 30,000
Cost saved: 30,000 × $0.003 = $90
Net savings: $90 - $29 = $61/month
ROI: ($61 / $29) × 100 = 210%
Annual net savings: $61 × 12 = $732
Example 2: Educational Platform¶
Scenario:
- 100,000 queries/month
- 70% cache hit rate (high due to repetitive educational questions)
- GPT-4o ($0.003/query)
- $29/month Vectorcache
Calculation:
Cache hits: 100,000 × 0.70 = 70,000
Cost saved: 70,000 × $0.003 = $210
Net savings: $210 - $29 = $181/month
ROI: ($181 / $29) × 100 = 624%
Annual net savings: $181 × 12 = $2,172
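Both scenarios can be reproduced with a small projection helper, which is also a convenient way to plug in your own volume and hit rate. `projectSavings` and its field names are illustrative. Amounts are rounded to cents and ROI to a whole percent, matching the calculations above.

```typescript
interface Scenario {
  monthlyQueries: number;
  cacheHitRate: number;        // fraction between 0 and 1
  costPerQuery: number;        // USD
  monthlySubscription: number; // USD
}

interface Projection {
  costSaved: number;
  netMonthly: number;
  roiPercent: number;
  netAnnual: number;
}

// Round to cents so floating-point noise does not leak into the report.
const toCents = (amount: number): number => Math.round(amount * 100) / 100;

function projectSavings(s: Scenario): Projection {
  const costSaved = toCents(s.monthlyQueries * s.cacheHitRate * s.costPerQuery);
  const netMonthly = toCents(costSaved - s.monthlySubscription);
  return {
    costSaved,
    netMonthly,
    roiPercent: Math.round((netMonthly / s.monthlySubscription) * 100),
    netAnnual: toCents(netMonthly * 12)
  };
}

const supportBot = projectSavings({
  monthlyQueries: 50000, cacheHitRate: 0.6, costPerQuery: 0.003, monthlySubscription: 29
});
const eduPlatform = projectSavings({
  monthlyQueries: 100000, cacheHitRate: 0.7, costPerQuery: 0.003, monthlySubscription: 29
});
console.log(supportBot.netMonthly, supportBot.roiPercent, supportBot.netAnnual);    // 61 210 732
console.log(eduPlatform.netMonthly, eduPlatform.roiPercent, eduPlatform.netAnnual); // 181 624 2172
```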
Best Practices¶
- Track metrics religiously - Know your exact hit rate and cost savings
- Test thresholds - Find the sweet spot for your use case
- Segment caches - Use context for better organization
- Monitor ROI - Alert when ROI drops below acceptable level
- Optimize continuously - Adjust based on real data
Next Steps¶
- Best Practices - Production tips
- Similarity Tuning - Optimize hit rate
- API Reference - API documentation