Cost Optimization¶

Get the most out of Vectorcache by raising your cache hit rate and measuring the LLM spend it saves against the subscription cost.

Understanding Costs¶

LLM API Costs¶

Typical LLM API costs per query (actual cost depends on token counts and current provider pricing):

Model	Cost per Query	Annual Cost (1000 queries/day)
GPT-4o	$0.002-0.005	$730-1,825
GPT-4o-mini	$0.0003-0.001	$110-365
Claude 3.5 Sonnet	$0.003-0.008	$1,095-2,920
Claude 3.5 Haiku	$0.0005-0.001	$183-365

Vectorcache Savings¶

With a 50% cache hit rate (1,000 queries/day, upper-end per-query cost):

Model	Without Cache	With Vectorcache (50% hits)	Annual Savings
GPT-4o	$1,825/year	$913/year	$912/year
Claude 3.5 Sonnet	$2,920/year	$1,460/year	$1,460/year

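These figures exclude the Vectorcache subscription. At $29/month ($348/year), net annual savings are about $564 for GPT-4o and $1,112 for Claude 3.5 Sonnet.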
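
The arithmetic behind the savings table can be reproduced with a short sketch. The `annualCost` helper and its field names are illustrative and not part of the Vectorcache SDK. It assumes a cache hit costs nothing beyond the subscription and a miss costs the full per-query price.

```typescript
// Illustrative helper (not part of the Vectorcache SDK).
interface AnnualCostInput {
  queriesPerDay: number;
  costPerQuery: number;        // USD per LLM call
  cacheHitRate: number;        // fraction between 0 and 1
  monthlySubscription: number; // USD; pass 0 to leave the subscription out
}

function annualCost(input: AnnualCostInput) {
  const annualQueries = input.queriesPerDay * 365;
  const withoutCache = annualQueries * input.costPerQuery;
  // Only cache misses reach the LLM provider
  const llmWithCache = withoutCache * (1 - input.cacheHitRate);
  const grossSavings = withoutCache - llmWithCache;
  const netSavings = grossSavings - input.monthlySubscription * 12;
  return { withoutCache, llmWithCache, grossSavings, netSavings };
}

// GPT-4o row from the table: 1,000 queries/day at $0.005/query, 50% hits
const gpt4o = annualCost({
  queriesPerDay: 1000,
  costPerQuery: 0.005,
  cacheHitRate: 0.5,
  monthlySubscription: 29,
});

console.log(`Gross: $${gpt4o.grossSavings.toFixed(2)}, net: $${gpt4o.netSavings.toFixed(2)}`);
```

Reporting gross and net savings separately makes it clear whether a figure already accounts for the subscription.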

Calculating ROI¶

Formula¶

ROI = (Cost Saved - Vectorcache Cost) / Vectorcache Cost × 100%

Cost Saved = Cache Hits × Avg Query Cost

Cache Hits = Monthly Queries × Cache Hit Rate

Example Calculation¶

interface ROICalculation {
  monthlyQueries: number;
  cacheHitRate: number;
  avgQueryCost: number;
  vectorcacheCost: number;
}

function calculateROI(params: ROICalculation): number {
  const cacheHits = params.monthlyQueries * params.cacheHitRate;
  const costSaved = cacheHits * params.avgQueryCost;
  const netSavings = costSaved - params.vectorcacheCost;
  const roi = (netSavings / params.vectorcacheCost) * 100;

  return roi;
}

// Example: 30,000 queries/month, 50% hit rate, $0.003/query, $29/month
const roi = calculateROI({
  monthlyQueries: 30000,
  cacheHitRate: 0.50,
  avgQueryCost: 0.003,
  vectorcacheCost: 29
});

console.log(`ROI: ${roi.toFixed(0)}%`); // ~55% ROI

Maximizing Cache Hit Rate¶

1. Optimize Similarity Threshold¶

A lower threshold gives a higher hit rate, but raises the risk of serving a cached answer to a query that only looks similar:

// Test different thresholds (testThreshold and testQueries stand in for your own evaluation harness)
const thresholds = [0.80, 0.85, 0.90];

for (const threshold of thresholds) {
  const metrics = await testThreshold(threshold, testQueries);
  console.log(`Threshold ${threshold}: ${metrics.hitRate}% hits, $${metrics.saved}`);
}

// Choose threshold with best ROI
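
One way such a comparison could work offline is sketched below. It assumes you have logged, for each test query, the best similarity score found in the cache and a manual label saying whether that cached answer would have been acceptable. The `ScoredQuery` type and `evaluateThreshold` function are hypothetical names, not Vectorcache APIs.

```typescript
// Hypothetical offline evaluation; not a Vectorcache API.
interface ScoredQuery {
  bestSimilarity: number;  // top similarity score against cached prompts, 0..1
  matchIsCorrect: boolean; // manual label: was the cached answer acceptable?
}

function evaluateThreshold(samples: ScoredQuery[], threshold: number) {
  // A query counts as a hit when its best match clears the threshold
  const hits = samples.filter(s => s.bestSimilarity >= threshold);
  const wrongHits = hits.filter(s => !s.matchIsCorrect).length;
  return {
    threshold,
    hitRate: samples.length === 0 ? 0 : hits.length / samples.length,
    wrongHitRate: hits.length === 0 ? 0 : wrongHits / hits.length,
  };
}

const samples: ScoredQuery[] = [
  { bestSimilarity: 0.95, matchIsCorrect: true },
  { bestSimilarity: 0.88, matchIsCorrect: true },
  { bestSimilarity: 0.82, matchIsCorrect: false },
  { bestSimilarity: 0.60, matchIsCorrect: false },
];

for (const threshold of [0.80, 0.85, 0.90]) {
  console.log(evaluateThreshold(samples, threshold));
}
```

Tracking the wrong-hit rate next to the hit rate keeps the accuracy cost of a lower threshold visible when you pick a value.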

2. Use Context Segmentation¶

Segment cache by use case for better matches:

// Separate caches for different contexts
await client.query({
  prompt: 'Reset password',
  context: 'customer-support-auth',
  model: 'gpt-4o'
});

await client.query({
  prompt: 'Reset password',
  context: 'admin-documentation',
  model: 'gpt-4o'
});
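
To illustrate why segmentation keeps answers from leaking between use cases, here is a toy in-memory cache keyed by context. It uses exact string matching only and does not reflect how Vectorcache stores or matches entries. The `SegmentedCache` class is a hypothetical illustration.

```typescript
// Toy exact-match cache, only to illustrate segmentation by context.
class SegmentedCache {
  private segments = new Map<string, Map<string, string>>();

  get(context: string, prompt: string): string | undefined {
    return this.segments.get(context)?.get(prompt);
  }

  set(context: string, prompt: string, response: string): void {
    let segment = this.segments.get(context);
    if (!segment) {
      segment = new Map<string, string>();
      this.segments.set(context, segment);
    }
    segment.set(prompt, response);
  }
}

const cache = new SegmentedCache();
cache.set('customer-support-auth', 'Reset password', 'Use the "Forgot password" link on the sign-in page.');

// Same prompt, different context: the support answer is not reused for admin docs
console.log(cache.get('customer-support-auth', 'Reset password'));
console.log(cache.get('admin-documentation', 'Reset password'));
```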

3. Normalize User Input¶

Preprocess queries for better matching:

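function normalizePrompt(prompt: string): string {
  return prompt
    .toLowerCase()
    .replace(/[^\w\s]/g, ' ')  // Replace punctuation with spaces (also alters terms like "C++" or "3.5")
    .replace(/\s+/g, ' ')      // Collapse runs of whitespace
    .trim();                   // Trim last: replaced punctuation can leave spaces at the edges
}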

const result = await client.query({
  prompt: normalizePrompt(userInput),
  model: 'gpt-4o'
});

Cost Tracking¶

Track Actual Savings¶

class CostTracker {
  private totalSaved = 0;
  private totalQueries = 0;
  private cacheHits = 0;

  async query(request: CacheQueryRequest) {
    const result = await client.query(request);

    this.totalQueries++;
    if (result.cache_hit) {
      this.cacheHits++;
      this.totalSaved += result.cost_saved;
    }

    return result;
  }

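  getMetrics() {
    // Guard against division by zero before any queries or hits are recorded
    const hitRate = this.totalQueries > 0
      ? (this.cacheHits / this.totalQueries * 100).toFixed(1)
      : '0.0';
    const avgSaved = this.cacheHits > 0 ? this.totalSaved / this.cacheHits : 0;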

    return {
      totalQueries: this.totalQueries,
      cacheHits: this.cacheHits,
      hitRate: `${hitRate}%`,
      totalSaved: `$${this.totalSaved.toFixed(2)}`,
      avgSavedPerHit: `$${avgSaved.toFixed(4)}`
    };
  }
}

Monthly Cost Analysis¶

function analyzeMonthCosts(metrics: CacheMetrics) {
  const vectorcacheCost = 29; // Monthly subscription
  const costSaved = metrics.totalCostSaved;
  const netSavings = costSaved - vectorcacheCost;
  const roi = (netSavings / vectorcacheCost) * 100;

  return {
    vectorcacheCost: `$${vectorcacheCost}`,
    llmCostSaved: `$${costSaved.toFixed(2)}`,
    netSavings: `$${netSavings.toFixed(2)}`,
    roi: `${roi.toFixed(0)}%`,
    breakEven: netSavings >= 0
  };
}

When Vectorcache Makes Sense¶

✅ Great Fit¶

High query volume: 10,000+ queries/month
Repetitive queries: Customer support, FAQs, documentation
Expensive models: GPT-4o, Claude 3.5 Sonnet
Similar user questions: Educational platforms, chatbots

⚠️ May Not Be Worth It¶

Low query volume: <1,000 queries/month
Unique queries: Each query is completely different
Cheap models only: Using only GPT-4o-mini or similar
Real-time data: Queries require latest information

Break-Even Analysis¶

Minimum monthly queries needed to break even at different hit rates, calculated as subscription cost ÷ (hit rate × cost per query):

Model	Cost/Query	30% Hit Rate	50% Hit Rate	70% Hit Rate
GPT-4o	$0.003	~32,000	~19,000	~14,000
GPT-4o-mini	$0.0005	~193,000	~116,000	~83,000
Claude 3.5 Sonnet	$0.005	~19,000	~12,000	~8,000

Monthly queries needed to break even at $29/month
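
The break-even volumes above follow from dividing the subscription cost by the saving per query (hit rate × cost per query). A minimal sketch, where `breakEvenQueries` is an illustrative helper and the $29 default mirrors the subscription price used on this page:

```typescript
// Illustrative helper (not part of the Vectorcache SDK).
function breakEvenQueries(
  costPerQuery: number,
  hitRate: number,
  monthlySubscription = 29
): number {
  // With no savings per query the subscription is never recovered
  if (costPerQuery <= 0 || hitRate <= 0) return Infinity;
  // Round up: a partial query cannot be sent
  return Math.ceil(monthlySubscription / (hitRate * costPerQuery));
}

console.log(breakEvenQueries(0.003, 0.3)); // GPT-4o at a 30% hit rate
console.log(breakEvenQueries(0.003, 0.5)); // GPT-4o at a 50% hit rate
console.log(breakEvenQueries(0.005, 0.7)); // Claude 3.5 Sonnet at a 70% hit rate
```

Plugging in your own measured hit rate and average query cost gives a more reliable threshold than the rounded table values.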

Cost Optimization Strategies¶

1. Use Cheaper Models for Cache Misses¶

async function smartQuery(prompt: string) {
  // Pick the model before querying. Assuming a cache miss triggers the LLM call
  // (as the cache_hit and cost_saved fields suggest), re-querying with a cheaper
  // model after a miss would pay for two calls instead of one.
  // isSimpleQuery is a helper you define for your own traffic.
  const model = isSimpleQuery(prompt) ? 'gpt-4o-mini' : 'gpt-4o';

  // Hits are served from cache; misses are billed at the chosen model's rate
  return await client.query({
    prompt,
    model,
    similarityThreshold: 0.85
  });
}
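
`isSimpleQuery` is not defined on this page. One possible heuristic is sketched below, assuming that short prompts without reasoning-heavy keywords can go to a cheaper model. The word limit and keyword list are placeholders to tune against your own traffic.

```typescript
// Hypothetical heuristic; the 20-word limit and keyword list are assumptions.
function isSimpleQuery(prompt: string, maxWords = 20): boolean {
  const words = prompt.trim().split(/\s+/).filter(Boolean);
  // Phrases that usually signal multi-step reasoning or generation work
  const complexHints = /\b(analyze|compare|explain why|step[- ]by[- ]step|write code|refactor|prove)\b/i;
  return words.length <= maxWords && !complexHints.test(prompt);
}

console.log(isSimpleQuery('What are your opening hours?'));
console.log(isSimpleQuery('Compare these two contracts and explain why they differ'));
```

A keyword heuristic is cheap to run but coarse, so it is worth spot-checking the answers the cheaper model returns.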

2. Batch Similar Queries¶

Group related queries to maximize cache hits:

async function batchQuery(prompts: string[]) {
  // Group similar prompts
  const groups = groupSimilarPrompts(prompts);

  // Query each group once, reuse for similar prompts
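  // Explicit element type: an untyped [] that is pushed to inside a callback fails under noImplicitAny
  const results: Array<{ prompt: string; result: Awaited<ReturnType<typeof client.query>> }> = [];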
  for (const group of groups) {
    const result = await client.query({
      prompt: group[0], // Use first as representative
      model: 'gpt-4o'
    });

    // Reuse result for all similar prompts
    group.forEach(prompt => {
      results.push({ prompt, result });
    });
  }

  return results;
}
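
`groupSimilarPrompts` is left undefined above. A minimal sketch follows, assuming prompts count as similar when their normalized text is identical. That is much stricter than semantic similarity, which would need embeddings. The implementation is hypothetical.

```typescript
// Hypothetical implementation: groups prompts that are identical after normalization.
function groupSimilarPrompts(prompts: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const prompt of prompts) {
    // Case, punctuation, and extra whitespace are ignored when grouping
    const key = prompt
      .toLowerCase()
      .replace(/[^\w\s]/g, ' ')
      .replace(/\s+/g, ' ')
      .trim();
    const group = groups.get(key);
    if (group) {
      group.push(prompt);
    } else {
      groups.set(key, [prompt]);
    }
  }
  // Map preserves insertion order, so groups appear in first-seen order
  return Array.from(groups.values());
}

const grouped = groupSimilarPrompts([
  'How do I reset my password?',
  'how do i reset my password',
  'What are your opening hours?',
]);
console.log(grouped);
```

Each group keeps the original prompt strings, so results can still be mapped back to the exact text each user sent.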

3. Implement Smart Caching Logic¶

async function intelligentCache(prompt: string, userContext: any) {
  // Don't cache unique/time-sensitive queries
  if (isTimeSensitive(prompt) || isUserSpecific(prompt, userContext)) {
    return await directLLMCall(prompt);
  }

  // Use cache for general queries
  return await client.query({
    prompt,
    model: 'gpt-4o',
    similarityThreshold: 0.85
  });
}

function isTimeSensitive(prompt: string): boolean {
  // Match whole words only, so "know" or "snow" does not trigger on "now"
  const timeKeywords = /\b(today|now|current(ly)?|latest|recent(ly)?)\b/i;
  return timeKeywords.test(prompt);
}

Monitoring ROI¶

Dashboard Metrics¶

Track these metrics in your dashboard:

interface ROIDashboard {
  period: string;
  totalQueries: number;
  cacheHits: number;
  hitRate: string;
  llmCostSaved: string;
  vectorcacheCost: string;
  netSavings: string;
  roi: string;
}

function generateROIDashboard(metrics: CacheMetrics): ROIDashboard {
  const vectorcacheCost = 29;
  const netSavings = metrics.totalCostSaved - vectorcacheCost;
  const roi = (netSavings / vectorcacheCost) * 100;

  return {
    period: 'This Month',
    totalQueries: metrics.totalQueries,
    cacheHits: metrics.cacheHits,
    hitRate: `${(metrics.totalQueries > 0 ? metrics.cacheHits / metrics.totalQueries * 100 : 0).toFixed(1)}%`,
    llmCostSaved: `$${metrics.totalCostSaved.toFixed(2)}`,
    vectorcacheCost: `$${vectorcacheCost}`,
    netSavings: `$${netSavings.toFixed(2)}`,
    roi: `${roi.toFixed(0)}%`
  };
}

Alerts for Poor ROI¶

function checkROI(metrics: CacheMetrics) {
  const vectorcacheCost = 29;
  const netSavings = metrics.totalCostSaved - vectorcacheCost;

  if (netSavings < 0 && metrics.totalQueries > 1000) {
    // Swap console.warn for your own alerting integration (email, Slack, pager)
    console.warn('Negative ROI detected', {
      netSavings: `$${netSavings.toFixed(2)}`,
      suggestion: 'Consider lowering similarity threshold or increasing query volume'
    });
  }
}
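
The check above only fires once ROI is negative. To alert earlier, ROI could be classified against a target, as in this sketch. `classifyROI` and `minRoiPercent` are illustrative names for a team-chosen threshold, not Vectorcache settings.

```typescript
type RoiStatus = 'ok' | 'below-target' | 'negative';

// Illustrative helper: minRoiPercent is an assumed, team-chosen target.
function classifyROI(
  totalCostSaved: number,
  subscriptionCost: number,
  minRoiPercent: number
): RoiStatus {
  const roi = ((totalCostSaved - subscriptionCost) / subscriptionCost) * 100;
  if (roi < 0) return 'negative';
  if (roi < minRoiPercent) return 'below-target';
  return 'ok';
}

// $45 saved against a $29 subscription is roughly 55% ROI
console.log(classifyROI(45, 29, 100));
console.log(classifyROI(90, 29, 100));
console.log(classifyROI(20, 29, 100));
```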

Real-World Examples¶

Example 1: Customer Support Chatbot¶

Scenario:

50,000 queries/month
60% cache hit rate
GPT-4o ($0.003/query)
$29/month Vectorcache

Calculation:

Cache hits: 50,000 × 0.60 = 30,000
Cost saved: 30,000 × $0.003 = $90
Net savings: $90 - $29 = $61/month
ROI: ($61 / $29) × 100 = 210%
Annual savings: $732

Example 2: Educational Platform¶

Scenario:

100,000 queries/month
70% cache hit rate (high due to repetitive educational questions)
GPT-4o ($0.003/query)
$29/month Vectorcache

Calculation:

Cache hits: 100,000 × 0.70 = 70,000
Cost saved: 70,000 × $0.003 = $210
Net savings: $210 - $29 = $181/month
ROI: ($181 / $29) × 100 = 624%
Annual savings: $2,172

Best Practices¶

Track metrics religiously - Know your exact hit rate and cost savings
Test thresholds - Find the sweet spot for your use case
Segment caches - Use context for better organization
Monitor ROI - Alert when ROI drops below acceptable level
Optimize continuously - Adjust based on real data

Next Steps¶

Best Practices - Production tips
Similarity Tuning - Optimize hit rate
API Reference - API documentation